
Inconsistent tokenization and BLEU scores between AutoTokenizer and NllbTokenizerFast #44993

@AdrianSteene

Description

System Info

  • transformers version: 5.0.0
  • Platform: macOS-26.3.1-arm64-arm-64bit
  • Python version: 3.10.19
  • PyTorch version: 2.10.0

Information

I've been evaluating facebook/nllb-200-distilled-600M across 36 different language pairs and ran into a significant discrepancy depending on which tokenizer class is instantiated.

When using NllbTokenizerFast versus AutoTokenizer, the resulting BLEU scores are drastically different for the exact same generation parameters.

For example:

  • swe_Latn -> fra_Latn: drops from ~43.35 BLEU (NllbTokenizerFast) to ~9.02 BLEU (AutoTokenizer).
  • spa_Latn -> fra_Latn: jumps from ~33.97 BLEU (NllbTokenizerFast) to ~53.25 BLEU (AutoTokenizer).
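For readers unfamiliar with the metric, here is a minimal uniform-weight BLEU-4 sketch (illustration only; the scores above came from my evaluation harness, and real toolkits such as sacrebleu add tokenization and smoothing, so absolute values will differ). The point is that BLEU is built on n-gram overlap, so a systematic corruption of the hypotheses, such as the model translating into the wrong language, collapses the score:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Uniform-weight BLEU-4 with brevity penalty (no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        if overlap == 0:
            return 0.0
        precisions.append(overlap / sum(h.values()))
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "forskare tog fram ett nytt verktyg".split()
print(sentence_bleu(ref, ref))      # 100.0 for an exact match
print(sentence_bleu(ref[:2], ref))  # 0.0 (no matching 3-grams survive)
```

A mostly-wrong hypothesis scores near zero, which is why a tokenizer silently dropping the language-routing prefix can move a pair from ~43 to ~9 BLEU.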

To understand the massive gap in BLEU scores, I inspected the raw token outputs. I noticed that AutoTokenizer completely ignores the src_lang argument and drops the routing prefix.

However, when testing this on a second machine, both AutoTokenizer and NllbTokenizerFast produced the exact same output. After comparing the environments, I realized the only variable was the presence of the sentencepiece library:

  • With sentencepiece installed: AutoTokenizer fails to prepend the src_lang token and appends an <unk> token at the end
  • Without sentencepiece: AutoTokenizer and NllbTokenizerFast produce the same tokens
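I have not traced the v5 dispatch code, so treat this as a hypothesis: AutoTokenizer appears to resolve to a sentencepiece-backed implementation when that package is importable, and that path drops the src_lang prefix. A quick, dependency-free way to check which situation a given environment is in:

```python
import importlib.util

def sentencepiece_installed() -> bool:
    """True if the sentencepiece package is importable in this environment.

    Per the observations above, this seems to be the variable that flips
    AutoTokenizer's behavior for NLLB checkpoints.
    """
    return importlib.util.find_spec("sentencepiece") is not None

print("sentencepiece installed:", sentencepiece_installed())
```

Running this on both of my machines matched the behavior split described above.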

BLEU Score Heatmaps

Here is the side-by-side comparison of the 36 language pairs.

[Side-by-side BLEU heatmaps over the 36 language pairs: NllbTokenizerFast (left) vs AutoTokenizer (right)]

Who can help?

@ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer, NllbTokenizerFast

model_name = "facebook/nllb-200-distilled-600M"

tokenizer_auto = AutoTokenizer.from_pretrained(model_name)
tokenizer_fast = NllbTokenizerFast.from_pretrained(model_name)

sample_text = "i måndags meddelade forskare från stanford university school of medicine att man tagit fram ett nytt diagnostiskt verktyg..."

# Set the source language both as an attribute and as a call argument;
# the language prefix token should be prepended either way.
tokenizer_fast.src_lang = "swe_Latn"
tokenizer_auto.src_lang = "swe_Latn"

inputs_auto = tokenizer_auto(sample_text, src_lang="swe_Latn", return_tensors="pt")
inputs_fast = tokenizer_fast(sample_text, src_lang="swe_Latn", return_tensors="pt")

print("NllbTokenizerFast")
print("Input IDs:", inputs_fast["input_ids"][0].tolist())
print("Tokens:", tokenizer_fast.convert_ids_to_tokens(inputs_fast["input_ids"][0]))

print("\nAutoTokenizer")
print("Input IDs:", inputs_auto["input_ids"][0].tolist())
print("Tokens:", tokenizer_auto.convert_ids_to_tokens(inputs_auto["input_ids"][0]))

Expected behavior

With sentencepiece installed

NllbTokenizerFast:
Input IDs: [256167, 30, 3471, 6486, 10056, 117348, 14909, 11507, 463, 13861, 11651, 13056, 181155, 26958, 452, 150992, 1763, 492, 207809, 9520, 3288, 55723, 30650, 5536, 25424, 138458, 5733, 2]
Tokens: ['swe_Latn', '▁i', '▁må', 'nda', 'gs', '▁medde', 'lade', '▁forsk', 'are', '▁från', '▁stan', 'ford', '▁university', '▁school', '▁of', '▁medicine', '▁att', '▁man', '▁tagit', '▁fram', '▁ett', '▁nytt', '▁diag', 'nost', 'iskt', '▁verkt', 'yg', '</s>']

AutoTokenizer:
Input IDs: [30, 3471, 6486, 10056, 117348, 14909, 11507, 463, 13861, 11651, 13056, 181155, 26958, 452, 150992, 1763, 492, 207809, 9520, 3288, 55723, 30650, 5536, 25424, 138458, 5733, 2, 3]
Tokens: ['▁i', '▁må', 'nda', 'gs', '▁medde', 'lade', '▁forsk', 'are', '▁från', '▁stan', 'ford', '▁university', '▁school', '▁of', '▁medicine', '▁att', '▁man', '▁tagit', '▁fram', '▁ett', '▁nytt', '▁diag', 'nost', 'iskt', '▁verkt', 'yg', '</s>', '<unk>']
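The two ID sequences above differ only at the edges, which a small helper makes explicit (the ID lists are copied verbatim from the dumps above; 256167 is the swe_Latn prefix per the fast tokenizer's output, and 3 is the trailing <unk> noted earlier):

```python
# Token IDs copied from the dumps above (swe_Latn sample, sentencepiece installed).
FAST_IDS = [256167, 30, 3471, 6486, 10056, 117348, 14909, 11507, 463, 13861,
            11651, 13056, 181155, 26958, 452, 150992, 1763, 492, 207809, 9520,
            3288, 55723, 30650, 5536, 25424, 138458, 5733, 2]
AUTO_IDS = [30, 3471, 6486, 10056, 117348, 14909, 11507, 463, 13861, 11651,
            13056, 181155, 26958, 452, 150992, 1763, 492, 207809, 9520, 3288,
            55723, 30650, 5536, 25424, 138458, 5733, 2, 3]

def diff_ids(a, b):
    """Return IDs present in one sequence but missing from the other."""
    return [t for t in a if t not in b], [t for t in b if t not in a]

missing_from_auto, extra_in_auto = diff_ids(FAST_IDS, AUTO_IDS)
print(missing_from_auto)  # [256167] -- the swe_Latn language prefix
print(extra_in_auto)      # [3] -- the trailing <unk>
```

So the body of the encoding is identical; only the routing prefix is lost and a spurious <unk> is appended, which is enough to derail generation.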

Without sentencepiece installed (this is what I would expect in both cases)

NllbTokenizerFast:
Input IDs: [256167, 30, 3471, 6486, 10056, 117348, 14909, 11507, 463, 13861, 11651, 13056, 181155, 26958, 452, 150992, 1763, 492, 207809, 9520, 3288, 55723, 30650, 5536, 25424, 138458, 5733, 2]
Tokens: ['swe_Latn', '▁i', '▁må', 'nda', 'gs', '▁medde', 'lade', '▁forsk', 'are', '▁från', '▁stan', 'ford', '▁university', '▁school', '▁of', '▁medicine', '▁att', '▁man', '▁tagit', '▁fram', '▁ett', '▁nytt', '▁diag', 'nost', 'iskt', '▁verkt', 'yg', '</s>']

AutoTokenizer:
Input IDs: [256167, 30, 3471, 6486, 10056, 117348, 14909, 11507, 463, 13861, 11651, 13056, 181155, 26958, 452, 150992, 1763, 492, 207809, 9520, 3288, 55723, 30650, 5536, 25424, 138458, 5733, 2]
Tokens: ['swe_Latn', '▁i', '▁må', 'nda', 'gs', '▁medde', 'lade', '▁forsk', 'are', '▁från', '▁stan', 'ford', '▁university', '▁school', '▁of', '▁medicine', '▁att', '▁man', '▁tagit', '▁fram', '▁ett', '▁nytt', '▁diag', 'nost', 'iskt', '▁verkt', 'yg', '</s>']
