System Info
- transformers version: 5.3.0
- Platform: Linux
- Python version: 3.13.5
- PyTorch version: 2.8.0+cu128
Who can help?
@Rocketknight1
Information
Reproduction
GlmMoeDsa models crash on the second forward pass. The DSA indexer's _cached_keys and _cached_indices persist between calls and cause shape mismatches or out-of-bounds scatter indices.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("yujiepan/glm-5-tiny-random", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("yujiepan/glm-5-tiny-random")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# First forward: OK
out1 = model(**inputs)
print(out1.logits.shape) # torch.Size([1, 1, 154880])
# Second forward: CRASH
out2 = model(**inputs) # AcceleratorError: CUDA error: device-side assert triggered
Same issue with yujiepan/glm-moe-dsa-tiny-random.
Error
With CUDA_LAUNCH_BLOCKING=1:
File ".../transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 414, in forward
index_mask.scatter_(-1, topk_indices, 0.0) # [B, S, T]
torch.AcceleratorError: CUDA error: device-side assert triggered
The underlying issue is at modeling_glm_moe_dsa.py:198:
k_cached = torch.cat([self._cached_keys, k], dim=1) # [B, T, D]
On the second forward call, self._cached_keys still holds stale state from the first call, leading to shape mismatches or invalid indices.
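The mechanism can be illustrated with a self-contained toy (a simplified stand-in, not the real modeling code; the scores here are a monotone placeholder so the newest key deterministically wins topk). Because the cached keys grow the effective length T on every call while the scatter target is sized from the fresh input length S, the second call produces indices outside the mask:

```python
import torch

class ToyIndexer:
    # Simplified stand-in for the DSA indexer's caching behavior described
    # in this report; NOT the real GlmMoeDsa code. Scores are a monotone
    # placeholder so the newest (highest-index) key always wins topk.
    def __init__(self, top_k=1):
        self.top_k = top_k
        self._cached_keys = None  # persists across calls

    def __call__(self, k):  # k: [B, S, D]
        B, S, _ = k.shape
        if self._cached_keys is not None:
            # analog of the cat at modeling_glm_moe_dsa.py:198 -- T grows every call
            k = torch.cat([self._cached_keys, k], dim=1)
        self._cached_keys = k
        T = k.shape[1]
        # placeholder attention scores over all T cached keys
        scores = torch.arange(T, dtype=torch.float32).expand(B, S, T)
        topk_indices = scores.topk(self.top_k, dim=-1).indices  # values in [0, T)
        # mask is sized from the fresh input length S, not the cached length T
        index_mask = torch.ones(B, S, S)
        index_mask.scatter_(-1, topk_indices, 0.0)  # out of bounds once T > S
        return index_mask

indexer = ToyIndexer()
k = torch.randn(1, 4, 8)
print(indexer(k).shape)  # first call: T == S == 4, scatter is in range
try:
    indexer(k)  # second call: T == 8, topk index 7 >= mask size 4
except RuntimeError as e:
    print("second call failed:", e)
```

On CPU this surfaces as a RuntimeError from scatter_; on CUDA the same out-of-range index shows up as the device-side assert above.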
Expected behavior
The model should be callable multiple times without error. The DSA indexer should either reset its cache between forward passes or not use persistent state for inference without KV cache.
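Until the indexer is fixed, one possible stopgap is to clear the stale state between independent forward passes. This is a sketch, not a transformers API: reset_dsa_indexer_cache is a hypothetical helper, and the attribute names are assumed from the traceback in this report.

```python
import torch
from torch import nn

def reset_dsa_indexer_cache(model: nn.Module) -> None:
    # Hypothetical workaround helper (not part of transformers): walk all
    # submodules and clear any persistent indexer buffers so the next
    # forward pass starts from a clean state. The attribute names
    # (_cached_keys, _cached_indices) are taken from this report and may
    # differ across versions.
    for module in model.modules():
        for attr in ("_cached_keys", "_cached_indices"):
            if getattr(module, attr, None) is not None:
                setattr(module, attr, None)
```

With the reproduction above, calling reset_dsa_indexer_cache(model) between out1 and out2 should avoid the stale-cache crash, at the cost of discarding any state the indexer intended to keep.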
Additional context
This is related to other known GlmMoeDsa indexer issues (#44360, #44263). The stale cache issue compounds with those bugs — even if the indexer logic is fixed, the persistent cache between calls will continue to cause problems.