Initialize bias to zero #2450

Merged
tianyu-l merged 1 commit into pytorch:main from rthekini-aws:initialize-bias on Feb 28, 2026

Conversation

@rthekini-aws
Contributor

Router.init_weights initializes self.gate.weight via trunc_normal_ but never initializes self.gate.bias. Under torch.use_deterministic_algorithms(True), PyTorch's fill_uninitialized_memory fills the bias with NaN, which poisons all router scores and produces NaN loss from step 1.

Also defensively zero-initializes FeedForward biases (not currently triggered, since bias=False by default).

Root cause

nn.Linear allocates bias with torch.empty. Normally the uninitialized memory happens to hold finite values, so training simply starts from an arbitrary bias and the missing initialization goes unnoticed. Under deterministic mode, torch.utils.deterministic.fill_uninitialized_memory (True by default) fills uninitialized memory with NaN precisely to surface this kind of bug.

Fix

Zero-initialize biases in Router.init_weights and FeedForward.init_weights.
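A sketch of the fix, assuming simplified stand-ins for the two modules. The `Router` and `FeedForward` classes below are hypothetical (including the `w1`/`w2`/`w3` names and the `init_std` parameter), and the real torchtitan classes and `init_weights` signatures may differ; the point is only the extra `zeros_` call on each bias after the weight init:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Router(nn.Module):
    """Hypothetical, simplified router: a biased linear gate over experts."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.gate(x), dim=-1)

    def init_weights(self, init_std: float) -> None:
        nn.init.trunc_normal_(self.gate.weight, mean=0.0, std=init_std)
        # The fix: write a defined value into the bias instead of leaving it empty.
        if self.gate.bias is not None:
            nn.init.zeros_(self.gate.bias)


class FeedForward(nn.Module):
    """Hypothetical SwiGLU-style feed-forward block; bias is off by default."""

    def __init__(self, dim: int, hidden_dim: int, bias: bool = False):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=bias)
        self.w2 = nn.Linear(hidden_dim, dim, bias=bias)
        self.w3 = nn.Linear(dim, hidden_dim, bias=bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

    def init_weights(self, init_std: float) -> None:
        for linear in (self.w1, self.w2, self.w3):
            nn.init.trunc_normal_(linear.weight, mean=0.0, std=init_std)
            # Defensive: only does anything when the block is built with bias=True.
            if linear.bias is not None:
                nn.init.zeros_(linear.bias)


# Reproduce the failing setup: deterministic mode plus meta-device init.
torch.use_deterministic_algorithms(True)
with torch.device("meta"):
    router = Router(dim=16, num_experts=4)
    ffn = FeedForward(dim=16, hidden_dim=32, bias=True)
router.to_empty(device="cpu")
ffn.to_empty(device="cpu")
router.init_weights(init_std=0.02)
ffn.init_weights(init_std=0.02)

x = torch.randn(8, 16)
scores, out = router(x), ffn(x)
print(torch.isfinite(scores).all().item(), torch.isfinite(out).all().item())  # True True
```

Guarding with `bias is not None` keeps the same `init_weights` valid for both `bias=False` (the default) and `bias=True`.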

Testing

Verified with the gpt_oss debugmodel (NGPU=1 MODULE=gpt_oss CONFIG=gpt_oss_debugmodel) and --debug.deterministic across seeds 0, 42, 123, and 999. All seeds now produce a converging loss, whereas previously every step was NaN.

| Seed | Loss at step 1 | Loss at step 5 |
|------|----------------|----------------|
| 0    | 8.133          | 4.454          |
| 42   | 8.087          | 4.285          |
| 123  | 7.982          | 4.408          |
| 999  | 8.138          | 4.330          |
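The PR's check ran torchtitan's gpt_oss debugmodel. The sketch below applies the same kind of check (deterministic mode, the four seeds above, a few training steps) to a hypothetical toy model with a biased softmax gate, again assuming meta-device construction plus `to_empty()`. It asserts that every loss is finite and that rerunning a seed reproduces it exactly; it does not reproduce the numbers in the table.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.use_deterministic_algorithms(True)

SEEDS = (0, 42, 123, 999)


def build_model(dim: int = 16, num_experts: int = 4) -> nn.Module:
    # Hypothetical toy model: a biased softmax gate followed by a linear head,
    # built on the meta device and materialized with to_empty().
    with torch.device("meta"):
        model = nn.Sequential(
            nn.Linear(dim, num_experts),
            nn.Softmax(dim=-1),
            nn.Linear(num_experts, 1),
        )
    model.to_empty(device="cpu")
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.trunc_normal_(module.weight, mean=0.0, std=0.02)
            nn.init.zeros_(module.bias)  # the fix under test
    return model


def run(seed: int, steps: int = 5) -> list[float]:
    torch.manual_seed(seed)
    model = build_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 16), torch.randn(32, 1)
    losses = []
    for _ in range(steps):
        loss = F.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


results = {seed: run(seed) for seed in SEEDS}
for seed, losses in results.items():
    # Every step must be finite (without the fix, every step would be NaN) ...
    assert torch.isfinite(torch.tensor(losses)).all(), f"seed {seed}: non-finite loss"
    # ... and rerunning the same seed must reproduce the same losses.
    assert run(seed) == losses, f"seed {seed}: not reproducible"
```

Dropping the `zeros_` line makes every loss NaN, which matches the failure described for the original code.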

@meta-cla bot added the CLA Signed label Feb 27, 2026
Contributor

@tianyu-l tianyu-l left a comment

Thanks a lot!

Contributor

@wwwjn wwwjn left a comment

CI failures are unrelated.

@rthekini-aws
Contributor Author

@tianyu-l, @wwwjn Are there any steps I need to take to merge this?

@tianyu-l tianyu-l merged commit d6a9434 into pytorch:main Feb 28, 2026
9 of 11 checks passed
Labels

CLA Signed

3 participants