feat(moe): add use_expert_bias config for optional expert biases #2214

eous wants to merge 2 commits into pytorch:main from
Conversation
Add support for optional expert biases (`mlp1_bias`, `mlp2_bias`) in `GptOssGroupedExperts`, required for loading GPT-OSS pretrained models.

Changes to `GptOssGroupedExperts`:
- Add `use_expert_bias` parameter (default=True for GPT-OSS)
- Add `compute_dtype` parameter for configurable compute precision
- Add caching (`_cached_tp_degree`, `_is_dtensor`) for performance
- Add `_get_tp_degree()` method with cache lookup
- Proper `None` checks in `_run_experts_for_loop` and `_run_experts_grouped_mm`
- `ScaleBiasForward` custom autograd for TP bias scaling

Changes to `expert_parallel.py`:
- Handle optional bias distribution in `GptossTensorParallel`
- Handle optional bias distribution in `GptossExpertTensorParallel`
- Invalidate caches after parallelization

Changes to `MoEArgs`:
- Add `use_expert_bias: bool = False` field

Config updates:
- Set `use_expert_bias=True` for GPT-OSS 20B/120B models
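As a rough sketch of the optional-bias wiring described above (names and shapes are illustrative, not the actual `GptOssGroupedExperts` implementation):

```python
import torch
import torch.nn as nn


class GroupedExpertsSketch(nn.Module):
    """Illustrative sketch only: per-expert biases gated by a config flag."""

    def __init__(
        self,
        num_experts: int,
        dim: int,
        hidden_dim: int,
        use_expert_bias: bool = False,
        compute_dtype: torch.dtype = torch.bfloat16,
    ):
        super().__init__()
        self.compute_dtype = compute_dtype
        self.mlp1_weight = nn.Parameter(torch.empty(num_experts, dim, hidden_dim))
        self.mlp2_weight = nn.Parameter(torch.empty(num_experts, hidden_dim, dim))
        if use_expert_bias:
            self.mlp1_bias = nn.Parameter(torch.zeros(num_experts, hidden_dim))
            self.mlp2_bias = nn.Parameter(torch.zeros(num_experts, dim))
        else:
            # register None so checkpoint-loading and forward code
            # can branch on the presence of the bias parameters
            self.register_parameter("mlp1_bias", None)
            self.register_parameter("mlp2_bias", None)
```

Registering the absent biases as `None` (rather than omitting the attributes) keeps the attribute layout stable, so the `None` checks mentioned in the forward paths stay simple.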
Pull request overview
This PR adds support for optional expert biases in GPT-OSS MoE models, which is required for loading pretrained GPT-OSS models. The changes enable models to optionally include mlp1_bias and mlp2_bias parameters based on a configurable flag.
Key changes:
- Added `use_expert_bias` configuration flag (default=False) to control whether expert biases are used
- Added `compute_dtype` parameter for configurable compute precision in grouped matrix multiplications
- Implemented caching mechanism (`_cached_tp_degree`, `_is_dtensor`) to avoid repeated isinstance checks and device-mesh lookups
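The caching pattern listed above can be sketched roughly as follows (class and attribute names are illustrative; the real code caches the result of a DTensor device-mesh inspection):

```python
class TpDegreeCacheSketch:
    """Illustrative only: cache a tensor-parallel degree so the
    expensive lookup runs once per parallelization, not every forward."""

    def __init__(self, mesh_lookup):
        # mesh_lookup is a hypothetical callable standing in for
        # inspecting `mlp1_weight.device_mesh` on a DTensor
        self._mesh_lookup = mesh_lookup
        self._cached_tp_degree = None
        self.lookups = 0  # instrumentation for this example only

    def invalidate_caches(self) -> None:
        # called after (re)parallelization changes the sharding
        self._cached_tp_degree = None

    def get_tp_degree(self) -> int:
        if self._cached_tp_degree is None:
            self.lookups += 1
            self._cached_tp_degree = self._mesh_lookup()
        return self._cached_tp_degree
```

The key invariant is that anything which changes the sharding (here, the parallelization pass) must call `invalidate_caches()`, which matches the PR's note about invalidating caches after parallelization.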
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| torchtitan/models/moe/moe.py | Added use_expert_bias field to MoEArgs dataclass with clear documentation |
| torchtitan/models/gpt_oss/model/moe.py | Modified GptOssGroupedExperts to support optional biases, added compute_dtype parameter, implemented caching, and updated ScaleBiasForward with proper type annotations |
| torchtitan/models/gpt_oss/infra/expert_parallel.py | Updated tensor parallel distribution to handle optional biases and invalidate caches after parallelization |
| torchtitan/models/gpt_oss/__init__.py | Enabled use_expert_bias=True for 20B and 120B GPT-OSS model configurations |
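The optional-bias handling in `expert_parallel.py` described above can be sketched as follows (`distribute_expert_params` and `shard_fn` are hypothetical names for illustration; the real code uses DTensor distribution utilities):

```python
def distribute_expert_params(module, shard_fn):
    """Sketch: shard expert weights, and biases only when present.

    `shard_fn` is a hypothetical stand-in for the DTensor distribution
    call; `module` mirrors the attribute layout described in the PR.
    """
    module.mlp1_weight = shard_fn(module.mlp1_weight)
    module.mlp2_weight = shard_fn(module.mlp2_weight)
    # biases are None when use_expert_bias=False, so guard each one
    if getattr(module, "mlp1_bias", None) is not None:
        module.mlp1_bias = shard_fn(module.mlp1_bias)
    if getattr(module, "mlp2_bias", None) is not None:
        module.mlp2_bias = shard_fn(module.mlp2_bias)
    # sharding changed, so any cached mesh-derived values are now stale
    if hasattr(module, "invalidate_caches"):
        module.invalidate_caches()
```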
```diff
 if (
-    not isinstance(self.mlp1_weight, DTensor)
+    # pyrefly: ignore [not-iterable]
+    not self._is_dtensor
     or "ep" not in self.mlp1_weight.device_mesh.mesh_dim_names
 ):
```
Potential AttributeError when checking `device_mesh` on a non-DTensor. The condition `not self._is_dtensor or "ep" not in self.mlp1_weight.device_mesh.mesh_dim_names` will attempt to access `.device_mesh` even when `self._is_dtensor` is False due to Python's evaluation order in logical OR expressions. This should be rewritten to short-circuit properly, such as `if not self._is_dtensor:` followed by `elif "ep" not in self.mlp1_weight.device_mesh.mesh_dim_names:`.
Refactor the conditional to make the short-circuit logic explicit, and add a comment explaining when `device_mesh` access is safe.
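The suggested explicit branching might look like the following sketch (the `experts_use_ep` helper name and the attribute layout are assumptions for illustration; note that Python's `or` already short-circuits when the left operand is truthy, so the explicit form mainly serves to document when the `device_mesh` access is safe):

```python
def experts_use_ep(module) -> bool:
    """Sketch of the reviewer's suggested explicit branching.

    `module` is assumed to carry `_is_dtensor` and, when that flag is
    True, a DTensor `mlp1_weight` with a populated `device_mesh`.
    """
    if not module._is_dtensor:
        # plain tensor: there is no device mesh to inspect
        return False
    # safe here: _is_dtensor guarantees mlp1_weight is a DTensor
    return "ep" in module.mlp1_weight.device_mesh.mesh_dim_names
```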
tianyu-l
left a comment
The existing implementation already supports expert bias; what is this PR trying to achieve?
Apologies, this appears to be an experiment I was running that got swept into its own change. Closing it out.