3outeille (Member) commented Dec 9, 2025

This PR depends on #7 to function properly; otherwise, torch.compile + tensor parallelism will not work.

Expected outcome

No performance drop.

Testing methodology

  • 373M llama model (config is created in debug_local.sh)
  • Pinned the SDPA backend in torchtitan to the memory-efficient kernel:

    ```python
    self.sdpa_backends = [
        # SDPBackend.CUDNN_ATTENTION,
        # SDPBackend.FLASH_ATTENTION,
        SDPBackend.EFFICIENT_ATTENTION,
        # SDPBackend.MATH,
    ]
    ```

  • Ran with and without compilation:

    ```shell
    ./tooling_dev/debug_local.sh debugperf_large
    ./tooling_dev/debug_local.sh debugperf_large --compile
    ```

Results

  • v4.55.4

    • `python ./tooling_dev/test_hf_integration.py compare_throughput --torchtitan_dir v4.55.4/llama3/debugperf_large --hf_dir v4.55.4/meta-llama/Llama-3.2-1B/debugperf_large --hide_status`
      (screenshot: throughput comparison)

    • with torch.compile: `python ./tooling_dev/test_hf_integration.py compare_throughput --torchtitan_dir v4.55.4_compile/llama3/debugperf_large --hf_dir v4.55.4_compile/meta-llama/Llama-3.2-1B/debugperf_large --hide_status`
      (screenshot: throughput comparison)

  • v4.57.1

    • `python ./tooling_dev/test_hf_integration.py compare_throughput --torchtitan_dir v4.57.1/llama3/debugperf_large --hf_dir v4.57.1/meta-llama/Llama-3.2-1B/debugperf_large --hide_status`
      (screenshot: throughput comparison)

    • with torch.compile: `python ./tooling_dev/test_hf_integration.py compare_throughput --torchtitan_dir v4.57.1_compile/llama3/debugperf_large --hf_dir v4.57.1_compile/meta-llama/Llama-3.2-1B/debugperf_large --hide_status`
      (screenshot: throughput comparison)
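The comparisons above boil down to averaging per-step tokens/sec from two runs and taking their ratio. A minimal sketch of that computation, assuming a hypothetical log format where each step emits a `tps: <float>` line (the actual format produced by `test_hf_integration.py` may differ):

```python
import re


def mean_tps(log_text: str) -> float:
    """Average all tokens/sec samples found in a training log."""
    samples = [float(m) for m in re.findall(r"tps: ([0-9.]+)", log_text)]
    return sum(samples) / len(samples)


# Hypothetical log excerpts for the two runs being compared.
titan_log = "step 1 tps: 1000.0\nstep 2 tps: 1100.0\n"
hf_log = "step 1 tps: 980.0\nstep 2 tps: 1060.0\n"

# A ratio near 1.0 means no throughput regression between the two runs.
ratio = mean_tps(hf_log) / mean_tps(titan_log)
print(f"HF/torchtitan throughput ratio: {ratio:.3f}")
```

Averaging over steps (and typically discarding warmup steps in practice) smooths out per-step jitter before comparing the two implementations.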
