
Accumulate n_tokens_seen from fwd/bwd step to calculate MFU/Throughput #12

Merged: dmahan93 merged 1 commit into dev-updated-again from fix/mfu_cal on Oct 29, 2025.



ighoshsubho commented on Oct 28, 2025

  • Fixed the accumulation of n_tokens_seen from forward_backward_step into ntokens_since_last_log.
  • MetricsProcessor then computes throughput and MFU from that counter as follows:
```python
time_delta = time.perf_counter() - self.time_last_log

# tokens per second per device, abbreviated as tps
tps = self.ntokens_since_last_log / (
    time_delta * self.parallel_dims.non_data_parallel_size
)
# model FLOPS utilization
# For its definition and calculation, please refer to the PaLM paper:
# https://arxiv.org/abs/2204.02311
mfu = 100 * self.num_flops_per_token * tps / self.gpu_peak_flops
tflops = self.num_flops_per_token * tps / 1e12
```
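To illustrate why the token counter has to accumulate across every forward/backward step in a logging window, here is a minimal sketch. ThroughputMeter, add_tokens, and compute are hypothetical names, not the repository's MetricsProcessor API; the tps/MFU/TFLOPS formulas mirror the excerpt above, and num_flops_per_token, gpu_peak_flops, and non_data_parallel_size are assumed to be supplied by the caller.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class ThroughputMeter:
    """Hypothetical stand-in for the per-window metrics bookkeeping.

    Assumptions: the caller provides num_flops_per_token and gpu_peak_flops,
    and non_data_parallel_size is the number of devices that share the same
    tokens (the product of the non-data-parallel mesh dimensions).
    """

    num_flops_per_token: float
    gpu_peak_flops: float
    non_data_parallel_size: int = 1
    ntokens_since_last_log: int = 0
    time_last_log: float = field(default_factory=time.perf_counter)

    def add_tokens(self, n_tokens_seen: int) -> None:
        # Add (rather than overwrite) so that every fwd/bwd step in the
        # window contributes to the throughput numerator.
        self.ntokens_since_last_log += n_tokens_seen

    def compute(self, now: float | None = None) -> dict[str, float]:
        now = time.perf_counter() if now is None else now
        time_delta = now - self.time_last_log
        # tokens per second per device
        tps = self.ntokens_since_last_log / (
            time_delta * self.non_data_parallel_size
        )
        # model FLOPS utilization as a percentage of peak, plus achieved TFLOPS
        mfu = 100 * self.num_flops_per_token * tps / self.gpu_peak_flops
        tflops = self.num_flops_per_token * tps / 1e12
        # Start a fresh window for the next logging interval.
        self.ntokens_since_last_log = 0
        self.time_last_log = now
        return {"tps": tps, "mfu": mfu, "tflops": tflops}
```

For instance, three steps of 4,096 tokens each, logged 2 s after the window opened with non_data_parallel_size=2, give tps = 12,288 / (2 × 2) = 3,072 tokens/s per device; with the illustrative values num_flops_per_token=6e9 and gpu_peak_flops=1e15, that is 18.432 TFLOPS and an MFU of about 1.84%. Had only the last step's 4,096 tokens been recorded, every figure would be a third as large.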

ighoshsubho requested a review from dmahan93 on October 28, 2025 14:51

dmahan93 commented:

LGTM! Thank you!

dmahan93 merged commit 683e90f into dev-updated-again on Oct 29, 2025
xrsrke pushed a commit that referenced this pull request on Feb 13, 2026: Accumulate n_tokens_seen from fwd/bwd step to calculate MFU/Throughput
xrsrke pushed a commit that referenced this pull request on Feb 25, 2026: Accumulate n_tokens_seen from fwd/bwd step to calculate MFU/Throughput