Fix word-level timestamp overflow in Whisper chunked transcription #1483

neonwatty · 2025-12-13T11:15:45Z

Summary

Fixes "whisper-large-v3-turbo_timestamped has broken timestamps" (#1357)
Clamps word-level timestamps to the actual chunk_len so they cannot exceed the audio duration
Previously, when the model output a timestamp near the 30s boundary (e.g., 29.98s) for a shorter final chunk, the resulting timestamp exceeded the audio duration

Root Cause

When using chunk_length_s=30 with word-level timestamps (return_timestamps: 'word'), the Whisper model can output timestamps up to ~29.98s (the maximum representable timestamp given time_precision = 30/1500 = 0.02). For a final chunk shorter than 30s, these timestamps would be added to the accumulated time_offset, causing the final timestamps to exceed the actual audio duration.
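
The arithmetic behind the overflow can be sketched in a few lines of Python. This is an illustration only, not code from this PR: the variable names are made up for the example, and the 50s offset is an assumed value chosen to match a 65s recording whose final chunk is 15s long.

```python
# Illustrative sketch of the overflow; names and the 50 s offset are assumptions.
CHUNK_LENGTH_S = 30.0
MAX_SOURCE_POSITIONS = 1500  # divisor used in time_precision = 30/1500

time_precision = CHUNK_LENGTH_S / MAX_SOURCE_POSITIONS  # 0.02 s per step

# Largest chunk-relative timestamp representable at this precision (~29.98 s).
max_raw_timestamp = (MAX_SOURCE_POSITIONS - 1) * time_precision

audio_duration = 65.0  # total length of the recording
time_offset = 50.0     # assumed start of the 15 s final chunk

# Without clamping, the raw timestamp is added to the offset unchanged.
unclamped = time_offset + max_raw_timestamp
overflow = unclamped - audio_duration

print(f"unclamped={unclamped:.2f}s, overflow={overflow:.2f}s")
```

Under these assumptions the last word would be placed roughly 15s past the end of the audio, which is the symptom described in #1357.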

Solution

Track the actual chunk_len from the stride information and clamp raw token timestamps before adding time_offset. This ensures word-level timestamps never exceed the audio duration while preserving the existing behavior for segment-level timestamps.
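
As a rough illustration of the clamping step, the Python sketch below applies the same idea to a list of chunk-relative timestamps. It is not the code changed in this PR: the helper `clamp_and_offset` and the sample values are hypothetical, and overlapping strides between chunks are ignored for simplicity.

```python
def clamp_and_offset(raw_timestamps, chunk_len, time_offset):
    """Clamp chunk-relative timestamps to chunk_len, then shift by time_offset.

    Hypothetical helper for illustration; all values are in seconds.
    """
    return [min(t, chunk_len) + time_offset for t in raw_timestamps]


# Assumed scenario: a 15 s final chunk that starts 50 s into a 65 s recording.
raw = [0.50, 7.20, 14.90, 29.98]  # the last value is near the 30 s boundary
absolute = clamp_and_offset(raw, chunk_len=15.0, time_offset=50.0)

audio_duration = 65.0
assert all(t <= audio_duration for t in absolute)
```

Clamping before adding the offset leaves in-range timestamps untouched and only pulls the out-of-range value back to the end of the chunk.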

Test plan

Added unit test that simulates the bug case (65s audio with 15s final chunk)
Verified fix with actual whisper-large-v3-turbo model on HuggingFace GPU
All existing tests pass
Build succeeds

Fix word-level timestamp overflow in Whisper chunked transcription (h…

b75d729

…uggingface#1357) Clamp word-level timestamps to the actual chunk_len to prevent timestamps from exceeding audio duration when the model outputs timestamps near the 30s boundary for shorter final chunks.
