Description
StringView dictionary decoding currently goes through an intermediate index buffer: decode indices → gather views. For RLE runs (which are common), this roundtrip is unnecessary.
Proposed Changes
- For RLE runs, use
repeat_n to fill views directly, skipping the index buffer entirely
- Pre-reserve output views capacity before the decode loop, eliminating per-chunk reallocation
- Skip buffer management when all dictionary views are inlined (≤12 bytes)
- Pre-reserve offsets in ByteArray dictionary decoding
This eliminates the intermediate index buffer roundtrip for the common RLE case and reduces StringView dictionary decoding time by ~49% in benchmarks.
🤖 Generated with Claude Code