Reduce per-byte overhead in VLQ integer decoding #9584
Conversation
Read directly from the buffer slice instead of calling `get_aligned` for each byte, avoiding repeated offset calculations and bounds checks in the hot loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
run benchmark arrow_reader_clickbench

🤖 Arrow criterion benchmark running (GKE)
🤖 Arrow criterion benchmark completed (GKE)
etseidl left a comment:

Looks sound to me. Thanks @Dandandan
    self.byte_offset = self.get_byte_offset();
    self.bit_offset = 0;

    let buf = &self.buffer[self.byte_offset..];
I was going to ask about bounds checking, but it seems like `byte_offset` won't ever get too big, so at worst this will return an empty slice, resulting in a return of `None`.
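A minimal sketch of the property the reviewer is relying on: in Rust, slicing a buffer at an index equal to its length is in range and yields an empty slice (only indices strictly greater than the length would panic), so a decode loop over that slice simply sees no bytes and falls through to its `None` return. The variable names below are illustrative, not from the actual parquet-rs source.

```rust
fn main() {
    let buffer = vec![1u8, 2, 3];

    // Slicing at exactly buffer.len() is in bounds and yields an empty slice.
    let tail = &buffer[3..];
    assert!(tail.is_empty());

    // Iterating an empty slice produces no bytes, so a decoder loop over it
    // would never panic; it would exit and return None instead.
    assert_eq!(tail.iter().next(), None);
    println!("ok");
}
```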
Which issue does this PR close?
Closes #9580
Rationale

The current VLQ decoder calls `get_aligned` for each byte, which involves repeated offset calculations and bounds checks in the hot loop.

What changes are included in this PR?

Align to the byte boundary once, then iterate directly over the buffer slice, avoiding per-byte overhead from `get_aligned`.

Are there any user-facing changes?

No.
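For readers unfamiliar with the format: a VLQ (variable-length quantity, as in LEB128) integer stores 7 payload bits per byte, with the high bit set on every byte except the last. The slice-based approach described above can be sketched as a single loop over the buffer; `decode_vlq` and its return shape are illustrative, not the actual parquet-rs API.

```rust
/// Decode one unsigned VLQ/LEB128 integer from the front of `buf`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends mid-varint (including the empty-slice case) or overflows u64.
fn decode_vlq(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift = 0;
    // Iterating the slice directly lets the compiler hoist the bounds
    // check, instead of re-checking offsets on every byte.
    for (i, &byte) in buf.iter().enumerate() {
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1)); // continuation bit clear: done
        }
        shift += 7;
        if shift >= 64 {
            return None; // more than 10 bytes cannot fit in a u64
        }
    }
    None // ran out of input before the final byte
}

fn main() {
    // 300 = 0b10_0101100 is encoded as [0xAC, 0x02].
    assert_eq!(decode_vlq(&[0xAC, 0x02]), Some((300, 2)));
    // An empty slice (byte_offset at end of buffer) yields None.
    assert_eq!(decode_vlq(&[]), None);
    println!("ok");
}
```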
🤖 Generated with Claude Code