Reduce per-byte overhead in VLQ integer decoding #9584

Merged
Dandandan merged 1 commit into apache:main from Dandandan:pr/vlq-decoding
Mar 24, 2026

@Dandandan

Which issue does this PR close?

Closes #9580

Rationale

The current VLQ decoder calls get_aligned for each byte, which involves repeated offset calculations and bounds checks in the hot loop.

What changes are included in this PR?

Align to the byte boundary once, then iterate directly over the buffer slice, avoiding per-byte overhead from get_aligned.
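As a rough illustration of the approach, here is a minimal, self-contained sketch. The `BitReader` below is a simplified, hypothetical stand-in rather than the parquet crate's actual type: the field names follow the snippet reviewed in this PR, while `new`, the rounding in `get_byte_offset`, the use of `slice::get`, and the overflow guard are assumptions made for the example.

```rust
/// Simplified, hypothetical stand-in for a bit reader (not the parquet crate's type).
pub struct BitReader<'a> {
    buffer: &'a [u8],
    byte_offset: usize,
    bit_offset: usize,
}

impl<'a> BitReader<'a> {
    pub fn new(buffer: &'a [u8]) -> Self {
        Self { buffer, byte_offset: 0, bit_offset: 0 }
    }

    /// Assumed helper: the current position rounded up to a whole byte.
    fn get_byte_offset(&self) -> usize {
        self.byte_offset + (self.bit_offset + 7) / 8
    }

    /// Decode one VLQ (LEB128-style) integer: 7 payload bits per byte,
    /// with the high bit set on every byte except the last.
    pub fn get_vlq_int(&mut self) -> Option<i64> {
        // Align to the byte boundary once, before the loop.
        self.byte_offset = self.get_byte_offset();
        self.bit_offset = 0;

        // Take the remaining bytes as a slice; the loop then walks it directly
        // instead of going through a per-byte accessor.
        let buf = self.buffer.get(self.byte_offset..)?;

        let mut value: u64 = 0;
        let mut shift: u32 = 0;
        for (i, &byte) in buf.iter().enumerate() {
            // Assumed guard: more than ten bytes cannot fit in 64 bits.
            if shift >= 64 {
                return None;
            }
            value |= u64::from(byte & 0x7F) << shift;
            if byte & 0x80 == 0 {
                // Last byte of this integer: consume i + 1 bytes.
                self.byte_offset += i + 1;
                return Some(value as i64);
            }
            shift += 7;
        }
        // Ran out of input before seeing a terminating byte.
        None
    }
}
```

In this sketch, decoding `[0xAC, 0x02]` combines 44 from the first byte with 2 << 7 from the second, giving 300, and an empty or truncated input yields `None`.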

Are there any user-facing changes?

No.

🤖 Generated with Claude Code

Read directly from the buffer slice instead of calling get_aligned for
each byte, avoiding repeated offset calculations and bounds checks in
the hot loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added the parquet (Changes to the parquet crate) label on Mar 19, 2026
@Dandandan

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4093148017-469-769vg 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/vlq-decoding (9098f72) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

group                                             main                                   pr_vlq-decoding
-----                                             ----                                   ---------------
arrow_reader_clickbench/async/Q1                  1.00   1086.3±5.25µs        ? ?/sec    1.00   1084.3±5.54µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.8±0.21ms        ? ?/sec    1.00      6.8±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.8±0.19ms        ? ?/sec    1.00      7.8±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.02     14.9±0.20ms        ? ?/sec    1.00     14.6±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.03     17.6±0.35ms        ? ?/sec    1.00     17.1±0.28ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.01     16.1±0.29ms        ? ?/sec    1.00     15.9±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.2±0.07ms        ? ?/sec    1.00      3.1±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     73.7±0.56ms        ? ?/sec    1.21    88.9±13.41ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.02     82.1±0.63ms        ? ?/sec    1.00     80.4±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    115.6±4.56ms        ? ?/sec    1.16   134.2±10.67ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.04    250.7±3.83ms        ? ?/sec    1.00    240.5±2.93ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.01     19.7±0.35ms        ? ?/sec    1.00     19.5±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.03     59.2±0.49ms        ? ?/sec    1.00     57.2±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.04     59.4±0.67ms        ? ?/sec    1.00     57.0±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.7±0.18ms        ? ?/sec    1.00     18.5±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.04     15.7±0.37ms        ? ?/sec    1.00     15.2±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.5±0.07ms        ? ?/sec    1.00      5.5±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.03     13.6±0.37ms        ? ?/sec    1.00     13.2±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.05     25.2±0.57ms        ? ?/sec    1.00     24.0±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.03      5.9±0.11ms        ? ?/sec    1.00      5.7±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.02      5.1±0.05ms        ? ?/sec    1.00      5.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.01      3.6±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1067.8±4.36µs        ? ?/sec    1.01   1079.9±8.70µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.7±0.16ms        ? ?/sec    1.00      6.6±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.7±0.18ms        ? ?/sec    1.00      7.7±0.11ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.03     14.8±0.20ms        ? ?/sec    1.00     14.4±0.21ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.03     17.3±0.36ms        ? ?/sec    1.00     16.8±0.27ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.01     16.1±0.31ms        ? ?/sec    1.00     16.0±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.03      3.0±0.05ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.02     72.0±0.66ms        ? ?/sec    1.00     70.7±0.36ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     81.3±0.73ms        ? ?/sec    1.00     79.2±0.30ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     99.4±0.84ms        ? ?/sec    1.00     95.8±0.49ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.01    228.1±1.62ms        ? ?/sec    1.00    226.3±2.77ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.4±0.34ms        ? ?/sec    1.00     19.4±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.6±0.70ms        ? ?/sec    1.00     55.9±0.37ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.04     57.9±0.71ms        ? ?/sec    1.00     55.7±0.50ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.3±0.22ms        ? ?/sec    1.00     18.3±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.04     14.9±0.43ms        ? ?/sec    1.00     14.4±0.55ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.4±0.07ms        ? ?/sec    1.00      5.4±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.9±0.40ms        ? ?/sec    1.00     12.9±0.35ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.06     24.2±0.67ms        ? ?/sec    1.00     22.9±0.43ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.03      5.7±0.13ms        ? ?/sec    1.00      5.5±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.01      4.9±0.07ms        ? ?/sec    1.00      4.8±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.04ms        ? ?/sec    1.00      3.5±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    866.4±2.49µs        ? ?/sec    1.01    872.6±2.22µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.01      5.2±0.07ms        ? ?/sec    1.00      5.2±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.02      6.2±0.07ms        ? ?/sec    1.00      6.1±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.03     22.2±0.65ms        ? ?/sec    1.00     21.5±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.17     28.8±1.08ms        ? ?/sec    1.00     24.6±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.01     23.2±0.28ms        ? ?/sec    1.00     23.1±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.03      2.8±0.04ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.04    124.8±0.78ms        ? ?/sec    1.00    120.3±0.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.03     99.0±1.03ms        ? ?/sec    1.00     96.1±0.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.04    147.3±1.23ms        ? ?/sec    1.00    141.9±1.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.02    278.5±8.61ms        ? ?/sec    1.00   272.7±14.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.01     27.5±0.45ms        ? ?/sec    1.00     27.2±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.03    109.7±0.89ms        ? ?/sec    1.00    106.1±0.64ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.05    107.8±0.87ms        ? ?/sec    1.00    103.0±0.49ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.03     19.2±0.16ms        ? ?/sec    1.00     18.7±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.02     22.6±0.40ms        ? ?/sec    1.00     22.3±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.04ms        ? ?/sec    1.01      7.0±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.01     11.5±0.19ms        ? ?/sec    1.00     11.3±0.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.02     21.0±0.34ms        ? ?/sec    1.00     20.7±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.02      5.4±0.10ms        ? ?/sec    1.00      5.3±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.05ms        ? ?/sec    1.00      5.7±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.4±0.04ms        ? ?/sec    1.00      4.4±0.03ms        ? ?/sec

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      783.1s
Peak memory    3.1 GiB
Avg memory     3.0 GiB
CPU user       705.5s
CPU sys        77.6s
Disk read      12.0 KiB
Disk write     1.3 GiB

branch

Metric         Value
------         -----
Wall time      789.6s
Peak memory    3.2 GiB
Avg memory     3.1 GiB
CPU user       722.8s
CPU sys        66.8s
Disk read      0 B
Disk write     171.4 MiB

@etseidl left a comment

Looks sound to me. Thanks @Dandandan

self.byte_offset = self.get_byte_offset();
self.bit_offset = 0;

let buf = &self.buffer[self.byte_offset..];

I was going to ask about bounds checking, but it seems byte_offset can never exceed the buffer length, so at worst this produces an empty slice and the function returns None.
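The slicing behavior behind this observation can be checked in isolation. The sketch below uses two hypothetical helpers, `tail_at` and `first_byte_at`, which are not part of the PR: in Rust, `&buffer[offset..]` is valid when `offset == buffer.len()` and yields an empty slice, and it only panics when `offset` is greater than the length, whereas `slice::get` returns `None` in that case.

```rust
/// Hypothetical helper: the tail of `buffer` from `offset`, or `None` if
/// `offset` is past the end. Never panics.
pub fn tail_at(buffer: &[u8], offset: usize) -> Option<&[u8]> {
    buffer.get(offset..)
}

/// Hypothetical helper mirroring the indexing form used in the PR snippet.
/// The caller must guarantee `offset <= buffer.len()`; otherwise this panics.
pub fn first_byte_at(buffer: &[u8], offset: usize) -> Option<u8> {
    // With offset == buffer.len() this is an empty slice, not a panic.
    let tail = &buffer[offset..];
    // An empty tail has no first byte, so the result is None.
    tail.first().copied()
}
```

So as long as the offset stays at or below the buffer length, the indexing form is safe and an exhausted buffer simply decodes to `None`.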

@Dandandan merged commit 980ea0b into apache:main on Mar 24, 2026
16 checks passed
