@kevaundray
Contributor

No description provided.

@kevaundray kevaundray changed the title from "chore(experiment): refactor blake2" to "chore(experiment): refactor blake2 input parsing" Jul 18, 2025
@codspeed-hq

codspeed-hq bot commented Jul 18, 2025

CodSpeed Performance Report

Merging #2734 will degrade performances by 6.92%

Comparing kevaundray:kw/blake2-experiments (fc38b65) with main (f4f4c38)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 169 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| blake2/4_rounds | 3.5 µs | 3.4 µs | +3.19% |
| blake2/compress_12_rounds | 3 µs | 3.3 µs | -6.92% |

@kevaundray
Contributor Author

Running perfblake locally suggested that this was faster; however, when compared against master using CodSpeed, it's slightly slower: #2735 (comment)

@kevaundray
Contributor Author

Adding 100K-round and 200K-round benchmarks so the inputs are large enough to confirm the result.
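A minimal sketch of how inputs with higher round counts could be constructed, assuming the benchmark feeds the blake2 precompile an EIP-152 style 213-byte input (4-byte big-endian round count, 64-byte state `h`, 128-byte message `m`, 16-byte offset counters `t`, 1-byte final-block flag). The helper name `blake2f_input` is hypothetical and is not taken from this PR; the all-zero state and message are placeholders.

```rust
/// Hypothetical helper (not from the PR): builds a 213-byte input in the
/// assumed EIP-152 layout: rounds (4, big-endian) | h (64) | m (128) | t (16) | f (1).
fn blake2f_input(
    rounds: u32,
    h: &[u8; 64],
    m: &[u8; 128],
    t: &[u8; 16],
    last_block: bool,
) -> [u8; 213] {
    let mut out = [0u8; 213];
    out[0..4].copy_from_slice(&rounds.to_be_bytes());
    out[4..68].copy_from_slice(h);
    out[68..196].copy_from_slice(m);
    out[196..212].copy_from_slice(t);
    out[212] = last_block as u8;
    out
}

fn main() {
    // Only the round count changes between cases; the byte length stays fixed,
    // so the extra work comes from the compression loop, not from parsing.
    for rounds in [100_000u32, 200_000] {
        let input = blake2f_input(rounds, &[0; 64], &[0; 128], &[0; 16], true);
        println!("{rounds} rounds -> {} byte input", input.len());
    }
}
```

With round counts this high, per-call parsing overhead becomes a small share of the measured time, which should make it easier to tell whether a difference comes from the compression loop or from input handling.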

let flags = set4(count_low(count), count_high(count), last_block, last_node);
let mut d = xor(loadu(iv_high), flags);

let block: &[u8; BLOCKBYTES] = std::mem::transmute(block);
Member

This is probably fine, as endianness is the same for all targets that have AVX.
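To illustrate the endianness point, here is a small sketch that assumes the message block arrives as sixteen `u64` words before being viewed as bytes (the actual source type of `block` is not shown in this excerpt). The function names are hypothetical. It compares a reinterpreting view with an explicit little-endian serialization; the two agree only on little-endian targets, which includes every x86 target with AVX.

```rust
const BLOCKBYTES: usize = 128;

/// Reinterprets the words as bytes without copying (assumed input type).
fn words_as_bytes(words: &[u64; 16]) -> &[u8; BLOCKBYTES] {
    // SAFETY: both types are 128 bytes, u8 has alignment 1, and every
    // bit pattern is a valid u8. The byte order seen is the target's native order.
    unsafe { &*(words as *const [u64; 16] as *const [u8; BLOCKBYTES]) }
}

/// Portable alternative: always produces little-endian bytes, at the cost of a copy.
fn words_to_le_bytes(words: &[u64; 16]) -> [u8; BLOCKBYTES] {
    let mut out = [0u8; BLOCKBYTES];
    for (chunk, w) in out.chunks_exact_mut(8).zip(words) {
        chunk.copy_from_slice(&w.to_le_bytes());
    }
    out
}

fn main() {
    let mut words = [0u64; 16];
    for (i, w) in words.iter_mut().enumerate() {
        *w = 0x0102_0304_0506_0708 + i as u64;
    }
    let viewed = words_as_bytes(&words);
    let portable = words_to_le_bytes(&words);
    // The zero-copy view matches the little-endian encoding only when the
    // target itself is little-endian.
    if cfg!(target_endian = "little") {
        assert_eq!(viewed, &portable);
    }
}
```

If the code were ever compiled for a big-endian target, the copying variant would still produce the byte order BLAKE2 expects, while the reinterpreting view would not.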

Member

@rakita rakita left a comment

lgtm

@rakita rakita marked this pull request as ready for review July 21, 2025 17:13
@rakita rakita merged commit b839677 into bluealloy:main Jul 21, 2025
28 of 29 checks passed

2 participants