speed up mem::swap #40454
Merged
16 commits
4bcfbc3 speed up mem::swap (djzin)
85049e5 avoid recursion (djzin)
5702f43 a new approach; ditch xor cuteness and maximize cache locality (djzin)
d1fec0d fix typo (djzin)
1daf589 add SWAP_BLOCK_SIZE constant (djzin)
2816998 use simd blocks (djzin)
c6ca81a change wording (djzin)
165f366 optimize out stack alignment for sizes < 32 (djzin)
ca2fa97 improve wording (djzin)
fcc970a fix nit (djzin)
c6307a2 copy tail bytes better for aligned types (djzin)
7475135 Merge remote-tracking branch 'upstream/master' into fast-swap (djzin)
d4d3f53 better respect alignment for copying tail (djzin)
8a973df restore old behaviour for sizes < 128 (djzin)
b795b7b restore old behaviour (djzin)
83f1f11 hack around bug in emscripten (djzin)
a new approach; ditch xor cuteness and maximize cache locality
commit 5702f436aa6258119a32cbff31cc442d73b0d2c0
I don't know if it ever matters, but it occurs to me that calculating `rem` as `len % 16` instead of `len - i` might be more obvious to an optimizer, as knowing what `len - i` is requires reasoning about a loop. Similarly, it would be better to test whether `rem == 0` than to test `i < len`, as it is more trivially determined at compile time.

I doubt this matters in most cases, but I could see the optimizer failing when the size is large, since the loop might not be fully unrolled.
I tested it; looks like the generated assembly is the same either way. `len` here is a constant known at compile time, so I guess it just gets completely eliminated.
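For readers following the exchange above, here is a minimal sketch of the two ways of handling the tail. It is not the code from this PR: the function names, the `BLOCK` constant, and the use of byte slices are all assumptions made for illustration. In the PR, `len` is a compile-time constant, whereas here it is a runtime slice length.

```rust
/// Hypothetical block size for this sketch (the review comment mentions 16).
const BLOCK: usize = 16;

/// Variant 1: the tail length is derived from the loop counter (`len - i`),
/// and the tail is handled when `i < len`.
fn swap_loop_tail(x: &mut [u8], y: &mut [u8]) {
    assert_eq!(x.len(), y.len());
    let len = x.len();
    let mut i = 0;
    // Swap whole blocks while a full block still fits.
    while i + BLOCK <= len {
        x[i..i + BLOCK].swap_with_slice(&mut y[i..i + BLOCK]);
        i += BLOCK;
    }
    // Whatever is left over has length `len - i`.
    if i < len {
        x[i..].swap_with_slice(&mut y[i..]);
    }
}

/// Variant 2: the tail length is computed up front as `len % BLOCK`,
/// so it does not depend on the loop counter at all.
fn swap_mod_tail(x: &mut [u8], y: &mut [u8]) {
    assert_eq!(x.len(), y.len());
    let len = x.len();
    let rem = len % BLOCK;
    let full = len - rem; // number of bytes covered by whole blocks
    let mut i = 0;
    while i < full {
        x[i..i + BLOCK].swap_with_slice(&mut y[i..i + BLOCK]);
        i += BLOCK;
    }
    // The tail check depends only on `len`, not on the loop.
    if rem != 0 {
        x[full..].swap_with_slice(&mut y[full..]);
    }
}
```

Both variants swap the same bytes. The only difference is whether the tail length is expressed in terms of the loop counter or directly in terms of `len`, which is the distinction the reviewer raises. As reported above, the generated assembly was the same either way when `len` is known at compile time.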