-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8266431: Dual-Pivot Quicksort improvements (Radix sort) #27411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8266431: Dual-Pivot Quicksort improvements (Radix sort) #27411
Conversation
The main achievements - introduced Radix in parallel sorting which shows several times boost of performance and has linear complexity instead of n*ln(n) - improved mixed insertion sort (makes whole sorting faster) - improved merging sort for almost sorted data - optimized parallel sorting - improved step for pivot candidates and pivot partitioning - suggested better buffer allocation: if no memory, it is switched to in-place sorting with no OutOfMemoryError - minor javadoc and comment changes - extended existing tests - added tests for radix sort, heap sort, insertion sort - added benchmarking JMH tests - improved test coverage
👋 Welcome back VladimirIaroslavski! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@VladimirIaroslavski The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Sorting on a server with AVX512 instructions OS name: Fedora Linux 38 (Workstation Edition) OpenJDK version "24-beta" 2025-03-18 Sequential sorting
Parallel sorting
|
Sorting on a computer without AVX512 support OS name: Edition Windows 10 Pro OpenJDK 26-ea 2026-03-17 Sequential sorting
Parallel sorting
|
Nice work, Vladimir! Congratulations for all the hard work, needed to upgrade DPQS once again! I will look at this new PR. |
Please check your file encoding (windows vs jnix) see jcheck report! |
Webrevs
|
*/ | ||
private static final int MAX_RECURSION_DEPTH = 64 * DELTA; | ||
private static final int MAX_BUFFER_SIZE = | ||
(int) Math.min(Runtime.getRuntime().maxMemory() >>> 4, Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small suggestion: since Java 21, you may utilize Math.clamp
to avoid explicit (int)
cast:
private static final int MAX_BUFFER_SIZE =
Math.clamp(Runtime.getRuntime().maxMemory() >>> 4, 0, Integer.MAX_VALUE);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will apply Math.clamp().
* data, taking into account parallel context. | ||
*/ | ||
boolean isLargeRandom = | ||
// size > MIN_RADIX_SORT_SIZE && (sorter == null || bits > 0) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an outcommented line of code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean isLargeRandom =
// size > MIN_RADIX_SORT_SIZE && (sorter == null || bits > 0) &&
size > MIN_RADIX_SORT_SIZE && (sorter != null && bits > 0) &&
(a[e1] > a[e2] || a[e2] > a[e3] || a[e3] > a[e4] || a[e4] > a[e5]);
This code runs Radix sort during parallel sort only.
If you want to use Radix sort during sequential or parallel sort also,
you need to switch to the first line.
Agree, both lines should contain explanations.
If we agree to move Radix sort out from this PR, these lines go away from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered a private static final boolean USE_RADIX_SORT_WITH_PARALLEL = false;
? Then we don't need to comment the code, and one could flip the behavior by altering the flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice suggestion, will try to apply
Do you have any sense on if parallelSort is widely used? There is a lot of code and complexity in this proposal and I think important to understand if it is worth it. |
It would be nice to have a high-level chart showing how the actual sorting algorithm is selected based on parallel/sequential, input size and whatever else. For an outsider, it's hard to grasp the whole picture. Is radix sort used for parallel sorting only? Sorry if such a chart already exists and I missed it. If improvements of parallel sorting and sequential sorting are at least partially independent, I think it may make sense to deliver them in separate PRs. This would really simplify the review process. |
Alan, Tagir, Thank you for your comments! New version of sources contains optimizations of sequential and parallel sorts, such as, improvements of insertion sort, better partitioning, better pivot selection, merging sort and counting sort (for byte/char/short). I think also it makes sense to separate this PR into two pull requests. Do you agree, if I move Radix sort out from this PR and re-run benchmarking? |
Yes, splitting this into two would be good as it allow both proposals to be discussed/reviewed on their own merit. |
* Moved Radix sort out from sorting
I added the chart to the first post, thank you for advice! |
I moved Radix sort out from sources and re-run benchmarking of parallel sorting. Parallel sorting (without Radix sort)
|
The goal of the PR is to improve both, sequential and parallel, sorting of primitives.
The main achievements
introduced Radix in parallel sorting which shows several times boost of performance and has linear complexity instead of n*ln(n)
improved mixed insertion sort (makes whole sorting faster)
improved merging sort for almost sorted data
optimized parallel sorting
improved step for pivot candidates and pivot partitioning
suggested better buffer allocation: if no memory, it is switched to in-place sorting with no OutOfMemoryError
minor javadoc and comment changes
extended existing tests
added tests for radix sort, heap sort, insertion sort
added benchmarking JMH tests
improved test coverage
The summary of benchmarking:
Sequential sorting (Arrays.sort)
byte: up to 50% faster
char: 4-7 times faster
short: 2-6 times faster
int: 1.2-5 times faster
long: 1.2-5 times faster
float: 1.2-5 times faster
double: 1.2-4 times faster
Parallel sorting (Arrays.parallelSort)
int: 1.2-9 times faster
long: 1.2-9 times faster
float: 1.2-4 times faster
double: 1.2-4 times faster
AVX512 support
Vamsi Parasa suggested faster sort routines by taking advantage of AVX512 instructions, see #14227, sources of sorting were modified. Therefore, I performed benchmarking of the final version (which includes optimizations by Vamsi Parasa and optimizations from my side) on a server with CPUs supported AVX512 instructions, no regression of performance was found, see detailed benchmarking results.
High-level chart showing how the actual sorting algorithm is selected
based on parallel/sequential and input size
int/long/float/double
if size < MAX_INSERTION_SORT_SIZE(=51) => [ mixed ] insertion sort
if array is almost sorted and size > MIN_MERGING_SORT_SIZE(=512) => merging sort
if recursion depth > MAX_RECURSION_DEPTH(=64) => heap sort
if random data => two pivots quicksort, else => one pivot quicksort
byte
if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort
else => counting sort
char/short
if size > MIN_COUNTING_SORT_SIZE(640) => counting sort
if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort
if recursion depth > MAX_RECURSION_DEPTH(=64) => counting sort
if random data => two pivots quicksort, else => one pivot quicksort
parallel sorting (int/long/float/double)
if size < MIN_PARALLEL_SORT_SIZE(3K) => sequential sort
invoke parallel merge sort, depth depends on parallelism level
then switch to parallel quicksort.
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27411/head:pull/27411
$ git checkout pull/27411
Update a local copy of the PR:
$ git checkout pull/27411
$ git pull https://git.openjdk.org/jdk.git pull/27411/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27411
View PR using the GUI difftool:
$ git pr show -t 27411
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27411.diff
Using Webrev
Link to Webrev Comment