[SPARK-15962][SQL] Introduce implementation with a dense format for UnsafeArrayData #13680
Closed
37 commits
fb9a42d add two implementations (sparse and dense) for UnsafeArrayData (kiszk)
d931428 fix failures of testsuite (kiszk)
9777a2d fix errors of unit tests (kiszk)
000eda4 fix failures of unit tests (kiszk)
804f081 make DenseID public (kiszk)
e6fb261 Use one implementation approach (kiszk)
a313084 fix test failures (kiszk)
68d92f7 fix test failures (kiszk)
7f2da14 update test suite (kiszk)
2f26f6f fix scala style error (kiszk)
ccef63c revert changes (kiszk)
c4f1b5e addressed comments (kiszk)
34a5c6a add benchmark (kiszk)
7a77b20 fix scala style error (kiszk)
7b0d4da addressed comments (kiszk)
b4eac29 addressed comments (kiszk)
eecf6bd fix parameters of Platform.OFFSET (kiszk)
d88a25a update benchmark results (kiszk)
db15432 add test cases (kiszk)
3fa7052 addressed comments (kiszk)
4c094c2 addressed comments (kiszk)
9887171 update test cases (kiszk)
9fe7ad0 address comments (kiszk)
e4b4b52 address comments for test cases and benchmark (kiszk)
585ca7b addressed comments (kiszk)
9933a06 addressed review comments (kiszk)
919e832 fixed test failures (kiszk)
0886e3a update test suites (kiszk)
c385bf4 align each of variable length elements to 8 bytes (kiszk)
c8813db fixed test failures (kiszk)
aa7cfdb fixed test failures (kiszk)
0b7867b address review comments (kiszk)
ab9a16a address review comments (kiszk)
515701b address review comments (kiszk)
8169abd change benchmark size (kiszk)
e356a79 addressed comments (kiszk)
2ef6e3b update performance results (kiszk)
addressed comments
commit b4eac29ebc8ea7b2c0e9e5717fbbbf13f653a4fb
@@ -43,8 +43,8 @@
  * In the `values or offset` region, we store the content of elements. For fields that hold
  * fixed-length primitive types, such as long, double, or int, we store the value directly
  * in the field. For fields with non-primitive or variable-length values, we store a relative
- * offset (w.r.t. the base address of the row) that points to the beginning of the variable-length
- * field, and length (they are combined into a long).
+ * offset (w.r.t. the base address of the array) that points to the beginning of
+ * the variable-length field, and length (they are combined into a long).
  *
  * Instances of `UnsafeArrayData` act as pointers to row data stored in this format.
  */
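The Javadoc above says the relative offset and the length are combined into a single long. A minimal sketch of one way to do that follows. The class `OffsetAndSize`, its method names, and the bit order (offset in the upper 32 bits, length in the lower 32 bits) are assumptions made for illustration; the diff does not show the actual encoding.

```java
// Hypothetical helper, not part of the PR: packs a 32-bit relative offset and
// a 32-bit length into one long. The bit order is an assumption.
final class OffsetAndSize {
  private OffsetAndSize() {}

  // Offset goes into the upper 32 bits, size into the lower 32 bits.
  static long pack(int offset, int size) {
    // Mask the size so a negative int cannot sign-extend into the offset bits.
    return ((long) offset << 32) | (size & 0xFFFFFFFFL);
  }

  // Recovers the offset from the upper 32 bits.
  static int offset(long packed) {
    return (int) (packed >>> 32);
  }

  // Recovers the size from the lower 32 bits.
  static int size(long packed) {
    return (int) packed;
  }

  public static void main(String[] args) {
    long packed = pack(24, 5);
    System.out.println(offset(packed) + " " + size(packed)); // prints "24 5"
  }
}
```

Because the offset is relative to the base address of the array, the packed value stays valid when the whole array is copied to another location.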
@@ -301,6 +301,7 @@ public boolean equals(Object other) {
     }
     return false;
   }
+
   public void writeToMemory(Object target, long targetOffset) {
     Platform.copyMemory(baseObject, baseOffset, target, targetOffset, sizeInBytes);
   }
@@ -387,50 +388,52 @@ public double[] toDoubleArray() {
     return values;
   }

-  private static UnsafeArrayData fromPrimitiveArray(Object arr, int length, final int elementSize) {
-    final int headerSize = calculateHeaderPortionInBytes(length);
-    if (length > (Integer.MAX_VALUE - headerSize) / elementSize) {
+  private static UnsafeArrayData fromPrimitiveArray(
+      Object arr, int offset, int length, int elementSize) {
+    final long headerSize = calculateHeaderPortionInBytes(length);
+    final long valueRegionSize = (long)elementSize * (long)length;
+    final long allocationSize = (headerSize + valueRegionSize + 7) / 8;
+    if (allocationSize > (long)Integer.MAX_VALUE) {
       throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " +
         "it's too big.");
     }

-    final int valueRegionSize = elementSize * length;
-    final byte[] data = new byte[valueRegionSize + headerSize];
+    final long[] data = new long[(int)allocationSize];

     Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, length);
     Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data,
       Platform.BYTE_ARRAY_OFFSET + headerSize, valueRegionSize);

     UnsafeArrayData result = new UnsafeArrayData();
-    result.pointTo(data, Platform.BYTE_ARRAY_OFFSET, valueRegionSize + headerSize);
+    result.pointTo(data, Platform.BYTE_ARRAY_OFFSET, (int)allocationSize * 8);
     return result;
   }

   public static UnsafeArrayData fromPrimitiveArray(boolean[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 1);
+    return fromPrimitiveArray(arr, Platform.BYTE_ARRAY_OFFSET, arr.length, 1);
   }

   public static UnsafeArrayData fromPrimitiveArray(byte[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 1);
+    return fromPrimitiveArray(arr, Platform.BYTE_ARRAY_OFFSET, arr.length, 1);
   }

   public static UnsafeArrayData fromPrimitiveArray(short[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 2);
+    return fromPrimitiveArray(arr, Platform.SHORT_ARRAY_OFFSET, arr.length, 2);
   }

   public static UnsafeArrayData fromPrimitiveArray(int[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 4);
+    return fromPrimitiveArray(arr, Platform.INT_ARRAY_OFFSET, arr.length, 4);
   }

   public static UnsafeArrayData fromPrimitiveArray(long[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 8);
+    return fromPrimitiveArray(arr, Platform.LONG_ARRAY_OFFSET, arr.length, 8);
   }

   public static UnsafeArrayData fromPrimitiveArray(float[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 4);
+    return fromPrimitiveArray(arr, Platform.FLOAT_ARRAY_OFFSET, arr.length, 4);
   }

   public static UnsafeArrayData fromPrimitiveArray(double[] arr) {
-    return fromPrimitiveArray(arr, arr.length, 8);
+    return fromPrimitiveArray(arr, Platform.DOUBLE_ARRAY_OFFSET, arr.length, 8);
   }
 }
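The change to `fromPrimitiveArray` moves the size arithmetic from `int` to `long` and rounds the allocation up to whole 8-byte words. The sketch below isolates that calculation so it can be read on its own. The class `ArraySizing`, the method name, and passing the header size as a parameter are assumptions for illustration, since `calculateHeaderPortionInBytes` is not shown in the diff. The sketch also checks that the final byte count fits in an `int`, which is a stricter bound than the word-count check in the diff.

```java
// Hypothetical helper, not part of the PR: computes how many 8-byte words are
// needed to hold a header plus `length` elements of `elementSize` bytes each.
final class ArraySizing {
  private ArraySizing() {}

  static int totalSizeInWords(int headerInBytes, int elementSize, int length) {
    if (headerInBytes < 0 || elementSize <= 0 || length < 0) {
      throw new IllegalArgumentException("sizes must be non-negative");
    }
    // Widen to long before multiplying so the product cannot overflow int.
    long valueRegionInBytes = (long) elementSize * (long) length;
    // Round up to the next multiple of 8 bytes, expressed in words.
    long words = ((long) headerInBytes + valueRegionInBytes + 7) / 8;
    // Assumption: the caller later needs words * 8 as an int byte count,
    // so reject anything whose byte size would not fit in an int.
    if (words > Integer.MAX_VALUE / 8) {
      throw new UnsupportedOperationException(
        "Cannot convert this array to unsafe format as it's too big.");
    }
    return (int) words;
  }

  public static void main(String[] args) {
    // Hypothetical 4-byte header plus five 4-byte ints: 24 bytes, so 3 words.
    System.out.println(totalSizeInWords(4, 4, 5)); // prints 3
  }
}
```

Allocating in words rather than bytes is what lets the backing store be a `long[]`, which keeps the start of the data 8-byte aligned.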
It's confusing to use `size` for all the names. How about `headerInBytes`, `valueRegionInBytes`, `totalSizeInWords`?
Updated these names.