[SPARK-49224][TESTS] Regenerate benchmark results #47743
| @@ -0,0 +1,7 @@ | |||
| OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure | |||
This result file is added.
| @@ -0,0 +1,7 @@ | |||
| OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure | |||
This result file is added.
| Coalesce Num Partitions: 100 Num Hosts: 10 178 185 6 0.6 1779.9 2.0X | ||
| Coalesce Num Partitions: 100 Num Hosts: 20 153 156 4 0.7 1531.2 2.3X | ||
| Coalesce Num Partitions: 100 Num Hosts: 40 148 149 1 0.7 1479.1 2.4X | ||
| Coalesce Num Partitions: 100 Num Hosts: 80 166 170 5 0.6 1657.8 2.2X |
The ratio changed slightly here, by a marginal difference. The Java 17 result is okay.
| Deserialization 329 346 18 0.6 1645.6 2.1X | ||
| Compressed Serialized MapStatus sizes: 570.0 B | ||
| Compressed Serialized MapStatus sizes: 569.0 B |
This is due to the compression library change.
| Spark 3473 3476 3 0.3 3472.8 1.4X | ||
| Spark Binary 2625 2628 3 0.4 2624.6 1.8X | ||
| Common Codecs 4444 4451 11 0.2 4444.1 1.0X | ||
| Java 5500 5533 41 0.2 5500.5 0.8X |
This becomes relatively slower in Java 21, while the Java 17 result looks good.
| UNICODE 364027 364291 374 0.0 3640267.9 0.0X | ||
| UNICODE_CI 421444 422138 981 0.0 4214438.7 0.0X | ||
| UTF8_BINARY 8793 8794 1 0.0 87929.3 1.0X | ||
| UTF8_LCASE 19382 19394 16 0.0 193824.8 0.5X |
UTF8_LCASE consistently becomes faster in the other test cases in this suite too.
| SQL Json 7831 7865 48 2.0 497.9 1.3X | ||
| SQL Json with UnsafeRow 8565 8571 8 1.8 544.6 1.2X | ||
| SQL Parquet Vectorized: DataPageV1 81 96 11 193.3 5.2 125.6X | ||
| SQL Parquet Vectorized: DataPageV2 201 210 8 78.4 12.8 50.9X |
DataPageV2 becomes much slower than DataPageV1 here and in the next benchmark. Please note that the previous result was generated when we upgraded to Apache Parquet 1.14.1.
So, if there is a reason for this, it's not the Apache Parquet dependency.
In addition, the Java 17 result looks okay for this part. This could be a transient issue.
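To put a number on the DataPageV1 vs DataPageV2 gap, here is a minimal Python sketch that parses the two rows quoted above and compares their best times. The `BenchmarkRow` type and `parse_row` helper are hypothetical names made up for this illustration, not part of Spark. The sketch assumes every row ends with the six numeric columns from the result header (Best Time, Avg Time, Stdev, Rate, Per Row, Relative).

```python
from typing import NamedTuple


class BenchmarkRow(NamedTuple):
    # Hypothetical container for one result row (not a Spark type).
    name: str
    best_ms: float
    avg_ms: float
    stdev_ms: float
    rate_m_per_s: float
    per_row_ns: float
    relative: float


def parse_row(line: str) -> BenchmarkRow:
    """Split a result row into its case name and six trailing numeric columns."""
    # Drop diff-table pipes, then split on whitespace.
    parts = line.replace("|", " ").split()
    # The case name may contain spaces, so take the numbers from the right.
    name_parts, cols = parts[:-6], parts[-6:]
    numbers = [float(c.rstrip("X")) for c in cols]
    return BenchmarkRow(" ".join(name_parts), *numbers)


v1 = parse_row("| SQL Parquet Vectorized: DataPageV1 81 96 11 193.3 5.2 125.6X | ||")
v2 = parse_row("| SQL Parquet Vectorized: DataPageV2 201 210 8 78.4 12.8 50.9X |")

# Ratio of best times: how many times longer DataPageV2 takes than DataPageV1.
slowdown = v2.best_ms / v1.best_ms
print(f"{v2.name} takes {slowdown:.2f}x the best time of {v1.name}")
```

On the two quoted rows this gives roughly a 2.5x gap in best time. It says nothing about the cause, which is consistent with the note that this may be transient.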
| ParquetReader Vectorized -> Row: DataPageV1 73 74 1 216.2 4.6 1.2X | ||
| ParquetReader Vectorized -> Row: DataPageV2 92 93 1 171.0 5.8 1.0X | ||
| ParquetReader Vectorized: DataPageV1 84 86 1 187.3 5.3 1.0X | ||
| ParquetReader Vectorized: DataPageV2 208 211 4 75.7 13.2 0.4X |
ditto.
| Native ORC Vectorized 5129 5216 60 3.1 326.1 1.2X | ||
| Native ORC Vectorized (Pushdown) 323 330 6 48.7 20.5 19.5X | ||
| Parquet Vectorized 6345 6437 61 2.5 403.4 1.0X | ||
| Parquet Vectorized (Pushdown) 341 363 12 46.2 21.7 18.6X |
Parquet Vectorized (Pushdown) becomes consistently slower in this benchmark on both Java 17 and 21.
| Join w 2 ints wholestage off 148982 149062 112 0.1 7104.0 1.0X | ||
| Join w 2 ints wholestage on 105434 105515 63 0.2 5027.5 1.4X | ||
| Join w 2 ints wholestage off 106730 106790 85 0.2 5089.3 1.0X | ||
| Join w 2 ints wholestage on 105489 105534 40 0.2 5030.1 1.0X |
Join w 2 ints wholestage on becomes relatively slower in both Java 17 and 21.
Or, the wholestage off code becomes faster in this case.
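The two readings can be told apart from the quoted rows alone. The sketch below assumes that the Relative column is the first case's best time divided by each case's best time (the quoted numbers are consistent with that), and that the first pair of rows is the earlier result and the second pair the regenerated one. The diff markers were lost, so that ordering is an assumption. The `relative` helper and the `old`/`new` dictionaries are hypothetical names for this illustration.

```python
def relative(baseline_best_ms: float, best_ms: float) -> float:
    """Speedup of a case over the baseline case, from best times."""
    return baseline_best_ms / best_ms


# Best Time(ms) values copied from the four rows above.
old = {"off": 148982, "on": 105434}  # assumed: earlier result
new = {"off": 106730, "on": 105489}  # assumed: regenerated result

old_ratio = relative(old["off"], old["on"])
new_ratio = relative(new["off"], new["on"])

# Fractional change of each case between the two runs.
off_change = (new["off"] - old["off"]) / old["off"]
on_change = (new["on"] - old["on"]) / old["on"]

print(f"relative: {old_ratio:.1f}X -> {new_ratio:.1f}X")
print(f"wholestage off best time: {off_change:+.1%}")
print(f"wholestage on best time:  {on_change:+.1%}")
```

Under those assumptions the wholestage on best time is essentially unchanged, and the drop in the Relative column comes from the wholestage off case getting about 28% faster, which supports the second reading.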
| AMD EPYC 7763 64-Core Processor | ||
| TakeOrderedAndProject with SMJ: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| --------------------------------------------------------------------------------------------------------------------------------- | ||
| TakeOrderedAndProject with SMJ for doExecute 87 91 4 0.1 8677.0 1.0X | ||
| TakeOrderedAndProject with SMJ for executeCollect 63 70 8 0.2 6290.5 1.4X | ||
| TakeOrderedAndProject with SMJ for doExecute 214 243 27 0.0 21428.5 1.0X |
Although this is a GitHub Actions result, this may be a little slower.
| Native ORC Vectorized 12870 12946 107 0.1 12274.1 1.0X | ||
| Hive built-in ORC 12664 12690 37 0.1 12077.5 1.0X | ||
| Native ORC MR 12398 12513 162 0.1 11823.9 1.0X | ||
| Native ORC Vectorized 12552 12553 1 0.1 11970.4 1.0X | ||
| ================================================================================================ | ||
| Nested Struct scan |
The ratio changed in these benchmark cases for ORC MR in Java 21, while the Java 17 result looks fine.
cc @LuciferYang and @yaooqinn

Thank you, @yaooqinn!

This is only a set of generated results from a specific commit. Let me merge this before this PR drifts far from that commit. Merged to master.

late LGTM ~

Thank you, @LuciferYang.
### What changes were proposed in this pull request?
This reverts commit 717a6da.

### Why are the changes needed?
To fix a performance regression. During the regular performance audit,
- #47743

`ExternalAppendOnlyUnsafeRowArrayBenchmark` detected a performance regression caused by SPARK-48626.
- #47192

### Does this PR introduce _any_ user-facing change?
No. This is not released yet.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47747 from dongjoon-hyun/SPARK-48628.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR aims to regenerate benchmark results (except ExternalAppendOnlyUnsafeRowArrayBenchmark) as a preparation for Apache Spark 4.0.0-preview2.
During the testing, it was observed that ExternalAppendOnlyUnsafeRowArrayBenchmark hangs in both CI and local environments. SPARK-49228 is filed for its investigation.
In addition, Storage Partition Join-related benchmarks are generated for the following commits.
Why are the changes needed?
To check for performance regressions.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This is generated by
Manual review.
Was this patch authored or co-authored using generative AI tooling?
No.