
@dongjoon-hyun dongjoon-hyun commented Aug 13, 2024

What changes were proposed in this pull request?

This PR aims to regenerate benchmark results (except ExternalAppendOnlyUnsafeRowArrayBenchmark) as a preparation for Apache Spark 4.0.0-preview2.

Why are the changes needed?

To check for performance regressions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This is generated by

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -0,0 +1,7 @@
OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure
Member Author

This result file is added.

@@ -0,0 +1,7 @@
OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
Member Author

This result file is added.

Coalesce Num Partitions: 100 Num Hosts: 10 178 185 6 0.6 1779.9 2.0X
Coalesce Num Partitions: 100 Num Hosts: 20 153 156 4 0.7 1531.2 2.3X
Coalesce Num Partitions: 100 Num Hosts: 40 148 149 1 0.7 1479.1 2.4X
Coalesce Num Partitions: 100 Num Hosts: 80 166 170 5 0.6 1657.8 2.2X
Member Author

The ratio changed slightly here, but the difference is marginal. The Java 17 result is okay.

Deserialization 329 346 18 0.6 1645.6 2.1X

Compressed Serialized MapStatus sizes: 570.0 B
Compressed Serialized MapStatus sizes: 569.0 B
Member Author

This is due to the compression library change.

Spark 3473 3476 3 0.3 3472.8 1.4X
Spark Binary 2625 2628 3 0.4 2624.6 1.8X
Common Codecs 4444 4451 11 0.2 4444.1 1.0X
Java 5500 5533 41 0.2 5500.5 0.8X
Member Author

This becomes relatively slower in Java 21, while the Java 17 result looks good.

UNICODE 364027 364291 374 0.0 3640267.9 0.0X
UNICODE_CI 421444 422138 981 0.0 4214438.7 0.0X
UTF8_BINARY 8793 8794 1 0.0 87929.3 1.0X
UTF8_LCASE 19382 19394 16 0.0 193824.8 0.5X
Member Author

UTF8_LCASE becomes consistently faster in the other test cases in this suite too.

SQL Json 7831 7865 48 2.0 497.9 1.3X
SQL Json with UnsafeRow 8565 8571 8 1.8 544.6 1.2X
SQL Parquet Vectorized: DataPageV1 81 96 11 193.3 5.2 125.6X
SQL Parquet Vectorized: DataPageV2 201 210 8 78.4 12.8 50.9X
Member Author

DataPageV2 becomes much slower than DataPageV1 here and in the next benchmark. Please note that the previous result was generated when we upgraded to Apache Parquet 1.14.1.

So, if there is a reason for this, it's not the Apache Parquet dependency.

Member Author

In addition, the Java 17 result looks okay for this part. This could be a transient issue.
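To put a number on the DataPageV1 vs. DataPageV2 gap discussed above, here is a minimal sketch that parses two of the quoted result rows and compares their per-row times. The `BenchmarkRow` type and `parse_row` helper are hypothetical names introduced for illustration and are not part of Spark. The column order is assumed from the header line that appears elsewhere in these results (Best Time(ms), Avg Time(ms), Stdev(ms), Rate(M/s), Per Row(ns), Relative).

```python
from typing import NamedTuple


class BenchmarkRow(NamedTuple):
    """One row of a benchmark result file (hypothetical helper type)."""
    name: str
    best_ms: float
    avg_ms: float
    stdev_ms: float
    rate_m_per_s: float
    per_row_ns: float
    relative: float


def parse_row(line: str) -> BenchmarkRow:
    # The case name can contain spaces, so peel the six numeric
    # columns off the right-hand side instead of splitting from the left.
    name, best, avg, stdev, rate, per_row, rel = line.rsplit(None, 6)
    return BenchmarkRow(
        name,
        float(best),
        float(avg),
        float(stdev),
        float(rate),
        float(per_row),
        float(rel.rstrip("X")),  # "125.6X" -> 125.6
    )


# Rows copied verbatim from the "SQL Parquet Vectorized" results above.
v1 = parse_row("SQL Parquet Vectorized: DataPageV1 81 96 11 193.3 5.2 125.6X")
v2 = parse_row("SQL Parquet Vectorized: DataPageV2 201 210 8 78.4 12.8 50.9X")

slowdown = v2.per_row_ns / v1.per_row_ns
print(f"DataPageV2 takes {slowdown:.1f}x the per-row time of DataPageV1")
```

Running the same comparison against the corresponding Java 17 result file would show whether the gap is specific to the Java 21 run.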

ParquetReader Vectorized -> Row: DataPageV1 73 74 1 216.2 4.6 1.2X
ParquetReader Vectorized -> Row: DataPageV2 92 93 1 171.0 5.8 1.0X
ParquetReader Vectorized: DataPageV1 84 86 1 187.3 5.3 1.0X
ParquetReader Vectorized: DataPageV2 208 211 4 75.7 13.2 0.4X
Member Author

ditto.

Native ORC Vectorized 5129 5216 60 3.1 326.1 1.2X
Native ORC Vectorized (Pushdown) 323 330 6 48.7 20.5 19.5X
Parquet Vectorized 6345 6437 61 2.5 403.4 1.0X
Parquet Vectorized (Pushdown) 341 363 12 46.2 21.7 18.6X
Member Author

Parquet Vectorized (Pushdown) becomes consistently slower in this benchmark for both Java 17 and 21.

Join w 2 ints wholestage off 148982 149062 112 0.1 7104.0 1.0X
Join w 2 ints wholestage on 105434 105515 63 0.2 5027.5 1.4X
Join w 2 ints wholestage off 106730 106790 85 0.2 5089.3 1.0X
Join w 2 ints wholestage on 105489 105534 40 0.2 5030.1 1.0X
Member Author

`Join w 2 ints wholestage on` becomes slower in both Java 17 and 21.

Or, the `wholestage off` code becomes faster in this case.
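The two readings above can be checked with a little arithmetic. In the rows quoted here, the Relative column matches the first row's Per Row(ns) divided by each row's Per Row(ns); that is an inference from the numbers, not a statement about how the benchmark harness computes it. The `relative` helper below is a hypothetical name used only for this sketch.

```python
def relative(baseline_per_row_ns: float, per_row_ns: float) -> float:
    """Speedup of a case over the baseline row (assumed definition)."""
    return baseline_per_row_ns / per_row_ns


# Per Row(ns) values copied from the two "Join w 2 ints" result pairs above.
pair_a = {"off": 7104.0, "on": 5027.5}
pair_b = {"off": 5089.3, "on": 5030.1}

rel_a = relative(pair_a["off"], pair_a["on"])
rel_b = relative(pair_b["off"], pair_b["on"])
print(f"{rel_a:.1f}X {rel_b:.1f}X")  # prints "1.4X 1.0X"

# How much each case differs between the two pairs.
on_change = pair_b["on"] / pair_a["on"]
off_change = pair_b["off"] / pair_a["off"]
print(f"on: {on_change:.3f}, off: {off_change:.3f}")
```

Between the two pairs the `wholestage on` time is almost unchanged, while the `wholestage off` time differs by roughly 28%, so the shift in the ratio comes mostly from the baseline row.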

AMD EPYC 7763 64-Core Processor
TakeOrderedAndProject with SMJ: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
---------------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProject with SMJ for doExecute 87 91 4 0.1 8677.0 1.0X
TakeOrderedAndProject with SMJ for executeCollect 63 70 8 0.2 6290.5 1.4X
TakeOrderedAndProject with SMJ for doExecute 214 243 27 0.0 21428.5 1.0X
Member Author

Although this is a GitHub Actions result, this could be a little slower.

Native ORC Vectorized 12870 12946 107 0.1 12274.1 1.0X
Hive built-in ORC 12664 12690 37 0.1 12077.5 1.0X
Native ORC MR 12398 12513 162 0.1 11823.9 1.0X
Native ORC Vectorized 12552 12553 1 0.1 11970.4 1.0X


================================================================================================
Nested Struct scan
Member Author

@dongjoon-hyun dongjoon-hyun Aug 13, 2024


The ratio changed in these benchmark cases for ORC MR in Java 21, while the Java 17 result looks fine.

@dongjoon-hyun
Member Author

cc @LuciferYang and @yaooqinn

@dongjoon-hyun
Member Author

Thank you, @yaooqinn !

@dongjoon-hyun
Member Author

This is only a set of generated results from a specific commit. Let me merge this before this PR drifts too far from that commit.

Merged to master.

@LuciferYang
Contributor

late LGTM ~

@dongjoon-hyun
Member Author

Thank you, @LuciferYang .

dongjoon-hyun added a commit that referenced this pull request Aug 14, 2024
### What changes were proposed in this pull request?

This reverts commit 717a6da.

### Why are the changes needed?

To fix a performance regression.

During the regular performance audit,
- #47743

`ExternalAppendOnlyUnsafeRowArrayBenchmark` detected a performance regression caused by SPARK-48626.
- #47192

### Does this PR introduce _any_ user-facing change?

No. This is not released yet.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47747 from dongjoon-hyun/SPARK-48628.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>