Commit b3f2911

MaxGekk authored and HyukjinKwon committed
[SPARK-24945][SQL] Switching to uniVocity 2.7.3
## What changes were proposed in this pull request?

In the PR, I propose to upgrade uniVocity parser from **2.6.3** to **2.7.3**. The recent version includes a fix for the SPARK-24645 issue and has better performance.

Before changes:
```
Parsing quoted values:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                            33336 / 34122          0.0      666727.0       1.0X

Wide rows with 1000 columns:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                          90287 / 91713          0.0       90286.9       1.0X
Select 100 columns                           31826 / 36589          0.0       31826.4       2.8X
Select one column                            25738 / 25872          0.0       25737.9       3.5X
count()                                        6931 / 7269          0.1        6931.5      13.0X
```

After changes:
```
Parsing quoted values:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
One quoted string                            33411 / 33510          0.0      668211.4       1.0X

Wide rows with 1000 columns:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                          88028 / 89311          0.0       88028.1       1.0X
Select 100 columns                           29010 / 32755          0.0       29010.1       3.0X
Select one column                            22936 / 22953          0.0       22936.5       3.8X
count()                                        6657 / 6740          0.2        6656.6      13.5X
```

Closes #21892

## How was this patch tested?

It was tested by `CSVSuite` and `CSVBenchmarks`.

Author: Maxim Gekk <[email protected]>

Closes #21969 from MaxGekk/univocity-2_7_3.
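The "Relative" column in each table compares rows within a single run, so it does not show how much 2.7.3 gains over 2.6.3. A minimal sketch of that before/after comparison follows, using only the best-case times quoted in the commit message. The names `percent_change` and `summarize` are hypothetical helpers for illustration and are not part of Spark or `CSVBenchmarks`.

```python
# Best-case times in ms, copied from the benchmark tables in the commit message.
BEFORE = {
    "One quoted string": 33336,
    "Select 1000 columns": 90287,
    "Select 100 columns": 31826,
    "Select one column": 25738,
    "count()": 6931,
}
AFTER = {
    "One quoted string": 33411,
    "Select 1000 columns": 88028,
    "Select 100 columns": 29010,
    "Select one column": 22936,
    "count()": 6657,
}


def percent_change(before_ms, after_ms):
    """Relative change in run time; negative means the newer version is faster."""
    return (after_ms - before_ms) / before_ms * 100.0


def summarize(before, after):
    """Map each benchmark case to its run-time change in percent, rounded to 0.1."""
    return {
        name: round(percent_change(before[name], after[name]), 1)
        for name in before
    }


for name, change in summarize(BEFORE, AFTER).items():
    print(f"{name:<22}{change:+.1f}%")
```

By this arithmetic the wide-row cases get faster by roughly 2.5% to 10.9% in best time, while the quoted-string case is essentially unchanged (+0.2%). These figures come from single benchmark runs, so small differences may be noise.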
1 parent 7cf16a7 commit b3f2911

4 files changed: +4 -4 lines changed


dev/deps/spark-deps-hadoop-2.6

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ stax-api-1.0.1.jar
 stream-2.7.0.jar
 stringtemplate-3.2.1.jar
 super-csv-2.2.0.jar
-univocity-parsers-2.6.3.jar
+univocity-parsers-2.7.3.jar
 validation-api-1.1.0.Final.jar
 xbean-asm6-shaded-4.8.jar
 xercesImpl-2.9.1.jar

dev/deps/spark-deps-hadoop-2.7

Lines changed: 1 addition & 1 deletion
@@ -192,7 +192,7 @@ stax-api-1.0.1.jar
 stream-2.7.0.jar
 stringtemplate-3.2.1.jar
 super-csv-2.2.0.jar
-univocity-parsers-2.6.3.jar
+univocity-parsers-2.7.3.jar
 validation-api-1.1.0.Final.jar
 xbean-asm6-shaded-4.8.jar
 xercesImpl-2.9.1.jar

dev/deps/spark-deps-hadoop-3.1

Lines changed: 1 addition & 1 deletion
@@ -212,7 +212,7 @@ stream-2.7.0.jar
 stringtemplate-3.2.1.jar
 super-csv-2.2.0.jar
 token-provider-1.0.1.jar
-univocity-parsers-2.6.3.jar
+univocity-parsers-2.7.3.jar
 validation-api-1.1.0.Final.jar
 woodstox-core-5.0.3.jar
 xbean-asm6-shaded-4.8.jar

sql/core/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@
     <dependency>
       <groupId>com.univocity</groupId>
       <artifactId>univocity-parsers</artifactId>
-      <version>2.6.3</version>
+      <version>2.7.3</version>
       <type>jar</type>
     </dependency>
     <dependency>
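
The commit changes the same version string in four places: `sql/core/pom.xml` and the three `dev/deps` manifests. A minimal sketch of a consistency check over those files follows, assuming their contents have already been read into strings. The helpers `jar_version`, `pom_version` and `versions_agree` are hypothetical names for illustration and are not part of Spark's build tooling.

```python
import re

# Matches a manifest entry such as "univocity-parsers-2.7.3.jar".
JAR_RE = re.compile(r"univocity-parsers-(\d+(?:\.\d+)*)\.jar")

# Matches the <version> element that directly follows the artifactId in a POM.
POM_RE = re.compile(
    r"<artifactId>univocity-parsers</artifactId>\s*<version>([^<]+)</version>"
)


def jar_version(manifest_text):
    """Return the univocity-parsers version listed in a manifest, or None."""
    match = JAR_RE.search(manifest_text)
    return match.group(1) if match else None


def pom_version(pom_text):
    """Return the univocity-parsers version declared in a POM, or None."""
    match = POM_RE.search(pom_text)
    return match.group(1).strip() if match else None


def versions_agree(pom_text, manifests):
    """True when the POM declares a version and every manifest lists the same one."""
    expected = pom_version(pom_text)
    return expected is not None and all(
        jar_version(text) == expected for text in manifests
    )


# Sample inputs taken from the post-change side of the diff hunks.
pom = """
<dependency>
  <groupId>com.univocity</groupId>
  <artifactId>univocity-parsers</artifactId>
  <version>2.7.3</version>
  <type>jar</type>
</dependency>
"""
manifest = "super-csv-2.2.0.jar\nunivocity-parsers-2.7.3.jar\nvalidation-api-1.1.0.Final.jar\n"

print(versions_agree(pom, [manifest, manifest, manifest]))
```

A manifest still listing `univocity-parsers-2.6.3.jar` would make `versions_agree` return `False`, which is the kind of mismatch this four-file change has to avoid.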
