@@ -387,6 +387,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val PARQUET_FILTER_PUSHDOWN_DECIMAL_ENABLED =
+    buildConf("spark.sql.parquet.filterPushdown.decimal")
+      .doc("If true, enables Parquet filter push-down optimization for Decimal. " +
+        "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+      .internal()
+      .booleanConf
+      .createWithDefault(true)
+
   val PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED =
     buildConf("spark.sql.parquet.filterPushdown.string.startsWith")
       .doc("If true, enables Parquet filter push-down optimization for string startsWith function. " +
@@ -1505,6 +1513,8 @@ class SQLConf extends Serializable with Logging {

   def parquetFilterPushDownTimestamp: Boolean = getConf(PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED)

+  def parquetFilterPushDownDecimal: Boolean = getConf(PARQUET_FILTER_PUSHDOWN_DECIMAL_ENABLED)
+
   def parquetFilterPushDownStringStartWith: Boolean =
     getConf(PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED)
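
A minimal sketch of how the new flag might be toggled from user code, assuming a Spark build that contains this change and Spark SQL on the classpath. The object name, the scratch path, and the sample data are hypothetical; only the two configuration keys come from the diff above. Both runs should return the same count, since the flag affects how much data the Parquet reader can skip, not the query result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical demo object; not part of the change under review.
object DecimalPushdownToggle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("decimal-pushdown-toggle")
      .getOrCreate()

    // Hypothetical scratch location for a small decimal(9, 2) data set.
    val path = "/tmp/decimal_pushdown_demo"
    spark.range(0, 1000)
      .select(col("id").cast("decimal(9, 2)").as("value"))
      .write.mode("overwrite").parquet(path)

    // The decimal flag only has an effect when the parent flag is enabled.
    spark.conf.set("spark.sql.parquet.filterPushdown", "true")

    Seq("true", "false").foreach { enabled =>
      spark.conf.set("spark.sql.parquet.filterPushdown.decimal", enabled)
      val matches = spark.read.parquet(path).filter(col("value") === 500).count()
      println(s"filterPushdown.decimal=$enabled -> $matches matching row(s)")
    }

    spark.stop()
  }
}
```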

96 changes: 48 additions & 48 deletions sql/core/benchmarks/FilterPushdownBenchmark-results.txt
@@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 3785 / 3867 4.2 240.6 1.0X
-Parquet Vectorized (Pushdown) 3820 / 3928 4.1 242.9 1.0X
-Native ORC Vectorized 3981 / 4049 4.0 253.1 1.0X
-Native ORC Vectorized (Pushdown) 702 / 735 22.4 44.6 5.4X
+Parquet Vectorized 4546 / 4743 3.5 289.0 1.0X
+Parquet Vectorized (Pushdown) 161 / 175 98.0 10.2 28.3X
+Native ORC Vectorized 5721 / 5842 2.7 363.7 0.8X
+Native ORC Vectorized (Pushdown) 1019 / 1070 15.4 64.8 4.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 10% decimal(9, 2) rows (value < 1572864): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 4694 / 4813 3.4 298.4 1.0X
-Parquet Vectorized (Pushdown) 4839 / 4907 3.3 307.6 1.0X
-Native ORC Vectorized 4943 / 5032 3.2 314.2 0.9X
-Native ORC Vectorized (Pushdown) 2043 / 2085 7.7 129.9 2.3X
+Parquet Vectorized 6340 / 7236 2.5 403.1 1.0X
+Parquet Vectorized (Pushdown) 3052 / 3164 5.2 194.1 2.1X
+Native ORC Vectorized 8370 / 9214 1.9 532.1 0.8X
+Native ORC Vectorized (Pushdown) 4137 / 4242 3.8 263.0 1.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 50% decimal(9, 2) rows (value < 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 8321 / 8472 1.9 529.0 1.0X
-Parquet Vectorized (Pushdown) 8125 / 8471 1.9 516.6 1.0X
-Native ORC Vectorized 8524 / 8616 1.8 541.9 1.0X
-Native ORC Vectorized (Pushdown) 7961 / 8383 2.0 506.1 1.0X
+Parquet Vectorized 12976 / 13249 1.2 825.0 1.0X
+Parquet Vectorized (Pushdown) 12655 / 13570 1.2 804.6 1.0X
+Native ORC Vectorized 15562 / 15950 1.0 989.4 0.8X
+Native ORC Vectorized (Pushdown) 15042 / 15668 1.0 956.3 0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 90% decimal(9, 2) rows (value < 14155776): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 9587 / 10112 1.6 609.5 1.0X
-Parquet Vectorized (Pushdown) 9726 / 10370 1.6 618.3 1.0X
-Native ORC Vectorized 10119 / 11147 1.6 643.4 0.9X
-Native ORC Vectorized (Pushdown) 9366 / 9497 1.7 595.5 1.0X
+Parquet Vectorized 14303 / 14616 1.1 909.3 1.0X
+Parquet Vectorized (Pushdown) 14380 / 14649 1.1 914.3 1.0X
+Native ORC Vectorized 16964 / 17358 0.9 1078.5 0.8X
+Native ORC Vectorized (Pushdown) 17255 / 17874 0.9 1097.0 0.8X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 decimal(18, 2) row (value = 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 4060 / 4093 3.9 258.1 1.0X
-Parquet Vectorized (Pushdown) 4037 / 4125 3.9 256.6 1.0X
-Native ORC Vectorized 4756 / 4811 3.3 302.4 0.9X
-Native ORC Vectorized (Pushdown) 824 / 889 19.1 52.4 4.9X
+Parquet Vectorized 4701 / 6416 3.3 298.9 1.0X
+Parquet Vectorized (Pushdown) 128 / 164 122.8 8.1 36.7X
+Native ORC Vectorized 5698 / 7904 2.8 362.3 0.8X
+Native ORC Vectorized (Pushdown) 913 / 942 17.2 58.0 5.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 10% decimal(18, 2) rows (value < 1572864): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 5157 / 5271 3.0 327.9 1.0X
-Parquet Vectorized (Pushdown) 5051 / 5141 3.1 321.1 1.0X
-Native ORC Vectorized 5723 / 6146 2.7 363.9 0.9X
-Native ORC Vectorized (Pushdown) 2198 / 2317 7.2 139.8 2.3X
+Parquet Vectorized 5376 / 5461 2.9 341.8 1.0X
+Parquet Vectorized (Pushdown) 1479 / 1543 10.6 94.0 3.6X
+Native ORC Vectorized 6640 / 6748 2.4 422.2 0.8X
+Native ORC Vectorized (Pushdown) 2438 / 2479 6.5 155.0 2.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 50% decimal(18, 2) rows (value < 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 8608 / 8647 1.8 547.3 1.0X
-Parquet Vectorized (Pushdown) 8471 / 8584 1.9 538.6 1.0X
-Native ORC Vectorized 9249 / 10048 1.7 588.0 0.9X
-Native ORC Vectorized (Pushdown) 7645 / 8091 2.1 486.1 1.1X
+Parquet Vectorized 9224 / 9356 1.7 586.5 1.0X
+Parquet Vectorized (Pushdown) 7172 / 7415 2.2 456.0 1.3X
+Native ORC Vectorized 11017 / 11408 1.4 700.4 0.8X
+Native ORC Vectorized (Pushdown) 8771 / 10218 1.8 557.7 1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 90% decimal(18, 2) rows (value < 14155776): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 11658 / 11888 1.3 741.2 1.0X
-Parquet Vectorized (Pushdown) 11812 / 12098 1.3 751.0 1.0X
-Native ORC Vectorized 12943 / 13312 1.2 822.9 0.9X
-Native ORC Vectorized (Pushdown) 13139 / 13465 1.2 835.4 0.9X
+Parquet Vectorized 13933 / 15990 1.1 885.8 1.0X
+Parquet Vectorized (Pushdown) 12683 / 12942 1.2 806.4 1.1X
+Native ORC Vectorized 16344 / 20196 1.0 1039.1 0.9X
+Native ORC Vectorized (Pushdown) 15162 / 16627 1.0 964.0 0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 decimal(38, 2) row (value = 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 5491 / 5716 2.9 349.1 1.0X
-Parquet Vectorized (Pushdown) 5515 / 5615 2.9 350.6 1.0X
-Native ORC Vectorized 4582 / 4654 3.4 291.3 1.2X
-Native ORC Vectorized (Pushdown) 815 / 861 19.3 51.8 6.7X
+Parquet Vectorized 7102 / 8282 2.2 451.5 1.0X
+Parquet Vectorized (Pushdown) 124 / 150 126.4 7.9 57.1X
+Native ORC Vectorized 5811 / 6883 2.7 369.5 1.2X
+Native ORC Vectorized (Pushdown) 1121 / 1502 14.0 71.3 6.3X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 6432 / 6527 2.4 409.0 1.0X
-Parquet Vectorized (Pushdown) 6513 / 6607 2.4 414.1 1.0X
-Native ORC Vectorized 5618 / 6085 2.8 357.2 1.1X
-Native ORC Vectorized (Pushdown) 2403 / 2443 6.5 152.8 2.7X
+Parquet Vectorized 6894 / 7562 2.3 438.3 1.0X
+Parquet Vectorized (Pushdown) 1863 / 1980 8.4 118.4 3.7X
+Native ORC Vectorized 6812 / 6848 2.3 433.1 1.0X
+Native ORC Vectorized (Pushdown) 2511 / 2598 6.3 159.7 2.7X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 50% decimal(38, 2) rows (value < 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 11041 / 11467 1.4 701.9 1.0X
-Parquet Vectorized (Pushdown) 10909 / 11484 1.4 693.5 1.0X
-Native ORC Vectorized 9860 / 10436 1.6 626.9 1.1X
-Native ORC Vectorized (Pushdown) 7908 / 8069 2.0 502.8 1.4X
+Parquet Vectorized 11732 / 12183 1.3 745.9 1.0X
+Parquet Vectorized (Pushdown) 8912 / 9945 1.8 566.6 1.3X
+Native ORC Vectorized 11499 / 12387 1.4 731.1 1.0X
+Native ORC Vectorized (Pushdown) 9328 / 9382 1.7 593.1 1.3X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 90% decimal(38, 2) rows (value < 14155776): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-Parquet Vectorized 14816 / 16877 1.1 942.0 1.0X
-Parquet Vectorized (Pushdown) 15383 / 15740 1.0 978.0 1.0X
-Native ORC Vectorized 14408 / 14771 1.1 916.0 1.0X
-Native ORC Vectorized (Pushdown) 13968 / 14805 1.1 888.1 1.1X
+Parquet Vectorized 16272 / 16328 1.0 1034.6 1.0X
+Parquet Vectorized (Pushdown) 15714 / 18100 1.0 999.1 1.0X
+Native ORC Vectorized 16539 / 18897 1.0 1051.5 1.0X
+Native ORC Vectorized (Pushdown) 16328 / 17306 1.0 1038.1 1.0X


================================================================================================
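
For readers checking the tables, the derived columns can be approximately recomputed from the best time alone. The sketch below assumes a row count of 15 * 1024 * 1024, which is inferred from the figures (3785 ms at 240.6 ns per row) rather than stated in the results file; the object and method names are hypothetical. Because the table reports times rounded to whole milliseconds, recomputed rates and ratios can differ from the printed ones in the last digit.

```scala
// Hypothetical helper for sanity-checking the benchmark columns.
object BenchmarkColumns {
  // Assumption: inferred from the table, not stated in the results file.
  val numRows: Long = 15L * 1024 * 1024

  final case class Derived(rateMPerSec: Double, perRowNs: Double, relative: Double)

  // bestMs: best time of the case; baselineBestMs: best time of the first case in the same table.
  def derive(bestMs: Double, baselineBestMs: Double): Derived =
    Derived(
      rateMPerSec = numRows / (bestMs / 1000.0) / 1e6, // million rows per second
      perRowNs = bestMs * 1e6 / numRows,               // nanoseconds per row
      relative = baselineBestMs / bestMs               // speedup over the baseline case
    )

  def main(args: Array[String]): Unit = {
    // "Select 1 decimal(9, 2) row", results after the change.
    val baseline = derive(4546, 4546)
    val pushdown = derive(161, 4546)
    println(f"Parquet Vectorized:            ${baseline.rateMPerSec}%.1f M/s, ${baseline.perRowNs}%.1f ns/row, ${baseline.relative}%.1fX")
    println(f"Parquet Vectorized (Pushdown): ${pushdown.rateMPerSec}%.1f M/s, ${pushdown.perRowNs}%.1f ns/row, ${pushdown.relative}%.1fX")
  }
}
```

The relative column is always measured against the first case of the same table, so a value below 1.0X (as in several ORC rows) means slower than plain Parquet Vectorized in that run.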
@@ -342,6 +342,7 @@ class ParquetFileFormat
     val returningBatch = supportBatch(sparkSession, resultSchema)
     val pushDownDate = sqlConf.parquetFilterPushDownDate
     val pushDownTimestamp = sqlConf.parquetFilterPushDownTimestamp
+    val pushDownDecimal = sqlConf.parquetFilterPushDownDecimal
     val pushDownStringStartWith = sqlConf.parquetFilterPushDownStringStartWith
     val pushDownInFilterThreshold = sqlConf.parquetFilterPushDownInFilterThreshold

@@ -367,7 +368,7 @@ class ParquetFileFormat
       val pushed = if (enableParquetFilterPushDown) {
         val parquetSchema = ParquetFileReader.readFooter(sharedConf, filePath, SKIP_ROW_GROUPS)
           .getFileMetaData.getSchema
-        val parquetFilters = new ParquetFilters(pushDownDate, pushDownTimestamp,
+        val parquetFilters = new ParquetFilters(pushDownDate, pushDownTimestamp, pushDownDecimal,
           pushDownStringStartWith, pushDownInFilterThreshold)
         filters
           // Collects all converted Parquet filter predicates. Notice that not all predicates can be