Comparing changes

base repository: HyukjinKwon/spark @ 6b76741
head repository: HyukjinKwon/spark @ 1431a4a

5 commits, 36 files changed, 5 contributors
Commits on Jan 13, 2022
[SPARK-37686][PYTHON][SQL] Use _invoke_function helpers for all pyspark.sql.functions (commit 0e186e8)

### What changes were proposed in this pull request?
This PR proposes converting the functions not covered by SPARK-32084 to the `_invoke_function` style. Two new `_invoke` helpers were added to address common cases:
- `_invoke_function_over_columns`
- `_invoke_function_over_seq_of_columns`

### Why are the changes needed?
To reduce boilerplate (especially around type checking) and improve maintainability. It also opens an opportunity to reduce driver-side invocation latency.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes apache#34951 from zero323/SPARK-37686.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: zero323 <mszymkiewicz@gmail.com>
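The refactoring above centralizes JVM dispatch in a pair of helpers. Below is a minimal sketch of that pattern; the `_JVM_FUNCTIONS` dictionary and its string results are invented stand-ins for the real Py4J gateway, so this illustrates the shape of the helpers rather than the actual PySpark implementation:

```python
# Hypothetical stub standing in for the JVM-side `functions` object
# that PySpark reaches through Py4J.
_JVM_FUNCTIONS = {
    "upper": lambda col: f"upper({col})",
    "greatest": lambda *cols: f"greatest({', '.join(cols)})",
}

def _invoke_function(name, *args):
    """Invoke a named JVM function with already-converted arguments."""
    return _JVM_FUNCTIONS[name](*args)

def _invoke_function_over_columns(name, *cols):
    """Invoke a JVM function over columns (in PySpark each column would
    first be converted with `_to_java_column`)."""
    return _invoke_function(name, *cols)

def _invoke_function_over_seq_of_columns(name, cols):
    """Invoke a JVM function that takes a sequence of columns."""
    return _invoke_function(name, *cols)

# Public API functions then collapse to one-liners, with argument
# conversion and type checking centralized in the helpers:
def upper(col):
    return _invoke_function_over_columns("upper", col)

def greatest(*cols):
    return _invoke_function_over_seq_of_columns("greatest", cols)
```

With the helpers in place, adding a new SQL function costs one line instead of a repeated conversion-and-dispatch block.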
[SPARK-35703][SQL][FOLLOWUP] Only eliminate shuffles if partition keys contain all the join keys (commit 4b4ff4b)

### What changes were proposed in this pull request?
This is a followup of apache#32875, which made two improvements:
1. Allow bucket join even if the bucket hash function is different from Spark's shuffle hash function.
2. Allow bucket join even if the hash partition keys are a subset of the join keys.

The first improvement is the major target of the SPIP "storage partition join". The second is a consequence of the framework refactor and was not planned. This PR disables the second improvement by default, because it may introduce a perf regression when the data is skewed and no shuffle is performed. More design work is needed before enabling it, such as checking the NDV.

### Why are the changes needed?
To avoid a perf regression.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Closes apache#35138 from cloud-fan/join.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
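To see why reusing a layout partitioned on only a subset of the join keys risks the skew this followup guards against, consider a toy illustration. All names, row counts, and the CRC32 stand-in hash are invented for demonstration and have nothing to do with Spark's actual partitioner:

```python
import zlib
from collections import Counter

# Toy data: the join keys are (region, user_id), but the table is
# hash-partitioned on `region` alone, and one region dominates.
rows = [("us", i) for i in range(97)] + [("eu", 1), ("ap", 2), ("sa", 3)]

NUM_PARTITIONS = 4

def stable_hash(key):
    # Seed-independent stand-in for a partitioner's hash function.
    return zlib.crc32(repr(key).encode("utf-8"))

def rows_per_partition(key_indices):
    """Count rows per partition when hashing only the given key columns."""
    return Counter(
        stable_hash(tuple(r[k] for k in key_indices)) % NUM_PARTITIONS
        for r in rows
    )

# Eliminating the shuffle keeps the subset layout: all 97 "us" rows
# share one partition, so one join task does nearly all the work.
subset_layout = rows_per_partition([0])     # partition key: region only
full_layout = rows_per_partition([0, 1])    # partition keys: all join keys
```

With only three distinct regions, the subset layout can never use more than three partitions, and the dominant region's rows all land in a single task; repartitioning on all join keys spreads the 100 distinct key pairs across the cluster.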
[SPARK-37864][SQL] Support vectorized read of boolean values using RLE encoding with Parquet DataPage V2 (commit 9980555)

### What changes were proposed in this pull request?
Parquet v2 data pages write boolean values using RLE encoding. Reading v2 boolean values currently throws an exception:

```java
Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305) ~[classes/:?]
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277) ~[classes/:?]
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344) ~[classes/:?]
```

This PR extends `readBooleans` and `skipBooleans` in `VectorizedRleValuesReader` so that the above scenario passes.

### Why are the changes needed?
To support Parquet v2 data page RLE encoding on the vectorized read path.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added a new test case.

Closes apache#35163 from LuciferYang/SPARK-37864.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Chao Sun <sunchao@apple.com>
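For context, Parquet's "RLE" for booleans is the RLE/bit-packed hybrid encoding with a bit width of 1, which is what the extended `readBooleans` has to handle. The sketch below decodes that format in a simplified form (single-byte headers only, i.e. short runs); it illustrates the wire format, not Spark's `VectorizedRleValuesReader`:

```python
def decode_rle_bitpacked_booleans(data: bytes, count: int) -> list:
    """Decode `count` booleans from Parquet's RLE/bit-packed hybrid
    encoding with bit width 1 (simplified: headers fit in one byte)."""
    out, i = [], 0
    while len(out) < count:
        header = data[i]; i += 1
        if header & 1 == 0:
            # RLE run: (header >> 1) repetitions of one literal value,
            # stored in ceil(bit_width / 8) = 1 byte for booleans.
            run_len = header >> 1
            value = bool(data[i]); i += 1
            out.extend([value] * run_len)
        else:
            # Bit-packed run: (header >> 1) groups of 8 values,
            # least-significant bit first within each byte.
            groups = header >> 1
            for _ in range(groups):
                byte = data[i]; i += 1
                out.extend(bool((byte >> bit) & 1) for bit in range(8))
    return out[:count]

# Ten `True`s as one RLE run: header = (10 << 1) = 20, then value byte 1.
rle_run = bytes([20, 1])
# Three values [True, False, True] bit-packed: header = (1 << 1) | 1 = 3,
# then the byte 0b00000101 = 5 (LSB first).
packed_run = bytes([3, 5])
```

The exception in the stack trace above arose because the reader recognized the bit-packed branch for booleans but rejected pages that chose the RLE branch.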
[SPARK-37900][CORE] Use `SparkMasterRegex.KUBERNETES_REGEX` in `SecurityManager` (commit f7dd37c)

### What changes were proposed in this pull request?
This PR removes `SecurityManager.k8sRegex` and uses `SparkMasterRegex.KUBERNETES_REGEX` in `SecurityManager`.

### Why are the changes needed?
`SparkMasterRegex.KUBERNETES_REGEX` is more accurate and official than the existing `val k8sRegex = "k8s.*".r` pattern.
https://github.com/apache/spark/blob/99805558fc80743747f32c7008cb7cc99c1cda01/core/src/main/scala/org/apache/spark/SparkContext.scala#L3063

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs with the existing test coverage.

Closes apache#35195 from dongjoon-hyun/SPARK-37900.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
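Spark master URLs for Kubernetes use the `k8s://` scheme, and a scheme-anchored pattern is stricter than the loose `k8s.*` it replaces. A Python paraphrase of the two patterns shows the difference; the `OFFICIAL` pattern here approximates `SparkMasterRegex.KUBERNETES_REGEX` from memory, and the example master URLs are invented:

```python
import re

# Paraphrases of the two patterns discussed in the commit message.
LOOSE = re.compile(r"k8s.*")          # old SecurityManager.k8sRegex
OFFICIAL = re.compile(r"k8s://(.*)")  # SparkMasterRegex.KUBERNETES_REGEX (approx.)

def is_k8s_master(master: str, pattern) -> bool:
    # Scala pattern matching against a Regex anchors the whole string;
    # `fullmatch` mirrors that behavior here.
    return pattern.fullmatch(master) is not None

real_master = "k8s://https://1.2.3.4:6443"  # a plausible Kubernetes master URL
bogus_master = "k8snotreally"               # no scheme separator at all
```

The loose pattern accepts any string that merely starts with `k8s`, while the scheme-anchored pattern requires the `k8s://` prefix.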
[SPARK-37887][CORE] Fix the check of repl log level (commit 1431a4a)

### What changes were proposed in this pull request?
This patch fixes the check of the repl's log level, so we can correctly detect whether a log level is set on the repl class.

### Why are the changes needed?
As with the check in `SparkShellLoggingFilter`, `getLevel` can no longer be used to determine whether a log level has been explicitly set on a logger in log4j2.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually verified locally.

Closes apache#35198 from viirya/SPARK-37887.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
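The pitfall fixed here, that log4j2's `getLevel` reports an inherited level and therefore cannot tell you whether a level was explicitly set, has a close analogue in Python's standard `logging` module. The sketch below illustrates the concept only; the logger names are invented and this is not the log4j2 API:

```python
import logging

root = logging.getLogger()
root.setLevel(logging.WARNING)

# A fresh logger on which no level was ever set explicitly.
child = logging.getLogger("repl.MyClass")

# The effective level is inherited from the root logger...
inherited = child.getEffectiveLevel()   # resolves via the logger hierarchy

# ...so an effective-level check cannot distinguish "explicitly set"
# from "inherited". The explicit setting must be inspected directly:
explicitly_set = child.level != logging.NOTSET
```

In the same way, the fixed check has to look at the logger's own configuration rather than the resolved level that `getLevel` returns in log4j2.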
Note: GitHub could not render this comparison (it may be too big, or there may be something unusual in the repository). You can run this command locally to see the comparison on your machine:

```
git diff 6b76741...1431a4a
```