Comparing changes

…rk.sql.functions

### What changes were proposed in this pull request?

This PR proposes conversion of functions not covered by SPARK-32084 to `_invoke_functions` style.

Two new `_invoke` functions were added to address common cases:

- `_invoke_function_over_columns`
- `_invoke_function_over_seq_of_columns`

### Why are the changes needed?

To reduce boilerplate (especially related to type checking) and improve manageability. Additionally, it opens an opportunity for reducing driver-side invocation latency.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes apache#34951 from zero323/SPARK-37686.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: zero323 <mszymkiewicz@gmail.com>

…s contain all the join keys

### What changes were proposed in this pull request?

This is a followup of apache#32875. Basically, apache#32875 made two improvements:

1. allow bucket join even if the bucket hash function is different from Spark's shuffle hash function
2. allow bucket join even if the hash partition keys are a subset of the join keys.

The first improvement is the major target for implementing the SPIP "storage partition join". The second improvement is more of an unplanned consequence of the framework refactor.

This PR disables the second improvement by default, because skipping the shuffle may introduce a perf regression if there is data skew. We need more design work to enable this improvement, such as checking the ndv.

### Why are the changes needed?

Avoid perf regression

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Closes apache#35138 from cloud-fan/join.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
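The difference between the two behaviours can be sketched as a simple key-set check. This is an illustration only: `can_skip_shuffle` and its `require_all_join_keys` flag are hypothetical names, key ordering is ignored, and the real planner logic in Spark is considerably more involved.

```python
from typing import Iterable


def can_skip_shuffle(
    partition_keys: Iterable[str],
    join_keys: Iterable[str],
    require_all_join_keys: bool = True,
) -> bool:
    """Decide whether existing hash partitioning can serve a join without a shuffle.

    Hypothetical sketch: compares keys as sets and ignores ordering.
    """
    part, join = set(partition_keys), set(join_keys)
    # Partitioning on a key that is not a join key never co-locates matching rows.
    if not part or not part <= join:
        return False
    if require_all_join_keys:
        # Strict mode: the partition keys must cover every join key.
        return join <= part
    # Relaxed mode: a subset of the join keys is accepted.
    return True
```

For data partitioned on `a` and joined on `(a, b)`, the strict check returns `False` and the relaxed check returns `True`. The relaxed case is where a skewed `a` could leave a few partitions doing most of the work.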

…oding with Parquet DataPage V2

### What changes were proposed in this pull request?

Parquet v2 data pages write Boolean values using RLE encoding. Reading v2 Boolean values through the vectorized reader currently throws an exception like the following:

```java
Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305) ~[classes/:?]
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277) ~[classes/:?]
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344) ~[classes/:?]
	at
```

This PR extends the `readBooleans` and `skipBooleans` methods of `VectorizedRleValuesReader` so that the above scenario passes.

### Why are the changes needed?

Support Parquet v2 data page RLE encoding for the vectorized read path

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add new test case

Closes apache#35163 from LuciferYang/SPARK-37864.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Chao Sun <sunchao@apple.com>
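For readers unfamiliar with the encoding, here is a small sketch of decoding Boolean values from Parquet's RLE/bit-packed hybrid format at bit width 1. It is not the code in `VectorizedRleValuesReader`; the function names are hypothetical, and it assumes the input starts directly at the first run header, with any length prefix already stripped.

```python
from typing import List, Tuple


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_rle_booleans(buf: bytes, count: int) -> List[bool]:
    """Decode `count` Booleans from RLE/bit-packed hybrid runs (bit width 1)."""
    values: List[bool] = []
    pos = 0
    while len(values) < count:
        header, pos = read_uvarint(buf, pos)
        if header & 1 == 0:
            # RLE run: (header >> 1) repetitions of one value stored in one byte.
            run_length = header >> 1
            value = bool(buf[pos] & 1)
            pos += 1
            values.extend([value] * run_length)
        else:
            # Bit-packed run: (header >> 1) groups of 8 values, one byte per
            # group at bit width 1, least significant bit first.
            num_groups = header >> 1
            for _ in range(num_groups):
                byte = buf[pos]
                pos += 1
                values.extend(bool((byte >> bit) & 1) for bit in range(8))
    # A bit-packed run may carry padding past the requested count.
    return values[:count]
```

A skip operation can follow the same run structure: it advances past values without materialising them.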

…ityManager`

### What changes were proposed in this pull request?

This PR removes `SecurityManager.k8sRegex` and uses `SparkMasterRegex.KUBERNETES_REGEX` in `SecurityManager` instead.

### Why are the changes needed?

`SparkMasterRegex.KUBERNETES_REGEX` is more accurate and official than the existing `val k8sRegex = "k8s.*".r` pattern.

https://github.com/apache/spark/blob/99805558fc80743747f32c7008cb7cc99c1cda01/core/src/main/scala/org/apache/spark/SparkContext.scala#L3063

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the existing test coverage.

Closes apache#35195 from dongjoon-hyun/SPARK-37900.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
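To see why the looser pattern is less accurate, the sketch below compares `k8s.*` (quoted in the commit message) against a stricter pattern. The stricter pattern `k8s://(.*)` is an assumption about the form of `KUBERNETES_REGEX`; the linked `SparkContext.scala` line is the authoritative definition. The helper name is hypothetical, and full-string matching is used to approximate how a Scala regex behaves in a pattern match.

```python
import re

# Pattern quoted in the commit message as the old `k8sRegex`.
LOOSE_K8S = re.compile(r"k8s.*")

# Assumed form of the stricter master-URL pattern (not confirmed by the commit text).
STRICT_K8S = re.compile(r"k8s://(.*)")


def is_k8s_master(master: str, pattern: "re.Pattern[str]") -> bool:
    """Return True when the whole master string matches the pattern."""
    return pattern.fullmatch(master) is not None


candidates = ["k8s://https://host:6443", "k8s-typo", "k8sfoo", "local[*]"]
for master in candidates:
    print(master, is_k8s_master(master, LOOSE_K8S), is_k8s_master(master, STRICT_K8S))
```

Under these assumptions, strings such as `k8s-typo` pass the loose check but fail the strict one, while a well-formed `k8s://` master URL passes both.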

### What changes were proposed in this pull request?

This patch fixes the check of the repl's log level, so that we can correctly tell whether a log level has been set for the repl class.

### Why are the changes needed?

As with the check in `SparkShellLoggingFilter`, `getLevel` can no longer be used to check whether a log level is set for a logger in log4j2.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually verified locally.

Closes apache#35198 from viirya/SPARK-37887.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>


Commits on Jan 13, 2022
