
Conversation

@philwalk (Contributor) commented Oct 8, 2022

This PR is superseded by #38228

This PR fixes two problems that affect development in a Windows shell environment, such as cygwin or msys2.
Running `./build/sbt packageBin` from a Windows cygwin bash session fails:

$ ./build/sbt packageBin
[info] compiling 9 Java sources to C:\Users\philwalk\workspace\spark\common\sketch\target\scala-2.12\classes ...
/bin/bash: C:Usersphilwalkworkspacesparkcore/../build/spark-build-info: No such file or directory
[info] compiling 1 Scala source to C:\Users\philwalk\workspace\spark\tools\target\scala-2.12\classes ...
[info] compiling 5 Scala sources to C:\Users\philwalk\workspace\spark\mllib-local\target\scala-2.12\classes ...
[info] Compiling 5 protobuf files to C:\Users\philwalk\workspace\spark\connector\connect\target\scala-2.12\src_managed\main
[error] stack trace is suppressed; run last core / Compile / managedResources for the full output
[error] (core / Compile / managedResources) Nonzero exit value: 127
[error] Total time: 42 s, completed Oct 8, 2022, 4:49:12 PM
sbt:spark-parent>
sbt:spark-parent> last core /Compile /managedResources
last core /Compile /managedResources
[error] java.lang.RuntimeException: Nonzero exit value: 127
[error]         at scala.sys.package$.error(package.scala:30)
[error]         at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:138)
[error]         at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:108)
[error]         at Core$.$anonfun$settings$4(SparkBuild.scala:604)
[error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error]         at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error]         at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error]         at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error]         at sbt.Execute.work(Execute.scala:291)
[error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error]         at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error]         at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error]         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[error]         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[error]         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[error]         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[error]         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[error]         at java.base/java.lang.Thread.run(Thread.java:834)
[error] (core / Compile / managedResources) Nonzero exit value: 127

This occurs if WSL is installed: project\SparkBuild.scala spawns a bash process, and WSL bash is invoked even though cygwin bash appears earlier in the PATH. In addition, file path arguments passed to bash contain backslashes. The fix is to ensure that the correct bash is called and that path arguments are passed with forward slashes rather than backslashes.
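
A minimal sketch of the build-side idea, with illustrative names and paths (the actual change is in project/SparkBuild.scala, and the real code differs in detail):

```
// Hedged sketch, not the actual SparkBuild.scala code: hand the script path
// to bash with forward slashes, and make sure "bash" resolves to cygwin/msys2
// bash rather than WSL bash.
import scala.sys.process._

val script = new java.io.File("core/../build/spark-build-info")
  .getCanonicalPath
  .replace('\\', '/')                           // avoids the "C:Usersphilwalk..." mangling seen above
val bash = sys.env.getOrElse("BASH", "bash")    // the BASH override is an assumption for illustration
val exitCode = Seq(bash, script).!              // exit code 127 means "command/file not found"
```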

The other problem fixed by this PR is that the bash scripts (spark-shell, spark-submit, etc.) cannot be used in a Windows SHELL environment. The bash version of spark-class fails there because launcher/src/main/java/org/apache/spark/launcher/Main.java does not follow the convention expected by spark-class and also appends a CR to line endings. The resulting error message is not helpful.

There are two parts to this fix (a rough sketch follows the list):

  1. modify Main.java to treat a SHELL session on Windows as a bash session
  2. remove the appended CR character when parsing the output produced by Main.java
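
A rough Scala sketch of both ideas (hedged: the real changes live in Main.java and the bash spark-class script, and the names below are illustrative):

```
// Part 1: on Windows, the presence of a SHELL environment variable signals a
// bash-style caller (cygwin/msys2), so the launcher should emit bash-style output.
val isWindows  = sys.props("os.name").toLowerCase.contains("windows")
val bashCaller = sys.env.contains("SHELL")
val emitBashOutput = !isWindows || bashCaller

// Part 2: tolerate a trailing CR when parsing a line of launcher output.
def stripCr(line: String): String = line.stripSuffix("\r")
```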

Does this PR introduce any user-facing change?

These changes should NOT affect anyone who is not trying to build or run bash scripts from a Windows SHELL environment.

It might make sense to actively unset the SHELL variable inside spark-class.cmd to avoid this corner case.

How was this patch tested?

Manual tests were performed to verify both changes.

@github-actions bot added the BUILD label Oct 8, 2022
@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon (Member)

Thanks for the contribution. Would you mind checking https://github.com/apache/spark/pull/38167/checks?check_run_id=8783733198 and https://spark.apache.org/contributing.html? e.g., let's file a JIRA and link it to the PR title.

dcoliversun and others added 2 commits October 10, 2022 09:59
…umentation

### What changes were proposed in this pull request?

This PR aims to supplement undocumented orc configurations in documentation.

### Why are the changes needed?

Help users to confirm configurations through documentation instead of code.

### Does this PR introduce _any_ user-facing change?

Yes, more configurations in the documentation.

### How was this patch tested?

Pass the GA.

Closes #38188 from dcoliversun/SPARK-40726.

Authored-by: Qian.Sun <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
… Row to JSON for Scala 2.13

### What changes were proposed in this pull request?
I encountered an issue using Spark while reading JSON files based on a schema: it throws an exception related to type conversion every time.

>Note: This issue can be reproduced only with Scala `2.13`, I'm not having this issue with `2.12`

````
Failed to convert value ArraySeq(1, 2, 3) (class of class scala.collection.mutable.ArraySeq$ofRef}) with the type of ArrayType(StringType,true) to JSON.
java.lang.IllegalArgumentException: Failed to convert value ArraySeq(1, 2, 3) (class of class scala.collection.mutable.ArraySeq$ofRef}) with the type of ArrayType(StringType,true) to JSON.
````

If I add ArraySeq to the matching cases, the test that I added passes successfully:
![image](https://user-images.githubusercontent.com/28459763/194669557-2f13032f-126f-4c2e-bc6d-1a4cfd0a009d.png)

With the current source code, the test fails with the following error:
![image](https://user-images.githubusercontent.com/28459763/194669654-19cefb13-180c-48ac-9206-69d8f672f64c.png)
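
A hedged Scala sketch of the matching-case idea (simplified; not the exact Spark source):

```
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// In Scala 2.13 an array column in a Row may arrive as
// scala.collection.mutable.ArraySeq, so matching only on immutable Seq misses it;
// matching on scala.collection.Seq covers both 2.12 and 2.13.
def toJsonValue(value: Any, dataType: DataType): String = (value, dataType) match {
  case (seq: scala.collection.Seq[_], ArrayType(elementType, _)) =>
    seq.map(v => toJsonValue(v, elementType)).mkString("[", ",", "]")
  case (s: String, StringType) => "\"" + s + "\""
  case (other, _)              => String.valueOf(other)
}
```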

### Why are the changes needed?
If a user is on Scala 2.13, they can't parse an array, which means they need to fall back to 2.12 to keep the project functioning.

### How was this patch tested?
I added a sample unit test for the case, but I can add more if needed.

Closes #38154 from Amraneze/fix/spark_40705.

Authored-by: Ait Zeouay Amrane <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
@philwalk (Contributor, Author)

Would you mind checking https://github.com/apache/spark/pull/38167/checks?check_run_id=8783733198

Two suggestions are provided:

Enable GitHub Actions:
My fork appears to be configured to allow actions, although I'm not sure. Here's what I see:

Actions permissions

  • Any action or reusable workflow can be used, regardless of who authored it or where it is defined.

Workflow permissions

  • Workflows have read and write permissions in the repository for all scopes.

  • Allow GitHub Actions to create and approve pull requests

The second suggestion is this:

git fetch upstream
git rebase upstream/master
git push origin YOUR_BRANCH --force

I just did so, although it didn't fix the problem.

UPDATE: I found the screen for enabling workflows, so we should be okay to re-run the failed check now.

@philwalk (Contributor, Author)

https://spark.apache.org/contributing.html? e.g., let's file a JIRA and link it to the PR title.

I'm looking into it now on the JIRA website.

### What changes were proposed in this pull request?
In the PR, I propose to remove `PartitionAlreadyExistsException` and use `PartitionsAlreadyExistException` instead of it.

### Why are the changes needed?
1. To simplify user apps. After the changes, users no longer need to catch both `PartitionsAlreadyExistException` and `PartitionAlreadyExistsException` (see the sketch after this list).
2. To improve code maintenance, since two nearly identical classes no longer need to be supported.
3. To avoid errors like PR #38152, which fixed `PartitionsAlreadyExistException` but not `PartitionAlreadyExistsException`.
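
A hedged sketch of the user-side simplification (assumes a local `SparkSession` and an existing table `t` partitioned by `p`; the details are illustrative):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException

val spark = SparkSession.builder().master("local[1]").getOrCreate()

// After this change, callers only need to handle one "partition already exists" type.
try {
  spark.sql("ALTER TABLE t ADD PARTITION (p = 1)")
} catch {
  case _: PartitionsAlreadyExistException => ()  // partition already present; ignore
}
```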

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
By running the affected test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *SupportsPartitionManagementSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```

Closes #38161 from MaxGekk/remove-PartitionAlreadyExistsException.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
@philwalk (Contributor, Author)

The following two JIRA issues were created; both are fixed by this PR and linked to it.

  • Bug SPARK-40739 "sbt packageBin" fails in cygwin or other windows bash session
  • Bug SPARK-40738 spark-shell fails with "bad array

amaliujia and others added 5 commits October 11, 2022 09:35
…n types

### What changes were proposed in this pull request?

1. Extend the support for Join with different join types. Before this PR, all joins were hardcoded to the `inner` type, so this PR adds support for other join types.
2. Add join to connect DSL.
3. Update a few Join proto fields to better reflect the semantic.

### Why are the changes needed?

Extend the support for Join in connect.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38157 from amaliujia/SPARK-40534.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…ndency for Spark Connect

### What changes were proposed in this pull request?

`mypy-protobuf` is only needed when the connect proto is changed and [generate_protos.sh](https://github.com/apache/spark/blob/master/connector/connect/dev/generate_protos.sh) is then used to update the Python-side generated proto files. We should mark this dependency as optional for people who do not need it.

### Why are the changes needed?

`mypy-protobuf` can be an optional dependency for people who do not touch connect proto files.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #38195 from amaliujia/dev_requirements.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…ecutorDecommissionInfo

### What changes were proposed in this pull request?

This change populates `ExecutorDecommission` with messages in `ExecutorDecommissionInfo`.

### Why are the changes needed?

Currently the message in `ExecutorDecommission` is a fixed value ("Executor decommission."), so it is the same for all cases, e.g. spot instance interruptions and auto-scaling down. With this change we can better differentiate those cases.
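
A hedged sketch of the idea (the case-class shapes below are simplified assumptions, not Spark's exact internals):

```
// Carry the original decommission message through instead of a fixed string.
case class ExecutorDecommissionInfo(message: String, workerHost: Option[String] = None)
case class ExecutorDecommission(workerHost: Option[String], reason: String)

def toLossReason(info: ExecutorDecommissionInfo): ExecutorDecommission =
  ExecutorDecommission(info.workerHost, reason = info.message)  // e.g. "spot instance interruption"
```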

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a unit test.

Closes #38030 from bozhang2820/spark-40596.

Authored-by: Bo Zhang <[email protected]>
Signed-off-by: Yi Wu <[email protected]>
…ules

### What changes were proposed in this pull request?
The main change of this PR is to refactor the shade relocation/rename rules, based on the result of `mvn dependency:tree -pl connector/connect`, to
ensure that Maven and sbt produce the assembly jar according to the same rules.

The main parts of the `mvn dependency:tree -pl connector/connect` output are as follows:

```
[INFO] +- com.google.guava:guava:jar:31.0.1-jre:compile
[INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:3.12.0:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.7.1:compile
[INFO] |  \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
[INFO] +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] +- com.google.protobuf:protobuf-java:jar:3.21.1:compile
[INFO] +- io.grpc:grpc-netty:jar:1.47.0:compile
[INFO] |  +- io.grpc:grpc-core:jar:1.47.0:compile
[INFO] |  |  +- com.google.code.gson:gson:jar:2.9.0:runtime
[INFO] |  |  +- com.google.android:annotations:jar:4.1.1.4:runtime
[INFO] |  |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.19:runtime
[INFO] |  +- io.netty:netty-codec-http2:jar:4.1.72.Final:compile
[INFO] |  |  \- io.netty:netty-codec-http:jar:4.1.72.Final:compile
[INFO] |  +- io.netty:netty-handler-proxy:jar:4.1.72.Final:runtime
[INFO] |  |  \- io.netty:netty-codec-socks:jar:4.1.72.Final:runtime
[INFO] |  +- io.perfmark:perfmark-api:jar:0.25.0:runtime
[INFO] |  \- io.netty:netty-transport-native-unix-common:jar:4.1.72.Final:runtime
[INFO] +- io.grpc:grpc-protobuf:jar:1.47.0:compile
[INFO] |  +- io.grpc:grpc-api:jar:1.47.0:compile
[INFO] |  |  \- io.grpc:grpc-context:jar:1.47.0:compile
[INFO] |  +- com.google.api.grpc:proto-google-common-protos:jar:2.0.1:compile
[INFO] |  \- io.grpc:grpc-protobuf-lite:jar:1.47.0:compile
[INFO] +- io.grpc:grpc-services:jar:1.47.0:compile
[INFO] |  \- com.google.protobuf:protobuf-java-util:jar:3.19.2:runtime
[INFO] +- io.grpc:grpc-stub:jar:1.47.0:compile
[INFO] +- org.spark-project.spark:unused:jar:1.0.0:compile
```

The new shade rule excludes the following jar packages:

- scala related jars
- netty related jars
- jars that only sbt included before: pmml-model-*.jar, findbugs jsr305-*.jar, spark unused-1.0.0.jar

So after this PR:

Maven shade will include the following jars:

```
[INFO] --- maven-shade-plugin:3.2.4:shade (default)  spark-connect_2.12 ---
[INFO] Including com.google.guava:guava:jar:31.0.1-jre in the shaded jar.
[INFO] Including com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava in the shaded jar.
[INFO] Including org.checkerframework:checker-qual:jar:3.12.0 in the shaded jar.
[INFO] Including com.google.errorprone:error_prone_annotations:jar:2.7.1 in the shaded jar.
[INFO] Including com.google.j2objc:j2objc-annotations:jar:1.3 in the shaded jar.
[INFO] Including com.google.guava:failureaccess:jar:1.0.1 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:3.21.1 in the shaded jar.
[INFO] Including io.grpc:grpc-netty:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-core:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.9.0 in the shaded jar.
[INFO] Including com.google.android:annotations:jar:4.1.1.4 in the shaded jar.
[INFO] Including org.codehaus.mojo:animal-sniffer-annotations:jar:1.19 in the shaded jar.
[INFO] Including io.perfmark:perfmark-api:jar:0.25.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-api:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-context:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-common-protos:jar:2.0.1 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-lite:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-services:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java-util:jar:3.19.2 in the shaded jar.
[INFO] Including io.grpc:grpc-stub:jar:1.47.0 in the shaded jar.
```

sbt assembly will include the following jars:

```
[debug] Including from cache: j2objc-annotations-1.3.jar
[debug] Including from cache: guava-31.0.1-jre.jar
[debug] Including from cache: protobuf-java-3.21.1.jar
[debug] Including from cache: grpc-services-1.47.0.jar
[debug] Including from cache: failureaccess-1.0.1.jar
[debug] Including from cache: grpc-stub-1.47.0.jar
[debug] Including from cache: perfmark-api-0.25.0.jar
[debug] Including from cache: annotations-4.1.1.4.jar
[debug] Including from cache: listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[debug] Including from cache: animal-sniffer-annotations-1.19.jar
[debug] Including from cache: checker-qual-3.12.0.jar
[debug] Including from cache: grpc-netty-1.47.0.jar
[debug] Including from cache: grpc-api-1.47.0.jar
[debug] Including from cache: grpc-protobuf-lite-1.47.0.jar
[debug] Including from cache: grpc-protobuf-1.47.0.jar
[debug] Including from cache: grpc-context-1.47.0.jar
[debug] Including from cache: grpc-core-1.47.0.jar
[debug] Including from cache: protobuf-java-util-3.19.2.jar
[debug] Including from cache: error_prone_annotations-2.10.0.jar
[debug] Including from cache: gson-2.9.0.jar
[debug] Including from cache: proto-google-common-protos-2.0.1.jar
```

All the dependencies mentioned above are relocated to the `org.sparkproject.connect` package according to the new rules, to avoid conflicts with other third-party dependencies.
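
As an illustration, a build.sbt-style sketch of the kind of sbt-assembly relocation rule involved (the package patterns below are assumptions, not the exact rules added by this PR):

```
// Relocate shaded dependencies under org.sparkproject.connect so they cannot
// conflict with the same libraries on a user application's classpath.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.guava.@1").inAll,
  ShadeRule.rename("io.grpc.**"           -> "org.sparkproject.connect.grpc.@1").inAll
)
```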

### Why are the changes needed?
Refactor the shade relocation/rename rules to ensure that Maven and sbt produce the assembly jar according to the same rules.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #38162 from LuciferYang/SPARK-40677-FOLLOWUP.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…l inputs

### What changes were proposed in this pull request?
add a dedicated expression for `product`:

1. for integral inputs, directly use `LongType` to avoid the rounding error;
2. when `ignoreNA` is true, skip the remaining values after meeting a `zero`;
3. when `ignoreNA` is false, skip the remaining values after meeting a `zero` or `null`;

### Why are the changes needed?

1. the existing computation logic on the PySpark side is too complex; with a dedicated expression, we can simplify the PySpark side and apply it in more cases.
2. the existing computation of `product` is likely to introduce rounding errors for integral inputs, for example `55108 x 55108 x 55108 x 55108` in the following case (a short Scala illustration follows the before/after snippets):

before:
```
In [14]: df = pd.DataFrame({"a": [55108, 55108, 55108, 55108], "b": [55108.0, 55108.0, 55108.0, 55108.0], "c": [1, 2, 3, 4]})

In [15]: df.a.prod()
Out[15]: 9222710978872688896

In [16]: type(df.a.prod())
Out[16]: numpy.int64

In [17]: df.b.prod()
Out[17]: 9.222710978872689e+18

In [18]: type(df.b.prod())
Out[18]: numpy.float64

In [19]:

In [19]: psdf = ps.from_pandas(df)

In [20]: psdf.a.prod()
Out[20]: 9222710978872658944

In [21]: type(psdf.a.prod())
Out[21]: int

In [22]: psdf.b.prod()
Out[22]: 9.222710978872659e+18

In [23]: type(psdf.b.prod())
Out[23]: float

In [24]: df.a.prod() - psdf.a.prod()
Out[24]: 29952
```

after:
```
In [1]: import pyspark.pandas as ps

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({"a": [55108, 55108, 55108, 55108], "b": [55108.0, 55108.0, 55108.0, 55108.0], "c": [1, 2, 3, 4]})

In [4]: df.a.prod()
Out[4]: 9222710978872688896

In [5]: psdf = ps.from_pandas(df)

In [6]: psdf.a.prod()
Out[6]: 9222710978872688896

In [7]: df.a.prod() - psdf.a.prod()
Out[7]: 0
```
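
As a minimal Scala illustration of why the integral path matters: the exact product of the four `55108` values still fits in a `Long`, but routing the same product through `Double` drops low-order bits.

```
val exact     = 55108L * 55108L * 55108L * 55108L               // 9222710978872688896, still fits in a Long
val viaDouble = (55108.0 * 55108.0 * 55108.0 * 55108.0).toLong  // the 64-bit result exceeds Double's 53-bit mantissa
println(exact - viaDouble)                                      // prints a non-zero difference
```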

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing UT & added UT

Closes #38148 from zhengruifeng/ps_new_prod.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@HyukjinKwon changed the title from "fix problems that affect windows shell environments (cygwin/msys2/mingw)" to "[SPARK-40739][SPARK-40738] Fix problems that affect windows shell environments (cygwin/msys2/mingw)" Oct 11, 2022
@HyukjinKwon
Copy link
Member

(@philwalk rebasing it would retrigger the Github Actions jobs)

amaliujia and others added 11 commits October 11, 2022 12:35
…one grouping expressions

### What changes were proposed in this pull request?

1. Add `groupby` to connect DSL and test more than one grouping expressions
2. Pass limited data types through connect proto for LocalRelation's attributes.
3. Clean up an unused `Trait` in the testing code.

### Why are the changes needed?

Enhance connect's support for GROUP BY.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38155 from amaliujia/support_more_than_one_grouping_set.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…classes

### What changes were proposed in this pull request?

In the PR, I propose to use error classes in the case of type check failure in collection expressions.

### Why are the changes needed?

Migration onto error classes unifies Spark SQL error messages.

### Does this PR introduce _any_ user-facing change?

Yes. The PR changes user-facing error messages.

### How was this patch tested?

```
build/sbt "sql/testOnly *SQLQueryTestSuite"
build/sbt "test:testOnly org.apache.spark.SparkThrowableSuite"
build/sbt "test:testOnly *ExpressionTypeCheckingSuite"
build/sbt "test:testOnly *DataFrameFunctionsSuite"
build/sbt "test:testOnly *DataFrameAggregateSuite"
build/sbt "test:testOnly *AnalysisErrorSuite"
build/sbt "test:testOnly *CollectionExpressionsSuite"
build/sbt "test:testOnly *ComplexTypeSuite"
build/sbt "test:testOnly *HigherOrderFunctionsSuite"
build/sbt "test:testOnly *PredicateSuite"
build/sbt "test:testOnly *TypeUtilsSuite"
```

Closes #38197 from lvshaokang/SPARK-40358.

Authored-by: lvshaokang <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?

This PR cleans up the logic of `listFunctions`. Currently `listFunctions` gets all external functions and registered functions (built-in, temporary, and persistent functions with a specific database name). It is not necessary to get persistent functions that match a specific database name again, since `externalCatalog.listFunctions` already fetched them. We only need to list all built-in and temporary functions from the function registries.

### Why are the changes needed?

Code clean up.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #38194 from allisonwang-db/spark-40740-list-functions.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Code refactor on all File data source options:
- `TextOptions`
- `CSVOptions`
- `JSONOptions`
- `AvroOptions`
- `ParquetOptions`
- `OrcOptions`
- `FileIndex` related options

Change semantics:
- First, we introduce a new trait `DataSourceOptions`, which defines the following functions (a minimal sketch appears after this list):
  - `newOption(name)`: Register a new option
  - `newOption(name, alternative)`: Register a new option with alternative
  - `getAllValidOptions`: retrieve all valid options
  - `isValidOption(name)`: validate a given option name
  - `getAlternativeOption(name)`: get alternative option name if any
- Then, for each class above
  - Create/update its companion object to extend from the trait above and register all valid options within it.
  - Update places where name strings are used directly to fetch option values to use those option constants instead.
  - Add a unit test for each file data source's options
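
A minimal sketch of what such a trait could look like (simplified; the actual trait in Spark may differ in detail):

```
trait DataSourceOptions {
  private val validOptions   = scala.collection.mutable.Set.empty[String]
  private val alternativeMap = scala.collection.mutable.Map.empty[String, String]

  // Register a new option name and return it, so it can be stored in a constant.
  protected def newOption(name: String): String = {
    validOptions += name
    name
  }

  // Register an option together with its alternative (alias) name.
  protected def newOption(name: String, alternative: String): String = {
    validOptions += name
    validOptions += alternative
    alternativeMap(name) = alternative
    alternativeMap(alternative) = name
    name
  }

  def getAllValidOptions: Set[String] = validOptions.toSet
  def isValidOption(name: String): Boolean = validOptions.contains(name)
  def getAlternativeOption(name: String): Option[String] = alternativeMap.get(name)
}

// Illustrative companion-object usage (the option names are examples only):
object CSVOptionsLike extends DataSourceOptions {
  val SEP    = newOption("sep", "delimiter")
  val HEADER = newOption("header")
}
```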

### Why are the changes needed?
Currently, for each file data source, all options are placed sparsely in the options class and there is no clear list of all supported options. As more and more options are added, readability gets worse. Thus, we want to refactor the code so that
- we can easily get a list of supported options for each data source
- we enforce better practices for adding new options going forward.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #38113 from xiaonanyang-db/SPARK-40667.

Authored-by: xiaonanyang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?

Support Column Alias in the Connect DSL (thus in Connect proto).

### Why are the changes needed?

Column alias is part of the DataFrame API; meanwhile, we need column alias to support `withColumn` and similar APIs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38174 from amaliujia/alias.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…t `min_count`

### What changes were proposed in this pull request?
Make `_reduce_for_stat_function` in `groupby` accept `min_count`

### Why are the changes needed?
to simplify the implementations

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing UTs

Closes #38201 from zhengruifeng/ps_groupby_mc.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…eric type

### What changes were proposed in this pull request?
This PR aims to fix the following Java compilation warnings related to generic types:

```
2022-10-08T01:43:33.6487078Z /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54: warning: [rawtypes] found raw type: HashMap
2022-10-08T01:43:33.6487456Z     return new HashMap();
2022-10-08T01:43:33.6487682Z                ^
2022-10-08T01:43:33.6487957Z   missing type arguments for generic class HashMap<K,V>
2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
2022-10-08T01:43:33.6489211Z     V extends Object declared in class HashMap

2022-10-08T01:50:21.5951932Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55: warning: [rawtypes] found raw type: Map
2022-10-08T01:50:21.5999993Z       createPartitions(new InternalRow[]{ident}, new Map[]{properties});
2022-10-08T01:50:21.6000343Z                                                      ^
2022-10-08T01:50:21.6000642Z   missing type arguments for generic class Map<K,V>
2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
2022-10-08T01:50:21.6002109Z     V extends Object declared in interface Map

2022-10-08T01:50:21.6006655Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216: warning: [rawtypes] found raw type: Literal
2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) {
2022-10-08T01:50:21.6007395Z                                 ^
2022-10-08T01:50:21.6007673Z   missing type arguments for generic class Literal<T>
2022-10-08T01:50:21.6008032Z   where T is a type-variable:
2022-10-08T01:50:21.6008324Z     T extends Object declared in interface Literal

2022-10-08T01:50:21.6008785Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56: warning: [rawtypes] found raw type: Comparable
2022-10-08T01:50:21.6009223Z   public static class Coord implements Comparable {
2022-10-08T01:50:21.6009503Z                                        ^
2022-10-08T01:50:21.6009791Z   missing type arguments for generic class Comparable<T>
2022-10-08T01:50:21.6010137Z   where T is a type-variable:
2022-10-08T01:50:21.6010433Z     T extends Object declared in interface Comparable
2022-10-08T01:50:21.6010976Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191: warning: [unchecked] unchecked method invocation: method sort in class Collections is applied to given types
2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
2022-10-08T01:50:21.6011714Z                       ^
2022-10-08T01:50:21.6012050Z   required: List<T>
2022-10-08T01:50:21.6012296Z   found: ArrayList<Coord>
2022-10-08T01:50:21.6012604Z   where T is a type-variable:
2022-10-08T01:50:21.6012926Z     T extends Comparable<? super T> declared in method <T>sort(List<T>)

2022-10-08T02:13:38.0769617Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85: warning: [rawtypes] found raw type: AbstractWriterAppender
2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
2022-10-08T02:13:38.0770645Z     ^
2022-10-08T02:13:38.0770947Z   missing type arguments for generic class AbstractWriterAppender<M>
2022-10-08T02:13:38.0771330Z   where M is a type-variable:
2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class AbstractWriterAppender

2022-10-08T02:13:38.0774487Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268: warning: [rawtypes] found raw type: Layout
2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
2022-10-08T02:13:38.0775173Z         ^
2022-10-08T02:13:38.0775441Z   missing type arguments for generic class Layout<T>
2022-10-08T02:13:38.0775849Z   where T is a type-variable:
2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface Layout

2022-10-08T02:19:55.0035795Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:17:  [rawtypes] found raw type: SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0037287Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:13:  [unchecked] unchecked call to SparkAvroKeyRecordWriter(Schema,GenericData,CodecFactory,OutputStream,int,Map<String,String>) as a member of the raw type SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0038442Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:75:31:  [rawtypes] found raw type: DataFileWriter
2022-10-08T02:19:55.0039370Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:75:27:  [unchecked] unchecked call to DataFileWriter(DatumWriter<D>) as a member of the raw type DataFileWriter

```

### Why are the changes needed?
Fix Java compilation warnings.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions.

Closes #38198 from LuciferYang/fix-java-warn.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…pts to make code more portable

### What changes were proposed in this pull request?
Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

### Why are the changes needed?
Some bash scripts still use `#!/bin/bash`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No test is needed.

Closes #38191 from huangxiaopingRD/script.

Authored-by: huangxiaoping <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
…CY_ERROR_TEMP_2076-2100

### What changes were proposed in this pull request?

This PR proposes to migrate 25 execution errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_2076` to `_LEGACY_ERROR_TEMP_2100`.

The `_LEGACY_ERROR_TEMP_` prefix indicates dev-facing error messages that won't be exposed to end users.

### Why are the changes needed?

To speed up the error class migration.

The migration on temporary error classes allow us to analyze the errors, so we can detect the most popular error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
$ build/sbt "test:testOnly *SQLQuerySuite"
```

Closes #38122 from itholic/SPARK-40540-2076-2100.

Lead-authored-by: itholic <[email protected]>
Co-authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
@philwalk (Contributor, Author)

(@philwalk rebasing it would retrigger the Github Actions jobs)

I did the following, hope it was correct:

git fetch upstream
git rebase upstream/master
git pull
git commit -m 'rebase to trigger build'
git push

@srowen (Member) commented Oct 12, 2022

I think this is messed up now; not sure how, as your approach seems OK (though you would have had to force push).

@philwalk (Contributor, Author) commented Oct 12, 2022

I will delete the fork and recreate the changes; that seems like the simplest fix to me.
The new PR is #38228

@philwalk philwalk closed this by deleting the head repository Oct 12, 2022