Merged
22 commits
96b5d50  [SPARK-40821][SQL][CORE][PYTHON][SS] Introduce window_time function t… (alex-balikov, Oct 23, 2022)
02a2242  [SPARK-40884][BUILD] Upgrade fabric8io - `kubernetes-client` to 6.2.0 (bjornjorgensen, Oct 24, 2022)
5d3b1e6  [SPARK-40877][SQL] Reimplement `crosstab` with dataframe operations (zhengruifeng, Oct 24, 2022)
6a0713a  [SPARK-40880][SQL] Reimplement `summary` with dataframe operations (zhengruifeng, Oct 24, 2022)
79aae64  [SPARK-40849][SS] Async log purge (jerrypeng, Oct 24, 2022)
f7eee09  [SPARK-40880][SQL][FOLLOW-UP] Remove unused imports (zhengruifeng, Oct 24, 2022)
74c8264  [SPARK-40812][CONNECT][PYTHON][FOLLOW-UP] Improve Deduplicate in Pyth… (amaliujia, Oct 24, 2022)
4d33ee0  [SPARK-36114][SQL] Support subqueries with correlated non-equality pr… (allisonwang-db, Oct 24, 2022)
58490da  [SPARK-40800][SQL] Always inline expressions in OptimizeOneRowRelatio… (allisonwang-db, Oct 24, 2022)
c721c72  [SPARK-40881][INFRA] Upgrade actions/cache to v3 and actions/upload-a… (Yikun, Oct 24, 2022)
825f219  [SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with distributi… (Yikun, Oct 24, 2022)
9140795  [SPARK-40798][SQL] Alter partition should verify value follow storeAs… (ulysses-you, Oct 24, 2022)
b7a88cd  [SPARK-40821][SQL][SS][FOLLOWUP] Fix available version for new functi… (HeartSaVioR, Oct 24, 2022)
e966c38  [SPARK-34265][PYTHON][SQL] Instrument Python UDFs using SQL metrics (LucaCanali, Oct 24, 2022)
6edcafc  [SPARK-40891][SQL][TESTS] Check error classes in TableIdentifierParse… (panbingkun, Oct 24, 2022)
e2e449e  [SPARK-40897][DOCS] Add some PySpark APIs to References (zhengruifeng, Oct 24, 2022)
363b853  [SPARK-39977][BUILD] Remove unnecessary guava exclusion from jackson-… (pan3793, Oct 24, 2022)
880d9bb  [SPARK-40739][SPARK-40738] Fixes for cygwin/msys2/mingw sbt build and… (philwalk, Oct 24, 2022)
05ad102  [SPARK-40391][SQL][TESTS][FOLLOWUP] Change to use `mockito-inline` in… (LuciferYang, Oct 24, 2022)
4ba7ce2  [SPARK-40857][CONNECT] Enable configurable GPRC Interceptors (grundprinzip, Oct 24, 2022)
9d2757c  [SPARK-40750][SQL] Migrate type check failures of math expressions on… (panbingkun, Oct 24, 2022)
60b1056  [SPARK-40902][MESOS][TESTS] Fix issue with mesos tests failing due to… (Oct 24, 2022)
[SPARK-40739][SPARK-40738] Fixes for cygwin/msys2/mingw sbt build and bash scripts

This fixes two problems that affect development in a Windows shell environment, such as `cygwin` or `msys2`.

### The fixed build error
Running `./build/sbt packageBin` from a Windows cygwin `bash` session fails.

This occurs if `WSL` is installed, because `project\SparkBuild.scala` creates a `bash` process, and `WSL bash` is invoked even though `cygwin bash` appears earlier in the `PATH`. In addition, file path arguments passed to bash contain backslashes. The fix is to ensure that the correct `bash` is called, and that arguments passed to `bash` use forward slashes rather than backslashes.
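
As a side note (not part of the PR itself), here is a minimal illustration of why a backslashed Windows path breaks once it reaches `bash`: during word expansion bash treats each backslash as an escape character and drops it, which is exactly the mangled `C:Usersphilwalkworkspacespark...` path visible in the error output below.

```bash
# Illustrative only: backslashes in an unquoted Windows-style path are consumed
# by bash as escape characters, so the path loses all of its separators.
bash -c 'echo C:\Users\philwalk\workspace\spark'
# prints: C:Usersphilwalkworkspacespark
```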

### The build error message:
```bash
./build/sbt packageBin
```
```
[info] compiling 9 Java sources to C:\Users\philwalk\workspace\spark\common\sketch\target\scala-2.12\classes ...
/bin/bash: C:Usersphilwalkworkspacesparkcore/../build/spark-build-info: No such file or directory
[info] compiling 1 Scala source to C:\Users\philwalk\workspace\spark\tools\target\scala-2.12\classes ...
[info] compiling 5 Scala sources to C:\Users\philwalk\workspace\spark\mllib-local\target\scala-2.12\classes ...
[info] Compiling 5 protobuf files to C:\Users\philwalk\workspace\spark\connector\connect\target\scala-2.12\src_managed\main
[error] stack trace is suppressed; run last core / Compile / managedResources for the full output
[error] (core / Compile / managedResources) Nonzero exit value: 127
[error] Total time: 42 s, completed Oct 8, 2022, 4:49:12 PM
sbt:spark-parent>
sbt:spark-parent> last core /Compile /managedResources
last core /Compile /managedResources
[error] java.lang.RuntimeException: Nonzero exit value: 127
[error]         at scala.sys.package$.error(package.scala:30)
[error]         at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:138)
[error]         at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:108)
[error]         at Core$.$anonfun$settings$4(SparkBuild.scala:604)
[error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error]         at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error]         at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error]         at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error]         at sbt.Execute.work(Execute.scala:291)
[error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error]         at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error]         at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error]         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[error]         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[error]         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[error]         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[error]         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[error]         at java.base/java.lang.Thread.run(Thread.java:834)
[error] (core / Compile / managedResources) Nonzero exit value: 127
```

### bash scripts fail when run from `cygwin` or `msys2`
The other problem fixed by this PR prevents the `bash` scripts (`spark-shell`, `spark-submit`, etc.) from being used in Windows `SHELL` environments. The bash version of `spark-class` fails in a Windows shell environment because `launcher/src/main/java/org/apache/spark/launcher/Main.java` does not follow the output convention expected by `spark-class` and also appends CR to line endings. The resulting error message is not helpful.

There are two parts to this fix (a minimal sketch of the parsing issue follows the list):
1. Modify `Main.java` to treat a `SHELL` session on Windows as a `bash` session.
2. Remove the appended CR character when parsing the output produced by `Main.java`.
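
A minimal, hypothetical sketch of the second part (the sample tokens are made up, and the real loop in `bin/spark-class` is more involved, as the diff further down shows): the launcher emits NUL-separated tokens, and a stray trailing CR on a token silently corrupts the command unless it is stripped.

```bash
# Simulate launcher output where the first token carries a Windows CR, then parse it
# with a NUL-delimited read loop and strip the CR, as the bin/spark-class fix does.
printf 'java\r\0-cp\0my.jar\0' | {
  CMD=()
  while IFS= read -d '' -r _ARG; do
    ARG=${_ARG//$'\r'}   # remove any carriage returns appended on Windows
    CMD+=("$ARG")
  done
  printf '[%s]' "${CMD[@]}"; echo   # prints: [java][-cp][my.jar]
}
```

Without the substitution, the first element would still carry the CR and the assembled command would fail to execute.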

### Does this PR introduce _any_ user-facing change?

These changes should NOT affect anyone who is not trying to build or run the bash scripts from a Windows SHELL environment.

### How was this patch tested?
Manual tests were performed to verify both changes.

### Related JIRA issues
The following two JIRA issues were created; both are fixed by this PR and both are linked to it.

- Bug SPARK-40739: "sbt packageBin" fails in cygwin or other Windows bash sessions
- Bug SPARK-40738: spark-shell fails with "bad array"

Closes apache#38228 from philwalk/windows-shell-env-fixes.

Authored-by: Phil <philwalk9@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
philwalk authored and srowen committed Oct 24, 2022
commit 880d9bb3fcb69001512886496f2988ed17cc4c50
3 changes: 2 additions & 1 deletion bin/spark-class
@@ -77,7 +77,8 @@ set +o posix
 CMD=()
 DELIM=$'\n'
 CMD_START_FLAG="false"
-while IFS= read -d "$DELIM" -r ARG; do
+while IFS= read -d "$DELIM" -r _ARG; do
+ARG=${_ARG//$'\r'}
 if [ "$CMD_START_FLAG" == "true" ]; then
 CMD+=("$ARG")
 else
2 changes: 2 additions & 0 deletions bin/spark-class2.cmd
@@ -69,6 +69,8 @@ rem SPARK-28302: %RANDOM% would return the same number if we call it instantly a
 rem so we should make it sure to generate unique file to avoid process collision of writing into
 rem the same file concurrently.
 if exist %LAUNCHER_OUTPUT% goto :gen
+rem unset SHELL to indicate non-bash environment to launcher/Main
+set SHELL=
 "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
 for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
 set SPARK_CMD=%%i
2 changes: 1 addition & 1 deletion build/spark-build-info
@@ -24,7 +24,7 @@

 RESOURCE_DIR="$1"
 mkdir -p "$RESOURCE_DIR"
-SPARK_BUILD_INFO="${RESOURCE_DIR}"/spark-version-info.properties
+SPARK_BUILD_INFO="${RESOURCE_DIR%/}"/spark-version-info.properties

 echo_build_properties() {
 echo version=$1
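
A quick illustration (not from the PR) of the `${RESOURCE_DIR%/}` change above: the `%/` suffix removes a single trailing slash if present, which avoids a doubled separator when the caller already passes a directory ending in `/`, as the sbt task in `project/SparkBuild.scala` does with `target/extra-resources/`.

```bash
# Illustrative only: "%/" strips one trailing slash from the value, if present.
RESOURCE_DIR="core/target/extra-resources/"
echo "${RESOURCE_DIR}/spark-version-info.properties"    # core/target/extra-resources//spark-version-info.properties
echo "${RESOURCE_DIR%/}/spark-version-info.properties"  # core/target/extra-resources/spark-version-info.properties
```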
6 changes: 4 additions & 2 deletions launcher/src/main/java/org/apache/spark/launcher/Main.java
@@ -87,7 +87,9 @@ public static void main(String[] argsArray) throws Exception {
 cmd = buildCommand(builder, env, printLaunchCommand);
 }

-if (isWindows()) {
+// test for shell environments, to enable non-Windows treatment of command line prep
+boolean shellflag = !isEmpty(System.getenv("SHELL"));
+if (isWindows() && !shellflag) {
 System.out.println(prepareWindowsCommand(cmd, env));
 } else {
 // A sequence of NULL character and newline separates command-strings and others.
@@ -96,7 +98,7 @@ public static void main(String[] argsArray) throws Exception {
 // In bash, use NULL as the arg separator since it cannot be used in an argument.
 List<String> bashCmd = prepareBashCommand(cmd, env);
 for (String c : bashCmd) {
-System.out.print(c);
+System.out.print(c.replaceFirst("\r$",""));
 System.out.print('\0');
 }
 }
9 changes: 8 additions & 1 deletion project/SparkBuild.scala
@@ -599,11 +599,18 @@ object SparkParallelTestGrouping {

 object Core {
 import scala.sys.process.Process
+def buildenv = Process(Seq("uname")).!!.trim.replaceFirst("[^A-Za-z0-9].*", "").toLowerCase
+def bashpath = Process(Seq("where", "bash")).!!.split("[\r\n]+").head.replace('\\', '/')
 lazy val settings = Seq(
 (Compile / resourceGenerators) += Def.task {
 val buildScript = baseDirectory.value + "/../build/spark-build-info"
 val targetDir = baseDirectory.value + "/target/extra-resources/"
-val command = Seq("bash", buildScript, targetDir, version.value)
+// support Windows build under cygwin/mingw64, etc
+val bash = buildenv match {
+case "cygwin" | "msys2" | "mingw64" | "clang64" => bashpath
+case _ => "bash"
+}
+val command = Seq(bash, buildScript, targetDir, version.value)
 Process(command).!!
 val propsFile = baseDirectory.value / "target" / "extra-resources" / "spark-version-info.properties"
 Seq(propsFile)
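
For readers unfamiliar with these shells, here is a rough bash rendering of what the new `buildenv` and `bashpath` helpers compute. It is only an illustration, written under the assumptions that `uname` reports something like `CYGWIN_NT-10.0-19044` under cygwin (or a `MINGW64_NT-...` string under mingw64) and that, as the PR text notes, the shell's own `bash` appears on the `PATH` ahead of the WSL one.

```bash
# Illustrative bash equivalent of the Scala helpers above (see assumptions in the text).
# buildenv: the uname string up to the first non-alphanumeric character, lowercased,
# e.g. "CYGWIN_NT-10.0-19044" -> "cygwin".
buildenv=$(uname | sed 's/[^A-Za-z0-9].*//' | tr '[:upper:]' '[:lower:]')
# bashpath: the first bash found on PATH by where.exe, with the CR stripped and
# backslashes flipped to forward slashes so the path survives being handed to bash.
bashpath=$(where bash | head -n 1 | tr -d '\r' | tr '\\' '/')
echo "buildenv=$buildenv bashpath=$bashpath"
```

With that in place, the sbt task invokes the cygwin/msys bash explicitly instead of letting Windows resolve `bash` to the WSL launcher.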