2 changes: 1 addition & 1 deletion dev/run-tests-jenkins.py
@@ -168,7 +168,7 @@ def main():
     # against master.
     ghprb_pull_id = os.environ["ghprbPullId"]
     ghprb_actual_commit = os.environ["ghprbActualCommit"]
-    ghprb_pull_title = os.environ["ghprbPullTitle"]
+    ghprb_pull_title = os.environ["ghprbPullTitle"].lower()
     sha1 = os.environ["sha1"]
 
     # Marks this build as a pull request build.
7 changes: 7 additions & 0 deletions dev/run-tests.py
@@ -404,6 +404,12 @@ def run_scala_tests(build_tool, hadoop_version, test_modules, excluded_tags):
     if excluded_tags:
         test_profiles += ['-Dtest.exclude.tags=' + ",".join(excluded_tags)]
 
+    # set up java11 env if this is a pull request build with 'test-java11' in the title
+    if "test-java11" in os.environ["ghprbPullTitle"].lower():
+        os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
+        os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
+        test_profiles += ['-Djava.version=11']
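
(Usage note, inferred from the two hunks above: retitling a pull request to include "test-java11" — matched case-insensitively, thanks to the .lower() normalization — makes the PR builder export JAVA_HOME=/usr/java/jdk-11.0.1, prepend its bin directory to PATH, and add -Djava.version=11 to the test profiles.)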
@HyukjinKwon (Member) commented on Aug 25, 2019:

Can we try to set this in the Python tests too? It seems the Java gateway has to use JDK 11 as well.
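
A minimal sketch of what that could look like — a hypothetical helper, not something in this PR; the JDK path is just the one this PR hardcodes for the Jenkins workers:

import os

def use_jdk11_env(java_home="/usr/java/jdk-11.0.1"):
    # Mutate the build environment so that any later subprocess (including
    # the spark-submit that backs the Py4J gateway) resolves this JDK first.
    os.environ["JAVA_HOME"] = java_home
    os.environ["PATH"] = "%s/bin:%s" % (java_home, os.environ["PATH"])

Calling this before the Python test phase would mirror what the Scala phase above already does.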

A member replied:
It should use Java 11 if the PATH provides Java 11 and the test harness that runs the Python tests does too. At least I don't know how else one would tell PySpark what to use!

In fact, I'm pretty sure the test failure here shows that it is using JDK 11. From JPMML: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory. That would be caused by the JDK 11 changes (the JDK-internal JAXB implementation was removed in Java 11). However, I don't get why all the other non-Python tests don't fail.

Given the weird problem in #24651, I am wondering if we have some subtle classpath issues with how the PySpark tests are run.

This one, however, might be more directly solvable by figuring out what is suggesting the use of this old Sun JAXB implementation. I'll start digging around META-INF.
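
A sketch of that META-INF digging, in case it helps — a throwaway script, not part of this PR, that scans the jars it is given for jaxb.properties files or META-INF/services entries naming the JDK-internal JAXB implementation that Java 11 removed:

import sys
import zipfile

def find_jaxb_pins(jar_paths):
    # Collect (jar, entry) pairs whose metadata names the old internal factory.
    hits = []
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith("jaxb.properties") or name.startswith("META-INF/services/"):
                    text = zf.read(name).decode("utf-8", errors="replace")
                    if "com.sun.xml.internal.bind" in text:
                        hits.append((jar, name))
    return hits

if __name__ == "__main__":
    for jar, entry in find_jaxb_pins(sys.argv[1:]):
        print("%s -> %s" % (jar, entry))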

@srowen (Member) commented on Aug 26, 2019:

Hm, then why does https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/ pass? It is doing the same thing in the Jenkins config. (OK, I think I answered my own question below.)

EDIT: Oh, because it doesn't run the PySpark tests?

@HyukjinKwon (Member) replied on Aug 26, 2019:

No, actually you're right. It seems that after the Scala tests run here, PATH and JAVA_HOME are still set.

I thought this path in python/pyspark/java_gateway.py was the relevant one:

SPARK_HOME = _find_spark_home()
# Launch the Py4j gateway using Spark's run command so that we pick up the
# proper classpath and settings from spark-env.sh
on_windows = platform.system() == "Windows"
script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
command = [os.path.join(SPARK_HOME, script)]
if conf:
    for k, v in conf.getAll():
        command += ['--conf', '%s=%s' % (k, v)]
submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
if os.environ.get("SPARK_TESTING"):
    submit_args = ' '.join([
        "--conf spark.ui.enabled=false",
        submit_args
    ])
command = command + shlex.split(submit_args)

args.mainClass = "org.apache.spark.api.python.PythonGatewayServer"

This path somehow happened to use JDK 8.

Actually, the PySpark tests and SparkR tests passed at #25443 (comment).

So the issue persists here, but I guess we can handle it separately, since this PR at least seems to set JDK 11 correctly, and it virtually doesn't affect any main or test code (when the title keyword is not used).
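
As a side note on the inheritance point: java_gateway.py starts spark-submit with a copy of os.environ, so whatever JAVA_HOME/PATH run-tests.py exported earlier should reach the gateway. A tiny self-contained demonstration, with the path being just the one this PR uses:

import os
import subprocess

os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
# check_output inherits os.environ by default when env= is not passed,
# so the child shell sees the mutated JAVA_HOME.
out = subprocess.check_output(["/bin/sh", "-c", 'echo "$JAVA_HOME"'])
print(out.decode().strip())  # -> /usr/java/jdk-11.0.1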

A member commented:

It's interesting. Thank you for the investigation, @srowen and @HyukjinKwon.

A member asked:

Do we have a JIRA issue for this?

A member replied:

We probably need one, yeah, regardless of the cause. I'll file one to track it.


     if build_tool == "maven":
         run_scala_tests_maven(test_profiles)
     else:
@@ -565,6 +571,7 @@ def main():
         changed_files = identify_changed_files_from_git_commits("HEAD", target_branch=target_branch)
         changed_modules = determine_modules_for_files(changed_files)
         excluded_tags = determine_tags_to_exclude(changed_modules)
+
     if not changed_modules:
         changed_modules = [modules.root]
         excluded_tags = []
@@ -33,6 +33,7 @@ SERVICE_ACCOUNT=
 CONTEXT=
 INCLUDE_TAGS="k8s"
 EXCLUDE_TAGS=
+JAVA_VERSION="8"
 MVN="$TEST_ROOT_DIR/build/mvn"
 
 SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version 2>/dev/null\
@@ -99,6 +100,10 @@ while (( "$#" )); do
       R_IMAGE_NAME="$2"
       shift
       ;;
+    --java-version)
+      JAVA_VERSION="$2"
+      shift
+      ;;
     *)
       break
       ;;
@@ -107,6 +112,7 @@ while (( "$#" )); do
 done
 
 properties=(
+  -Djava.version=$JAVA_VERSION \
   -Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
   -Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
   -Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
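
(Usage note, inferred from the hunks above: the Kubernetes integration-test script now accepts a --java-version flag, forwarding the value to Maven as -Djava.version; when the flag is omitted, JAVA_VERSION defaults to "8".)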