Closed
Changes from 1 commit
Commits
1161 commits
a36c1a6
[SPARK-23668][K8S] Added missing config property in running-on-kubern…
liyinan926 Jun 2, 2018
de4feae
[SPARK-24356][CORE] Duplicate strings in File.path managed by FileSeg…
misha-cloudera Jun 3, 2018
a2166ec
[SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment
Jun 4, 2018
416cd1f
[SPARK-24369][SQL] Correct handling for multiple distinct aggregation…
cloud-fan Jun 4, 2018
1d9338b
[SPARK-23786][SQL] Checking column names of csv headers
MaxGekk Jun 4, 2018
0be5aa2
[SPARK-23903][SQL] Add support for date extract
wangyum Jun 4, 2018
7297ae0
[SPARK-21896][SQL] Fix StackOverflow caused by window functions insid…
Jun 4, 2018
b24d3db
[SPARK-24290][ML] add support for Array input for instrumentation.log…
lu-wang-dl Jun 4, 2018
ff0501b
[SPARK-24300][ML] change the way to set seed in ml.cluster.LDASuite.g…
lu-wang-dl Jun 4, 2018
dbb4d83
[SPARK-24215][PYSPARK] Implement _repr_html_ for dataframes in PySpark
xuanyuanking Jun 5, 2018
b3417b7
[SPARK-16451][REPL] Fail shell if SparkSession fails to start.
Jun 5, 2018
e8c1a0c
[SPARK-15784] Add Power Iteration Clustering to spark.ml
WeichenXu123 Jun 5, 2018
2c2a86b
[SPARK-24453][SS] Fix error recovering from the failure in a no-data …
tdas Jun 5, 2018
93df3cd
[SPARK-22384][SQL] Refine partition pruning when attribute is wrapped…
Jun 5, 2018
e9efb62
[SPARK-24187][R][SQL] Add array_join function to SparkR
huaxingao Jun 6, 2018
e76b012
[SPARK-23803][SQL] Support bucket pruning
Jun 6, 2018
1462bba
[SPARK-24119][SQL] Add interpreted execution to SortPrefix expression
bersprockets Jun 8, 2018
2c10020
[SPARK-24224][ML-EXAMPLES] Java example code for Power Iteration Clus…
shahidki31 Jun 8, 2018
a5d775a
[SPARK-24191][ML] Scala Example code for Power Iteration Clustering
shahidki31 Jun 8, 2018
173fe45
[SPARK-24477][SPARK-24454][ML][PYTHON] Imports submodule in ml/__init…
HyukjinKwon Jun 8, 2018
1a644af
[SPARK-23984][K8S] Initial Python Bindings for PySpark on K8s
ifilonenko Jun 8, 2018
b070ded
[SPARK-17756][PYTHON][STREAMING] Workaround to avoid return type mism…
HyukjinKwon Jun 8, 2018
f433ef7
[SPARK-23010][K8S] Initial checkin of k8s integration tests.
ssuchter Jun 8, 2018
36a3409
[SPARK-24412][SQL] Adding docs about automagical type casting in `isi…
raptond Jun 9, 2018
f07c506
[SPARK-24468][SQL] Handle negative scale when adjusting precision for…
mgaido91 Jun 9, 2018
3e5b4ae
[SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from…
e-dorigatti Jun 11, 2018
a99d284
[SPARK-19826][ML][PYTHON] add spark.ml Python API for PIC
huaxingao Jun 11, 2018
9b6f242
[MINOR][CORE] Log committer class used by HadoopMapRedCommitProtocol
ejono Jun 11, 2018
2dc047a
[SPARK-24520] Double braces in documentations
Jun 11, 2018
f5af86e
[SPARK-24134][DOCS] A missing full-stop in doc "Tuning Spark".
XD-DENG Jun 11, 2018
0481977
[SPARK-22144][SQL] ExchangeCoordinator combine the partitions of an 0…
liutang123 Jun 12, 2018
dc22465
[SPARK-23732][DOCS] Fix source links in generated scaladoc.
Jun 12, 2018
01452ea
[SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite
cloud-fan Jun 12, 2018
1d7db65
docs: fix typo
tomsaleeba Jun 12, 2018
5d6a53d
[SPARK-15064][ML] Locale support in StopWordsRemover
dongjinleekr Jun 12, 2018
2824f14
[SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in Hi…
mgaido91 Jun 12, 2018
3af1d3e
[SPARK-24416] Fix configuration specification for killBlacklisted exe…
Jun 12, 2018
f0ef1b3
[SPARK-23931][SQL] Adds arrays_zip function to sparksql
DylanGuedes Jun 12, 2018
cc88d7f
[SPARK-24216][SQL] Spark TypedAggregateExpression uses getSimpleName …
Jun 12, 2018
ada28f2
[SPARK-23933][SQL] Add map_from_arrays function
kiszk Jun 12, 2018
0d3714d
[SPARK-23010][BUILD][FOLLOWUP] Fix java checkstyle failure of kuberne…
jiangxb1987 Jun 12, 2018
f53818d
[SPARK-24506][UI] Add UI filters to tabs added after binding
mgaido91 Jun 12, 2018
9786ce6
[SPARK-22239][SQL][PYTHON] Enable grouped aggregate pandas UDFs as wi…
icexelloss Jun 13, 2018
3352d6f
[SPARK-24466][SS] Fix TextSocketMicroBatchReader to be compatible wit…
HeartSaVioR Jun 13, 2018
4c388bc
[SPARK-24485][SS] Measure and log elapsed time for filesystem operati…
HeartSaVioR Jun 13, 2018
7703b46
[SPARK-24479][SS] Added config for registering streamingQueryListeners
arunmahadevan Jun 13, 2018
299d297
[SPARK-24500][SQL] Make sure streams are materialized during Tree tra…
hvanhovell Jun 13, 2018
1b46f41
[SPARK-24235][SS] Implement continuous shuffle writer for single read…
jose-torres Jun 13, 2018
3bf7691
[SPARK-24531][TESTS] Replace 2.3.0 version with 2.3.1
mgaido91 Jun 13, 2018
534065e
[MINOR][CORE][TEST] Remove unnecessary sort in UnsafeInMemorySorterSuite
jiangxb1987 Jun 14, 2018
fdadc4b
[SPARK-24495][SQL] EnsureRequirement returns wrong plan when reorderi…
mgaido91 Jun 14, 2018
d3eed8f
[SPARK-24563][PYTHON] Catch TypeError when testing existence of HiveC…
icexelloss Jun 14, 2018
b8f27ae
[SPARK-24543][SQL] Support any type as DDL string for from_json's schema
MaxGekk Jun 14, 2018
18cb0c0
[SPARK-24319][SPARK SUBMIT] Fix spark-submit execution where no main …
gaborgsomogyi Jun 14, 2018
270a9a3
[SPARK-24248][K8S] Use level triggering and state reconciliation in s…
mccheah Jun 14, 2018
22daeba
[SPARK-24478][SQL] Move projection and filter push down to physical c…
rdblue Jun 15, 2018
6567fc4
[PYTHON] Fix typo in serializer exception
rberenguel Jun 15, 2018
495d8cf
[SPARK-24490][WEBUI] Use WebUI.addStaticHandler in web UIs
jaceklaskowski Jun 15, 2018
b5ccf0d
[SPARK-24396][SS][PYSPARK] Add Structured Streaming ForeachWriter for…
tdas Jun 15, 2018
90da7dc
[SPARK-24452][SQL][CORE] Avoid possible overflow in int add or multiple
kiszk Jun 15, 2018
e4fee39
[SPARK-24525][SS] Provide an option to limit number of rows in a Memo…
Jun 15, 2018
c7c0b08
add one supported type missing from the javadoc
ispot-james Jun 16, 2018
b0a9352
[SPARK-24573][INFRA] Runs SBT checkstyle after the build to work arou…
HyukjinKwon Jun 18, 2018
e219e69
[SPARK-23772][SQL] Provide an option to ignore column of all null val…
maropu Jun 18, 2018
bce1775
[SPARK-24526][BUILD][TEST-MAVEN] Spaces in the build dir causes failu…
Jun 18, 2018
8f225e0
[SPARK-24548][SQL] Fix incorrect schema of Dataset with tuple encoders
viirya Jun 18, 2018
1737d45
[SPARK-24478][SQL][FOLLOWUP] Move projection and filter push down to …
cloud-fan Jun 19, 2018
9a75c18
[SPARK-24542][SQL] UDF series UDFXPathXXXX allow users to pass carefu…
gatorsmile Jun 19, 2018
a78a904
[SPARK-24521][SQL][TEST] Fix ineffective test in CachedTableSuite
icexelloss Jun 19, 2018
9dbe53e
[SPARK-24556][SQL] Always rewrite output partitioning in ReusedExchan…
yucai Jun 19, 2018
13092d7
[SPARK-24534][K8S] Bypass non spark-on-k8s commands
rimolive Jun 19, 2018
2cb9763
[SPARK-24565][SS] Add API for in Structured Streaming for exposing ou…
tdas Jun 19, 2018
bc0498d
[SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand
maryannxue Jun 19, 2018
bc11146
[SPARK-23778][CORE] Avoid unneeded shuffle when union gets an empty RDD
mgaido91 Jun 20, 2018
c8ef923
[MINOR][SQL] Remove invalid comment from SparkStrategies
HeartSaVioR Jun 20, 2018
c5a0d11
[SPARK-24575][SQL] Prohibit window expressions inside WHERE and HAVIN…
Jun 20, 2018
3f4bda7
[SPARK-24578][CORE] Cap sub-region's size of returned nio buffer
WenboZhao Jun 20, 2018
15747cf
[SPARK-24547][K8S] Allow for building spark on k8s docker images with…
Jun 21, 2018
9de11d3
[SPARK-23912][SQL] add array_distinct
huaxingao Jun 21, 2018
54fcaaf
[SPARK-24571][SQL] Support Char literals
MaxGekk Jun 21, 2018
7236e75
[SPARK-24574][SQL] array_contains, array_position, array_remove and e…
chongguang Jun 21, 2018
c0cad59
[SPARK-24614][PYSPARK] Fix for SyntaxWarning on tests.py
rekhajoshm Jun 21, 2018
b56e9c6
[SPARK-16630][YARN] Blacklist a node if executors won't launch on it
attilapiros Jun 21, 2018
c8e909c
[SPARK-24589][CORE] Correctly identify tasks in output commit coordin…
Jun 21, 2018
b9a6f74
[SPARK-24613][SQL] Cache with UDF could not be matched with subsequen…
maryannxue Jun 21, 2018
dc8a6be
[SPARK-24588][SS] streaming join should require HashClusteredPartitio…
cloud-fan Jun 21, 2018
92c2f00
[SPARK-23934][SQL] Adding map_from_entries function
mn-mikke Jun 22, 2018
39dfaf2
[SPARK-24519] Make the threshold for highly compressed map status con…
Jun 22, 2018
33e77fa
[SPARK-24518][CORE] Using Hadoop credential provider API to store pas…
jerryshao Jun 22, 2018
4e7d867
[SPARK-24372][BUILD] Add scripts to help with preparing releases.
Jun 22, 2018
c7e2742
[SPARK-24190][SQL] Allow saving of JSON files in UTF-16 and UTF-32
MaxGekk Jun 24, 2018
98f363b
[SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code
maropu Jun 24, 2018
a5849ad
[SPARK-24324][PYTHON] Pandas Grouped Map UDF should assign result col…
BryanCutler Jun 24, 2018
f596ebe
[SPARK-24327][SQL] Verify and normalize a partition column name based…
maropu Jun 25, 2018
6e0596e
[SPARK-23931][SQL][FOLLOW-UP] Make `arrays_zip` in function.scala `@s…
ueshin Jun 25, 2018
8ab8ef7
Fix minor typo in docs/cloud-integration.md
Jun 25, 2018
bac50aa
[SPARK-24596][SQL] Non-cascading Cache Invalidation
maryannxue Jun 25, 2018
594ac4f
[SPARK-24633][SQL] Fix codegen when split is required for arrays_zip
mgaido91 Jun 25, 2018
5264164
[SPARK-24648][SQL] SqlMetrics should be threadsafe
dbkerkela Jun 25, 2018
baa01c8
[INFRA] Close stale PR.
Jun 25, 2018
6d16b98
[SPARK-24552][CORE][SQL] Use task ID instead of attempt number for wr…
Jun 25, 2018
d48803b
[SPARK-24324][PYTHON][FOLLOWUP] Grouped Map positional conf should ha…
BryanCutler Jun 26, 2018
4c059eb
[SPARK-23776][DOC] Update instructions for running PySpark after buil…
bersprockets Jun 26, 2018
c7967c6
[SPARK-24418][BUILD] Upgrade Scala to 2.11.12 and 2.12.6
dbtsai Jun 26, 2018
e07aee2
[SPARK-24636][SQL] Type coercion of arrays for array_join function
mn-mikke Jun 26, 2018
dcaa49f
[SPARK-24658][SQL] Remove workaround for ANTLR bug
wangyum Jun 26, 2018
02f8781
[SPARK-24423][SQL] Add a new option for JDBC sources
dilipbiswal Jun 26, 2018
16f2c3e
[SPARK-6237][NETWORK] Network-layer changes to allow stream upload.
squito Jun 26, 2018
1b9368f
[SPARK-24659][SQL] GenericArrayData.equals should respect element typ…
rednaxelafx Jun 27, 2018
d08f53d
[SPARK-24605][SQL] size(null) returns null instead of -1
MaxGekk Jun 27, 2018
2669b4d
[SPARK-23927][SQL] Add "sequence" expression
wajda Jun 27, 2018
9a76f23
[SPARK-23927][SQL][FOLLOW-UP] Fix a build failure.
ueshin Jun 27, 2018
a1a64e3
[SPARK-21335][DOC] doc changes for disallowed un-aliased subquery use…
cnZach Jun 27, 2018
6a0b77a
[SPARK-24215][PYSPARK][FOLLOW UP] Implement eager evaluation for Data…
xuanyuanking Jun 27, 2018
78ecb6d
[SPARK-24446][YARN] Properly quote library path for YARN.
Jun 27, 2018
c04cb2d
[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition
debugger87 Jun 27, 2018
776befb
[SPARK-24660][SHS] Show correct error pages when downloading logs
mgaido91 Jun 27, 2018
221d03a
[SPARK-24533] Typesafe rebranded to lightbend. Changing the build dow…
Jun 27, 2018
893ea22
[SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFileFormat
maropu Jun 27, 2018
c5aa54d
[SPARK-24553][WEB-UI] http 302 fixes for href redirect
SJKallman Jun 27, 2018
bd32b50
[SPARK-24645][SQL] Skip parsing when csvColumnPruning enabled and par…
maropu Jun 28, 2018
1c9acc2
[SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmar…
maropu Jun 28, 2018
6a97e8e
[SPARK-24603][SQL] Fix findTightestCommonType reference in comments
Jun 28, 2018
5b05966
[SPARK-24564][TEST] Add test suite for RecordBinaryComparator
jiangxb1987 Jun 28, 2018
524827f
[SPARK-14712][ML] LogisticRegressionModel.toString should summarize m…
jiayue-zhang Jun 28, 2018
a95a4af
[SPARK-23120][PYSPARK][ML] Add basic PMML export support to PySpark
holdenk Jun 28, 2018
e1d3f80
[SPARK-24408][SQL][DOC] Move abs function to math_funcs group
jaceklaskowski Jun 28, 2018
2224861
[SPARK-24439][ML][PYTHON] Add distanceMeasure to BisectingKMeans in P…
huaxingao Jun 28, 2018
f6e6899
[SPARK-24386][SS] coalesce(1) aggregates in continuous processing
jose-torres Jun 28, 2018
f71e8da
[SPARK-24566][CORE] Fix spark.storage.blockManagerSlaveTimeoutMs defa…
xueyumusic Jun 29, 2018
03545ce
[SPARK-24638][SQL] StringStartsWith support push down
wangyum Jun 30, 2018
797971e
[SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project
maryannxue Jun 30, 2018
d54d8b8
simplify rand in dsl/package.scala
gatorsmile Jun 30, 2018
f825847
[SPARK-24654][BUILD] Update, fix LICENSE and NOTICE, and specialize f…
srowen Jul 1, 2018
8f91c69
[SPARK-24665][PYSPARK] Use SQLConf in PySpark to manage all sql configs
xuanyuanking Jul 2, 2018
8008f9c
[SPARK-24715][BUILD] Override jline version as 2.14.3 in SBT
viirya Jul 2, 2018
f599cde
[SPARK-24507][DOCUMENTATION] Update streaming guide
rekhajoshm Jul 2, 2018
4281554
[SPARK-24683][K8S] Fix k8s no resource
mccheah Jul 2, 2018
85fe129
[SPARK-24428][K8S] Fix unused code
Jul 2, 2018
a7c8f0c
[SPARK-24385][SQL] Resolve self-join condition ambiguity for EqualNul…
mgaido91 Jul 3, 2018
5585c57
[SPARK-24420][BUILD] Upgrade ASM to 6.1 to support JDK9+
dbtsai Jul 3, 2018
776f299
[SPARK-24709][SQL] schema_of_json() - schema inference from an example
MaxGekk Jul 4, 2018
b42fda8
[SPARK-23698] Remove raw_input() from Python 2
Jul 4, 2018
5bf95f2
[BUILD] Close stale PRs
srowen Jul 4, 2018
7c08eb6
[SPARK-24732][SQL] Type coercion between MapTypes.
ueshin Jul 4, 2018
772060d
[SPARK-24704][WEBUI] Fix the order of stages in the DAG graph
stanzhai Jul 4, 2018
b2deef6
[SPARK-24727][SQL] Add a static config to control cache size for gene…
maropu Jul 4, 2018
021145f
[SPARK-24716][SQL] Refactor ParquetFilters
wangyum Jul 4, 2018
1a2655a
[SPARK-24635][SQL] Remove Blocks class from JavaCode class hierarchy
viirya Jul 4, 2018
ca8243f
[MINOR][ML] Minor correction in the powerIterationSuite
shahidki31 Jul 4, 2018
bf764a3
[SPARK-22384][SQL][FOLLOWUP] Refine partition pruning when attribute …
cloud-fan Jul 5, 2018
489a529
[SPARK-17213][SPARK-17213][FOLLOW-UP] Improve the test of
gatorsmile Jul 5, 2018
f997be0
[SPARK-24698][PYTHON] Fixed typo in pyspark.ml's Identifiable class.
mcteo Jul 5, 2018
4be9f0c
[SPARK-24673][SQL] scala sql function from_utc_timestamp second argum…
agilelab-tmnd1991 Jul 5, 2018
32cfd3e
[SPARK-24361][SQL] Polish code block manipulation API
viirya Jul 5, 2018
e58dadb
[SPARK-23820][CORE] Enable use of long form of callsite in logs
michaelmior Jul 5, 2018
7bd6d54
[SPARK-24711][K8S] Fix tags for integration tests
Jul 5, 2018
ac78bcc
[SPARK-24743][EXAMPLES] Update the JavaDirectKafkaWordCount example t…
cluo512 Jul 5, 2018
33952cf
[SPARK-24675][SQL] Rename table: validate existence of new location
gengliangwang Jul 5, 2018
e71e93a
[SPARK-24694][K8S] Pass all app args to integration tests
Jul 5, 2018
01fcba2
[SPARK-24737][SQL] Type coercion between StructTypes.
ueshin Jul 6, 2018
bf67f70
[SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
wangyum Jul 6, 2018
141953f
[SPARK-24535][SPARKR] fix tests on java check error
felixcheung Jul 6, 2018
a381bce
[SPARK-24673][SQL][PYTHON][FOLLOWUP] Support Column arguments in time…
maropu Jul 6, 2018
4de0425
[SPARK-24569][SQL] Aggregator with output type Option should produce …
viirya Jul 7, 2018
fc43690
[SPARK-24749][SQL] Use sameType to compare Array's element type in Ar…
viirya Jul 7, 2018
74f6a92
[SPARK-24739][PYTHON] Make PySpark compatible with Python 3.7
HyukjinKwon Jul 7, 2018
044b33b
[SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy …
HyukjinKwon Jul 7, 2018
79c6689
[SPARK-24757][SQL] Improving the error message for broadcast timeouts
MaxGekk Jul 7, 2018
e2c7e09
[SPARK-24646][CORE] Minor change to spark.yarn.dist.forceDownloadSche…
jerryshao Jul 9, 2018
034913b
[SPARK-23936][SQL] Implement map_concat
bersprockets Jul 9, 2018
1bd3d61
[SPARK-24268][SQL] Use datatype.simpleString in error messages
mgaido91 Jul 9, 2018
aec966b
Revert "[SPARK-24268][SQL] Use datatype.simpleString in error messages"
gatorsmile Jul 9, 2018
eb6e988
[SPARK-24759][SQL] No reordering keys for broadcast hash join
gatorsmile Jul 9, 2018
4984f1a
[MINOR] Add Sphinx into dev/requirements.txt
HyukjinKwon Jul 10, 2018
a289009
[SPARK-24706][SQL] ByteType and ShortType support pushdown to parquet
wangyum Jul 10, 2018
6fe3286
[SPARK-24678][SPARK-STREAMING] Give priority in use of 'PROCESS_LOCAL…
Jul 10, 2018
e0559f2
[SPARK-21743][SQL][FOLLOWUP] free aggregate map when task ends
cloud-fan Jul 10, 2018
32cb508
[SPARK-24662][SQL][SS] Support limit in structured streaming
mukulmurthy Jul 10, 2018
6078b89
[SPARK-24730][SS] Add policy to choose max as global watermark when s…
tdas Jul 11, 2018
1f94bf4
[SPARK-24530][PYTHON] Add a control to force Python version in Sphinx…
HyukjinKwon Jul 11, 2018
74a8d63
[SPARK-24165][SQL] Fixing conditional expressions to handle nullabili…
mn-mikke Jul 11, 2018
5ff1b9b
[SPARK-23529][K8S] Support mounting volumes
Jul 11, 2018
006e798
[SPARK-23461][R] vignettes should include model predictions for some …
huaxingao Jul 11, 2018
592cc84
[SPARK-24562][TESTS] Support different configs for same test in SQLQu…
mgaido91 Jul 11, 2018
ebf4bfb
[SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas
mgaido91 Jul 11, 2018
290c30a
[SPARK-24470][CORE] RestSubmissionClient to be robust against 404 & n…
rekhajoshm Jul 11, 2018
59c3c23
[SPARK-23254][ML] Add user guide entry and example for DataFrame mult…
WeichenXu123 Jul 11, 2018
ff7f6ef
[SPARK-24697][SS] Fix the reported start offsets in streaming query p…
tdas Jul 11, 2018
e008ad1
[SPARK-24782][SQL] Simplify conf retrieval in SQL expressions
mgaido91 Jul 12, 2018
3ab48f9
[SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig
MaxGekk Jul 12, 2018
5ad4735
[SPARK-24529][BUILD][TEST-MAVEN] Add spotbugs into maven build process
kiszk Jul 12, 2018
301bff7
[SPARK-23914][SQL] Add array_union function
kiszk Jul 12, 2018
e6c6f90
[SPARK-24691][SQL] Dispatch the type support check in FileFormat impl…
gengliangwang Jul 12, 2018
9fa4a1e
[SPARK-20168][STREAMING KINESIS] Setting the timestamp directly would…
yashs360 Jul 12, 2018
1055c94
[SPARK-24610] fix reading small files via wholeTextFiles
dhruve Jul 12, 2018
395860a
[SPARK-24768][SQL] Have a built-in AVRO data source implementation
gengliangwang Jul 12, 2018
07704c9
[SPARK-23007][SQL][TEST] Add read schema suite for file-based data so…
dongjoon-hyun Jul 12, 2018
1138489
[SPARK-24208][SQL][FOLLOWUP] Move test cases to proper locations
mgaido91 Jul 12, 2018
7572505
[SPARK-24790][SQL] Allow complex aggregate expressions in Pivot
maryannxue Jul 12, 2018
e0f4f20
[SPARK-24537][R] Add array_remove / array_zip / map_from_arrays / arr…
huaxingao Jul 13, 2018
0ce11d0
[SPARK-23486] cache the function name from the external catalog for l…
kevinyu98 Jul 13, 2018
0f24c6f
[SPARK-24713] AppMatser of spark streaming kafka OOM if there are hund…
Jul 13, 2018
dfd7ac9
[SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort migh…
viirya Jul 13, 2018
c1b62e4
[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace depre…
gengliangwang Jul 13, 2018
3bcb1b4
Revert "[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and repla…
gatorsmile Jul 13, 2018
3b6005b
[SPARK-23528][ML] Add numIter to ClusteringSummary
mgaido91 Jul 13, 2018
a75571b
[SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader
wangyum Jul 13, 2018
f1a99ad
[SPARK-23984][K8S][TEST] Added Integration Tests for PySpark on Kuber…
ifilonenko Jul 14, 2018
e1de341
[SPARK-17091][SQL] Add rule to convert IN predicate to equivalent Par…
wangyum Jul 14, 2018
8aceb96
[SPARK-24754][ML] Minhash integer overflow
srowen Jul 14, 2018
43e4e85
[SPARK-24718][SQL] Timestamp support pushdown to parquet data source
wangyum Jul 15, 2018
3e7dc82
[SPARK-24776][SQL] Avro unit test: deduplicate code and replace depre…
gengliangwang Jul 15, 2018
6999321
[SPARK-24807][CORE] Adding files/jars twice: output a warning and add…
MaxGekk Jul 15, 2018
9603087
[SPARK-24800][SQL] Refactor Avro Serializer and Deserializer
gengliangwang Jul 15, 2018
5d62a98
Doc fix: The Imputer is an Estimator
zoltanctoth Jul 15, 2018
bbc2ffc
[SPARK-24813][TESTS][HIVE][HOTFIX] HiveExternalCatalogVersionsSuite s…
srowen Jul 16, 2018
bcf7121
[TRIVIAL][ML] GMM unpersist RDD after training
Jul 16, 2018
d463533
[SPARK-24676][SQL] Project required data from CSV parsed data when co…
maropu Jul 16, 2018
9f92945
[SPARK-24810][SQL] Fix paths to test files in AvroSuite
MaxGekk Jul 16, 2018
2603ae3
[SPARK-24558][CORE] wrong Idle Timeout value is used in case of the c…
sandeep-katta Jul 16, 2018
9549a28
[SPARK-24549][SQL] Support Decimal type push down to the parquet data…
wangyum Jul 16, 2018
cf97045
[SPARK-18230][MLLIB] Throw a better exception, if the user or product…
shahidki31 Jul 16, 2018
b045315
[SPARK-24734][SQL] Fix type coercions and nullabilities of nested dat…
ueshin Jul 16, 2018
b0c95a1
[SPARK-23901][SQL] Removing masking functions
mn-mikke Jul 16, 2018
ba437fc
[SPARK-24805][SQL] Do not ignore avro files without extensions by def…
MaxGekk Jul 16, 2018
0f0d186
[SPARK-24402][SQL] Optimize `In` expression when only one element in …
dbtsai Jul 16, 2018
d57a267
[SPARK-23259][SQL] Clean up legacy code around hive external catalog …
Jul 17, 2018
f876d3f
[SPARK-20220][DOCS] Documentation Add thrift scheduling pool config t…
mrchristine Jul 17, 2018
0ca16f6
Revert "[SPARK-24402][SQL] Optimize `In` expression when only one ele…
HyukjinKwon Jul 17, 2018
4cf1bec
[SPARK-24305][SQL][FOLLOWUP] Avoid serialization of private fields in…
mn-mikke Jul 17, 2018
5215344
[SPARK-24813][BUILD][FOLLOW-UP][HOTFIX] HiveExternalCatalogVersionsSu…
srowen Jul 17, 2018
7688ce8
[SPARK-21590][SS] Window start time should support negative values
KevinZwx Jul 17, 2018
912634b
[SPARK-24747][ML] Make Instrumentation class more flexible
MrBago Jul 17, 2018
2a4dd6f
[SPARK-24681][SQL] Verify nested column names in Hive metastore
maropu Jul 17, 2018
681845f
[SPARK-24402][SQL] Optimize `In` expression when only one element in …
dbtsai Jul 18, 2018
fc2e189
[SPARK-24529][BUILD][TEST-MAVEN][FOLLOW-UP] Set spotbugs-maven-plugin…
wangyum Jul 18, 2018
3b59d32
[SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
dongjoon-hyun Jul 18, 2018
34cb3b5
[SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-java and Scala 2…
ueshin Jul 18, 2018
2694dd2
[MINOR][CORE] Add test cases for RDD.cartesian
NiharS Jul 18, 2018
002300d
[SPARK-24804] There are duplicate words in the test title in the Data…
httfighter Jul 18, 2018
ebe9e28
[SPARK-24628][DOC] Typos of the example code in docs/mllib-data-types.md
huangweizhe123 Jul 18, 2018
fc0c8c9
[SPARK-24825][K8S][TEST] Kubernetes integration tests build the whole…
mccheah Jul 18, 2018
c8bee93
[SPARK-24677][CORE] Avoid NoSuchElementException from MedianHeap
cxzl25 Jul 18, 2018
1272b20
[SPARK-22151] PYTHONPATH not picked up from the spark.yarn.appMaste…
Jul 18, 2018
cd203e0
[SPARK-24163][SPARK-24164][SQL] Support column list as the pivot colu…
maryannxue Jul 18, 2018
d404e54
[SPARK-24129][K8S] Add option to pass --build-arg's to docker-image-t…
Jul 18, 2018
753f115
[SPARK-21261][DOCS][SQL] SQL Regex document fix
srowen Jul 18, 2018
cd5d93c
[SPARK-24854][SQL] Gathering all Avro options into the AvroOptions class
MaxGekk Jul 19, 2018
1a4fda8
[INFRA] Close stale PR
HyukjinKwon Jul 19, 2018
[SPARK-23010][K8S] Initial checkin of k8s integration tests.
These tests were developed in the https://github.com/apache-spark-on-k8s/spark-integration repo
by several contributors. This is a copy of the current state into the main apache spark repo.
The only changes from the current spark-integration repo state are:
* Move the files from the repo root into resource-managers/kubernetes/integration-tests
* Add a reference to these tests in the root README.md
* Fix a path reference in dev/dev-run-integration-tests.sh
* Add a TODO in include/util.sh

## What changes were proposed in this pull request?

Incorporation of Kubernetes integration tests.

## How was this patch tested?

This code has its own unit tests, but the main purpose is to provide the integration tests.
I tested this on my laptop by running dev/dev-run-integration-tests.sh --spark-tgz ~/spark-2.4.0-SNAPSHOT-bin--.tgz

The spark-integration tests have already been running for months in AMPLab; here is an example:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-scheduled-spark-integration-master/

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Sean Suchter <[email protected]>
Author: Sean Suchter <[email protected]>

Closes #20697 from ssuchter/ssuchter-k8s-integration-tests.
ssuchter authored and mccheah committed Jun 8, 2018
commit f433ef786770e48e3594ad158ce9908f98ef0d9a
2 changes: 2 additions & 0 deletions README.md
@@ -81,6 +81,8 @@ can be run using:
Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).

There is also a Kubernetes integration test; see resource-managers/kubernetes/integration-tests/README.md

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
2 changes: 1 addition & 1 deletion dev/tox.ini
@@ -16,4 +16,4 @@
[pycodestyle]
ignore=E402,E731,E241,W503,E226,E722,E741,E305
max-line-length=100
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/*
1 change: 1 addition & 0 deletions pom.xml
@@ -2705,6 +2705,7 @@
<id>kubernetes</id>
<modules>
<module>resource-managers/kubernetes/core</module>
<module>resource-managers/kubernetes/integration-tests</module>
</modules>
</profile>

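With the integration-tests module added to the `kubernetes` Maven profile above, it is included in the build whenever that profile is activated; a sketch of a full build with the profile enabled (assuming a standard Maven setup):

    build/mvn -Pkubernetes -DskipTests package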
52 changes: 52 additions & 0 deletions resource-managers/kubernetes/integration-tests/README.md
@@ -0,0 +1,52 @@
---
layout: global
title: Spark on Kubernetes Integration Tests
---

# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and
is subject to change. Currently, the integration tests only run with Java 8.

The simplest way to run the integration tests is to install and run Minikube, then run the following:

dev/dev-run-integration-tests.sh

The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
run with a minimum of 3 CPUs and 4G of memory:

minikube start --cpus 3 --memory 4096

You can download Minikube [here](https://github.com/kubernetes/minikube/releases).

# Integration test customization

The integration test runtime is configured by passing arguments to the test script. The most useful options are outlined below.

## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated
and written to the file `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker image tag
that you have already built by other means, pass the tag to the test script:

dev/dev-run-integration-tests.sh --image-tag <tag>

or, to reuse the images previously built by the test framework:

dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)
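Images produced outside the test framework can be reused the same way; for instance, a sketch using Spark's `bin/docker-image-tool.sh` from a Spark distribution (the repo and tag below are placeholders):

    bin/docker-image-tool.sh -r docker.io/kubespark -t dev build
    dev/dev-run-integration-tests.sh --image-repo docker.io/kubespark --image-tag dev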

## Spark Distribution Under Test

The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to specify the tarball:

* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.

TODO: Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current tree.
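For reference, such a tarball can be produced from a source checkout with `dev/make-distribution.sh`; a sketch (the profiles and the resulting file name depend on your build):

    dev/make-distribution.sh --tgz -Pkubernetes
    dev/dev-run-integration-tests.sh --spark-tgz <path-to-generated-tgz>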

## Customizing the Namespace and Service Account

* `--namespace <namespace>` - set `<namespace>` to the namespace in which the tests should be run.
* `--service-account <service account name>` - set `<service account name>` to the name of the Kubernetes service account to
use in the namespace specified by `--namespace`. The service account is expected to have permissions to get, list, watch,
and create pods. For clusters with RBAC turned on, it's important that the right permissions are granted to the service account
in the namespace through an appropriate role and role binding. A reference RBAC configuration is provided in `dev/spark-rbac.yaml`.
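As an illustration, using the objects defined in `dev/spark-rbac.yaml`, the namespace and service account can be created and then passed to the script; a sketch assuming `kubectl` is pointed at the target cluster:

    kubectl apply -f dev/spark-rbac.yaml
    dev/dev-run-integration-tests.sh --namespace spark --service-account spark-sa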
93 changes: 93 additions & 0 deletions resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
@@ -0,0 +1,93 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
TEST_ROOT_DIR=$(git rev-parse --show-toplevel)/resource-managers/kubernetes/integration-tests

cd "${TEST_ROOT_DIR}"

DEPLOY_MODE="minikube"
IMAGE_REPO="docker.io/kubespark"
SPARK_TGZ="N/A"
IMAGE_TAG="N/A"
SPARK_MASTER=
NAMESPACE=
SERVICE_ACCOUNT=

# Parse arguments
while (( "$#" )); do
  case $1 in
    --image-repo)
      IMAGE_REPO="$2"
      shift
      ;;
    --image-tag)
      IMAGE_TAG="$2"
      shift
      ;;
    --deploy-mode)
      DEPLOY_MODE="$2"
      shift
      ;;
    --spark-tgz)
      SPARK_TGZ="$2"
      shift
      ;;
    --spark-master)
      SPARK_MASTER="$2"
      shift
      ;;
    --namespace)
      NAMESPACE="$2"
      shift
      ;;
    --service-account)
      SERVICE_ACCOUNT="$2"
      shift
      ;;
    *)
      break
      ;;
  esac
  shift
done

cd "$TEST_ROOT_DIR"

properties=(
  -Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
  -Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
  -Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
  -Dspark.kubernetes.test.deployMode=$DEPLOY_MODE
)

# Quote the variables so each -n test is false when the corresponding option was not given.
if [ -n "$NAMESPACE" ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.namespace=$NAMESPACE )
fi

if [ -n "$SERVICE_ACCOUNT" ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.serviceAccountName=$SERVICE_ACCOUNT )
fi

if [ -n "$SPARK_MASTER" ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.master=$SPARK_MASTER )
fi

../../../build/mvn integration-test ${properties[@]}
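For example, a hypothetical invocation like the one below (the tarball path is a placeholder) is translated by the script into the corresponding `-Dspark.kubernetes.test.*` properties on the Maven command line:

    dev/dev-run-integration-tests.sh \
      --spark-tgz /tmp/spark-dist.tgz \
      --deploy-mode minikube \
      --namespace spark
    # roughly equivalent to:
    #   build/mvn integration-test -Dspark.kubernetes.test.sparkTgz=/tmp/spark-dist.tgz \
    #     -Dspark.kubernetes.test.deployMode=minikube -Dspark.kubernetes.test.namespace=spark ...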
52 changes: 52 additions & 0 deletions resource-managers/kubernetes/integration-tests/dev/spark-rbac.yaml
@@ -0,0 +1,52 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v1
kind: Namespace
metadata:
  name: spark
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: spark-role
rules:
- apiGroups:
  - ""
  resources:
  - "pods"
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: spark-role-binding
subjects:
- kind: ServiceAccount
  name: spark-sa
  namespace: spark
roleRef:
  kind: ClusterRole
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
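Once these resources are applied, the binding can be sanity-checked with `kubectl auth can-i` (a hypothetical check using the names above; adjust them if you edited the file):

    kubectl auth can-i create pods --as=system:serviceaccount:spark:spark-sa -n spark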
155 changes: 155 additions & 0 deletions resource-managers/kubernetes/integration-tests/pom.xml
@@ -0,0 +1,155 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
<relativePath>../../../pom.xml</relativePath>
</parent>

<artifactId>spark-kubernetes-integration-tests_2.11</artifactId>
<groupId>spark-kubernetes-integration-tests</groupId>
<properties>
<download-maven-plugin.version>1.3.0</download-maven-plugin.version>
<exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
<extraScalaTestArgs></extraScalaTestArgs>
<kubernetes-client.version>3.0.0</kubernetes-client.version>
<scala-maven-plugin.version>3.2.2</scala-maven-plugin.version>
<scalatest-maven-plugin.version>1.0</scalatest-maven-plugin.version>
<sbt.project.name>kubernetes-integration-tests</sbt.project.name>
<spark.kubernetes.test.unpackSparkDir>${project.build.directory}/spark-dist-unpacked</spark.kubernetes.test.unpackSparkDir>
<spark.kubernetes.test.imageTag>N/A</spark.kubernetes.test.imageTag>
<spark.kubernetes.test.imageTagFile>${project.build.directory}/imageTag.txt</spark.kubernetes.test.imageTagFile>
<spark.kubernetes.test.deployMode>minikube</spark.kubernetes.test.deployMode>
<spark.kubernetes.test.imageRepo>docker.io/kubespark</spark.kubernetes.test.imageRepo>
<test.exclude.tags></test.exclude.tags>
</properties>
<packaging>jar</packaging>
<name>Spark Project Kubernetes Integration Tests</name>

<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.fabric8</groupId>
<artifactId>kubernetes-client</artifactId>
<version>${kubernetes-client.version}</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>${exec-maven-plugin.version}</version>
<executions>
<execution>
<id>setup-integration-test-env</id>
<phase>pre-integration-test</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<executable>scripts/setup-integration-test-env.sh</executable>
<arguments>
<argument>--unpacked-spark-tgz</argument>
<argument>${spark.kubernetes.test.unpackSparkDir}</argument>

<argument>--image-repo</argument>
<argument>${spark.kubernetes.test.imageRepo}</argument>

<argument>--image-tag</argument>
<argument>${spark.kubernetes.test.imageTag}</argument>

<argument>--image-tag-output-file</argument>
<argument>${spark.kubernetes.test.imageTagFile}</argument>

<argument>--deploy-mode</argument>
<argument>${spark.kubernetes.test.deployMode}</argument>

<argument>--spark-tgz</argument>
<argument>${spark.kubernetes.test.sparkTgz}</argument>
</arguments>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<!-- Triggers scalatest plugin in the integration-test phase instead of
the test phase. -->
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<version>${scalatest-maven-plugin.version}</version>
<configuration>
<reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
<junitxml>.</junitxml>
<filereports>SparkTestSuite.txt</filereports>
<argLine>-ea -Xmx3g -XX:ReservedCodeCacheSize=512m ${extraScalaTestArgs}</argLine>
<stderr/>
<systemProperties>
<log4j.configuration>file:src/test/resources/log4j.properties</log4j.configuration>
<java.awt.headless>true</java.awt.headless>
<spark.kubernetes.test.imageTagFile>${spark.kubernetes.test.imageTagFile}</spark.kubernetes.test.imageTagFile>
<spark.kubernetes.test.unpackSparkDir>${spark.kubernetes.test.unpackSparkDir}</spark.kubernetes.test.unpackSparkDir>
<spark.kubernetes.test.imageRepo>${spark.kubernetes.test.imageRepo}</spark.kubernetes.test.imageRepo>
<spark.kubernetes.test.deployMode>${spark.kubernetes.test.deployMode}</spark.kubernetes.test.deployMode>
<spark.kubernetes.test.master>${spark.kubernetes.test.master}</spark.kubernetes.test.master>
<spark.kubernetes.test.namespace>${spark.kubernetes.test.namespace}</spark.kubernetes.test.namespace>
<spark.kubernetes.test.serviceAccountName>${spark.kubernetes.test.serviceAccountName}</spark.kubernetes.test.serviceAccountName>
</systemProperties>
<tagsToExclude>${test.exclude.tags}</tagsToExclude>
</configuration>
<executions>
<execution>
<id>test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<!-- The negative pattern below prevents integration tests such as
KubernetesSuite from running in the test phase. -->
<suffixes>(?&lt;!Suite)</suffixes>
</configuration>
</execution>
<execution>
<id>integration-test</id>
<phase>integration-test</phase>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>

</build>

</project>
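Because the scalatest plugin above is bound to the `integration-test` phase, the suite can also be launched through Maven directly rather than via the dev script; a sketch (the property values are placeholders, and the `kubernetes` profile must be active so the module is included in the reactor):

    build/mvn -Pkubernetes -pl resource-managers/kubernetes/integration-tests -am integration-test \
      -Dspark.kubernetes.test.sparkTgz=<path-to-tgz> \
      -Dspark.kubernetes.test.deployMode=minikube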