9 changes: 7 additions & 2 deletions .github/workflows/master.yml
@@ -16,10 +16,15 @@ jobs:
matrix:
java: [ '1.8', '11' ]
hadoop: [ 'hadoop-2.7', 'hadoop-3.2' ]
hive: [ 'hive-1.2', 'hive-2.3' ]
exclude:
- java: '11'
hadoop: 'hadoop-2.7'
name: Build Spark with JDK ${{ matrix.java }} and ${{ matrix.hadoop }}
- java: '11'
hive: 'hive-1.2'
- hadoop: 'hadoop-3.2'
hive: 'hive-1.2'
name: Build Spark - JDK${{ matrix.java }}/${{ matrix.hadoop }}/${{ matrix.hive }}

steps:
- uses: actions/checkout@master
@@ -44,7 +49,7 @@ jobs:
run: |
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
export MAVEN_CLI_OPTS="--no-transfer-progress"
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -P${{ matrix.hadoop }} -Phadoop-cloud -Djava.version=${{ matrix.java }} install
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -P${{ matrix.hive }} -Phive-thriftserver -P${{ matrix.hadoop }} -Phadoop-cloud -Djava.version=${{ matrix.java }} install
rm -rf ~/.m2/repository/org/apache/spark


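As a sketch (not part of the PR), the effect of the three new `exclude` rules can be checked by enumerating the matrix combinations the way GitHub Actions does — start from the full cross product and drop any combination that matches all keys of an exclude entry:

```python
from itertools import product

java = ["1.8", "11"]
hadoop = ["hadoop-2.7", "hadoop-3.2"]
hive = ["hive-1.2", "hive-2.3"]

# Exclude rules copied from the workflow matrix above.
excludes = [
    {"java": "11", "hadoop": "hadoop-2.7"},
    {"java": "11", "hive": "hive-1.2"},
    {"hadoop": "hadoop-3.2", "hive": "hive-1.2"},
]

jobs = []
for j, ha, hi in product(java, hadoop, hive):
    combo = {"java": j, "hadoop": ha, "hive": hi}
    # A combination is dropped if every key/value of some exclude entry matches.
    if any(all(combo[k] == v for k, v in ex.items()) for ex in excludes):
        continue
    jobs.append(combo)

# Of the 8 raw combinations, 4 jobs survive; JDK 11 only builds
# against hadoop-3.2/hive-2.3.
```

This confirms the excludes prune the 2×2×2 product down to the four supported build combinations.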
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -54,7 +54,7 @@ install:
build_script:
# '-Djna.nosys=true' is required to avoid kernel32.dll load failure.
# See SPARK-28759.
- cmd: mvn -DskipTests -Psparkr -Phive -Djna.nosys=true package
- cmd: mvn -DskipTests -Psparkr -Phive -Phive-1.2 -Djna.nosys=true package

environment:
NOT_CRAN: true
4 changes: 2 additions & 2 deletions dev/run-tests.py
@@ -283,8 +283,8 @@ def get_hadoop_profiles(hadoop_version):
"""

sbt_maven_hadoop_profiles = {
"hadoop2.7": ["-Phadoop-2.7"],
"hadoop3.2": ["-Phadoop-3.2"],
"hadoop2.7": ["-Phadoop-2.7", "-Phive-1.2"],
"hadoop3.2": ["-Phadoop-3.2", "-Phive-2.3"],
Member:
Hm, @dongjoon-hyun, can you create JIRAs to follow up? It seems we will make -Phive-2.3 the default profile, as discussed on the mailing list.

  1. Verify the *Hadoop 2.7 and Hive 2.3 combination in both JDK 8 and JDK 11 (WIP), as it will be the default
  2. Set up a Jenkins job for -Phadoop-2.7 -Phive-2.3.
  3. We will need to be able to configure the Hive version, and also the Hadoop version, in the PR builder.
  4. Change the default profile to Hive 2.3.
  5. [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles #26619 (comment)
  6. Release script updates.
  7. … (anything else?)

*Hadoop 2 will be the default for now; this is being discussed on the mailing list.

Many things are going on, so it looks very easy to lose track.

Member Author:
Sure!

Member Author (@dongjoon-hyun, Nov 21, 2019):
BTW, the thread is independent of JDK 11 and Hadoop 3. For the following item, only JDK 8 is the target of this PR's follow-up. For JDK 11, I believe @wangyum will handle it in his ongoing work.

> Verify the *Hadoop 2.7 and Hive 2.3 combination in both JDK 8 and JDK 11 (WIP), as it will be the default

For (4), Hive 2.3 is already the default for Hadoop 3, and this PR makes Hive 2.3 the default in the pom files. So I guess what remains of (4) is equivalent to (2) and (3).

Member Author:
The follow-up issues are created and mentioned in the PR description, @HyukjinKwon .

Member:
Ah, sure, thanks for the clarification. @wangyum, can you verify the new JDK 11 combination and take any follow-up action?

Member:
@HyukjinKwon I will do it later.

}

if hadoop_version in sbt_maven_hadoop_profiles:
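A minimal sketch of the updated lookup (a simplified, hypothetical stand-in for the real `dev/run-tests.py`, which prints an error and exits rather than raising): each Hadoop profile now carries its paired Hive profile, so callers get both flags from a single lookup.

```python
# Profile table as changed in this PR: the Hive profile rides along
# with the Hadoop profile instead of being chosen separately.
sbt_maven_hadoop_profiles = {
    "hadoop2.7": ["-Phadoop-2.7", "-Phive-1.2"],
    "hadoop3.2": ["-Phadoop-3.2", "-Phive-2.3"],
}

def get_hadoop_profiles(hadoop_version):
    if hadoop_version in sbt_maven_hadoop_profiles:
        return sbt_maven_hadoop_profiles[hadoop_version]
    raise ValueError("unknown Hadoop version: %s" % hadoop_version)
```

With this shape, downstream build invocations pick up both `-P` flags without any extra branching.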
11 changes: 8 additions & 3 deletions dev/test-dependencies.sh
@@ -67,15 +67,20 @@ $MVN -q versions:set -DnewVersion=$TEMP_VERSION -DgenerateBackupPoms=false > /de

# Generate manifests for each Hadoop profile:
for HADOOP_PROFILE in "${HADOOP_PROFILES[@]}"; do
if [[ $HADOOP_PROFILE == **hadoop-3** ]]; then
HIVE_PROFILE=hive-2.3
else
HIVE_PROFILE=hive-1.2
fi
echo "Performing Maven install for $HADOOP_PROFILE"
$MVN $HADOOP2_MODULE_PROFILES -P$HADOOP_PROFILE jar:jar jar:test-jar install:install clean -q
$MVN $HADOOP2_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE jar:jar jar:test-jar install:install clean -q

echo "Performing Maven validate for $HADOOP_PROFILE"
$MVN $HADOOP2_MODULE_PROFILES -P$HADOOP_PROFILE validate -q
$MVN $HADOOP2_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE validate -q

echo "Generating dependency manifest for $HADOOP_PROFILE"
mkdir -p dev/pr-deps
$MVN $HADOOP2_MODULE_PROFILES -P$HADOOP_PROFILE dependency:build-classpath -pl assembly -am \
$MVN $HADOOP2_MODULE_PROFILES -P$HADOOP_PROFILE -P$HIVE_PROFILE dependency:build-classpath -pl assembly -am \
| grep "Dependencies classpath:" -A 1 \
| tail -n 1 | tr ":" "\n" | rev | cut -d "/" -f 1 | rev | sort \
| grep -v spark > dev/pr-deps/spark-deps-$HADOOP_PROFILE
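The pairing rule the bash branch encodes can be sketched as a one-line function (a hypothetical helper, not part of the script; the bash pattern `**hadoop-3**` behaves like the substring glob `*hadoop-3*`):

```python
def hive_profile_for(hadoop_profile):
    # Mirrors the bash test above: any hadoop-3.x profile pairs with
    # hive-2.3, and everything else falls back to hive-1.2.
    return "hive-2.3" if "hadoop-3" in hadoop_profile else "hive-1.2"
```

So for each entry of `HADOOP_PROFILES`, the script now passes `-P$HADOOP_PROFILE -P$HIVE_PROFILE` to all three Maven invocations, keeping the dependency manifests consistent with the profile pairs used in CI.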
43 changes: 27 additions & 16 deletions pom.xml
@@ -128,19 +128,19 @@
<zookeeper.version>3.4.14</zookeeper.version>
<curator.version>2.7.1</curator.version>
<okapi.version>0.4.2</okapi.version>
<hive.group>org.spark-project.hive</hive.group>
<hive.classifier></hive.classifier>
<hive.group>org.apache.hive</hive.group>
<hive.classifier>core</hive.classifier>
<!-- Version used in Maven Hive dependency -->
<hive.version>1.2.1.spark2</hive.version>
<hive.version>2.3.6</hive.version>
<hive23.version>2.3.6</hive23.version>
<!-- Version used for internal directory structure -->
<hive.version.short>1.2.1</hive.version.short>
<hive.version.short>2.3.5</hive.version.short>
Contributor:
BTW, any reason why we needed the patch version in the directory names initially? I don't think we need different shims for Hive patch versions. If that's true, I'd suggest renaming the sql/hive-thriftserver/{v1.2.1,v2.3.5} folders to just v1.2 and v2.3. Of course, this can be done in a follow-up PR.

Member Author (@dongjoon-hyun, Nov 21, 2019):
Sure. We can use the short versions (v1.2 and v2.3). I'll do it in the next follow-up PR; it will mostly involve renaming files.

Member Author:
Since we are already on Apache Hive 2.3.6, I don't expect many changes in 2.3.7+.

<!-- note that this should be compatible with Kafka brokers version 0.10 and up -->
<kafka.version>2.3.1</kafka.version>
<derby.version>10.12.1.1</derby.version>
<parquet.version>1.10.1</parquet.version>
<orc.version>1.5.7</orc.version>
<orc.classifier>nohive</orc.classifier>
<orc.classifier></orc.classifier>
<hive.parquet.group>com.twitter</hive.parquet.group>
<hive.parquet.version>1.6.0</hive.parquet.version>
<jetty.version>9.4.18.v20190429</jetty.version>
@@ -181,7 +181,7 @@
<commons-lang3.version>3.8.1</commons-lang3.version>
<!-- org.apache.commons/commons-pool2/-->
<commons-pool2.version>2.6.2</commons-pool2.version>
<datanucleus-core.version>3.2.10</datanucleus-core.version>
<datanucleus-core.version>4.1.17</datanucleus-core.version>
<janino.version>3.0.15</janino.version>
<jersey.version>2.29</jersey.version>
<joda.version>2.10.5</joda.version>
@@ -228,7 +228,7 @@
-->
<hadoop.deps.scope>compile</hadoop.deps.scope>
<hive.deps.scope>compile</hive.deps.scope>
<hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
<hive.parquet.scope>provided</hive.parquet.scope>
<orc.deps.scope>compile</orc.deps.scope>
<parquet.deps.scope>compile</parquet.deps.scope>
<parquet.test.deps.scope>test</parquet.test.deps.scope>
@@ -2921,16 +2921,27 @@
<properties>
<hadoop.version>3.2.0</hadoop.version>
<curator.version>2.13.0</curator.version>
<hive.group>org.apache.hive</hive.group>
<hive.classifier>core</hive.classifier>
<hive.version>${hive23.version}</hive.version>
<hive.version.short>2.3.5</hive.version.short>
<!-- Do not need parquet-hadoop-bundle because we already have
parquet-common, parquet-column and parquet-hadoop -->
<hive.parquet.scope>provided</hive.parquet.scope>
<orc.classifier></orc.classifier>
<datanucleus-core.version>4.1.17</datanucleus-core.version>
</properties>
</profile>

<profile>
<id>hive-1.2</id>
<properties>
<hive.group>org.spark-project.hive</hive.group>
<hive.classifier></hive.classifier>
<!-- Version used in Maven Hive dependency -->
<hive.version>1.2.1.spark2</hive.version>
<!-- Version used for internal directory structure -->
<hive.version.short>1.2.1</hive.version.short>
<hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
<orc.classifier>nohive</orc.classifier>
<datanucleus-core.version>3.2.10</datanucleus-core.version>
</properties>
</profile>

<profile>
<id>hive-2.3</id>
<!-- Default hive profile. Uses global properties. -->
<dependencies>
<!-- Both Hive and ORC need hive-storage-api, but it is excluded by orc-mapreduce -->
<dependency>
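To see how the restructured profiles behave, here is a sketch (a simplified model of Maven property resolution, not Maven itself) of the effective properties: the global `<properties>` now hold the Hive 2.3 values, and activating `-Phive-1.2` overlays the legacy values. All values are taken from the diff above.

```python
# Global defaults after this PR (Hive 2.3 is the baseline).
defaults = {
    "hive.group": "org.apache.hive",
    "hive.classifier": "core",
    "hive.version": "2.3.6",
    "orc.classifier": "",
    "datanucleus-core.version": "4.1.17",
}

# Overrides contributed by the new hive-1.2 profile.
hive_1_2_profile = {
    "hive.group": "org.spark-project.hive",
    "hive.classifier": "",
    "hive.version": "1.2.1.spark2",
    "orc.classifier": "nohive",
    "datanucleus-core.version": "3.2.10",
}

def effective_properties(active_profiles):
    # Active profiles overlay the defaults; later profiles win,
    # loosely mirroring how Maven profile properties override globals.
    props = dict(defaults)
    for profile in active_profiles:
        props.update(profile)
    return props
```

The `hive-2.3` profile itself sets no properties — it exists only to attach the hive-storage-api dependency — which is why the default (no `-Phive-*` flag) and `-Phive-2.3` builds resolve the same Hive artifacts.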
2 changes: 1 addition & 1 deletion sql/hive/pom.xml
@@ -209,7 +209,7 @@
</build>
</profile>
<profile>
<id>hadoop-3.2</id>
<id>hive-2.3</id>
<dependencies>
<dependency>
<groupId>${hive.group}</groupId>