
Conversation

@srowen
Member

srowen commented Jul 16, 2017

…build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure

What changes were proposed in this pull request?

This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.

In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.

It also updates dependencies whose current versions do not yet support Scala 2.12 to the earliest minor release that does. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the latest maintenance releases within the earliest viable minor releases.

  • Scalatest 2.x -> 3.0.3
  • Chill 0.8.0 -> 0.8.4
  • Clapper 1.0.x -> 1.1.2
  • json4s 3.2.x -> 3.4.2
  • Jackson 2.6.x -> 2.7.9 (required by json4s)

This change does not fully enable a Scala 2.12 build:

  • It will also require dropping support for Kafka before 0.10. Easy enough, but not done here yet.
  • It will require recreating SparkILoop and Main for REPL 2.12, which is SPARK-14650. Possible to do here too.

What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.

How was this patch tested?

Existing tests and build. Manually tested with ./dev/change-scala-version.sh 2.12 to verify it compiles, modulo the exceptions above.

@srowen
Member Author

srowen commented Jul 16, 2017

CC @rxin since you asked; CC @JoshRosen
Just a WIP for now, of course, but I wanted to put forward a take on most of the remaining gap we can bridge right now towards 2.12.

@SparkQA

SparkQA commented Jul 16, 2017

Test build #79643 has finished for PR 18645 at commit ef1c221.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2017

Test build #79645 has finished for PR 18645 at commit 1204077.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

These cause a MiMa warning, because they're new methods in a trait that might be extended by users. I'll have to go back and remember whether this is an actual problem, because the trait is providing a default implementation here.

Member Author

I tried a small app that calls RDD.countAsync (which returns a FutureAction) and even implements a custom FutureAction: I compiled it against 2.2.0, then ran it against this build, and it worked. I believe this may be legitimately excluded from MiMa.

Member Author

Changes like this resolve an ambiguity where two overloads of a method exist: one whose signature takes a lambda, and another that takes a single-method trait/interface, either of which could fit. In cases like this I resolved it in favor of implementing a specific listener class.
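
As a rough illustration (simplified stand-ins, not the actual Spark definitions), the shape is something like this:

trait TaskCompletionListener {
  def onTaskCompletion(context: TaskContext): Unit
}

class TaskContext {
  def addTaskCompletionListener(listener: TaskCompletionListener): TaskContext = this
  def addTaskCompletionListener(f: TaskContext => Unit): TaskContext = this
}

object ListenerExample {
  // In 2.12 a bare lambda can satisfy either overload (the listener trait is a SAM
  // type), which can make a call like ctx.addTaskCompletionListener(_ => cleanup())
  // ambiguous in some contexts. Implementing the listener explicitly names the
  // overload and compiles identically on 2.11 and 2.12.
  def register(ctx: TaskContext, cleanup: () => Unit): Unit = {
    ctx.addTaskCompletionListener(new TaskCompletionListener {
      override def onTaskCompletion(context: TaskContext): Unit = cleanup()
    })
  }
}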

Contributor

isn't this a source breaking change?

Contributor

I think that you can avoid source incompatibilities for Scala users by removing the overloads which accept Scala functions and then adding in a package-level implicit conversion to convert from Scala functions back into our own custom trait / interface.

The trickiness here is that you need to preserve binary compatibility on Scala 2.10/2.11, so the removal of the overload needs to be done conditionally so that it only occurs when building with Scala 2.12. Rather than having a separate source tree for 2.12, I'd propose defining the removed overload in a mixin trait which comes from a separate source file and then configure the build to use different versions of that file for 2.10/2.11 and for 2.12.

For details on this proposal, see https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit, a document that I wrote in March 2016 which explores these source incompatibility difficulties.

Applying that proposal here, the idea would be to remove the method

def addTaskCompletionListener(f: (TaskContext) => Unit)

and add a package-level implicit conversion from TaskContext => Unit to TaskCompletionListener, but to do this only in the 2.12 source tree / shim. This approach has some caveats and could potentially impact Java users who are doing weird things (violating the goal that Java Spark code is source and binary compatible with all Scala versions). See the linked doc for a full discussion of this problem.
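
A minimal sketch of that shim idea (simplified stand-ins; in practice the conversion would live in a package object so it's in implicit scope for callers):

import scala.language.implicitConversions

class TaskContext
trait TaskCompletionListener {
  def onTaskCompletion(context: TaskContext): Unit
}

object Scala212Shim {
  // With the (TaskContext => Unit) overload removed in the 2.12 source tree, this
  // conversion lets existing Scala call sites that pass a function keep compiling.
  implicit def functionToCompletionListener(
      f: TaskContext => Unit): TaskCompletionListener =
    new TaskCompletionListener {
      override def onTaskCompletion(context: TaskContext): Unit = f(context)
    }
}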

Member Author

Some stuff like this became necessary due to other dependency changes and changes in their transitive dependencies.

Member Author

Actually this could have been removed along with Scala 2.10 support, and it will be, separately, shortly.

Member Author

In cases like this I resolved the ambiguity by casting the argument instead of implementing an interface, because here the interface overload exists just for Java support.
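
As an illustration with hypothetical names (the real Java-support overloads take interfaces like ForeachPartitionFunction), ascribing the argument to the Scala function type is enough to pick the Scala overload:

// Hypothetical stand-ins: a Scala Function1 overload next to a Java-support SAM overload.
trait VoidFunction[T] { def call(t: T): Unit }

class Pipeline[T](items: Seq[T]) {
  def forEach(f: T => Unit): Unit = items.foreach(f)
  def forEach(f: VoidFunction[T]): Unit = items.foreach(f.call)
}

object CastExample {
  // In a generic context 2.12 may consider both overloads applicable for a bare
  // lambda; ascribing the argument's function type selects the Scala overload and
  // still compiles unchanged on 2.11.
  def printAll[T](p: Pipeline[T]): Unit =
    p.forEach((t => println(t)): T => Unit)
}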

sql/hive/pom.xml Outdated
Member Author

Not strictly related, just found this redundancy while editing. The exact same stanza appears below.

@SparkQA

SparkQA commented Jul 22, 2017

Test build #79851 has finished for PR 18645 at commit fec76be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

isn't this a source breaking change?

Member Author

srowen, Jul 22, 2017

This is just clarifying to the compiler which of two overloads to invoke, because both actually match and that's an error in 2.12.

It's going to be a problem for end-user apps in exactly the same way: they would need similar changes to work with Spark in Scala 2.12 (and, probably, with other code that has similar characteristics in 2.12).

This is really https://issues.apache.org/jira/browse/SPARK-14643
The thing is, different Scala versions have always been 'source incompatible' too, so that much may not be a big deal.

It would be a big problem if it caused Scala 2.11 apps to change, but this one won't.

Member Author

Actually, no, not the same issue as https://issues.apache.org/jira/browse/SPARK-14643, but similar. Let me ask the question there about how much source compatibility matters across Scala versions, for Java.

I'm not in any event proposing to enable a 2.12 build anytime soon, just seeing if we can get closer without any implications for 2.11.

Contributor

I think this is primarily going to be an issue for end users who want to use an existing source tree to cross-compile for Scala 2.10, 2.11, and 2.12. Thus the pain of the source incompatibility would mostly be felt by library/package maintainers but it can be worked around as long as there's at least some common subset which is source compatible across all of those versions.

A similar ambiguity issue exists for people writing Spark code in Java 8 who use lambda syntax. The number of such users who must also upgrade to Scala 2.12 is probably fairly small, and the number of those users who need to cross-build their Java 8 lambda-syntax Spark code against 2.11 and 2.12 is probably even smaller, so maybe it isn't a huge deal to accept the ambiguity and require those users to do a bit of work when upgrading.

@rxin
Contributor

rxin commented Jul 22, 2017

@srowen You just showed that the Scala 2.12 changes are source breaking, didn't you?

@srowen
Member Author

srowen commented Jul 22, 2017

@rxin (See above for what I think you are referring to.)

I do now see tests failing that diff JSON, but that's probably because the Jackson version had to change. (This is a WIP.) And I need to figure out whether that's effectively a breaking change or not.

@rxin
Contributor

rxin commented Jul 23, 2017

It is still a source-breaking change, and this is why I was saying it would be a lot of work to upgrade to Scala 2.12 without breaking existing source code. For 2.12 we should get rid of the functions that are problematic, using a build profile.

@srowen
Member Author

srowen commented Jul 23, 2017

@rxin it's definitely breaking between Scala 2.11 and Scala 2.12, but not for the Scala 2.11 build. Ideally, we find it's possible to enable a 2.12 build, such that the changes are binary and source compatible within the Scala 2.11 builds. Otherwise, it'd be impossible to merge this before 3.0.0. But if it doesn't affect 2.11 users there's a much better argument for adding the build in a minor release, as Scala versions have never been mutually compatible.

@srowen
Member Author

srowen commented Jul 23, 2017

The JSON error arises because JSON messages now contain an explicit entry for null properties, like "message" : null. This looks like a small behavior change introduced in Jackson 2.7; see https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7 Although it's semantically the same, in theory, we should make the behavior identical.

It looks like several places in the code explicitly configure Jackson to include only NON_NULL properties. I believe that setting now needs to be applied explicitly in more places. Still investigating ...
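
For the record, that kind of configuration looks roughly like this (a sketch only, not the exact Spark code):

import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

object JsonMapper {
  // Omit null properties on write, matching the pre-2.7 output the tests expect.
  val mapper: ObjectMapper = new ObjectMapper()
    .registerModule(DefaultScalaModule)
    .setSerializationInclusion(JsonInclude.Include.NON_NULL)
}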

@rxin
Contributor

rxin commented Jul 23, 2017

@srowen I don't agree that we should just break source compatibility here. We have already spent a lot of time doing this in the past and figuring out how to preserve it.

@srowen
Member Author

srowen commented Jul 23, 2017

Are we talking about the same thing? This does not break source compatibility for Scala 2.11. For Scala 2.12, yes, but moving across Scala versions has never been binary-compatible, or necessarily source-compatible. What's different here?

@rxin
Contributor

rxin commented Jul 23, 2017

When users upgrade from 2.11 to 2.12, their app would be broken, wouldn't it?

@srowen
Member Author

srowen commented Jul 23, 2017

That's not an upgrade. spark_2.11 2.2.0 to spark_2.11 2.3.0 is an upgrade, and that can't break, and doesn't with this change.

Or: Scala itself has never been source- or binary-compatible across Scala minor releases, and neither has Spark across Scala versions. Compiling against spark_2.10 and running on spark_2.11 doesn't work, and I don't think cross-compiling even works in all cases. I don't follow why that's an issue here.

In fact, we already know it can't be source compatible, for Java callers: https://issues.apache.org/jira/browse/SPARK-14643 That may well be the argument that this can't happen until 3.0. Check in with @JoshRosen for more context.

However, I am not suggesting creating any Scala 2.12 release at all anytime soon here, so I don't think that's worth debating. The question of whether it's source compatible is moot because this creates no new type of release to be compatible or not. I don't even disagree with someone who argues that Scala 2.12 support can't happen before 3.0.

It does make a lot of the changes that will have to be made for 2.12 though, without disrupting the current 2.11 build. The question is really whether that statement is true -- the Jackson thing is the problem right now.

Why would we bother implementing most of the groundwork for 2.12 without making an official 2.12 build? Generally, to break down the problem and address it incrementally rather than in one huge change. But also to make it possible for a brave soul out there, like maybe the folks at Twitter, to run their own 2.12 build if they wanted to self-support.

More concretely, if this went in to a 2.3.0 release tomorrow, what goes wrong?

@SparkQA

SparkQA commented Jul 23, 2017

Test build #79892 has finished for PR 18645 at commit 59b46ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

JoshRosen commented Jul 23, 2017

Looking at the source compatibility changes made here, I was a little confused about why we didn't need to make many more changes. In principle, it seemed like the ambiguous overload issue affecting that foreachPartition call should have also impacted calls like .filter(_ > 0), so I was originally expecting that we'd have to update hundreds of call sites.

It turns out that the final Scala 2.12 release has a nice feature which resolves this ambiguity in many cases. In the release notes at https://www.scala-lang.org/news/2.12.0/#lambda-syntax-for-sam-types, there's an example where even though there's a potentially ambiguous overload, "overloading resolution selects the one with the Function1 argument type". This is explained in more detail at https://www.scala-lang.org/news/2.12.0/#sam-conversion-in-overloading-resolution.

Quoting:

In Scala 2.12, both methods are applicable, therefore overloading resolution needs to pick the most specific alternative. The specification for type compatibility has been updated to consider SAM conversion, so that the first alternative is more specific.

This explains why we didn't have to update all call sites. However, why did the .foreachPartition example break? I played around with this in Scastie (the Scala pastebin) and I think that I've spotted the problem: all of the cases where we had ambiguity seem to be cases where we're operating on Datasets in a generic way and have a type parameter representing the Dataset's type.

So, this means that

def f(ds: Dataset[Int]): Dataset[Int] = ds.filter(_ != null)

is unambiguous but

def f[T](ds: Dataset[T]): Dataset[T] = ds.filter(_ != null)

fails with a compiler error in Scala 2.12. You can try this out at https://scastie.scala-lang.org/JoshRosen/eBcxUGdORsiOQNULMUnAsw.

I took a look at the sections of the language spec linked in the quote above but it's not immediately clear to me whether this is a compiler bug or expected behavior (I'll have to take some more time to try to understand the spec). My hunch is that it has to do with type inference, e.g. the compiler can't figure out which method to call until it knows the function's type and it can't resolve the function's type until it knows which method it's applying to. That makes intuitive sense to me, but I'm not sure if that's the actual reason.
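
For reference, here is a self-contained sketch of that shape without Spark (hypothetical names mirroring the Function1 vs. Java-function overload pair):

trait FilterFunction[T] { def call(value: T): Boolean }

class Box[T](values: Seq[T]) {
  def filter(f: T => Boolean): Box[T] = new Box(values.filter(f))
  def filter(f: FilterFunction[T]): Box[T] = new Box(values.filter(f.call))
}

object Repro {
  // Compiles on 2.12: the element type is concrete, and overload resolution
  // prefers the Function1 alternative.
  def concrete(b: Box[Int]): Box[Int] = b.filter(_ != 0)

  // Mirrors the generic Dataset case above: uncommenting this should reproduce
  // the ambiguous-overload error on 2.12, while it compiles fine on 2.11.
  // def generic[T](b: Box[T]): Box[T] = b.filter(_ != null)
}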

@srowen
Member Author

srowen commented Jul 26, 2017

I believe the last errors were essentially the same issue discussed in json4s here:
json4s/json4s#227

I suspect that it's down to a change in the effective setting of Jackson when reading or writing JSON: https://fasterxml.github.io/jackson-databind/javadoc/2.7/com/fasterxml/jackson/databind/DeserializationFeature.html#ACCEPT_SINGLE_VALUE_AS_ARRAY or https://fasterxml.github.io/jackson-databind/javadoc/2.7/com/fasterxml/jackson/databind/SerializationFeature.html#WRITE_SINGLE_ELEM_ARRAYS_UNWRAPPED

It doesn't look like the defaults for these settings changed from Jackson 2.6 to 2.7, and I don't see them overridden in json4s, though json4s 3.3 did see some changes that cause it to ignore some Jackson defaults.

In any event I thought it safer to handle both scenarios in the tests where they matter. Let's see if that works, first.
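
Concretely, those two Jackson settings would be toggled roughly like this (a sketch only, not code from this PR):

import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper, SerializationFeature}

object SingleValueArrays {
  // Accept a bare value where an array is expected, and write single-element
  // arrays unwrapped; both features are disabled by default in Jackson.
  val mapper: ObjectMapper = new ObjectMapper()
    .configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true)
    .configure(SerializationFeature.WRITE_SINGLE_ELEM_ARRAYS_UNWRAPPED, true)
}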

@SparkQA

SparkQA commented Jul 26, 2017

Test build #79968 has finished for PR 18645 at commit 82b51dc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuwei-k

xuwei-k commented Jul 29, 2017

@srowen
Member Author

srowen commented Jul 29, 2017

Oh nice, that will help if it's not a required update along with everything else here, since it requires a Jackson update. The Jackson update may be important eventually, but it would be nice if it can be separate.

@SparkQA

SparkQA commented Jul 30, 2017

Test build #80057 has finished for PR 18645 at commit bc419a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Jul 31, 2017

The json4s change above made this change notably simpler. The current problem is, as before, the next test error:

sbt.ForkMain$ForkError: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector
	at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211)
	at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
	at org.apache.spark.sql.UserDefinedTypeSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(UserDefinedTypeSuite.scala:146)
	at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
	at org.apache.spark.sql.UserDefinedTypeSuite.runTest(UserDefinedTypeSuite.scala:146)

See comments on that test:

3ae25f2#commitcomment-23320234
3ae25f2#commitcomment-23341672

I'm currently looking into Michael's suggestion in the second comment.

@SparkQA

SparkQA commented Jul 31, 2017

Test build #80074 has finished for PR 18645 at commit dc2cdb4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 23, 2017

Test build #81037 has finished for PR 18645 at commit 0c81fdd.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2017

Test build #81143 has finished for PR 18645 at commit 273dbdb.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2017

Test build #3905 has finished for PR 18645 at commit 273dbdb.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 28, 2017

Test build #81184 has finished for PR 18645 at commit 19ee35e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 30, 2017

Test build #81255 has finished for PR 18645 at commit f6b0a4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Aug 30, 2017

The status here is that this change does work fine with Scala 2.11, and does implement changes that will be needed for 2.12. Right now I've found that, suddenly, it doesn't compile for Scala 2.12:

[error] /Users/srowen/Documents/Cloudera/spark/core/src/test/scala/org/apache/spark/FileSuite.scala:100: could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[org.apache.hadoop.io.IntWritable]
[error] Error occurred in an application involving default arguments.
[error]     val output = sc.sequenceFile[IntWritable, Text](outputDir)
[error]                                                    ^
[error] /Users/srowen/Documents/Cloudera/spark/core/src/test/scala/org/apache/spark/FileSuite.scala:131: could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[org.apache.hadoop.io.IntWritable]
[error] Error occurred in an application involving default arguments.
[error]     val output = sc.sequenceFile[IntWritable, Text](outputDir)

I didn't see this with an earlier 2.12 release, I believe, so it could be some change or problem in implicit resolution.

Of course, the 2.12 build was never going to work fully after this change anyway (Kafka 0.8 needs to go, and the REPL needs updating), so it may be worth going ahead regardless.
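
My guess (just an assumption at this point) is that this is the pattern where a nullary implicit def used to satisfy an implicit parameter of function type via eta-expansion; a stripped-down sketch with hypothetical names:

class Converter[T]

object Converters {
  // The 2.11-era shape: an implicit method whose eta-expansion used to satisfy
  // an implicit parameter of type () => Converter[Int].
  implicit def intConverter(): Converter[Int] = new Converter[Int]

  // A shape that should keep resolving on 2.12: an implicit value that already
  // has the function type the call site asks for.
  // implicit val intConverterFn: () => Converter[Int] = () => new Converter[Int]
}

object Usage {
  import Converters._

  // Mirrors the implicit parameter named in the compile error above.
  def sequence[K](path: String)(implicit kcf: () => Converter[K]): String = path

  // Resolves on 2.11 via eta-expansion of intConverter; the error above suggests
  // 2.12 no longer finds it this way.
  val out: String = sequence[Int]("dir")
}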

@srowen
Member Author

srowen commented Sep 1, 2017

Merged to master

asfgit closed this in 12ab7f7 on Sep 1, 2017
srowen deleted the SPARK-14280 branch on September 6, 2017 at 12:47