[SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation #18645
Conversation
CC @rxin since you asked; CC @JoshRosen
Test build #79643 has finished for PR 18645 at commit
Test build #79645 has finished for PR 18645 at commit
These cause a MiMa warning, because they're new methods in a trait that might be extended by users. I'll have to go back and remember whether this is an actual problem, because the trait is providing a default implementation here.
I tried compiling a small app that calls RDD.countAsync (which returns a FutureAction) and even implements a custom FutureAction and compiled vs 2.2.0, then ran vs this build, and it worked. I believe this may be legitimately excluded from MiMa.
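For context, here is a sketch of how such an exclusion is typically recorded in Spark's project/MimaExcludes.scala, using the standard MiMa filter API; the problem type and the fully-qualified method name below are illustrative assumptions, not the actual entries from this PR:

```scala
import com.typesafe.tools.mima.core._

object ExampleMimaExcludes {
  // Illustrative only: suppresses a warning about a new trait method that
  // has a default implementation, after verifying that previously-compiled
  // user code still links against the new build.
  lazy val excludes = Seq(
    ProblemFilters.exclude[ReversedMissingMethodProblem](
      "org.apache.spark.FutureAction.someNewMethod")
  )
}
```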
Changes like this resolve an ambiguity where two overloads of a method exist, one with a signature taking a lambda, and the other taking a trait/interface of one method, both of which could fit. In cases like this I resolved in favor of implementing a specific listener class.
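To make the ambiguity concrete, here is a hypothetical sketch (the trait and method names are illustrative, not Spark's actual API). In 2.12 a one-method trait is a SAM type, so a bare lambda can match either overload; implementing the listener trait explicitly selects one overload unambiguously:

```scala
trait CompletionListener {
  def onComplete(n: Int): Unit
}

class Registry {
  var chosen: String = "" // records which overload ran, for illustration
  def addListener(f: Int => Unit): Unit = { chosen = "function"; f(0) }
  def addListener(l: CompletionListener): Unit = { chosen = "listener"; l.onComplete(0) }
}

object ListenerDemo {
  // A bare lambda could match either overload under 2.12 SAM conversion;
  // an explicit anonymous class resolves to the listener overload:
  def register(r: Registry): Unit =
    r.addListener(new CompletionListener {
      def onComplete(n: Int): Unit = ()
    })
}
```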
isn't this a source breaking change?
I think that you can avoid source incompatibilities for Scala users by removing the overloads which accept Scala functions and then adding in a package-level implicit conversion to convert from Scala functions back into our own custom trait / interface.
The trickiness here is that you need to preserve binary compatibility on Scala 2.10/2.11, so the removal of the overload needs to be done conditionally so that it only occurs when building with Scala 2.12. Rather than having a separate source tree for 2.12, I'd propose defining the removed overload in a mixin trait which comes from a separate source file and then configure the build to use different versions of that file for 2.10/2.11 and for 2.12.
For details on this proposal, see https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit, a document that I wrote in March 2016 which explores these source incompatibility difficulties.
Applying that proposal here, the idea would be to remove the method
def addTaskCompletionListener(f: (TaskContext) => Unit)
and add a package-level implicit conversion from TaskContext => Unit to TaskCompletionListener, but to do this only in the 2.12 source tree / shim. This approach has some caveats and could potentially impact Java users who are doing weird things (violating the goal that Java Spark code is source and binary compatible with all Scala versions). See the linked doc for a full discussion of this problem.
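A minimal sketch of that shim idea, using simplified stand-ins for Spark's TaskContext and TaskCompletionListener (in the real proposal, the function-taking overload would remain in the 2.10/2.11 source tree and the implicit conversion would exist only in the 2.12 one):

```scala
import scala.language.implicitConversions

trait TaskCompletionListener {
  def onTaskCompletion(context: TaskContext): Unit
}

class TaskContext {
  private var listeners = List.empty[TaskCompletionListener]

  // Only the trait-taking overload remains in the 2.12 tree:
  def addTaskCompletionListener(listener: TaskCompletionListener): TaskContext = {
    listeners ::= listener
    this
  }

  def markTaskCompleted(): Unit = listeners.foreach(_.onTaskCompletion(this))
}

object TaskContextConversions {
  // Package-level implicit conversion: existing Scala code that passes a
  // function literal keeps compiling against the trait-taking overload.
  implicit def functionToListener(f: TaskContext => Unit): TaskCompletionListener =
    new TaskCompletionListener {
      def onTaskCompletion(context: TaskContext): Unit = f(context)
    }
}
```

With `import TaskContextConversions._` in scope, a call passing a function literal to `addTaskCompletionListener` continues to compile even though the function-taking overload is gone.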
sql/catalyst/pom.xml
Outdated
Some stuff like this became necessary due to other dependency changes and changes in their transitive dependencies.
Actually this could have been removed with Scala 2.10 and will be, separately, shortly.
In cases like this I resolved the ambiguity by casting the argument, instead of implementing an interface -- because in cases like this the interface version exists just for Java support.
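A sketch of the casting approach, with illustrative names, using java.util.function.ToIntFunction as a stand-in for a one-method interface that exists only for Java support; ascribing the argument's type steers resolution to the Scala-function overload:

```scala
object Api {
  def transform(f: String => Int): Int = f("scala")
  // Java-support overload; a 2.12 lambda could also match this SAM type:
  def transform(op: java.util.function.ToIntFunction[String]): Int =
    op.applyAsInt("java")
}

object Caller {
  // The type ascription makes the intended overload explicit:
  val result: Int = Api.transform(((s: String) => s.length): String => Int)
}
```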
sql/hive/pom.xml
Outdated
Not strictly related, just found this redundancy while editing. The exact same stanza appears below.
Test build #79851 has finished for PR 18645 at commit
isn't this a source breaking change?
This is just clarifying to the compiler which of two overloads to invoke, because two actually match and it's an error in 2.12.
It's going to be a problem for end-user apps in exactly the same way. They would need similar changes to work with Spark in Scala 2.12 (and, probably, other code with similar characteristics in 2.12).
This is really https://issues.apache.org/jira/browse/SPARK-14643
The thing is, different Scala versions have always been 'source incompatible' too, so that much may not be a big deal.
It would be a big problem if it caused Scala 2.11 apps to change, but this one won't.
Actually, no, not the same issue as https://issues.apache.org/jira/browse/SPARK-14643, but similar. Let me ask the question there about how much source compatibility matters across Scala versions, for Java.
In any event, I'm not proposing to enable a 2.12 build anytime soon; I just want to see if we can get it closer without any implications for 2.11.
I think this is primarily going to be an issue for end users who want to use an existing source tree to cross-compile for Scala 2.10, 2.11, and 2.12. Thus the pain of the source incompatibility would mostly be felt by library/package maintainers but it can be worked around as long as there's at least some common subset which is source compatible across all of those versions.
A similar ambiguity issue exists for people writing Spark code in Java 8 who use lambda syntax. The number of such users who must also upgrade to Scala 2.12 is probably fairly small, and the number of those users who have a requirement to cross-build their Java 8 lambda-syntax Spark code against 2.11 and 2.12 is probably even smaller, so maybe it isn't a huge deal to accept the ambiguity and require those users to do a bit of work when upgrading.
@srowen You just showed that the Scala 2.12 changes are source breaking, didn't you?
@rxin (See above for what I think you are referring to.) I do now see tests failing that diff JSON, but that's probably because the Jackson version had to change. (This is a WIP.) And I need to figure out whether that's effectively a breaking change or not.
It is still a source breaking change, and this is why I was saying it would be a lot of work to upgrade to Scala 2.12 without breaking existing source code. For 2.12 we should get rid of the problematic functions, using a build profile.
@rxin it's definitely breaking between Scala 2.11 and Scala 2.12, but not for the Scala 2.11 build. Ideally, we find it's possible to enable a 2.12 build such that the changes are binary and source compatible within the Scala 2.11 builds. Otherwise, it'd be impossible to merge this before 3.0.0. But if it doesn't affect 2.11 users, there's a much better argument for adding the build in a minor release, as Scala versions have never been mutually compatible.
The JSON error arises because JSON messages now contain an explicit entry for null properties. It looks like several places in the code explicitly configure Jackson's handling of which properties to include.
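As an illustration (assuming Jackson 2.x, and not quoting Spark's actual code), the kind of configuration in question looks like this:

```scala
import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.ObjectMapper

object JsonMappers {
  // A mapper configured to omit null properties from serialized output;
  // without such a setting, null properties appear explicitly in the JSON.
  val mapper: ObjectMapper = new ObjectMapper()
    .setSerializationInclusion(JsonInclude.Include.NON_NULL)
}
```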
@srowen I don't agree that we should just break source compatibility here. We have already spent a lot of time doing this in the past and figuring out how to preserve it.
Are we talking about the same thing? This does not break source compatibility for Scala 2.11. For Scala 2.12, yes, but that has never been binary, or necessarily source, compatible. What's different?
When users upgrade from 2.11 to 2.12, their app would be broken, wouldn't it?
That's not an upgrade. spark_2.11 2.2.0 to spark_2.11 2.3.0 is an upgrade; that can't break, and doesn't with this change. Put another way: Scala itself has never been source- or binary-compatible across Scala minor releases, and neither has Spark across Scala versions. Compiling vs spark_2.10 and running on spark_2.11 doesn't work; cross-compiling I don't think even works in all cases. I don't follow why that's an issue here. In fact, we already know it can't be source compatible for Java callers: https://issues.apache.org/jira/browse/SPARK-14643. That may well be the argument that this can't happen until 3.0. Check in with @JoshRosen for more context.

However, I am not suggesting creating any Scala 2.12 release at all anytime soon here, so I don't think that's worth debating. The question of whether it's source compatible is moot, because this creates no new type of release to be compatible or not. I don't even disagree with someone who argues that Scala 2.12 support can't happen before 3.0. This change does make a lot of the changes that will have to be made for 2.12, though, without disrupting the current 2.11 build. The question is really whether that statement is true; the Jackson thing is the problem right now.

Why would we bother implementing most of the groundwork for 2.12 without making an official 2.12 build? Generally, to break down the problem and address it incrementally rather than in one huge change. But also to make it possible for a brave soul out there, like maybe the folks at Twitter, to run their own 2.12 build if they wanted to self-support. More concretely, if this went into a 2.3.0 release tomorrow, what goes wrong?
Test build #79892 has finished for PR 18645 at commit
Looking at the source compatibility changes made here, I was a little confused about why we didn't need to make many more changes; in principle, it seemed like the ambiguous overload issue should have affected many more call sites. It turns out that the final Scala 2.12 release has a nice feature which resolves this ambiguity in many cases. In the release notes at https://www.scala-lang.org/news/2.12.0/#lambda-syntax-for-sam-types, there's an example where, even though there's a potentially ambiguous overload, overloading resolution selects the one with the Function type.

This explains why we didn't have to update all call sites. However, some call sites still failed: the call looks unambiguous but fails with a compiler error in Scala 2.12. You can try this out at https://scastie.scala-lang.org/JoshRosen/eBcxUGdORsiOQNULMUnAsw. I took a look at the sections of the language spec linked from the release notes, but it's not immediately clear to me whether this is a compiler bug or expected behavior (I'll have to take some more time to try to understand the spec). My hunch is that it has to do with type inference: the compiler can't figure out which method to call until it knows the function's type, and it can't resolve the function's type until it knows which method it's applying to. That makes intuitive sense to me, but I'm not sure if that's the actual reason.
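The release-notes behavior can be sketched as follows (illustrative names, not the actual Spark call site): both overloads accept the lambda under 2.12 SAM conversion, but overloading resolution selects the one with the Function type.

```scala
trait MyFun { def apply(x: Int): String }

object Overloads {
  def describe(f: Int => String): String = "function:" + f(1)
  def describe(f: MyFun): String = "sam:" + f(2)
}

object SamDemo {
  // Resolves to the Function overload, per the 2.12 release notes:
  val picked: String = Overloads.describe(i => i.toString)
}
```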
I believe the last errors were essentially the same issue discussed in json4s. I suspect that it's down to a change in the effective setting of Jackson when reading or writing JSON: https://fasterxml.github.io/jackson-databind/javadoc/2.7/com/fasterxml/jackson/databind/DeserializationFeature.html#ACCEPT_SINGLE_VALUE_AS_ARRAY or https://fasterxml.github.io/jackson-databind/javadoc/2.7/com/fasterxml/jackson/databind/SerializationFeature.html#WRITE_SINGLE_ELEM_ARRAYS_UNWRAPPED. It doesn't look like these features' defaults changed from Jackson 2.6 to 2.7, and I don't see them overridden in json4s, though I did see some changes in json4s 3.3 that cause it to ignore some Jackson defaults. In any event, I thought it safer to handle both scenarios in the tests where they matter. Let's see if that works first.
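To make the two linked settings concrete, here is a sketch assuming Jackson 2.x (the feature flags are the library's real ones; the values shown are just for illustration):

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper, SerializationFeature}

object SingleValueHandling {
  val mapper = new ObjectMapper()
  // If enabled, a bare scalar in input JSON may bind to a collection-typed
  // field as a one-element collection:
  mapper.configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true)
  // If enabled, a one-element array is written as the bare element itself:
  mapper.configure(SerializationFeature.WRITE_SINGLE_ELEM_ARRAYS_UNWRAPPED, false)
}
```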
Test build #79968 has finished for PR 18645 at commit
Oh nice, it will help if that's not a required update along with everything else here, as it requires a Jackson update. The Jackson update may be important eventually, but it's nice if it can be separate.
Test build #80057 has finished for PR 18645 at commit
The json4s change above made this change notably simpler. The current problem is the same kind of thing, in the next error; see comments on that test: 3ae25f2#commitcomment-23320234. I'm currently looking into Michael's suggestion in the second case.
Test build #80074 has finished for PR 18645 at commit
Test build #81037 has finished for PR 18645 at commit
Test build #81143 has finished for PR 18645 at commit
Test build #3905 has finished for PR 18645 at commit
Test build #81184 has finished for PR 18645 at commit
…build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
Test build #81255 has finished for PR 18645 at commit
The status here is that this change does work fine with Scala 2.11, and does implement changes that will be needed for 2.12. Right now I've found that, suddenly, it doesn't compile for Scala 2.12; I didn't see this with an earlier 2.12 release, I believe, so it could be some change or problem in implicit resolution. Of course, the 2.12 build never was going to work fully after this change anyway (Kafka 0.8 needs to go, the REPL needs updating), so it may be worth going ahead anyway.
Merged to master |
What changes were proposed in this pull request?
This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
It also updates, to the earliest viable minor release, those dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the latest maintenance release within the earliest viable minor release.
This change does not fully enable a Scala 2.12 build:
- SparkILoop and Main need updating for the REPL in 2.12, which is SPARK-14650. Possible to do here too.

What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
How was this patch tested?
Existing tests and build. Manually tested with ./dev/change-scala-version.sh 2.12 to verify it compiles, modulo the exceptions above.