@HeartSaVioR
Contributor

@HeartSaVioR HeartSaVioR commented Nov 5, 2019

What changes were proposed in this pull request?

This patch adds the @JsonDeserialize annotation to the fields of type Option[Long] in LogInfo/AttemptInfoWrapper. These fields hit https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges; other existing JSON models already take care of this, but we missed adding the annotation to these classes.

Why are the changes needed?

Without this change, SHS will throw ClassCastException when rebuilding the App UI.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested.

@HeartSaVioR
Contributor Author

cc. @vanzin This is a follow-up patch to SPARK-28869. Thanks in advance.

@SparkQA

SparkQA commented Nov 5, 2019

Test build #113258 has finished for PR 26397 at commit cf6cf8a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 5, 2019

Test build #113260 has finished for PR 26397 at commit 3d34769.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Nov 5, 2019

Why didn't unit tests catch this? (Or maybe: add a unit test?)

@HeartSaVioR
Contributor Author

Actually, I have been trying to construct/modify a UT so that it fails on master and passes with this patch, but no luck yet.
Btw, I searched the code for places where the same annotation is used, and found that you also added it to the API models. Did you hit an issue before adding the annotation, or did you add it based on prior knowledge?

@vanzin
Contributor

vanzin commented Nov 6, 2019

I added those annotations because I hit errors when I was modifying the SHS to use the disk store.

Maybe you need to try rolling logs with the disk store to hit it.

@HeartSaVioR
Contributor Author

Thanks for the quick response! That's actually one of the things I tried, with no luck. I'll try to shut down and restore the KVStore to see whether that triggers the failure.

@HeartSaVioR
Contributor Author

OK, I added a test which fails on the master branch and passes with the patch. I guess I had already reproduced it earlier, but checkForLogs() swallows the exception and the log is not printed to the console, so I missed it.

@SparkQA

SparkQA commented Nov 6, 2019

Test build #113296 has finished for PR 26397 at commit 3bd2760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2019

Test build #113298 has finished for PR 26397 at commit b3f3479.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// The issue happens only when the value in the Option is unboxed. A simple comparison sometimes
// doesn't go through unboxing the value, hence no ClassCastException occurs.
// Here we ensure unboxing to Long succeeds. Please refer to SPARK-29755 for more details.
assert(BoxesRunTime.unboxToLong(opt.get) === expected.get)
Member

Can't you just use .toLong here?

Contributor

Yes.

scala> val i = Option(1)
i: Option[Int] = Some(1)

scala> val l = i.asInstanceOf[Option[Long]]
l: Option[Long] = Some(1)

scala> l.get.toLong
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
  at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:105)
  ... 28 elided

Member

Here opt really is an Option[Long], not an Option[Int]. You can't assign the latter to the former in Scala anyway. I feel like I'm missing the whole point here. Is the idea that, without this change, Jackson comes up with an Option[Int] when deserializing?

Contributor

Actually, that's the bug. Jackson deserializes the value "1" as an Option[Int] because it fits in an int, and sticks it into that variable using reflection, regardless of the type parameter (since at runtime Jackson can only see it's an Option, not an Option[Long]).
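A minimal sketch of the mechanism described above, in plain Scala with no Jackson dependency. Everything in it is hypothetical: `ErasedOptionSketch`, `readOption`, `readLastIndex`, and `next` are illustrative names, not Jackson or Spark APIs. It assumes, as stated in this thread, that an erased `Option` field receives a `java.lang.Integer` whenever the JSON number fits in an int; the optional `contentAs` argument only models what `@JsonDeserialize(contentAs = classOf[JLong])` asks for.

```scala
import scala.util.Try

object ErasedOptionSketch {
  // Hypothetical stand-in for a reflective deserializer. Because of type erasure it only
  // sees that the field is an Option, so by default it boxes a JSON integer in the
  // narrowest type that fits. `contentAs` models a hint such as
  // @JsonDeserialize(contentAs = classOf[java.lang.Long]).
  def readOption(raw: String, contentAs: Option[Class[_]] = None): Option[Any] = {
    val n = BigInt(raw)
    val boxed: Any =
      if (contentAs.exists(_ == classOf[java.lang.Long])) java.lang.Long.valueOf(n.toLong)
      else if (n.isValidInt) java.lang.Integer.valueOf(n.toInt)
      else java.lang.Long.valueOf(n.toLong)
    Some(boxed)
  }

  // The unchecked cast always succeeds: at runtime the JVM only checks for scala.Option,
  // so an Integer element can end up inside a value typed as Option[Long].
  def readLastIndex(raw: String, contentAs: Option[Class[_]] = None): Option[Long] =
    readOption(raw, contentAs).asInstanceOf[Option[Long]]

  // `+ 1L` needs a primitive long, so the element is unboxed via BoxesRunTime.unboxToLong,
  // which throws ClassCastException when the element is a java.lang.Integer.
  def next(opt: Option[Long]): Try[Long] = Try(opt.get + 1L)
}
```

In this model, `next(readLastIndex("1"))` is a `Failure` holding a `ClassCastException`, while a value too large for an int, or any value read with the `java.lang.Long` hint, unboxes normally. Only values that fit in an int trigger the failure here.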

Member

Got it, but if the bug is fixed, then this really is an Option[Long]. Wouldn't handling the case where it isn't cover up the bug if it existed? Again, I probably misunderstand.

Contributor

I still don't understand your question.

As Jungtaek said, this test fails before the fix, because even if the code says "Option[Long]", Jackson actually put an "Option[Int]" in there.

So, as far as this test goes:

  • fails without the fix
  • works with the fix

Again, isn't that what we want?

Contributor Author

The Jackson-Scala FAQ page I've linked in the description explains what is happening here and why users should apply the workaround. They couldn't find a good way to fix it on the Jackson-Scala side.

I call BoxesRunTime.unboxToLong directly because that's the actual path where the exception is thrown (please refer to the JIRA description: https://issues.apache.org/jira/browse/SPARK-29755).
A simple comparison (like opt === expect) didn't throw an exception, so I wanted to make sure the test definitely fails on the current master, rather than only possibly failing.

> I'd prefer to have a test that exercised this through the normal SHS operation

Yeah, that was what I tried first, but checkForLogs() swallowed the exception (so we can't catch the CCE on the caller side) and another unexpected error came up when trying to verify. That error might also indicate that something is not working, but it is less clear.
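The difference between a plain comparison and an explicit unbox, as described in the comment above, can be reproduced without Spark or Jackson. This is a sketch under one assumption: `opt` is built by hand with an unchecked cast to stand in for an `Option[Long]` whose element is really a `java.lang.Integer`. `UnboxVsCompare` and its members are hypothetical names; `scala.runtime.BoxesRunTime.unboxToLong` is the real Scala runtime helper used in the test under review.

```scala
import scala.runtime.BoxesRunTime
import scala.util.Try

object UnboxVsCompare {
  // Hypothetical: an Option[Long] whose element is really a java.lang.Integer, built with an
  // unchecked cast to mimic what an erased, reflective assignment can leave behind.
  val opt: Option[Long] = Some(java.lang.Integer.valueOf(1): Any).asInstanceOf[Option[Long]]

  // `==` on Some compares the erased elements with BoxesRunTime.equals, which compares
  // boxed numbers by value, so Integer(1) equals Long(1) and nothing is unboxed.
  def comparisonPasses: Boolean = opt == Some(1L)

  // Both expressions need a primitive long, so both unbox via BoxesRunTime.unboxToLong
  // and fail with ClassCastException.
  def explicitUnboxFails: Boolean = Try(BoxesRunTime.unboxToLong(opt.get)).isFailure
  def toLongFails: Boolean = Try(opt.get.toLong).isFailure
}
```

Since `opt.get.toLong` goes through the same unboxing path as the explicit `BoxesRunTime.unboxToLong(opt.get)` call, either form makes the assertion fail deterministically on a mistyped element, while an equality check alone would pass.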

Contributor

BTW I was just commenting you can use opt.get.toLong to hit the bug without the fix, which looks a bit better than calling into BoxesRunTime, but either is fine really.

Contributor Author

Ah, got it. No strong opinion either way. I'll change it.

Member

Yes, .toLong hits the bug. So I was wondering what the deal is with BoxesRunTime, and thought it was there to make Option[Int] -> long work. But then that would not fail the test. If that's not what it does, and it effectively does .toLong, then yeah, we're on the same page: just use .toLong.

attemptId: Option[String],
fileSize: Long,
-lastIndex: Option[Long],
+@JsonDeserialize(contentAs = classOf[JLong]) lastIndex: Option[Long],
Member

This is basically the same approach used elsewhere, right? If so, that's fine.

Contributor Author

Yes, right. There are other spots where we already took care of this.

@SparkQA

SparkQA commented Nov 6, 2019

Test build #113343 has finished for PR 26397 at commit 53a0a91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@gaborgsomogyi gaborgsomogyi left a comment

Double-checked: the change solves the problem. Only minor nits found.

import org.apache.spark.util.{Clock, JsonProtocol, ManualClock, Utils}
import org.apache.spark.util.logging.DriverLogger


Contributor

Nit: Why is this needed?

test("SPARK-29755 LogInfo should be serialized/deserialized by jackson properly") {
  def assertSerDe(serializer: KVStoreScalaSerializer, info: LogInfo): Unit = {
    val infoAfterSerDe = serializer.deserialize(
      serializer.serialize(info), classOf[LogInfo])
Contributor

Nit: No line break needed.

val logPath: String,
val fileSize: Long,
-val lastIndex: Option[Long],
+@JsonDeserialize(contentAs = classOf[JLong]) val lastIndex: Option[Long],
Contributor

Nit: Maybe a line break between the @JsonDeserialize annotation and the val would make it more readable.

* limitations under the License.
*/

package org.apache.spark.deploy.history
Contributor

Nit: There are a couple of unused imports at the beginning of the file. It may be worth cleaning them up.

Contributor Author

If they were introduced here, I'll remove them. If not, removing them might break other PRs, so I'd rather handle it carefully.

Contributor

Since FsHistoryProvider.scala is one of the main targets of this development (I mean the overall effort, not this PR), it would be good to clean it up somewhere. Where do you think we should do it?

Contributor Author

I guess it's OK to do it in a minor PR, but a committer can make the final call on whether it's worth it (treat my comment as my 2 cents).

Contributor

I agree with this resolution. This way we separate the concerns. I've created #26436; in case of disagreement it can just be dropped.


import scala.collection.JavaConverters._
import scala.concurrent.duration._
import scala.runtime.BoxesRunTime
Contributor

Not sure why this is needed. Tests show the same result with and without it.

Contributor

@gaborgsomogyi gaborgsomogyi Nov 7, 2019

Oh, I see now. This was coming from a previous code:
assert(BoxesRunTime.unboxToLong(opt.get) === expected.get)

@SparkQA

SparkQA commented Nov 8, 2019

Test build #113415 has finished for PR 26397 at commit 62c1aed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@gaborgsomogyi gaborgsomogyi left a comment

LGTM, assuming the resolution offered in #26397 (comment) is accepted.

if (expected.isEmpty) {
  assert(opt.isEmpty)
} else {
  // The issue happens only the value in Option is being unboxed. Here we ensure unboxing
Contributor

s/happens only the/happens only when the

@SparkQA

SparkQA commented Nov 9, 2019

Test build #113478 has finished for PR 26397 at commit d1bc497.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen pushed a commit that referenced this pull request Nov 9, 2019
### What changes were proposed in this pull request?
As it has been discussed in #26397 (comment) `FsHistoryProvider` import section has to be cleaned up.

### Why are the changes needed?
Unused imports.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing unit tests.

Closes #26436 from gaborgsomogyi/SPARK-29755.

Authored-by: Gabor Somogyi
Signed-off-by: Sean Owen
@SparkQA

SparkQA commented Nov 10, 2019

Test build #113521 has finished for PR 26397 at commit 698a657.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

Bump

@vanzin
Contributor

vanzin commented Nov 11, 2019

Merging to master.

@vanzin vanzin closed this in df08e90 Nov 11, 2019
@HeartSaVioR
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-29755 branch November 12, 2019 02:44