Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jul 14, 2018

What changes were proposed in this pull request?

In the PR, I propose to output an warning if the addFile() or addJar() methods are callled more than once for the same path. Currently, overwriting of already added files is not supported. New comments and warning are reflected the existing behaviour.

@MaxGekk MaxGekk changed the title [SPARK-24807][SQL] Adding files/jars twice: output a warning and add a note [SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note Jul 14, 2018
@SparkQA
Copy link

SparkQA commented Jul 14, 2018

Test build #93005 has finished for PR 21771 at commit 0942fd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2018

Test build #93007 has finished for PR 21771 at commit 4a583c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

env.securityManager, hadoopConfiguration, timestamp, useCache = false)
postEnvironmentUpdate()
} else {
logWarning(s"The path $path has been added already. Overwriting of added paths " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, @MaxGekk, how about we just give warnings without notes for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notes are also reasonable to me. This is a common user error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering how common it is for an user to add the same jar expecting it will overwrite since mostly we consider those cases as immutable resources.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Our support receives a few "bug" reports per months. For now we can provide a link to the note at least. The warning itself is needed to our support engineers to detect such kind of problems from logs of already finished jobs. Actually customers do not say in their bug reports that files/jars weren't overwritten (it would be easier). They report problems like calling a method from a lib crashes due to incompatible signature of method or a class doesn't exists. Or final result of a Spark job is not correct because a config/resource files added via addFile() is not up to date. Now I can detect the situation from logs and provide a link to docs for addFile()/addJar().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I'm happy with the warning message but less sure if we note. I'm okay.

env.securityManager, hadoopConfiguration, timestamp, useCache = false)
postEnvironmentUpdate()
} else {
logWarning(s"The path $path has been added already. Overwriting of added paths " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shell we leave a JIRA link as a comment for example SPARK-16787 and/or SPARK-19417

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally do not post the JIRA number in the message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a comment.

@HyukjinKwon
Copy link
Member

Seems fine to me.

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 6999321 Jul 15, 2018
logInfo(s"Added JAR $path at $key with timestamp $timestamp")
postEnvironmentUpdate()
} else {
logWarning(s"The jar $path has been added already. Overwriting of added jars " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could confuse what it means with spark.files.overwrite.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 15, 2018

We could have updated the doc for spark.files.overwrite too since the confusion could be probably with this configuration.

@MaxGekk MaxGekk deleted the warning-on-adding-file branch August 17, 2019 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants