[SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note #21771
|
Test build #93005 has finished for PR 21771 at commit
|
|
Test build #93007 has finished for PR 21771 at commit
|
| env.securityManager, hadoopConfiguration, timestamp, useCache = false) | ||
| postEnvironmentUpdate() | ||
| } else { | ||
| logWarning(s"The path $path has been added already. Overwriting of added paths " + |
eh, @MaxGekk, how about we just give warnings without notes for now?
The notes are also reasonable to me. This is a common user error.
I was wondering how common it is for a user to add the same jar expecting that it will be overwritten, since we mostly consider those resources immutable.
@HyukjinKwon Our support receives a few "bug" reports per month. For now we can at least provide a link to the note. The warning itself is needed so that our support engineers can detect this kind of problem from the logs of already finished jobs. Customers do not actually say in their bug reports that files/jars weren't overwritten (that would be easier). They report problems such as a call to a library method crashing due to an incompatible method signature, or a class that doesn't exist. Or the final result of a Spark job is incorrect because a config/resource file added via addFile() is not up to date. Now I can detect the situation from logs and provide a link to the docs for addFile()/addJar().
I mean I'm happy with the warning message but less sure about adding the note. I'm okay with it.
| env.securityManager, hadoopConfiguration, timestamp, useCache = false) | ||
| postEnvironmentUpdate() | ||
| } else { | ||
| logWarning(s"The path $path has been added already. Overwriting of added paths " + |
Shall we leave a JIRA link as a comment, for example SPARK-16787 and/or SPARK-19417?
We normally do not post the JIRA number in the message.
I meant a comment.
|
Seems fine to me. |
|
LGTM |
|
Thanks! Merged to master. |
| logInfo(s"Added JAR $path at $key with timestamp $timestamp") | ||
| postEnvironmentUpdate() | ||
| } else { | ||
| logWarning(s"The jar $path has been added already. Overwriting of added jars " + |
This could be confused with what spark.files.overwrite means.
|
We could have updated the doc for |
What changes were proposed in this pull request?
In the PR, I propose to output a warning if the addFile() or addJar() methods are called more than once for the same path. Currently, overwriting of already added files is not supported. The new comments and the warning reflect the existing behaviour.
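The behaviour described above can be illustrated with a small standalone sketch. It is not the code from this PR: the class AddedPathRegistry, its methods, and the warning wording are hypothetical, and the sketch only assumes that added paths are tracked in a map keyed by path, so that a second add of the same path keeps the first entry and records a warning instead of overwriting.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a driver-side registry of added files/jars.
// This is an illustration only, not Spark's actual implementation.
class AddedPathRegistry {
  private val added = new ConcurrentHashMap[String, java.lang.Long]()

  // Collected instead of logged so the behaviour is easy to inspect.
  // Not thread-safe; a real implementation would use a logger.
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]

  /** Returns true if the path was registered, false if it was already present. */
  def add(path: String, timestamp: Long): Boolean = {
    // putIfAbsent returns null only when there was no previous mapping.
    val previous = added.putIfAbsent(path, java.lang.Long.valueOf(timestamp))
    if (previous == null) {
      true
    } else {
      // Illustrative wording, not the message used in the PR.
      warnings += s"Path $path was already added; the first version is kept."
      false
    }
  }

  /** Timestamp recorded when the path was first added, if any. */
  def timestampOf(path: String): Option[Long] =
    Option(added.get(path)).map(_.longValue)
}
```

With this sketch, calling add("/tmp/app.jar", 1L) and then add("/tmp/app.jar", 2L) on the same registry leaves the first timestamp in place and records one warning, which is the kind of trace a support engineer could later look for in the logs of a finished job.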