-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note #21771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1496,6 +1496,8 @@ class SparkContext(config: SparkConf) extends Logging { | |
| * @param path can be either a local file, a file in HDFS (or other Hadoop-supported | ||
| * filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, | ||
| * use `SparkFiles.get(fileName)` to find its download location. | ||
| * | ||
| * @note A path can be added only once. Subsequent additions of the same path are ignored. | ||
| */ | ||
| def addFile(path: String): Unit = { | ||
| addFile(path, false) | ||
|
|
@@ -1516,6 +1518,8 @@ class SparkContext(config: SparkConf) extends Logging { | |
| * use `SparkFiles.get(fileName)` to find its download location. | ||
| * @param recursive if true, a directory can be given in `path`. Currently directories are | ||
| * only supported for Hadoop-supported filesystems. | ||
| * | ||
| * @note A path can be added only once. Subsequent additions of the same path are ignored. | ||
| */ | ||
| def addFile(path: String, recursive: Boolean): Unit = { | ||
| val uri = new Path(path).toUri | ||
|
|
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging { | |
| Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf, | ||
| env.securityManager, hadoopConfiguration, timestamp, useCache = false) | ||
| postEnvironmentUpdate() | ||
| } else { | ||
| logWarning(s"The path $path has been added already. Overwriting of added paths " + | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shell we leave a JIRA link as a comment for example SPARK-16787 and/or SPARK-19417
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We normally do not post the JIRA number in the message.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I meant a comment. |
||
| "is not supported in the current version.") | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -1803,6 +1810,8 @@ class SparkContext(config: SparkConf) extends Logging { | |
| * | ||
| * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), | ||
| * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. | ||
| * | ||
| * @note A path can be added only once. Subsequent additions of the same path are ignored. | ||
| */ | ||
| def addJar(path: String) { | ||
| def addJarFile(file: File): String = { | ||
|
|
@@ -1849,6 +1858,9 @@ class SparkContext(config: SparkConf) extends Logging { | |
| if (addedJars.putIfAbsent(key, timestamp).isEmpty) { | ||
| logInfo(s"Added JAR $path at $key with timestamp $timestamp") | ||
| postEnvironmentUpdate() | ||
| } else { | ||
| logWarning(s"The jar $path has been added already. Overwriting of added jars " + | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could confuse what it means with |
||
| "is not supported in the current version.") | ||
| } | ||
| } | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, @MaxGekk, how about we just give warnings without notes for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notes are also reasonable to me. This is a common user error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering how common it is for an user to add the same jar expecting it will overwrite since mostly we consider those cases as immutable resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon Our support receives a few "bug" reports per months. For now we can provide a link to the note at least. The warning itself is needed to our support engineers to detect such kind of problems from logs of already finished jobs. Actually customers do not say in their bug reports that files/jars weren't overwritten (it would be easier). They report problems like calling a method from a lib crashes due to incompatible signature of method or a class doesn't exists. Or final result of a Spark job is not correct because a config/resource files added via
addFile()is not up to date. Now I can detect the situation from logs and provide a link to docs foraddFile()/addJar().There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I'm happy with the warning message but less sure if we note. I'm okay.