[SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode #29000
The change rewrites the Scaladoc for the `dynamicPartitionOverwrite` parameter of `HadoopMapReduceCommitProtocol` to document the commit flow under both FileOutputCommitter algorithm versions:

@@ -41,17 +41,23 @@ import org.apache.spark.mapred.SparkHadoopMapRedUtil
 * @param jobId the job's or stage's id
 * @param path the job's output path, or null if committer acts as a noop
 * @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
- *                                  dynamically, i.e., for speculative tasks, we first write files
- *                                  to task attempt paths under a staging directory, e.g.
- *                                  /path/to/staging/.spark-staging-{jobId}/_temporary/
+ *                                  dynamically, i.e., we first write files to task attempt paths
+ *                                  under a staging directory, e.g.
+ *                                  /path/to/outputPath/.spark-staging-{jobId}/_temporary/
 *                                  {appAttemptId}/_temporary/{taskAttemptId}/a=1/b=1/xxx.parquet.
- *                                  When committing the job, we first move files from task attempt
+ *                                  1. When the [[FileOutputCommitter]] algorithm version is set
+ *                                  to 1, we first move files from task attempt
 *                                  paths to corresponding partition directories under the staging
- *                                  directory, e.g.
- *                                  /path/to/staging/.spark-staging-{jobId}/a=1/b=1.
+ *                                  directory during job commit, e.g.
+ *                                  /path/to/outputPath/.spark-staging-{jobId}/a=1/b=1.
 *                                  Secondly, move the partition directories under staging
- *                                  directory to partition directories under destination path,
- *                                  e.g. /path/to/destination/a=1/b=1
+ *                                  directory to the destination path, e.g. /path/to/outputPath/a=1/b=1
+ *                                  2. When the [[FileOutputCommitter]] algorithm version is set
+ *                                  to 2, committing tasks move files directly to the staging
+ *                                  directory, e.g. /path/to/outputPath/.spark-staging-{jobId}/a=1/b=1.
+ *                                  These partition directories are then moved from the staging
+ *                                  directory to the destination path during job commit, e.g.
+ *                                  /path/to/outputPath/a=1/b=1
 */
class HadoopMapReduceCommitProtocol(
    jobId: String,
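To make the layout concrete, here is a minimal Scala sketch of the paths described in the Scaladoc above. The object and method names are hypothetical, invented for illustration; they are not actual members of `HadoopMapReduceCommitProtocol`.

```scala
import org.apache.hadoop.fs.Path

// Hypothetical helpers mirroring the directory layout in the Scaladoc above.
object DynamicOverwriteLayout {

  // Job-scoped staging directory under the output path, e.g.
  // /path/to/outputPath/.spark-staging-{jobId}
  def stagingDir(outputPath: String, jobId: String): Path =
    new Path(outputPath, s".spark-staging-$jobId")

  // Partition directory under staging that collects committed task output,
  // e.g. /path/to/outputPath/.spark-staging-{jobId}/a=1/b=1
  def stagedPartition(outputPath: String, jobId: String, partition: String): Path =
    new Path(stagingDir(outputPath, jobId), partition)

  // Final location the staged partition is renamed to at job commit,
  // e.g. /path/to/outputPath/a=1/b=1
  def destinationPartition(outputPath: String, partition: String): Path =
    new Path(outputPath, partition)
}
```

Under algorithm v1, files reach `stagedPartition` only at job commit and are then renamed to `destinationPartition`; under v2, task commits already place files under `stagedPartition`, leaving only the final rename to `destinationPartition` for job commit.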
Review comment: So this isn't the normal behavior of algorithm version 2, right? Normally it writes the task files directly to the final output location. The whole point of algorithm 2 is to avoid all of the extra moves on the driver at the end of the job; for large jobs that time can be huge. I'm not sure of the benefit of algorithm 2 here, because that is all happening distributed on each task?
Reply: V2 isn't safe in the presence of failures during task commit. At least here, if the entire job fails then, provided job IDs are unique, the output doesn't become visible. It is essentially a second attempt at the v1 rename algorithm, with (hopefully) smaller output datasets.
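For reference, both settings under discussion are ordinary Spark configs. A sketch of how they would be chosen follows; the application name and the chosen values are illustrative, not part of this PR.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-overwrite-example")
  // FileOutputCommitter algorithm version: 1 renames task output at job
  // commit (on the driver); 2 renames at task commit (distributed, but not
  // safe if a failure occurs during task commit).
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
  // Route overwrites of individual partitions through the
  // .spark-staging-{jobId} directory described in the Scaladoc above.
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .getOrCreate()
```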