various doc fixes
jose-torres committed Mar 3, 2018
commit 79495b1f9e994f77ccf40c47eb2fb0baf5873f66
@@ -38,7 +38,7 @@
* succeeds), a {@link WriterCommitMessage} will be sent to the driver side and pass to
* {@link DataSourceWriter#commit(WriterCommitMessage[])} with commit messages from other data
* writers. If this data writer fails(one record fails to write or {@link #commit()} fails), an
- * exception will be sent to the driver side, and Spark will retry this writing task for some times,
+ * exception will be sent to the driver side, and Spark may retry this writing task for some times,
* each time {@link DataWriterFactory#createDataWriter(int, int, long)} gets a different
* `attemptNumber`, and finally call {@link DataSourceWriter#abort(WriterCommitMessage[])} if all
* retry fail.
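The retry contract described in this hunk can be sketched with simplified stand-in types. Everything here is hypothetical (`WriteRetrySketch`, `FlakyWriter`, `runTask`, and the trimmed-down interfaces are not the real Spark API); it only illustrates the lifecycle the Javadoc documents: each retry gets a fresh writer with a new attempt number, a successful attempt commits, a failed attempt aborts, and exhausting all retries signals overall failure.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-ins for the interfaces discussed above (hypothetical sketch,
// not Spark's implementation).
class WriteRetrySketch {
    interface WriterCommitMessage {}

    interface DataWriter<T> {
        void write(T record) throws Exception;
        WriterCommitMessage commit();
        void abort();
    }

    // A writer that fails for its first `failuresBeforeSuccess` attempts.
    static class FlakyWriter implements DataWriter<String> {
        private final int attemptNumber;
        private final int failuresBeforeSuccess;

        FlakyWriter(int attemptNumber, int failuresBeforeSuccess) {
            this.attemptNumber = attemptNumber;
            this.failuresBeforeSuccess = failuresBeforeSuccess;
        }

        public void write(String record) throws Exception {
            if (attemptNumber < failuresBeforeSuccess) {
                throw new Exception("simulated failure on attempt " + attemptNumber);
            }
        }

        public WriterCommitMessage commit() { return new WriterCommitMessage() {}; }

        public void abort() {}
    }

    /**
     * Drives one writing task the way the Javadoc describes: every retry gets
     * a fresh writer with an incremented attemptNumber; on success the commit
     * message would go to DataSourceWriter#commit, on exhaustion the overall
     * write would be aborted. Returns the attempt number that succeeded, or
     * -1 if all retries failed.
     */
    static int runTask(int maxAttempts, int failuresBeforeSuccess, List<String> records) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            DataWriter<String> writer = new FlakyWriter(attempt, failuresBeforeSuccess);
            try {
                for (String r : records) writer.write(r);
                writer.commit();  // success: commit message sent to the driver
                return attempt;
            } catch (Exception e) {
                writer.abort();   // clean up this attempt before retrying
            }
        }
        return -1;  // all retries failed; DataSourceWriter#abort would be called
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b");
        System.out.println(runTask(3, 2, records));  // third attempt succeeds
        System.out.println(runTask(3, 5, records));  // retries exhausted
    }
}
```

Note the asymmetry the diff is careful about: Spark *may* retry (the number of retries is not guaranteed), which is why implementations must make `abort()` safe to call after any partial write.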
@@ -49,8 +49,8 @@ public interface DataWriterFactory&lt;T&gt; extends Serializable {
* tasks with the same task id running at the same time. Implementations can
* use this attempt number to distinguish writers of different task attempts.
* @param epochId A monotonically increasing id for streaming queries that are split into
- discrete periods of execution. For queries that execute as a single batch, this
- id will always be zero.
+ discrete periods of execution. For non-streaming queries,
+ this ID will always be 0.
*/
DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long epochId);
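The `epochId` behaviour the new wording documents can be illustrated with a stand-in factory. This is a hypothetical sketch (`EpochIdSketch` and `describe` are invented names, and the simplified interface returns a `String` instead of a `DataWriter<T>` so the parameter flow is easy to see): batch queries always pass epoch 0, while a streaming query passes a new, increasing epoch for each discrete period of execution.

```java
import java.io.Serializable;

// Hypothetical stand-in mirroring the createDataWriter signature above.
class EpochIdSketch {
    interface DataWriterFactory<T> extends Serializable {
        String createDataWriter(int partitionId, int attemptNumber, long epochId);
    }

    static final DataWriterFactory<String> FACTORY =
        (partitionId, attemptNumber, epochId) ->
            "part=" + partitionId + " attempt=" + attemptNumber + " epoch=" + epochId;

    static String describe(int partitionId, int attemptNumber, long epochId) {
        return FACTORY.createDataWriter(partitionId, attemptNumber, epochId);
    }

    public static void main(String[] args) {
        // Batch query: a single execution, so epochId is always 0.
        System.out.println(describe(0, 0, 0L));

        // Streaming query: each discrete period of execution gets a new,
        // monotonically increasing epochId.
        for (long epoch = 1; epoch <= 3; epoch++) {
            System.out.println(describe(0, 0, epoch));
        }
    }
}
```

The factory is `Serializable` because, as in the real interface, instances are created on the driver and shipped to executors; the (partitionId, attemptNumber, epochId) triple is what lets a sink deduplicate or fence concurrent task attempts.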
Contributor
Add clear lifecycle semantics.

Contributor
Why are we using the same interface for streaming and batch here? Is there a compelling reason to do so instead of adding StreamingWriterFactory? Are the guarantees for an epoch identical to those of a single batch job?

Contributor Author
The guarantees are identical, and in the current execution model, each epoch is in fact processed by a single batch job.

}