commit b2ee7f35f71044eae531ba3f8e72cd55bd7c7628
Author: jose-torres
Date: Mar 2, 2018

    change comment
@@ -48,9 +48,9 @@ public interface DataWriterFactory<T> extends Serializable {
  * same task id but different attempt number, which means there are multiple
  * tasks with the same task id running at the same time. Implementations can
  * use this attempt number to distinguish writers of different task attempts.
- * @param epochId An opaque identifier for the current subset of data being operated on. Intended
- *                for use in streaming contexts by implementations of
- *                {@link org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter}.
+ * @param epochId A monotonically increasing id for streaming queries that are split into discrete
+ *                periods of execution. For queries that execute as a single batch, this id will
+ *                always be zero.
*/
DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long epochId);
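For illustration (not part of the diff): a minimal sketch of a sink implementing the new signature. Only `DataWriterFactory`, `DataWriter`, and `WriterCommitMessage` come from the API under review; every other name here is hypothetical. The point it shows is that `(partitionId, epochId)` uniquely names a writer's slice of output, with `epochId` fixed at 0 in the batch case.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Hypothetical sink-side factory. It forwards the ids to the writer;
// (partitionId, epochId) uniquely names this writer's slice of output,
// and epochId is always 0 for a query that executes as a single batch.
public class EpochTaggedWriterFactory implements DataWriterFactory<Row> {

  @Override
  public DataWriter<Row> createDataWriter(int partitionId, int attemptNumber, long epochId) {
    return new EpochTaggedWriter(partitionId, epochId);
  }

  private static class EpochTaggedWriter implements DataWriter<Row> {
    private final int partitionId;
    private final long epochId;
    private final List<Row> buffer = new ArrayList<>();

    EpochTaggedWriter(int partitionId, long epochId) {
      this.partitionId = partitionId;
      this.epochId = epochId;
    }

    @Override
    public void write(Row record) throws IOException {
      buffer.add(record);
    }

    @Override
    public WriterCommitMessage commit() throws IOException {
      // Tag the buffered output with the epoch so a driver-side committer
      // can recognize and drop replays of an already-committed epoch.
      return new EpochCommitMessage(partitionId, epochId, buffer);
    }

    @Override
    public void abort() throws IOException {
      buffer.clear();
    }
  }

  // Hypothetical commit message carrying the epoch tag back to the driver.
  private static class EpochCommitMessage implements WriterCommitMessage {
    final int partitionId;
    final long epochId;
    final List<Row> records;

    EpochCommitMessage(int partitionId, long epochId, List<Row> records) {
      this.partitionId = partitionId;
      this.epochId = epochId;
      this.records = records;
    }
  }
}
```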
Contributor:

Add clear lifecycle semantics.

Contributor:

Why are we using the same interface for streaming and batch here? Is there a compelling reason to do so instead of adding StreamingWriterFactory? Are the guarantees for an epoch identical to those of a single batch job?

Contributor Author:

The guarantees are identical, and in the current execution model, each epoch is in fact processed by a single batch job.
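To make the "identical guarantees" point concrete: a minimal sketch, assuming the epoch-based `commit(long, WriterCommitMessage[])` and `abort(long, WriterCommitMessage[])` signatures of the `StreamWriter` referenced in the removed javadoc, of how a driver-side committer could treat each epoch like a batch job's commit. The deduplication logic, field names, and recovery story are assumptions for illustration, not this PR's implementation.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;
import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter;

// Hypothetical committer: because each epoch runs as a single batch job,
// committing an epoch atomically and ignoring replays of already-committed
// epochs gives streaming output the same guarantees as a batch write.
public class DedupingStreamWriter implements StreamWriter {
  // Assumed to be recovered from durable storage on restart.
  private long lastCommittedEpoch = -1L;

  @Override
  public DataWriterFactory<Row> createWriterFactory() {
    return new EpochTaggedWriterFactory(); // hypothetical factory from the sketch above
  }

  @Override
  public void commit(long epochId, WriterCommitMessage[] messages) {
    if (epochId <= lastCommittedEpoch) {
      return; // this epoch was already committed before a failure; skip the replay
    }
    // ... atomically publish the output described by `messages` ...
    lastCommittedEpoch = epochId;
  }

  @Override
  public void abort(long epochId, WriterCommitMessage[] messages) {
    // ... clean up any partial output for this epoch ...
  }
}
```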

}