Commit a4ef126

Restructure parts of the fault-tolerance section to read a bit nicer when skipping over the headings
1 parent 65f66cd commit a4ef126

1 file changed: +9 -8 lines changed

docs/streaming-programming-guide.md

Lines changed: 9 additions & 8 deletions
@@ -1837,34 +1837,35 @@ With this basic knowledge, let us understand the fault-tolerance semantics of Sp
 
 ## Semantics with files as input source
 {:.no_toc}
-In this case, since all the input data is already present in a fault-tolerant files system like
+If all of the input data is already present in a fault-tolerant files system like
 HDFS, Spark Streaming can always recover from any failure and process all the data. This gives
 *exactly-once* semantics, that all the data will be processed exactly once no matter what fails.
 
 ## Semantics with input sources based on receivers
 {:.no_toc}
-Here we will first discuss the semantics in the context of different types of failures. As we
-discussed [earlier](#receiver-reliability), there are two kinds of receivers.
+For input sources based on receivers, the fault-tolerance semantics depend on both the failure
+scenario and type of receiver.
+As we discussed [earlier](#receiver-reliability), there are two types of receivers:
 
 1. *Reliable Receiver* - These receivers acknowledge reliable sources only after ensuring that
 the received data has been replicated. If such a receiver fails,
 the buffered (unreplicated) data does not get acknowledged to the source. If the receiver is
-restarted, the source would resend the data, and so no data will be lost due to the failure.
+restarted, the source will resend the data, and therefore no data will be lost due to the failure.
 1. *Unreliable Receiver* - Such receivers can lose data when they fail due to worker
 or driver failures.
 
 Depending on what type of receivers are used we achieve the following semantics.
 If a worker node fails, then there is no data loss with reliable receivers. With unreliable
 receivers, data received but not replicated can get lost. If the driver node fails,
-then besides these losses, all the past data that were received and replicated in memory will be
+then besides these losses, all the past data that was received and replicated in memory will be
 lost. This will affect the results of the stateful transformations.
 
-To avoid this loss of past received data, Spark 1.2 introduces an experimental feature of write
-ahead logs, that saves the received data to a fault-tolerant storage. With the [write ahead logs
+To avoid this loss of past received data, Spark 1.2 introduces an experimental feature of _write
+ahead logs_ which saves the received data to fault-tolerant storage. With the [write ahead logs
 enabled](#deploying-applications) and reliable receivers, there is zero data loss and
 exactly-once semantics.
 
-The following table summarizes the semantics under failures.
+The following table summarizes the semantics under failures:
 
 <table class="table">
 <tr>
