With this basic knowledge, let us understand the fault-tolerance semantics of Spark Streaming.

## Semantics with files as input source
{:.no_toc}
If all of the input data is already present in a fault-tolerant file system like
HDFS, Spark Streaming can always recover from any failure and process all of the data. This gives
*exactly-once* semantics, meaning all of the data will be processed exactly once no matter what fails.

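The guarantee above rests on the input being durable and the computation being deterministic: after a failure, the job can simply be re-run from the same input. As a toy illustration (plain Python, not Spark code; the batches below stand in for files in HDFS):

```python
# Toy illustration (not Spark code): when the input lives in durable,
# fault-tolerant storage, a failed computation can be re-run from the
# same input, so every batch appears exactly once in the final results.

def process(batch):
    """A deterministic transformation, like a Spark Streaming batch job."""
    return sum(batch)

durable_input = [[1, 2, 3], [4, 5, 6]]  # stands in for files in HDFS

# The first attempt "fails" midway; nothing is lost because the input
# is still intact in durable storage.
results = []
try:
    for i, batch in enumerate(durable_input):
        if i == 1:
            raise RuntimeError("simulated worker failure")
        results.append(process(batch))
except RuntimeError:
    # Recovery: recompute everything from the durable input. Because the
    # transformation is deterministic, recomputation is safe.
    results = [process(batch) for batch in durable_input]

assert results == [6, 15]  # each batch contributes exactly once
```

The key point is that recomputation is harmless here: since `process` is deterministic and the input never changed, re-running it produces the same answer.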
## Semantics with input sources based on receivers
{:.no_toc}
For input sources based on receivers, the fault-tolerance semantics depend on both the failure
scenario and the type of receiver.
As we discussed [earlier](#receiver-reliability), there are two types of receivers:

1. *Reliable Receiver* - These receivers acknowledge reliable sources only after ensuring that
  the received data has been replicated. If such a receiver fails,
  the buffered (unreplicated) data does not get acknowledged to the source. If the receiver is
  restarted, the source will resend the data, and therefore no data will be lost due to the failure.
1. *Unreliable Receiver* - Such receivers can lose data when they fail due to worker
  or driver failures.

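The reliable-receiver protocol can be sketched as a small simulation (plain Python, not the Spark receiver API; the class and method names below are illustrative): the source keeps each record buffered until it is acknowledged, and the receiver acknowledges only after replication, so a crash before acknowledgment just causes a resend.

```python
# Toy model of the reliable-receiver protocol described above.

class Source:
    """A reliable source that buffers records until they are acknowledged."""
    def __init__(self, records):
        self.unacked = list(records)   # records not yet acknowledged
    def pending(self):
        return list(self.unacked)      # everything unacknowledged gets resent
    def ack(self, record):
        self.unacked.remove(record)

class ReliableReceiver:
    def __init__(self, source):
        self.source = source
        self.replicated = []           # stands in for replicated blocks
    def run(self, fail_before_ack=False):
        for record in self.source.pending():
            self.replicated.append(record)   # replicate first ...
            if fail_before_ack:
                return                        # crash: nothing acknowledged
            self.source.ack(record)           # ... acknowledge only after

source = Source(["a", "b", "c"])
receiver = ReliableReceiver(source)
receiver.run(fail_before_ack=True)            # receiver fails before acking
assert source.pending() == ["a", "b", "c"]    # source still holds the data

restarted = ReliableReceiver(source)          # restart with a fresh receiver
restarted.run()                               # source resends everything
assert source.pending() == []                 # nothing lost
assert restarted.replicated == ["a", "b", "c"]
```

An unreliable receiver, by contrast, would have no `ack` step at all, so records buffered in the failed receiver would simply vanish.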
Depending on what type of receivers are used, we achieve the following semantics.
If a worker node fails, then there is no data loss with reliable receivers. With unreliable
receivers, data received but not replicated can get lost. If the driver node fails,
then besides these losses, all of the past data that was received and replicated in memory will be
lost. This will affect the results of the stateful transformations.

To avoid this loss of past received data, Spark 1.2 introduced an experimental feature of _write
ahead logs_, which saves the received data to fault-tolerant storage. With the [write ahead logs
enabled](#deploying-applications) and reliable receivers, there is zero data loss and
exactly-once semantics.

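The write-ahead-log idea itself is simple, and can be sketched in a few lines (plain Python, not Spark's implementation; the file layout and function names are illustrative): append each received record to durable storage before it is used, so that after a driver crash the data can be replayed from the log rather than lost.

```python
# Toy sketch of a write-ahead log: log durably first, process afterwards.
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")

def receive(record):
    # Append the record to durable storage *before* it is processed or
    # acknowledged; in Spark this storage would be fault-tolerant (e.g. HDFS).
    with open(log_path, "a") as log:
        log.write(record + "\n")

def recover():
    # After a crash, replay every record from the log.
    with open(log_path) as log:
        return [line.rstrip("\n") for line in log]

for record in ["a", "b", "c"]:
    receive(record)

# Simulated driver crash: all in-memory state is gone, but the log file
# survives, so the received data can be replayed and reprocessed.
recovered = recover()
assert recovered == ["a", "b", "c"]
```

This is why the combination of reliable receivers (no gaps on the receiving side) and write-ahead logs (no loss of already-received data) yields zero data loss.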
The following table summarizes the semantics under failures:

<table class="table">
  <tr>