docs/streaming-programming-guide.md: 89 additions & 0 deletions
@@ -1415,6 +1415,95 @@ Note that the connections in the pool should be lazily created on demand and tim
***

## Accumulators and Broadcast Variables

Accumulators and Broadcast variables cannot be recovered from checkpoint in Spark Streaming. If you enable checkpointing and also use Accumulators or Broadcast variables, you have to create lazily instantiated singleton instances for them so that they can be re-instantiated after the driver restarts on failure. This is shown in the following example.
<div class="codetabs">
<div data-lang="scala" markdown="1">
{% highlight scala %}

object WordBlacklist {

  @volatile private var instance: Broadcast[Seq[String]] = null

  // ... (remainder of the Scala example is collapsed in this diff view)

{% endhighlight %}

See the full [source code]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala).
</div>
<div data-lang="java" markdown="1">
{% highlight java %}

TODO

{% endhighlight %}

See the full [source code]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaRecoverableNetworkWordCount.java).
</div>
<div data-lang="python" markdown="1">
{% highlight python %}

TODO

{% endhighlight %}

See the full [source code]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/streaming/recoverable_network_wordcount.py).
</div>
</div>
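
The Scala listing above is collapsed in this diff view; the sketch below illustrates the shape of the pattern it demonstrates, assuming a `wordCounts` DStream of `(String, Int)` pairs as in the earlier [word count example](#a-quick-example). The word list and accumulator name are illustrative placeholders; see the linked `RecoverableNetworkWordCount` source for the authoritative version.

{% highlight scala %}
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.broadcast.Broadcast

object WordBlacklist {

  @volatile private var instance: Broadcast[Seq[String]] = null

  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          // Placeholder word list; broadcast at most once per driver (re)start.
          instance = sc.broadcast(Seq("a", "b", "c"))
        }
      }
    }
    instance
  }
}

object DroppedWordsCounter {

  @volatile private var instance: Accumulator[Long] = null

  def getInstance(sc: SparkContext): Accumulator[Long] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          // Placeholder accumulator name.
          instance = sc.accumulator(0L, "WordsInBlacklistCounter")
        }
      }
    }
    instance
  }
}

// Look the singletons up through each batch RDD's SparkContext, so they are
// re-created lazily after the application restarts from checkpoint.
wordCounts.foreachRDD { rdd =>
  val blacklist = WordBlacklist.getInstance(rdd.sparkContext)
  val droppedWordsCounter = DroppedWordsCounter.getInstance(rdd.sparkContext)
  // Drop blacklisted words and count how many occurrences were dropped;
  // collect() is the action that forces the filter to run.
  val counts = rdd.filter { case (word, count) =>
    if (blacklist.value.contains(word)) {
      droppedWordsCounter += count
      false
    } else {
      true
    }
  }.collect()
  println("Counts: " + counts.mkString("[", ", ", "]"))
}
{% endhighlight %}

The double-checked locking on a `@volatile` field ensures the Broadcast variable and Accumulator are created at most once per driver process, whether on first start or after recovery from checkpoint.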

***
## DataFrame and SQL Operations
You can easily use [DataFrames and SQL](sql-programming-guide.html) operations on streaming data. You have to create a SQLContext using the SparkContext that the StreamingContext is using. Furthermore, this has to be done such that it can be restarted on driver failures. This is done by creating a lazily instantiated singleton instance of SQLContext. This is shown in the following example. It modifies the earlier [word count example](#a-quick-example) to generate word counts using DataFrames and SQL. Each RDD is converted to a DataFrame, registered as a temporary table and then queried using SQL.
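
The example itself falls outside this hunk; below is a minimal sketch of such a singleton, assuming the 1.x-era `SQLContext` API. The getter is meant to be called from inside `foreachRDD`, so the instance is recreated lazily after a driver restart:

{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object SQLContextSingleton {

  @transient private var instance: SQLContext = _

  // foreachRDD runs its function on the driver, one batch at a time,
  // so no extra locking is needed for this simple lazy initialization.
  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}
{% endhighlight %}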