Clarify docstring for foreachPartition
Because `foreachPartition` is implemented on top of `mapPartitions`, which requires a function that maps partitions to partitions, the function passed to `foreachPartition` must be a generator function or return an iterable (although these results are discarded).

This is currently not stated in the documentation except through the unexplained example. Stating it would help users understand that example and avoid wasting time on this error:

```
TypeError: 'NoneType' object is not iterable
```
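A minimal sketch of the failure mode, written in plain Python so it runs without Spark. `run_foreach_partition` and `tolerate_none` are hypothetical stand-ins invented for illustration, not PySpark APIs; the sketch only assumes, as described above, that the result of `f` is iterated over and then discarded.

```python
def run_foreach_partition(partitions, f):
    # Hypothetical stand-in for the behaviour described above: the result
    # of f is iterated over and its items are thrown away.
    for partition in partitions:
        for _ in f(iter(partition)):
            pass


def tolerate_none(f):
    # Hypothetical helper: substitute an empty iterator when f returns None.
    def wrapped(iterator):
        result = f(iterator)
        return iter(()) if result is None else result
    return wrapped


seen = []


def record(iterator):
    # Plain function: consumes the partition and implicitly returns None.
    for x in iterator:
        seen.append(x)


def record_gen(iterator):
    # Generator function: the trailing yield makes the result iterable.
    for x in iterator:
        seen.append(x)
    yield None


partitions = [[1, 2], [3]]

# A function that returns None fails as soon as its result is iterated.
try:
    run_foreach_partition(partitions, record)
except TypeError as exc:
    error_message = str(exc)

# The generator version processes every element without error.
seen.clear()
run_foreach_partition(partitions, record_gen)
gen_seen = list(seen)

# Wrapping the plain function also works, without changing its body.
seen.clear()
run_foreach_partition(partitions, tolerate_none(record))
wrapped_seen = list(seen)
```

In this sketch the plain function fails only after it has already consumed its first partition, which is part of why the error can be confusing: the side effects happen, and the `TypeError` is raised afterwards.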
tdhopper committed Oct 22, 2014
commit e22f753ebcd5c7f44f9699481ad26d2263290097
5 changes: 5 additions & 0 deletions python/pyspark/rdd.py
@@ -634,6 +634,11 @@ def processPartition(iterator):
def foreachPartition(self, f):
"""
Applies a function to each partition of this RDD.


Note: Due to implementation, f must either return an iterable object
or be a generator function. However, foreachPartition always returns
`None`.

>>> def f(iterator):
... for x in iterator: