Commit 9305cc7
[SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
### What changes were proposed in this pull request?
This PR fixes the issue described in https://issues.apache.org/jira/browse/SPARK-39084 where calling `df.rdd.isEmpty()` on a particular dataset could result in a JVM crash and/or executor failure.
The issue was caused by the Python iterator not being synchronised with the Java iterator: when the task completes, the Python iterator continues to process data. We introduced `ContextAwareIterator` as part of https://issues.apache.org/jira/browse/SPARK-33277, but we did not apply it in all of the places where it should be used.
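The idea behind a context-aware iterator can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not Spark's implementation: `FakeTaskContext`, `ContextAwareIter`, `is_empty`, and `run_demo` are hypothetical names, and the "task" is simulated by a flag. The wrapper checks the task state before every read, so once the task is marked complete it stops pulling from the underlying iterator even though that iterator still has data.

```python
from typing import Generic, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
_SENTINEL = object()  # marks "no element" without clashing with real values


class FakeTaskContext:
    """Hypothetical stand-in for a task context (not Spark's TaskContext)."""

    def __init__(self) -> None:
        self._completed = False

    def mark_completed(self) -> None:
        self._completed = True

    def is_completed(self) -> bool:
        return self._completed


class ContextAwareIter(Generic[T]):
    """Wraps an iterator and stops yielding once the task has completed."""

    def __init__(self, context: FakeTaskContext, delegate: Iterator[T]) -> None:
        self._context = context
        self._delegate = delegate

    def __iter__(self) -> "ContextAwareIter[T]":
        return self

    def __next__(self) -> T:
        # Check the task state before touching the underlying iterator.
        if self._context.is_completed():
            raise StopIteration
        return next(self._delegate)


def is_empty(context: FakeTaskContext, rows: Iterator[T]) -> bool:
    """Simulated isEmpty-style check: read at most one element, then finish the task."""
    first = next(rows, _SENTINEL)
    context.mark_completed()
    return first is _SENTINEL


def run_demo() -> Tuple[bool, List[int], List[int]]:
    context = FakeTaskContext()
    source = iter(range(5))
    rows = ContextAwareIter(context, source)
    empty = is_empty(context, rows)
    read_after_completion = list(rows)  # the wrapper refuses to read further
    left_in_source = list(source)       # the data is still there, just untouched
    return empty, read_after_completion, left_in_source
```

In this sketch, an unwrapped iterator would keep returning elements after `mark_completed()` is called, which is the kind of mismatch between consumer and producer that the fix guards against.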
### Why are the changes needed?
Fixes the JVM crash when checking isEmpty() on a dataset.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I added a test case that reproduces the issue 100% of the time. I confirmed that the test fails without the fix and passes with the fix.
Closes #36425 from sadikovi/fix-pyspark-iter-2.
Authored-by: Ivan Sadikov <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 6479455 commit 9305cc7
File tree
2 files changed: +38 -1 lines changed
- python/pyspark/sql/tests
- sql/core/src/main/scala/org/apache/spark/sql/execution/python
Lines changed: 36 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 25 | + |
| | 1180-1214 | + |
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 27 | + |
| 304 | | - |
| | 305 | + |