[SPARK-5063] More helpful error messages for several invalid operations #3884
Test build #25005 has finished for PR 3884 at commit
Will this work for broadcast variables as well? One thing I often see is users trying to directly broadcast an RDD without collecting it.
@sryza Good idea; I've added a new check which prevents RDDs from being broadcast directly. I should probably add these checks to PySpark, too. I'm not actually sure what happens if you try to do these invalid things in PySpark, so I should probably try them first and add their errors / stacktraces to the JIRA so that it's easier for me / the support team to pattern-match to this issue.
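The broadcast check under discussion can be illustrated with a toy sketch. This is plain Python, not Spark's actual implementation; the `Rdd` stand-in class, the `broadcast` function, and the message text are all illustrative assumptions:

```python
class Rdd:
    """Stand-in for a distributed dataset; illustrative only, not pyspark's RDD."""
    def __init__(self, data):
        self.data = data

    def collect(self):
        # In real Spark this pulls the distributed data back to the driver.
        return list(self.data)

def broadcast(value):
    """Toy version of the guard: refuse to broadcast an RDD directly."""
    if isinstance(value, Rdd):
        raise TypeError(
            "It appears that you are attempting to broadcast an RDD. "
            "Broadcast the collected data instead, e.g. "
            "sc.broadcast(rdd.collect()). See SPARK-5063.")
    # A real implementation would wrap and ship the value to executors.
    return value
```

With a guard like this, `broadcast(my_rdd)` fails fast with an explanation, while `broadcast(my_rdd.collect())` succeeds.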
Test build #25036 has finished for PR 3884 at commit
Haha, the
Pointing to a JIRA ticket might not be the most friendly way for users. Maybe make it more verbose and explain it in one or two lines?
Sure. How about this:
RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, `rdd1.map(x => rdd2.values.count() * x)` is invalid because the `values` transformation and `count` action cannot be performed inside of the `rdd1.map` transformation. For more information, see SPARK-5063.
Kind of verbose, but I think an example might be the clearest way to explain this, esp. to someone unfamiliar with the terminology.
It might be nice to keep the JIRA reference since it will make the exception easier to search for (I'm kind of inspired by React.js's error messages, which include URL-shortened links to the documentation).
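The mechanism behind this error message can be sketched in a few lines. This is a toy model in plain Python, not Spark's code; as described later in the thread, the real check detects that the deserialized RDD's context field is null on executors. The `MiniRdd` class and method names here are assumptions for illustration:

```python
class MiniRdd:
    """Toy model of how the nested-action check can work."""
    def __init__(self, sc, data):
        self.sc = sc      # set on the driver
        self.data = data

    def serialized_copy(self):
        # Mimics shipping the RDD into a task closure: the driver-side
        # context reference does not survive serialization.
        return MiniRdd(None, self.data)

    def count(self):
        if self.sc is None:
            # Instead of a NullPointerException, explain what went wrong.
            raise RuntimeError(
                "RDD transformations and actions can only be invoked by the "
                "driver, not inside of other transformations. See SPARK-5063.")
        return len(self.data)
```

On the driver, `count()` works; on a "shipped" copy (as inside another transformation), it raises the descriptive error instead of a null dereference.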
Looks good!
I've added some additional tests to prevent users from calling methods on a stopped SparkContext, since this usually resulted in confusing NullPointerExceptions.
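The stopped-context guard follows a common pattern, sketched below in plain Python. This is not Spark's actual code; the `MiniContext` class and the exception message are illustrative assumptions (the PR later settles on `IllegalStateException` on the Scala side):

```python
class MiniContext:
    """Toy sketch of the assertNotStopped() guard pattern."""
    def __init__(self):
        self._stopped = False

    def stop(self):
        self._stopped = True

    def assert_not_stopped(self):
        # Raising an explicit error here replaces the confusing
        # NullPointerException a user would otherwise hit when internal
        # fields have been nulled out by stop().
        if self._stopped:
            raise RuntimeError("Cannot call methods on a stopped SparkContext")

    def parallelize(self, data):
        self.assert_not_stopped()
        return list(data)
```

Methods that would dereference torn-down internals call the guard first, so the failure is immediate and self-explanatory.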
Test build #25084 has finished for PR 3884 at commit
Test build #25085 has finished for PR 3884 at commit
Jenkins, retest this please.
Test build #25114 has finished for PR 3884 at commit
Any opinions on the (Note: I still need to do the PySpark half of the "nested RDDs" and "nested actions" checks)
Maybe IllegalStateException?
Alright, I've updated this to use IllegalStateException when methods are called on a stopped SparkContext. I've also added some more helpful error messages to PySpark when users attempt to misuse SparkContext or RDDs from actions, transformations, or broadcast variables. I plan to merge this into master, then backport a smaller patch which excludes most of the
Test build #25426 has finished for PR 3884 at commit
Test build #25433 timed out for PR 3884 at commit
Test build #25441 has finished for PR 3884 at commit
Same here:
scala> sc.getRDDStorageInfo
org.apache.spark.SparkException: Error sending message as actor is null [message = GetStorageStatus]
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:178)
at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:152)
at org.apache.spark.SparkContext.getExecutorStorageStatus(SparkContext.scala:1068)
at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1052)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
at $iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC.<init>(<console>:20)
at $iwC.<init>(<console>:22)
at <init>(<console>:24)
at .<init>(<console>:28)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(N
I audited the uses of
Test build #25928 has finished for PR 3884 at commit
Alright, this is ready for a final review + commit if anyone wants to take a look.
Actually, maybe this check should go somewhere else, since I think that it might technically have been safe to create a broadcast variable with an RDD, even though doing anything with it would trigger errors.
I've changed this in my latest patch; we log a warning here and any errors are caught by the more general "display an error about RDD nesting if the sc field is null" check.
Test build #26026 has finished for PR 3884 at commit
python/pyspark/rdd.py
typo: "transforamtion"
Scala changes look ok to me; I'm not super familiar with the pyspark internals, but the check on
@vanzin Thanks for looking this over. The Python I'm going to fix the spelling error, take one final look over this, and commit it so we can get it into the first 1.2.1 RC. I saw a couple of mailing list questions yesterday that could have been prevented by this patch, which illustrates why I really want to get this into our next maintenance release.
This patch adds more helpful error messages for invalid programs that define nested RDDs, broadcast RDDs, perform actions inside of transformations (e.g. calling `count()` from inside of `map()`), and call certain methods on stopped SparkContexts. Currently, these invalid programs lead to confusing NullPointerExceptions at runtime and have been a major source of questions on the mailing list and StackOverflow.

In a few cases, I chose to log warnings instead of throwing exceptions in order to avoid any chance that this patch breaks programs that worked "by accident" in earlier Spark releases (e.g. programs that define nested RDDs but never run any jobs with them).

In SparkContext, the new `assertNotStopped()` method is used to check whether methods are being invoked on a stopped SparkContext. In some cases, user programs will not crash in spite of calling methods on stopped SparkContexts, so I've only added `assertNotStopped()` calls to methods that always throw exceptions when called on stopped contexts (e.g. by dereferencing a null `dagScheduler` pointer).

Author: Josh Rosen <[email protected]>

Closes #3884 from JoshRosen/SPARK-5063 and squashes the following commits:

a38774b [Josh Rosen] Fix spelling typo
a943e00 [Josh Rosen] Convert two exceptions into warnings in order to avoid breaking user programs in some edge-cases.
2d0d7f7 [Josh Rosen] Fix test to reflect 1.2.1 compatibility
3f0ea0c [Josh Rosen] Revert two unintentional formatting changes
8e5da69 [Josh Rosen] Remove assertNotStopped() calls for methods that were sometimes safe to call on stopped SC's in Spark 1.2
8cff41a [Josh Rosen] IllegalStateException fix
6ef68d0 [Josh Rosen] Fix Python line length issues.
9f6a0b8 [Josh Rosen] Add improved error messages to PySpark.
13afd0f [Josh Rosen] SparkException -> IllegalStateException
8d404f3 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5063
b39e041 [Josh Rosen] Fix BroadcastSuite test which broadcasted an RDD
99cc09f [Josh Rosen] Guard against calling methods on stopped SparkContexts.
34833e8 [Josh Rosen] Add more descriptive error message.
57cc8a1 [Josh Rosen] Add error message when directly broadcasting RDD.
15b2e6b [Josh Rosen] [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations

(cherry picked from commit cef1f09)
Signed-off-by: Josh Rosen <[email protected]>
Alright, I've merged this into
Test build #26037 has finished for PR 3884 at commit
Thank you @JoshRosen for working on usability issues like this.