Conversation

@JoshRosen
Contributor

This patch adds more helpful error messages for invalid programs that define nested RDDs, broadcast RDDs, perform actions inside of transformations (e.g. calling `count()` from inside of `map()`), and call certain methods on stopped SparkContexts. Currently, these invalid programs lead to confusing NullPointerExceptions at runtime and have been a major source of questions on the mailing list and StackOverflow.

In a few cases, I chose to log warnings instead of throwing exceptions in order to avoid any chance that this patch breaks programs that worked "by accident" in earlier Spark releases (e.g. programs that define nested RDDs but never run any jobs with them).

In SparkContext, the new `assertNotStopped()` method is used to check whether methods are being invoked on a stopped SparkContext. In some cases, user programs will not crash in spite of calling methods on stopped SparkContexts, so I've only added `assertNotStopped()` calls to methods that always throw exceptions when called on stopped contexts (e.g. by dereferencing a null `dagScheduler` pointer).
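
The guard pattern being described can be sketched in plain Python (this is illustrative only, not Spark's actual Scala implementation; the class and method names here are invented for the example):

```python
# Illustrative sketch of the assertNotStopped() guard pattern described
# above. NOT Spark's real implementation; names are made up.

class GuardedContext:
    """A context object that fails fast once it has been stopped."""

    def __init__(self):
        self._stopped = False

    def stop(self):
        self._stopped = True

    def _assert_not_stopped(self):
        # Raise a descriptive error up front rather than letting a later
        # null dereference surface as a confusing NullPointerException.
        if self._stopped:
            raise RuntimeError("Cannot call methods on a stopped SparkContext")

    def parallelize(self, data):
        self._assert_not_stopped()
        return list(data)
```

The key design choice is that the guard only goes into methods that would otherwise fail anyway, so it converts a confusing crash into a clear one without rejecting programs that previously worked.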

@SparkQA

SparkQA commented Jan 3, 2015

Test build #25005 has finished for PR 3884 at commit 15b2e6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sryza
Contributor

sryza commented Jan 4, 2015

Will this work for broadcast variables as well? One thing I often see is users trying to directly broadcast an RDD without collecting it.

@JoshRosen
Contributor Author

@sryza Good idea; I've added a new check which prevents RDDs from being directly broadcasted.
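
A rough sketch of the kind of check being added (hypothetical Python; `FakeRDD` and `broadcast` below are stand-ins for this example, not PySpark's actual API):

```python
# Hypothetical sketch of rejecting a direct broadcast of an RDD.
# FakeRDD and broadcast() are stand-ins, not real Spark classes.

class FakeRDD:
    def __init__(self, data):
        self._data = data

    def collect(self):
        # In real Spark, collect() returns the RDD's elements to the driver.
        return list(self._data)

def broadcast(value):
    # Broadcasting an RDD itself is almost always a user error, so fail
    # with a descriptive message instead of a confusing runtime crash.
    if isinstance(value, FakeRDD):
        raise TypeError(
            "Cannot broadcast an RDD directly; call collect() first and "
            "broadcast the resulting local collection")
    return value
```

The valid pattern is then to broadcast the collected result, e.g. `broadcast(rdd.collect())`.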

I should probably add these checks to PySpark, too. I'm not actually sure what happens if you try to do these invalid things in PySpark, so I should probably try them first and add their errors / stacktraces to the JIRA so that it's easier for me / the support team to pattern-match to this issue.

@JoshRosen JoshRosen changed the title [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations [SPARK-5063] Useful error messages for nested RDDs, broadcasted RDDs, and actions inside of transformations Jan 4, 2015
@SparkQA

SparkQA commented Jan 4, 2015

Test build #25036 has finished for PR 3884 at commit 57cc8a1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Haha, the org.apache.spark.broadcast.BroadcastSuite test "Using broadcast after destroy prints callsite" actually broadcasts an RDD (which is invalid), which is what caused that test failure. I'll fix this up in my next commit.

Contributor

Pointing to a JIRA ticket might not be the most friendly way for users. Maybe make it more verbose and explain it in one or two lines?

Contributor Author

Sure. How about this:

RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

Kind of verbose, but I think an example might be the clearest way to explain this, esp. to someone unfamiliar with the terminology.

It might be nice to keep the JIRA reference since it will make the exception easier to search for (I'm kind of inspired by React.js's error messages, which include URL-shortened links to the documentation).

Contributor

Looks good!

@JoshRosen
Contributor Author

I've added some additional tests to prevent users from calling methods on a stopped SparkContext, since this usually resulted in confusing NullPointerExceptions.

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25084 has finished for PR 3884 at commit 99cc09f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25085 has finished for PR 3884 at commit b39e041.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25114 has finished for PR 3884 at commit b39e041.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Any opinions on the assertNotStopped() checks here? I'd like to backport this patch to other branches since I think it's a huge usability improvement. If there are any changes here that you think might break user programs that used to work, then I'll remove them and re-add them in a separate PR.

(Note: I still need to do the PySpark half of the "nested RDDs" and "nested actions" checks)

@rxin
Contributor

rxin commented Jan 7, 2015

Maybe IllegalStateException?

@JoshRosen
Contributor Author

Alright, I've updated this to use IllegalStateException when methods are called on a stopped SparkContext. I've also added some more helpful error messages to PySpark when users attempt to mis-use SparkContext or RDDs from actions, transformations, or broadcast variables.

I plan to merge this into master, then backport a smaller patch which excludes most of the assertNotStopped() calls.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25426 has finished for PR 3884 at commit 9f6a0b8.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25433 timed out for PR 3884 at commit 6ef68d0 after a configured wait of 120m.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25441 has finished for PR 3884 at commit 8cff41a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

Same here:

scala> sc.getRDDStorageInfo
org.apache.spark.SparkException: Error sending message as actor is null [message = GetStorageStatus]
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:178)
    at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
    at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:152)
    at org.apache.spark.SparkContext.getExecutorStorageStatus(SparkContext.scala:1068)
    at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1052)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC.<init>(<console>:20)
    at $iwC.<init>(<console>:22)
    at <init>(<console>:24)
    at .<init>(<console>:28)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(N

@JoshRosen
Contributor Author

I audited the uses of assertNotStopped and removed a bunch of calls in methods that sometimes didn't throw exceptions on Spark 1.2.0. Pending Jenkins, I'm planning to commit this slightly smaller patch to branch-1.2 for inclusion in 1.2.1.

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25928 has finished for PR 3884 at commit 2d0d7f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen JoshRosen changed the title [SPARK-5063] Useful error messages for nested RDDs, broadcasted RDDs, and actions inside of transformations [SPARK-5063] More helpful error messages for several invalid operations Jan 22, 2015
@JoshRosen
Contributor Author

Alright, this is ready for a final review + commit if anyone wants to take a look.

Contributor Author

Actually, maybe this check should go somewhere else, since I think that it might technically have been safe to create a broadcast variable with an RDD, even though doing anything with it would trigger errors.

Contributor Author

I've changed this in my latest patch; we log a warning here and any errors are caught by the more general "display an error about RDD nesting if the sc field is null" check.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #26026 has finished for PR 3884 at commit a943e00.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

typo: "transforamtion"

@vanzin
Contributor

vanzin commented Jan 23, 2015

Scala changes look ok to me; I'm not super familiar with the pyspark internals, but the check in rdd.py surprised me because I thought RDDs were actually serialized, at least on the Scala side.

@JoshRosen
Contributor Author

@vanzin Thanks for looking this over. The Python RDD objects themselves are never actually serialized and are used internally in a way that's slightly different from Scala/Java Spark. In the existing code, any attempt to serialize instances of those Python classes throws an exception in the `__getnewargs__` method, which is why I was able to add new exceptions there.
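
The mechanism Josh describes can be sketched with a plain Python class (illustrative only; the class name and message below paraphrase the idea rather than quoting PySpark's actual code):

```python
import pickle

# Sketch of the serialization hook described above: defining
# __getnewargs__ to raise means any attempt to pickle the object (e.g.
# when it is accidentally captured in a task closure) fails with a
# descriptive error instead of an opaque one. Names are illustrative.

class NonSerializableRDD:
    def __getnewargs__(self):
        # Called by pickle (protocol 2+) when serializing this object.
        raise RuntimeError(
            "It appears that you are attempting to reference an RDD from "
            "an action, transformation, or broadcast variable. "
            "See SPARK-5063.")
```

Because pickle consults `__getnewargs__` during serialization, `pickle.dumps(NonSerializableRDD())` raises the descriptive error immediately, which is the behavior the PySpark check relies on.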

I'm going to fix the spelling error, take one final look over this, and commit it so we can get it into the first 1.2.1 RC. I saw a couple of mailing list questions yesterday that could have been prevented by this patch, which illustrates why I really want to get this into our next maintenance release.

asfgit pushed a commit that referenced this pull request Jan 24, 2015
Author: Josh Rosen <[email protected]>

Closes #3884 from JoshRosen/SPARK-5063 and squashes the following commits:

a38774b [Josh Rosen] Fix spelling typo
a943e00 [Josh Rosen] Convert two exceptions into warnings in order to avoid breaking user programs in some edge-cases.
2d0d7f7 [Josh Rosen] Fix test to reflect 1.2.1 compatibility
3f0ea0c [Josh Rosen] Revert two unintentional formatting changes
8e5da69 [Josh Rosen] Remove assertNotStopped() calls for methods that were sometimes safe to call on stopped SC's in Spark 1.2
8cff41a [Josh Rosen] IllegalStateException fix
6ef68d0 [Josh Rosen] Fix Python line length issues.
9f6a0b8 [Josh Rosen] Add improved error messages to PySpark.
13afd0f [Josh Rosen] SparkException -> IllegalStateException
8d404f3 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5063
b39e041 [Josh Rosen] Fix BroadcastSuite test which broadcasted an RDD
99cc09f [Josh Rosen] Guard against calling methods on stopped SparkContexts.
34833e8 [Josh Rosen] Add more descriptive error message.
57cc8a1 [Josh Rosen] Add error message when directly broadcasting RDD.
15b2e6b [Josh Rosen] [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations

(cherry picked from commit cef1f09)
Signed-off-by: Josh Rosen <[email protected]>
@asfgit asfgit closed this in cef1f09 Jan 24, 2015
@JoshRosen
Contributor Author

Alright, I've merged this into master (1.3.0) and branch-1.2 (1.2.1).

@JoshRosen JoshRosen deleted the SPARK-5063 branch January 24, 2015 01:54
@SparkQA

SparkQA commented Jan 24, 2015

Test build #26037 has finished for PR 3884 at commit a38774b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor

Thank you @JoshRosen for working on usability issues like this.
