@chipsenkbeil

As requested in SPARK-4923, I've provided a rough DeveloperApi for the repl. I've only done this for Scala 2.10 because the Scala 2.11 support does not appear to be implemented: the Scala 2.11 repl still has the old scala.tools.nsc package, and its SparkIMain does not appear to have the class server needed for shipping code over (unless this functionality has been moved elsewhere?). I also left alone the ExecutorClassLoader and ConstructorCleaner, as I have no experience working with those classes.

This marks the majority of methods in SparkIMain as private with a few special cases being private[repl] as other classes within the same package access them. Any public method has been marked with @DeveloperApi as suggested by @pwendell and I took the liberty of writing up a Scaladoc for each one to further elaborate their usage.

As the Scala 2.11 REPL conforms to JSR-223, the Spark Kernel uses the SparkIMain of Scala 2.10 in the same manner. So, I've taken care to expose methods predominantly related to the functionality needed for a JSR-223 scripting engine implementation:

  1. The ability to get variables from the interpreter (and other information like class/symbol/type)
  2. The ability to put variables into the interpreter
  3. The ability to compile code
  4. The ability to execute code
  5. The ability to get contextual information regarding the scripting environment
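The five capabilities above describe the shape of a JSR-223-style interpreter interface. As a rough sketch, the trait and toy implementation below are entirely hypothetical — the method names are invented for illustration and do not match the actual SparkIMain API:

```scala
// Hypothetical interface mirroring the five capabilities above.
// This is NOT the actual SparkIMain API; names and signatures are
// invented for illustration only.
trait ReplLike {
  def valueOf(id: String): Option[Any]   // 1. get variables out
  def bind(id: String, value: Any): Unit // 2. put variables in
  def compile(code: String): Boolean     // 3. compile code
  def interpret(code: String): String    // 4. execute code
  def boundNames: Seq[String]            // 5. contextual information
}

// Toy in-memory stand-in, just to make the shape concrete.
class ToyRepl extends ReplLike {
  private val env = scala.collection.mutable.LinkedHashMap.empty[String, Any]
  def valueOf(id: String): Option[Any] = env.get(id)
  def bind(id: String, value: Any): Unit = env(id) = value
  def compile(code: String): Boolean = code.trim.nonEmpty // pretend success
  def interpret(code: String): String = s"res: $code"     // pretend evaluation
  def boundNames: Seq[String] = env.keys.toSeq
}
```

A JSR-223 ScriptEngine wrapper would delegate its get/put/eval/compile operations to methods of this shape.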

Additional functionality that I exposed includes the following:

  1. The blocking initialization method (needed to actually start SparkIMain instance)
  2. The class server uri (needed to set the spark.repl.class.uri property after initialization), reduced from the entire class server
  3. The class output directory (beneficial for tools like ours that need to inspect and use the directory where class files are served)
  4. Suppression (quiet/silence) mechanics for output
  5. Ability to add a jar to the compile/runtime classpath
  6. The reset/close functionality
  7. Metric information (last variable assignment, "needed" for extracting results from last execution, real variable name for better debugging)
  8. Execution wrapper (useful to have, but debatable)
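Items 1 and 2 above together form the typical startup sequence for an embedding tool: block until the interpreter is ready, then publish the class server URI under spark.repl.class.uri so executors can fetch REPL-generated classes. A minimal sketch with a stand-in class instead of the real SparkIMain (the stub's member names mirror the functionality described above but are assumptions, not the real API, and the URI is a placeholder):

```scala
// Stand-in for SparkIMain; only the two members relevant to startup are
// sketched. The URI value is a placeholder, not a real class server.
class FakeIMain {
  private var ready = false
  def initializeSynchronous(): Unit = { ready = true } // blocking init (item 1)
  def classServerUri: String = {                       // class server URI (item 2)
    require(ready, "interpreter must be initialized first")
    "http://127.0.0.1:12345"
  }
}

val intp = new FakeIMain
intp.initializeSynchronous()
// Executors read this property to locate REPL-generated class files.
System.setProperty("spark.repl.class.uri", intp.classServerUri)
```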

Aside from SparkIMain, I updated other classes/traits and their methods in the repl package to be private or package protected where possible. A few odd cases (like SparkHelper living in the scala.tools.nsc package to expose a private variable) still exist, but I did my best to label them.

SparkCommandLine has proven useful to extract settings and SparkJLineCompletion has proven to be useful in implementing auto-completion in the Spark Kernel project. Other than those - and SparkIMain - my experience has yielded that other classes/methods are not necessary for interactive applications taking advantage of the REPL API.

Tested via the following:

$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test

Also did a quick verification that I could start the shell and execute some code:

$ ./bin/spark-shell
...

scala> val x = 3
x: Int = 3

scala> sc.parallelize(1 to 10).reduce(_+_)
...
res1: Int = 55

@AmplabJenkins

Can one of the admins verify this patch?

@chipsenkbeil chipsenkbeil changed the title [REPL][SPARK-4923] Add Developer API to REPL to allow re-publishing the REPL jar [SPARK-4923][REPL] Add Developer API to REPL to allow re-publishing the REPL jar Jan 14, 2015
Contributor

Should this be marked as a developer API also, if it's exposed?

Author

Oh, do you use the DeveloperApi for entire classes? Or are you saying that because the purpose I described for SparkCommandLine was to retrieve settings, it should be marked?

I didn't mark SparkILoop, SparkIMain, SparkHelper (forced to be public due to packaging), SparkJLineCompletion, or SparkCommandLine as DeveloperApi on the class level. I was assuming that internal markings of DeveloperApi conveyed that. I can go back and do that if that's the way things are normally done.

Or, I can just add it to SparkCommandLine since it is the only one without any internal DeveloperApi marks.

Contributor

Usually, out of an abundance of caution, we mark it on the class level as well, even if everything bytecode-exposed inside the class is also marked. We tend to err on the side of over-communication with this.

@pwendell
Contributor

Jenkins, test this please. Thanks a lot @rcsenkbeil for such a thorough treatment of this! LGTM in general, I made a minor comment.

@SparkQA

SparkQA commented Jan 15, 2015

Test build #25617 has started for PR 4034 at commit 6dc1ee2.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 15, 2015

Test build #25617 has finished for PR 4034 at commit 6dc1ee2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkILoop(
    • * @param id The id (variable name, method name, class name, etc) whose
    • * Retrieves the class representing the id (variable name, method name,
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @return Some containing term name (id) class if exists, else None
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @param id The id (variable name, method name, class name, etc) whose
    • * Retrieves the runtime class and type representing the id (variable name,
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @param id The id (variable name, method name, class name, etc) whose

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25617/

@pwendell
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25637 has started for PR 4034 at commit c1b88aa.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25637 has finished for PR 4034 at commit c1b88aa.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkILoop(
    • * @param id The id (variable name, method name, class name, etc) whose
    • * Retrieves the class representing the id (variable name, method name,
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @return Some containing term name (id) class if exists, else None
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @param id The id (variable name, method name, class name, etc) whose
    • * Retrieves the runtime class and type representing the id (variable name,
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @param id The id (variable name, method name, class name, etc) whose

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25637/

@chipsenkbeil
Author

Whoops, missed adding an import for the DeveloperApi on SparkCommandLine. Give me a minute to add it.

@chipsenkbeil
Author

Took a little longer on an old computer, but I made sure it built successfully on my end this time. Should hopefully be good to go now.

@pwendell
Contributor

Jenkins ok to test. Jenkins, test this please.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25652 has started for PR 4034 at commit 053ca75.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25652 has finished for PR 4034 at commit 053ca75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkILoop(
    • * @param id The id (variable name, method name, class name, etc) whose
    • * Retrieves the class representing the id (variable name, method name,
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @return Some containing term name (id) class if exists, else None
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @param id The id (variable name, method name, class name, etc) whose
    • * Retrieves the runtime class and type representing the id (variable name,
    • * @param id The id (variable name, method name, class name, etc) whose
    • * @param id The id (variable name, method name, class name, etc) whose

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25652/

@pwendell
Contributor

Thanks Chip - I will pull this in. After some more thought on this, I'm just going to pull this into master for 1.3+ and for 1.2 we'll just publish the original REPL with all the open permissions. My concern is giving people some time to adjust to the new locked down permissions before we ship it in a release.

@asfgit asfgit closed this in d05c9ee Jan 16, 2015
@chipsenkbeil
Author

@pwendell, that sounds like a good decision. Thanks for letting me know!
