Skip to content

Conversation

@ksakellis
Copy link

Adds onExecutorAdded and onExecutorRemoved events to the SparkListener. This will allow a client to get notified when an executor has been added/removed and provide additional information such as how many vcores it is consuming.

In addition, this commit adds a SparkListenerAdapter to the Java API that provides default implementations to the SparkListener. This is to get around the fact that default implementations for traits don't work in Java. Having Java clients extend SparkListenerAdapter moving forward will prevent breakage in java when we add new events to SparkListener.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24494 has started for PR 3711 at commit b1f715d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24494 has finished for PR 3711 at commit b1f715d.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class SparkListenerAdapter implements SparkListener
    • case class SparkListenerExecutorAdded(executorId: String, executorInfo : ExecutorInfo)
    • case class SparkListenerExecutorRemoved(executorId: String, executorInfo : ExecutorInfo)
    • class ExecutorInfo(

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24494/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24496 has started for PR 3711 at commit ab2575f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 16, 2014

Test build #24496 has finished for PR 3711 at commit ab2575f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class SparkListenerAdapter implements SparkListener
    • case class SparkListenerExecutorAdded(executorId: String, executorInfo : ExecutorInfo)
    • case class SparkListenerExecutorRemoved(executorId: String, executorInfo : ExecutorInfo)
    • class ExecutorInfo(

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24496/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this class? Why not just store it in the events themselves?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was so that we can keep the ExeuctorInfo for the internal akka message and the SparkListener in sync. The parent class, ExecutorInfo, contains all the info we want to share with the SparkListener and others and the ExecutorData child class add additional information like the akka address that should not be exposed. Adding this info to the events themselves is possible but we will lead to duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that this should be a trait/interface rather than a class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'll second @andrewor14's comment from the other PR: I think that the ExecutorInfo name might be potentially confusing since we already have an internal ExecutorInfo class. So, if you think of a better name maybe we could change it.

@andrewor14 Regarding whether we should have a separate class or not, you could imagine that the web UI might want to store some information on executors for later display / lookup. By storing the executor info outside of the message, we can just store the ExecutorInfo class. If the fields were part of the event, then we'd have to store the events themselves or re-pack the data into some new class that would be essentially the same as this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshRosen yes, we should probably make it a trait/interface instead of a class. I'll try and pick a better name than ExecutorInfo. I was trying to follow the TaskInfo pattern but yes it might be confusing since there is another class named ExecutorInfo.

@JoshRosen
Copy link
Contributor

Hi @ksakellis,

This PR has some nice additions and looks pretty good overall. In particular, I like the addition of the SparkListenerAdapter class, since it's a nice way of shielding Java users from binary incompatbility when we add things to SparkListener (this isn't an issue for Scala implementors of SparkListener, since they inherit default implementations from the trait). Good call on making it non-abstract, by the way.

I have one main piece of feedback: I think we need to add these new events to JSONProtocol / JSONProtocolSuite so that they're properly persisted in the event log. I think there's a bunch of examples of this in the code, so this should be fairly straightforward. You may end up having to write JSON serializers for ExecutorInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-iterate a style comment from Andrew: our style is to not place spaces before :, so this should be

executorInfo: ExecutorInfo

@ksakellis
Copy link
Author

@JoshRosen I actually couldn't make ExecutorInfo (now ExecutorDetails) a trait because the json protocol stuff requires a concrete object to deserialize events. The other option is to keep ExecutorDetails a trait and create a third object as a concrete implementation but that seemed to add unnecessary complexity.

@SparkQA
Copy link

SparkQA commented Jan 5, 2015

Test build #25060 has started for PR 3711 at commit 6e06a79.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 5, 2015

Test build #25061 has started for PR 3711 at commit 7b64ea0.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 5, 2015

Test build #25060 has finished for PR 3711 at commit 6e06a79.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class SparkListenerAdapter implements SparkListener
    • case class SparkListenerExecutorAdded(executorId: String, executorDetails: ExecutorDetails)
    • case class SparkListenerExecutorRemoved(executorId: String, executorDetails: ExecutorDetails)
    • class ExecutorDetails(

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25060/
Test FAILed.

@JoshRosen
Copy link
Contributor

This looks like a legitimate test failure. Ther AMPLab webserver is having some issues today, so here's a different link to reach the same test result: https://hadrian.ist.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25060/testReport/

@ksakellis
Copy link
Author

Yes it is. Not sure why I changed the #of cores between the two commits in the unit test - weird. Anyways. it has been fixed.

@SparkQA
Copy link

SparkQA commented Jan 5, 2015

Test build #25065 has started for PR 3711 at commit 776d743.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 5, 2015

Test build #25061 has finished for PR 3711 at commit 7b64ea0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class SparkListenerAdapter implements SparkListener
    • case class SparkListenerExecutorAdded(executorId: String, executorDetails: ExecutorDetails)
    • case class SparkListenerExecutorRemoved(executorId: String, executorDetails: ExecutorDetails)
    • class ExecutorDetails(

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25061/
Test FAILed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about calling this JavaSparkListener - that's what we've tended to use in the past for things that were supposed to be "drop in" substitutes for Scala classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on JavaSparkListener. It's weird for the user to extend an "adapter"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for completeness (since I suggested the original name), listener/adapter is used extensively in the Java API (see java.awt.event, for example). An adapter is just a default implementation of a listener in Java-ese.

@SparkQA
Copy link

SparkQA commented Jan 5, 2015

Test build #25065 has finished for PR 3711 at commit 776d743.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class SparkListenerAdapter implements SparkListener
    • case class SparkListenerExecutorAdded(executorId: String, executorDetails: ExecutorDetails)
    • case class SparkListenerExecutorRemoved(executorId: String, executorDetails: ExecutorDetails)
    • class ExecutorDetails(

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25065/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the executor removed event need to include executor details? The listener already listens for executor added events, which contain the same details

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the event more self describing. Otherwise you'd need to also listen on the executorAdded event, have a hashmap then look it up in the executor removed method to find the details.

That being said, I don't really feel that strongly so we can remove the executorDetails structure if you are concerned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just kind of weird that you can't remove an executor without having full information about it. I think it's reasonable to expect the caller to listen for both the add and remove events.

@andrewor14
Copy link
Contributor

Left a few relatively minor comments. Otherwise this LGTM.

Kostas Sakellis added 5 commits January 7, 2015 16:07
Adds onExecutorAdded and onExecutorRemoved events to the
SparkListener. This will allow a client to get notified when an
executor has been added/removed and provide additional information
such as how many vcores it is consuming.

In addition, this commit adds a SparkListenerAdapter to the
Java API that provides default implementations to the SparkListener.
This is to get around the fact that default implementations for
traits don't work in Java. Having Java clients extend
SparkListenerAdapter moving forward will prevent breakage in java
when we add new events to SparkListener.
We want to persist the executor added/removed events.
Renamed ExecutorInfo to ExecutorDetails to avoid
conflicting name. Also additional CR feedback.
This is to support a Mesos fine grained scheduler.
@SparkQA
Copy link

SparkQA commented Jan 8, 2015

Test build #25188 has started for PR 3711 at commit 946d2c5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 8, 2015

Test build #25188 has finished for PR 3711 at commit 946d2c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25188/
Test PASSed.

@andrewor14
Copy link
Contributor

Ok, latest changes look fine to me. Any other comments @pwendell @JoshRosen?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making this into an integration test (the local-cluster mode is pretty expensive) could you modify LocalBackend to send an executor added event? Then you can still test with local mode, and also there will be less of an overall difference between the local and the cluster modes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can, but then that is not really testing much is it. The point of this test was to test that we are getting the onExecutorAdded event when an executor is added to the cluster. Modifying the LocalBacked to publish these events will greatly(in my opinion) reduce the usefulness of the test. We'd only be testing some of the SparkListener plumbing and not the actual creation of the events.

@pwendell
Copy link
Contributor

LGTM - just had a minor comment that can also be addressed on merge. Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Jan 12, 2015

Test build #25406 has started for PR 3711 at commit 946d2c5.

  • This patch merges cleanly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we are doing this here instead of in "executorLost" method of DAGScheduler.scala ? (similarly for SparkListenerExecutorAdded event above)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksakellis that seems like a good idea actually.

@SparkQA
Copy link

SparkQA commented Jan 12, 2015

Test build #25406 has finished for PR 3711 at commit 946d2c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25406/
Test PASSed.

@pwendell
Copy link
Contributor

Actually, why not add these in the DAG Scheduler per @nitin2goyal's suggestion.

@ksakellis
Copy link
Author

@pwendell @nitin2goyal I had previously thought of adding it to the DAGScheduler - this would avoid the duplicated code between the CoarseGrainedSchedulerBackend and MesosSchedulerBackend. The reason I did not go with this approach is two fold:
a) Not enough information is available in the the DAGScheduler (don't know the core count). This is more minor and can be added to the callbacks.
b) More importantly, the code path from CoarseGrainedScheduler.resourceOffers->TaskSchedulerImpl.resourceOffers->DAGScheduler.executorAdded happens before the executor actually registers with the scheduler backed. I wanted the events to be published after the executor registers via the Akka event.

The consequence is that the events are generated at a lower level but I think they are more accurate. Thoughts? I'd really like to get this merged so that it can unblock: #3486 (Yarn links in UI) which i think is pretty important for usability.

@pwendell
Copy link
Contributor

Okay - let's do this as-is and then we can consider changing it later. It's purely an internal API.

@pwendell
Copy link
Contributor

I believe apache git is down right now, but this is ready to merge from my perspective.

@asfgit asfgit closed this in 96c2c71 Jan 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants