[SPARK-23820][CORE] Enable use of long form of callsite in logs #21433
|
Could you please add the description for this PR? |
|
Done! |
|
Although maybe not widely used, I could see allowing control of this via an undocumented param |
|
Test build #4200 has finished for PR 21433 at commit
|
9800d2e to 245181a
|
Rebased on top of master. The failing test is unrelated. |
|
Test build #4203 has finished for PR 21433 at commit
|
|
@srowen Yes, I don't expect it will be widely used but I've personally found it helpful in some performance debugging and it's a fairly low impact change. I was just hoping to avoid having to keep applying this patch and doing my own build of Spark in the future :) |
|
Test build #4205 has finished for PR 21433 at commit
|
|
Test build #4206 has finished for PR 21433 at commit
|
|
Merged to master |
|
Thanks @srowen! |
  val parentIds = rdd.dependencies.map(_.rdd.id)
+ val callSite = callsiteForm match {
+   case "short" => rdd.creationSite.shortForm
+   case "long" => rdd.creationSite.longForm
If the user's input is neither short nor long, we will get an exception, right?
Yeah, usually we would define an enum and verify that the given config value is valid.
For this particular case, I think we can just define a boolean flag: spark.eventLog.callsite.longForm.enabled.
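The failure mode and the enum-style validation discussed in this thread can be sketched without any Spark dependency. This is only an illustration: `CallSite` is a local stand-in for the two-field value behind `rdd.creationSite`, and `Form`, `parseForm`, and `select` are hypothetical names, not part of Spark's config API.

```scala
object CallSiteFormSketch {
  // Local stand-in for the value behind rdd.creationSite (assumption).
  final case class CallSite(shortForm: String, longForm: String)

  // Mirrors the match in the diff: there is no default case, so any
  // value other than "short" or "long" throws scala.MatchError.
  def unchecked(form: String, site: CallSite): String = form match {
    case "short" => site.shortForm
    case "long"  => site.longForm
  }

  // Enum-style alternative: validate once, when the setting is read.
  sealed trait Form
  case object ShortForm extends Form
  case object LongForm extends Form

  // Hypothetical helper: returns an error message instead of throwing.
  def parseForm(value: String): Either[String, Form] = value match {
    case "short" => Right(ShortForm)
    case "long"  => Right(LongForm)
    case other   => Left(s"Invalid callsite form '$other'; expected 'short' or 'long'")
  }

  // After validation the match is exhaustive and cannot fail.
  def select(form: Form, site: CallSite): String = form match {
    case ShortForm => site.shortForm
    case LongForm  => site.longForm
  }
}
```

With a sealed trait the compiler can also warn if a new form is added but not handled in `select`.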
      ConfigBuilder("spark.eventLog.overwrite").booleanConf.createWithDefault(false)

+   private[spark] val EVENT_LOG_CALLSITE_FORM =
+     ConfigBuilder("spark.eventLog.callsite").stringConf.createWithDefault("short")
short is defined? Where is the test case? Why is this not documented?
Not sure whether we should introduce a conf here. cc @rxin
Ah yeah, this should have been documented; that is a good point. I should have looked more carefully at the configs. The other eventLog configs aren't internal either.
I agree this could also be a boolean, or it could simply treat anything other than "long" as "short".
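The lenient alternative mentioned here, where unrecognized values fall back to the short form, might look like the following sketch. `CallSite` is again a local stand-in for the value behind `rdd.creationSite`, and `select` is a hypothetical helper.

```scala
object CallSiteFallbackSketch {
  // Local stand-in for the value behind rdd.creationSite (assumption).
  final case class CallSite(shortForm: String, longForm: String)

  // Only "long" selects the long form; every other value, including
  // typos such as "Long", silently falls back to the short form.
  def select(form: String, site: CallSite): String = form match {
    case "long" => site.longForm
    case _      => site.shortForm
  }
}
```

The trade-off is that a misspelled value never raises an error, so a user may not notice that the setting had no effect.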
+ }
  new RDDInfo(rdd.id, rddName, rdd.partitions.length,
-   rdd.getStorageLevel, parentIds, rdd.creationSite.shortForm, rdd.scope)
+   rdd.getStorageLevel, parentIds, callSite, rdd.scope)
This sounds like a general issue. Why do we only apply it in RDDInfo?
@gatorsmile As far as I am aware, RDDInfo is the only place the call site is included in the event log.
|
@gatorsmile @cloud-fan I'll just go with a boolean config as there really is no need for more than two options and this simplifies things quite a bit. |
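A rough sketch of the boolean variant, to show how it removes the invalid-value case. It uses a plain `Map` in place of Spark's configuration object, `CallSite` is a local stand-in for the value behind `rdd.creationSite`, and the key is the one suggested earlier in this review; the name in the final patch may differ.

```scala
object CallSiteBooleanSketch {
  // Local stand-in for the value behind rdd.creationSite (assumption).
  final case class CallSite(shortForm: String, longForm: String)

  // Key as suggested in the review thread; not necessarily the final name.
  val LongFormKey = "spark.eventLog.callsite.longForm.enabled"

  // Hypothetical helper: an absent key defaults to the short form.
  // A value that is not "true" or "false" makes toBoolean throw
  // IllegalArgumentException, so bad input fails loudly.
  def select(settings: Map[String, String], site: CallSite): String = {
    val useLongForm = settings.get(LongFormKey).exists(_.toBoolean)
    if (useLongForm) site.longForm else site.shortForm
  }
}
```

With only two states there is nothing left to enumerate, which is why the boolean simplifies both the config definition and the call site selection.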
|
@michaelmior Since Spark 2.4 is branch cut, this PR still needs more review. I would revert this PR from branch 2.4 and master first. We can discuss the conf and implementation in the master branch. The preferred conf name is |
srowen left a comment
I think it's easy enough to update this to a boolean and add a doc and pick it into branch-2.4; it's not imminent as there are still about 30-40 open issues. Still, it's not vital enough that it must be included.
|
Given the lack of certainty, the fact that this is small and easy to add back in a different form, and the fact that 2.4 is quickly teeing up, let me revert this for now. We can proceed with a different approach in a new PR. |
|
Yea we can add this back easily.
…On Tue, Sep 11, 2018 at 12:50 PM Sean Owen ***@***.***> wrote:
Given lack of certainty, and that this is small and easy to add back in
a different form, and the fact that 2.4 is quickly teeing up, let me revert
this for now. We can proceed with a different approach in a new PR.
|
This is a rework of #21433 to address some concerns there. Closes #22398 from michaelmior/long-callsite2. Authored-by: Michael Mior <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit ab25c96) Signed-off-by: Wenchen Fan <[email protected]>
This adds an option to event logging to include the long form of the callsite instead of the short form.