-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26443][CORE] Use ConfigEntry for hardcoded configs for history category. #23384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26443][CORE] Use ConfigEntry for hardcoded configs for history category. #23384
Conversation
|
Test build #100448 has finished for PR 23384 at commit
|
|
Jenkins, retest this please. |
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. Adding @vanzin
|
Test build #100450 has finished for PR 23384 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
| val conf = new SparkConf() | ||
| .set("spark.history.fs.logDirectory", logDir) | ||
| .set("spark.history.fs.update.interval", "0") | ||
| .set(EVENT_LOG_DIR, logDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EVENT_LOG_DIR.key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use the config entry as a key directly except for configs the string expression of which is more readable like timeConf or bytesConf. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I am fine with both, this was rather a nit, I just meant to have one notation instead of both them if possible, in order to avoid confusion. But if you prefer this way, I am fine with it, thanks.
| stop() | ||
| val myConf = new SparkConf() | ||
| .set("spark.history.fs.logDirectory", logDir.getAbsolutePath) | ||
| .set(EVENT_LOG_DIR, logDir.getAbsolutePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| private val conf = new SparkConf() | ||
| .set("spark.history.fs.logDirectory", logDir.getAbsolutePath) | ||
| .set("spark.history.fs.updateInterval", "1") | ||
| .set(EVENT_LOG_DIR, logDir.getAbsolutePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: .key?
| .set("spark.history.fs.logDirectory", logDir.getAbsolutePath) | ||
| .set("spark.history.fs.updateInterval", "1") | ||
| .set(EVENT_LOG_DIR, logDir.getAbsolutePath) | ||
| .set(UPDATE_INTERVAL_S, 1L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| val testConf = new SparkConf() | ||
| testConf.set("spark.history.fs.logDirectory", | ||
| Utils.createTempDir(namePrefix = "eventLog").getAbsolutePath()) | ||
| testConf.set(EVENT_LOG_DIR, Utils.createTempDir(namePrefix = "eventLog").getAbsolutePath()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| | | ||
| | spark.history.fs.logDirectory Directory where app logs are stored | ||
| | (default: file:/tmp/spark-events) | ||
| | spark.history.fs.updateInterval How often to reload log data from storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to change this? Seems this pattern is quite used too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like spark.history.fs.updateInterval is deprecated, but let me revert it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, it is good to remove the deprecated one as many as possible in Spark 3.0. When I searched this string during reviews, there were only two instances; here and deprecation checking test case. @mgaido91 , what do you mean by quite used? We already accepted this deprecated one and new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see only now that it was deprecated, sorry. Then, this is ok, sorry for the comment.
@dongjoon-hyun I meant that camel case notation for the last element is quite widespread also for other configs, so I didn't see the need for changing the notation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then should I update to include this change back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okie. It's deprecated and we should get rid of it in the doc basically. I see no issue. Yea, I think it can be included back. LGTM without that change as well.
| val DEFAULT_LOG_DIR = "file:/tmp/spark-events" | ||
|
|
||
| val EVENT_LOG_DIR = ConfigBuilder("spark.history.fs.logDirectory") | ||
| .stringConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, but EVENT_LOG_DIR doesn't seem a great name to me for this property, as we have a spark.eventLog.dir. Shall we update it to HISTORY_LOG_DIR or something similar in order to disambiguate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the name HISTORY_LOG_DIR. Let me update it.
cc @HyukjinKwon @srowen @dongjoon-hyun @vanzin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. HISTORY_LOG_DIR sounds good to me, too.
|
Test build #100502 has finished for PR 23384 at commit
|
|
LGTM |
This reverts commit a364017.
|
Test build #100541 has finished for PR 23384 at commit
|
|
Thank you, @ueshin , @HyukjinKwon , @srowen , @mgaido91 . |
… category. ## What changes were proposed in this pull request? This pr makes hardcoded "spark.history" configs to use `ConfigEntry` and put them in `History` config object. ## How was this patch tested? Existing tests. Closes apache#23384 from ueshin/issues/SPARK-26443/hardcoded_history_configs. Authored-by: Takuya UESHIN <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This pr makes hardcoded "spark.history" configs to use
ConfigEntryand put them inHistoryconfig object.How was this patch tested?
Existing tests.