[SPARK-40404][DOCS] Add precondition description for spark.shuffle.service.db.backend in running-on-yarn.md
#37853
Conversation
The title was changed from "… for spark.shuffle.service.db in the document" to "… for spark.shuffle.service.db. enabled in the document", and then to "… for spark.shuffle.service.db.enabled in the document".
docs/running-on-yarn.md
Outdated
| When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle
| service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value.
Suggested change, from:
| When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle
| service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value.
to:
| When work-preserving restart is enabled in YARN, this is used to specify the disk-base store used in shuffle
| service state store, supports `LEVELDB` and `ROCKSDB` with `LEVELDB` as default value.
d8b39ef fixes this.
Thanks for your correction
docs/spark-standalone.md
Outdated
| eventually gets cleaned up. This config may be removed in the future.
| automatically reload info on current executors. This only affects standalone mode. You should also enable
| <code>spark.worker.cleanup.enabled</code>, to ensure that the state eventually gets cleaned up.
| This config may be removed in the future.
Why are we removing the yarn related blurb from here? Essentially, this boolean does not control the behavior in YARN: for YARN, that is configured for the cluster, and Spark inherits that behavior.
Hmm... does "yarn always has this behavior enabled" mean that `YarnShuffleService` will always persist data into Level/RocksDB?
Is that incorrect?
- When `yarn.nodemanager.recovery.enabled` is true, `_recoveryPath` and `registeredExecutorFile` in `YarnShuffleService` will not be null, so `YarnShuffleService` persists data into Level/RocksDB.
- When `yarn.nodemanager.recovery.enabled` is false, `_recoveryPath` and `registeredExecutorFile` in `YarnShuffleService` will be null, so `YarnShuffleService` does not persist data into a disk store.

The persistence behavior of `YarnShuffleService` is controlled by YARN's configuration. It seems unrelated to `spark.shuffle.service.db.enabled`, so I don't think it is necessary to mention "yarn always has this behavior enabled" in this configuration description.

If we need to add YARN-related descriptions here, should we also mention "mesos always has this behavior disabled" here...
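For reference, here is a minimal NodeManager `yarn-site.xml` sketch of the precondition described above. The property names are the ones discussed in this thread; the values and the recovery directory path are only illustrative, and it assumes the YARN shuffle service reads its `spark.*` options from the NodeManager configuration.

```xml
<!-- Illustrative NodeManager yarn-site.xml fragment; values are examples only. -->
<configuration>
  <!-- Precondition: NM recovery must be enabled for YarnShuffleService to persist state. -->
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Where the NodeManager (and the shuffle service) keep recovery state. -->
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/nm-recovery</value>
  </property>
  <!-- Only takes effect when recovery is enabled; LEVELDB is the default. -->
  <property>
    <name>spark.shuffle.service.db.backend</name>
    <value>ROCKSDB</value>
  </property>
</configuration>
```

With `yarn.nodemanager.recovery.enabled` left false, the `spark.shuffle.service.db.backend` setting would have no effect, which is the precondition this PR documents.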
Or can we change
"yarn always has this behavior enabled"
to
"The behavior of yarn (and mesos) does not depend on this configuration"
?
The title was changed from "… for spark.shuffle.service.db.enabled in the document" to "… for spark.shuffle.service.db.backend in running-on-yarn.md".
docs/running-on-yarn.md
Outdated
| and `LEVELDB` as default value.
| When work-preserving restart is enabled in YARN, this is used to specify the disk-base store used
| in shuffle service state store, supports `LEVELDB` and `ROCKSDB` with `LEVELDB` as default value.
| The original data store in `LevelDB/RocksDB` will not be automatically convert to another kind of storage now.
convert -> converted ?
+1 for @mridulm's comment. And could you add an additional description of what happens at runtime when the store types are mismatched? It's deleted and recreated, right?
The old one will not be deleted, but the new one will be created. When the store type is switched, the directory name changes, for example from `registeredExecutors.ldb` to `registeredExecutors.rdb`. `YarnShuffleService` will create `registeredExecutors.rdb` if it does not exist, but it does not know that `registeredExecutors.ldb` existed, so that one will not be deleted.
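As a rough illustration of that comment (the directory names `registeredExecutors.ldb` and `registeredExecutors.rdb` come from the discussion above; the surrounding layout and values are hypothetical), switching the backend creates the new store next to the old one rather than replacing it:

```xml
<!-- Hypothetical layout under yarn.nodemanager.recovery.dir when the backend is switched:
       before (LEVELDB): registeredExecutors.ldb
       after  (ROCKSDB): registeredExecutors.ldb  (retained, no longer read)
                         registeredExecutors.rdb  (created on first start with the new backend)
-->
<property>
  <name>spark.shuffle.service.db.backend</name>
  <value>ROCKSDB</value> <!-- previously LEVELDB -->
</property>
```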
Add "The original data store will be retained and the new type data store will be created when switching storage types." Is that ok?
@dongjoon-hyun @mridulm Automatic data format conversion may be a useful feature. I think it would be friendlier for stock users migrating to the new feature. I have filed a Jira, SPARK-40464, and will push for its completion if necessary.
Also cc @panbingkun, this is what I discussed with you offline yesterday.
+CC @dongjoon-hyun for review
dongjoon-hyun
left a comment
+1, LGTM. Thank you!
Merged to master for Apache Spark 3.4. Thank you, @LuciferYang and @mridulm.
[SPARK-40404][DOCS] Add precondition description for `spark.shuffle.service.db.backend` in `running-on-yarn.md`

Closes apache#37853 from LuciferYang/SPARK-40404.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
From the context of the [pr](apache#19032) for [SPARK-17321](https://issues.apache.org/jira/browse/SPARK-17321), `YarnShuffleService` will persist data into `Level/RocksDB` when Yarn NM recovery is enabled. So this pr adds the precondition description related to `Yarn NM recovery is enabled` for `spark.shuffle.service.db.backend` in `running-on-yarn.md`.

Why are the changes needed?
Add precondition description for `spark.shuffle.service.db.backend` in `running-on-yarn.md`.

Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GitHub Actions