[SPARK-2995][MLLIB] add ALS.setIntermediateRDDStorageLevel #1913
QA tests have started for PR 1913. This patch merges cleanly.
QA results for PR 1913:
@mengxr: I would prefer `setIntermediateRDDStorageLevel`.
QA tests have started for PR 1913. This patch merges cleanly.
QA results for PR 1913:
As mentioned in SPARK-2465, using `MEMORY_AND_DISK_SER` for user/product in/out links together with `spark.rdd.compress=true` can help reduce the space requirement by a lot, at the cost of speed. It might be useful to add this option so people can run ALS on much bigger datasets.
Another option for the method name is `setIntermediateRDDStorageLevel`.
Author: Xiangrui Meng <[email protected]>
Closes #1913 from mengxr/als-storagelevel and squashes the following commits:
d942017 [Xiangrui Meng] rename to setIntermediateRDDStorageLevel
7550029 [Xiangrui Meng] add ALS.setIntermediateDataStorageLevel
(cherry picked from commit 69a57a1)
Signed-off-by: Xiangrui Meng <[email protected]>
Merged into both master and branch-1.1.
As mentioned in SPARK-2465, using `MEMORY_AND_DISK_SER` for user/product in/out links together with `spark.rdd.compress=true` can help reduce the space requirement by a lot, at the cost of speed. It might be useful to add this option so people can run ALS on much bigger datasets.
Another option for the method name is `setIntermediateRDDStorageLevel`.
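For anyone trying the option, here is a minimal sketch of how the setter might be used together with `spark.rdd.compress=true`. The object name, the local master, the toy ratings, and the rank/iterations/lambda values are all assumptions for illustration and are not taken from the patch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver object; the name and all data below are illustrative only.
object AlsStorageLevelSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("als-storage-level-sketch")
      .setMaster("local[2]")              // assumption: local run for a quick check
      .set("spark.rdd.compress", "true")  // compress serialized RDD partitions
    val sc = new SparkContext(conf)
    try {
      // Tiny made-up ratings set: (user, product, rating).
      val ratings = sc.parallelize(Seq(
        Rating(1, 10, 5.0),
        Rating(1, 20, 1.0),
        Rating(2, 10, 4.0),
        Rating(2, 30, 2.0),
        Rating(3, 20, 3.0),
        Rating(3, 30, 5.0)
      ))

      // Keep the intermediate in/out link RDDs serialized, spilling to disk
      // when they do not fit in memory, instead of the default level.
      val model = new ALS()
        .setRank(2)
        .setIterations(5)
        .setLambda(0.01)
        .setIntermediateRDDStorageLevel(StorageLevel.MEMORY_AND_DISK_SER)
        .run(ratings)

      // Predicted rating for user 1 on product 10.
      println(model.predict(1, 10))
    } finally {
      sc.stop()
    }
  }
}
```

As the description says, this trades speed for space, so it is probably only worth enabling when the default storage level does not fit the dataset.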