[SPARK-34591][MLLIB][WIP] Add decision tree pruning as a parameter #32813
Closed
Changes from 1 commit
13 commits
1db47ef [SPARK-34591][MLLIB] Disable decision tree pruning
36a3527 Merge branch 'master' into SPARK-34591
6e52df6 [SPARK-34591][MLLIB] Disable decision tree pruning
b53c0e4 Merge branch 'master' into SPARK-34591
6898a0e This PR disables a feature created in SPARK-3159 where LearningNodes …
2fc33ec Merge branch 'master' into SPARK-34591
fb835db Exposed pruning parameter accessible in Scala WIP (CBribiescas)
a471d5e Merge branch 'master' into SPARK-34591 (CBribiescas)
43ee852 Added to decision tree classifier and to python (CBribiescas)
dcec830 Merge branch 'master' into SPARK-34591 (CBribiescas)
4bb58f6 Finished a TODO for comments in Strategy.scala (CBribiescas)
ea028d4 Merge branch 'master' into SPARK-34591 (CBribiescas)
f51bbae Merge branch 'master' into SPARK-34591 (CBribiescas)
Exposed pruning parameter accessible in Scala WIP
commit fb835db4bae2dcd2c05ff4408ddc0252353ac569
@@ -75,6 +75,23 @@ private[ml] trait DecisionTreeParams extends PredictorParams
     " discretizing continuous features. Must be at least 2 and at least number of categories" +
     " for any categorical feature.", ParamValidators.gtEq(2))

+  /**
+   * If true, the trained tree will undergo a 'pruning' process after training in which nodes
+   * that have the same class predictions will be merged. The benefit being that at prediction
+   * time the tree will be 'leaner'
+   * If false, the post-training tree will undergo no pruning. The benefit being that you
+   * maintain the class prediction probabilities
+   * (default = false)
+   * @group param
+   */
+  final val pruneTree: BooleanParam = new BooleanParam(this, "pruneTree", "" +
+    "If true, the trained tree will undergo a 'pruning' process after training in which nodes" +
+    " that have the same class predictions will be merged. The benefit being that at prediction" +
+    " time the tree will be 'leaner'" +
+    " If false, the post-training tree will undergo no pruning. The benefit being that you" +
+    " maintain the class prediction probabilities"
+  )
+
   /**
    * Minimum number of instances each child must have after split.
    * If a split causes the left or right child to have fewer than minInstancesPerNode,
@@ -137,7 +154,7 @@ private[ml] trait DecisionTreeParams extends PredictorParams
     " trees.")

   setDefault(leafCol -> "", maxDepth -> 5, maxBins -> 32, minInstancesPerNode -> 1,
-    minWeightFractionPerNode -> 0.0, minInfoGain -> 0.0, maxMemoryInMB -> 256,
+    minWeightFractionPerNode -> 0.0, minInfoGain -> 0.0, pruneTree -> false, maxMemoryInMB -> 256,
     cacheNodeIds -> false, checkpointInterval -> 10)

   /** @group setParam */
@@ -163,6 +180,9 @@ private[ml] trait DecisionTreeParams extends PredictorParams
   /** @group getParam */
   final def getMinInfoGain: Double = $(minInfoGain)

+  /** @group getParam */
+  final def getPruneTree: Boolean = $(pruneTree)
+
   /** @group expertGetParam */
   final def getMaxMemoryInMB: Int = $(maxMemoryInMB)

@@ -183,6 +203,7 @@ private[ml] trait DecisionTreeParams extends PredictorParams
     strategy.maxDepth = getMaxDepth
     strategy.maxMemoryInMB = getMaxMemoryInMB
     strategy.minInfoGain = getMinInfoGain
+    strategy.pruneTree = getPruneTree
     strategy.minInstancesPerNode = getMinInstancesPerNode
     strategy.minWeightFractionPerNode = getMinWeightFractionPerNode
     strategy.useNodeIdCache = getCacheNodeIds
I think the key point for users is that this is a good thing if they are only interested in class predictions. If they are interested in class probabilities, it won't necessarily give the right result and should be set to `false`. The text here is fine; I just want to make the tradeoffs explicit, i.e. `leaner` means smaller and faster.
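To make that tradeoff concrete, here is a minimal, self-contained sketch. It is a toy model and not Spark's `LearningNode` code: the `Node`, `Leaf`, `Split`, and `PruneSketch` names are hypothetical, and it assumes a leaf stores per-class training counts and that pruning merges two sibling leaves with the same predicted class into one leaf with summed counts.

```scala
import scala.annotation.tailrec

// Toy tree model (hypothetical names, not Spark's classes).
sealed trait Node
// A leaf keeps the per-class training counts that reached it.
final case class Leaf(counts: Vector[Double]) extends Node
// An internal node sends x left when x(feature) <= threshold.
final case class Split(feature: Int, threshold: Double, left: Node, right: Node) extends Node

object PruneSketch {
  // Majority class at a leaf (first index wins on ties).
  def predictedClass(leaf: Leaf): Int = leaf.counts.indexOf(leaf.counts.max)

  // Follow the splits down to the leaf that x falls into.
  @tailrec
  def leafFor(node: Node, x: Vector[Double]): Leaf = node match {
    case l: Leaf => l
    case Split(f, t, left, right) => leafFor(if (x(f) <= t) left else right, x)
  }

  // Class probabilities are the normalized counts of the reached leaf.
  def probabilities(node: Node, x: Vector[Double]): Vector[Double] = {
    val counts = leafFor(node, x).counts
    val total = counts.sum
    counts.map(_ / total)
  }

  // Bottom-up merge: sibling leaves that predict the same class
  // collapse into a single leaf whose counts are the element-wise sum.
  def prune(node: Node): Node = node match {
    case l: Leaf => l
    case Split(f, t, left, right) =>
      (prune(left), prune(right)) match {
        case (a: Leaf, b: Leaf) if predictedClass(a) == predictedClass(b) =>
          Leaf(a.counts.zip(b.counts).map { case (p, q) => p + q })
        case (pl, pr) => Split(f, t, pl, pr)
      }
  }

  def main(args: Array[String]): Unit = {
    // Both leaves predict class 0, but with different confidence.
    val tree = Split(0, 0.5, Leaf(Vector(9.0, 1.0)), Leaf(Vector(6.0, 4.0)))
    val x = Vector(0.2)
    println(probabilities(tree, x))        // unpruned: counts 9 vs 1
    println(probabilities(prune(tree), x)) // pruned: merged counts 15 vs 5
  }
}
```

In this toy case the predicted class is 0 both before and after pruning, but the probability of class 0 for `x` moves from 9/10 to 15/20, which is why a caller who needs probabilities would leave pruning off.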