MLI-1 Decision Trees #79
Closed
Changes from 1 commit
51 commits
cd53eae skeletal framework (manishamde)
92cedce basic building blocks for intermediate RDD calculation. untested. (manishamde)
8bca1e2 additional code for creating intermediate RDD (manishamde)
0012a77 basic stump working (manishamde)
03f534c some more tests (manishamde)
dad0afc decison stump functionality working (manishamde)
4798aae added gain stats class (manishamde)
80e8c66 working version of multi-level split calculation (manishamde)
b0eb866 added logic to handle leaf nodes (manishamde)
98ec8d5 tree building and prediction logic (manishamde)
02c595c added command line parsing (manishamde)
733d6dd fixed tests (manishamde)
154aa77 enums for configurations (manishamde)
b0e3e76 adding enum for feature type (manishamde)
c8f6d60 adding enum for feature type (manishamde)
e23c2e5 added regression support (manishamde)
53108ed fixing index for highest bin (manishamde)
6df35b9 regression predict logic (manishamde)
dbb7ac1 categorical feature support (manishamde)
d504eb1 more tests for categorical features (manishamde)
6b7de78 minor refactoring and tests (manishamde)
b09dc98 minor refactoring (manishamde)
c0e522b updated predict and split threshold logic (manishamde)
f067d68 minor cleanup (manishamde)
5841c28 unit tests for categorical features (manishamde)
0dd7659 basic doc (manishamde)
dd0c0d7 minor: some docs (manishamde)
9372779 code style: max line lenght <= 100 (manishamde)
84f85d6 code documentation (manishamde)
d3023b3 adding more docs for nested methods (manishamde)
63e786b added multiple train methods for java compatability (manishamde)
cd2c2b4 fixing code style based on feedback (manishamde)
eb8fcbe minor code style updates (manishamde)
794ff4d minor improvements to docs and style (manishamde)
d1ef4f6 more documentation (manishamde)
ad1fc21 incorporated mengxr's code style suggestions (manishamde)
62c2562 fixing comment indentation (manishamde)
6068356 ensuring num bins is always greater than max number of categories (manishamde)
2116360 removing dummy bin calculation for categorical variables (manishamde)
632818f removing threshold for classification predict method (manishamde)
ff363a7 binary search for bins and while loop for categorical feature bins (manishamde)
4576b64 documentation and for to while loop conversion (manishamde)
24500c5 minor style updates (mengxr)
c487e6a Merge pull request #1 from mengxr/dtree (manishamde)
f963ef5 making methods private (manishamde)
201702f making some more methods private (manishamde)
62dc723 updating javadoc and converting helper methods to package private to … (manishamde)
e1dd86f implementing code style suggestions (manishamde)
f536ae9 another pass on code style (mengxr)
7d54b4f Merge pull request #4 from mengxr/dtree (manishamde)
1e8c704 remove numBins field in the Strategy class (manishamde)
more tests for categorical features
Signed-off-by: Manish Amde <[email protected]>
commit d504eb1f8a3f7f06226448d42b709f2f7ec6e91c
@@ -204,15 +204,12 @@ object DecisionTree extends Serializable with Logging {
}

/*Finds the right bin for the given feature*/
def findBin(featureIndex: Int, labeledPoint: LabeledPoint) : Int = {
//logDebug("finding bin for labeled point " + labeledPoint.features(featureIndex))
def findBin(featureIndex: Int, labeledPoint: LabeledPoint, isFeatureContinous : Boolean) : Int = {

val isFeatureContinous = strategy.categoricalFeaturesInfo.get(featureIndex).isEmpty
if (isFeatureContinous){
//TODO: Do binary search
for (binIndex <- 0 until strategy.numBins) {
val bin = bins(featureIndex)(binIndex)
//TODO: Remove this requirement post basic functional
val lowThreshold = bin.lowSplit.threshold
val highThreshold = bin.highSplit.threshold
val features = labeledPoint.features

@@ -222,9 +219,9 @@ object DecisionTree extends Serializable with Logging {
}
throw new UnknownError("no bin was found for continuous variable.")
} else {

for (binIndex <- 0 until strategy.numBins) {
val bin = bins(featureIndex)(binIndex)
//TODO: Remove this requirement post basic functional
val category = bin.category
val features = labeledPoint.features
if (category == features(featureIndex)) {

@@ -262,7 +259,8 @@ object DecisionTree extends Serializable with Logging {
} else {
for (featureIndex <- 0 until numFeatures) {
[Review comment, Contributor] replace the for loop by a while loop
//logDebug("shift+featureIndex =" + (shift+featureIndex))
[Review comment, Contributor] remove this comment
arr(shift + featureIndex) = findBin(featureIndex, labeledPoint)
val isFeatureContinous = strategy.categoricalFeaturesInfo.get(featureIndex).isEmpty
arr(shift + featureIndex) = findBin(featureIndex, labeledPoint,isFeatureContinous)
}
}
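The review comment on the `for (featureIndex <- 0 until numFeatures)` loop asks for a while loop. In Scala, a `for` over a `Range` desugars to a `foreach` call that takes a closure, which can cost more than a plain `while` loop on a hot path. A minimal sketch of that rewrite is below. The names `BinIndexLoop` and `fillBins`, the `Array[Int]` type for `arr`, and the `findBin: Int => Int` stand-in are assumptions made for illustration, not the code in this pull request.

```scala
object BinIndexLoop {
  /**
   * Writes findBin(i) into arr(shift + i) for every feature index i,
   * using an explicit while loop instead of `for (i <- 0 until n)`.
   *
   * `findBin` is a hypothetical stand-in for the per-feature bin lookup.
   */
  def fillBins(
      arr: Array[Int],
      shift: Int,
      numFeatures: Int,
      findBin: Int => Int): Unit = {
    var featureIndex = 0
    while (featureIndex < numFeatures) {
      arr(shift + featureIndex) = findBin(featureIndex)
      featureIndex += 1
    }
  }
}
```

The loop body is unchanged from the `for` version; only the iteration mechanism differs, so behaviour should be identical.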
Because this is called many times, we should use binary search, or at least a while loop instead of a for loop.
Agree. Was planning to add it but forgot about it while trying to write a working version. I will update this when I add tests for this method.
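For illustration, the binary search suggested above could look roughly like the following sketch for a continuous feature. It assumes the bins of one feature are sorted, contiguous, and each cover the half-open interval `(low, high]`. The diff does not show the actual comparison, so that convention is an assumption, as are the simplified `Bin` case class and the `FindBinSketch` object. The sketch returns `-1` when no bin matches, whereas the code in the diff throws an `UnknownError`.

```scala
object FindBinSketch {
  // Hypothetical simplified bin covering the interval (low, high].
  final case class Bin(low: Double, high: Double)

  /**
   * Binary search over sorted, contiguous bins.
   * Returns the index of the bin containing `feature`, or -1 if none does.
   */
  def findBin(bins: Array[Bin], feature: Double): Int = {
    var left = 0
    var right = bins.length - 1
    var result = -1
    while (result == -1 && left <= right) {
      val mid = left + (right - left) / 2
      val bin = bins(mid)
      if (feature > bin.low && feature <= bin.high) {
        result = mid            // feature falls inside this bin
      } else if (feature <= bin.low) {
        right = mid - 1         // look in the lower half
      } else {
        left = mid + 1          // look in the upper half
      }
    }
    result
  }
}
```

With `n` bins, this takes about `log2(n)` comparisons per lookup instead of up to `n` for the linear scan.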