Skip to content

vinuev/DecisionTree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Predicting Forest Covers via Decision Trees/Forests

Here, we consider the CoverType dataset to systematically learn the hyper-parameters of a Random Forest classifier.

We utilize the CrossValidator Scala API of Spark 2.2.3 to implement a fully automatic hyper-parameter tuning step. Specifically, we use the 90% split of the data to initialize a 5-fold cross-validation loop over a fixed hyper-parameter setting. We use the classification accuracy over all of the 7 tree types (obtained from the MulticlassMetrics package of Spark) to measure the performance of your model for the given hyper-parameters.

We investigate the following range of parameters to also find the best amount of trees (using the “automatic” training mode) in your random forest:

– impurity <- Array("gini", "entropy"); 
– depth <- Array(10, 25, 50);
– bins <- Array(10, 100, 300));
– numTrees <- Array(5, 10, 20);

We then identify the best choice of all hyper-parameters obtain from the cross-validations as described above, and use the 10% testing set to measure the accuracy of the final model over the 7 classes of trees given by the CoverType dataset.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages