Predicting Forest Covers via Decision Trees/Forests

Here, we use the CoverType dataset to systematically tune the hyper-parameters of a Random Forest classifier.

We use the CrossValidator Scala API of Spark 2.2.3 to implement a fully automatic hyper-parameter tuning step. Specifically, we use a 90% split of the data as input to a 5-fold cross-validation loop that is run for each fixed hyper-parameter setting. We use the classification accuracy over all 7 tree types (obtained from the MulticlassMetrics class of Spark) to measure the performance of the model for the given hyper-parameters.
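The split and the cross-validation loop described above could be sketched roughly as follows. This is a minimal sketch, not the repository's actual script: the helper names `splitTrainTest` and `crossValidate`, the `seed` argument, and the assumption that the input DataFrame already has a vector `features` column and a numeric `label` column are all illustrative. Because `CrossValidator` expects an `Evaluator`, the sketch uses `MulticlassClassificationEvaluator` with the `accuracy` metric inside the loop instead of calling `MulticlassMetrics` directly.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: 90% of the rows for tuning, 10% held out for the final test.
def splitTrainTest(data: DataFrame, seed: Long): (DataFrame, DataFrame) = {
  val Array(train, test) = data.randomSplit(Array(0.9, 0.1), seed)
  (train, test)
}

// Hypothetical helper: 5-fold cross-validation over every setting in `grid`,
// scored by multiclass accuracy.
def crossValidate(
    train: DataFrame,
    rf: RandomForestClassifier,
    grid: Array[ParamMap],
    seed: Long): CrossValidatorModel = {
  val evaluator = new MulticlassClassificationEvaluator()
    .setLabelCol(rf.getLabelCol)
    .setPredictionCol(rf.getPredictionCol)
    .setMetricName("accuracy")

  new CrossValidator()
    .setEstimator(rf)
    .setEvaluator(evaluator)
    .setEstimatorParamMaps(grid)
    .setNumFolds(5)
    .setSeed(seed)
    .fit(train)
}
```

The returned `CrossValidatorModel` exposes the average accuracy per setting through `avgMetrics`, in the same order as the entries of `grid`.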

We investigate the following ranges of parameters, which also lets us find the best number of trees (using the “automatic” training mode) in the random forest:

– impurity <- Array("gini", "entropy"); 
– depth <- Array(10, 25, 50);
– bins <- Array(10, 100, 300);
– numTrees <- Array(5, 10, 20);
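The ranges listed above could be expressed as a Spark parameter grid along these lines. This is a sketch under assumptions: it maps `depth` to `maxDepth` and `bins` to `maxBins`, and it reads the “automatic” training mode as `featureSubsetStrategy = "auto"`; none of these mappings are confirmed by the text.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.tuning.ParamGridBuilder

// Assumption: "automatic" mode means letting Spark choose the feature subset per split.
val rf = new RandomForestClassifier()
  .setFeatureSubsetStrategy("auto")

// One ParamMap per combination of the four ranges.
// Note: Spark's tree implementation appears to cap maxDepth at 30, so the
// value 50 may be rejected when fitting; it is kept only because the list names it.
val paramGrid = new ParamGridBuilder()
  .addGrid(rf.impurity, Seq("gini", "entropy"))
  .addGrid(rf.maxDepth, Seq(10, 25, 50))
  .addGrid(rf.maxBins, Seq(10, 100, 300))
  .addGrid(rf.numTrees, Seq(5, 10, 20))
  .build()

// 2 * 3 * 3 * 3 = 54 settings; with 5 folds that is 270 model fits during tuning.
val numSettings = paramGrid.length
```

The size of the grid is worth checking before launching the job, since the cost of cross-validation grows with the product of all range lengths.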

We then identify the best combination of hyper-parameters obtained from the cross-validation described above, and use the 10% test set to measure the accuracy of the final model over the 7 classes of trees given by the CoverType dataset.
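Selecting the best setting and scoring the held-out 10% could look roughly like the sketch below. The helper names `bestSetting` and `testAccuracy` are hypothetical, as are the `prediction` and `label` column names; the sketch assumes an already fitted `CrossValidatorModel` whose evaluator metric is accuracy (larger is better).

```scala
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: the setting with the highest average cross-validation accuracy.
def bestSetting(cvModel: CrossValidatorModel): (ParamMap, Double) =
  cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics).maxBy(_._2)

// Hypothetical helper: accuracy of the best model on the held-out test split,
// computed with MulticlassMetrics from (prediction, label) pairs.
def testAccuracy(cvModel: CrossValidatorModel, test: DataFrame): Double = {
  val predictionAndLabels = cvModel.transform(test)
    .select(col("prediction").cast("double"), col("label").cast("double"))
    .rdd
    .map(row => (row.getDouble(0), row.getDouble(1)))
  new MulticlassMetrics(predictionAndLabels).accuracy
}
```

`CrossValidatorModel.transform` applies the best model found during tuning, so no separate refit is needed in this sketch.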

