Comparing Performance of Machine Learning Algorithms through deterimining Breast Cancer Risk Factors

by Tiffany Yu and Maggie Eberts

Through determining the risk factors of breast cancer using three different Machine Learning Algorithms - C4.5, AdaBoost and RandomForest, we aim to evaluate and contrast the perfomance of these algorithms. Running these algorithms on the same data set, we looked at the accuracy, specificity and sensitivity of the predictions of these algorithms. The results were internally validated using logistic regression and externally validated through the research of risk factors in official medical journals.

To implement the C4.5, AdaBoost and RandomForest algorithms, packages from sklearn were used.

C4.5: from sklearn.tree import DecisionTreeClassifier
AdaBoost: from sklearn.ensemble import AdaBoostClassifier
RandomForest: from sklearn.ensemble import RandomForestClassifier

modifyData.py:

This python programs takes the raw data file (Cancer.dat), modifies the data as needed for experiments, and outputs two data files (ModCancerData.csv - the complete data set used for experiments and debug.csv - a smaller data set used for debugging purposes)

main.py:

This python program includes the entire code to run experiments. To run experiments, run "python main.py [datatextfile]".

projectData Folder:

Cancer.dat - raw data file from IPUMS database.

DataFeatures.txt - File containing information on features of the IPUMS data set.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
projectData		projectData
README.md		README.md
main.py		main.py
modifydata.py		modifydata.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Comparing Performance of Machine Learning Algorithms through deterimining Breast Cancer Risk Factors

modifyData.py:

main.py:

projectData Folder:

About

Uh oh!

Releases

Packages

Languages

tiffanyu419/Machine-Learning-Bioinformatics

Folders and files

Latest commit

History

Repository files navigation

Comparing Performance of Machine Learning Algorithms through deterimining Breast Cancer Risk Factors

modifyData.py:

main.py:

projectData Folder:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages