In this project, we analyze the problem of predicting hospital readmission among diabetic patients using the "Diabetes 130-US hospitals" dataset. Traditionally, this problem is addressed with statistical machine learning algorithms such as Naive Bayes, K-Nearest Neighbors, and logistic regression. These algorithms are known to perform poorly on non-separable and high-dimensional datasets. To overcome these pitfalls, we explore more advanced techniques such as random forests, other ensemble methods, and neural networks. Missing data, overfitting, and feature engineering are among the challenges we encounter. The ideal outcome of the project is to gain deeper insight into hospital readmission rates and to identify robust methods that make better predictions than the statistical methods. Our experiments show that random forests performed better than the other methods. Attributes such as the patient's gender, race, total number of medications, lab procedures, admission type, and time in hospital had a significant influence on these predictions.
Rajasekhar Mekala
Agniraj Baikani
Shravan Balamurugan