Datasets and codes for the paper: "Chemical Hardness-Driven Interpretable Machine Learning Approach for Rapid Search of Photocatalysts" (https://doi.org/10.1038/s41524-021-00669-4)
This GitHub repository contains datasets and Jupyter notebooks for the following tasks (as a proof of concept):
- Feature ranking for formation energy predictions using elemental and chemical hardness-based features
- Selecting the best-performing ML algorithm using PyCaret (for formation energy predictions using elemental and chemical hardness-based features)
- Bayesian hyperparameter optimization using Optuna (for formation energy predictions using elemental and chemical hardness-based features)
- Finding the best ML model (for formation energy predictions using elemental and chemical hardness-based features)
- SHAP feature importance plot, dependence plots, and force plots for formation energy predictions using elemental and chemical hardness-based features
- Plotting the ROC curves (with AUC) and the confusion matrix of the best ML model obtained for multiclass classification of overall stability
- SHAP feature importance plot and multioutput decision plot for multiclass classification of overall stability

The notebooks use the following Python libraries:
- scikit-learn
- NumPy
- Optuna
- PyCaret
- SHAP
- imbalanced-learn (imblearn)
- LightGBM
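
As a rough illustration of the multiclass evaluation task listed above (confusion matrix and ROC-AUC), the following is a minimal sketch using scikit-learn only. It is not the code from the notebooks: the synthetic three-class dataset, the `RandomForestClassifier` stand-in model, and all variable names are assumptions made for this example. The notebooks should be consulted for the actual data, features, and best model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the stability dataset (assumption: three classes).
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=6,
    n_classes=3,
    random_state=0,
)

# Stratified split so each class keeps its proportion in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-in classifier (assumption); any model with predict_proba would do.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test))

# Macro-averaged one-vs-rest ROC-AUC from the predicted class probabilities.
proba = clf.predict_proba(X_test)
auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")

print(cm)
print(f"Macro one-vs-rest ROC-AUC: {auc_ovr:.3f}")
```

The one-vs-rest setting scores each class against all others and averages the results, which is one common way to extend ROC-AUC to more than two classes. Other averaging schemes (for example `multi_class="ovo"`) may suit imbalanced data better.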
Please cite the following paper if you use our code in any form:

    @article{Kumar_2021,
      doi = {10.1038/s41524-021-00669-4},
      url = {https://doi.org/10.1038/s41524-021-00669-4},
      year = 2021,
      month = {dec},
      publisher = {Springer Science and Business Media {LLC}},
      volume = {7},
      number = {1},
      pages = {197},
      author = {Ritesh Kumar and Abhishek K. Singh},
      title = {Chemical hardness-driven interpretable machine learning approach for rapid search of photocatalysts},
      journal = {npj Comput. Mater.}
    }