This repository contains the code, datasets, model checkpoints, and hyperparameters used in the article "Electrolytomics: A unified big data approach for rational design and discovery of liquid electrolytes" (link to paper). The main objective of this work is to use machine learning models to screen a large virtual search space for efficient electrolytes for next-generation batteries, e.g., lithium metal batteries (LMBs).
The repository includes:
- Labeled and unlabeled datasets used in the study.
- Code and notebooks to replicate the main findings and figures in the main text of this study.
- Trained model checkpoints used in the study to screen electrolytes.
.
├── README.md
├── environment.yml
├── datasets
│   ├── featurized
│   │   ├── CE
│   │   ├── conductivity
│   │   └── oxstab
│   ├── other
│   ├── predicted
│   │   ├── CE
│   │   ├── conductivity
│   │   └── oxstab
│   └── raw
│       ├── CE
│       ├── conductivity
│       └── oxstab
├── hyperparameters
│   ├── CE
│   ├── conductivity
│   └── oxstab
├── models
│   ├── Chemprop
│   └── SL
└── notebooks
    ├── Chemprop_BHO_training_prediction
    ├── SHAP_sensitivity_analysis
    ├── SL_BHO_training_prediction
    ├── creating-data-splits
    ├── eScore_calculations
    ├── featurization
    ├── manuscript-plots
    └── reduced_embeddings
The datasets are arranged in the following directories:
- datasets/raw: Contains the raw datasets, without any features, for each of the three properties (inside conductivity, oxstab, and CE).
- datasets/featurized: Contains the featurized datasets for both the shallow learning and Chemprop models for each of the three properties (inside conductivity, oxstab, and CE).
- datasets/predicted: Contains files with ML predictions for each of the three properties (inside conductivity, oxstab, and CE).
- datasets/other: Contains other relevant files, e.g., the t-SNE embeddings used in the present study.
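As a minimal sketch of how this layout can be navigated programmatically, the helper below builds the path to a dataset directory from a processing stage and a property name, and lists whatever files it finds there. The function names `dataset_dir` and `list_dataset_files` are hypothetical and not part of the repository; only the directory names come from the tree above. File names and formats inside each directory are not assumed.

```python
from pathlib import Path

# Directory names taken from the repository tree in this README.
STAGES = ("raw", "featurized", "predicted")
PROPERTIES = ("CE", "conductivity", "oxstab")


def dataset_dir(stage, prop, root="."):
    """Return the directory holding data for `prop` at the given stage.

    Hypothetical helper: validates the names, then joins them under
    <root>/datasets/<stage>/<prop>.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    if prop not in PROPERTIES:
        raise ValueError(f"unknown property {prop!r}; expected one of {PROPERTIES}")
    return Path(root) / "datasets" / stage / prop


def list_dataset_files(stage, prop, root="."):
    """List the files in a dataset directory (empty list if it is missing)."""
    folder = dataset_dir(stage, prop, root)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    # Run from the repository root to see which raw files are present.
    for prop in PROPERTIES:
        print(prop, [p.name for p in list_dataset_files("raw", prop)])
```

Run from the repository root, this prints the raw files available for each property; from any other location it prints empty lists rather than failing.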
The repository includes the Jupyter notebooks for different purposes inside:
- notebooks/creating-data-splits: For generating all data splits used in the present study.
- notebooks/featurization: For generating all types of features used in the present study.
- notebooks/Chemprop_BHO_training_prediction: For performing Bayesian hyperparameter optimization (BHO), training, and prediction with Chemprop models.
- notebooks/SL_BHO_training_prediction: For performing Bayesian hyperparameter optimization (BHO), training, and prediction with shallow learning models (LightGBM and PLSR).
- notebooks/SHAP_sensitivity_analysis: For performing SHAP and sensitivity analyses on shallow learning models (LightGBM and PLSR).
- notebooks/eScore_calculations: For calculating eScores.
- notebooks/reduced_embeddings: For obtaining and plotting t-SNE reduced embeddings.
- notebooks/manuscript-plots: For reproducing figures in the main text of this study.
The trained model checkpoints for shallow learning (in .sav format) and Chemprop models are stored inside the directory models for each of the three properties (inside conductivity, oxstab, and CE).
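The sketch below shows one way a `.sav` checkpoint could be loaded, assuming the files were written with Python's `pickle` module (Pickle is listed among the required libraries, but the serialization call itself is an assumption here). The helper names and the stand-in object are hypothetical; the actual checkpoint file names inside models are not reproduced. Unpickling a real checkpoint also requires the library that defined the model (e.g., LightGBM or scikit-learn) to be installed, and pickle files should only be loaded from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(obj, path):
    """Serialize an object to `path` with pickle (assumed .sav format)."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_checkpoint(path):
    """Load an object previously written with pickle.dump."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Round trip with a stand-in object; a real checkpoint would be a
    # trained model loaded from a file under the models directory.
    stand_in = {"model": "placeholder", "property": "conductivity"}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.sav"  # hypothetical file name
        save_checkpoint(stand_in, path)
        restored = load_checkpoint(path)
    print(restored == stand_in)
```

For a trained model, the object returned by `load_checkpoint` would typically expose the usual `predict` method of its library.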
The exact hyperparameters obtained after BHO for the shallow learning and Chemprop models are also provided inside the directory hyperparameters for each of the three properties (inside conductivity, oxstab, and CE).
Follow these steps to run the notebooks:
- Clone the repository:
    git clone https://github.com/AmanchukwuLab/electrolytomics
    cd electrolytomics
- Create the virtual environment, install the required dependencies, and activate the environment:
    conda env create -f environment.yml
    conda activate electrolytomics
- Launch Jupyter Notebook:
    jupyter notebook
- Open the notebooks from the notebooks directory and run them cell by cell.
The following libraries are required (see the environment.yml file):
- Python 3.8+
- Jupyter Notebook
- Pandas
- NumPy
- Scipy
- Scikit Learn
- Pickle
- Matplotlib
- Seaborn
- Shap
- SALib
- LightGBM
- Chemprop
- RDKit
- Optuna
- OpenTSNE
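After activating the environment, a quick check like the following can confirm that the listed libraries are importable. This is a minimal sketch: the function name `check_modules` is hypothetical, and the import names (for example `sklearn` for Scikit Learn and `openTSNE` for OpenTSNE) are the usual ones for these packages rather than names taken from environment.yml.

```python
from importlib.util import find_spec

# Usual top-level import names for the libraries listed above.
REQUIRED_MODULES = [
    "pandas", "numpy", "scipy", "sklearn", "pickle", "matplotlib",
    "seaborn", "shap", "SALib", "lightgbm", "chemprop", "rdkit",
    "optuna", "openTSNE",
]


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_modules(REQUIRED_MODULES)
    for name, found in status.items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

The check only locates each module without importing it, so it runs quickly even for heavy packages; any module reported as missing can then be installed through the conda environment.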
Please consider citing this work if you use our datasets or code:
@article{kumar2024electrolytomics,
title={Electrolytomics: A unified big data approach for electrolyte design and discovery},
author={Kumar, Ritesh and Vu, Minh Canh and Ma, Peiyuan and Amanchukwu, Chibueze},
journal={Chemistry of Materials},
year={2025},
volume={37},
pages={2720--2734},
doi={10.1021/acs.chemmater.4c03196}
}