DEPF: Reliable Identification and Interpretation of Single-cell Molecular Heterogeneity and Transcriptional Regulation using Dynamic Ensemble Pruning
Links: Getting Started | API Reference | Examples
DEPF is a dynamic ensemble pruning framework proposed to identify and interpret single-cell molecular heterogeneity. A hierarchical autoencoder first projects the high-dimensional scRNA-seq data onto multiple low-dimensional latent spaces. A basic clustering algorithm then produces a clustering ensemble in these latent spaces. Finally, a bi-objective fruit fly optimization algorithm dynamically prunes the low-quality basic clusterings from the ensemble. A silhouette coefficient-based indicator is developed and evaluated to determine the optimization direction of the bi-objective function.
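The README does not spell out the exact silhouette coefficient-based indicator, so the following is only a minimal sketch of the standard quantity it builds on: the mean silhouette coefficient. For each cell i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). Values near 1 indicate compact, well-separated clusters. The function name `mean_silhouette` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a clustering, using Euclidean distance.

    points: list of coordinate tuples (e.g. cells in a latent space)
    labels: cluster label of each point
    Cells in singleton clusters get s(i) = 0 by the usual convention.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # total distance and member count from point i to each cluster, excluding i itself
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # singleton cluster
            continue
        a = sums[own] / counts[own]  # cohesion: mean distance within own cluster
        # separation: mean distance to the nearest other cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 4))  # 0.9002: well separated
print(round(mean_silhouette(points, [0, 1, 0, 1]), 4))  # -0.4475: poor split
```

A negative score means cells sit closer, on average, to another cluster than to their own, which is the kind of signal a pruning criterion can use to penalize a low-quality basic clustering.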
DEPF is built from four modules that we developed ourselves (Normalization, Hierarchical Autoencoder, Clustering Ensemble, and Dynamic Ensemble Pruning). It offers five highlights:
- 🌞 Dynamic ensemble pruning: many could be better than all.
- 🍎 DEPF can identify rare cell types and small clusters that would not be picked up by other methods.
- 🌞 DEPF can identify novel clusters that traditional methods fail to detect.
- 🍎 DEPF can provide biological interpretation of scRNA-seq data.
- 🌞 DEPF can discover transcriptional and post-transcriptional regulators in scRNA-seq data.
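One way to picture "many could be better than all": a clustering ensemble is a list of label vectors, one per basic clustering, and pruning means combining only a chosen subset of them. The sketch below uses a co-association matrix, a common consensus representation in clustering-ensemble work. This README does not say that DEPF combines its pruned ensemble this way, and the function and parameter names are hypothetical.

```python
def co_association(ensemble, members=None):
    """Co-association matrix of a clustering ensemble (or of a pruned subset of it).

    ensemble: list of label vectors, one per basic clustering, all of the same length
    members:  indices of the basic clusterings to keep; None keeps all of them
    Entry [i][j] is the fraction of kept clusterings that put cells i and j in the
    same cluster, so it does not depend on how each clustering numbers its labels.
    """
    indices = range(len(ensemble)) if members is None else members
    kept = [ensemble[t] for t in indices]
    if not kept:
        raise ValueError("keep at least one basic clustering")
    n = len(kept[0])
    together = [[0] * n for _ in range(n)]
    for labels in kept:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / len(kept) for count in row] for row in together]


ensemble = [
    [0, 0, 1, 1],  # basic clustering 0
    [0, 0, 0, 1],  # basic clustering 1 (disagrees about cell 2)
    [1, 1, 0, 0],  # basic clustering 2 (same partition as 0, relabelled)
]
full = co_association(ensemble)
pruned = co_association(ensemble, members=[0, 2])
print(full[2][3], pruned[2][3])  # 0.6666666666666666 1.0
```

Dropping basic clustering 1, which disagrees with the other two about cell 2, makes the consensus for cells 2 and 3 unanimous; deciding which members to keep is the job of the pruning step.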
It is recommended to use git for installation.
# create a virtual environment named DEPF
$ conda create -n DEPF
# activate the environment
$ conda activate DEPF
# install the R environment
$ conda install -c conda-forge r-base=4.1.3
$ conda install -c conda-forge r-devtools
$ conda install -c conda-forge r-seurat
$ conda install python=3.9
$ pip install leidenalg
# install MATLAB by following the official MathWorks installation instructions
# clone DEPF repository
$ git clone https://github.com/fanyi21/DEPF.git
# install the dependencies
$ cd DEPF/DEPF/HierarchicalAutoencoder/
$ Rscript RequirePackage.R

🍁 Step 1: Normalizing the raw scRNA-seq data and mapping it onto multiple low-dimensional latent spaces. A 9_latent_data folder is produced and saved in ./OutputData.
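This README does not state which normalization the Normalization module applies. As an illustration only, the sketch below assumes Seurat's default LogNormalize scheme (Seurat is one of the dependencies installed above): each count is divided by the cell's total count, multiplied by a scale factor of 10,000, and transformed with the natural log of 1 + x. The function name is hypothetical; in DEPF itself this step is run by the runHA.R commands.

```python
import math


def log_normalize(cell_counts, scale_factor=10000.0):
    """LogNormalize one cell (assumed scheme): log1p(count / total * scale_factor).

    cell_counts: raw counts of every gene in a single cell
    """
    total = sum(cell_counts)
    if total == 0:
        return [0.0] * len(cell_counts)  # empty cell: nothing to rescale
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# two cells with the same expression profile but different sequencing depth
shallow = log_normalize([0, 5, 5])
deep = log_normalize([0, 10, 10])
print(shallow == deep)  # True: the depth difference is removed
```

Dividing by the per-cell total removes differences in sequencing depth, and the log transform compresses the long right tail of highly expressed genes before the data reach the autoencoder.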
$ cd DEPF/HierarchicalAutoencoder/
$ Rscript runHA.R

🍁 Step 2: Selecting a basic clustering algorithm to generate a clustering ensemble. DEPF provides three basic clustering algorithms: Louvain, Leiden, and spectral clustering.
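Louvain and Leiden both search for partitions with high modularity, and `res` is the resolution parameter γ in that objective, assuming runLouvain and runLeiden pass it through to a standard modularity-based implementation such as Seurat's FindClusters. For an undirected graph with m edges, Q = Σ_c [L_c / m - γ (d_c / 2m)²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. A minimal pure-Python sketch (the function name is hypothetical):

```python
from collections import defaultdict


def modularity(edges, labels, resolution=1.0):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:      list of (u, v) node pairs, each undirected edge listed once
    labels:     community label of each node, indexed by node id
    resolution: gamma; larger values penalise big communities more strongly
    """
    m = len(edges)
    inside = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree)


# two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = [0, 0, 0, 1, 1, 1]
print(round(modularity(edges, split, resolution=1.0), 4))  # 0.3571
print(round(modularity(edges, split, resolution=2.0), 4))  # -0.1429
```

Because the penalty term grows with the square of a community's total degree, raising the resolution hurts large communities more than small ones, which is why higher `res` values generally yield more, smaller clusters.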
- 🍆 Louvain. The file Louvain_resolution_1.csv is produced and saved in ./OutputData.
$ cd DEPF/HierarchicalAutoencoder/
$ R
# in the R console:
source("runLouvain.R")
# res: resolution; ensemble_num: number of basic clusterings in the ensemble
runLouvain(res=1, ensemble_num=10)

- 🍆 Leiden. The file Leiden_resolution_1.csv is produced and saved in ./OutputData.
$ cd DEPF/HierarchicalAutoencoder/
$ R
# in the R console:
source("runLeiden.R")
# res: resolution; ensemble_num: number of basic clusterings in the ensemble
runLeiden(res=1, ensemble_num=10)

- 🍆 Spectral clustering. The file spectral_cluster_K_10.csv is produced and saved in ./OutputData.
$ cd DEPF/BiobjectiveFruitFlyOptimizationAlgorithm/
% then, in MATLAB (with this folder as the current folder):
% K: cluster number; T: ensemble size
runSpectral(10, 10)   % K = 10, T = 10

🍁 Step 3: Performing dynamic ensemble pruning. The file final_clustering.csv is produced and saved in ./OutputData.
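DEPF prunes the ensemble with a bi-objective fruit fly optimization algorithm whose objectives are not spelled out in this README. To show what a bi-objective pruning problem looks like, the sketch below assumes two common ensemble-pruning objectives, mean member quality (for example a silhouette score per basic clustering) and diversity (one minus the mean pairwise similarity, for example NMI between basic clusterings), and simply enumerates every subset to keep the Pareto-optimal ones. This brute-force search is a stand-in, not the fruit fly optimizer, and it does not model DEPF's silhouette-based indicator for choosing the optimization direction; the function and parameter names are hypothetical.

```python
from itertools import combinations


def pareto_prune(quality, similarity, min_size=2):
    """Brute-force bi-objective ensemble pruning (illustrative stand-in only).

    quality:    score of each basic clustering, higher is better
    similarity: symmetric matrix of pairwise similarity between basic clusterings
    Returns (members, mean_quality, diversity) for every Pareto-optimal subset,
    with both objectives maximised.
    """
    if min_size < 2:
        raise ValueError("diversity needs at least two members")
    T = len(quality)
    candidates = []
    for k in range(min_size, T + 1):
        for members in combinations(range(T), k):
            mean_quality = sum(quality[t] for t in members) / k
            pairs = list(combinations(members, 2))
            diversity = 1.0 - sum(similarity[a][b] for a, b in pairs) / len(pairs)
            candidates.append((members, mean_quality, diversity))

    def dominates(x, y):
        # x is at least as good on both objectives and strictly better on one
        return x[1] >= y[1] and x[2] >= y[2] and (x[1] > y[1] or x[2] > y[2])

    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


quality = [0.9, 0.8, 0.2]
similarity = [[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
for members, q, d in pareto_prune(quality, similarity):
    print(members, round(q, 3), round(d, 3))
```

The number of candidate subsets grows as 2^T, so exhaustive search is only feasible for tiny ensembles; that cost is the usual motivation for metaheuristics such as fruit fly optimization.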
$ cd DEPF/BiobjectiveFruitFlyOptimizationAlgorithm/
% then, in MATLAB (with this folder as the current folder):
runBioFOA("spectral", 10, 1)
% output
NMI: 0.8900
ARI: 0.9100

Note: DEPF is still under development; please see the API reference for the latest list.
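runBioFOA reports NMI and ARI, which presumably compare the final clustering with reference cell labels for the example data. The sketch below computes both scores from two label lists in pure Python. The NMI normalization used here (mutual information divided by the arithmetic mean of the two entropies) is an assumption, since DEPF may normalize differently, and reading final_clustering.csv is left out because its layout is not documented here. The function names are hypothetical.

```python
import math
from collections import Counter


def _pairs(n):
    return n * (n - 1) / 2


def adjusted_rand_index(truth, pred):
    """ARI between two labelings of the same cells (1.0 = identical partitions)."""
    n = len(truth)
    sum_joint = sum(_pairs(v) for v in Counter(zip(truth, pred)).values())
    sum_truth = sum(_pairs(v) for v in Counter(truth).values())
    sum_pred = sum(_pairs(v) for v in Counter(pred).values())
    expected = sum_truth * sum_pred / _pairs(n)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (sum_joint - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """NMI with arithmetic-mean normalization (assumed variant)."""
    n = len(truth)
    joint, a, b = Counter(zip(truth, pred)), Counter(truth), Counter(pred)
    mi = sum(c / n * math.log(n * c / (a[i] * b[j])) for (i, j), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in a.values())
    h_b = -sum(c / n * math.log(c / n) for c in b.values())
    if h_a + h_b == 0:
        return 1.0  # both labelings are a single cluster
    return mi / ((h_a + h_b) / 2)


truth = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 2]  # same partition, different label ids
print(round(adjusted_rand_index(truth, pred), 4), round(normalized_mutual_info(truth, pred), 4))  # 1.0 1.0
```

Both scores ignore how clusters are numbered, so a predicted clustering that matches the reference up to relabelling scores 1.0 on each.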
Thank you for using DEPF! Questions, suggestions, and advice are all welcome.
Email: lixt314@jlu.edu.cn, fanyi21@mails.jlu.edu.cn

