DEPF: Reliable Identification and Interpretation of Single-cell Molecular Heterogeneity and Transcriptional Regulation using Dynamic Ensemble Pruning

Release: Source

Links: Getting Started | API Reference | Examples

DEPF is a dynamic ensemble pruning framework for identifying and interpreting single-cell molecular heterogeneity. A hierarchical autoencoder projects the high-dimensional scRNA-seq data onto multiple low-dimensional latent spaces, and a basic clustering algorithm is run in those latent spaces to produce a clustering ensemble. A bi-objective fruit fly optimization algorithm then dynamically prunes the low-quality basic clusterings from the ensemble. A silhouette coefficient-based indicator is developed and evaluated to determine the optimization direction of the bi-objective function.
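The silhouette coefficient that the indicator builds on can be illustrated with a few lines of Python. This is a minimal sketch of the textbook silhouette coefficient using only the standard library, not DEPF's own indicator; the function names (`euclidean`, `silhouette_score`) and the toy points are assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (textbook definition)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

A labeling that matches the separated groups scores close to 1, while one that cuts across them scores below 0, which is why the coefficient is a natural basis for judging the quality of a basic clustering.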

DEPF is built from four modules, all developed by the authors: Normalization, Hierarchical Autoencoder, Clustering Ensemble, and Dynamic Ensemble Pruning. Its five highlights are:

  • 🌞 Dynamic ensemble pruning: many could be better than all.
  • 🍎 DEPF can identify rare cell types and small clusters that would not be picked up by other methods.
  • 🌞 DEPF can identify novel clusters that traditional methods fail to detect.
  • 🍎 DEPF can provide biological interpretation of scRNA-seq data.
  • 🌞 DEPF can discover transcriptional and post-transcriptional regulators in scRNA-seq data.

Getting Started

Installation

It is recommended to use git for installation.

# create a virtual environment named DEPF
$ conda create -n DEPF
# activate the environment
$ conda activate DEPF
# install the R environment
$ conda install -c conda-forge r-base=4.1.3
$ conda install -c conda-forge r-devtools
$ conda install -c conda-forge r-seurat
# install Python and the Leiden package
$ conda install python=3.9
$ pip install leidenalg
# install MATLAB according to the official website tutorial
# clone the DEPF repository
$ git clone https://github.com/fanyi21/DEPF.git
# install the dependencies
$ cd DEPF/DEPF/HierarchicalAutoencoder/
$ Rscript RequirePackage.R

Example

🍁 Step 1: Normalize the raw scRNA-seq data and map it to multiple low-dimensional latent spaces. This produces a 9_latent_data folder, saved in ./OutputData.

$ cd DEPF/HierarchicalAutoencoder/
$ Rscript runHA.R

🍁 Step 2: Select a basic clustering algorithm to generate a clustering ensemble. DEPF provides three basic clustering algorithms: Louvain, Leiden, and spectral clustering.

  • 🍆 Louvain. Produces Louvain_resolution_1.csv, saved in ./OutputData.
$ cd DEPF/HierarchicalAutoencoder/
# run in an R session
source("runLouvain.R")
# res: resolution
runLouvain(res=1, ensemble_num=10)
  • 🍆 Leiden. Produces Leiden_resolution_1.csv, saved in ./OutputData.
$ cd DEPF/HierarchicalAutoencoder/
# run in an R session
source("runLleiden.R")
# res: resolution
runLeiden(res=1, ensemble_num=10)
  • 🍆 Spectral clustering. Produces spectral_cluster_K_10.csv, saved in ./OutputData.
$ cd DEPF/BiobjectiveFruitFlyOptimizationAlgorithm/
% run in MATLAB
% K: cluster number; T: ensemble size
runSpectral(K=10, T=10)
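To make the idea of a clustering ensemble concrete, the sketch below summarizes several basic clusterings of the same cells as a co-association matrix: the fraction of clusterings in which each pair of cells lands in the same cluster. This is a common, generic way to inspect an ensemble and is shown only as an illustration; the README does not state that DEPF uses it, and the function name `co_association` and the toy label vectors are assumptions.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place items i and j in the same cluster."""
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("every base clustering must label the same items")

    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Toy ensemble (hypothetical): three base clusterings of four cells.
# The first two are the same partition with the label names swapped,
# which the co-association matrix treats as identical.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
co = co_association(ensemble)
for row in co:
    print(["%.2f" % v for v in row])
```

Because the matrix depends only on which cells share a cluster, it is unaffected by how each basic clustering happens to name its clusters, so clusterings from different latent spaces can be compared directly.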

🍁 Step 3: Perform dynamic ensemble pruning. This produces final_clustering.csv, saved in ./OutputData.

$ cd DEPF/BiobjectiveFruitFlyOptimizationAlgorithm/
runBioFOA("spectral", 10, 1)
% output
NMI: 0.8900
ARI: 0.9100
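The NMI and ARI values in the output above compare a predicted clustering against reference labels. For readers who want to see what these scores measure, here is a minimal standard-library sketch of both. It is not DEPF's evaluation code: the function names and toy labels are assumptions, and the NMI here is normalized by the arithmetic mean of the two entropies, which may differ from the normalization DEPF uses.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index computed from the contingency table."""
    n = len(true_labels)
    if n < 2 or n != len(pred_labels):
        raise ValueError("need two label lists of equal length >= 2")
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("need two non-empty label lists of equal length")
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    mi = 0.0
    for (t, p), c in Counter(zip(true_labels, pred_labels)).items():
        mi += (c / n) * log(c * n / (count_true[t] * count_pred[p]))
    denom = (entropy(true_labels) + entropy(pred_labels)) / 2
    return mi / denom if denom > 0 else 1.0


# Toy labels (hypothetical): six cells, three reference cell types.
truth = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]  # same partition, different label names
coarse = [0, 0, 0, 1, 1, 1]     # merges and splits the reference types

print(adjusted_rand_index(truth, relabeled), normalized_mutual_info(truth, relabeled))
print(adjusted_rand_index(truth, coarse), normalized_mutual_info(truth, coarse))
```

Both scores ignore the names of the cluster labels and reward only agreement in how cells are grouped, so a perfect match scores 1 even when the label numbers differ.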

Note: DEPF is still under development; please see the API Reference for the latest list.

Contact:

Thank you for using DEPF! Questions, suggestions, and advice are welcome.

Email: lixt314@jlu.edu.cn, fanyi21@mails.jlu.edu.cn
