scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction

Overview

scDrugMap is an integrated Python toolkit and interactive web server designed for predicting drug response at single-cell resolution using large-scale foundation models (FMs). This framework provides a comprehensive benchmarking platform to evaluate model performance across diverse cancer types, therapy classes, tissue types, and drug regimens. In addition to robust performance evaluation, scDrugMap supports various training strategies, including layer freezing, fine-tuning via LoRA, and zero-shot inference, and includes tools for biomarker discovery, model interpretability, and drug resistance analysis.[paper]

Key Features

🧬 Support for 10 Foundation Models:
- 8 single-cell foundation models (e.g., scFoundation, scGPT, tGPT)
- 2 general-purpose LLMs (LLaMa3-8B, GPT4o-mini)
⚙️ Multi-strategy Model Training:
- Layer-freezing
- Fine-tuning with LoRA (Low-Rank Adaptation)
- Zero-shot/few-shot prediction
📊 Two Evaluation Strategies:
- Pooled-data evaluation: Train/test on aggregated datasets
- Cross-data evaluation: Test generalizability across distinct datasets
📁 Rich and Curated Dataset Repository:
- 326,751 single cells (primary collection)
- 18,856 single cells (validation collection)
- 14 cancer types, 5 tissue types, 3 therapy types, and 21 regimens
🌐 Web Interface for Easy Access:
- Explore models, datasets, predictions, and visualizations via https://scdrugmap.com

Web Server 🚀

The web server enables users to: Upload scRNA-seq data; Select models and evaluation settings; Predict drug response at single-cell resolution; Visualize results interactively

🔗 Visit: https://scdrugmap.com

Installation

Clone the repository: git clone https://github.com/QSong-github/scDrugMap.git

Set up your environment: conda env create -f env.yaml

You can also follow the original environment installation instructions for every model. Geneformer, tGPT, UCE, scBERT, CellPLM, OpenBioMed/CellLM, scGPT, scFoundation. The complete code of the project is in zenodo.

Docker Deployment 🐳

For quick deployment using Docker Compose:

# Clone the repository
git clone https://github.com/QSong-github/scDrugMap.git
cd scDrugMap

# Start the services
docker-compose up -d

# Check service status
docker-compose ps

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Getting Started

(0) One-click launch

# entering the project directory
$ cd ./benchmark/
# Specify a model (e.g. scFoundation, scGPT) and training mode (EBD and FT)
$ python launcher.py --model scGPT --mode EBD

(1) Geneformer

# entering the code directory
$ cd ./benchmark/Geneformer-finetuing-lora-prompt_cell_cls/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(2) tGPT

# entering the code directory
$ cd ./benchmark/tGPT-main/tGPT-main/tGPT-main/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(3) UCE

# entering the code directory
$ cd ./benchmark/UCE-main/UCE-main/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(4) scBERT

# entering the code directory
$ cd ./benchmark/scBERT-master/scBERT-master/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(5) CellPLM

# entering the code directory
$ cd ./benchmark/CellPLM-main/CellPLM-main/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(6) OpenBioMed/CellLM

# entering the code directory
$ cd ./benchmark/OpenBioMed-main/OpenBioMed-main/
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(7) scGPT

# entering the code directory
$ cd ./benchmark/scGPT-main/scGPT-main/scgpt/tasks/
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LOAR
$ python benchmarking_main_FT.py

(8) scFoundation

# entering the code directory
$ cd ./benchmark/scFoundation-main/scFoundation-main/model/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_ebd.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py
# train and test by fine-tuning with LORA
$ python benchmarking_main_FT.py

(9) Llama

# entering the code directory
$ cd ./benchmark/Llama/
# creating dataset
$ python dataset_making.py
# generating fixed embedding
$ python get_embeds.py
# train and test by fixed embedding
$ python benchmarking_main_EBD.py

(10) GPT4-mini

# entering the code directory
$ cd ./benchmark/GPT4/
# prediction
$ python main.py

Reference

[1].Shen, H. et al. Generative pretraining from large-scale transcriptomes: Implications for single-cell deciphering and clinical translation. bioRxiv, 2022.2001. 2031.478596 (2022).
[2].Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nature Machine Intelligence 4, 852-866 (2022).
[3].Theodoris, C.V. et al. Transfer learning enables predictions in network biology. Nature 618, 616-624 (2023).
[4].Zhao, S., Zhang, J. & Nie, Z. Large-scale cell representation learning via divide-and-conquer contrastive learning. arXiv preprint arXiv:2306.04371 (2023).
[5].Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nature Methods, 1-11 (2024).
[6].Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 1-11 (2024).
[7].Wen, H. et al. CellPLM: pre-training of cell language model beyond single cells. bioRxiv, 2023.2010. 2003.560734 (2023).
[8].Rosen, Y. et al. Universal cell embeddings: A foundation model for cell biology. bioRxiv, 2023.2011. 2028.568918 (2023).
[9].AI@Meta Llama 3 Model Card. https://huggingface.co/meta-llama/Meta-Llama-3-8B (2024).
[10].OpenAI GPT-4o mini. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ (2024).

Important Notes:

The backend image contains multiple LLM weights and is quite large (~10GB+), so the initial pull may take some time depending on your internet connection
The frontend will be available at: http://localhost:3000
The backend API will be available at: http://localhost:8000
Both services will start automatically and the frontend will wait for the backend to be ready

Citation

@article{wang2025scdrugmap, title={scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction}, author={Wang, Qing and Pan, Yining and Zhou, Minghao and Tang, Zijia and Wang, Yanfei and Wang, Guangyu and Song, Qianqian}, journal={arXiv preprint arXiv:2505.05612}, year={2025} }

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
code		code
src		src
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
env.yaml		env.yaml
launcher.py		launcher.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction

Overview

Key Features

Web Server 🚀

Installation

Docker Deployment 🐳

Getting Started

Reference

Citation

License

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

License

QSong-github/scDrugMap

Folders and files

Latest commit

History

Repository files navigation

scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction

Overview

Key Features

Web Server 🚀

Installation

Docker Deployment 🐳

Getting Started

Reference

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages