spVIPES enables robust integration of multi-group single-cell datasets through a principled shared-private latent space decomposition. The method leverages a Product of Experts (PoE) framework to learn both shared biological signals common across datasets and private representations capturing group-specific variations.
spVIPES provides three complementary approaches for dataset alignment:
| Method | Description | Best Use Case |
|---|---|---|
| Label-based PoE | Uses cell type annotations for direct supervision | High-quality cell type labels available |
| OT Paired PoE | Direct cell-to-cell correspondences via optimal transport | Known cellular correspondences (e.g., time series) |
| OT Cluster-based PoE | Automated cluster matching with transport plans | Similar cell populations, no direct correspondences |
Note: The method automatically selects the most appropriate strategy based on available annotations and transport information.
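
For intuition about the Product of Experts idea mentioned above, here is a minimal sketch of how diagonal Gaussian experts can be combined by precision weighting. It assumes each group contributes one Gaussian expert over the shared latent space. The function name and toy values are hypothetical, and spVIPES's internal implementation may differ.

```python
import numpy as np


def product_of_gaussian_experts(means, variances):
    """Combine diagonal Gaussian experts into a single Gaussian.

    means, variances: array-likes of shape (n_experts, n_dims).
    Returns the mean and variance of the normalized product.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    # Precisions add; the joint mean is the precision-weighted average.
    joint_variance = 1.0 / precisions.sum(axis=0)
    joint_mean = joint_variance * (precisions * means).sum(axis=0)
    return joint_mean, joint_variance


# Two hypothetical experts (e.g. one per dataset) over a 2-D shared latent space
mu, var = product_of_gaussian_experts(
    means=[[0.0, 2.0], [2.0, 2.0]],
    variances=[[1.0, 1.0], [1.0, 0.25]],
)
print(mu, var)
```

The combined variance is never larger than that of the most confident expert, which is why a product of experts tends to sharpen the shared representation where the groups agree.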
- Python 3.9-3.10
- PyTorch (GPU support strongly recommended)
Install the latest stable release from PyPI:

```bash
pip install spVIPES
```

For the development version:

```bash
pip install git+https://github.com/nrclaudio/spVIPES.git@main
```

For optimal performance, ensure CUDA-compatible PyTorch is installed:

```bash
# Check GPU availability
nvidia-smi

# Install PyTorch with CUDA support (example for CUDA 11.3)
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

# Verify GPU detection
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

See the PyTorch installation guide for version-specific instructions.

```python
import spVIPES
import scanpy as sc

# Load your multi-group dataset
adata = sc.read_h5ad("data.h5ad")

# Configure integration strategy
spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    label_key="cell_type",  # Optional: for supervised integration
)

# Initialize and train model
model = spVIPES.model(adata)
model.train(max_epochs=200)

# Extract integrated representations
latent = model.get_latent_representation()
adata.obsm["X_spVIPES"] = latent
```

📋 Label-based Integration

Use when high-quality cell type annotations are available:

```python
spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    label_key="cell_type",
    batch_key="batch",  # Optional batch correction
)
```

🔄 Optimal Transport: Paired Cells

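
This strategy reads a precomputed transport plan from `adata.uns`. How that plan is produced is left to the user. As one possible sketch, the snippet below computes an entropy-regularized (Sinkhorn) plan between two sets of cell embeddings with plain NumPy. The function name, the toy embeddings, and the regularization value are assumptions for illustration, and the exact shape and format that spVIPES expects should be checked against the API documentation.

```python
import numpy as np


def sinkhorn_plan(x, y, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between rows of x and rows of y.

    Both marginals are uniform. Intended for small inputs, since the
    full pairwise cost matrix is held in memory.
    """
    a = np.full(x.shape[0], 1.0 / x.shape[0])
    b = np.full(y.shape[0], 1.0 / y.shape[0])
    # Squared Euclidean cost, rescaled to [0, 1] for numerical stability
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternate scaling updates to match column and row marginals
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


# Hypothetical embeddings for two groups (50 and 40 cells, 10 features)
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 10))
y = x[:40] + 0.05 * rng.normal(size=(40, 10))
plan = sinkhorn_plan(x, y)

# Assumed storage location, matching the key used in this README:
# adata.uns["transport_plan"] = plan
```

Each row of `plan` sums to `1 / n_cells` of the first group, and larger entries indicate cell pairs that the plan considers likely correspondences.
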
For datasets with known cell-to-cell correspondences:

```python
# Assumes transport plan stored in adata.uns["transport_plan"]
spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    transport_plan_key="transport_plan",
    match_clusters=False,
)
```

🧩 Optimal Transport: Cluster Matching

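
To build intuition for what matching clusters from a transport plan can look like, the sketch below sums the mass of a cell-level plan between every pair of clusters and picks the heaviest partner for each cluster. This is an illustration only and not a description of spVIPES's internal procedure. The function name, cluster labels, and toy plan are hypothetical.

```python
import numpy as np


def match_clusters_from_plan(plan, labels_a, labels_b):
    """Aggregate a cell-level plan into cluster-level mass and pick the best match.

    plan: array of shape (n_cells_a, n_cells_b).
    labels_a, labels_b: cluster label per cell in each group.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    clusters_a = np.unique(labels_a)
    clusters_b = np.unique(labels_b)
    mass = np.zeros((clusters_a.size, clusters_b.size))
    for i, ca in enumerate(clusters_a):
        for j, cb in enumerate(clusters_b):
            # Total transported mass between cluster ca and cluster cb
            mass[i, j] = plan[np.ix_(labels_a == ca, labels_b == cb)].sum()
    best = mass.argmax(axis=1)
    matches = {str(ca): str(clusters_b[j]) for ca, j in zip(clusters_a, best)}
    return matches, mass


# Toy plan: 4 cells in group A, 3 cells in group B
plan = np.array(
    [
        [0.20, 0.05, 0.00],
        [0.20, 0.05, 0.00],
        [0.00, 0.05, 0.20],
        [0.00, 0.05, 0.20],
    ]
)
matches, mass = match_clusters_from_plan(
    plan,
    labels_a=["a0", "a0", "a1", "a1"],
    labels_b=["b0", "b0", "b1"],
)
print(matches)
```
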
For automatic cluster-based alignment:

```python
spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    transport_plan_key="transport_plan",
    match_clusters=True,  # Enables automated cluster matching
)
```

```python
# Custom model parameters
model = spVIPES.model(
    adata,
    n_dimensions_shared=25,  # Shared latent dimensions
    n_dimensions_private=10,  # Private latent dimensions
    n_hidden=128,  # Hidden layer size
    dropout_rate=0.1,  # Regularization
)

# Training with custom settings
model.train(
    max_epochs=300,
    batch_size=512,
    early_stopping=True,
    check_val_every_n_epoch=10,
)
```

📚 Getting Started
- Basic Tutorial — Complete walkthrough of spVIPES functionality
- API Documentation — Comprehensive API reference
💬 Get Help
- Issue Tracker — Report bugs and request features
If you use spVIPES in your research, please cite:

```bibtex
@article{spVIPES2023,
  title={Integrative learning of disentangled representations},
  author={Novella-Rausell, C. and Peters, D.J.M. and Mahfouz, A.},
  journal={bioRxiv},
  year={2023},
  doi={10.1101/2023.11.07.565957},
  url={https://www.biorxiv.org/content/10.1101/2023.11.07.565957v1}
}
```

Paper: bioRxiv preprint