Protein Interaction Coevolution Pipeline

A modular pipeline for detecting co-evolving residues between interacting protein pairs, inspired by Green et al., 2021, Nature Communications.

Overview

This pipeline detects and visualizes co-evolving positions between two proteins. It combines a large-scale homology search, multiple sequence alignment (MSA), and statistical coupling analysis, and each stage runs as a separate make target so it can be rerun on its own.

Features

Automated BLAST homology search with assembly-level filtering for taxonomic diversity
Pairing of homologs from the same genome assembly
Concatenated MSA construction and quality filtering
Coevolution analysis using PLMC
Publication-quality heatmaps and network visualizations
Modular, script-driven workflow

Installation

Clone the repository:

git clone <repo-url>
cd protein-interaction-coevolution

Run setup:
```
chmod +x setup.sh
./setup.sh
```
This installs all required Python dependencies.

External tools:
- The setup script also installs MAFFT and PLMC via conda.
- To install them manually:
  - MAFFT: conda install -c conda-forge mafft
  - PLMC: conda install -c bioconda plmc

Quick Start

Prepare your input:
- Place your two query FASTA files as queryA.fasta and queryB.fasta in inputs/find_homologues/PROJECT/.
- Set the PROJECT variable in the Makefile (currently set to ABO).
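
Before starting a run, it can help to confirm that the inputs are where the Makefile expects them. The following is a minimal sketch, not part of the repository: the function name `missing_query_files` and the check for a leading `>` header are assumptions made for illustration.

```python
from pathlib import Path


def missing_query_files(project, root="."):
    """Return expected query FASTA paths that are absent or lack a '>' header.

    Hypothetical helper: the directory layout follows this README, but the
    function itself is not part of the pipeline.
    """
    input_dir = Path(root) / "inputs" / "find_homologues" / project
    missing = []
    for name in ("queryA.fasta", "queryB.fasta"):
        path = input_dir / name
        # A FASTA file should start with a '>' header line.
        if not path.is_file() or not path.read_text().lstrip().startswith(">"):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # "ABO" is the PROJECT value currently set in the Makefile.
    for path in missing_query_files("ABO"):
        print(f"Missing or invalid query file: {path}")
```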

Run the pipeline:

make find        # Find and pair homologs
make msa         # Build and filter concatenated MSA
make coevolution # Run PLMC coevolution analysis
make heatmap     # Generate heatmaps and network visualizations

Pipeline Steps

1. `make find`

Description:
- Runs scripts/blast_and_pair.py, which performs BLASTP searches for both query proteins against NCBI nr, filters hits by assembly, and pairs homologs that come from the same genome assembly.
Input:
- inputs/find_homologues/PROJECT/queryA.fasta
- inputs/find_homologues/PROJECT/queryB.fasta
Output:
- results/find_homologues/PROJECT/paired_sequences.json
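
The pairing idea can be sketched as follows. This is an illustration only and may differ from what scripts/blast_and_pair.py does: the hit dictionary keys (`assembly`, `accession`, `sequence`, `bitscore`) are hypothetical, and keeping only the best-scoring hit per assembly is an assumption about how paralogs are resolved.

```python
def pair_by_assembly(hits_a, hits_b):
    """Pair the best-scoring hits for A and B within each shared assembly.

    Each hit is a dict with the hypothetical keys 'assembly', 'accession',
    'sequence', and 'bitscore'.
    """
    def best_per_assembly(hits):
        best = {}
        for hit in hits:
            current = best.get(hit["assembly"])
            if current is None or hit["bitscore"] > current["bitscore"]:
                best[hit["assembly"]] = hit
        return best

    best_a = best_per_assembly(hits_a)
    best_b = best_per_assembly(hits_b)
    # Only assemblies that contain a homolog of both proteins yield a pair.
    shared = sorted(best_a.keys() & best_b.keys())
    return [
        {"assembly": asm, "proteinA": best_a[asm], "proteinB": best_b[asm]}
        for asm in shared
    ]
```

Assemblies with a hit for only one of the two queries are dropped, since a concatenated alignment needs both halves of every row.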

2. `make msa`

Description:
- Runs scripts/build_msas.sh to construct a concatenated MSA of paired homologs using MAFFT, with post-processing to remove low-quality columns.
Input:
- results/find_homologues/PROJECT/paired_sequences.json
Output:
- results/msas/PROJECT/proteinAB.aln
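
As a rough illustration of concatenation and column filtering, the sketch below joins two already-aligned sets of sequences row by row and drops gap-heavy columns. It assumes each protein was aligned separately before joining, and the `max_gap_fraction` threshold of 0.5 is a hypothetical value; scripts/build_msas.sh may use a different order of operations and different criteria.

```python
def concatenate_and_filter(aligned_a, aligned_b, max_gap_fraction=0.5):
    """Join aligned A and B rows per assembly and remove gappy columns.

    aligned_a / aligned_b map an assembly ID to an aligned sequence string.
    Returns the filtered rows and the number of kept columns that belong to
    protein A (needed later to tell intra- from inter-protein pairs).
    """
    shared = sorted(aligned_a.keys() & aligned_b.keys())
    if not shared:
        return {}, 0
    rows = {asm: aligned_a[asm] + aligned_b[asm] for asm in shared}
    length_a = len(aligned_a[shared[0]])
    n_cols = len(rows[shared[0]])
    if any(len(seq) != n_cols for seq in rows.values()):
        raise ValueError("aligned sequences must all have the same length")

    keep = []
    for col in range(n_cols):
        gaps = sum(1 for seq in rows.values() if seq[col] == "-")
        if gaps / len(rows) <= max_gap_fraction:
            keep.append(col)

    filtered = {asm: "".join(seq[c] for c in keep) for asm, seq in rows.items()}
    kept_a = sum(1 for c in keep if c < length_a)
    return filtered, kept_a
```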

3. `make coevolution`

Description:
- Runs scripts/coevolution_analysis.py to compute coevolutionary couplings using PLMC, outputting a CSV of residue-residue coupling strengths.
Input:
- results/msas/PROJECT/proteinAB.aln
Output:
- results/coevolution/PROJECT/coevolution_results.csv
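
Because the alignment is concatenated, a coupling is inter-protein only when one position falls in protein A and the other in protein B. The sketch below shows one way to extract those pairs. The column names `i`, `j`, and `score`, the 1-based indexing, and the `length_a` parameter (number of alignment columns belonging to protein A) are assumptions; check the actual header of coevolution_results.csv before reusing it.

```python
import csv
import io


def inter_protein_couplings(csv_text, length_a):
    """Return (pos_in_A, pos_in_B, score) tuples, strongest first.

    Positions in the input are assumed to be 1-based indices into the
    concatenated alignment, with protein A occupying columns 1..length_a.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        i, j = sorted((int(row["i"]), int(row["j"])))
        # Keep only pairs that straddle the A/B boundary.
        if i <= length_a < j:
            pairs.append((i, j - length_a, float(row["score"])))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Positions in protein B are renumbered from 1 so that they can be compared with the original queryB sequence.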

4. `make heatmap`

Description:
- Runs scripts/generate_heatmaps.py to create heatmaps and network diagrams of the strongest co-evolving residue pairs.
Input:
- All previous outputs
Output:
- results/heatmaps/PROJECT/ (PNG images, CSV mappings)
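
One plausible way to prepare data for a thresholded heatmap is to keep only the top fraction of inter-protein couplings and place them in a matrix with one row per protein A position and one column per protein B position. This is a sketch under assumptions: the function name, the `fraction` parameter, and the top-fraction rule are hypothetical and may not match the thresholds used by scripts/generate_heatmaps.py.

```python
def top_fraction_matrix(couplings, length_a, length_b, fraction=0.1):
    """Build a length_a x length_b score matrix from the strongest couplings.

    couplings is a list of (pos_a, pos_b, score) with 1-based positions.
    Cells for pairs below the cutoff stay at 0.0.
    """
    matrix = [[0.0] * length_b for _ in range(length_a)]
    if not couplings:
        return matrix
    ranked = sorted(couplings, key=lambda c: c[2], reverse=True)
    # Always keep at least one pair so the plot is never empty.
    n_keep = max(1, int(len(ranked) * fraction))
    for pos_a, pos_b, score in ranked[:n_keep]:
        matrix[pos_a - 1][pos_b - 1] = score
    return matrix
```

The resulting nested list can be passed to matplotlib's `imshow` to render a heatmap, and the retained pairs can double as the edge list for a network diagram.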

Output

Paired homologs: JSON file with all paired sequences
MSA: Concatenated and filtered alignment in FASTA format
Coevolution results: CSV of coupling scores
Visualizations:
- Global heatmaps at multiple thresholds
- Network diagrams of inter-protein couplings
- Mapped coupling CSVs for downstream analysis

Dependencies

Python 3.7+
Biopython
pandas
numpy
matplotlib
scipy
pyyaml
tqdm
pysam
MAFFT (external, for MSA)
PLMC (external, for coevolution analysis)

Reference

This pipeline is inspired by:

Green, A.G., Elhabashy, H., Brock, K.P. et al. Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences. Nat Commun 12, 1396 (2021). https://doi.org/10.1038/s41467-021-21636-z

If you use this pipeline, please cite the above work and this repository.

License

Distributed under the MIT License. See LICENSE for details.
