Yilei Fu, Robel Dagnew, Jon Moller, Rohit Kolora, Alex Leonard, Halimat Atanda, Arijita Sarkar, Senthilkumar Kailasam, Zahra Seyfollahi, Nilabja Bhattacharjee, Sarah Eger, Michael Olufemi, Ben Braun
While the gold standard for DNA methylation detection is bisulfite sequencing, the advent of long-read sequencing has made it possible to detect DNA methylation and additional modifications in native nucleic acids. The goal of this project is to evaluate the effects of smoothing on long-read methylation calling.
Use smoothing with caution, especially where single-CpG effects have biological relevance, such as tissue-specific or cell-type-specific methylation analyses. Some specific examples of "lost" CpG signals are highlighted in the slides.
Pipeline/flowchart of MethSmoothEval
This project investigates whether smoothing of DNA methylation signals—historically essential in short-read bisulfite sequencing (WGBS)—is still necessary in the era of high-accuracy long-read sequencing (ONT, PacBio).
We aimed to benchmark single-CpG vs smoothed-CpG analysis to assess:
- When smoothing improves detection of differentially methylated CpGs/regions (DMCpGs/DMRs).
- When smoothing introduces artifacts or masks true single-CpG variation.
- Whether per-read accuracy and sequence context in ONT impact CpG methylation detection.
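To illustrate the single-CpG versus smoothed-CpG contrast in the aims above, here is a minimal sketch of how smoothing can mask an isolated CpG. It uses a coverage-weighted tricube kernel as a simplified stand-in; this is not the BSmooth or DSS algorithm. The function name `smooth_methylation`, the bandwidth, and the toy counts are all assumptions made for illustration.

```python
def smooth_methylation(positions, meth_counts, coverages, bandwidth=500):
    """Coverage-weighted tricube kernel smoother over CpG positions.

    Simplified illustration only; BSmooth and DSS use their own
    statistical models. `bandwidth` is in base pairs (assumed value).
    """
    smoothed = []
    for center in positions:
        num = 0.0  # kernel-weighted methylated read count
        den = 0.0  # kernel-weighted total coverage
        for pos, m, n in zip(positions, meth_counts, coverages):
            dist = abs(pos - center) / bandwidth
            if dist >= 1.0:
                continue  # CpG lies outside the smoothing window
            weight = (1.0 - dist ** 3) ** 3  # tricube kernel
            num += weight * m
            den += weight * n
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# Hypothetical toy region: six highly methylated CpGs surrounding
# one nearly unmethylated CpG at position 250.
positions = [100, 150, 200, 250, 300, 350, 400]
coverages = [30] * 7
meth_counts = [29, 28, 30, 2, 29, 30, 28]

raw = [m / n for m, n in zip(meth_counts, coverages)]
smoothed = smooth_methylation(positions, meth_counts, coverages)

for pos, r, s in zip(positions, raw, smoothed):
    print(f"{pos}\traw={r:.2f}\tsmoothed={s:.2f}")
```

In this toy region the smoothed value at position 250 is pulled toward its methylated neighbours, which is the kind of masked single-CpG signal the project sets out to quantify.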
Data Sources
- Short-read: SEQC2 EpiQC data (WGBS / EM-Seq / oxidative bisulfite sequencing).
- Long-read: ONT datasets (HG002, HG005).
Benchmarking
- Run smoothing-based tools: BSmooth, DSS.
- Run long-read tools: Modkit, pb-CpG-tools.
- Compare per-CpG and DMR calls across technologies.
Key Analyses
- CpG density profiling across hg38 (focus on chr22).
- Evaluate whether smoothing disrupts single-CpG interpretation (e.g., IGV inspection of low vs high DNAm CpGs in proximity).
- Identify counter-examples where ONT single-CpGs are biologically informative but missed in short-read smoothing pipelines.
- Assess tissue-specific effects (blood vs brain vs tumor).
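The CpG density profiling step listed above could be sketched roughly as follows. The example counts CpG dinucleotides in fixed, non-overlapping windows of a sequence string. The function name `cpg_density`, the window size, and the toy sequence are hypothetical; a real run would read chr22 from an hg38 FASTA file instead.

```python
def cpg_density(sequence, window=1000):
    """Count CpG dinucleotides per fixed-size, non-overlapping window.

    A CpG is assigned to the window that contains its cytosine.
    Returns a list of (window_start, cpg_count) tuples.
    """
    seq = sequence.upper()
    n_windows = (len(seq) + window - 1) // window
    counts = [0] * n_windows
    idx = seq.find("CG")
    while idx != -1:
        counts[idx // window] += 1
        idx = seq.find("CG", idx + 1)
    return [(i * window, c) for i, c in enumerate(counts)]


# Hypothetical toy sequence: a moderately CpG-containing stretch,
# a CpG-free stretch, and a short CpG-dense stretch.
toy_sequence = "ACGT" * 5 + "A" * 20 + "CG" * 5

for start, count in cpg_density(toy_sequence, window=10):
    print(f"window {start}-{start + 9}: {count} CpGs")
```

Windows with many closely spaced CpGs are where smoothing borrows the most information from neighbours, so a density profile like this is one way to stratify the single-CpG versus smoothed comparison.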
Validation
- Cross-check ONT read-level calls with EM-Seq / TrueMethyl.
- Ask: Are ONT-only CpG calls false positives, or do they reflect true biological signal?
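One way to approach the cross-check above is sketched below, assuming per-CpG methylation fractions and coverages have already been extracted from each technology into dictionaries keyed by (chromosome, position). The function name `cross_check`, the coverage and difference thresholds, and all values in the toy data are assumptions for illustration, not project results.

```python
def cross_check(ont, reference, min_cov=10, max_diff=0.3):
    """Compare per-CpG methylation between ONT and a reference assay.

    Both inputs map (chrom, pos) -> (methylation_fraction, coverage).
    Sites are compared only if both assays reach `min_cov`.
    """
    shared = [
        site for site in ont
        if site in reference
        and ont[site][1] >= min_cov
        and reference[site][1] >= min_cov
    ]
    diffs = {s: abs(ont[s][0] - reference[s][0]) for s in shared}
    mean_abs_diff = sum(diffs.values()) / len(diffs) if diffs else float("nan")
    return {
        "n_shared": len(shared),
        "mean_abs_diff": mean_abs_diff,
        # Sites covered by both assays that disagree strongly.
        "discordant": sorted(s for s, d in diffs.items() if d > max_diff),
        # Sites reported by ONT but absent from the reference assay.
        "ont_only": sorted(s for s in ont if s not in reference),
    }


# Hypothetical toy calls (fraction methylated, coverage).
ont_calls = {
    ("chr22", 1000): (0.95, 25),
    ("chr22", 1050): (0.10, 30),
    ("chr22", 1100): (0.90, 28),
    ("chr22", 1200): (0.50, 5),
}
emseq_calls = {
    ("chr22", 1000): (0.92, 40),
    ("chr22", 1050): (0.85, 35),
    ("chr22", 1100): (0.88, 38),
}

report = cross_check(ont_calls, emseq_calls)
print(report)
```

Discordant and ONT-only sites flagged this way are candidates for manual inspection (for example in IGV) rather than automatic classification as false positives or true signal.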
- Catalog of CpG regions on hg38
  - Where single-CpG analysis is reliable.
  - Where smoothing remains necessary.
- Benchmarking pipeline
  - Reproducible and fast for new long-read chemistries.
  - Extensible to tissue- and cell-type-specific studies.
- Case Studies
  - Examples of ONT CpGs not recovered by EM-Seq smoothing.
  - Markers distinguishing HG002 vs HG005.
- Follow-up Questions
  - Sequence context dependencies for CpG calling accuracy.
  - Reliability of methylation calling in repetitive / TR regions.
  - Tissue-specific CpG resolution requirements.