Single-cell bisulfite-free 5mC and 5hmC sequencing with high sensitivity and scalability

Significance

Most current methylation profiling techniques rely on bisulfite treatment, which suffers from low DNA recovery. The technique proposed in this study, named Cabernet, measures 5mC and 5hmC at single-base resolution with high genomic coverage. Using the Tn5 transposome makes it possible to measure hemi-methylation status and to achieve high-throughput methylome profiling. Together, these features provide an efficient way to analyze the epigenetic landscape of complex biological systems.


Fig. S2. (a) Left: signal intensities of 5hmC DIP-Seq peaks (red) within promoter regions in mESCs and 5mC levels detected by Cabernet (blue) at the corresponding peak regions. Right: signal intensities of 5hmC DIP-Seq peaks (red) within promoter regions in mESCs and 5hmC levels (blue) at the corresponding peak regions. (b) Left: signal intensities of 5hmC DIP-Seq peaks (red) within gene body regions in mESCs and 5mC levels detected by Cabernet (blue) at the corresponding peak regions. Right: signal intensities of 5hmC DIP-Seq peaks (red) within gene body regions in mESCs and 5hmC levels (blue) at the corresponding peak regions.

Fig. S3. Overview of Cabernet-H data quality. (a) Density distribution of the 5hmC mean value detected by Cabernet-H in lambda DNA. The red dotted line is at log10(0.03). (b) Density distribution of the 5hmC mean value detected by Cabernet-H in pUC19. The red dotted line is at log10(0.03). (c) Mean value of 5hmC at each CpG site detected by Cabernet-H in pUC19.

Fig. S4. Genome coverage of Cabernet, Cabernet-H, and scBS-seq at different numbers of downsampled reads in K562 cells.
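A coverage-versus-depth curve of this kind can be illustrated with a minimal sketch. It assumes that reads are given as half-open `(start, end)` intervals on a single chromosome and that coverage is counted over a list of CpG positions; the function name, the input layout, and the choice of CpG sites as the unit of coverage are illustrative assumptions, not details taken from the legend.

```python
import random

def coverage_after_downsampling(reads, cpg_positions, n_reads, seed=0):
    """Fraction of CpG sites covered by at least one of n_reads sampled reads.

    reads: list of (start, end) half-open intervals (hypothetical input layout).
    cpg_positions: list of CpG coordinates on the same chromosome.
    """
    rng = random.Random(seed)
    # Sample without replacement; cap at the number of available reads.
    sample = rng.sample(reads, min(n_reads, len(reads)))
    covered = set()
    for start, end in sample:
        covered.update(p for p in cpg_positions if start <= p < end)
    return len(covered) / len(cpg_positions)
```

Calling the function over a range of `n_reads` values gives one point per depth; the quadratic scan over reads and sites is kept for clarity and would be replaced by an interval index for real data.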

Fig. S5. Modification pattern of 5mC and 5hmC detected by Cabernet and Cabernet-H on gene body. (a) Modification pattern of 5mC detected by Cabernet and scBS-seq on gene body in K562 cells. Averaged DNA methylation levels along gene bodies between 2 kilobase (kb) upstream of the transcription start sites (TSS) and 2 kb downstream of the transcription end sites (TES) of all RefSeq genes. Green: Cabernet; red: Cabernet_merged; gray: Cabernet_bulk; yellow: scBS-seq. (b) Modification pattern of 5hmC detected by Cabernet-H on gene body in mESCs. Averaged 5hmC levels along the gene bodies between 2 kb upstream of the transcription start sites (TSS) and 2 kb downstream of the transcription end sites (TES) of all RefSeq genes. Green: Cabernet-H; red: Cabernet-H_merged; gray: Cabernet-H_bulk.
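The averaging along gene bodies described in this legend can be sketched as follows. This is a minimal illustration, assuming that each gene is scaled to a fixed number of body bins with fixed-width flanking bins; the bin counts, function names, and input layout are hypothetical and are not stated in the legend.

```python
def metagene_bin(pos, tss, tes, strand="+", flank=2000, flank_bins=20, body_bins=60):
    """Map a genomic position to a bin index in TSS-2kb .. TES+2kb, or None if outside."""
    # Coordinate relative to the TSS, oriented in the direction of transcription.
    rel = pos - tss if strand == "+" else tss - pos
    length = abs(tes - tss)
    if -flank <= rel < 0:                      # upstream flank
        return (rel + flank) * flank_bins // flank
    if 0 <= rel < length:                      # gene body, scaled to body_bins
        return flank_bins + rel * body_bins // length
    if length <= rel < length + flank:         # downstream flank
        return flank_bins + body_bins + (rel - length) * flank_bins // flank
    return None

def average_profile(sites, genes, flank=2000, flank_bins=20, body_bins=60):
    """Average modification level per bin over all genes.

    sites: list of (position, level) on one chromosome (assumed layout).
    genes: list of (tss, tes, strand) on the same chromosome.
    """
    n_bins = 2 * flank_bins + body_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for tss, tes, strand in genes:
        for pos, level in sites:
            b = metagene_bin(pos, tss, tes, strand, flank, flank_bins, body_bins)
            if b is not None:
                sums[b] += level
                counts[b] += 1
    # Bins without any covered site are reported as None rather than zero.
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Scaling every gene body to the same number of bins is what allows genes of different lengths to be averaged into one curve.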

Fig. S7. Comparison between 5mC/5hmC levels detected by Cabernet/Cabernet-H and signal intensities of different factors at gene body regions. (a) Left: signal intensities of Tet1 ChIP-Seq peaks (red) within gene body regions in mESCs and the 5mC levels detected by Cabernet (blue) at the corresponding peak regions. The horizontal axis of each box, from left to right, represents the Tet1 peaks that overlap gene body regions, ranked by peak signal intensity from high to low. Right: signal intensities of Tet1 ChIP-Seq peaks within gene body regions in mESCs and the DNA 5hmC levels at the corresponding peak regions. (b) Left: 5mC levels (blue) at gene body regions and the expression levels of corresponding genes (red) in mESCs. The log10 of gene expression levels (transcripts per kilobase per million mapped reads, TPM) was calculated and presented. Right: DNA 5hmC levels at gene body regions and the expression levels of corresponding genes in mESCs. (c) Left: signal intensities of H3K4me3 ChIP-Seq peaks (red) within gene body regions in mESCs and 5mC levels (blue) at the corresponding peak regions. Right: signal intensities of H3K4me3 ChIP-Seq peaks (red) within gene body regions in mESCs and 5hmC levels (blue) at the corresponding peak regions.

Fig. S8. Key features of DNA methylome in early mouse embryos. Distribution of 5mC/hemi-5mC, 5hmC/hemi-5hmC detected by Cabernet/Cabernet-H in early mouse embryos on gene body regions between 2 kilobase (kb) upstream of the transcription start sites (TSS) and 2 kb downstream of the transcription end sites (TES) of all RefSeq genes. Different colors represent different stages during embryonic development.

Fig. S9. Principle for allele counting in Cabernet. Amplicons aligned to the same start and end sites on the reference genome originate from the same allele of the single-cell genomic DNA. This allows for the detection of hemi-methylation in each allele.
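The allele-counting principle can be sketched in a few lines. This is a minimal illustration, assuming that each read carries its alignment coordinates, the original strand it derives from, and per-CpG methylation calls keyed by a shared CpG coordinate; the `Read` type, its fields, and the strand-comparison rule for flagging hemi-methylation are hypothetical and are not specified in the legend.

```python
from collections import defaultdict
from typing import NamedTuple

class Read(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str      # "+" or "-": original strand the amplicon derives from (assumed field)
    cpg_calls: dict  # CpG coordinate -> True (modified) / False (unmodified)

def group_by_fragment(reads):
    """Group reads sharing the same start and end sites, taken as one allele."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.chrom, r.start, r.end)].append(r)
    return dict(groups)

def hemi_sites(group):
    """Return CpG coordinates where the two strands of one fragment disagree."""
    calls = {"+": {}, "-": {}}
    for r in group:
        for pos, modified in r.cpg_calls.items():
            # Keep the first call seen per strand and site (a simplification).
            calls[r.strand].setdefault(pos, modified)
    shared = calls["+"].keys() & calls["-"].keys()
    return sorted(p for p in shared if calls["+"][p] != calls["-"][p])
```

A site is reported only when both strands of the same fragment are observed, which is why the fraction of informative reads depends on sequencing depth.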

Fig. S10. Average fraction of reads from each cell that are informative of hemi-5mC and hemi-5hmC at varied sequencing depths.

Fig. S12. Methylation levels at gene body of active/silenced genes in the maternal/paternal genome of oocytes/sperm cells and early mouse embryos. (a) 5mC level at gene body of active/silenced genes in the maternal (left) and paternal genome (right). (b) Hemi-5mC abundance at gene body of active/silenced genes in the maternal (left) and paternal genome (right). (c) 5hmC level at gene body of active/silenced genes in the maternal (left) and paternal genome (right).

Fig. S14. Heatmap of Pearson correlation of 5mC levels between two different batches of early 1-cells and sperm cells.

Fig. S15. Key features of DNA methylome in early mouse embryos. (a) Left: clustered heatmap showing Pearson correlation between 5mC abundance of different cells at early-2-cell stage (E2C) and late-2-cell stage (L2C). Right: clustered heatmap showing Pearson correlation between hemi-5mC abundance of different cells at E2C stage and L2C stage. (b) Left: clustered heatmap showing Pearson correlation between 5hmC abundance of different cells at E2C stage and L2C stage. Right: clustered heatmap showing Pearson correlation between hemi-5hmC abundance of different cells at E2C stage and L2C stage. (c) Abundance of 5hmC modification on gene body regions at different developmental stages.

Fig. S16. Cost-effectiveness and performance of Cabernet technology. (a) Monetary and time costs of different sequencing methods. The yellow bar refers to the cost of reagents; the green bar refers to the combined cost of reagents plus Tn5. Detailed cost calculations are shown in Dataset S6. (b) UMAP showing the clustering of E7.5 mouse embryo cells sequenced by sci-Cabernet. (c) UMAP showing the clustering of E7.5 mouse embryo cells based on 10x scRNA-seq data. (d) Violin plot showing the genome coverage of sci-Cabernet at different sequencing depths in K562 cells.