Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections

  • Original Investigation
Human Genetics

Abstract

Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5′UTRs and 5′genes but were not significantly different from controls in introns, 3′UTRs and 3′genes. Additionally, pathogenic repeat expansions were found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5′genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference for forming coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
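For readers less familiar with the AUC metric quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the DPREx evaluation code. The function name `roc_auc` and the toy labels and scores are hypothetical and were chosen only for illustration; they are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pathogenic repeat region, 0 = control region.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 7 of 9 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value such as 0.88 indicates that pathogenic regions are ranked above controls in most pairwise comparisons.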

Availability of data and materials

The source code and data for characterizing genomic regions with repeat expansion are available at https://github.com/wykswr/ItvAnt, where an example of the input file format is provided as “example.bed”. The source code and data for DPREx are available at https://github.com/wykswr/DPREx. The HGMD-RPE and HGMD-RPE-DM datasets, as well as the corresponding control files, are available at https://github.com/fanc232CO/HGMD-PREs. The reference genome and annotations were obtained from GENCODE (https://www.gencodegenes.org/). The release version of the annotation is v37: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_37/GRCh37_mapping/gencode.v37lift37.annotation.gtf.gz. The epigenetic data, including CTCF-binding sites, DNase-seq and histone modification data, were obtained from the ENCODE project (https://www.encodeproject.org/). Accession IDs are: ENCFF618DDO (CTCF ChIP-seq, narrowPeak); ENCFF021YPR (H3K27me3 ChIP-seq, bigWig); ENCFF388WCD (H3K36me3 ChIP-seq, bigWig); ENCFF481BLF (H3K4me1 ChIP-seq, bigWig); ENCFF780JKM (H3K4me3 ChIP-seq, bigWig); ENCFF411VJD (H3K9me3 ChIP-seq, bigWig). phastCons conservation scores were downloaded from UCSC: https://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons100way/hg19.100way.phastCons.bw. The pre-computed MMSplice scores were obtained from the annotation of CADD (offline version): https://cadd.gs.washington.edu/download. Non-B DNA structure annotations (hg19) were obtained from https://ncifrederick.cancer.gov/bids/ftp/?nonb#, https://ncifrederick.cancer.gov/bids/ftp/actions/download/?resource=/bioinfo/static/nonb_dwnld/human_hg19/human_hg19.gff.tar.gz.
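As an illustration of how interval files such as these can be combined, the sketch below counts how many peaks overlap each query region. It assumes only the three leading columns shared by standard BED and ENCODE narrowPeak files (chromosome, 0-based start, exclusive end). The exact layout of “example.bed” in the ItvAnt repository is not described here, so the parser, the function names and the toy coordinates are all hypothetical and do not reproduce the authors' pipeline.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed(text: str) -> List[Interval]:
    """Parse the first three columns of BED-like text, skipping header lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_overlaps(regions: List[Interval], peaks: List[Interval]) -> List[int]:
    """For each region, count the peaks that overlap it (simple O(n*m) scan)."""
    return [sum(overlaps(r, p) for p in peaks) for r in regions]


# Hypothetical toy coordinates, not taken from any dataset in this article.
toy_regions = parse_bed("chr1\t100\t200\nchr2\t50\t60\n")
toy_peaks = parse_bed(
    "chr1\t90\t101\nchr1\t150\t250\nchr1\t200\t300\nchr2\t10\t55\n"
)
toy_counts = count_overlaps(toy_regions, toy_peaks)
```

Because the intervals are half-open, the toy peak starting at position 200 does not overlap the region ending at 200. For genome-scale files, a sorted sweep or an interval tree would be a more practical choice than the quadratic scan shown here.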

Acknowledgements

This work was supported by several funding sources, which are detailed in the Funding section.

Funding

This work was funded by the National Key Research and Development Program of China (2020YFB0204803), the Natural Science Foundation of China (81801132, 81971190, 61772566), and the Natural Science Foundation of Guangdong (2021A1515010256). J.A.T. was supported in part by National Institutes of Health (NIH) grants P01 CA092584 and R35 CA220430, by the Cancer Prevention Research Institute of Texas (CPRIT) grant (RP180813), and a Robert A. Welch Chemistry Chair. P.D.S., E.V.B., M.M. and D.N.C. acknowledge financial support from Qiagen Inc through a License Agreement with Cardiff University.

Author information

Contributions

CF and KC performed the analysis and co-wrote the manuscript. YW developed the annotation pipeline as well as the DPREx model. EVB, PDS, MM, AB, HK-S, JAT and DNC made suggestions regarding project design, supplied data for analysis, and revised the manuscript. HZ designed the overall research strategy and collected the resources required for the project.

Corresponding author

Correspondence to Huiying Zhao.

Ethics declarations

Conflict of interest

The authors are unaware of any conflicts of interest or competing interests.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2746 kb)

Supplementary file2 (XLSX 240 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, C., Chen, K., Wang, Y. et al. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections. Hum Genet 142, 245–274 (2023). https://doi.org/10.1007/s00439-022-02500-6

  • DOI: https://doi.org/10.1007/s00439-022-02500-6
