Abstract
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions located in 203 different genes and performed a systematic analysis of key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and with genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5′UTRs and 5′genes, but did not differ significantly from controls in introns, 3′UTRs and 3′genes. Pathogenic repeat expansions were also enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by a lower free energy of RNA secondary structure formation and, in introns, exons, promoters and 5′genes, lay closer to splice sites than controls. At the protein level, pathogenic repeat expansions showed a preference for coil over other types of secondary structure and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 on an independent test dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway, involving inter-molecular connections at the DNA, RNA and protein levels.
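The AUC of 0.88 summarizes how well DPREx scores rank repeat regions harboring pathogenic expansions above controls. As a reminder of what that number measures, the minimal Python sketch below computes ROC AUC through its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. It is a generic illustration, not the DPREx evaluation code, and the labels and scores are hypothetical toy values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0).
    Ties count as half a win, matching the Mann-Whitney U formulation."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = pathogenic repeat region, 0 = control.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75: three of the four positive/negative pairs are ranked correctly
```

The quadratic pairwise loop keeps the definition explicit; for large test sets, a rank-based computation (or scikit-learn's `roc_auc_score`) yields the same value more efficiently.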
Availability of data and materials
The source code and data for characterizing genomic regions harboring repeat expansions are available at https://github.com/wykswr/ItvAnt, where an example of the input file format can be found under the name “example.bed”. The source code and data for DPREx are available at https://github.com/wykswr/DPREx. The HGMD-RPE and HGMD-RPE-DM datasets, as well as the corresponding control files, are available at https://github.com/fanc232CO/HGMD-PREs. The reference genome and annotations were obtained from GENCODE (https://www.gencodegenes.org/); the annotation release used is v37 mapped to GRCh37: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_37/GRCh37_mapping/gencode.v37lift37.annotation.gtf.gz. The epigenetic data, including CTCF-binding sites, DNase-seq and histone modification data, were obtained from the ENCODE project (https://www.encodeproject.org/). Accession IDs are: ENCFF618DDO (CTCF ChIP-seq, narrowPeak); ENCFF021YPR (H3K27me3 ChIP-seq, bigWig); ENCFF388WCD (H3K36me3 ChIP-seq, bigWig); ENCFF481BLF (H3K4me1 ChIP-seq, bigWig); ENCFF780JKM (H3K4me3 ChIP-seq, bigWig); ENCFF411VJD (H3K9me3 ChIP-seq, bigWig). phastCons conservation scores were downloaded from UCSC: https://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons100way/hg19.100way.phastCons.bw. Pre-computed MMSplice scores were obtained from the CADD annotation (offline version): https://cadd.gs.washington.edu/download. Non-B DNA structure annotations (hg19) were obtained from https://ncifrederick.cancer.gov/bids/ftp/?nonb# (direct download: https://ncifrederick.cancer.gov/bids/ftp/actions/download/?resource=/bioinfo/static/nonb_dwnld/human_hg19/human_hg19.gff.tar.gz).
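The repository's example.bed is the authoritative description of the expected input. As a minimal sketch, assuming standard BED records whose first three tab-separated columns are chromosome, 0-based start and exclusive end (the exact columns of example.bed have not been checked here), the Python below reads such intervals and reports what fraction of them overlap a second set of annotations, for instance motifs from the Non-B DB GFF listed above, whose 1-based closed coordinates must first be shifted to the BED convention. This mirrors the idea behind `bedtools intersect -u`; it is not the ItvAnt code, and all function names and coordinates are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based, half-open [start, end).
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Keep the first three columns of each BED record; skip blank,
    comment, 'track' and 'browser' lines."""
    intervals: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def gff_to_bed(chrom: str, start: int, end: int) -> Interval:
    """Convert a GFF/GTF feature (1-based, closed) to BED coordinates."""
    return (chrom, start - 1, end)


def build_index(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group annotation intervals by chromosome, sorted by start."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    return index


def overlaps_any(query: Interval, index: Dict[str, List[Tuple[int, int]]]) -> bool:
    """True if the query shares at least one base with any annotation."""
    chrom, start, end = query
    annots = index.get(chrom, [])
    # Annotations at positions [0, cut) start before the query ends.
    cut = bisect.bisect_left(annots, (end,))
    # Among those, an overlap needs the annotation to end after the query starts.
    return any(a_end > start for _, a_end in annots[:cut])


def fraction_overlapping(queries: List[Interval], annotations: Iterable[Interval]) -> float:
    """Share of query intervals overlapping at least one annotation."""
    if not queries:
        return 0.0
    index = build_index(annotations)
    return sum(overlaps_any(q, index) for q in queries) / len(queries)


# Hypothetical toy data: three repeat regions and two non-B DNA motifs (from GFF).
repeats = parse_bed(["chr19\t100\t200\n", "chr19\t500\t600\n", "chrX\t10\t20\n"])
non_b = [gff_to_bed("chr19", 150, 160), gff_to_bed("chrX", 21, 30)]
print(fraction_overlapping(repeats, non_b))
```

Only the first chr19 repeat overlaps a motif; the chrX repeat ends exactly where its motif begins, so under half-open coordinates the two are adjacent rather than overlapping. The scan after `bisect` is linear in the worst case, so for genome-wide inputs BEDTools or an interval tree would be the more practical choice.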
References
Abu Diab M, Mor-Shaked H, Cohen E, Cohen-Hadad Y, Ram O, Epsztejn-Litman S, Eiges R (2018) The G-rich repeats in FMR1 and C9orf72 Loci are hotspots for local unpairing of DNA. Genetics 210:1239–1252. https://doi.org/10.1534/genetics.118.301672
Avsec Z, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, Zeitlinger J (2021) Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53:354–366. https://doi.org/10.1038/s41588-021-00782-6
Bacolla A, Tainer JA, Vasquez KM, Cooper DN (2016) Translocation and deletion breakpoints in cancer genomes are associated with potential non-B DNA-forming sequences. Nucl Acids Res 44:5673–5688. https://doi.org/10.1093/nar/gkw261
Bacolla A, Sengupta S, Ye Z, Yang C, Mitra J, De-Paula RB, Hegde ML, Ahmed Z, Mort M, Cooper DN, Mitra S, Tainer JA (2021) Heritable pattern of oxidized DNA base repair coincides with pre-targeting of repair complexes to open chromatin. Nucl Acids Res 49:221–243. https://doi.org/10.1093/nar/gkaa1120
Balendra R, Isaacs AM (2018) C9orf72-mediated ALS and FTD: multiple pathways to disease. Nat Rev Neurol 14:544–558. https://doi.org/10.1038/s41582-018-0047-2
Bassuny WM, Ihara K, Sasaki Y, Kuromaru R, Kohno H, Matsuura N, Hara T (2003) A functional polymorphism in the promoter/enhancer region of the FOXP3/Scurfin gene associated with type 1 diabetes. Immunogenetics 55:149–156. https://doi.org/10.1007/s00251-003-0559-8
Becker JS, Nicetto D, Zaret KS (2016) H3K9me3-dependent heterochromatin: barrier to cell fate changes. Trends Genet 32:29–41. https://doi.org/10.1016/j.tig.2015.11.001
Belokopytova PS, Nuriddinov MA, Mozheiko EA, Fishman D, Fishman V (2020) Quantitative prediction of enhancer-promoter interactions. Genome Res 30:72–84. https://doi.org/10.1101/gr.249367.119
Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucl Acids Res 27:573–580. https://doi.org/10.1093/nar/27.2.573
Bird TD (1993) Myotonic dystrophy type 1. In: Adam MP, Ardinger HH, Pagon RA, Wallace SE, Bean LJH, Stephens K, Amemiya A (eds) GeneReviews®. University of Washington, Seattle (WA)
Bonasio R, Tu S, Reinberg D (2010) Molecular signals of epigenetic states. Science 330:612. https://doi.org/10.1126/science.1191078
Cai Y, Zhang Y, Loh YP, Tng JQ, Lim MC, Cao Z, Raju A, Lieberman Aiden E, Li S, Manikandan L, Tergaonkar V, Tucker-Kellogg G, Fullwood MJ (2021) H3K27me3-rich genomic regions can function as silencers to repress gene expression via chromatin interactions. Nat Commun 12:719. https://doi.org/10.1038/s41467-021-20940-y
Cer RZ, Donohue DE, Mudunuri US, Temiz NA, Loss MA, Starner NJ, Halusa GN, Volfovsky N, Yi M, Luke BT, Bacolla A, Collins JR, Stephens RM (2013a) Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucl Acids Res 41:D94–D100. https://doi.org/10.1093/nar/gks955
Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220. https://doi.org/10.1038/371215a0
Cheng J, Nguyen TYD, Cygan KJ, Çelik MH, Fairbrother WG, Avsec Ž, Gagneur J (2019) MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biol 20:48. https://doi.org/10.1186/s13059-019-1653-z
Choudhary K, Lai YH, Tran EJ, Aviran S (2019) dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome Biol 20:40. https://doi.org/10.1186/s13059-019-1641-3
Clay FE, Cork MJ, Tarlow JK, Blakemore AI, Harrington CI, Lewis F, Duff GW (1994) Interleukin 1 receptor antagonist gene polymorphism association with lichen sclerosus. Hum Genet 94:407–410. https://doi.org/10.1007/BF00201602
Conlon EG, Lu L, Sharma A, Yamazaki T, Tang T, Shneider NA, Manley JL (2016) The C9ORF72 GGGGCC expansion forms RNA G-quadruplex inclusions and sequesters hnRNP H to disrupt splicing in ALS brains. Elife. https://doi.org/10.7554/eLife.17820
Consortium EP (2011) A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9:e1001046. https://doi.org/10.1371/journal.pbio.1001046
Dashnow H, Lek M, Phipson B, Halman A, Sadedin S, Lonsdale A, Davis M, Lamont P, Clayton JS, Laing NG, MacArthur DG, Oshlack A (2018) STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biol 19:121. https://doi.org/10.1186/s13059-018-1505-2
Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, Hilton JA, Jain K, Baymuradov UK, Narayanan AK, Onate KC, Graham K, Miyasato SR, Dreszer TR, Strattan JS, Jolanki O, Tanaka FY, Cherry JM (2018) The encyclopedia of DNA elements (ENCODE): data portal update. Nucl Acids Res 46:D794–D801. https://doi.org/10.1093/nar/gkx1081
de Wit E, Vos ES, Holwerda SJ, Valdes-Quezada C, Verstegen MJ, Teunissen H, Splinter E, Wijchers PJ, Krijger PH, de Laat W (2015) CTCF binding polarity determines chromatin looping. Mol Cell 60:676–684. https://doi.org/10.1016/j.molcel.2015.09.023
Den Dunnen WFA (2017) Trinucleotide repeat disorders. Handb Clin Neurol 145:383–391. https://doi.org/10.1016/b978-0-12-802395-2.00027-4
Depienne C, Mandel JL (2021) 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am J Hum Genet. https://doi.org/10.1016/j.ajhg.2021.03.011
Dettori LG, Torrejon D, Chakraborty A, Dutta A, Mohamed M, Papp C, Kuznetsov VA, Sung P, Feng W, Bah A (2021) A tale of loops and tails: the role of intrinsically disordered protein regions in R-loop recognition and phase separation. Front Mol Biosci 8:691694. https://doi.org/10.3389/fmolb.2021.691694
Dolzhenko E, van Vugt J, Shaw RJ, Bekritsky MA, van Blitterswijk M, Narzisi G, Ajay SS, Rajan V, Lajoie BR, Johnson NH, Kingsbury Z, Humphray SJ, Schellevis RD, Brands WJ, Baker M, Rademakers R, Kooyman M, Tazelaar GHP, van Es MA, McLaughlin R, Sproviero W, Shatunov A, Jones A, Al Khleifat A, Pittman A, Morgan S, Hardiman O, Al-Chalabi A, Shaw C, Smith B, Neo EJ, Morrison K, Shaw PJ, Reeves C, Winterkorn L, Wexler NS, Group US-VCR, Housman DE, Ng CW, Li AL, Taft RJ, van den Berg LH, Bentley DR, Veldink JH, Eberle MA (2017) Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res 27:1895–1903. https://doi.org/10.1101/gr.225672.117
Dolzhenko E, Bennett MF, Richmond PA, Trost B, Chen S, van Vugt J, Nguyen C, Narzisi G, Gainullin VG, Gross AM, Lajoie BR, Taft RJ, Wasserman WW, Scherer SW, Veldink JH, Bentley DR, Yuen RKC, Bahlo M, Eberle MA (2020) ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data. Genome Biol 21:102. https://doi.org/10.1186/s13059-020-02017-z
Du X, Wojtowicz D, Bowers AA, Levens D, Benham CJ, Przytycka TM (2013) The genome-wide distribution of non-B DNA motifs is shaped by operon structure and suggests the transcriptional importance of non-B DNA structures in Escherichia coli. Nucl Acids Res 41:5965–5977. https://doi.org/10.1093/nar/gkt308
Eckelmann BJ, Bacolla A, Wang H, Ye Z, Guerrero EN, Jiang W, El-Zein R, Hegde ML, Tomkinson AE, Tainer JA, Mitra S (2020) XRCC1 promotes replication restart, nascent fork degradation and mutagenic DNA repair in BRCA2-deficient cells. NAR Cancer 2:zcaa013. https://doi.org/10.1093/narcan/zcaa013
Eddy J, Maizels N (2008) Conserved elements with potential to form polymorphic G-quadruplex structures in the first intron of human genes. Nucl Acids Res 36:1321–1333. https://doi.org/10.1093/nar/gkm1138
Figueroa KP, Farooqi S, Harrup K, Frank J, O’Rahilly S, Pulst SM (2009) Genetic variance in the spinocerebellar ataxia type 2 (ATXN2) gene in children with severe early onset obesity. PLoS ONE 4:e8280. https://doi.org/10.1371/journal.pone.0008280
Flower MD, Tabrizi SJ (2020) A small molecule kicks repeat expansion into reverse. Nat Genet 52:136–137. https://doi.org/10.1038/s41588-020-0577-6
Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, Modi BP, Correard S, Gheorghe M, Baranašić D, Santana-Garcia W, Tan G, Chèneby J, Ballester B, Parcy F, Sandelin A, Lenhard B, Wasserman WW, Mathelier A (2019) JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 48:D87–D92. https://doi.org/10.1093/nar/gkz1001
Fotsing SF, Margoliash J, Wang C, Saini S, Yanicky R, Shleizer-Burko S, Goren A, Gymrek M (2019) The impact of short tandem repeat variation on gene expression. Nat Genet 51:1652–1659. https://doi.org/10.1038/s41588-019-0521-9
Frankish A, Diekhans M, Ferreira A-M, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FCP, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner M-M, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJP, Kellis M, Paten B, Reymond A, Tress ML, Flicek P (2019) GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 47:D766–D773. https://doi.org/10.1093/nar/gky955
Freibaum BD, Taylor JP (2017) The role of dipeptide repeats in C9ORF72-Related ALS-FTD. Front Mol Neurosci 10:35. https://doi.org/10.3389/fnmol.2017.00035
Freudenreich CH (2018) R-loops: targets for nuclease cleavage and repeat instability. Curr Genet 64:789–794. https://doi.org/10.1007/s00294-018-0806-z
Gatto EM, Rojas NG, Persi G, Etcheverry JL, Cesarini ME, Perandones C (2020) Huntington disease: advances in the understanding of its mechanisms. Clin Park Relat Disord 3:100056. https://doi.org/10.1016/j.prdoa.2020.100056
Gijselinck I, Van Mossevelde S, van der Zee J, Sieben A, Engelborghs S, De Bleecker J, Ivanoiu A, Deryck O, Edbauer D, Zhang M, Heeman B, Baumer V, Van den Broeck M, Mattheijssens M, Peeters K, Rogaeva E, De Jonghe P, Cras P, Martin JJ, de Deyn PP, Cruts M, Van Broeckhoven C (2016) The C9orf72 repeat size correlates with onset age of disease, DNA methylation and transcriptional downregulation of the promoter. Mol Psychiatry 21:1112–1124. https://doi.org/10.1038/mp.2015.159
Ginno PA, Lott PL, Christensen HC, Korf I, Chedin F (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol Cell 45:814–825. https://doi.org/10.1016/j.molcel.2012.01.017
Grant CE, Bailey TL, Noble WS (2011) FIMO: scanning for occurrences of a given motif. Bioinformatics 27:1017–1018. https://doi.org/10.1093/bioinformatics/btr064
Gray LT, Vallur AC, Eddy J, Maizels N (2014) G quadruplexes are genomewide targets of transcriptional helicases XPB and XPD. Nat Chem Biol 10:313–318. https://doi.org/10.1038/nchembio.1475
Grishchenko IV, Purvinsh YV, Yudkin DV (2020) Mystery of expansion: DNA metabolism and unstable repeats. Adv Exp Med Biol 1241:101–124. https://doi.org/10.1007/978-3-030-41283-8_7
Groh M, Lufino MM, Wade-Martins R, Gromak N (2014) R-loops associated with triplet repeat expansions promote gene silencing in Friedreich ataxia and fragile X syndrome. PLoS Genet 10:e1004318. https://doi.org/10.1371/journal.pgen.1004318
Guo J, Chen L, Li GM (2017) DNA mismatch repair in trinucleotide repeat instability. Sci China Life Sci 60:1087–1092. https://doi.org/10.1007/s11427-017-9186-7
Hallinan JP, Doyle LA, Shen BW, Gewe MM, Takushi B, Kennedy MA, Friend D, Roberts JM, Bradley P, Stoddard BL (2021) Design of functionalised circular tandem repeat proteins with longer repeat topologies and enhanced subunit contact surfaces. Commun Biol 4:1240. https://doi.org/10.1038/s42003-021-02766-y
Hambarde S, Tsai CL, Pandita RK, Bacolla A, Maitra A, Charaka V, Hunt CR, Kumar R, Limbo O, Le Meur R, Chazin WJ, Tsutakawa SE, Russell P, Schlacher K, Pandita TK, Tainer JA (2021) EXO5-DNA structure and BLM interactions direct DNA resection critical for ATR-dependent replication restart. Mol Cell 81:2989–3006.e9. https://doi.org/10.1016/j.molcel.2021.05.027
Hammel M, Tainer JA (2021) X-ray scattering reveals disordered linkers and dynamic interfaces in complexes and mechanisms for DNA double-strand break repair impacting cell and cancer biology. Protein Sci 30:1735–1756. https://doi.org/10.1002/pro.4133
Hannan AJ (2018) Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet 19:286–298. https://doi.org/10.1038/nrg.2017.115
Hanson J, Paliwal K, Zhou Y (2018) Accurate single-sequence prediction of protein intrinsic disorder by an ensemble of deep recurrent and convolutional architectures. J Chem Inf Model 58:2369–2376. https://doi.org/10.1021/acs.jcim.8b00636
Heffernan R, Yang Y, Paliwal K, Zhou Y (2017) Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33:2842–2849. https://doi.org/10.1093/bioinformatics/btx218
Hefferon TW, Groman JD, Yurk CE, Cutting GR (2004) A variable dinucleotide repeat in the CFTR gene contributes to phenotype diversity by forming RNA secondary structures that alter splicing. Proc Natl Acad Sci USA 101:3504–3509. https://doi.org/10.1073/pnas.0400182101
Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA, Wang W, Weng Z, Green RD, Crawford GE, Ren B (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39:311–318. https://doi.org/10.1038/ng1966
Hire RR, Katrak SM, Vaidya S, Radhakrishnan K, Seshadri M (2011) Spinocerebellar ataxia type 17 in Indian patients: two rare cases of homozygous expansions. Clin Genet 80:472–477. https://doi.org/10.1111/j.1399-0004.2010.01589.x
Holmes SE, O’Hearn E, Rosenblatt A, Callahan C, Hwang HS, Ingersoll-Ashworth RG, Fleisher A, Stevanin G, Brice A, Potter NT, Ross CA, Margolis RL (2001) A repeat expansion in the gene encoding junctophilin-3 is associated with Huntington disease-like 2. Nat Genet 29:377–378. https://doi.org/10.1038/ng760
Hui J, Hung LH, Heiner M, Schreiner S, Neumuller N, Reither G, Haas SA, Bindereif A (2005) Intronic CA-repeat and CA-rich elements: a new class of regulators of mammalian alternative splicing. EMBO J 24:1988–1998. https://doi.org/10.1038/sj.emboj.7600677
Jenjaroenpun P, Wongsurawat T, Sutheeworapong S, Kuznetsov VA (2017) R-loopDB: a database for R-loop forming sequences (RLFS) and R-loops. Nucl Acids Res 45:D119–D127. https://doi.org/10.1093/nar/gkw1054
Jorda J, Xue B, Uversky VN, Kajava AV (2010) Protein tandem repeats—the more perfect, the less structured. FEBS J 277:2673–2682. https://doi.org/10.1111/j.1742-464X.2010.07684.x
Kang H, Shokhirev MN, Xu Z, Chandran S, Dixon JR, Hetzer MW (2020) Dynamic regulation of histone modifications and long-range chromosomal interactions during postmitotic transcriptional reactivation. Genes Dev 34:913–930. https://doi.org/10.1101/gad.335794.119
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Genome Aggregation Database C, Neale BM, Daly MJ, MacArthur DG (2020) The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581:434–443. https://doi.org/10.1038/s41586-020-2308-7
Ke Y, Rao J, Zhao H, Lu Y, Xiao N, Yang Y (2020) Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting. Bioinformatics 36:4576–4582. https://doi.org/10.1093/bioinformatics/btaa534
Kentepozidou E, Aitken SJ, Feig C, Stefflova K, Ibarra-Soria X, Odom DT, Roller M, Flicek P (2020) Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains. Genome Biol 21:5. https://doi.org/10.1186/s13059-019-1894-x
Khristich AN, Mirkin SM (2020) On the wrong DNA track: molecular mechanisms of repeat-mediated genome instability. J Biol Chem 295:4134–4170. https://doi.org/10.1074/jbc.REV119.007678
Kim MW, Chelliah Y, Kim SW, Otwinowski Z, Bezprozvanny I (2009) Secondary structure of Huntingtin amino-terminal region. Structure 17:1205–1212. https://doi.org/10.1016/j.str.2009.08.002
Kloster E, Saft C, Epplen JT, Arning L (2013) CNR1 variation is associated with the age at onset in Huntington disease. Eur J Med Genet 56:416–419. https://doi.org/10.1016/j.ejmg.2013.05.007
Koutsis G, Karadima G, Pandraud A, Sweeney MG, Paudel R, Houlden H, Wood NW, Panas M (2012) Genetic screening of Greek patients with Huntington’s disease phenocopies identifies an SCA8 expansion. J Neurol 259:1874–1878. https://doi.org/10.1007/s00415-012-6430-9
Kristensen VN, Andersen TI, Lindblom A, Erikstein B, Magnus P, Borresen-Dale AL (1998) A rare CYP19 (aromatase) variant may increase the risk of breast cancer. Pharmacogenetics 8:43–48. https://doi.org/10.1097/00008571-199802000-00006
Krzyzosiak WJ, Sobczak K, Wojciechowska M, Fiszer A, Mykowska A, Kozlowski P (2012) Triplet repeat RNA structure and its role as pathogenic agent and therapeutic target. Nucl Acids Res 40:11–26. https://doi.org/10.1093/nar/gkr729
Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu Y-C, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R, Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh K-H, Feizi S, Karlic R, Kim A-R, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V, Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA, Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, Jager PLD, Farnham PJ, Fisher SJ, Haussler D, Jones SJM, Li W, Marra MA, McManus MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai L-H, Wang W, Waterland RA, Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, Kellis M (2015) Integrative analysis of 111 reference human epigenomes. Nature 518:317–330. https://doi.org/10.1038/nature14248
Kuznetsov VA, Bondarenko V, Wongsurawat T, Yenamandra SP, Jenjaroenpun P (2018) Toward predictive R-loop computational biology: genome-scale prediction of R-loops reveals their association with complex promoter structures, G-quadruplexes and transcriptionally active enhancers. Nucleic Acids Res 46:7566–7585. https://doi.org/10.1093/nar/gky554
Lai Y, Beaver JM, Laverde E, Liu Y (2020) Trinucleotide repeat instability via DNA base excision repair. DNA Repair (Amst) 93:102912. https://doi.org/10.1016/j.dnarep.2020.102912
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921. https://doi.org/10.1038/35057062
Lanni S, Pearson CE (2019) Molecular genetics of congenital myotonic dystrophy. Neurobiol Dis 132:104533. https://doi.org/10.1016/j.nbd.2019.104533
Laverde EE, Lai Y, Leng F, Balakrishnan L, Freudenreich CH, Liu Y (2020) R-loops promote trinucleotide repeat deletion through DNA base excision repair enzymatic activities. J Biol Chem 295:13902–13913. https://doi.org/10.1074/jbc.RA120.014161
Levinson G, Gutman GA (1987) Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4:203–221. https://doi.org/10.1093/oxfordjournals.molbev.a040442
Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, Khurana E, Waszak S, Korbel JO, Haber JE, Imielinski M, Group PSVW, Weischenfeldt J, Beroukhim R, Campbell PJ, Consortium P (2020) Patterns of somatic structural variation in human cancer genomes. Nature 578:112–121. https://doi.org/10.1038/s41586-019-1913-9
Libby RT, Hagerman KA, Pineda VV, Lau R, Cho DH, Baccam SL, Axford MM, Cleary JD, Moore JM, Sopher BL, Tapscott SJ, Filippova GN, Pearson CE, La Spada AR (2008) CTCF cis-regulates trinucleotide repeat instability in an epigenetic manner: a novel basis for mutational hot spot determination. PLoS Genet 4:e1000257. https://doi.org/10.1371/journal.pgen.1000257
Liquori CL, Ricker K, Moseley ML, Jacobsen JF, Kress W, Naylor SL, Day JW, Ranum LP (2001) Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of ZNF9. Science 293:864–867. https://doi.org/10.1126/science.1062125
Liu Y, Wilson SH (2012) DNA base excision repair: a mechanism of trinucleotide repeat expansion. Trends Biochem Sci 37:162–172. https://doi.org/10.1016/j.tibs.2011.12.002
Loomis EW, Sanz LA, Chedin F, Hagerman PJ (2014) Transcription-associated R-loop formation across the human FMR1 CGG-repeat region. PLoS Genet 10:e1004294. https://doi.org/10.1371/journal.pgen.1004294
Lorentzon M, Swanson C, Eriksson AL, Mellstrom D, Ohlsson C (2006) Polymorphisms in the aromatase gene predict areal BMD as a result of affected cortical bone size: the GOOD study. J Bone Miner Res 21:332–339. https://doi.org/10.1359/JBMR.051026
Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) ViennaRNA package 2.0. Algorithms Mol Biol 6:26. https://doi.org/10.1186/1748-7188-6-26
Loureiro JR, Oliveira CL, Silveira I (2016) Unstable repeat expansions in neurodegenerative diseases: nucleocytoplasmic transport emerges on the scene. Neurobiol Aging 39:174–183. https://doi.org/10.1016/j.neurobiolaging.2015.12.007
Ma X, Qi X, Chen C, Lin H, Xiong H, Li Y, Jiang J (2010) Association between CYP19 polymorphisms and breast cancer risk: results from 10,592 cases and 11,720 controls. Breast Cancer Res Treat 122:495–501. https://doi.org/10.1007/s10549-009-0693-6
Mackay RP, Xu Q, Weinberger PM (2020) R-Loop physiology and pathology: a brief review. DNA Cell Biol 39:1914–1925. https://doi.org/10.1089/dna.2020.5906
Madeira JLO, Souza ABC, Cunha FS, Batista RL, Gomes NL, Rodrigues AS, de Haidar M, Jorge F, Chadi G, Callegaro D, Mendonca BB, Costa EMF, Domenice S (2018) A severe phenotype of Kennedy disease associated with a very large CAG repeat expansion. Muscle Nerve 57:E95–E97. https://doi.org/10.1002/mus.25952
Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barceló J, O’Hoy K et al (1992) Myotonic dystrophy mutation: an unstable CTG repeat in the 3′ untranslated region of the gene. Science 255:1253–1255. https://doi.org/10.1126/science.1546325
Maiuri T, Suart CE, Hung CLK, Graham KJ, Barba Bazan CA, Truant R (2019) DNA damage repair in Huntington’s disease and other neurodegenerative diseases. Neurotherapeutics 16:948–956. https://doi.org/10.1007/s13311-019-00768-7
Malik I, Kelley CP, Wang ET, Todd PK (2021) Molecular mechanisms underlying nucleotide repeat expansion disorders. Nat Rev Mol Cell Biol 22:589–607. https://doi.org/10.1038/s41580-021-00382-6
Malla B, Guo X, Senger G, Chasapopoulou Z, Yildirim F (2021) A systematic review of transcriptional dysregulation in Huntington’s disease studied by RNA sequencing. Front Genet 12:751033. https://doi.org/10.3389/fgene.2021.751033
Melamed O, Behar DM, Bram C, Magal N, Pras E, Reznik-Wolf H, Borochowitz ZU, Davidov B, Mor-Cohen R, Baris HN (2015) Founder mutation for Huntington disease in Caucasus Jews. Clin Genet 87:167–172. https://doi.org/10.1111/cge.12344
Minnoye L, Marinov GK, Krausgruber T, Pan L, Marand AP, Secchia S, Greenleaf WJ, Furlong EEM, Zhao K, Schmitz RJ, Bock C, Aerts S (2021) Chromatin accessibility profiling methods. Nat Rev Methods Primers 1:1–24. https://doi.org/10.1038/s43586-020-00008-9
Mirkin SM (2007) Expandable DNA repeats and human disease. Nature 447:932–940. https://doi.org/10.1038/nature05977
Mitsuhashi S, Matsumoto N (2020) Long-read sequencing for rare human genetic diseases. J Hum Genet 65:11–19. https://doi.org/10.1038/s10038-019-0671-8
Mitsuhashi S, Frith MC, Mizuguchi T, Miyatake S, Toyota T, Adachi H, Oma Y, Kino Y, Mitsuhashi H, Matsumoto N (2019) Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol 20:58. https://doi.org/10.1186/s13059-019-1667-6
Mitsuhashi S, Frith MC, Matsumoto N (2021) Genome-wide survey of tandem repeats by nanopore sequencing shows that disease-associated repeats are more polymorphic in the general population. BMC Med Genom 14:17. https://doi.org/10.1186/s12920-020-00853-3
Mooers BH, Logue JS, Berglund JA (2005) The structural basis of myotonic dystrophy from the crystal structure of CUG repeats. Proc Natl Acad Sci USA 102:16626–16631. https://doi.org/10.1073/pnas.0505873102
Neil AJ, Liang MU, Khristich AN, Shah KA, Mirkin SM (2018) RNA-DNA hybrids promote the expansion of Friedreich’s ataxia (GAA)n repeats via break-induced replication. Nucleic Acids Res 46:3487–3497. https://doi.org/10.1093/nar/gky099
Niehrs C, Luke B (2020) Regulatory R-loops as facilitators of gene expression and genome stability. Nat Rev Mol Cell Biol 21:167–178. https://doi.org/10.1038/s41580-019-0206-3
Oldfield CJ, Dunker AK (2014) Intrinsically disordered proteins and intrinsically disordered protein regions. Annu Rev Biochem 83:553–584. https://doi.org/10.1146/annurev-biochem-072711-164947
Ong C-T, Corces VG (2014) CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet 15:234–246. https://doi.org/10.1038/nrg3663
Orr HT, Zoghbi HY (2007) Trinucleotide repeat disorders. Annu Rev Neurosci 30:575–621. https://doi.org/10.1146/annurev.neuro.29.051605.113042
Paulson H (2018) Repeat expansion diseases. Handb Clin Neurol 147:105–123. https://doi.org/10.1016/B978-0-444-63233-3.00009-9
Peters JM (2021) How DNA loop extrusion mediated by cohesin enables V(D)J recombination. Curr Opin Cell Biol 70:75–83. https://doi.org/10.1016/j.ceb.2020.11.007
Peters AHFM, Kubicek S, Mechtler K, O’Sullivan RJ, Derijck AAHA, Perez-Burgos L, Kohlmaier A, Opravil S, Tachibana M, Shinkai Y, Martens JHA, Jenuwein T (2003) Partitioning and plasticity of repressive histone methylation states in mammalian chromatin. Mol Cell 12:1577–1589. https://doi.org/10.1016/S1097-2765(03)00477-5
Phillips JE, Corces VG (2009) CTCF: master weaver of the genome. Cell 137:1194–1211. https://doi.org/10.1016/j.cell.2009.06.001
Polak P, Domany E (2006) Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. BMC Genom 7:133. https://doi.org/10.1186/1471-2164-7-133
Pugacheva EM, Kubo N, Loukinov D, Tajmul M, Kang S, Kovalchuk AL, Strunnikov AV, Zentner GE, Ren B, Lobanenkov VV (2020) CTCF mediates chromatin looping via N-terminal domain-dependent cohesin retention. Proc Natl Acad Sci. https://doi.org/10.1073/pnas.1911708117
Qin Q, Fan J, Zheng R, Wan C, Mei S, Wu Q, Sun H, Brown M, Zhang J, Meyer CA, Liu XS (2020) Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biol 21:32. https://doi.org/10.1186/s13059-020-1934-6
Quarrell OW, Rigby AS, Barron L, Crow Y, Dalton A, Dennis N, Fryer AE, Heydon F, Kinning E, Lashwood A, Losekoot M, Margerison L, McDonnell S, Morrison PJ, Norman A, Peterson M, Raymond FL, Simpson S, Thompson E, Warner J (2007) Reduced penetrance alleles for Huntington’s disease: a multi-centre direct observational study. J Med Genet 44:e68. https://doi.org/10.1136/jmg.2006.045120
Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y, Joshi RS, Mittelman D, Sharp AJ (2016) Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucl Acids Res 44:3750–3762. https://doi.org/10.1093/nar/gkw219
Quinlan AR, Hall IM (2010a) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. https://doi.org/10.1093/bioinformatics/btq033
Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159:1665–1680. https://doi.org/10.1016/j.cell.2014.11.021
Ratmeyer L, Vinayak R, Zhong YY, Zon G, Wilson WD (1994) Sequence specific thermodynamic and structural properties for DNA·RNA duplexes. Biochemistry 33:5298–5304. https://doi.org/10.1021/bi00183a037
Reddy K, Zamiri B, Stanley SYR, Macgregor RB Jr, Pearson CE (2013) The disease-associated r(GGGGCC)n repeat from the C9orf72 gene forms tract length-dependent uni- and multimolecular RNA G-quadruplex structures. J Biol Chem 288:9860–9866. https://doi.org/10.1074/jbc.C113.452532
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M (2019) CADD: predicting the deleteriousness of variants throughout the human genome. Nucl Acids Res 47:D886–D894. https://doi.org/10.1093/nar/gky1016
Rentzsch P, Schubach M, Shendure J, Kircher M (2021) CADD-Splice: improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 13:31. https://doi.org/10.1186/s13073-021-00835-9
Rice G, Rebeiz M (2019) Evolution: how many phenotypes do regulatory mutations affect? Curr Biol 29:R21–R23. https://doi.org/10.1016/j.cub.2018.11.027
Robin G, Lopez JR, Espinal GM, Hulsizer S, Hagerman PJ, Pessah IN (2017) Calcium dysregulation and Cdk5-ATM pathway involved in a mouse model of fragile X-associated tremor/ataxia syndrome. Hum Mol Genet 26:2649–2666. https://doi.org/10.1093/hmg/ddx148
Rodriguez CM, Todd PK (2019) New pathologic mechanisms in nucleotide repeat expansion disorders. Neurobiol Dis 130:104515. https://doi.org/10.1016/j.nbd.2019.104515
Roh TY, Cuddapah S, Cui K, Zhao K (2006) The genomic landscape of histone modifications in human T cells. Proc Natl Acad Sci USA 103:15782–15787. https://doi.org/10.1073/pnas.0607617103
Roman T, Schmitz M, Polanczyk GV, Eizirik M, Rohde LA, Hutz MH (2002) Further evidence for the association between attention-deficit/hyperactivity disorder and the dopamine-beta-hydroxylase gene. Am J Med Genet 114:154–158. https://doi.org/10.1002/ajmg.10194
Santoro M, Masciullo M, Silvestri G, Novelli G, Botta A (2017) Myotonic dystrophy type 1: role of CCG, CTC and CGG interruptions within DMPK alleles in the pathogenesis and molecular diagnosis. Clin Genet 92:355–364. https://doi.org/10.1111/cge.12954
Santos-Pereira JM, Aguilera A (2015) R loops: new modulators of genome dynamics and function. Nat Rev Genet 16:583–597. https://doi.org/10.1038/nrg3961
Schmidt MHM, Pearson CE (2016) Disease-associated repeat instability and mismatch repair. DNA Repair (Amst) 38:117–126. https://doi.org/10.1016/j.dnarep.2015.11.008
Schmidt D, Schwalie PC, Wilson MD, Ballester B, Goncalves A, Kutter C, Brown GD, Marshall A, Flicek P, Odom DT (2012) Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148:335–348. https://doi.org/10.1016/j.cell.2011.11.058
Schoenfelder S, Fraser P (2019) Long-range enhancer–promoter contacts in gene expression control. Nat Rev Genet 20:437–455. https://doi.org/10.1038/s41576-019-0128-0
Schreiber J, Durham T, Bilmes J, Noble WS (2020) Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome. Genome Biol 21:81. https://doi.org/10.1186/s13059-020-01977-6
Sims RJ 3rd, Reinberg D (2009) Processing the H3K36me3 signature. Nat Genet 41:270–271. https://doi.org/10.1038/ng0309-270
Smedley D, Schubach M, Jacobsen JOB, Köhler S, Zemojtel T, Spielmann M, Jäger M, Hochheiser H, Washington NL, McMurry JA et al (2022) A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am J Hum Genet 99:595–606
Smola MJ, Calabrese JM, Weeks KM (2015) Detection of RNA-protein interactions in living cells with SHAPE. Biochemistry 54:6867–6875. https://doi.org/10.1021/acs.biochem.5b00977
Sobczak K, de Mezer M, Michlewski G, Krol J, Krzyzosiak WJ (2003) RNA structure of trinucleotide repeats associated with human neurological diseases. Nucl Acids Res 31:5469–5482. https://doi.org/10.1093/nar/gkg766
Stenson PD, Mort M, Ball EV, Chapman M, Evans K, Azevedo L, Hayden M, Heywood S, Millar DS, Phillips AD, Cooper DN (2020) The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum Genet 139:1197–1207. https://doi.org/10.1007/s00439-020-02199-3
Su XA, Freudenreich CH (2017) Cytosine deamination and base excision repair cause R-loop-induced CAG repeat fragility and instability in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 114:E8392–E8401. https://doi.org/10.1073/pnas.1711283114
Sun H, Satake W, Zhang C, Nagai Y, Tian Y, Fu S, Yu J, Qian Y, Qian Y, Chu J, Toda T (2011) Genetic and clinical analysis in a Chinese parkinsonism-predominant spinocerebellar ataxia type 2 family. J Hum Genet 56:330–334. https://doi.org/10.1038/jhg.2011.14
Swami M, Hendricks AE, Gillis T, Massood T, Mysore J, Myers RH, Wheeler VC (2009) Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset. Hum Mol Genet 18:3039–3047. https://doi.org/10.1093/hmg/ddp242
Tabrizi SJ, Flower MD, Ross CA, Wild EJ (2020) Huntington disease: new insights into molecular pathogenesis and therapeutic opportunities. Nat Rev Neurol 16:529–546. https://doi.org/10.1038/s41582-020-0389-4
Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, Trzaskoma P, Magalska A, Wlodarczyk J, Ruszczycki B, Michalski P, Piecuch E, Wang P, Wang D, Tian SZ, Penrad-Mobayed M, Sachs LM, Ruan X, Wei CL, Liu ET, Wilczynski GM, Plewczynski D, Li G, Ruan Y (2015) CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163:1611–1627. https://doi.org/10.1016/j.cell.2015.11.024
Tang H, Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E, Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C, Hicks B, Heckerman D, Och FJ, Caskey CT, Venter JC, Telenti A (2017) Profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes. Am J Hum Genet 101:700–715. https://doi.org/10.1016/j.ajhg.2017.09.013
Tankard RM, Bennett MF, Degorski P, Delatycki MB, Lockhart PJ, Bahlo M (2018) Detecting expansions of tandem repeats in cohorts sequenced with short-read sequencing data. Am J Hum Genet 103:858–873. https://doi.org/10.1016/j.ajhg.2018.10.015
Thapar R, Wang JL, Hammel M, Ye R, Liang K, Sun C, Hnizda A, Liang S, Maw SS, Lee L, Villarreal H, Forrester I, Fang S, Tsai MS, Blundell TL, Davis AJ, Lin C, Lees-Miller SP, Strick TR, Tainer JA (2021) Mechanism of efficient double-strand break repair by a long non-coding RNA. Nucl Acids Res 49:1199–1200. https://doi.org/10.1093/nar/gkaa1233
Tsuge M, Hamamoto R, Silva FP, Ohnishi Y, Chayama K, Kamatani N, Furukawa Y, Nakamura Y (2005) A variable number of tandem repeats polymorphism in an E2F-1 binding element in the 5′ flanking region of SMYD3 is a risk factor for human cancers. Nat Genet 37:1104–1107. https://doi.org/10.1038/ng1638
Tsutakawa SE, Thompson MJ, Arvai AS, Neil AJ, Shaw SJ, Algasaier SI, Kim JC, Finger LD, Jardine E, Gotham VJB, Sarker AH, Her MZ, Rashid F, Hamdan SM, Mirkin SM, Grasby JA, Tainer JA (2017) Phosphate steering by flap endonuclease 1 promotes 5′-flap specificity and incision to prevent genome instability. Nat Commun 8:15855. https://doi.org/10.1038/ncomms15855
Uversky VN (2020) Functions of short lifetime biological structures at large: the case of intrinsically disordered proteins. Brief Funct Genom 19:60–68. https://doi.org/10.1093/bfgp/ely023
van Ruiten MS, Rowland BD (2021) On the choreography of genome folding: a grand pas de deux of cohesin and CTCF. Curr Opin Cell Biol 70:84–90. https://doi.org/10.1016/j.ceb.2020.12.001
Wan Y, Qu K, Zhang QC, Flynn RA, Manor O, Ouyang Z, Zhang J, Spitale RC, Snyder MP, Segal E, Chang HY (2014) Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505:706–709. https://doi.org/10.1038/nature12946
Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucl Acids Res 38:e164. https://doi.org/10.1093/nar/gkq603
Wang E, Thombre R, Shah Y, Latanich R, Wang J (2021) G-Quadruplexes as pathogenic drivers in neurodegenerative disorders. Nucl Acids Res. https://doi.org/10.1093/nar/gkab164
Wen X, Tan W, Westergard T, Krishnamurthy K, Markandaiah SS, Shi Y, Lin S, Shneider NA, Monaghan J, Pandey UB, Pasinelli P, Ichida JK, Trotti D (2014) Antisense proline-arginine RAN dipeptides linked to C9ORF72-ALS/FTD form toxic nuclear aggregates that initiate in vitro and in vivo neuronal death. Neuron 84:1213–1225. https://doi.org/10.1016/j.neuron.2014.12.010
Whitfield TW, Wang J, Collins PJ, Partridge EC, Aldred SF, Trinklein ND, Myers RM, Weng Z (2012) Functional analysis of transcription factor binding sites in human promoters. Genome Biol 13:R50. https://doi.org/10.1186/gb-2012-13-9-r50
Wongsurawat T, Jenjaroenpun P, Kwoh CK, Kuznetsov V (2012) Quantitative model of R-loop forming structures reveals a novel level of RNA-DNA interactome complexity. Nucl Acids Res 40:e16. https://doi.org/10.1093/nar/gkr1075
Wu Q, Liu P, Wang L (2020) Many facades of CTCF unified by its coding for three-dimensional genome architecture. J Genet Genom 47:407–424. https://doi.org/10.1016/j.jgg.2020.06.008
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G (2021) clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation (NY) 2:100141. https://doi.org/10.1016/j.xinn.2021.100141
Xi W, Beer MA (2021) Loop competition and extrusion model predicts CTCF interaction specificity. Nat Commun 12:1–15
Xita N, Chatzikyriakidou A, Stavrou I, Zois C, Georgiou I, Tsatsoulis A (2010) The (TTTA)n polymorphism of aromatase (CYP19) gene is associated with age at menarche. Hum Reprod 25:3129–3133. https://doi.org/10.1093/humrep/deq276
Xu EH, Tang Y, Li D, Jia JP (2009) Polymorphism of HD and UCHL-1 genes in Huntington’s disease. J Clin Neurosci 16:1473–1477. https://doi.org/10.1016/j.jocn.2009.03.027
Xu P, Pan F, Roland C, Sagui C, Weninger K (2020) Dynamics of strand slippage in DNA hairpins formed by CAG repeats: roles of sequence parity and trinucleotide interrupts. Nucl Acids Res 48:2232–2245. https://doi.org/10.1093/nar/gkaa036
Ye Z, Xu S, Shi Y, Bacolla A, Syed A, Moiani D, Tsai CL, Shen Q, Peng G, Leonard PG, Jones DE, Wang B, Tainer JA, Ahmed Z (2021) GRB2 enforces homology-directed repair initiation by MRE11. Sci Adv. https://doi.org/10.1126/sciadv.abe9254
Zhang Y, An L, Xu J, Zhang B, Zheng WJ, Hu M, Tang J, Yue F (2018) Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun 9:1–9
Acknowledgements
This work received financial support from several sources, detailed in the Funding section.
Funding
This work was funded by the National Key Research and Development Program of China (2020YFB0204803), the Natural Science Foundation of China (81801132, 81971190, 61772566), and the Natural Science Foundation of Guangdong (2021A1515010256). J.A.T. was supported in part by National Institutes of Health (NIH) grants P01 CA092584 and R35 CA220430, by the Cancer Prevention Research Institute of Texas (CPRIT) grant (RP180813), and a Robert A. Welch Chemistry Chair. P.D.S., E.V.B., M.M. and D.N.C. acknowledge financial support from Qiagen Inc through a License Agreement with Cardiff University.
Author information
Contributions
CF and KC performed the analysis and co-wrote the manuscript. YW developed the annotation pipeline as well as the DPREx model. EVB, PDS, MM, AB, HK-S, JAT and DNC made suggestions regarding project design, supplied data for analysis, and revised the manuscript. HZ designed the overall research strategy and collected the resources required for the project.
Ethics declarations
Conflict of interest
The authors are unaware of any conflicts of interest or competing interests.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, C., Chen, K., Wang, Y. et al. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections. Hum Genet 142, 245–274 (2023). https://doi.org/10.1007/s00439-022-02500-6
DOI: https://doi.org/10.1007/s00439-022-02500-6