Compilation and functional classification of telomere length-associated genes in humans and other animal species

Telomeres are the terminal regions of chromosomes that ensure their stability while cell division. Telomere shortening initiates cellular senescence, which can lead to degeneration and atrophy of tissues, so the process is associated with a reduction in life expectancy and predisposition to a number of diseases. An accelerated rate of telomere attrition can serve as a predictor of life expectancy and health status of an individual. Telomere length is a complex phenotypic trait that is determined by many factors, including the genetic ones. Numerous studies (including genome-wide association studies, GWAS) indicate the polygenic nature of telomere length control. The objective of the present study was to characterize the genetic basis of the telomere length regulation using the GWAS data obtained during the studies of various human and other animal populations. To do so, a compilation of the genes associated with telomere length in GWAS experiments was collected, which included information on 270 human genes, as well as 23, 22, and 9 genes identified in the cattle, sparrow, and nematode, respectively. Among them were two orthologous genes encoding a shelterin protein (POT1 in humans and pot-2 in C. elegans). Functional analysis has shown that telomere length can be influenced by genetic variants in the genes encoding: (1) structural components of telomerase; (2) the protein components of telomeric regions (shelterin and CST complexes); (3) the proteins involved in telomerase biogenesis and regulating its activity; (4) the proteins that regulate the functional activity of the shelterin components; (5) the proteins involved in telomere replication and/or capping; (6) the proteins involved in the alternative telomere lengthening; (7) the proteins that respond to DNA damage and are responsible for DNA repair; (8) RNA-exosome components. The human genes identified by several research groups in populations of different ethnic origins are the genes encoding telomerase components such as TERC and TERT as well as STN1 encoding the CST complex component. Apparently, the polymorphic loci affecting the functions of these genes may be the most reliable susceptibility markers for telomere-related diseases. The systematized data about the genes and their functions can serve as a basis for the development of prognostic criteria for telomere length-associated diseases in humans. Information about the genes and processes that control telomere length can be used for marker-assisted and genomic selection in the farm animals, aimed at increasing the duration of their productive lifetime


Introduction
Telomeres are the terminal regions of chromosomes that en sure their stability and represented by evolutionary conserved tandemly repeated DNA sequences (e. g., a hexanucleotide TTAGGG repeat in vertebrates) of several kb in length (Pod levsky, 2008;Monaghan, Ozanne, 2018). For example, their lengths in humans at birth are 10-15 kb (Jafri et al., 2016). 3ʹ terminal end of a telomere is a single-stranded guanine-rich DNA region (150-200 nucleotides), whose end interacts with the doublestranded region to form the socalled Tloop at the telomere end. Tloop formation and stabilization are ensured by a shelterin complex (Fig. 1). This structure prevents re cognition of a chromosome terminal region by repair proteins (de Lange, 2018).
DNA polymerase is unable to fully replicate the 3ʹ-end of a linear DNA during cell division, which leads to a loss of 50-200 nucleotides of the telomeric sequence at each cell division (Fan et al., 2021). Telomere shortening can also be facilitated by other factors and processes (Suppl. Material 1) 1 , such as oxidative stress, inflammation, UV irradiation, effects of toxic agents, DNA replication errors, etc. (Aviv, Shay, 2018;Monaghan, Ozanne, 2018). These factors are likely to produce different effects depending on cell types and the organism's development stage and species (Monaghan, Ozanne, 2018).
Telomere shortening initiates cellular senescence. Activa tion of DNA damage response signaling pathways results in a cell cycle arrest, which may in turn lead to apoptosis and eventually -to progressive tissue degeneration (Jafri et al., 2016;Aviv, Shay, 2018;Monaghan, Ozanne, 2018). The data collected at the cellular level in vitro and in model organisms lay the foundation for using the telomere length as a predictor of life expectancy and health status of an individual. Indeed, studies in humans (Crocco et al., 2021), mice (Vera et al., 2012), sheep (Wilbourn et al., 2018), cattle (Seeker et al.,1 Supplementary Materials 1-11 are available in the online version of the paper: http://vavilov.elpub.ru/jour/manager/files/Suppl_Ignatieva_Engl_27_3.pdf. 2021), wild birds (Bichet et al., 2020), and other animals have shown that shorter telomere length may be associated with reduced life expectancy. The studies in humans discovered an association between the telomere length and cardiovascular diseases, cancer, diabetes, inflammation, and other pathologi cal states (Kong et al., 2013;Jafri et al., 2016;Aviv, Shay, 2018).
Telomere shortening is prevented by telomerase, a spe cialized ribonucleoprotein complex acting as a reverse tran s crip tase. In humans, telomerase is active in almost all the cancer cells studied (Jafri et al., 2016), in blastocyst, in most somatic tissues at 16-20 weeks of development (except for brain cells), and ovary and sperm cells at all ontogenetic stages (except for mature spermatozoids and oocytes) (Wright et al., 1996).
Telomerase activity is controlled by the proteins regulat ing expression of telomerase components, their movement to various cell compartments, processing, and assembly as well as by the proteins maintaining stability of the telomerase complex or, on the contrary, activating its degradation (Egan, Collins, 2012;Tseng et al., 2015;Schrumpfová, Fajkus, 2020). The main stages of telomerase biogenesis are presented in Figure 2. The examples of proteins affecting the telomerase activity are presented in Suppl. Material 2. In addition, the telomerase activity is also affected by the shelterin (Diotti, Loayza, 2011;de Lange, 2018) and CST complex (Fig. 3) (Chen et al., 2012).
Telomere length is a complex phenotypic trait determined by multiple factors including genetic ones. The metaanalysis of heritability data for this trait performed in eighteen verte brate species showed the averaged heritability index of 45 %. The studies showed the value of 52 % in humans, 42 % -in Holstein cattle breed, 35 % -in hamadryas baboons, and 5 % -in sheep (Chik et al., 2022).
The problem of genetic basis of telomere length regulation is of interest for many researchers. The information on telo Shelterin complex   ACD/ TPP1  TINF2/ TIN2  POT1   TERF1 TERF2   TERF2IP/Rap1 T-loop Telomeric DNA is presented as a T-loop reconstructed following the black-and-white illustration from Fan and co-workers (2021); nucleotide sequences in the DNA strands are not shown (Fan et al., 2021). The top left is simplified illustration of the relative positions of shelterin subunits, following the description from Jafri and co-workers (2016). Since the ACD/TPP1 and POT1 are much less abundant in nuclei (de Lange, 2018), some shelterin complexes are depicted without these subunits. D-loop is a structure, where two strands of the double-stranded DNA are separated, and one of them connects with the third DNA strand (a single-stranded 3'-end of the telomeric DNA region). The names of proteins corresponding to the human genes associated with telomere length according to the GWAS data are underlined. merase components and proteins involved in telomere length regulation (including 20 proteins identified in mammals) can be found in The Telomerase Database (http://telomerase.asu. edu/) (Podlevsky et al., 2008). Joyce and coworkers (2018) presented a set of 80 human genes with telomererelated func tions (Joyce et al., 2018). The GWAS data also indicate a polygenic nature of telomere length heritability. For instance, the GWAS Catalog (https:// www.ebi.ac.uk/gwas/) cites 99 human genes that either include or neighbor the telomere lengthassociated allelic variants. One of the largest GWA studies presents the data on 138 hu man genomic loci, whose allelic variants are associated with telomere length (Codd et al., 2021).

БИОИНФОРМАТИКА / BIOINFORMATICS
In addition to the telomere length association data gathered using GWA studies in various human population samples, the data obtained in other animal species also appear to be of interest. However, these studies are rather scarce and are available only for Holstein-Friesian cattle (IlskaWarner et al., 2019), house sparrow nestlings (Pepke et al., 2021), and C. elegans (Cook et al., 2016). The objective of this review was to characterize the genetic basis of telomere length regulation using the GWAS data collected in various human populations and to compare them with the results of similar experiments in other animal species. For this purpose, (1) the data on genes identified in GWAS telomere length experiments were systematized; (2) functional annotation of genes was performed, and the set of biological processes affecting telomere length was identified.

Materials and methods
The data on telomere lengthassociated genes were obtained from the papers available in the PubMed database (https:// pubmed.ncbi.nlm.nih.gov/) using such keywords as 'telomere length' and 'GWAS'. Functional annotation of genes was performed using information obtained from the papers present ing GWAS data, PubMed, The Telomerase Database (http:// telomerase.asu.edu/) queries, and the DAVID knowledgebase (https://david.ncifcrf.gov/) (Sherman et al., 2022).

Human genes identified through GWA studies
PubMed queries produced 18 scientific papers presenting the results of identifying telomere lengthassociated polymorphic loci in human genome based on GWAS data. These papers were analyzed, and the data on 270 telomere lengthassociated genes were collected (Suppl. Material 3). Most genes (262) were identified in European-ancestry population samples, the data on 15 genes were obtained from the studies in Southeast Asian population samples (natives of China, Bangladesh, and India), five genes were identified as a result of trans-ethnic metaanalysis of Singaporean Chinese and European ancestry data (Dorajoo et al., 2019), and one gene was found in African Americans (Zeiger et al., 2018).
The data on functional significance in the context of telomere length regulation were presented by the authors of GWA studies for 52 genes out of 270 (see Suppl. Material 3). The fact that the data on gene significance in the context of telomere length regulation were unavailable for a number of loci reflects the capabilities and limitations of GWAS methodology. Most loci identified by GWAS and associated with the trait of interest are located in intergenic regions. As a rule, in these cases, the set of candidate genes includes the nearest genes, whose functional significance is often difficult to interpret. To identify the mechanisms and genes, through which intergenic variants affect the studied traits, additional experiments are required. For example, it was shown that TtoC substitution of rs1421085 in the intron of FTO gene affects the expression of IRX3 and IRX5, whose transcription start sites are far away (~520 and ~1160 kb) from rs1421085 (Claussnitzer et al., 2015).

Main functional groups of human telomere length-associated genes
A functional classification was performed for a set of 52 hu man genes for which there was information about their func tional significance in the context of telomere length regulation (Suppl. Material 4). As a result, several functional groups of genes have been identified (Fig. 4): Genes encoding telomerase components: TERC is the telomerase RNA component acting as a matrix for DNA strand extension at the telomere end and TERT is a reverse tran scriptase enzyme subunit (Egan, Collins, 2012;Tseng et al., 2015).
Genes encoding CST proteins: CTC1, STN1, TEN1. CST complex acts as a telomerase negative regulator at the late S to G2 phase of the cell cycle (see Fig. 3) (Chen et al., 2012).
Genes encoding subunits of RNA exosomes: EXOSC6, EXOSC9 (Codd et al., 2021) and MPHOSPH6 (Dorajoo et al., 2019). These proteins are functionally significant, because it is known that TERC may be subjected to 3ʹ-processing, and the RNAexosomes are involved in this process (Tseng et al., 2015).

Human candidate genes identified in more than one study
As mentioned above, we have analyzed 18 papers on identify ing telomere lengthassociated human genome loci based on GWA studies and collected the data on 270 such genes (see Suppl. Material 3). Notably, only 16 genes were identified in at least two studies (Fig. 5).
The most frequently identified genes were the ones encod ing both telomerase components (TERC and TERT ) and STN1 encoding a component of the CST complex (revealed in 7, 5, and 7 studies, respectively). Three genes POT1, TERF1, and TERF2 encoding components of the shelterin complex were mentioned in 4, 3, and 2 publications, respectively. Three more genes DCAF4, RTEL1, and NAF1 controlling the DNA da mage response, telomere replication, and telomerase biogenesis were identified in four studies. ATM, PARP1, MPHOSPH6, RFWD3, SENP7, and TYMS were identified in 3 or 2 papers.

Identification of the genes related to telomere length regulation according to DAVID
Using DAVID, we found the terms from the GOTERM_BP_ DIRECT dictionary that were significantly (FDR < 0.05) as sociated with the list of 270 human genes presented in Suppl. Material 3. Sixteen terms indicating biological processes that directly control telomere length are presented in Suppl. Ma terial 6, and the remaining fifteen terms are listed in Suppl. Material 7. There were 30 genes associated with the terms from the first group (see Suppl. Material 6), with two of them

БИОИНФОРМАТИКА / BIOINFORMATICS
Telomere length-associated animal genes (see Suppl. Materials 9-11 for additional data) (SIRT6 and TP53) previously not recognized as biologically interpretable (these genes were presented in (Codd et al., 2021) without comment on their functional significance in the context of telomere length regulation). The analysis of scientific papers showed that proteins encoded by both genes can function in the subtelomeric regions of chromosomes (Tennen et al., 2021;Tutton et al., 2016), which means they could be indirectly involved in telomere length regulation. Then, the genes associated with the second group of GOTERM_BP_DIRECT terms identified at FDR < 0.05 (see Suppl. Material 7) were analyzed. Among them, 29 genes were found that had no biological interpretation (highlighted in red in Suppl. Material 7, and listed in Suppl. Material 8). This group of 29 genes included the above mentioned SIRT6 and TP53, as well as BRCA1, SAMHD1, and BRCC3 asso ciated with the maximum number of GO terms (six, four, and four, respectively). Apparently, the genes from the list thus obtained may also be of interest in the context of telomere length regulation.

Telomere length-associated genes found in other animal species
Genomewide search for telomere lengthassociated loci and genes was carried out in three animal species: cattle (Bos taurus), sparrows (Passer domesticus), and nematodes (Caenorhabditis elegans).
A GWA study to investigate the species Bos taurus was carried out on 702 animals of the Holstein-Friesian breed (IlskaWarner et al., 2019). The study of the DNA isolated from the whole blood of cows sampled at birth showed six telomere lengthassociated polymorphic loci, and three addi tional loci were identified when analyzing the DNA from blood samples at the first lactation. An analysis of the quantitative trait loci (QTL) corresponding to the identified genetic variants revealed 14 candidate genes at birth and 9 at the first lacta tion (see the Table and Suppl. Material 9). The authors were unable to find any data on direct involvement of the identified genes in processes associated with telomere length regulation.
NUP93 nucleoporin gene encoding a nuclear pore component was considered a potential regulator, because it was shown earlier in yeast that nucleoporins facilitated silencing of genes in proximity of telomeric regions ( Van de Vosse et al., 2013).
The recently published results of GWA study (Pepke et al., 2021) in house sparrow (Passer domesticus) nestlings made it possible to identify 22 candidate genes (see the Table and Suppl. Material 10). According to the authors, the genes of interest in the context of telomere length regulation seem to be as follows: (1) WNT9B encoding a protein component of Wnt/ β-catenin signaling pathway due to β-catenin involvement in Tert activation in embryonic stem cells of mice; (2) CDCA4, GH, and GHRHR regulating cell proliferation, apoptosis, and body growth; (3) RHOF involved in cytoskeletal organization; (4) RNF34 (E3 ubiquitinprotein ligase RNF34) regulating ubiquitination; (5) AQP1 due to involvement of aquaporin protein in transport of nitrogen oxide and active forms of oxy gen, which increases oxidative stress which can in turn affect telomerase activity; (6) SCN4A, because its expression in human stem cells correlates with telomere length.
Our analysis showed that none of the candidate genes identi fied in cattle (23 genes) and house sparrows (22 genes) (see the  Table) had orthologs among the 270 genes identified based on GWA studies in humans and presented in Suppl. Ma te rial 3.
The study in C. elegans (Cook et al., 2016) produced 9 candidate genes (see the Table and Suppl. Material 11). One out of nine genes, pot-2, is orthologous to POT1 encod ing a shelterin complex component in humans. The authors assume that another gene ZK1127.4 may also be involved in telomere length regulation, because BCCIP encoded by an orthologous human gene interacts with BRCA2 involved in DNA replication.
In general, when comparing sets of candidate genes iden tified in humans and three other animal species, almost no orthologous genes are detected, which may be due to species specific features of telomere length regulation, some peculiari ties of regulation at various ontogenetic stages, and differences in sampled tissues or gender of the studied individuals.

Conclusions
In the present paper, a compilation of telomere lengthasso ciated genes identified based on GWA studies and including the data on 270 human genes (see Suppl. Material 3), as well as 23, 22, and nine genes identified in cattle, house sparrow, and nematode (see the Table) is presented. The analysis of functions of 52 human genes with functional interpretation available (see Fig. 4, Suppl. Material 4) showed that telomere length may be affected by variants of genes encoding: (1) the structural components of telomerase; (2) the protein compo nents of telomeric chromosome regions (shelterin complex and CST complex); (3) the proteins involved in telomerase biogenesis and regulating its activity; (4) the proteins regulat ing functional activity of shelterin subunits; (5) the proteins involved in telomere replication and/or capping; (6) the pro teins controlling the alternative telomere lengthening pathway; (7) DNA damage response and repair proteins; (8) RNA exo some components.
Candidate human genes identified by several research groups in population samples of different ethnic origin are determined: genes encoding telomerase components (TERC and TERT ) and STN1 encoding a subunit of CST complex (see Fig. 5). It seems that polymorphic loci that affect the functions of these genes can potentially be the most reli able predisposition markers for telomere lengthassociated diseases.
Comparison of the data obtained from GWA studies in humans (see Suppl. Material 3) with the results of similar experiments obtained for other animal species (see the Table  and Suppl. Materials 9-11) confirmed and expanded the under standing of the complex polygenic nature of telomere length regulation. In addition, a pair of orthologous genes encoding a shelterin protein (POT1 in humans and pot-2 in C. elegans) was identified; this finding demonstrates the high biological significance of this gene in various species.
Systematized data on genes and their functions may lay the foundation for development of prognostic criteria for human pathologies explicitly associated with telomere length. In addition, the data on biological processes affecting telomere length and genes regulating these processes may be used for markerassisted and genomic selection of the farm animals aimed at increasing the duration of their productive lifetime.