Functional Significance of Microsatellite Markers

Summary. This review summarizes literature data on the positive results of association studies between the length of microsatellite repeats and predisposition to pathologies. The data can be classified according to the localization of the microsatellite: in the gene promoter, in the part of exon 1 coding the signal sequence, in gene introns, in the coding areas of genes, and in 3'-untranslated regions. The functional significance of microsatellite length changes can be evaluated in many cases. The authors conclude that further studies on microsatellite associations with diseases remain promising, as these associations reflect changes in the functional activity of genes.


Introduction
Microsatellites (MSs) are repeating sequences of 2-6 base pairs of DNA (1). They are used as molecular markers in genetics for kinship, population, and other studies. They can also be used to study gene duplication or deletion, for marker-assisted selection, and for fingerprinting. Microsatellites are distributed throughout the genome (1). Being variable genetic elements, microsatellites provide a potent tool for the individual characterization of genomes. Variability arises from replication slippage caused by mismatches between DNA strands during replication in meiosis (2), and such an event can occur once per 1000 generations (3). This slippage is much more common than point mutations (4). Microsatellite repeats mutagenize human genomes and alter the human genomic landscape across generations (5). The utility of microsatellites has been demonstrated by a study comprising 2058 germline changes discovered by analyzing 85 289 Icelanders at 2477 microsatellites. The paternal-to-maternal mutation rate ratio is 3.3, and the rate in fathers doubles from age 20 to 58, whereas there is no association with age in mothers. Longer microsatellite alleles are more mutagenic and tend to decrease in length, whereas the opposite is seen for shorter alleles (6).
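The definition given above (tandem repeats of a 2-6 bp motif) can be made concrete with a minimal Python sketch that scans a DNA string for such tracts. The function name `find_microsatellites`, the default threshold of 4 repeat units, the exclusion of single-base runs, and the test sequence are illustrative assumptions and are not taken from the cited studies; dedicated repeat-finding tools apply more elaborate criteria.

```python
import re

def find_microsatellites(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for tandem repeats of 2-6 bp motifs.

    min_repeats is an arbitrary illustrative cutoff, not a published standard.
    """
    # Lazy {2,6}? makes the regex try the shortest motif first;
    # the backreference \1 then demands further identical copies in tandem.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # A motif such as "AA" is a single-base run, not a 2-6 bp repeat.
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits

if __name__ == "__main__":
    # Hypothetical sequence containing a (CA)5 tract starting at index 3.
    print(find_microsatellites("TTGCACACACACAGGT"))
```

In this sketch, alleles of one microsatellite locus would differ only in the reported repeat count, which is the quantity tested for association in the studies reviewed below.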
Microsatellites remain highly informative and useful measures of genomic variation for linkage and association studies despite the fact that general preference is given to single-nucleotide polymorphisms (SNPs). Microsatellites are much more genetically diverse compared to SNPs; they generate a greater haplotype diversity (7).
Although mostly used as structural genetic markers, microsatellites perform several functions in the genome, which are still far from being completely understood. It has now become clear, however, that MSs and their flanking regions are involved in multiple gene and genome functions.
Expansions of microsatellite DNA repeats cause nearly 30 developmental and neurological inherited disorders (19). Below, we summarize and classify the data on the association between MSs and human diseases according to the localization of the repeats in the genes and their possible functional role. Triplet expansion diseases are excluded from our analysis, as the problem has been extensively reviewed (20,21).
Promoter Microsatellites. MSs can determine the activity of upstream gene regulation elements such as the locus control region of the beta-globin gene domain. The (AT)(8)N(12)GT(AT)(7) configuration of a microsatellite found in the hypersensitive site of this structure is associated with a special form of sickle cell chromosomes, the Tunisian βS chromosomes (22). The most common allele of the MS marker in STAT4, the STAT4-MS1-254 allele, located in the 5' flanking region of the gene, is significantly associated with sarcoidosis (23). Changes in the length of microsatellites within promoters and other cis-regulatory regions can also change the level of gene expression, and they are linked to abundant variations in cis-regulatory control regions in the human genome (24). For example, a CA-repeat microsatellite in the insulin-like growth factor 1 promoter is associated with the level of this growth factor. It has turned out that the intensity of the gene transcription is regulated by the interaction of several SNPs and the microsatellite, which generate haplotypes with lower or higher levels of gene transcription (25). Promoter microsatellites tend to be guanine-cytosine rich; they are often found at the start of genes and are probably associated with regulatory elements such as CpG islands, G-quadruplexes (G4), and untranslated regulatory regions. Numerous promoter microsatellites have the potential to influence human phenotypes by generating mutations in regulatory elements, which may ultimately lead to disease (26). A CpG-CA repeat within the human endothelin-converting enzyme 1 promoter is highly polymorphic, harbors transcriptional start sites, is able to recruit transcription factors, poly(ADP-ribose) polymerase-1, and splicing factors, and is functional with regard to haplotype-specific promoter activity. The overall CpG-CA repeat composition of patients with Alzheimer's disease has been found to be distinct from that of nondemented control individuals (27).
A length polymorphism of GT repeats in the promoter region of the human heme oxygenase-1 (HO-1) gene modulates the transcription of this gene (28). Numerous studies have linked human HO-1 gene promoter polymorphisms to a risk of vascular diseases (29). Persons carrying longer (GT)n repeats in the promoter of the HMOX1 gene (L allele) may be at a higher risk of type 2 diabetes mellitus (30). Functional analyses have shown that persons with impaired glucose regulation and type 2 diabetes mellitus who carry the L/L (GT)n genotype have significantly lower HO-1 protein expression levels than those with the S/S genotype (31). The same microsatellite is associated with susceptibility to cardiovascular complications of the disease. Patients with longer GT repeats in the heme oxygenase-1 gene promoter exhibit higher inflammation and oxidative stress. These patients have a higher risk of long-term cardiovascular events and mortality (32). A short allele of the same microsatellite might be associated with abdominal aortic aneurysm (33). Long (GT)n repeats in the microsatellite polymorphism region of the HMOX1 gene have been reported to be associated with symptomatic malaria (34).
The aldose reductase (AKR1B1) gene promoter harbors a (CA)n microsatellite significantly associated with diabetic retinopathy. The z-2 microsatellite allele has been found to confer risk in type 1 and type 2 diabetes mellitus and the z+2 allele to confer protection against diabetic retinopathy in type 2 diabetes mellitus regardless of ethnicity (35,36). The S allele of a (CCTTT)n repeat in the promoter of the NOS2 gene is associated with both hypertension and responsiveness to antihypertensive drug therapy (37). The same microsatellite is also associated with diabetic retinopathy, as is the (GT)n promoter repeat in the tumor necrosis factor β gene (36).
Promoter MSs might also be associated with mental disorders; the promoter TA microsatellite repeat in the estrogen receptor alpha gene is significantly associated with postpartum depression (38). The arginine vasopressin receptor 1A gene (AVPR1A) is widely expressed in the brain and is considered to be a key receptor in the regulation of social behavior. Two polymorphisms in the 5'-flanking region of human AVPR1A, RS3 and RS1, show differences in relative promoter activity by length. Shorter repeat alleles of RS1 and RS3 have decreased relative promoter activity in the human neuroblastoma cell line SH-SY5Y. The short alleles of RS1 are associated with autism (39).
A recent meta-analysis by Shen et al. has reported that CYP11A1 promoter microsatellite (TTTA)n repeat polymorphisms may increase susceptibility to polycystic ovary syndrome (40).
Signal Sequence Microsatellites. Some microsatellites are localized in translated areas of the genes. For example, the carnosinase gene contains the D18S880 microsatellite, formed of a leucine triplet repeat, in its signal sequence. Homozygotes for 5 trinucleotide repeats in this microsatellite are susceptible to diabetic nephropathy (41). The human signal transducer and activator of transcription 6 (STAT6) gene represents one of the most promising candidate genes for asthma and other inflammatory diseases in the chromosomal region 12q13-q24. Exon 1 of the gene contains a GT repeat upstream of the first methionine codon. Allele A4 of the GT repeat polymorphism is associated with an increase in the eosinophil cell count (42). Heterozygosity for the 13/15-GT repeat alleles is significantly associated with allergic status (43).
Microsatellites of Coding Regions. Besides trinucleotide expansion diseases, characterized mostly by polyglutamine tracts (poly-Q), which cannot be analyzed here due to space limitations, an interesting trinucleotide repeat has been identified in the MIC-A gene. The exon 5 microsatellite polymorphism of the MIC-A gene consists of 5 alleles based on the number of GCT triplet repeat units (alleles A4, A5, A6, and A9) and the presence of an additional nucleotide insertion (allele A5.1). The GCT repeats determine the number of Ala residues in the protein, and the A5.1 insertion leads to a frameshift mutation. The exon encodes the membrane-binding domain of the protein (44). The microsatellite alleles are associated with Addison's disease (44) and type 1 diabetes mellitus (45-47). Some alleles are protective against juvenile idiopathic arthritis (48). Variations of CAG (Gln) repeats in the androgen receptor gene within physiological limits, which do not cause insensitivity to androgens, can influence certain physiological parameters. A shorter androgen receptor (AR) CAG repeat is associated with low HDL-C and testosterone levels (49).
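The two kinds of variation described for MIC-A exon 5 (a change in the number of GCT units versus a single-nucleotide insertion) can be contrasted in a minimal Python sketch. The sequences are toy repeat tracts rather than the actual MIC-A exon 5 sequence, and the inserted base (G), its position, and the helper names `translate`, `repeat_allele`, and `with_insertion` are hypothetical choices made only for illustration. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons only; trailing 1-2 bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def repeat_allele(n_units, motif="GCT"):
    """Toy allele made of n_units tandem copies of the motif."""
    return motif * n_units

def with_insertion(dna, position, base="G"):
    """Insert one base at a (hypothetical) position to mimic a frameshift."""
    return dna[:position] + base + dna[position:]

if __name__ == "__main__":
    for n in (4, 5, 6, 9):
        print(n, translate(repeat_allele(n)))          # poly-Ala of length n
    shifted = with_insertion(repeat_allele(5), 6)       # toy frameshift allele
    print(translate(shifted))                           # Ala run is interrupted
```

Changing the repeat number only lengthens or shortens the alanine tract, whereas a one-base insertion changes every codon downstream of it; in this toy case the five-unit tract no longer translates as five alanines.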
Intronic Microsatellites. Microsatellites within introns also influence the phenotype through mechanisms that are not currently understood; this underlies numerous associations of microsatellite repeat polymorphisms with human diseases. For example, a GAA triplet expansion in the first intron of the X25 gene appears to interfere with transcription and causes Friedreich ataxia (50). Subjects having more CA repeats in the first intron of the type 2 11β-hydroxysteroid dehydrogenase gene (HSD11B2) are susceptible to developing abnormal glucose tolerance (51). A repeat polymorphism in the fourth intron of the NOS3 gene is linked to hypertension (52). We have detected an association between type 2 diabetes mellitus and microsatellite markers of the 14q13 region localized in the introns of the PSMA6 and KIAA0391 genes: rs63749745, rs71444202, and rs34580276 (53).
Three microsatellite loci, i.e., (ATCC)n1, D1S1621, and (ATCC)n2, in the DISC1 gene show a significant association with schizophrenia. The microsatellites occur in intronic sequences in the vicinity of a critical splice junction that gives rise to the expression of the DISC1 isoforms (54).
Intronic microsatellite polymorphisms determine susceptibility to certain neoplasias. For example, polymorphisms in the CT dinucleotide repeat in intron 3 of the transcription factor GATA3 gene are associated to a certain extent with the risk of breast cancer, i.e., women who carry 17-CT or 18-CT alleles of the GATA3 gene are at a lower risk of developing breast cancer (55). The polymorphic dinucleotide CA tandem repeat (ESR2_CA), located in intron 5 of the estrogen receptor 2 gene, ESR2 (14q23.2), is associated with the risk of breast cancer in African women (56). The A7 allele of the intronic D19S884 marker of the fibrillin 3 gene is associated with polycystic ovary syndrome (57).
Intronic microsatellite repeats are implicated in the pathogenic mechanisms of several autoimmune diseases. The SLC26A4 gene, involved in the genetic susceptibility to autoimmune thyroid disease, harbors 2 microsatellites in introns 10 and 20, and longer alleles of these markers appear to be associated with Hashimoto thyroiditis (58). The intronic rs63749745 marker of the 14q13 locus has shown a high level of association with Graves' disease (59), and rs71928782, rs5807818, rs71444202, and rs34580276 have been found to be associated with juvenile idiopathic arthritis in children (60).
Polymorphisms present in the first intron of IFN-γ may have an important role in the regulation of the immune response, which could have functional consequences for gene transcription. The microsatellite allele with 16 CA repeats has been shown to be significantly associated with the paucibacillary form of leprosy compared with multibacillary patients (61). The microsatellite marker IFNGR2-MS1, located in the 5'-upstream region of the interferon gamma receptor 2 gene (IFNGR2), shows a significant association with tuberculosis (62). One allele of the D6S1276 microsatellite in intron 1 of the BMP5 gene is associated with the risk of osteoarthritis in women; 2 alleles are protective (63).
3'-UTR Microsatellites. Microsatellites localized in the 3'-untranslated regions (3'-UTRs) may affect the stability of the final mRNA, its localization, its export from the nucleus, and its translation efficiency. The androgen receptor CAG repeat polymorphism (AR CAG) affects receptor transcriptional activity (the shorter the repeat, the more sensitive the AR) and is associated with androgenic parameters and obesity (49). The conserved regulatory sequences within the 3'-UTRs and the specific elements binding to them enable gene expression control at the posttranscriptional level, and all these processes reflect the actual state of the cell (64). Leptin is a protein hormone, mainly synthesized in adipocytes, that regulates the food intake and energy expenditure of the body; shorter alleles of the microsatellites in the 3' flanking region of the leptin gene are significantly associated with hypertension (65). Reduced repeat lengths in the polyA repeat of the EGFR gene 3'-UTR are linked with osteosarcomas (66). Several alleles of the (AT)n microsatellite in the 3'-UTR of the cytotoxic T lymphocyte antigen-4 (CTLA-4) gene, namely the 104-, 106-, 110-, and 116-bp alleles, have been observed to predispose to recurrent miscarriage (67).
Remote and Locus-Specific Microsatellites. In some cases, the association with diseases is found for microsatellites localized far from the candidate genes. Marker D12S96 is localized 5.653 cM downstream of the vitamin D receptor (VDR) gene. Despite this long distance and obscure functional relations, statistically significant linkage disequilibrium has been detected between allele 22 of locus D12S96 and osteoporosis (68). In other cases, no connection between the microsatellites and candidate genes can be traced at all, as these are locus markers rather than gene markers. The 8p21-23 region microsatellites D8S136 and D8S520 are consistently and strongly associated with prostate cancer (69). The locus was identified through a frequent loss of heterozygosity in tumors rather than through association studies. The D1S2726 microsatellite, located 30 kb from the KCNA3 gene, which encodes the voltage-gated potassium channel Kv1.3, is associated with susceptibility to autoimmune pancreatitis (70).

Conclusions
The above data clearly indicate that most microsatellites showing an association with human pathologies are harbored in genes encoding enzymes involved in the pathogenesis of these pathologies; in many cases, the impact of changes in microsatellite length on gene function can be evaluated. Thus, further studies on microsatellite associations with diseases remain promising, despite numerous whole-genome association studies. The contribution of studies of individual polymorphisms to the understanding of the genetic background of diseases should not be underestimated.

Acknowledgments
The costs of this work were covered by the European Regional Development Foundation project No. 2010/0315/2DP/2.1.1.1.0/10/APIA/VIAA/026 and the scientific co-operation project No. 10.0010 of the Latvian Council of Sciences "Genetic Study of Susceptibility to Diseases and Ageing in the Latvian Population" subproject "Wide Scanning of the Proteasomal Gene Polymorphism in the Latvian Population and Its Association With Autoimmune Diseases."

Statement of Conflict of Interest
The authors state no conflicts of interest.