Computational DIO2 rSNP analysis, transcriptional factor binding sites and disease.

Purpose: The DIO2 gene encodes the type 2 deiodinase enzyme that converts the thyroid prohormone, thyroxine (T4), to the biologically active hormone triiodothyronine (T3). T3 plays a vital part in the regulation of energy balance and glucose metabolism. DIO2 single-nucleotide polymorphisms (SNPs) were computationally examined with respect to changes in putative transcriptional factor binding sites (TFBS), and these changes were discussed in relation to human disease. Methods: The JASPAR CORE and ConSite databases were used to identify the TFBS. The Vector NTI Advance 11.5 computer program was employed to locate all the TFBS in the DIO2 gene from 2.4 kb upstream of the transcriptional start site to 508 bp past the 3’UTR. The JASPAR CORE database was also used to compute each nucleotide occurrence (%) within the TFBS. Results: Regulatory SNPs (rSNPs) in the promoter region (novel SNP, -2035bp), 5’UTR (rs12885300), intron one (rs225010, rs225011 and rs225012), exon two [rs225014 (Thr92Ala)] and 3’UTR (rs6574549, rs225015 and rs225017) of the DIO2 gene are in linkage disequilibrium. These rSNP alleles were found to alter the DNA landscape for potential transcriptional factors (TFs) to attach, resulting in changes in TFBS. Conclusion: The alleles of each rSNP were found to generate unique TFBS resulting in potential changes in TF DIO2 regulation. These regulatory changes were discussed with respect to changes in human health resulting in disease or sickness.
Short Research Article. Buroker; BJMMR, 9(4): 1-24, 2015; Article no. BJMMR.18535


INTRODUCTION
The type 2 deiodinase gene (DIO2) encodes a deiodinase that converts the thyroid prohormone, thyroxine (T4), to the biologically active triiodothyronine (T3) hormone, where T3 plays a vital role in regulating energy balance and glucose metabolism [1][2][3][4]. DIO2 is found in the thyroid gland, cardiac and skeletal muscle, brown adipose tissue, placenta, pituitary, central nervous system (CNS) and at low levels in kidney and pancreas [5][6][7]. The DIO2 gene maps to human chromosome 14q24.3 and is about 15 kb in size. The coding region consists of two exons separated by a gap of approximately 7.4 kb [8]. Several single nucleotide polymorphisms (SNPs) have been found in the gene and have been studied in association with mental retardation (MR) [9], osteoarthritis [10], early-onset type 2 diabetes mellitus (T2DM) [11] and insulin resistance (IR) [12,13]. Three of the common SNPs in the gene (rs225014, rs225012 and rs225010) have been found to be in strong linkage disequilibrium (LD) with each other, while the rs225012 and rs225010 SNPs have been shown to have a positive association with MR [9]. The haplotypes of two SNPs (rs225014 and rs12885300) have been shown to have a significant association with symptomatic osteoarthritis in Dutch women [10]. Three SNPs (rs225011, rs225014 and rs225015) which are in LD were found to be modestly associated with early-onset T2DM in Pima Indians, while these SNPs and rs6574549 were found to be nominally associated with hepatic glucose output [11]. Two of the SNPs (rs225014 and rs225017) which are in partial LD have been found to be associated with IR in Caucasian T2DM patients [13]. The rs6574549 SNP was also found to be associated with fasting insulin, insulin action and energy expenditure [11]. These studies suggest that some DIO2 SNPs may be affecting the regulatory network for the gene's expression in humans.
When LD is found between SNPs in a gene's regulatory region, it can result from strong associations of certain haplotypes with sickness or disease [14][15][16]. Consequently, a computational examination was made between DIO2 SNPs in LD and the transcription factor binding site (TFBS) changes resulting from the SNPs. In this report LD is considered to be the non-random association of SNP alleles within the gene.
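As a minimal sketch of what the non-random association of SNP alleles means numerically, pairwise LD between two biallelic SNPs is commonly summarized by D, D' and r². The function name and the frequencies used with it are hypothetical illustrations and are not data or software from this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # The largest |D| possible for these allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared
```

With hypothetical inputs, two alleles that always travel together (for example `ld_stats(0.3, 0.3, 0.3)`) give D' and r² of approximately 1, whereas independent alleles (`ld_stats(0.25, 0.5, 0.5)`) give 0 for all three statistics.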
Nucleotide changes that influence gene expression by altering gene regulatory sequences such as promoters, enhancers and silencers are known as regulatory SNPs (rSNPs) [17][18][19][20]. An rSNP within a transcriptional factor binding motif can alter a transcriptional factor's (TF) ability to bind the motif [21][22][23][24], in which case the TF would not effectively regulate the gene [25][26][27][28][29]. This concept is examined for the above DIO2 rSNPs and their allelic association with TFBS, where computational analyses [30][31][32][33] were used to identify TFBS alterations created by the DIO2 rSNPs. In this study, the rSNP associations with nucleotide substitutions in putative TFBS are examined with their possible relationship to disease in humans.
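To illustrate how a single nucleotide substitution can strengthen or weaken a potential binding site, the sketch below scores the same flanking sequence carrying each allele against a small count matrix. The matrix, sequences and function names are hypothetical; this is not a JASPAR profile, nor the ConSite or Vector NTI procedure used in this study.

```python
import math

# Hypothetical 4-position count matrix (illustration only, not a JASPAR profile).
TOY_COUNTS = {
    "A": [18, 1, 0, 17],
    "C": [1, 0, 19, 1],
    "G": [0, 18, 1, 1],
    "T": [1, 1, 0, 1],
}

def log_odds_score(site, counts=TOY_COUNTS, pseudocount=1.0, background=0.25):
    """Sum of per-position log2 odds of the site versus a uniform background."""
    score = 0.0
    for i, base in enumerate(site):
        column_total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        prob = (counts[base][i] + pseudocount) / column_total
        score += math.log2(prob / background)
    return score

def best_hit(sequence, width=4):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    return max((log_odds_score(sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

def compare_alleles(flank5, allele_a, allele_b, flank3):
    """Best matrix score for the same flanks carrying each allele in turn."""
    return {allele: best_hit(flank5 + allele + flank3)[0]
            for allele in (allele_a, allele_b)}
```

For example, `compare_alleles("TTA", "G", "T", "CATT")` scores the hypothetical G allele higher than the T allele, because only the G allele completes the matrix's preferred AGCA motif.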

METHODS
The JASPAR CORE database [34,35] and ConSite [36] were used to identify the potential DIO2 TFBS in this study. JASPAR is a database of transcription factor DNA-binding preferences used for scanning genomic sequences, whereas ConSite is a web-based tool for finding cis-regulatory elements in genomic sequences. The TFBS and rSNP locations within the binding sites have previously been discussed [14,16,37]. The Vector NTI Advance 11.5 computer program (Invitrogen, Life Technologies) was used to locate the TFBS in the DIO2 gene (NCBI Ref Seq NM_013989) from 2.4 kb upstream of the transcriptional start site to 508 bp past the 3'UTR, which represents a total of 16.9 kb. The JASPAR CORE database was also used to calculate each nucleotide occurrence (%) within the TFBS, where upper case lettering indicates that the nucleotide occurs at 90% or greater and lower case at less than 90%. The occurrence of each SNP allele in the TFBS is also computed from the database (Table & Supplement).
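The nucleotide occurrence (%) and the 90% upper/lower case convention described above can be sketched as follows. The counts in `toy_site` are hypothetical and do not come from the JASPAR CORE database, and the function names are assumptions made for illustration.

```python
def percent_occurrence(counts):
    """Convert per-position nucleotide counts into percentages.

    counts: list of dicts, one per binding-site position, mapping A/C/G/T to counts.
    """
    percents = []
    for column in counts:
        total = sum(column.values())
        percents.append({base: 100.0 * n / total for base, n in column.items()})
    return percents

def cased_consensus(counts, threshold=90.0):
    """Most frequent base per position; upper case if it occurs at >= threshold %."""
    letters = []
    for column in percent_occurrence(counts):
        base, pct = max(column.items(), key=lambda item: item[1])
        letters.append(base.upper() if pct >= threshold else base.lower())
    return "".join(letters)

def allele_occurrence(counts, position, allele):
    """Percent occurrence of a SNP allele at a 0-based position within the site."""
    return percent_occurrence(counts)[position][allele]

# Hypothetical counts for a 4-bp site (illustration only).
toy_site = [
    {"A": 20, "C": 0, "G": 0, "T": 0},
    {"A": 2, "C": 1, "G": 15, "T": 2},
    {"A": 0, "C": 19, "G": 1, "T": 0},
    {"A": 12, "C": 0, "G": 0, "T": 8},
]
```

Here `cased_consensus(toy_site)` marks the first and third positions in upper case because their most frequent base occurs at 90% or more, and `allele_occurrence(toy_site, 3, "T")` reports how often a T allele is seen at the last position of this hypothetical site.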

DIO2 rSNPs and TFBS
The DIO2 gene encodes the type 2 deiodinase enzyme that converts the thyroid prohormone, thyroxine (T4), to the biologically active triiodothyronine (T3) hormone. The thyroid hormones play an important role in energy homeostasis and glucose metabolism. Due to the importance of this gene in energy homeostasis, DIO2 SNPs associated with disease were computationally evaluated with regard to TFBS. The novel -2035bp SNP is located 5' upstream from the TSS, the rs12885300 SNP is located in the 5'UTR and the rs225010, rs225011 and rs225012 SNPs are found in intron one. The rs225014 (Thr92Ala) SNP is located in exon two while the rs6574549, rs225015 and rs225017 SNPs are located in the 3'UTR. The novel -2035bp, rs225011, rs225014, rs6574549 and rs225015 SNPs are all in LD with each other [11]. The rs225010 and rs225012 SNPs are also in LD but not with the other SNPs [9]. The novel -2035bp and rs6574549 SNPs have very rare alleles with a frequency of 0.004 and 0.007, respectively. Since the minor allele frequencies (MAF) of the other SNPs are rather large, ranging from 0.229 to 0.421, the minor alleles that alter BS, and thereby which TFs can bind, would be expected to have an impact on DIO2 regulation (Table).
The DIO2 SNPs (rs225010 and rs225012) which are in LD have been found to be significantly associated with MR in Chinese [9]. The common rs225010 SNP DIO2-C allele creates six unique TFBS for the ELF5, ELK1, GATA2, GATA4, JUN:FOS and SREBF1 TFs, which are involved with the ETS transcriptional factor family, the ras-raf-MAPK signaling cascade, the proliferation of hematopoietic and endocrine cell lineages, myocardial differentiation and function, steroidogenic gene expression, and lipid homeostasis, respectively (Table, supplement).
Two DIO2 SNPs (rs225014 and rs12885300) have been shown to have a significant association with symptomatic osteoarthritis in Dutch women [10], while a third SNP (rs225017) has been found to be significantly associated with IR [13]. The rs225014 SNP results in a nonsynonymous amino acid substitution (Thr92Ala) in exon 2 and has also been associated with IR in obese Caucasian women [12]. The common rs225014 SNP DIO2-T allele creates five unique TFBS for the FOXC1, HOXA5, SPI1, STAT5A:STAT5B and THAP1 TFs, which are involved with cell viability and resistance to oxidative stress, cell development and myeloid and B-lymphoid cell development, signal transduction and activation of transcription, and G1/S cell-cycle progression, respectively.
Three DIO2 SNPs (rs225011, rs225014 and rs225015) have been modestly associated with early-onset T2DM in Pima Indians [11], while the rs225014 and rs225017 SNPs have been found to be associated with IR in Caucasian morbidly obese subjects and T2DM patients [12,13]. The common rs225011 SNP DIO2-C allele creates two unique TFBS for the CRX and RXRA TFs, which are involved with photoreceptor cells and retinoic acid-mediated gene activation, respectively (Table, supplement).
The minor rs225011 SNP DIO2-T allele creates three unique TFBS for the FOXL1, MEF2A and PDX1 TFs, which are involved with metabolism, cell proliferation and gene expression, skeletal and cardiac muscle development, and glucose-dependent regulation of insulin gene transcription, respectively (Table, supplement). There are also eight conserved TFBS between the rs225011 SNP alleles for the ESRRA, ESRRB, GATA4, NKX2-5, NR5A2, PRRX2, RORA_1 and RORA_2 TFs, which are involved with site-specific transcription regulation, myocardial differentiation and function, negative regulation of chondrocyte maturation, regulation of cholesterol expression in liver, proliferating fetal fibroblasts, and nuclear hormone receptors, respectively (Table, supplement).
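The partition of potential TFBS into those unique to each allele and those conserved between alleles amounts to simple set operations. The sketch below uses the TF names reported in the text for rs225011; the function name is a hypothetical helper and not part of the software used in this study.

```python
def partition_tfbs(common_hits, minor_hits):
    """Split TF names into those unique to each allele and those shared by both."""
    common_hits, minor_hits = set(common_hits), set(minor_hits)
    return {
        "common_only": sorted(common_hits - minor_hits),
        "minor_only": sorted(minor_hits - common_hits),
        "shared": sorted(common_hits & minor_hits),
    }

# TF names as reported in the text for rs225011.
conserved = ["ESRRA", "ESRRB", "GATA4", "NKX2-5", "NR5A2", "PRRX2", "RORA_1", "RORA_2"]
c_allele_hits = conserved + ["CRX", "RXRA"]            # common DIO2-C allele
t_allele_hits = conserved + ["FOXL1", "MEF2A", "PDX1"]  # minor DIO2-T allele

result = partition_tfbs(c_allele_hits, t_allele_hits)
```

The same helper could be applied to the scan results for any of the other SNPs, given the full list of TFs whose potential binding sites overlap each allele.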
The common rs225015 SNP DIO2-G allele creates five unique TFBS for the EBF1, ESRRA, PPARG:RXRA, RFX5 and THAP1 TFs, which are involved with transcription activation, site-specific transcription regulation, regulation of adipocyte differentiation and glucose homeostasis, and regulation of endothelial cell proliferation and G1/S cell-cycle progression, respectively (Fig. 1). There are also ten conserved TFBS between the SNP alleles for the BRCA1, ELF5, HLTF, NFATC2, NFKB1, SOX2, SOX3, SOX6, SOX10 and SPIB TFs, which are involved with genomic stability, epithelium cells, altering chromatin structure, cytokine genes in T-cells, signal transduction, regulation of embryonic development, neuronal development, central nervous system, and lymphoid-specific enhancement, respectively (Table, supplement).
Four DIO2 SNPs (rs225011, rs225014, rs225015 and rs6574549) were nominally associated with hepatic glucose output, while the rs6574549 SNP was also associated with fasting insulin, insulin action and energy expenditure in Pima Indians [11]. The common rs6574549 SNP DIO2-T allele creates five unique TFBS for the ARID3A, HNF1B, HOXA5, LHX3 and NKX3-2 TFs, which are involved with cell cycle progression, embryonic pancreas development, specific positional identities of cells, pituitary development and chondrocyte maturation, respectively (Table, supplement). The minor rs6574549 SNP DIO2-G allele creates four unique TFBS for the FOXA1, FOXA2, NFIL3 and POU2F2 TFs, which are involved with embryonic development, expression of interleukin-3, and the POU domain family, respectively (Table, supplement). There are also seven conserved TFBS between the SNP alleles for the FOXC1, FOXD3, FOXI1, FOXL1, HLTF, NKX3-1 and NKX2-5 TFs, which are involved with cell viability and resistance to oxidative stress, activation and repression, normal hearing, sense of balance and kidney function, ontogenesis, altering chromatin structure, epithelial cell growth, and chondrocyte maturation, respectively (Table, supplement).
These four DIO2 SNPs and a novel SNP in the 5'UTR flanking region were found to be in LD in the Pima Indian study [11]. The common novel SNP DIO2-C allele creates four unique TFBS for the BRCA1, NFYA, RUNX1 and RUNX2 TFs, which are involved with genomic stability, stimulation of transcription of many genes, development of normal hematopoiesis and maturation of osteoblasts, respectively (Table, supplement). The rare novel SNP allele creates nine unique TFBS, including those for the ARID3A, CDX2, GFI1, HOXA9, NKX2-5, NOBOX and PBX1 TFs (Table, supplement).
[Figure: potential TFBS for the common allele of the novel -2035bp DIO2 rSNP. As shown, this rSNP is located in potential TFBS, given with their % sequence homology to the duplex.]

DISCUSSION
Genome-wide association studies (GWAS) have over the past decade provided us with nearly 6,500 disease or trait-predisposing SNPs. Only seven percent of these SNPs are located in protein-coding regions of the genome [38,39] while the remaining 93% are located within non-coding regions [40,41] such as gene regulatory or intergenic areas of the genome. Much attention has been drawn to SNPs that occur in the putative regulatory region of a gene, where a single nucleotide change in the DNA sequence of a potential TF motif may affect the process of gene regulation [17,19,42]. A nucleotide change in a TFBS can have multiple consequences. Since a TF can usually recognize a number of different binding motifs in a gene, the SNP may not change the TFBS interaction with the TF and consequently not alter the process of gene expression. In other cases the nucleotide change may increase or decrease the TF's ability to bind DNA, which would result in allele-specific gene expression. In some cases a nucleotide change may eliminate the natural binding motif or generate a new BS, as a result of which the gene is no longer regulated by the original TF [14,16]. Therefore, functional rSNPs in TFBS may result in differences in gene expression, phenotypes and susceptibility to environmental exposure [42]. Examples of rSNPs associated with disease susceptibility are numerous and several reviews have been published [42][43][44][45].
The rs225012 rSNP DIO2-G allele [G (-strand) or C (+ strand)] located in the E2F6 and ELF1 TFBSs has a 100% occurrence in humans, while in the EGR1 and SPI1 TFBS it has a 94% and 92% occurrence, respectively (Table). Since these binding sites (BS) occur only once in the gene, this rSNP would probably have a major impact on these TFs regulating the gene. The ERG and SP1 TFBSs also have a 100% occurrence in humans, but these BS occur more than once in the gene and consequently the rSNP would not have much of an impact on gene regulation (Table). The alternate rs225012 rSNP DIO2-A allele [A (-strand) or T (+ strand)] located in the PRRX2 TFBS has a 100% occurrence in humans but the BS occurs 55 times in the gene; therefore the rSNP would not be expected to have an impact on the TFs regulating the gene (Table). On the other hand, the rs225012 rSNP DIO2-A allele located in the HOXA5 and NKX3-2 TFBSs also has a 100% occurrence in humans, and these BS occur only once in the gene and therefore should have a major impact on gene regulation, since they only occur with the minor allele (Table). The E2F6 TFBS provided by the rs225012 rSNP common G allele and not present with the minor A allele is a BS for a TF which is involved with the control of the cell cycle and the action of tumor suppressor proteins. Consequently, individuals carrying the rs225012 rSNP DIO2-A allele may be at risk for sickness or disease. In fact, the rs225012 rSNP DIO2 AA genotype [TT genotype (-strand)] frequency has been significantly associated with MR [9] in Chinese patients.

Table 1. The DIO2 SNPs that were examined in this study, where the minor allele is in red. Also listed are the transcriptional factors (TF), their potential binding sites (TFBS) containing these SNPs and DNA strand orientation. TFs in red differ between the SNP alleles. Upper case nucleotides designate the 90% conserved BS region and red marks the SNP location of the alleles in the TFBS. Below the TFBS is the nucleotide occurrence (%) obtained from the JASPAR CORE database. Also listed is the number (#) of binding sites in the gene for the given TF. Note: TFs can bind to more than one nucleotide sequence.
The rs225017 rSNP DIO2-T allele [A (-strand) or T (+ strand)] located in the JUND (var.2) and STAT3 TFBS has a 75% and 100% occurrence in humans, respectively (Table). Since these BS occur only once in the gene, this rSNP would probably have a major impact on these TFs regulating the gene. In the HOXA5, NFE2L1:TCFE2 and PDX1 TFBSs the allele has an 88%, 85% and 97% occurrence, respectively, in humans, but these BS occur more than once in the gene and consequently the rs225017 rSNP might not have much of an impact on gene regulation by these TFs (Table).
The minor rs225017 rSNP DIO2-A allele [T (-strand) or A (+ strand)] located in the CEBP&  have a 100% occurrence in humans and are found only once in the gene. Since these BS only occur once in the gene, the SNP would probably have a major impact on these enhancer and inflammation TFs regulating the gene. The NKX2-5, PRRX2 and SRY TFBS have in humans a 100%, 98% and 96% occurrence, respectively; however, these BS occur more than once in the gene and consequently this rSNP might not have much of an impact on DIO2 regulation by these TFs (Table).
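The reasoning applied above, in which a BS found only once in the gene is expected to matter more than one found many times, can be written as a simple rule of thumb. The sketch below is an illustrative heuristic with a hypothetical function name, not a validated predictor; the site counts are those stated in the text for rs225012.

```python
def expected_impact(sites_in_gene):
    """Heuristic from the text: a BS found only once in the gene has no backup site."""
    if sites_in_gene < 1:
        raise ValueError("a TFBS must occur at least once in the gene")
    return "major" if sites_in_gene == 1 else "limited"

# Number of binding sites in the gene for rs225012, as stated in the text.
rs225012_sites = {"E2F6": 1, "ELF1": 1, "PRRX2": 55, "HOXA5": 1, "NKX3-2": 1}

impact = {tf: expected_impact(n) for tf, n in rs225012_sites.items()}
```

A fuller treatment would also weigh the nucleotide occurrence (%) of the allele within each site, which this sketch deliberately leaves out.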

Similar logic can be used to evaluate the potential TFBS within the other DIO2 rSNPs found in the Table. It should be noted that the minor -2035bp novel rSNP T allele creates ten unique potential TFBS compared to the common C allele, which creates only four BS, while the rs225012 rSNP DIO2 alleles each generate eight unique potential TFBS. In fact, 57 potential TFBS are created by the minor alleles of the nine SNPs compared to 39 TFBS created by the common alleles, with 51 TFBS being shared by both alleles. Since the MAF of the nine SNPs ranges from 0.004 to 0.421, the potential TFBS generated by the minor alleles should have a tremendous impact on thyroid-related illnesses and other sickness in humans. As an example, the POU2F2 (POU class 2 homeobox 2) TFBS is only created by the rare minor allele of rs6574549 and occurs only once in the gene, which is important because POU2F2 is a TF that binds in immunoglobulin gene promoters (supplement). This rSNP has been associated with fasting insulin, insulin action and energy expenditure in Pima Indians [11].
Human diseases or conditions can be associated with rSNPs of the DIO2 gene as illustrated above. A change in the rSNP alleles can alter the DNA landscape around the SNP for potential TFs to attach and regulate a gene. As an example, the potential TFBS associated with the novel -2035bp common rSNP DIO2-C allele from the Table are illustrated in Fig. 2, and those of the rs225015 rSNP DIO2-G allele are illustrated in Fig. 1. As can be seen in the Table, these potential TFBS change when an individual carries the minor allele. The importance of this can be illustrated with the BRCA1 TFBS, which is present with the common allele but not with the minor allele. The BRCA1 TF plays a role in maintaining genomic stability and also acts as a tumor suppressor. Another example would be the PPARG::RXRA TFBS, which is likewise present with the common allele but not with the minor allele. This TF has been implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer.

CONCLUSION
SNPs that alter the TFBS are not only found in the promoter regions but also in the introns, exons and UTRs of a gene (Table). The nucleus of the cell is where epigenetic alterations occur and where TFs bind DNA to regulate mRNA transcription, while exons and introns are distinguished only afterwards, when the transcript is processed into mRNA for protein translation in the cytoplasm. Consequently, it does not matter in which part of the gene a TF binds the DNA, because TFs function only in the nucleus. The SNPs outlined in this report should be considered as rSNPs since they change the DNA landscape for TF binding and have been associated with disease. In this report, examples have been described to illustrate that a change in rSNP alleles in the DIO2 gene can provide different TFBS, which in turn are also associated with disease in humans. The potential alterations in TFBS obtained by computational analyses need to be verified by future protein/DNA electrophoretic mobility gel shift assays and gene expression studies.

CONSENT
It is not applicable.

ETHICAL APPROVAL
It is not applicable.

COMPETING INTERESTS
The author has declared that no competing interests exist.