The genetic variants in 3’ untranslated region of voltage-gated sodium channel alpha 1 subunit gene affect the mRNA-microRNA interactions and associate with epilepsy

mRNA expression in a cell or subcellular organelle is precisely regulated for the purpose of gene function regulation. The 3’ untranslated region (3’UTR) of mRNA is the binding target of microRNA and RNA binding proteins. Their interactions regulate mRNA level in specific subcellular regions and determine the intensity of gene repression. The mutations in the coding region of voltage-gated sodium channel alpha 1 subunit gene, SCN1A, were identified in epileptic patients and confirmed as causative factors of epilepsy. We investigated if there were genetic variants in 3’UTR of SCN1A, affecting the microRNA-mRNA 3’UTR interaction and SCN1A gene repression, potentially associated with epilepsy. In this case–control study, we identified twelve variants, NM_001202435.1:n.6277A > G, n.6568_6571del, n.6761C > T, n.6874A > T, n.6907 T > C, n.6978A > G, n.7065_7066insG, n.7282 T > C, n.7338_7344del, n.7385 T > A, n.7996C > T, and n.8212C > T in 3’UTR of SCN1A gene. We found that the variant of n.6978A > G in all our samples was completely mutated (G/G). In male group, T allele in n.7282 T > C was associated with epilepsy, while C allele was significantly less frequent in epileptic patients than in normal males (OR 0.424). Consequently, the haplotype “CTTACATGACGA” / “CTCTA” was significantly less frequent in male epileptic patients (0.173) than in normal males (0.305). The frequency of haplotype block found in females, "TTTAACA", "TTCAACA", and "CTTAACA" was 0.499, 0.254 and 0.234 respectively. Within STarMir model analysis, the “CTCTA” haplotype showed significantly higher site accessibility to microRNA targeting and higher downstream sequence accessibility for nonconserved binding than that of other haplotypes. Overall, the male genotypes have the higher accessibility of the downstream 30nt block of nonconserved site than the female genotypes. NM_001202435.1:n.7282 T > C is the genetic variant associated with epilepsy in males, and the related haplotype “CTTACATGACGA” / “CTCTA” in the region of chr2: 165991297–165989081, which has high site accessibility for microRNA binding, is the genetic protective factor against epilepsy in males. In female subset, the frequencies of haplotype block "TTTAACA", "TTCAACA", and "CTTAACA" were 0.499,0.254 and 0.234 respectively. Alleles and haplotypes distribution did not differ in female cases in comparison to female controls.

Background mRNA stability, transport, and local translation are critical for gene function regulation. The mRNA intrinsic sequence of a particular gene and other intracellular factors determines the half-life of the mRNA. MicroRNA (miRNA) binds to a site or a part of sequences in three prime untranslated region (3'UTR) of mRNA, destabilizes mRNA, and represses the targeted gene translation [1]. The variant sequences in 3' UTR alter the binding feature of miRNA, and influence gene repression process. For example, some SNPs in the responsible genes (IL23R, LCE3D, et al.) destroyed or created miRNA binding sites and were associated with the clinical psoriasis phenotypes [2]. In a group of schizophrenia patients, rs3219151 (C > T, GABRA6) was identified and related to the decreased risk for schizophrenia [3]. miRNA works in a particular way for activity-dependent regulation of mRNA stability and translation [4,5]. The variants in miRNA coding genes altered miRNA expression, processing, function, and then associated with diseases. Studies showed that rs353291 in miR-145 associated with breast cancer [6], rs11614913 in miR-196a2 associated with bladder cancer [7]. The local concentrations of miRNA and RNA binding proteins determine the binding site occupancies, which in turn regulate mRNA stability and localization, protein production [8].
To analyze the mRNA-miRNA interactions, STarMir is a helpful resource for miRNA studies [9]. It describes many detailed features of predicted sites based on the logistic prediction models [10]. Those features could represent the mRNA-microRNA binding with the probability related to mRNA structure and microRNA binding location. The predicted sites could be distinguished and analyzed with the binding energy, the availability of binding and adjacent sequences, AU percentage in binding sites, and relative starting location of predicted sites. For example, would particular mRNA sequence have higher site availability than others and would the high concentrations of mRNA and microRNA facilitate the mRNA-microRNA binding or induce the competition? STarMir model analysis hypothesizes that miRNAbinding has two steps, nucleation, and hybrid elongation, and both of them requires energy [11]. The power of distinct binding sites displays the difference of each mRNA-miRNA interaction, due to the variant sequence in mRNA binding sites. It is novel to analyze the whole 3'UTR sequences and its binding ability to microRNA, with the usage of STarMir parameters. The imitation of microRNA binding to 3'UTR could provide the information that could be investigated for the potential effects of 3'UTR genetic variants on gene repression.
The abnormality in the gene coding voltage-gated sodium channel alpha1 subunit (SCN1A) is a causative factor in febrile seizure related epilepsy syndromes, such as Dravet syndrome, and Genetic Epilepsy with Febrile seizure plus [12]. The mutations of the protein-coding region (or exon) of SCN1A gene are also pathogenic factors to neuronal hyperexcitability [13]. Another factor that regulates the SCN1A expression is also the potentially epileptogenic factor, such as 5' untranslated region of SCN1A gene [14,15]. We hypothesize that the variant sequence in 3'UTR is also a critical regulatory mechanism through miRNA-binding 3'UTR interactions [16].

Study subjects
We enrolled 101 epileptic patients and 126 healthy individuals in 2004 through 2010 in The Second Affiliated Hospital of Guangzhou Medical University. The clinical diagnosis of epilepsy or epilepsy syndrome was based on the criteria of the Commission on Classification and Terminology of the International League Against Epilepsy (ILAE) (1981,1989). All individuals enrolled in this study were not related with one another by the family relationship or consanguinity.

Genotyping and allele analysis of variants
The genomic DNA was extracted from peripheral white blood cells of participants with guanidine/SDS method (see Additional file 1: Supplemental material 1). We applied the final genomic DNA collection directly in PCR experiment for target fragments replication (3'UTR in SCN1A gene). The four pairs of primers for genomic DNA replication were listed below, wp615-5'-TGATCT GACCATGTCCACTGC, wp616-5'-CCCTCATGCAAA CCACGAC (680 bp); wp617-5'-TTTTGTAAACGAA GTTTCTGTTGAG, wp618-5'-GAAACCAGATACAGC AGCATGG (732 bp); wp619-5'-TGTAGAGTGCAAGC TTTACACAGG, wp620-5'-GAATCGTGAACCTATTTT GCTCC (601 bp); wp621-5'-CACAATCACTTTTCTTA CTTTCTGTCC, wp622-5'-CCTTCTCCCCCAATTTGT AATG (660 bp). We then sent the PCR products to BGI Guangzhou Office (ABI 3730xl sequencer) for sequencing. If the sequencing files indicated the deletion or insertion mutations, we would use molecular cloning method to amplify the single genomic DNA chain within cloning vectors for re-evaluation. The variants identified by genotyping were summarized in male and female group respectively. The number of "aa", "ab", and "bb" genotypes were summed into chi-square test table for genotype-associated study. At the same time we calculated the frequency and number of two alleles (a or b) with the formula (a = 2*aa + ab; b = 2*bb + ab) and summarized them in another chi-square test table for allele-associated study. The p-value less than 0.05 was the criteria for the statistically significant difference.

Haplotype analysis
The findings of genetic variants from male and female individuals were summarized into two tables (Additional file 2: Table S1 and Table S2) with Haploview program [17] Linkage Format (*.ped). The form or file listed the variables of family ID (enrollment ID into the study), Individual ID (case ID in Epilepsy Center or in Healthy Center), Father ID (zero, independent without family relationship between subjects), Mother ID (zero), gender (1 means male, 2 means female), affection status (1 means unaffected, 2 means affected or with epilepsy), and the alleles labeled with two numbers from zero to four (1 = A, 2 = C, 3 = G, 4 = T, 0 = missing data or deletion). By choosing "Four Gamete Rule" to define blocks, the haplotype data were displayed with frequency and chi-square value, the association p-value.
STarMir input, output, and data analysis miRNA data from Landgraf et al. [18] were downloaded and the approximate 50 miRNAs mostly expressed in each part of CNS (hippocampus, frontal cortex, midbrain, and cerebellum) were input into the miRbase database (http://www.mirbase.org//cgi-bin/starmirtest2.pl/) for mature 22 nt-miRNA form. In the STarMir web (http://sfold. wadsworth.org/cgi-bin/starmirtest2.pl), we uploaded one 50-miRNA set miRNA files into "microRNA sequence(s), manual sequence entry". The 3'UTR sequence files were modified with background genetic variant, n.6978A > G (G/G), and each of seven haplotypes, two deletions, one insertion, or one wild-type (reference sequence). The modified sequence files were input into the option of "single target sequence, manual sequence entry". By choosing "V-CLIP based model (human)", "Human (homo sapiens)", and "3'UTR" [10], the following parameters [19] would be displayed in the output window and ready for further analysis: "LogitProb", the probability of the site being a miRNA-binding site as predicted by STarMir logistic model. "site_position", start to end position of the predicted miRNA target region in mRNA; "seed_position", start to end position of the target sub-region in mRNA complementary to the miRNA seed (corresponding to positions 2-7/8 of the miRNA); "seed_type", 6mer, offsite 6mer, 7mer-A1, 7mer-m8, and 8mer seed sites described in Bartel 2009 [20]; "site_access", the structural accessibility as calculated by the average probability of a nucleotide in mRNA being single-stranded for binding the nucleotides in microRNA; "seed_access", the structural accessibility as calculated by the average probabilities being singlestranded of the nucleotides in the target sub-region of mRNA complementary to the miRNA seed; "upstream_access (#nt)", the structural accessibility described by the average probabilities of single-stranded nucleotides in upstream block of the predicted binding site (# is the block size); "dwstream_access (#nt)", the structural accessibility displayed by the average of single-stranded probabilities for the downstream block of the predicted site; "upstream_AU (#nt)", the percentage of AU for the upstream block of the binding site (# is the block size); "dwstream_AU (#nt)", the percentage of AU for the downstream block of the binding site (# is the block size); "site_location", the location of the predicted site relative to the start and end along the entire length of sequence (for 3'UTR, zero represents the 5' end and one accounts for the 3' end); ΔGhybrid, the stability of miRNA:target hybrid as calculated by RNAhybrid [21] in Rehmsmeier et al. 2004; ΔGnucl, the potential nucleation for miRNA:target hybridization [10]; ΔGtotal, the total energy change during the entire process of hybridization [11]. We filtered the output data with criteria of LogitProb of 0.5 or higher [10] and ΔGhybrid of −15 kcal/mol or lower [22]. We compared the means of those parameters in each genotype group with one-way ANOVA or Kruskal-Wallis test. We also added the miRNA copy number [18] and mRNA expression amount (reads per kilobase transcript per million reads, RPKM, http://www.gtexportal.org/home) [23] into the data pool for the correlation and regression analysis with STarMir parameters.

Statistical analysis
Manual chi-square table was applied to describe the distribution of common genetic variants in case/control study, to calculate and compare the genotype frequency and allele frequency in case group and control group. Fisher's exact test was used to analyze the distribution of rare genetic variants in case and control groups. Haploview software 4.2 was applied to calculate the haplotype frequency in groups and the chi-square value, p-value in haplotype association test. Using IBM SPSS Statistics 23 (IBM Corporation, Armonk, NY, USA), we compared the means, standard deviation (SD), standard errors of means (SEM) of STarMir parameters in "genotype" groups (11 genotypes) with one-way ANOVA, and "Dunnett T3" in Post Hoc. If the data of some STarMir parameters did not pass the "homogeneity for variants" test (p < 0.05) the non-parameter test, "independent samples Kruskal-Wallis test" was used to compare the mean ranks of multiple genotype groups. "Bivariate Correlation" and "Spearman" were applied in the correlation analysis (p < 0.05 and r ≠ 0) with miRNA expression weight or mRNA expression quantities as "x" variable, and other STarMir parameters as "y" variable. In linear regression analysis, "miRNA expression weight" or "mRNA expression" was "Independent(s)" and other significantly-correlated STarMir parameters were "Dependent", p < 0.05 in "ANOVA" table to define the significant regression. We used the B values of constant and mRNA_expr to fill the linear vector form (y = ax + b).

Study subjects
We collected SCN1A 3'UTR genotyping data from 101 epileptic patients (52 males and 43 females) and 126 controls (59 males and 68 females). All patients, when enrolled in this study, were older than two years old. In the male subgroup, 15 (29 %) patients had a positive history of febrile seizures. 28 (53.8 %) patients were at the age of 2-9 years, 11 (21.1 %) patients were at the age of 10-19 years, and 7 (13.5 %) patients were at the age of 20-29 years. In the female subgroup, eight (18.6 %) patients had a positive history of febrile seizures. 18 (41.9 %) patients were at the age of 2-9 years. 13 (30.2 %) patients were at the age of 10-19 years. And five (11.7 %) patients were at the age of 20-29 years. In the male healthy control group, 29 (49.2 %) males were at the age of 20-29 years. Among female controls, three (4.9 %) females were at the age of 10-19 years and 45 (73.8 %) females were at the age of 20-29 years. All participants were Chinese Han, and most of them lived in southern China.   [18], the four parts of CNS, hippocampus, frontal cortex, cerebellum, and midbrain, have distinctive miRNA expression profile. The approximate 50 miRNAs most expressed in each part were described in a color-coded histogram with copyright permission (Additional file 1: Table S4). b the means of LogitProb in genotype groups were significantly different. * represents that the mean of LogitProb in the genotype group was significantly different from that of wild type group. c the means of ΔGtotal in genotype groups were significantly different. * represents that the mean of ΔGtotal in the genotype group was significantly different from that of wild type group. d the means of site_access in genotype groups were significantly different. * represents that the mean of site_access in the genotype group was significantly different from that of wild type group. e the means of Dwstream_access_30 nt in genotype groups were significantly different. * represents that the mean of Dwstream_access_30 nt in the genotype group was significantly different from that of wild type group . The genetic variant n.6978A > G was fully deviated (G/G, 100 %) from that of the reference (A/A). We set its genotype (G/G) as the sample background genotype in the miRNA-3'UTR interaction study. The variants frequencies of n.6277A > G, n.6568_6571del, n.6761C > T, n.6874A > T, n.6907 T > C, n.7065_7066insG, n.7338_ 7344del, n.7385 T > A, were quite low, one or two cases in some gender group (male group or female group). The p-values from their Fisher's exact tests showed that none of them was associated with case/control differences (Additional file 2: Table S4). Besides the genotype association test, we also calculated the single allele frequency and allele association test between case and control group.
Consistently, the C allele of n.7282 T > C in male patients (0.191) was significantly less frequent (p < 0.05) than that in male controls (0.305). We displayed the detailed chisquare test in Table 1.
Haplotype analysis in case/control study Seven major haplotypes in male (4 haplotypes) and female (3 haplotypes) subset were identified (Table 2). There were "CTTTA" (0.445), "CTCTA" (0.241), "CCTTA" (0.186), and "TTTTA" (0.114) in male subset. The most frequent two haplotypes had the higher distinction between cases and controls. The frequency of "CTTTA" was 0.490 in male patients and 0.361 in male controls. But the haplotype association test showed that only the frequency of haplotype "CTCTA" was significantly lower in male patients (0.173) than in male controls (0.305) (p = 0.026).
In the female subset, the frequency of haplotype block "TTTAACA", "TTCAACA", and "CTTAACA" was 0.499, 0.254, and 0.234 respectively. They distributed quite even between patients and controls. In female patients, the frequency of three haplotypes were 0.512, 0.279, and 0.209, while in female controls, the frequency of three haplotypes were 0.490, 0.239, and 0.249 respectively. In the female subset, there was no statistical difference in case/control distributive frequency. (Table 2).
STarMir model analysis of SCN1A-3'UTR haplotype and miRNA interaction We summarized the 50 highly-expressed mature miRNAs in four parts of the central nervous system (CNS) in Fig. 1a, Additional file 2: Table S5.

The comparable means of StarMir parameters from gender-deviated genotypes
We calculated the haplotypes from male and female groups separately and correspondingly the male group and the female group had distinctive haplotypes, four haplotypes from males, three haplotypes and three insertion/deletions from females. After the STarMir parameters were calculated based on the male genotypes (CTTTA, CTCTA, CCTTA, and TTTTA) and female genotypes (CTTAACA, TTCAACA, TTTAACA, 6568_ 6571del, 7065_7066insG, and 7338_7344del). We found that dwstream_access_30 nt_nonconserved was significantly different (p < 0.001) between males (0.427 ± 0.103, mean ± SD) and females (0.423 ± 0.099) (Fig. 2 g, Additional file 2: Table S9).

Discussion
Gene mutations in a coding region caused amino acid replacement, deletion, abnormal protein structure, or truncated/incomplete protein sequence. In non-coding regions of DNA sequence, the variants may exist in order to Fig. 2 STarMir parameter comparison between conserved/ nonconserved sites and gender groups. a the free energy change in conserved and nonconserved binding. * represents that the mean of the free energy parameter in nonconserved sites was significantly different from that of conserved sites. b the comparison of STarMir parameters commonly used in both conserved and nonconserved sites. * represents that the mean of the marked STarMir parameter in nonconserved sites was significantly different from that in conserved sites. c the accessibility of downstream 30 nt block of the nonconserved site in male and female groups. * represents that the mean of the Dwstream_access_30nt_nonconserved of male genotypes was significantly different from that of female genotypes  regulate the gene product quantity or preserve characteristic information [4]. In the genotyping results, n.6978A > G was fully deviated from the reference (A/A). It may indicate that our subjects' SCN1A gene was originated from a narrow genetic source with distinctive inherent background (racial factor), or our subjects are under the influence of the small range of residency area or of similar social factors. The allele of n.7282 T > C had significantly lower frequency of C in male patients than that in normal males. Consequently, the frequency of haplotype "CTCTA" was significantly lower in patients group. It indicated that C allele in n.7282 T > C should be a protective factor against epilepsy (OR 0.424). The miRNA-mRNA (3'UTR) interaction STarMir prediction data supported this finding: the structural accessibility of the predicted site in "CTCTA" 3'UTR (0.408 ± 0.128) was as high as that in wild-type (0.406 ± 0.117), while site_access of other genotypes was significantly lower than that of wild-type. Additionally, "CTCTA" genotype had also the higher accessibility of downstream sequence of non-conserved binding sites (0.431 ± 0.107) than wild-type genotype (0.426 ± 0.099). "CTCTA" should be friendly accessible for miRNA binding and gene repression process, which should not be halted or handicapped by the mutated 3'UTR sequences. Instead, the 3'UTR sequences of other genotypes could influence the miRNA-related gene repression negatively. In the female subset, although we identified several novel mutations and recognized the major alternation in STarMir parameters of two deletions and one insertion mutation, due to few mutated case (only one for each mutation) we could not reach the statistically significant difference between female patients and controls. Other mutations were distributed very even in female patients and female controls. On the contrary, in males, there were no severe mutations that changed the microRNA-mRNA (3'UTR) interaction, but the common variants were distributed differently in male patients and male controls, which contributed to the statistically significant and comparative In comparison to other studies, the authors analyzed the genetic variants in 3'UTR with other mechanisms on gene regulation, such as AU-rich sequence [24], GAPDH-binding site [25]. Our study could have the limitation of the chi-square test and the fundamental theory, the microRNA-mRNA (3UTR') interaction, to analyze the genetic variants in 3'UTR.
Overall, the miRNA expression negatively influenced microRNA-mRNA (3'UTR) interaction according to the correlation and regression results on STarMir parameters. The predominantly negative effects of higher miRNA expression were the lower probability of being the site predicted, the lower stability of microRNA:target, the lower percentage of AU in upstream block, and the lower structural accessibility of microRNA-complementary sequence. We might have interpreted that more microRNA would induce the competition among the possible binding sites, the excessive possibility of microRNA:target binding could interfere the existing microRNA:target hybridization, more microRNA would impair the accessibility of microRNA-complementary sequences, and the more microRNA could shift the binding sites to Less AU upstream sites. On the contrary, the mRNA quantities could positively influence the microRNA-mRNA (3'UTR) interaction by promoting 3'end-proximal site binding, and higher downstream sequence of conserved site accessibility. Based on the negative effects of microRNA and mRNA quantity on seed_access, we could reasonably assume that on the physiological baseline, the microRNA expressed in CNS and mRNA of SCN1A are relatively excessive to conserved binding of microRNA-mRNA (3'UTR) interactions, which might be an interesting study direction in the future.
STarMir principally uses two free energy values to predict and calculate the microRNA-mRNA binding features [11]. It is believed that STarMir has less putative targets with additional specific threshold condition [26]. It required the input mRNA sequence less than 5000 nt. However, our study subject, SCN1A gene, has the long mRNA sequence (NM_001202425.1), 8342 bp. Although STarMir provides the optimal option for mRNA input, the full length of mRNA for complex structural assessment, we were able to input 3'UTR sequence of SCN1A gene, a part of mRNA sequence, for binding site prediction. For the purpose of the description of binding sites, STarMir had better performance on our task, providing multiple parameters for the site comparison and analysis. Other microRNA predictive tools, such as PITA [26], and RNAhybrid [21] could not provide those binding features for our purposes.
The diseases of multifactorial inheritance have complex etiology with the function of multiple genes and the complex epigenetic mechanisms are involved. Epilepsy is one of the diseases of multifactorial inheritance. The causative factors include the dysfunctional gene products of SCN1A, GABRA1 (alpha subunit of GABA receptor), and CHRNA4 (subunits of nicotinic AChr receptor) gene [27]. Many factors are contributing to the gender difference of human brains, such as sex hormone physiology, the fine tune of neuroendocrine system functioning [28,29], and the environmental and educational interferences. Consistently, the males and females have different mRNA expression baseline of SCN1A gene in the brain [23], which further supports our finding and could also be the outcome of gene regulation based on microRNA-mRNA interaction. Generally speaking, our study presented the gender-different data probably resulting from epigenetic mechanism (actively involving environmental factors), and dynamic miRNA-mRNA (3'UTR) interaction. On the other side, it could also be the outcome of human genderdifferent adaption and selection in the genetic evolution. How the sex factors fine-tune the gene expression and repression or the function of voltage-gated sodium channels would be an attractive topic for clinical scientists and neurobiologists in the future.

Conclusions
Using case/control association study and STarMir model, we efficiently analyzed the genetic variants in 3'UTR of SCN1A gene in epileptic patients and smallsized controls. The male epileptic patients had significantly lower frequency of C allele in n.7282 T > C than normal males, and the OR value was 0.424. The related haplotype "CTCTA" also had the significantly lower frequency in male epileptic patients. The frequencies of haplotype block "TTTAACA", "TTCAACA", and "CTTAACA" in female subset were 0.499,0.254 and 0.234. Their frequencies were not significantly different between case and controls. The STarMir analysis displayed that male haplotype "CTCTA" had high site accessibility and other favorable features for miRNA binding, compared with other genotypes and haplotypes. The 3'UTR or related miRNAs could be the potential targets of therapeutic strategies in the future study of epilepsy.