Pathway-based analysis of genome-wide association study of circadian phenotypes

Sleepiness disrupts normal social life and is attracting increasing attention. Circadian phenotypes contribute to marked individual differences in susceptibility to sleepiness. We aimed to identify candidate single nucleotide polymorphisms (SNPs) that may underlie circadian phenotypes, elucidate the potential mechanisms, and generate corresponding SNP-gene-pathway hypotheses. A genome-wide association study (GWAS) dataset of circadian phenotypes was used. The Identify Candidate Causal SNPs and Pathways (ICSNPathway) analysis was applied to the GWAS dataset after quality control filtering. Furthermore, genotype-phenotype association analysis was performed with the HapMap database. Four SNPs in three genes were found to correlate with usual weekday bedtime, providing seven hypothetical mechanisms in total. Eleven SNPs in six genes were found to correlate with usual weekday sleep duration, providing six hypothetical pathways. Our results demonstrated that fifteen candidate SNPs in eight genes played vital roles in six hypothetical pathways implicated in usual weekday bedtime and six potential pathways involved in usual weekday sleep duration.


Introduction
Sleepiness impairs social function, reduces quality of life, and causes occupational and motor vehicle accidents [1]. Although behavioral factors, circadian factors (time of day), duration of wakefulness, and sleep disorders are closely linked to daytime sleepiness [2], susceptibility to sleepiness differs greatly between individuals [3]. Accumulating evidence shows that excessive sleepiness is heritable [4][5]. In modern society, nearly one-fifth of employees are engaged in long-term night shift work [6]. As a result, work performance and scheduling have a significant impact on individual variability in diurnal preference. Studies also indicate that diurnal preference (namely usual weekday bedtime) is heritable [7][8][9]. In addition, usual weekday sleep duration plays a critical role in daytime sleepiness. Studies have investigated whether short or long sleep duration is related to coronary heart disease [10], diabetes mellitus [11][12], hypertension [13], and mortality [14]. Likewise, usual weekday sleep duration is heritable [15].
To date, three genome-wide association studies (GWASs) have detected several single nucleotide polymorphisms (SNPs) associated with circadian phenotypes in a number of genes [16][17][18], but the functions of these SNPs remain undefined, which is a general challenge in interpreting GWAS results [19]. Pathway-based approaches have therefore been gradually refined, and the Identify Candidate Causal SNPs and Pathways (ICSNPathway) tool was created to determine candidate SNPs and hypothetical mechanisms from GWAS data, using linkage disequilibrium (LD) analysis, functional SNP annotation, and pathway-based analysis (PBA) [20]. Here, we combined ICSNPathway analysis with the HapMap database to identify candidate SNPs and relevant pathways, aiming to develop SNP-gene-pathway hypotheses regarding circadian phenotypes.

Study population and data extraction
We searched publicly available databases to identify eligible GWASs on circadian phenotypes: the National Human Genome Research Institute GWAS catalog (http://www.genome.gov/26525384), the National Center for Biotechnology Information (NCBI) dbGap (http://www.ncbi.nlm.nih.gov/gap/), and GWAS Central (http://www.gwascentral.org/). In addition, both EMBASE and PUBMED were searched with the following key words: "GWAS" or "genome-wide association study" and "circadian". All searches covered records up to April 20th, 2016, without language restriction. To reduce the effect of genotyping errors, two authors (DZ and JYuan) independently filtered the primary GWAS data set and removed SNPs with a call rate < 95%, a minor allele frequency < 0.01, or deviation from Hardy-Weinberg equilibrium (HWE test, P < 0.001). During data extraction, discrepancies were resolved through discussion with a third author (YW) until consensus was reached on each item. After extracting data from the original papers and contacting the corresponding authors, we excluded studies that lacked the required details.
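The quality control thresholds above can be illustrated with a minimal sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele with `None` for a missing call, and it assumes a 1-df chi-square goodness-of-fit test for HWE; the text does not state which HWE test was used, and the function names `hwe_chi2_pvalue` and `passes_qc` are hypothetical.

```python
import math
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = missing call


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square goodness-of-fit test for HWE (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,
    min_maf: float = 0.01,
    hwe_alpha: float = 0.001,
) -> bool:
    """Return True if one SNP passes the call-rate, MAF and HWE filters."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    if len(called) / len(genotypes) < min_call_rate:
        return False
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    freq_b = (n_ab + 2 * n_bb) / (2 * len(called))
    if min(freq_b, 1.0 - freq_b) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha


# Toy SNP in exact Hardy-Weinberg proportions (allele frequency 0.4).
balanced_snp = [0] * 36 + [1] * 48 + [2] * 16
print(passes_qc(balanced_snp))
```

In practice such filters are usually applied with dedicated genetics software; the sketch only makes the three thresholds concrete.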
During the second stage, the PBA algorithm was employed to annotate the biological pathways of the selected SNPs by integrating data from four databases: BioCarta (http://www.biocarta.com), MSigDB (http://www.broadinstitute.org/gsea/msigdb), Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg), and Gene Ontology (GO, http://www.geneontology.org). Furthermore, SNP label permutation and normalization were adopted to correct for gene variation and to generate the distribution of the significance proportion based enrichment score (SPES). From the distributions of SPESs, a nominal P-value and a false discovery rate (FDR; cutoff value: 0.05) were calculated.
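As a rough illustration of how a nominal P-value and an FDR can be derived from permuted enrichment scores, the sketch below computes an empirical P-value for one pathway and then adjusts a set of P-values. It is not the ICSNPathway implementation: the add-one correction in the P-value and the use of the Benjamini-Hochberg procedure for the FDR are assumptions made here for illustration, and both function names are hypothetical.

```python
from typing import List, Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Share of permuted scores at least as large as the observed score.

    The +1 terms (an assumption of this sketch) keep the estimate above zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy nominal P-values for four hypothetical pathways.
nominal = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(nominal)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

With the 0.05 cutoff stated above, a pathway would be retained when its adjusted value falls below 0.05.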

Statistical analysis
Expression levels are presented as mean ± SEM, and the difference between two genotypes was evaluated with a two-sided Student's t test. One-way ANOVA was used to assess differences in transcript expression levels among more than two genotypes. Statistical analysis was performed with SPSS version 21.0. P values < 0.05 were considered statistically significant.
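The test statistics named above can be sketched with the Python standard library. The analysis in this study was run in SPSS; the code below is only an illustration with made-up expression values, and the function names `sem`, `student_t` and `one_way_anova_f` are hypothetical. It returns the statistics and degrees of freedom; P-values would then come from the t and F distributions.

```python
import math
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean."""
    return stdev(values) / math.sqrt(len(values))


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way ANOVA F statistic with between- and within-group df."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up expression values for three genotype groups.
aa, ab, bb = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [2.5, 3.5, 5.0]
print(student_t(aa, ab))
print(one_way_anova_f([aa, ab, bb]))
```

With exactly two groups the ANOVA F statistic equals the square of the pooled t statistic, which is a convenient consistency check. Library routines such as `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` return the corresponding P-values directly.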

Characteristics of the study population
After a thorough search, one GWAS drawn from NCBI dbGap (study accession: phs000007) with publicly available summary data was finally adopted in our study [16]. In this GWAS on circadian phenotypes (usual weekday bedtime and usual weekday sleep duration), a total of 749 subjects were drawn from the Framingham Offspring Study, in which 2848 participants completed sleep habit questionnaires between 1995 and 1998 (Offspring Examination Cycle 6) for the Sleep Heart Health Study [30]. For usual weekday bedtime, 65,514 candidate causative SNPs were originally generated with an Affymetrix 100K SNP GeneChip, of which 47,285 passed the quality control filters and were used for the final bioinformatics analysis. For usual weekday sleep duration, 65,514 SNPs were likewise generated with the gene chip, of which 47,301 met the quality control criteria and were used for subsequent analysis.
We then examined the effect of different genotypes on mRNA expression levels using the publicly available HapMap cDNA expression database. No significant association was found between any SNP and the mRNA expression of its corresponding gene in Caucasians (Table 2). However, the SLC17A1 rs13213957 polymorphism tended to affect the mRNA expression level of SLC17A1 (marginal P value = 0.0785), which is consistent with the functional class indicated in Table 1. In addition, the functions of the corresponding proteins were examined, which showed that all SNPs could cause a residue change except for HSPD1 rs8539 (Table 3). MT-ND5 rs10517616 was not estimated here because no data were publicly available.
Table 1 notes: (b) −log10(P) for the candidate SNP in the original genome-wide association study (GWAS); (c) −log10(P) for the SNP in the original GWAS that was in LD with the candidate SNP.
During the ICSNPathway analysis, six pathways related to usual weekday bedtime were detected (Table 4). The first mechanism involved the MT-ND5 rs10517616 polymorphism (nonsynonymous coding) in the pathways NADH dehydrogenase activity (nominal P < 0.001, FDR = 0.011), respiratory electron transport chain (nominal P = 0.001, FDR = 0.011), oxidoreductase activity (nominal P = 0.002, FDR = 0.017), and oxidative phosphorylation (nominal P = 0.004, FDR = 0.047). The second involved the GRSF1 rs3775728 polymorphism (nonsynonymous coding) in the mRNA binding pathway (nominal P < 0.001, FDR = 0.014). The third involved the ENAM rs7671281 and rs3796704 polymorphisms (nonsynonymous coding) in the biomineral formation pathway (nominal P < 0.001, FDR = 0.021).

Discussion
A complex molecular network comprising several cellular pathways may contribute substantially to the development of circadian phenotypes [31]. Because GWASs are limited to detecting single-SNP associations and identifying new loci, we applied a pathway-based approach that takes the biological interplay between multiple genes into consideration and offers new insight into how genes might contribute to the development of circadian phenotypes [32].
In this study, we applied ICSNPathway analysis and identified six potential regulatory mechanisms each for usual weekday bedtime and usual weekday sleep duration.
The most significant SNP-to-gene-to-effect hypothesis was that rs10517616 alters the function of MT-ND5 in NADH dehydrogenase activity [33]. NADH has been reported to promote transcription of the lactate dehydrogenase (LDH) gene depending on the redox state, through activation of the E-box by binding of the Bmal1/NPAS2 heterodimer, part of the master brain clock that regulates circadian rhythmicity [34]. The second candidate gene, GRSF1, identified in this study and in previous studies, is implicated in the mRNA binding pathway through SNP rs3775728 [35][36]. The third biological mechanism involves the modulation of ENAM by rs7671281 and rs3796704, affecting its role in mineral formation [37][38]. The fourth involves the influence of rs8539 on HSPD1 in unfolded protein binding [39]. The fifth involves the modulation of APOBEC2 by rs2076472, affecting mRNA processing. The sixth involves the modulation of TTN by rs9808377, rs1001238, rs2042995, rs3829746, and rs2042996, as well as of CENPE by rs2243682 and rs2615542, influencing their roles in the cell cycle [40]. The seventh involves the modulation of SLC17A1 by rs13213957, affecting anion transport [41][42], which could influence the mRNA expression of SLC17A1. To our knowledge, these mechanisms of circadian phenotypes, involving MT-ND5, GRSF1, ENAM, HSPD1, APOBEC2, TTN, CENPE, and SLC17A1, are identified for the first time in our study. ICSNPathway analysis has previously been used to identify candidate causal genes relevant to disease-related phenotypes such as rheumatoid arthritis [20]. Thus, the results obtained in our study might help generate novel hypotheses for further investigation.
Although the biological mechanisms described above may affect circadian phenotypes, several limitations should be acknowledged. First, the data were obtained from only 749 subjects [16], which may limit generalization to the wider population and weaken the statistical power to identify candidate SNPs. Second, because no other study yet provides strong support for these results, the candidate SNP-gene-pathways should be verified in further studies.