Unveiling Subtle Geographical Clines: Phenotypic Effects and Dynamics of Circadian Clock Gene Polymorphisms

Simple Summary Circadian clocks are molecular pacemakers that drive daily rhythms in physiology, metabolism, and behavior. They are present in a diverse range of organisms, from cyanobacteria to humans. In the last few decades, enormous progress has been made in understanding the molecular details of the circadian clock. The fruit-fly Drosophila has been instrumental in identifying ‘clock’ genes, which are well conserved in mammals, both in sequence and function. Many ‘clock’ genes have been identified using laboratory mutagenesis screens, which look for mutants that show aberrant circadian behavior. The recent accumulation of genomic sequences from wild populations of Drosophila provides an opportunity to identify natural genetic variations in ‘clock’ genes, which serve as molecular adaptations to local habitats. Here, we focused on nine polymorphic sites in genes associated with the clock and tested their impact on diurnal behavior. We found seven sites whose alleles confer different circadian behavior or seasonal response. This analysis of natural variation can provide information about fine-tuning of the clock, and it is essential for understanding the evolution of the circadian system. Abstract Our understanding of the gene regulatory network that constitutes the circadian clock has greatly increased in recent decades, notably due to the use of Drosophila as a model system. In contrast, the analysis of natural genetic variation that enables the robust function of the clock under a broad range of environments has developed more slowly. In the current study, we analyzed comprehensive genome sequencing data from wild European populations of Drosophila, which were densely sampled through time and space. We identified hundreds of single nucleotide polymorphisms (SNPs) in nine genes associated with the clock, 276 of which exhibited a latitudinal cline in their allele frequencies. While the effect sizes of these clinal patterns were small, indicating subtle adaptations driven by natural selection, they provided important insights into the genetic dynamics of circadian rhythms in natural populations. We selected nine SNPs in different genes and assessed their impact on circadian and seasonal phenotypes by reconstructing outbred populations fixed for either of the SNP alleles, from inbred DGRP strains. The circadian free-running period of the locomotor activity rhythm was affected by an SNP in doubletime (dbt) and eyes absent (Eya). The SNPs in Clock (Clk), Shaggy (Sgg), period (per), and timeless (tim) affected the acrophase. The alleles of the SNP in Eya conferred different levels of diapause and the chill coma recovery response.


Introduction
The circadian clock is an evolutionarily conserved system, which is present in a broad range of organisms, from cyanobacteria to humans. The circadian clock drives a large number of physiological and cellular diurnal rhythms. The ubiquity of the circadian clock A functional molecular polymorphism was also found in cryptochrome (cry), a gene that in Drosophila encodes a blue light circadian-dedicated photoreceptor. Cry also serves as an electromagnetic field sensor and as a geotaxis behavior regulator. A recent survey of wild alleles of cry [12] revealed a substantial polymorphism, including two major haplogroups, of which one is a 12 bp insertion/deletion polymorphism (B1/B2 alleles). Across Europe, the frequency of B1/B2 alleles correlated with seasonal bioclimatic variables. This spatial distribution of cry polymorphisms corroborates the importance of these haplotypes for local adaptation of the clock.
Until recently, studying genetic variation at the genomic level in natural populations was challenging due to the cost of sequencing whole genomes in a large number of individuals. However, the development of new sequencing strategies, such as sequencing DNA from pools of individuals (Pool-Seq [13]) made genome-wide population genetic analysis feasible and paved the way for a new field of population genomics. In a recent study, Kapun et al. [14] carried out the first comprehensive, continent-wide genomic analysis of genetic variation in European populations of Drosophila melanogaster. The analysis consisted of 48 Pool-seq samples from 32 population samples collected in 2014 by the European Drosophila Population Genomics Consortium (DrosEU) [14]. Each pool consisted of 40 flies.
The current study was aimed at analyzing the genetic variation that is encapsulated in the DrosEU dataset and identifying polymorphisms that serve as molecular adaptations of the circadian clock. We identified single nucleotide polymorphisms (SNPs) that show latitudinal clines and tested their functional role. We analyzed eight SNPs in the core clock genes per, tim, Clk, Dbt, and Sgg and in the output genes pdf and pdfr. In addition, we analyzed a clinal SNP in the gene eyes absent (eya), which was recently shown to drive the photoperiodic diapause in Drosophila through interaction with TIM [15].

Fly Maintenance
Flies were maintained using standard sugar food at 20 • C under a 12:12 light:dark (LD) cycle. The flies were kept in population cages, which were constructed from 1.5 litter plastic bottles. A 50 mL polypropylene tube containing 30 mL of food was attached to the opening of the bottle. The base of the bottle was made of wire mesh for ventilation of the cage. Sixty-four flies were transferred to each cage using CO 2 anesthesia (see details below).

Identification of Clinal SNPs
The number of reads supporting each allele at a specific SNP locus in each Pool-seq library was previously counted [14], and the relative abundance of each allele within the pool was then estimated based on the read counts (each pool consisted 40 flies). To test the effect of latitude on the allele frequency of a given SNP, we used a generalized linear model (GLM) with binomial error structure and logit function. The analysis was performed on the raw allele counts. The SNPs in each gene were tested separately, and the significance values were adjusted for multiple testing using the Benjamini-Hochberg method [16]. McFadden's pseudo-R 2 measure was used for estimating the variance explained with the logistic regression models. We also used extended models that included 'longitude' and 'altitude' variables in addition to 'latitude'. The simple ('latitude' only) and extended models were compared using a χ 2 difference test.

Experimental Populations
To test the effects of candidate alleles at a particular SNP on the phenotype, we used the Mendelian randomization (MR) approach [17]. By crossing different fly strains that have the same allele, new strains were generated whose genetic background was completely mixed (randomized). The phenotype of MR strains carrying different alleles of the SNP was then compared. If a difference was found, it was likely due to the SNP, and not due to other loci in the genome. The MR populations were constructed from eight different parental strains, which ensured that the probability of having other loci fixed between the populations was extremely low.
To establish the MR populations, we used strains from the Drosophila Genetic Reference Panel (DGRP), which is a community resource of 205 inbred lines derived from a wild population in Raleigh, USA [18]. The genome of these strains has been sequenced, identifying 4,853,802 (SNPs) and 1,296,080 insertions or deletions (indel) variants. We searched for candidate SNPs using the DGRP2 genome browser [19]. For each SNP, four independent populations were generated, of which two carried the reference allele (called set A and set B) and two sets fixed for the alternate allele (set C and set D). Each set was constructed of eight different DGRP strains, using four females and four males from each.

Locomotor Activity Analysis
The locomotor activity behavior of male offspring from each of the MR populations was analyzed using Drosophila Activity Monitors (TriKinetics system, Waltham, MA, USA) as previously described [20]. Adhering to previous studies, only male flies were used since the egg-laying activity of females may mask the circadian system output [21].
Flies were held at 20 • C for 8 days under LD 12:12 h, followed by 8 days of continuous darkness (DD). The data were analyzed using the TriKinetics software and the BioDare2 web server [22]. The free-running period (FRP) and the acrophase (phase in DD) were calculated using the MESA method, where the averaged phase was determined from the wave fit. We excluded flies with an FRP (free-running period) shorter than 20 h and longer than 30 h from the analysis. This filtering was performed because these flies did not exhibit a robust circadian rhythm and were flagged using the analysis software as potential outliers. In total, 90 flies were excluded from the analysis out of the initial sample size of 861. The statistical analysis of the FRP was carried out using a mixed model analysis of variance (ANOVA) with 'Allele' and 'Set' as the fixed and random effects, respectively. The analysis of the phase data was carried out using circular ANOVA with the 'aov.circular' R package (using the LRT method).

Diapause Response
To analyze ovarian diapause, females were collected within 7 h after eclosion, a period during which ovaries are previtellogenic [18]. Diapause was induced by placing the flies in plastic vials at 12 • C for two weeks for a long photoperiod (16 h) and a short one (8 h). A group of control flies, in which the vials were covered with aluminum foil, was also included. After two weeks, the female ovaries were dissected in Dulbecco's phosphatebuffered saline (PBS), and the diapause status of the female was determined as previously described [23].

Chill Coma Recovery
Emerging flies were maintained at LD 12:12 for 3-4 d and were tested for their chill coma recovery time as previously described [24]. Briefly, at ZT 3 (ZT, Zeitgeber time, hours after lights on), the flies were anesthetized using ice and then transferred to tubes, and cotton plugs were used to cover the tube. The flies were kept on ice at 4 • C for 3 h. At ZT 6, the flies were transferred to a small Petri dish at room temperature, their behavior was videotaped, and the recovery time of each fly was determined. Data analysis was carried out using the Survival R library to fit Kaplan-Meier curves, and log-rank tests were used to compare the CCR of different genotypes using χ 2 statistics with one degree of freedom.

Identifying Clinal SNPs
Analysis of the 11 genes associated with the circadian clock revealed 9776 SNPs, of which 276 showed a significant latitudinal cline ( Table 1). The most polymorphic genes were Pdf (0.46 SNP/bp) and cry (0.15 SNP/bp), while sgg and Pdfr showed the lowest density (both 0.02). The largest number of SNPs that were clinal were present in dbt and vri, while sgg and per harbored the smallest number of clinal SNPs. We selected seven SNPs in various clock genes that showed a significant latitudinal cline for further experimental analysis ( Table 2). Two other SNPs in Clk and eya showed a strong correlation with bioclimatic variables in the DrosEU study [10] (but not a latitudinal cline) and were also selected for further analysis. The latitudinal cline in these SNPs is shown in (Figure 1). We also tested the effect of longitude and latitude on the allele distribution using an extended model. The difference between the extended and null (latitude only) models was not significant in Dbt569TG > T (p = 0.22), Pdfr7165A > T (p = 0.30), tim4740T > A (p = 0.22), or Clk5472G > T (p = 0.95). In sgg3812A > G, sgg3833C > A, and per7600A > T, the contribution of longitude and altitude was highly significant (p < 0.001, for both) and dominated latitude (Supplementary Figure S1).   Previous studies have shown that 19 climatic variables can be summarized using two principal components, PC1 and PC2 [14]. PC1 strongly correlates with latitude, whereas PC2 correlates with longitude to a lesser extent, as well as with latitude and altitude. We conducted a GLM analysis to assess the association between SNP alleles and climatic PC1 and PC2, as presented in Supplementary Table S1. In SNPs sgg3812A > G and sgg3833C > A, there was a highly significant association with both PCs (McFadden's R 2 = 0.16 and 0.11, respectively). In the case of Pdfr7165A > T and Clk5472G > T, a significant association was observed only with PC1 (McFadden's R 2 = 0.16 and 0.08, respectively). For per7600A > T, only PC2 exhibited a significant association (McFadden's R 2 = 0.34). No significant association was found for eya1849A > G or Dbt569G > T.

Association between SNPs and Circadian Phenotypes
The selected SNPs were assessed for their impact on circadian phenotypes (note, in the following, we designate each SNP using the last four digits of the genomic position). For each SNP, we compared the circadian free-running period (FRP) between the MR populations that carry different alleles. Two SNPs were significantly associated with FRP variation (Figure 2). In eya1849A > G, the median FRP for the "A" allele (across two sets) was nearly one hour shorter than that of the "G" allele (23.2 vs. 24 h), and the difference was significant (nested ANOVA, F 1,79 = 9.91, p = 0.002). In dbt6569G > T, the median FRP for the "G" allele was nearly 20 min longer (F 1,54 = 7.5, p = 0.008).  or the alternate allele (purple). In each SNP, the reference allele is the first nucleotide in the SNP ID (i.e., in Eya1849A > G, "A" is the reference allele). The FRP is depicted using a boxplot, where the middle line represents the median, and the box indicates the interquartile range (IQR). The FRP of individual flies (blue dots) is also shown. The error bars represent the 1.5 × IQR interval. ** p < 0.01, *** p < 0.001.

Polymorphism in Eya Is Associated with Seasonal Phenotypes
While not recognized as a circadian clock protein, EYA has been implicated in photoperiodism and seasonal timing both in mammals [25,26] and in fruit flies [15]. Therefore, we examined whether the eya1849A > G polymorphism is associated with variation in seasonal response phenotypes. This SNP was previously shown to correlate with a bioclimatic compound variable but not a latitudinal cline [14]. However, GLM analysis revealed significant longitudinal (z = 3.7 p < 0.001) and altitudinal (z = 2.8 p < 0.005) clines. The frequency of reference allele "A" was higher in the east (where the climate is more continental) and in higher altitudes (Supplementary Figure S1).
Under both short and long photoperiods, developmental arrest of ovaries was significantly higher in females carrying the allele 'G' than in females with the 'A' allele (median across photoperiod 40% vs. 25%; Figure 4a). The difference was highly significant (binomial general linear model, z = −3.78, p < 0.0001).

Polymorphism in Eya Is Associated with Seasonal Phenotypes
While not recognized as a circadian clock protein, EYA has been implicated in photoperiodism and seasonal timing both in mammals [25,26] and in fruit flies [15]. Therefore, we examined whether the eya1849A > G polymorphism is associated with variation in seasonal response phenotypes. This SNP was previously shown to correlate with a bioclimatic compound variable but not a latitudinal cline [14]. However, GLM analysis revealed significant longitudinal (z = 3.7 p < 0.001) and altitudinal (z = 2.8 p < 0.005) clines. The frequency of reference allele "A" was higher in the east (where the climate is more continental) and in higher altitudes (Supplementary Figure S1).
Under both short and long photoperiods, developmental arrest of ovaries was significantly higher in females carrying the allele 'G' than in females with the 'A' allele (median across photoperiod 40% vs. 25%; Figure 4a). The difference was highly significant (binomial general linear model, z = −3.78, p < 0.0001).
We also tested the association between the eya1849A > G polymorphism and variation in the cold response using the chill coma recovery assay (Figure 4b). Flies carrying allele 'A' were more cold-adapted and recovered significantly faster than flies with the 'G' allele (median 177 vs. 243 s; log-rank test, χ 2 = 4.5, df = 1, p = 0.03).

Discussion
One of the main goals of analyzing genetic variation in wild populations is the potential for identifying local molecular adaptations that are driven by natural selection. Here, we identified 276 SNPs in genes associated with the circadian clock, whose spatial allelic distribution follows a latitudinal cline. We tested the functional role of nine SNPs and found phenotypic effects in seven of them.
In the SNP in dbt, the reference allele "G" is present at a higher frequency at higher latitudes. The FRP of flies carrying this allele is longer (Figure 2). Such correlation between FRP and latitude was previously observed in the eclosion rhythm of Drosophila auraria [27] and in the leaf movement rhythm in Arabidopsis thaliana [28]. It was suggested that longer We also tested the association between the eya1849A > G polymorphism and variation in the cold response using the chill coma recovery assay (Figure 4b). Flies carrying allele 'A' were more cold-adapted and recovered significantly faster than flies with the 'G' allele (median 177 vs. 243 s; log-rank test, χ 2 = 4.5, df = 1, p = 0.03).

Discussion
One of the main goals of analyzing genetic variation in wild populations is the potential for identifying local molecular adaptations that are driven by natural selection. Here, we identified 276 SNPs in genes associated with the circadian clock, whose spatial allelic distribution follows a latitudinal cline. We tested the functional role of nine SNPs and found phenotypic effects in seven of them.
In the SNP in dbt, the reference allele "G" is present at a higher frequency at higher latitudes. The FRP of flies carrying this allele is longer (Figure 2). Such correlation between FRP and latitude was previously observed in the eclosion rhythm of Drosophila auraria [27] and in the leaf movement rhythm in Arabidopsis thaliana [28]. It was suggested that longer FRP in the north allows the pacemaker to track the considerable variation in photoperiod that is experienced throughout the year in these regions.
Principal component analysis of 19 bio-climatic variables across Europe revealed two principal components that accounted for 63% of the variation [14]. The allele frequencies of the eya1849A > G polymorphism were associated with the second principal component (PC2), which represented mostly seasonal variables (e.g., isothermality, temperature seasonality). Geographically, PC2 is highly correlated with longitude, reflecting the fact that seasonal fluctuations intensify in Eastern Europe. Notably, we found that the eya polymorphism is correlated with longitude, with the "A" allele being more frequent in the east. This allele was also more abundant at higher altitudes. The enhanced cold resistance of this allele, as demonstrated using the CCR (Figure 4), is congruent with its geographical distribution. Intriguingly, the "A" allele confers lower diapause propensity compared to females carrying the "G" allele. Previous studies showed that the CCR and diapause phenotypes are often not correlated and may represent different adaptation mechanisms [24,29]. The SNP in eya was also associated with the FRP (Figure 2), but given the longitudinal cline in this polymorphism, its adaptive role is unlikely to be related to the photoperiod and awaits further investigation.
We also found a longitudinal cline in the two SNPs in sgg and in per, which surpassed their latitudinal cline. Given the correlation between longitude and seasonal bio-climatic variables in Europe, we expect that alleles sgg3812A, sgg3833C, and per7600T, which are more abundant in the east, would be associated with cold adaptations. Interestingly, previous artificial selection experiments using Drosophila identified a quantitative trait locus in sgg that was associated with increased thermal performance and exhibited a clinal variation in wild populations [30]. Overall, the longitudinal clines that we identified underscore the importance of this climatic gradient in the Eurasian Palearctic biogeographic zone (longer and colder season in the East compared to the West), which is frequently overlooked compared to latitudinal clines [31].
We acknowledge that while our study found statistically significant clinal SNPs, the coefficient of determination (R2) value associated with their effect was low, indicating that the models do not explain much of the variability in the data. This suggests that there are likely other factors at play that we did not measure, or that the spatial distribution of the alleles being studied is shaped with more complex intricate processes. Despite these limitations, it is important to consider the broader implications of our findings. The small clinal effects observed across numerous wild populations of Drosophila in Europe may appear subtle, but they contribute valuable insights into the dynamics of genetic variation and local adaptation. By conducting the first comprehensive survey of single nucleotide polymorphisms (SNPs) in circadian clock genes across multiple populations, our study reveals a nuanced landscape of genetic diversity and adaptation. Furthermore, the identification of associations between SNP alleles and climatic principal components expands our understanding of the complex interplay between genetic variation and environmental factors. Thus, despite the subtlety in the geographical clines and the low R2 values, our study significantly advances the field by providing a comprehensive dataset and shedding light on the genetic dynamics of circadian rhythms in natural populations. We believe that our work makes a valuable contribution by identifying several alleles that mediate phenotypic variation in circadian behavior and seasonal response.
To understand the mechanisms by which these SNPs mediate their phenotypic effects, further experiments will be required, looking at the expression of the different variants at the transcript and protein levels. The phenotypic effect of different alleles may also be manifested by affecting the interaction between their encoded proteins and other clock proteins. For example, the ls-tim polymorphism expresses two variants, S-TIM, which binds robustly to CRY, and L-TIM, which exhibits a much weaker interaction [32]. The difference between allelic variants could also affect the transport of proteins between the nucleus and the cytoplasm, which is a significant part of the molecular pacemaker. For example, when flies in the laboratory are maintained under continuous light and temperature diurnal cycles, S-TIM remains cytoplasmic constantly, while L-TIM shows typical nuclear accumulation at ZT0 [11].
Most of the SNPs that were tested here were intronic, alluding to their potential role in gene regulation. Indeed, studies in recent years suggest that variation in noncoding sequences may have an important regulatory role [13]. Introns harbor numerous sites that constitute a functional element, including intron splice enhancers, silencers that regulate alternative splicing, and other regulatory elements. Functional polymorphism within introns can affect gene expression, and intronic variants may modulate the link between genotype and phenotype. Mutations in non-coding DNA may affect remote genes at a distance by altering the splicing efficiency of genes or their transcriptional activity or modulating the expression of the alternative transcripts. Thus, the 'clock' SNPs that we found to impact circadian phenotype in this study may mediate their effect via transcriptional regulation.
Several genomic surveys of wild Drosophila populations have been carried out in the last decade, and notably, 'clock' genes are often present among loci that show substantial genetic differentiation or clinal variation. Polymorphisms in tim, timout, Clk, and cry, which were associated with bioclimatic principle components in European populations [14], were also found to be clinal in a survey focusing on North American Drosophila populations [33]. A polymorphism in sgg was flagged as being associated with climatic variables both in Europe [14] and in another North American survey [34]. Consistently, whole-genome scans for adaptive differentiation identified sgg as a significant candidate gene in both European and North American datasets. The fact that the same polymorphisms are clinal on different continents may further indicate their adaptive role. It is noteworthy that all the SNPs that were studied here are segregated in appreciable frequencies in both Europe and North America, as evident from the DGRP strains that were used to generate the MR population in this study. Overall, these studies underscore 'clock' genes as a major target for natural selection, which allows rapid adaptation to different climatic conditions. The immense advance in DNA sequencing technologies in recent years generated an unprecedented amount of data at the genetic level, including genomic sequences and genetic variation, but the functional importance of this variation is largely unknown. A major challenge in current research is to bridge the gap between genomic data and their phenotypic effect by developing high-throughput phenotyping techniques, setting the stage for the new field of 'phenomics' [35]. Understandably, the present phenomics screen focus on cellular, morphological phenotypes that are more amenable to high-throughput recording. The challenge would be high-throughput phenotyping of complex phenotypes such as behavior in whole animals. While recording the locomotor activity of a large number of individuals is feasible using Drosophila, our experience from the current work suggests that the main 'bottleneck' is the generation of congenic strains that are fixed for different SNP alleles. This process, either carried out using the Mendelian randomization approach or transgenics, is laborious, costly, and time-consuming. Hopefully, new CRISPR transgenesis protocols (e.g., [36]) will help bridge the genomics-phenomics gap.

Conclusions
We identified a large number of SNPs in circadian clock genes that segregate in European populations and exhibit latitudinal clines. These polymorphisms may represent local molecular adaptions of the circadian system. We tested the functional role of nine SNPs and found phenotypic effects in seven of them. This study adds to previous research toward the understanding of how the clock is fine-tuned to perform in different environments and molecular evolution of the clock genes, an area in which relatively little is known about.