Contribution of trans regulatory eQTL to cryptic genetic variation in C. elegans

Background Cryptic genetic variation (CGV) is the hidden genetic variation that can be unlocked by perturbing normal conditions. CGV can drive the emergence of novel complex phenotypes through changes in gene expression. Although our theoretical understanding of CGV has thoroughly increased over the past decade, insight into polymorphic gene expression regulation underlying CGV is scarce. Here we investigated the transcriptional architecture of CGV in response to rapid temperature changes in the nematode Caenorhabditis elegans. We analyzed regulatory variation in gene expression (and mapped eQTL) across the course of a heat stress and recovery response in a recombinant inbred population. Results We measured gene expression over three temperature treatments: i) control, ii) heat stress, and iii) recovery from heat stress. Compared to control, exposure to heat stress affected the transcription of 3305 genes, whereas 942 were affected in recovering animals. These affected genes were mainly involved in metabolism and reproduction. The gene expression pattern in recovering animals resembled both the control and the heat-stress treatment. We mapped eQTL using the genetic variation of the recombinant inbred population and detected 2626 genes with an eQTL in the heat-stress treatment, 1797 in the control, and 1880 in the recovery. The cis-eQTL were highly shared across treatments. A considerable fraction of the trans-eQTL (40–57%) mapped to 19 treatment specific trans-bands. In contrast to cis-eQTL, trans-eQTL were highly environment specific and thus cryptic. Approximately 67% of the trans-eQTL were only induced in a single treatment, with heat-stress showing the most unique trans-eQTL. Conclusions These results illustrate the highly dynamic pattern of CGV across three different environmental conditions that can be evoked by a stress response over a relatively short time-span (2 h) and that CGV is mainly determined by response related trans regulatory eQTL. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3899-8) contains supplementary material, which is available to authorized users.

polymorphic gene expression regulation underlying CGV is scarce. Here we investigated the transcriptional 28 architecture of CGV in response to rapid temperature changes in the nematode Caenorhabditis elegans. We 29 analyzed regulatory variation in gene expression (and mapped eQTL) across the course of a heat stress and 30 recovery response in a recombinant inbred population. 31 determined by response related trans regulatory eQTL. 48 49 Background 50 Many organisms can respond to sudden changes in the ambient environmental conditions by adjusting their gene 51 expression levels [1]. In particular, invertebrates are prone to environment-induced rapid gene-expression 52 changes. For instance, gene expression in the nematode Caenorhabditis elegans can swiftly change due to 53 exposure to pathogens, temperature, and toxicants [2][3][4][5][6]. A common denominator of many of these studies is that 54 they provide snapshots in time of the responses elicited by these environments. As such, they provide static 55 profiles of gene expression at a given moment. Further insight into the dynamics of gene expression and gene-56 expression regulation can be achieved by following responses, such as development or aging, over time [7][8][9]. 57 The C. elegans (V2) Gene Expression Microarray 4X44K slides, manufactured by Agilent were used. Before 148 starting cDNA synthesis, quality and quantity of the RNA were measured using the NanoDrop-1000 149 spectrophotometer (Thermo Scientific, Wilmington DE, USA) and RNA integrity was determined by agarose gel 150 electrophoresis (3 μ L of sample RNA on 1% agarose gel). 151 152

Data extraction and normalization 153
The microarrays were scanned by an Agilent High Resolution C Scanner with the recommended settings. The 154 data was extracted with Agilent Feature Extraction Software (version 10.5), following manufacturers' guidelines. 155 Normalization of the data was executed in two parts, first the RILs and the ILs, second the mutant strains. For 156 normalization, "R" (version 3.3.1 x 64) with the Limma package was used. The data was not background 157 corrected before normalization (as recommended by [25]). Within-array normalization was done with the Loess 158 method and between-array normalization was done with the Quantile method [26]. The obtained single channel 159 normalized intensities were log2 transformed and used for further analysis. 160 161

Environmental responses 162
The transcriptional response to heat stress was determined by explaining the gene expression over the treatment 163 with a linear model, 164 where y is the log2-normalized intensity as measured by microarray of spot i (i = 1, 2, ..., 45220), and T is the 166 treatment (either control, heat stress, or recovery from heat stress). This analysis ignored genotype. 167 The significances were corrected for multiple testing by applying the Benjamini Yekutieli method in 168 p.adjust (R, version 3.3.1 Windows x64) at FDR = 0.05 [27]. Thresholds of -log10(p) Due to the setup of our experiment, potential variation in development could exist among the RILs and the 174 treatments. The recovery animals were sampled two hours later than the control and heat-stress animals, 175 furthermore, heat stress slows the developmental rate [19]. We estimated the relative age by using a set of ~100 genes that show a strong, positive, linear response during development [9]. By setting the average age of the 177 control RILs to 48 hours we could estimate and compare the RILs in all treatments (Additional file 8). 178 179 Principal component analysis 180 A principal component analysis was conducted on the gene-expression data of the RILs over the three 181 treatments. For this purpose, the data was transformed to a log2 ratio with the mean, using 182 where R is the log2 relative expression of spot i (i = 1, 2, ..., 45220) in strain j (RIL) over all three conditions (n 184 = 48 per condition), and y is the intensity (not the log2-transformed intensity) of spot i in strain j. 185 The where y is the log2-normalized intensity as measured by microarray of spot i (i = 1, 2, ..., 45220) of RIL j. This 194 is explained over the genotype (either CB4856 or N2) on marker location x (x = 1, 2, ..., 729) of RIL j. 195 The genome-wide significance threshold was determined via permutation, where the log2-normalized 196 intensities were randomly distributed per gene over the genotypes. The randomized data was tested using the 197 same model as for the eQTL mapping. This was repeated for ten randomized datasets. A false discovery rate was 198 used to determine the threshold (as recommended for multiple 199 where FDS is the outcome of the permutations and RDS is the outcome of the eQTL mapping at a specific 202 significance level. The value of m 0 , the number of true null hypotheses tested, was 45220-RDS, and for the value 203 of m, the number of hypotheses tested, the number of spots (45220) was taken. The q-value was set at 0.05. This 204 yielded a threshold of -log10(p) > 3.9 for the control, -log10(p) > 3.5 for the heat stress, and-log10(p) > 3.9 for 205 the recovery treatment. For the analyses we used the most conservative thresholds measured, -log10(p) > 3.9, for 206 all the sets. 207 208

Statistical power calculations 209
In order to determine the statistical power at the set FDR threshold, QTL were simulated using the genetic map 210 of the strains used per condition (n=48 per condition). For each marker location, ten QTL were simulated that 211 explained 20-80% of the variation (in increments of 5%). Random variation was introduced based on a normal 212 distribution with sigma = 1 and mu = 0 and a peak of the corresponding size (e.g. a peak size of 1 corresponds to 213 20% explained variation) was simulated in this random variation. From the simulation, the number of correctly 214 detected QTL, the number of false positives and the number of undetected QTL were counted. This was based 215 on the thresholds determined in the permutations, -log10(p) > 3.9. Furthermore, the precision of the effect-size 216 estimation and the precision of the QTL location (based on a -log10(p) drop of 1.5 compared to the peak) were 217 determined. A table summarizing the results can be found in Additional file 9. 218 219 eQTL analysis 220 The distinction between cis-and trans-eQTL was made on the distance between the physical location of the gene 221 and the location of the eQTL-peak. For cis-eQTL the gene lies within 1 Mb of the peak or within the confidence 222 interval of the eQTL. The confidence interval was based on a -log10(p) drop of 1.5 compared to the peak. 223 The amount of variation explained per microarray spot with an eQTL was calculated by ANOVA, by 224 analysis of the gene expression explained over the peak-marker. For spots with multiple peaks, this analysis was 225 conducted per peak, not using a full model, since a single-marker model was used in the analysis. 226 In order to identify trans-bands (an enrichment of trans-eQTL), a Poisson distribution of the mapped 227 trans-eQTL was assumed (as in [28]). Therefore the number of trans-eQTL per 0.5 Mb bin were counted. Since 228 trans-eQTL peaks were mapped to 107, 106, and 103 bins (respectively in control, heat stress, and recovery), it 229 was expected that 9.16, 20.64, and 9.01 spots with a trans-eQTL were to be found at each of these markers. 230 Based on a Poisson distribution, it was calculated how many trans-eQTL needed to be found to represent an 231 overrepresentation. For example, for p < 0.001 there should be 20, 36, or 20 spots with a trans-eQTL at a 232 specific marker (respectively in control, heat stress, and recovery). 233 To test for polymorphisms in genes with eQTL, we used the data from the CB4856 reference genome 234 [22]. The genes with eQTL were matched to the polymorphisms. The frequencies of polymorphisms in each of 235 the groups (genes with cis-eQTL, genes with trans-eQTL, and genes without eQTL) were counted and compared 236 versus each other by a chi-squared test in "R" (version 3.3.1, x64). 237 238

Detection of eQTL across treatments 239
Two criteria were used to detect the occurrence of eQTL over multiple treatments. 240 In the first criterion, it was tested whether or not an eQTL was mapped in treatment one versus 241 treatment two, by simply comparing the tables listing the eQTL. This allowed for comparison of the actual 242 mapped peaks and for comparison of eQTL effects of trans-eQTL regulated from different loci. In order to 243 estimate the false-discovery rate associated with this comparison, the same analysis was applied to ten 244 permutated datasets per condition, using the -log10(p) > 3.9 for eQTL discovery. 245 The second criterion compared the occurrence of eQTL at the exact same marker location. In this 246 comparison, the eQTL mapped in one treatment were taken as lead for the occurrence of the same eQTL in the 247 other two treatments. This comparison allowed for direct comparison of the eQTL effect at the locus. Based on 248 observations on the effect distribution, this approach was used to estimate the number of trans-eQTL not 249 detected due to statistical power or not detected due to absence of the eQTL in a treatment (see also text in 250 Additional file 15). 251

Functional enrichment analysis 253
Gene group enrichment analysis was done using a hypergeometric test and several databases with annotations. 254 The databases used were: the WS220 gene class annotations, the WS256 GO-annotation, anatomy terms, 255 phenotypes, RNAi phenotypes, developmental stage expression, and disease related genes (www.wormbase.org) 256 [29]; the MODENCODE release 32 transcription factor binding sites (www.modencode.org) [30,31], which 257 were mapped to transcription start sites (according to [32]); and the KEGG pathway release 65.0 (Kyoto 258 Encyclopedia of Genes and Genomes, www.genome.jp/kegg/) [33]. 259 Enrichments were selected based on the following criteria: size of the category n>3, size of the overlap 260 n>2. The overlap was tested using a hypergeometric test, of which the p-values were corrected for multiple 261 testing using Bonferroni correction (as provided by p.adjust in R, 3.3.1, x64). Enrichments were calculated based 262 on gene names, not on spots. 263

Results 265
Transcriptional response over the course of heat stress 266 To better understand the transcriptional response to heat stress, we obtained the transcriptomes of 48 267 recombinant inbred lines (RILs) at the L4 stage in each of three treatments: control, heat stress, and recovery 268 from heat stress ( Figure 1A). The effects of the treatments on gene-expression levels were analyzed using a 269 linear model for pairwise comparisons between each of the conditions (see volcano plots in Additional file 4 and 270 a list of affected spots in Additional file 5). In this way, we identified 7720 differentially expressed genes over 271 the course of the three treatments (FDR = 0.05; Figure 1B). We found that both control and heat stress had many 272 unique differentially expressed genes: 2321 genes were only differently expressed in the comparisons of the 273 control treatment to the other two treatments and 3305 genes were only differently expressed in the comparisons 274 of the heat-stress treatment with the other two treatments. In the comparisons with the recovery treatment, only 275 942 genes were found unique for that treatment. Furthermore, many differentially expressed genes were shared 276 in the comparison of the recovery treatment versus the control and heat-stress treatment (2251 genes). Again, the 277 control and recovery (1152) and heat stress and recovery (1189) shared fewer genes. There were 1255 genes that 278 were differentially expressed between all three conditions and were therefore highly treatment dependent. These 279 results indicate that the control and heat-stress treatments are strongly contrasting in gene expression, whereas 280 the recovery treatment shares characteristics with both other treatments. 281 To follow up on this interpretation, a principal component analysis (PCA) was conducted on the gene-282 expression data, transformed to the log2 ratio with the mean. The first axis (20.0% of the variation) captures 283 variation mostly related to the heat-shock treatment, which is expected as this treatment specifically affect the 284 largest number of genes (Additional file 6). The second axis (11.4% of the variation) captures variation mostly 285 related to the control treatment, which also fits the analysis with the linear model as the control treatment was the 286 second most distinct treatment. Together, these two axes also place the recovery treatment in between the control 287 and heat-stress treatment, showing the contrast with the other two treatments was lower and possibly indicating 288 transcript levels in the recovery treatment are returning to normal. The third principal component (10.1% of the 289 variation) captures variation that sets the heat-stress treatment completely apart from the other two treatments, 290 which is as expected since the heat-stress treatment has the most unique differentially expressed genes. 291 In order to gain further insight into the functional differences between the treatments, an enrichment 292 analysis was conducted on genes belonging to the different overlap groups as shown in Figure 1 (For example 293 genes differentially expressed in only the control treatment, see the list in Additional file 7). Each of the groups 294 was enriched for many processes, showing that the treatments had a profound impact on gene expression. 295 Interestingly, we found that genes specific for each of the three treatments were enriched for genes expressed in 296 the intestine. Furthermore, the control and heat-stress treatments were strongly enriched for genes expressed in 297 the germline. These enrichments indicated that the expression of genes involved in metabolism and reproduction 298 (or development of the reproductive organs) were strongly altered during the heat-stress response. As this may 299 be caused by a developmental difference in the sampled populations, we estimated the developmental age using 300 a transcriptional ruler (see Materials and methods for details) [9]. It was found that the control population was 301 transcriptionally slightly younger than the heat shock (estimated ~1 h older) and the recovery population 302 (estimated 1.3 h older; Additional file 8). 303 304

Gene expression linked to genetic variation 305
Linkage mapping was performed using 48 RILs for each of the three treatments. Statistical power analysis 306 showed that this population has the power to detect 80% of the eQTL that explain at least 35% of the variation 307 (see Materials and methods and the table in Additional file 9). Identified eQTL (FDR ≤ 0.05; Figure 2; a table 308 with all eQTL is given in Additional file 10) were compared between the treatments ( Table 1). Most genes with 309 an eQTL were found in the heat-stress treatment (2626), whereas the control (1797) and recovery (1880) had 310 similar numbers. This increase in genes with eQTL was primarily caused by the larger number of genes with 311 trans-eQTL in the heat-stress treatment (1560; ~57% of total eQTL) compared to the control (751; ~40% of 312 total) and recovery (739; ~38% of total). The number of cis-eQTL was almost identical among conditions ( 2 : The differences between the total and the summation of the individual cis-and trans-eQTL is due to genes with both a cis-and trans-eQTL.

316
The cis-eQTL showed a bias for higher expression if the regulatory locus had the N2 allele (on average 317 70% of the genes with a cis-eQTL). This bias was absent in the trans-eQTL where, on average, 43% of the genes 318 with a trans-eQTL was more highly expressed if the locus had the N2 allele (illustrated by the figure in 319 Additional file 11). This discrepancy was to be expected, since the microarray platform used to measure the 320 transcripts was designed for the N2 genotype. Therefore, part of this variation is likely due to mis-hybridization. 321 However, previous studies have shown that this does not explain all the variation (see [28]). In congruency, we 322 found genes with a cis-eQTL to be more polymorphic than genes with a trans-eQTL and genes without an eQTL 323 (summarized in Additional file 12). For example, genes with a cis-eQTL were more likely to be fully deleted in 324 CB4856 (9.5% of the genes, compared to 0.9% in genes without an eQTL or 1.1% in genes with a trans-eQTL; 325 Chi-squared test, P < 1*10 -34 ). Furthermore, supporting that not all cis-eQTL stem from mis-hybridization, 326 polymorphisms in the flanking 3' and 5' regions were about two times more likely to occur near cis-eQTL 327 compared to genes without a cis-eQTL (Chi-squared test, P < 1*10 -6 ). When these enrichments for 3' and 5' 328 polymorphisms were compared between cis-eQTL with an N2 or CB4856 higher effect, we found no significant 329 difference. 330 The trans-eQTL were mainly found in 19 treatment-specific trans-bands, loci that regulate the 331 abundance of many transcripts. The 19 trans-bands were identified by analysis of the occurrence of trans-eQTL 332 across the genome (Poisson distribution, P<0.001; listed in Additional file 13). The seven trans-bands detected 333 in the heat-stress treatment affected most genes (1141; ~73% of all trans-eQTL). In the control treatment, five 334 trans-bands were found (325 genes; ~43% of all trans-eQTL) and in recovery treatment seven trans-bands were 335 identified (343 genes; ~46% of all trans-eQTL). Six out of the 19 trans-bands individually affected >100 genes, 336 one located at chromosome X: 0.5-2.0 Mb in control treatment; four in heat-stress treatment at chromosome I: 337 2.0-3.5 Mb, II: 12.0-13.5 Mb, IV: 1.0-2.5 Mb, and V:1.0-3.0 Mb; and one in recovery treatment at chromosome 338 I: 1.5-3.0 Mb. Importantly, the distribution across the genome of trans-bands and eQTL is treatment-specific. To 339 further investigate this, we determined the overlap in mapped eQTL over the course of the heat-stress response. 340

341
In contrast to cis-eQTL, trans-eQTL were environment-specific 342 Comparing the genes with a cis-eQTL among the three treatments, we found 1086 out of 1789 unique genes 343 (~61%) with a cis-eQTL in more than one treatment and 664 (~37%) in all three treatments ( Figure 3A). 344 Because cis-eQTL can be caused by mis-hybridizations, we also calculated the overlap for cis-eQTL with an N2 345 higher effect. In that selection, 303 out of 615 unique genes (~49%) had a cis-eQTL in more than one treatment 346 and 157 (~26%) in all three treatments. For the whole set of genes with a trans-eQTL the overlap was much 347 smaller ( Figure 3B); 360 out of 2610 genes (~14%) were found in more than one treatment and only 80 genes 348 (~3%) in all three treatments. By definition, the locus at which cis-eQTL were detected was the same between 349 treatments. Furthermore, the cis-eQTL effect sizes and directions were highly comparable (Pearson correlation 350 coefficients between 0.94-0.96; shown in a figure in Additional file 14). The cis-eQTL that were detected in 351 only one treatment were probably missed due to the small amount of variation explained by these eQTL (see the 352 text in Additional file 15). Interestingly, there were only three genes with cis-eQTL showing genotypic 353 plasticity: C52E2.4, C54D10.9, and nhr-226. This meant that we observed allelic variation acting in opposite 354 directions between environments (Figure 3C). 355 The effect sizes and direction of genes with a trans-eQTL found in more than one treatment were very 356 similar (Pearson correlation coefficients between 0.90-0.93; figure in Additional file 14). However, only a few 357 genes with a trans-eQTL were found over multiple treatments, far lower than the overlap in cis-eQTL between 358 conditions ( Figure 3A and B). Since trans-eQTL are not by definition regulated from the same location, the 359 overlap between treatments declines even further if location is taken into account. Taking the trans-eQTL from 360 the three treatments together, only ~38% of the multi-treatment trans-eQTL were located at a different locus 361 from one treatment to another (~28% if only loci at different chromosomes were counted), see Additional file 362 16 ( figure A). Interestingly, although the regulatory locus was located elsewhere, the genotypic effect of the 363 trans-eQTL was almost identical (Additional file 16, figure B), yet most genes with trans-eQTL only displayed 364 an eQTL in one treatment (text in Additional file 15). The likely reason that the majority of trans-eQTL was not 365 detected across treatments is that most trans-eQTL are environment-specific and therefore highly cryptic. 366 367 Trans-eQTL display two types of cryptic variation 368 For treatment-specific, cryptic trans-eQTL, we found that a regulator is active when the eQTL is detected and it 369 is not active when no eQTL is detected. With on and off switching regulators between treatments, genes can 370 have dynamic trans-eQTL, which appear as a treatment-specific switch of regulatory loci (Figure 3 -5). Since 371 the majority of genes with a trans-eQTL have one unique trans-eQTL in only one treatment (Figure 3B), a 372 switch in regulatory loci seems to occur less frequently compared to the on/off switch. 373 As the majority of the trans-eQTL was treatment-specific, we investigated whether detection in only 374 one treatment was a result of the statistical power of our study or if it was a biological phenomenon. As trans-375 eQTL explain 34.2% of variation on average, compared to 52.1% of variation for cis-eQTL (text in Additional 376 file 15), it is possible that the detection of trans-eQTL was more affected by lack of statistical power. By 377 simulations, we estimated that this only affected 20.6% of the trans-eQTL that were not detected in multiple 378 treatments (for the detailed analysis, see the text in Additional file 15), which argues for the cryptic nature of 379 trans-eQTL. Another line of evidence for this is the low overlap in affected genes in co-locating trans-bands 380 across treatments (13/19 trans-bands co-locate). We only found significant overlap in three pairs of trans-bands, 381 where 6.5-11.1% of the affected genes overlap (text in Additional file 15; hypergeometric test p < 1*10 -4 ). For 382 example, one of these, a major trans-band at chromosome IV:1-2.5 Mb in the heat-stress treatment, affected 244 383 genes of which 22 overlapped with the 31 genes in the recovery trans-band on chromosome IV:1-2 Mb. 384 Together, these results show that the majority (67.3%) of trans-eQTL indeed are cryptic. For example, the 385 transcript levels of flp-22, pqm-1, and sod-5 were affected by treatment ( Figure 4A) and showed a trans-eQTL 386 in only one treatment ( Figure 4B). 387 As mentioned before, only a minority of the genes with a trans-eQTL have different eQTL across 388 treatments. Of those genes with different trans-eQTL over treatments, the genotypic effects of the eQTL were 389 similar across treatments, even if the loci were different (Additional file 16, figure B). Only between 390 chromosome IV and V were eQTL with changes in effect directions observed. One of the genes displaying this 391 pattern was gei-7 (also known as icl-1). This gene was represented by three different micro-array probes, all 392 showing the same pattern: a primary eQTL on chromosome V at ~12.0Mb in all three conditions and a 393 secondary eQTL in the heat-stress treatment at chromosome IV at ~1.5Mb (Figure 5). 394 395

Functional enrichment of eQTL 396
To find which biological processes were affected by genetic variation on a gene-expression level we looked for 397 enrichment in gene classes, phenotypes, KEGG pathways, GO terms, and anatomy terms (see the list in Additional file 17). For cis-eQTL enriched categories were similar in all three treatments, as expected by the 399 consistent nature of cis-eQTL. For cis-eQTL, we found enrichments for the gene classes bath, math, btb, and 400 fbxa, which were previously found to be highly polymorphic between CB4856 and N2 [22]. Moreover, we found 401 enrichment for genes involved in the innate immune response and protein homo-oligomerization. It should be 402 noted that these enrichments are likely due to hybridization differences for the polymorphic genes (as cis-eQTL 403 with a positive N2 effect are enriched for exactly these categories, Additional file 17). 404 The trans-eQTL were also enriched for genes functioning in the innate immune response, especially for 405 genes where the N2 allele leads to higher expression. Furthermore, genes expressed in the intestine were 406 enriched in the trans-eQTL found in control and heat-stress conditions. Contrasting to the genes with cis-eQTL, 407 the genes with trans-eQTL were enriched for many different transcription factor binding sites, indicating active 408 regulation of trans-eQTL. 409 Consistent trans-eQTL were found in all three treatments for the enriched NSPC (nematode specific 410 peptide family, group C) gene class. This was remarkable as only a very small part of the trans-eQTL were 411 shared over the three treatments. For the heat stress and recovery trans-eQTL, genes expressed in the 412 dopaminergic neuron were enriched, with the strongest enrichment in the heat-stress treatment. These genes 413 were also enriched in the control treatment, however, in a group of trans-eQTL mapping to a different trans-414 band. In the heat stress and recovery treatment the dopaminergic neuron-specific genes showed trans-eQTL at 415 the IV:1-2MB locus, whereas in the control they showed trans-eQTL at the X:4-6 locus (Figure 6). We found that many of the genes that are affected over the heat-stress course are associated with 424 expression in the germline and intestine. These findings are partially in line with findings from an investigation 425 on the heat-shock regulatory network using a genome-wide RNAi screen in C. elegans [5]. Their heat-stress 426 conditions were 31.5°C for two hours followed by 24 hours of recovery at 20°C. The authors found that genes 427 associated with the proteasome induced heat-stress-specific gene expression only in the intestine and 428 spermatheca, which corroborates with our results. Differences with Guisbert et al. (2013), could be explained by 429 the differences in larval stage used (L4 vs L2) as well as differences in temperature and duration of the heat 430 stress (and recovery). Another study exposed C. elegans at the L4 stage to a heat stress of 30 min at 33°C, and 431 measured transcriptome differences using RNA-seq [34]. In support of our study, they also detected genes 432 associated with metabolism and reproduction, whereas, in contrast to our findings, they found a strong link with 433 cuticle specific genes. This contrast could be due to the different experimental conditions of Bunquell et al. 434 (2016), as they used RNAi treatments (empty vector or against hsf-1), a different heat shock duration, method of 435 synchronization (additional L1 arrest by [34]), and rearing temperature before heat shock (23 o C versus 20 o C in 436 this study). 437 We hypothesize that ultimately these discrepancies are likely to result from developmental differences. 438 The transcriptional program in C. elegans differs strongly during development [8,9]. Therefore, application of a 439 heat shock on L2 larvae has a very different developmental (and therefore transcriptional) starting point 440 compared to a heat shock applied on L4 larvae. Furthermore, within the L4 stage there is a strong difference in 441 gene expression in early-stage L4 and late-stage L4. We estimated that the heat-stress and recovery RILs were 442 one hour older than the control RILs suggesting that the former had an accelerated development as these 443 populations were physically the same age as the control population. The main processes that affect transcription 444 in the L4 stage are reproduction and development [9]. These are exactly the processes that are halted upon 445 induction of a heat shock [5,34]. Therefore, it is likely that the state in which these processes are strongly affects 446 the possible routes for down regulation. It would therefore be very interesting to study the effect of heat shock in 447 relation to the developmental dynamics. 448 If the effect of a heat shock is indeed dependent on developmental status, then the transcriptomes of the 449 RILs presented in this study could be indicative of the phenotypic outcome (e.g. heat-stress survival). The reason 450 being the developmental gradient generated by RILs (as shown by [15] within a single experiment, or explicitly 451 by [35] between different stages), which mainly affects trans-eQTL. Alternatively, our experiment could also be 452 analysed by including the variation among the estimated age of the RILs in the mapping model together with 453 treatment effects. A thorough interpretation of these models requires a better understanding of the transcriptional 454 dynamics over the heat-shock and recovery response, a goal we are currently actively pursuing [36]. 455

456
In contrast to trans-eQTL, cis-eQTL are directly linked to polymorphisms The cis-eQTL over the three treatments, which strongly overlapped, are highly enriched for polymorphic genes. 458 This has been reported before in C. elegans, but also in A. thaliana, Mus musculus, and for human cis-eQTL [12, 459 28, 37-41]. This can result in the detection of transcriptional variation that is actually caused by hybridization 460 differences [28,40]. Analysis of the bias in cis-eQTL with higher expression in N2 (the strain for which the 461 microarray was developed) versus CB4856 indeed shows that a proportion of the cis-eQTL is likely to stem from 462 hybridization differences. Also apparent from the gene-enrichment analysis, cis-eQTL were overrepresented for 463 polymorphic gene classes such as bath, math, btb, and fbxa, which are also divergent among other wild strains 464 [42,43]. Other experimental methods could limit such 'false positives', for example, RNA sequencing is 465 expected not to suffer from such biases [44]. 466 Interestingly, genes with a cis-eQTL were also strongly enriched for polymorphisms in regulatory 467 regions. For these cis-eQTL, it could be true that the expression is affected by transcription factor (TF) binding 468 sites [45], yet we did not detect any enrichment for such sites as mapped by ModEncode [30,31]. An 469 explanation for this is that genes with a cis-eQTL are regulated by different TFs, therefore the affected TF 470 binding site is different per gene with a cis-eQTL making an overrepresentation among all cis-eQTL unlikely. 471

CGV of transcriptional architecture is determined by trans-eQTL 473
Although previous studies in C. elegans have focused on continuous (thermal or developmental) treatments or 474 gradual change over time [12,15,35], only few genetical genomics experiments have measured the effect of 475 acute perturbation, where an organism is suddenly exposed to a different environment [39,46]. This acutely 476 affected the transcriptional architecture of gene expression, which consists of cis-and trans-acting eQTL. We 477 found that cis-eQTL were robust across all three treatments, including heat stress, which was found before in 478 studies on C. elegans and other species where eQTL patterns have been studied in different conditions. For 479 example, Li et al. (2006) found that more than 50% of all trans-eQTL were affected by temperature compared to 480 cis-eQTL [12], and Smith and Kruglyak (2008) also reported that cis-eQTLs were hardly affected by external 481 conditions as compared to trans-eQTL [47]. Also, in other species, like Arabidopsis thaliana, it was reported 482 that cis-eQTL were robust to light perturbation, whereas trans-eQTL were more affected by different light 483 regimes [39]. In humans, cis-eQTL showed a very high correlation between tissues compared to trans-eQTL, 484 and are replicated across populations in lymphoblastoid cell lines [48,49]. 485 By inducing a strong environmental perturbation in the form of a heat shock, a high amount of different 486 trans-eQTL were detected across the control, heat-stress, and recovery treatments. One of the things we noted is the increase in the number of trans-eQTL and trans-bands in the heat-stress treatment compared to the other two 488 treatments. This strong transcriptional variation could underlie -or result from -the phenotypic trait differences 489 observed between N2 and CB4856 in temperature experiments. However, it should be noted that also 490 developmental differences due to timing between the three treatments can contribute [9,15]. Temperature-491 affected trait differences have been observed for age at maturity, fertility, body size, vulval induction, and 492 lifespan [19,[50][51][52][53]. Furthermore, these strains also display behavioural differences in heat avoidance and 493 thermal preference [20,21]. Likely candidates for these trait differences could be found in loci affecting the 494 expression of many trans-eQTL. For example, the left arm of chromosome IV harbours a trans-band affecting 495 the expression of 244 genes and coincides with a QTL affecting lifespan after heat shock [19]. Analogously, the 496 laboratory allele of npr-1 affects many trait differences between N2 and CB4856 (as reviewed in [54]). Most 497 importantly, npr-1 affects the behaviour of the animal, which probably results in gene-expression differences, as 498 part of the expression differences can be mimicked by starving nematodes [55]. These gene-expression 499 differences can be picked up as a trans-band [28]. The latter example illustrates both a link between gene-500 expression and classical traits and that caution is required for inferring the direction of causality. 501 Why is CGV mainly affecting trans-eQTL? We hypothesize that this is due to the versatile nature of 502 trans-eQTL. First, trans-eQTL are loci that are statistically associated with variation in transcript abundance 503 from genes elsewhere on the genome. The ultimate causes for this association can be manifold, from direct 504 interactions such as polymorphic transcription factors that affect gene expression to indirect interactions such as 505 receptor-kinase interactions [46], receptors [46, 55, 56], or effects at the behavioural level that result in 506 expression differences [55]. Therefore, an environment can require the organism to respond, thereby requiring 507 specific polymorphic genes to react, ultimately leading to the expression of cryptic variation. 508

509
The trans-eQTL architecture is comprised of treatment-specific genes 510 The trans-eQTL architecture is remarkably unique over the three treatments tested. We only observed 3% 511 overlap in trans-eQTL in the three treatments, for which the main cause was treatment specificity of trans-512 eQTL. Surprisingly, genetic variation affects the expression of genes in only one direction; only in rare cases 513 does allelic variation change the sign of the effect and this was only observed for cis-eQTL (C52E2.4, 514 C54D10.9, and nhr-226). On the one hand, this is in congruency with other eQTL studies comparing different 515 environments; trans-eQTL are strongly affected by different environments (for example, see [12,39,47]). On 516 the other hand, it raises questions about the genetic architecture of trans-bands; co-localizing trans-bands are 517 generally not affecting the expression of the same genes (see text in Additional file 15), which can imply the 518 involvement of multiple regulators (causal genes). 519 However, it seems unlikely that an abundance of novel trans-eQTL also implies an abundance of novel 520 causal genes. First of all, over all three treatments, the majority of trans-eQTL are located in trans-bands, which 521 are mostly non-overlapping between treatments. This is an observation that extends to other studies and other 522 species in which eQTL have been mapped: the majority of trans-eQTL are found on a few regulatory hotspots 523 (for example, see [28,38,57]). Therefore, it is logical to assume that a small set of causal genes ultimately 524 explains the majority of trans-eQTL. Second, the allelic effect only has one direction, which is easily aligned 525 with the notion of a few regulators instead of many. Together, these observations can help in further dissecting 526 loci to identify causal genes. Trans-band regulators might play a role in the dynamic response, which could aid 527 in narrowing down candidate genes. 528 However, it should be reiterated that the route from genetic variation resulting in transcriptional effect 529 can be manifold, which can also obscure the ultimate cause of the observed trait variation. An organism is an 530 intricate web of interdependencies leading to the phenotype. Therefore, upon further dissection of the loci, a 531 single eQTL might prove to be many. Furthermore, it should be noted that trans-eQTL explain less variation 532 compared to cis-eQTL, which has been established across species [28,58]. Therefore, it is more likely that trans-533 eQTL are not detected. However, the treatment specificity and direction of allelic effects of trans-eQTL across 534 three treatments robustly show trans-eQTL architecture is comprised of treatment-specific genes. 535 536

Conclusion 537
Here we present the contribution of CGV on eQTL across three treatments in the nematode C. elegans. We find 538 that mainly trans-eQTL are affected by CGV, in contrast to cis-eQTL, which are highly similar across 539 treatments. Furthermore, we show that most CGV results in unique genes with a trans-eQTL, instead of different 540 allelic effects and/or different eQTL for the same genes. This shows the highly dynamic nature of CGV. at 20 o C). At the end of these treatments, the nematodes were in the L4 stage, and were harvested. Thereafter 575 RNA was isolated and the transcriptome was measured by microarray. (B) The outcome of the treatment 576 analysis. On the left a legend is included to clarify which contrasts are compared. On the right, the overlap in 577 differentially expressed genes is shown per treatment comparison. For example, 942 genes are uniquely 578 differentially expressed in the recovery treatment, these genes are differently expressed between recovery and 579 heat stress and between recovery and control, but not between heat stress and control. 580