Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index

Genetic studies have shown that obesity risk is heritable and that, of the many common variants now associated with body mass index, those in an intron of the fat mass and obesity-associated (FTO) gene have the largest effect. The size of the UK Biobank, and its joint measurement of genetic, anthropometric and lifestyle variables, offers an unprecedented opportunity to assess gene-by-environment interactions in a way that accounts for the dependence between different factors. We jointly examine the evidence for interactions between FTO (rs1421085) and various lifestyle and environmental factors. We report interactions between the FTO variant and each of: frequency of alcohol consumption (P=3.0 × 10⁻⁴); deviations from mean sleep duration (P=8.0 × 10⁻⁴); overall diet (P=5.0 × 10⁻⁶), including added salt (P=1.2 × 10⁻³); and physical activity (P=3.1 × 10⁻⁴).


Diet Score
Supplementary Fig 2. The associations of different nutrient quantities with BMI and diet score. Nutrient quantities were estimated from 24-hour dietary recall. Nutrients were fitted jointly along with variables from the 'BMI' model (Table 2 and Methods). For BMI, the effects are expressed as the percentage change in BMI per standard deviation of the nutrient; for the diet score, the effect is the standard deviation change in diet score per standard deviation of the nutrient. The estimated effects and 95% confidence intervals are plotted for each sample: the British sample (n=12,747, blue) and the diverse sample (n=4,413, red). If there is no statistically significant heterogeneity (p > 0.05) between the samples, a combined estimate from a fixed-effects meta-analysis is also plotted (diamonds). A star on the right indicates a p-value below the Bonferroni-corrected significance threshold of 0.05/22.
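The combination rule described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes inverse-variance weighting for the fixed-effects estimate and Cochran's Q as the heterogeneity test (the caption does not name the test used), and the function name and example numbers are hypothetical.

```python
import math

# Bonferroni-corrected threshold stated in the caption.
BONFERRONI_ALPHA = 0.05 / 22


def fixed_effect_meta(estimates, std_errors):
    """Combine two per-sample estimates by inverse-variance weighting.

    Returns (pooled estimate, pooled standard error, heterogeneity p-value).
    The heterogeneity test is Cochran's Q, an assumption of this sketch.
    """
    if len(estimates) != 2 or len(std_errors) != 2:
        raise ValueError("this sketch handles exactly two samples")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    # With two samples Q has 1 degree of freedom, and the chi-square(1)
    # upper tail probability equals erfc(sqrt(q / 2)).
    het_p = math.erfc(math.sqrt(q / 2.0))
    return pooled, pooled_se, het_p


# Hypothetical effects for one nutrient in the two samples.
pooled, pooled_se, het_p = fixed_effect_meta([0.10, 0.14], [0.02, 0.04])
plot_combined = het_p > 0.05  # combined diamond drawn only without heterogeneity
```
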

Cooked vegetable intake
Supplementary Fig 3. The associations of different nutrient quantities with frequency of added salt and cooked vegetable intake. Nutrients were fitted jointly along with variables from the 'BMI' model (

Control of Population Structure
For the British sample, we calculated principal components from the sample determined to be genetically British by UK Biobank. We LD-pruned SNPs using PLINK in a sliding window of size 1000 to ensure that no pair of SNPs within the window had an R² of more than 0.1. We filtered out SNPs with minor allele frequency less than 0.05, missingness greater than 1%, or Hardy-Weinberg exact test p-value less than 10⁻⁶. This left ~104,000 SNPs across the genome. We used EIGENSOFT [1] with 'fastmode' [2] enabled to calculate the top 20 principal components. We fitted the 'Scores', 'Activity', 'Alcohol', and 'Diet' models (Table 2) using R [3], with the top 20 principal components added as covariates.
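The three per-SNP quality filters can be illustrated with a short sketch. It is not the pipeline used in the study (which relied on PLINK); the function name and the genotype encoding (allele counts 0/1/2, `None` for a missing call) are assumptions made for illustration, and the Hardy-Weinberg exact test p-value is taken as a precomputed input.

```python
def passes_qc(genotypes, hwe_p, maf_min=0.05, miss_max=0.01, hwe_min=1e-6):
    """Return True if a SNP survives the MAF, missingness and HWE filters.

    genotypes: allele counts per individual (0, 1 or 2), None if missing.
    hwe_p: precomputed Hardy-Weinberg exact test p-value for the SNP.
    Defaults mirror the thresholds quoted for the British sample.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missingness = 1.0 - len(called) / len(genotypes)
    allele_freq = sum(called) / (2.0 * len(called))
    maf = min(allele_freq, 1.0 - allele_freq)
    # SNPs are removed when MAF is below the minimum, missingness is above
    # the maximum, or the HWE p-value is below the minimum.
    return maf >= maf_min and missingness <= miss_max and hwe_p >= hwe_min


# Hypothetical SNP: 100 calls, allele frequency 0.11, no missing data.
example = [0] * 80 + [1] * 18 + [2] * 2
keep = passes_qc(example, hwe_p=0.5)
```

Passing `maf_min=0.01` and `hwe_min=1e-10` gives the thresholds quoted for the diverse sample.
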
For the diverse sample, we used a mixed model to prevent confounding due to family relatedness and population structure not captured by principal components [4-7]. We filtered out SNPs with minor allele frequency less than 0.01, more than 1% missing calls, or Hardy-Weinberg equilibrium exact test p-value less than 10⁻¹⁰. We used a smaller p-value threshold for the Hardy-Weinberg equilibrium exact test in the diverse sample because, while we wanted problematic SNPs with gross violations of equilibrium to be removed, Hardy-Weinberg equilibrium is not expected to hold exactly in ethnically mixed samples. To fit the models in the diverse sample, we used a mixed model with two random effects: one from the SNPs on chromosomes other than 16, and one from the SNPs on chromosome 16 more than 2 cM away from rs1421085, where genetic distance was determined using the genetic map provided by UK Biobank. We calculated genetic relatedness matrices using GCTA [8], and fitted the models using the Average Information algorithm in GCTA. The resulting fixed-effect estimates correspond to the maximum likelihood estimates of the fixed effects given the variance components that maximize the restricted likelihood.
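For readers who prefer notation, the two-random-effect model described above can be written as follows. The symbols are assumptions introduced here for illustration: y is the phenotype vector, X the fixed-effect design matrix, K_1 and K_2 the genetic relatedness matrices built from SNPs on chromosomes other than 16 and from chromosome 16 SNPs more than 2 cM from rs1421085, respectively, and hats denote quantities evaluated at the restricted maximum likelihood estimates of the variance components.

```latex
\begin{align}
y &= X\beta + g_1 + g_2 + \varepsilon, \\
g_k &\sim \mathcal{N}\left(0, \sigma_k^2 K_k\right), \quad k = 1, 2,
\qquad \varepsilon \sim \mathcal{N}\left(0, \sigma_e^2 I\right), \\
V &= \sigma_1^2 K_1 + \sigma_2^2 K_2 + \sigma_e^2 I, \\
\hat{\beta} &= \left(X^\top \hat{V}^{-1} X\right)^{-1} X^\top \hat{V}^{-1} y .
\end{align}
```

The last line is the generalized least squares form of the fixed-effect estimate, which is the maximum likelihood estimate of the fixed effects conditional on the estimated variance components.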

Efficacy of Population Structure Control
If population structure has been controlled effectively and there are no true causal loci, then the association test statistics at independent SNPs across the genome should be sampled from the null distribution. A common measure of the effectiveness of control of population structure is the inflation factor [9]: the ratio of the median test statistic across the genotyped variants to the median that would be expected under the null distribution of test statistics. A weakness of this measure is that if a trait has many causal variants, which BMI is known to have [10,11], then the inflation factor should be greater than 1 even if population structure has been controlled for perfectly [12]. In the following, we calculate inflation factors for SNPs across the genome to measure how effective our control of population structure is in both samples.
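The inflation factor defined above can be computed in a few lines. The sketch below assumes the association test statistics are chi-square distributed with 1 degree of freedom under the null, which the text does not state explicitly; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
# If Z is standard normal, P(Z^2 <= m) = 0.5 exactly when Phi(sqrt(m)) = 0.75.
NULL_MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to the null median."""
    return median(chi2_stats) / NULL_MEDIAN_CHI2_1DF


def inflation_factor_from_z(z_scores):
    """Same quantity, starting from z-scores (squared to give 1-df statistics)."""
    return inflation_factor([z * z for z in z_scores])
```

A value near 1 indicates test statistics consistent with the null; as noted above, a polygenic trait such as BMI is expected to give a value somewhat above 1 even when structure is fully controlled.
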
To test whether a mixed model could control for the kind of structure present in the diverse sample, we used BOLT-LMM [13], with the LMM-inf setting, to calculate association statistics between log-BMI and the SNPs on all chromosomes other than chromosome 16, which contains the FTO locus. We used the 'BMI' model (Table 2) variables as fixed effects, excluding any interactions with FTO. We used BOLT-LMM instead of GCTA because of its greater computational efficiency. (Note that for this analysis we undertake association analyses at SNPs genome-wide, whereas our primary analyses are focused on a single FTO SNP.) The results should be comparable because BOLT-LMM with the LMM-inf setting fits the same infinitesimal mixed model as GCTA. The inflation factor over the tested chromosomes was 1.07, which is lower than the 1.09 reported for a BMI meta-analysis [12].
We measured how effective adjusting for the top 20 principal components was at controlling population structure in the British sample by computing association statistics for a sample of SNPs across the genome. To ensure the association test statistics were comparable to our FTO analysis, we used the same code and model within R as for the primary analysis. However, this imposed computational constraints, preventing a genome-wide analysis. We therefore selected 100 SNPs from each chromosome, leaving a gap of 100 genotyped SNPs between each selected SNP. We kept those with minor allele frequency >5% and missingness <1%, leaving 872 SNPs. We used the 'Scores' model (Table 2) with all of the FTO variables removed and replaced with the test SNP. The inflation factor was 1.12. While this is higher than in the diverse sample, it is close to the inflation factor of 1.09 reported for a BMI meta-analysis [12].
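The spaced SNP selection used for the British sample can be sketched as below. This is an illustration only: the function name is hypothetical, and starting from the first genotyped SNP on each chromosome is an assumption, since the text does not give the starting position.

```python
def select_spaced_snps(snp_ids, n_select=100, gap=100):
    """Pick up to n_select SNPs, skipping `gap` genotyped SNPs between picks.

    snp_ids: SNP identifiers for one chromosome, in genomic order.
    Assumes selection starts at the first SNP (not stated in the text).
    """
    step = gap + 1  # a gap of 100 unselected SNPs means every 101st SNP
    return snp_ids[::step][:n_select]


# Hypothetical chromosome with 20,000 genotyped SNPs, labelled by index.
selected = select_spaced_snps(list(range(20000)))
```

The selected SNPs would then pass through the minor allele frequency and missingness filters before being tested, which is why fewer than 100 per chromosome remain in the final set.
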