Assessment of a causal relationship between body mass index and atopic dermatitis

53 full text papers excluded: 2 reported on birthweight

Six studies reported the mean difference in BMI between AD cases and controls (Table E1). These studies showed mixed results, but the majority (4 out of 6) reported mean BMI to be lower in AD cases than in controls. Kusunoki et al. reported an odds ratio for AD in children of 1.02 per 1 kg/m² increase in BMI, 2 and the same direction of effect was observed in overweight and obese children (Fig 2). The remaining 5 studies reported the odds of AD with higher BMI (Table E2), where the majority of estimates (4 out of 5) gave evidence of increased AD risk.

Meta-analysis of published studies
All data pertaining to BMI and AD were extracted and a meta-analysis was performed for the definition with the most available data, namely the odds of AD in overweight and/or obese individuals. This analysis represents an update of the 2015 review by Zhang et al. 3 The meta-analysis was conducted separately for children and adults, as well as combined. A random effects model was used because heterogeneous populations and study designs were being meta-analysed. The overall OR for AD in overweight individuals was 1.05 (95% CI 0.94 to 1.19) in adults (n=51,008) and 1.08 (95% CI 1.00 to 1.16) in children (n=506,202) (Figure E2). For obese individuals, the OR for having AD was 1.19 (95% CI 0.95 to 1.49) in adults (n=1,400,679) and 1.20 (95% CI 1.11 to 1.30) in children (n=796,514) (Figure E3). Where stated, children were defined as those under 18 years of age, although one paediatric study included individuals aged up to 19 years. 4 Adults were defined as those aged 18 years and above in studies where this had been described.

Observational analysis using UK Biobank and HUNT datasets
Clinical outcomes
The BMI of UK Biobank participants was calculated from standing height and weight measurements that were taken while visiting an assessment centre. Individuals were defined as having atopic dermatitis (AD) based on their response during a verbal interview with a trained member of staff at the assessment centre. Participants were asked to tell the interviewer which serious illnesses or disabilities they had been diagnosed with by a doctor and were defined as AD cases if this disease was mentioned. Disease information was also obtained from the Hospital Episode Statistics (HES) data extract service where health-related outcomes had been defined by International Classification of Diseases (ICD)-10 codes (Table E3). Additionally, if any participants had answered "yes" to "Has a doctor ever told you that you have hay fever, allergic rhinitis or eczema", they were excluded from the AD controls.

Table E3. International Classification of Diseases (ICD)-10 codes used to obtain disease information for AD (eczema; atopic eczema; atopic dermatitis) from the UK Biobank resource.

Within HUNT, participants' height and weight were measured and used to calculate BMI (kg/m²). Participants were defined as AD cases based on their response to a general questionnaire sent to all HUNT participants. AD cases responded affirmatively to both "Have you had or do you have any of the following diseases: Eczema on hands" and "Did you have eczema when you were a child? (also called atopic eczema)". In addition, cases were obtained from the Nord-Trøndelag Health Trust, which includes the two hospitals (Levanger and Namsos) in the study area, and from General Practitioner records. Disease classifications are shown in Table E4.

Confounder variables
Within UK Biobank, confounders that were considered in the current study were age, sex, smoking status, alcohol intake and educational attainment. The age and sex of participants were baseline characteristics determined at recruitment. The information on age was coded and analysed as a continuous variable, while sex was analysed as a binary variable. Smoking status, alcohol intake and educational attainment were defined by responses to a touchscreen questionnaire. The smoking status of participants was summarised as being a current or previous smoker, or never smoked, and this information was coded into a categorical variable. Alcohol intake frequency was determined by asking participants "about how often do you drink alcohol?", where options included "Daily or almost daily", "Three or four times a week", "One to three times a month", "Special occasions only" and "Never". This information was categorised for daily, weekly and monthly alcohol intake. Educational attainment was defined by asking "which of the following qualifications do you have?", where participants could select more than one option including "College or University degree", "A levels/AS levels or equivalent", "O levels/GCSEs or equivalent", "CSEs or equivalent", "NVQ or HND or HNC or equivalent", "Other professional qualifications eg: nursing, teaching", or "None of the above". Participant responses were coded into categorical variables for degree holders, those who had completed advanced level studies (A-level) and those who had obtained their general certificate of secondary education (GCSE).

Within HUNT, confounders considered in the current study were age, sex, smoking status and alcohol intake. Information on educational attainment was not available in the third survey of the HUNT study. The age and sex of participants were determined at the time of participation. The information on age was coded and analysed as a continuous variable, while sex was analysed as a binary variable. Smoking status and alcohol intake were defined by the participants' response to a questionnaire. Smoking status was defined as being never, former, occasional, or current smoker. Alcohol intake frequency was determined by asking participants "about how often in the last 12 months did you drink alcohol?", where options included "4-7 times a week", "2-3 times a week", "about once a month", "a few times a year", "not at all last year" and "never drunk alcohol".

Meta-analysis
Within the UK Biobank and HUNT datasets, logistic regression models were used to estimate the observational association between BMI and AD. This analysis was performed for all individuals, as well as for overweight (25 kg/m² < BMI < 30 kg/m²) and obese (BMI > 30 kg/m²) individuals alone. Analyses were adjusted for age, sex, smoking status, alcohol intake and educational attainment (the last in UK Biobank only, as information on education was not available in HUNT). The estimates for each dataset were meta-analysed assuming a random effects model to account for heterogeneity.
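The pooling step described above can be sketched as follows. This is a minimal illustration only: it assumes the DerSimonian-Laird estimator of between-study variance, which is one common choice (the text does not state which estimator was used), and the function name and the input values in the usage lines are hypothetical rather than taken from the study.

```python
import math

def random_effects_meta(log_ors, ses):
    """Pool per-dataset log odds ratios with a DerSimonian-Laird
    random-effects model (an assumed estimator, see lead-in)."""
    w = [1.0 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * b for wi, b in zip(w, log_ors)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]        # random-effects weights
    pooled = sum(wi * b for wi, b in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, tau2

# Hypothetical per-dataset estimates (log OR per 1 kg/m2 and standard error).
pooled, se_pooled, tau2 = random_effects_meta(
    [math.log(1.00), math.log(1.03)], [0.004, 0.012]
)
ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
print(f"OR={math.exp(pooled):.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}, tau2={tau2:.5f}")
```

With only two datasets the between-study variance is estimated imprecisely, so the random-effects interval should be read with that in mind.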

Results
There was very little evidence of an association between BMI and AD in UK Biobank, but some evidence within the HUNT dataset (Figure E4). Upon meta-analysis, the OR of AD per 1 kg/m² higher BMI was 1.01 (95% CI 1.00 to 1.01; P=0.26). Among overweight individuals (BMI = 25 to 30 kg/m²), the OR of AD per 1 kg/m² higher BMI was 1.02 (95% CI 1.00 to 1.04; P=0.07; 4,820 cases; 130,776 controls). A similar estimate was found among obese individuals (BMI greater than 30 kg/m²) but with stronger evidence of an association (OR=1.02; 95% CI 1.01 to 1.03; P=3.3×10⁻⁴; 2,741 cases; 73,907 controls).

Figure E4. Observational association of BMI with AD in two population-based studies. Association analysis and meta-analysis of observational data from the UK Biobank and HUNT study, Norway. Observational analysis was restricted to individuals with complete information on potential confounders. Estimates are given per 1 kg/m² increase in BMI. CI, confidence interval.

Analysis of causal relationships
Study populations and phenotypes
Data were available, with written informed consent, for a total of 317,391 participants including 9,933 AD cases from the UK Biobank 5 aged 40-69 years and 1,775 AD cases from the third survey of the Nord-Trøndelag Health Study, Norway 6 (HUNT, 2006-08) aged 20 years and over (Table E5). AD in the UK Biobank was defined by patient-report of doctor diagnosis or hospital statistics using ICD-10 codes; in the HUNT study AD was defined by self-report and/or hospital statistics using ICD-9 and ICD-10 codes and/or diagnosed by general practitioners using ICPC-2 codes. Healthy controls were individuals from the same population-based studies, but without reported AD. All individuals included in this analysis were of white European ethnicity. The BMI (kg/m²) of UK Biobank and HUNT participants was calculated from height and weight measurements. UK Biobank is approved by the National Health Service National Research Ethics Service (ref 11/NW/0382; UK Biobank application number 10074); the HUNT Study was approved by the Regional Committee for Medical and Health Research Ethics (REC Central), which also gave specific approval for this study (2015/2003). Summary level data were available for 425,220 individuals of European ancestry from published GWAS studies for BMI 7 (n=322,154) and AD 8 (n=103,066) (Table E5).

Genotyping
Genotyping of UK Biobank participants was performed with one of two arrays (Applied Biosystems™ UK BiLEVE Axiom™ Array (Affymetrix) and Applied Biosystems™ UK Biobank Axiom™ Array). Sample quality control (QC) measures included removing individuals who were duplicated or highly related (third degree or closer), had sex mismatches, or were identified as outliers of heterozygosity or of non-European descent. Further details of the QC measures applied and the imputation performed have been described previously. 9-11 Genotyping of the HUNT participants was performed with one of three different Illumina HumanCoreExome arrays (HumanCoreExome12 v1.0, HumanCoreExome12 v1.1 and UM HUNT Biobank v1.0). The genotypes from different arrays had QC performed separately and were reduced to a common set of variants across all arrays. Sample QC measures were similar to those applied to the UK Biobank. Related individuals were excluded from the analysis (n=30,256). Details of the genotyping, QC and imputation are described elsewhere. 12

Genetic instruments
The genetic instrument for BMI comprised the 97 BMI-associated SNPs reported by the GIANT consortium (a meta-analysis of 125 GWAS studies with 339,224 individuals). 7 We note that this study included 1,334 individuals from the HUNT dataset in their analysis. The 97 SNPs were extracted from both UK Biobank and HUNT datasets and combined to create a separate standardised genetic risk score (GRS) for each dataset using the --score command in PLINK (version 1.9). The dosage of the effect allele for each SNP was weighted by the effect estimates reported for the European sex-combined analysis (n=322,154) by Locke et al, 7 summed across all variants and divided by the total number of variants. The scores were standardised to have a mean of 0 and standard deviation of 1. One BMI-associated SNP, rs12016871, was not present within the UK Biobank and HUNT datasets, therefore rs9581854 was used as a highly correlated proxy (r² = 1.0). Of the 97 BMI SNPs, 22 are strongly associated with BMI in childhood, 13 and since AD is a predominantly paediatric disease we used these SNPs as genetic instruments for BMI in childhood.
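The score construction described above (weight each effect-allele dosage, sum, divide by the number of variants, then standardise) can be sketched in a few lines. This is an illustration of the arithmetic only, not a reimplementation of PLINK's --score command; the function name is hypothetical, the toy dosages and weights are invented, and using the population standard deviation for standardisation is an assumption.

```python
import statistics

def standardised_grs(dosages, weights):
    """Weighted genetic risk score per individual, standardised to
    mean 0 and standard deviation 1.

    dosages: one row per individual, one effect-allele dosage (0 to 2) per SNP.
    weights: published per-allele effect estimate for each SNP.
    """
    n_snps = len(weights)
    raw = [
        sum(d * w for d, w in zip(row, weights)) / n_snps  # weighted mean over SNPs
        for row in dosages
    ]
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)  # population SD; an assumption of this sketch
    return [(r - mean) / sd for r in raw]

# Toy data: 4 individuals, 3 SNPs (values are illustrative only).
toy_dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
toy_weights = [0.10, 0.20, 0.05]
grs = standardised_grs(toy_dosages, toy_weights)
print([round(g, 3) for g in grs])
```

Standardising within each dataset means that downstream estimates are expressed per 1 standard deviation increase in the score, as in the confounder figures.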

MR: investigation of the effect of BMI on AD
One-sample MR analysis was performed in the UK Biobank and HUNT datasets with the individuals' BMI SNPs, measured BMI and AD status (Figure E5(a)). The MR estimates from each SNP were meta-analysed assuming a random effects model, giving a single estimate for the analysis performed in each dataset. A random effects model was used to avoid over-precision of the causal estimate, and to allow for heterogeneity in the causal estimates being meta-analysed from the different genetic variants. The MR analysis with the individual BMI SNPs was performed with the two-stage predictor substitution (TSPS) method. 15 The first stage involved regression of BMI upon individual BMI SNPs. The outcome (AD) was then regressed upon the fitted values from the first regression stage. As AD is a binary outcome, the first stage linear regression was restricted to control individuals, as recommended by Burgess et al. 16 Logistic regression was then performed in the second stage, for which the fitted values for the cases were predicted from the first-stage model. The standard errors (SE) of these estimates were adjusted using the first term of the delta method expansion for the variance of a ratio, allowing for the uncertainty in the first regression stage to be considered. 16 Genetic principal components (as previously described 11,12 ) were included as covariates in the analysis to control for residual population structure. UK Biobank analysis also controlled for the platform used to genotype the samples. A similar protocol was followed in HUNT, adjusting for the first four principal components and genotyping batch.

Two-sample MR analysis of published GWAS data was performed using the "TwoSampleMR" R package. 17 Estimates for the association between BMI and BMI SNPs in Europeans were taken from the GIANT BMI GWAS study published by Locke and colleagues. 7 Summary statistics from the most recent AD GWAS meta-analysis 8 were used to obtain estimates for the association of AD with the BMI SNPs in Europeans. The published BMI SNP estimates were based on an inverse normal transformation of BMI residuals on age and age-squared, as well as any necessary study-specific covariates. In unrelated individuals, residuals were calculated according to sex and case/control status; amongst related individuals, residuals were sex-adjusted. The causal estimates for the two-sample analysis were converted to raw BMI units (kg/m²), assuming 4.6 kg/m² to be the median BMI standard deviation. 7

The one-sample estimates obtained from UK Biobank and HUNT were meta-analysed assuming a fixed effect model. This was then meta-analysed with the two-sample estimate to obtain an overall causal estimate, assuming no between-method heterogeneity. This was performed using the genetic instrument of 97 BMI-associated SNPs, and separately for the instrument of 22 childhood BMI-associated SNPs. A separate two-sample MR analysis was performed in the same manner, using BMI SNP-BMI association estimates from the more recent BMI meta-analysis reported by Yengo and colleagues. 14
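Two of the steps above are simple enough to sketch: rescaling a two-sample estimate from standard deviation units to kg/m² (dividing by the assumed 4.6 kg/m² standard deviation), and combining estimates with a fixed effect (inverse-variance weighted) model. The function names and the input values below are hypothetical and chosen for illustration; they are not the study's estimates.

```python
import math

BMI_SD_KG_M2 = 4.6  # median BMI standard deviation assumed in the text

def per_sd_to_per_unit(beta_per_sd, se_per_sd, sd=BMI_SD_KG_M2):
    """Rescale a log odds ratio (and its SE) from per 1 SD of BMI
    to per 1 kg/m2 of BMI."""
    return beta_per_sd / sd, se_per_sd / sd

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect pooled estimate and SE."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Hypothetical inputs: a two-sample log OR per SD, and two one-sample
# log ORs already expressed per kg/m2.
two_sample = per_sd_to_per_unit(0.09, 0.06)
one_sample = fixed_effect_meta([0.030, 0.028], [0.015, 0.035])
overall, overall_se = fixed_effect_meta(
    [one_sample[0], two_sample[0]], [one_sample[1], two_sample[1]]
)
print(f"overall OR per 1 kg/m2 = {math.exp(overall):.3f}")
```

Pooling in two steps with fixed-effect weights gives the same answer as pooling all three estimates at once, which matches the stated assumption of no between-method heterogeneity.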

Assessment of confounders
When investigating the association between the BMI GRS and potential confounders of BMI, some small associations with the confounders were seen (Figures E6 and E7). However, the magnitudes of these associations were minimal in comparison to the strength of association with BMI. The FTO variant alone also showed some effect on confounders, but it was much more strongly associated with measured BMI than potential confounders (Figures E8 and E9).

Figure E6. Association of BMI genetic risk score with BMI (kg/m²) and potential confounders in UK Biobank.
Estimates are given per 1 standard deviation increase in BMI GRS. A-level, Advanced level studies; CI, confidence interval; GCSE, General Certificate of Secondary Education; Monthly Alcohol Consumption, defined as frequency of "one to three times a month". For "sex", reference = "Female".

Figure E7. Association of BMI GRS with BMI (kg/m²) and potential confounders in HUNT.
Estimates are given per 1 standard deviation increase in BMI GRS. BMI, body mass index; CI, confidence interval. *Age given per 10-year intervals. For "sex", reference = "Female".

Figure E8. Association of FTO SNP (rs1558902) with BMI (kg/m²) and potential confounders in UK Biobank.
Estimates are given per 1 copy increase in effect allele (A). A-level, Advanced level studies; CI, confidence interval; GCSE, General Certificate of Secondary Education; Monthly Alcohol Consumption, defined as frequency of "one to three times a month". For "sex", reference = "Female".

Figure E9. Association of FTO SNP (rs1558902) with BMI (kg/m²) and potential confounders in HUNT.
Estimates are given per 1 copy increase in effect allele (A). *Age given per 10-year intervals. For "sex", reference = "Female".

Sensitivity analysis
MR-Egger regression, weighted median analysis and the weighted mode-based estimate (MBE) were used to investigate potential horizontal pleiotropy. The strict definition of pleiotropy is when a SNP influences more than one trait. 18 In MR, vertical pleiotropy is assumed, where the genetic instrument influences the exposure which in turn influences the outcome. However, SNPs that influence the exposure and the outcome through different pathways, known as horizontal pleiotropy, would violate the MR assumption that the instrumental variable has an effect on the outcome only via the exposure being investigated and could bias the causal estimate. 19 The weighted median method provides a valid causal estimate if at least 50% of the information in the MR analysis comes from valid instruments. 20 Likewise, the weighted MBE provides a valid causal estimate based on the assumption that the most frequent pleiotropy value is zero across the genetic instruments, 21 whilst the intercept from the MR-Egger regression analysis allows the size of any pleiotropic effect to be determined. 22 MR-Egger regression gives a valid causal estimate under the 'InSIDE' assumption, where each SNP-exposure effect is uncorrelated with the horizontal pleiotropic effect of the SNP. 22 We also used heterogeneity statistics to detect invalid instruments in MR that are due to the presence of pleiotropy. 23

As a proof-of-concept, one-sample MR analysis was performed using the FTO SNP alone (rs1558902) as a genetic instrument due to its strong association with BMI. 24,25 The instrumental variables used in an MR analysis are assumed to be independent of confounders to avoid bias of the causal estimate. We therefore investigated the relationship between the BMI GRS, the FTO variant alone, and potential confounders of BMI by performing a simple regression of each confounder upon the BMI GRS and the FTO variant.
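To make the role of the MR-Egger intercept and the heterogeneity statistic concrete, the sketch below computes an inverse-variance weighted (IVW) slope, Cochran's Q around that slope, and the MR-Egger slope and intercept from per-SNP summary statistics. It is a simplified illustration under stated assumptions: weights are taken as 1/SE² of the SNP-outcome estimates, standard errors of the fitted parameters are omitted, and the function name and toy inputs are hypothetical. It is not the code used for the published analysis.

```python
def ivw_and_egger(bx, by, se_by):
    """IVW slope, Cochran's Q, and MR-Egger slope/intercept.

    bx: SNP-exposure estimates, by: SNP-outcome estimates,
    se_by: standard errors of the SNP-outcome estimates.
    """
    # Orient every SNP so that its exposure effect is positive.
    sign = [1.0 if x >= 0 else -1.0 for x in bx]
    bx = [s * x for s, x in zip(sign, bx)]
    by = [s * y for s, y in zip(sign, by)]
    w = [1.0 / s ** 2 for s in se_by]

    # IVW: weighted regression of by on bx through the origin.
    ivw = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sum(
        wi * x * x for wi, x in zip(w, bx)
    )
    # Cochran's Q: weighted squared deviations from the IVW line.
    q = sum(wi * (y - ivw * x) ** 2 for wi, x, y in zip(w, bx, by))

    # MR-Egger: the same weighted regression with a free intercept.
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, bx)) / sw
    my = sum(wi * y for wi, y in zip(w, by)) / sw
    slope = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, bx, by)) / sum(
        wi * (x - mx) ** 2 for wi, x in zip(w, bx)
    )
    intercept = my - slope * mx
    return {"ivw": ivw, "q": q, "egger_slope": slope, "egger_intercept": intercept}

# Toy summary statistics with a built-in directional pleiotropic effect of 0.01.
toy_bx = [0.02, 0.05, 0.08, 0.03]
toy_by = [0.01 + 0.5 * x for x in toy_bx]
print(ivw_and_egger(toy_bx, toy_by, [0.01] * 4))
```

In the toy data the Egger intercept recovers the added offset while the IVW slope is pulled away from 0.5, which is the pattern an intercept test is designed to flag.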
Reverse MR: investigating the effect of AD genetic risk on BMI
In this reverse MR, we investigated the effect of genetic liability to AD upon BMI (Figure E5(b)). One-sample MR analysis was performed separately in the UK Biobank and HUNT datasets using the two-stage least squares (TSLS) method with individual AD SNPs as genetic instruments. This analysis involves two regression stages where AD is first regressed upon the instrument (disease-associated SNPs), then the outcome (BMI) is regressed upon the fitted values from the first stage regression. 19 The final one-sample MR estimates from UK Biobank and HUNT were meta-analysed assuming a fixed effect model to give a single causal estimate (change in BMI per log odds of AD). To aid interpretation of the causal estimate, one-sample MR was also performed with the "ivpack" R package, 26 treating genetic liability for AD as a continuous variable with values from "0" to "1", to give the difference in BMI between AD cases and controls. Two-sample MR analysis was performed with the "TwoSampleMR" R package, 17 using summary results from GWAS studies for AD 8 and BMI. 7 The one- and two-sample MR estimates were meta-analysed using a fixed effect model to give an overall causal estimate. To aid interpretation, these estimates were multiplied by 0.693 to give the change in BMI per doubling odds of AD, as demonstrated by Gage et al. 27 Sensitivity analyses for the reverse MR were performed using MR-Egger regression, weighted median and weighted MBE analysis methods. A two-sample MR analysis was performed in the same manner, where AD SNP-BMI association estimates were extracted from the more recent BMI meta-analysis. 14

Analysis software
All analyses were performed using R (www.r-project.org) unless otherwise stated. The code used to carry out these analyses is available on GitHub (https://github.com/abudu-aggrey/Eczema_BMI_MR).
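The factor of 0.693 used in the reverse MR is the natural logarithm of 2: an estimate expressed per unit of log odds is multiplied by ln 2 to express it per doubling of the odds. A minimal sketch of that rescaling follows; the function name and the example values are hypothetical.

```python
import math

def per_doubling_of_odds(beta_per_log_odds, se_per_log_odds):
    """Rescale an estimate (and its SE) from per 1 unit of log odds
    of the exposure to per doubling of the odds of the exposure."""
    scale = math.log(2)  # approximately 0.693
    return beta_per_log_odds * scale, se_per_log_odds * scale

# Hypothetical estimate: change in BMI (kg/m2) per log odds of AD.
beta, se = per_doubling_of_odds(0.04, 0.035)
print(f"{beta:.3f} kg/m2 per doubling of odds (SE {se:.3f})")
```

The standard error is scaled by the same constant, so P values and the relative width of the confidence interval are unchanged by the conversion.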

Results of MR analyses
Causal effect of BMI upon AD
Similar causal estimates were found in UK Biobank (OR=1.03; 95% CI 1.00 to 1.06; P=0.10) and HUNT (OR=1.03; 95% CI 0.96 to 1.10; P=0.41). The estimate from the two-sample analysis with published GWAS data gave limited evidence of higher BMI increasing AD risk (OR=1.02; 95% CI 0.99 to 1.04; P=0.19). Meta-analysis of the UK Biobank, HUNT and two-sample estimates showed evidence of a small causal effect, OR=1.02 (95% CI 1.00 to 1.04; P=0.03) (Figure 1 in Letter). This represents an increase in the odds of AD of ~2% for each 1 unit increase in BMI, or an increase in the risk of AD of approximately 11% for an increase in BMI of 5 units, for example from 20 to 25 kg/m² (OR per 5 units higher BMI = exp(5 × beta per 1 unit higher BMI)). A similar causal estimate was found when restricting the BMI instrument to SNPs most strongly associated with childhood BMI (OR=1.04; 95% CI 1.01 to 1.07; P=0.01).

Sensitivity analyses
MR-Egger regression analysis showed little evidence of pleiotropy (UK Biobank intercept=0.00; 95% CI -0.01 to 0.01; P=0.80; HUNT intercept=0.01; 95% CI -0.01 to 0.02; P=0.34) and the sensitivity analyses gave similar estimates (Table E6, Figure E10). There was also little evidence of heterogeneity among the individual effect estimates for each SNP in both datasets (UK Biobank Q=101.07, P=0.32; HUNT Q=100.04, P=0.37). MR analysis with the FTO SNP alone gave a slightly stronger estimate but with a wider confidence interval (OR=1.05; 95% CI 0.98 to 1.12; P=0.20) (Figure E11). Performing two-sample MR with the larger number of most recently published BMI SNP estimates (941 SNPs) 14 also gave evidence of a causal effect upon AD risk (OR=1.08; 95% CI 1.01 to 1.16; P=0.02).

Effect of AD genetic risk upon BMI
One-sample MR estimates in UK Biobank and HUNT gave little evidence that genetic risk of AD influences BMI (0.00 kg/m² change in BMI per doubling odds of AD, 95% CI -0.06 to 0.06, P=0.94). Meta-analysis with the two-sample estimate also gave weak evidence of a very small causal effect (0.03 kg/m² change in BMI per doubling odds of AD, 95% CI -0.02 to 0.08, P=0.24). However, we note that the FLG loss-of-function variant (R501X/rs61816761) known to be strongly associated with AD 28 was not available in the two-sample analysis. There was little evidence of pleiotropy in the causal estimate for genetic risk of AD on BMI, and modest heterogeneity among the individual SNP effects (UK Biobank Q=66.51, P=4.15×10⁻⁶; HUNT Q=24.10, P=0.34) (Table E7, Figure E12).

When performing two-sample MR with summary data for the most recently reported BMI SNPs, 14 the estimate again gave very little evidence that BMI is influenced by genetic risk for AD (0.02 kg/m² lower BMI in cases compared with controls, 95% CI -0.12 to 0.15, P=0.82). It should be noted, however, that summary statistics were not available for 5 of the 24 AD-associated SNPs in this analysis, including the FLG loss-of-function variant (R501X/rs61816761), thereby weakening the genetic instrument.
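The per-5-unit conversion quoted in the results for the causal effect of BMI upon AD follows from working on the log odds scale: the log OR per 1 kg/m² is multiplied by the number of units and then exponentiated. The short sketch below applies this to the rounded OR of 1.02 reported above; the function name is hypothetical. With the rounded value the result is about 1.10, so the approximately 11% quoted in the text presumably reflects the unrounded estimate (this is an assumption, as the unrounded value is not given here).

```python
import math

def or_per_k_units(or_per_unit, k):
    """Odds ratio for a k-unit increase in exposure, given the odds
    ratio per 1-unit increase: exp(k * log(OR))."""
    return math.exp(k * math.log(or_per_unit))

# Rounded overall estimate from the text, scaled to 5 kg/m2.
print(f"{or_per_k_units(1.02, 5):.3f}")  # 1.104
```

The same function can be applied to the confidence limits, since they are also defined on the log odds scale.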

Table E6. Sensitivity analyses for the causal effect of BMI upon AD.
OR per 1 unit increase in BMI (kg/m²); CI, confidence interval; IVW, inverse variance weighted analysis; MBE, mode-based estimate; OR, odds ratio; TSPS, two-stage predictor substitution.

Change in BMI (kg/m²) per doubling odds of atopic dermatitis; CI, confidence interval; IVW, inverse variance weighted analysis; MBE, mode-based estimate; TSLS, two-stage least squares.