Blood lipids and prostate cancer: a Mendelian randomization analysis

Abstract Genetic risk scores were used as unconfounded instruments for specific lipid traits (Mendelian randomization) to assess whether circulating lipids causally influence prostate cancer risk. Data from 22,249 prostate cancer cases and 22,133 controls from 22 studies within the international PRACTICAL consortium were analyzed. Allele scores based on single nucleotide polymorphisms (SNPs) previously reported to be uniquely associated with each of low‐density lipoprotein (LDL), high‐density lipoprotein (HDL), and triglyceride (TG) levels, were first validated in an independent dataset, and then entered into logistic regression models to estimate the presence (and direction) of any causal effect of each lipid trait on prostate cancer risk. There was weak evidence for an association between the LDL genetic score and cancer grade: the odds ratio (OR) per genetically instrumented standard deviation (SD) in LDL, comparing high‐ (≥7 Gleason score) versus low‐grade (<7 Gleason score) cancers was 1.50 (95% CI: 0.92, 2.46; P = 0.11). A genetically instrumented SD increase in TGs was weakly associated with stage: the OR for advanced versus localized cancer per unit increase in genetic risk score was 1.68 (95% CI: 0.95, 3.00; P = 0.08). The rs12916‐T variant in 3‐hydroxy‐3‐methylglutaryl‐CoA reductase (HMGCR) was inversely associated with prostate cancer (OR: 0.97; 95% CI: 0.94, 1.00; P = 0.03). In conclusion, circulating lipids, instrumented by our genetic risk scores, did not appear to alter prostate cancer risk. We found weak evidence that higher LDL and TG levels increase aggressive prostate cancer risk, and that a variant in HMGCR (that mimics the LDL lowering effect of statin drugs) reduces risk. However, inferences are limited by sample size and evidence of pleiotropy.


Abstract
Genetic risk scores were used as unconfounded instruments for specific lipid traits (Mendelian randomization) to assess whether circulating lipids causally influence prostate cancer risk. Data from 22,249 prostate cancer cases and 22,133 controls from 22 studies within the international PRACTICAL consortium were analyzed. Allele scores based on single nucleotide polymorphisms (SNPs) previously reported to be uniquely associated with each of low-density lipoprotein (LDL), high-density lipoprotein (HDL), and triglyceride (TG) levels, were first validated in an independent dataset, and then entered into logistic regression models to estimate the presence (and direction) of any causal effect of each lipid trait on prostate cancer risk. There was weak evidence for an association between the LDL genetic score and cancer grade: the odds ratio (OR) per genetically instrumented standard deviation (SD) in LDL, comparing high-(≥7 Gleason score) versus low-grade (<7 Gleason score) cancers was 1.50 (95% CI: 0.92, 2.46; P = 0.11). A genetically instrumented SD increase in TGs was weakly associated with stage: the OR for advanced versus localized cancer per unit increase in genetic risk score was 1.68 (95% CI: 0.95, 3.00; P = 0.08). The rs12916-T variant in 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) was inversely associated with prostate cancer (OR: 0.97; 95% CI: 0.94, 1.00; P = 0.03).
In conclusion, circulating lipids, instrumented by our genetic risk scores, did not appear to alter prostate cancer risk. We found weak evidence that higher LDL and TG levels increase aggressive prostate cancer risk, and that a variant in HMGCR (that mimics the LDL lowering effect of statin drugs) reduces risk. However, inferences are limited by sample size and evidence of pleiotropy.
are the product of confounding by common causes of both cholesterol levels and prostate cancer (e.g., aspects of diet), bias, or reverse causality (the cancer causing altered cholesterol metabolism) [5]. As serum cholesterol levels can be modified by lifestyle changes [6] and statin therapy [7], clarifying the causality of this association could inform the development of prevention interventions for prostate cancer.
Statins lower cholesterol levels by inhibiting HMG-CoA reductase, the rate-limiting enzyme in cholesterol synthesis. A meta-analysis of 27 observational studies concluded that statin therapy reduced prostate cancer by 7% (RR 0.93; 95% CI: 0.87, 0.99; P = 0.03) [8], but a separate metaanalysis of four randomized trials showed minimal evidence of any association (RR 1.08; 95% CI: 0.91, 1.30; P = 0.38) [9]. Both analyses revealed considerable heterogeneity between the included studies (I 2 values >70%) and, as prostate cancer was assessed as a secondary outcome in the trials, misclassification of outcome could have biased the results. As yet, evidence regarding statin therapy for prostate cancer is controversial [10].
Mendelian randomization uses genetic variants robustly associated with traits of interest (in this case, circulating low-density lipoprotein [LDL], high-density lipoprotein [HDL], and triglyceride [TG] levels) as instrumental variables to make inferences about whether associations between exposures and disease are likely to be causal [11]. The principle of Mendelian randomization is that analysis of groups defined by common genetic variants is analogous to that of an intention-to-treat analysis in a randomized controlled trial, based on Mendel's laws of segregation and independent assortment. Using genetic variants as "instrumental variables" to proxy modifiable exposures should be unconfounded by environmental factors, represent life-long exposure, and not be subject to reverse causality with respect to the phenotype proxied by the genotype. A genetic score using several single nucleotide polymorphisms (SNPs) in combination can be constructed to represent the additive effect of multiple gene variants to explain more of the variance in the risk factor of interest, avoiding weak instrument bias, and increasing power [12]. Genetic scores using multiple gene variants for lipid traits have been used previously to investigate associations between blood lipids and vascular disease [13][14][15]. If the instrumental variable assumptions hold, as shown in Figure 1, then a test of the association between the instrument and the outcome is a test of the presence of a causal effect of the intermediate on the outcome [16].
Here we test for the possible presence of a causal effect of lipid fractions (LDL, HDL, TG) on prostate cancer using genetic variants for these traits combined in genetic risk scores using the principle of Mendelian randomization [11]. As far as we know, this study is the first to employ Mendelian randomization to investigate the association between circulating lipids and prostate cancer risk.

Methods
Genetic risk scores as instruments for circulating lipid fractions were developed using SNPs previously identified by genome-wide association studies (GWAS). These scores were then applied to 22,249 prostate cancer cases and 22,133 controls within the international PRACTICAL consortium [17] for whom genetic data were available.

PRACTICAL consortium (prostate cancer association group to investigate cancer-associated alterations in the genome)
We investigated the association between genetic risk scores for lipid traits and prostate cancer risk in an individual participant meta-analysis of men in 22 studies of the international PRACTICAL consortium ( Table 1). Fifteen of these studies were based in Europe, five in North America, and two in Australia. Genotypic information was available for 44,382 participants (22, Missing data were excluded from each analysis.
-, no data available. 1 Advanced cancer (T3 to T4 or SEER staging regional or distant), high grade (Gleason the consortium website (http://www.practical.ccge.medschl. cam.ac.uk). Data concerning cancer grade and stage were collected by each study. Cancers were categorized as low grade (Gleason score ≤6) or high grade (Gleason score ≥7), and localized (T1 or T2 on TNM staging, or if not available, "localized" on SEER staging) or advanced (T3 or T4 on TNM staging, or if not available, "regional" or "distant" on SEER staging). Data were not available for grade in two (MEC/UTAH), or stage in three (CPCS1/CPCS2/QLD), of the PRACTICAL studies, respectively.

GWAS identification of lipid SNPs
SNPs associated with the lipid traits under investigation were identified by a review of the current published literature in which papers concerning lipid SNPs were identified by entering appropriate search terms into the web of science database ("Lipid," "SNP," "GWAS"). We selected SNPs that were exclusively associated, at genome-wide significance, with each lipid trait of interest. The majority of SNPs were taken from two landmark lipid GWAS [18,19], the latter being the largest genetic association study of blood lipid levels to be conducted to date (188,577 individuals). For completeness, a number of SNPs from other publications were also included in the analysis [14,20]. The EPIC-Norfolk cohort was used by the referenced GWASs to identify SNPs associated with lipid traits; we therefore excluded this study from our analysis to reduce the chance of under-estimating the causal effect of the lipids on prostate cancer.

Genotyping data
PRACTICAL samples were genotyped using an Illumina Custom Infinium genotyping array (iCOGS) consisting of 211,155 SNPs designed for the Collaborative Oncological Gene-Environment Study (COGS) (details of which may be found on their website http://www. cogseu.org). The array was specifically designed for the evaluation of genetic variants in breast, ovarian, and prostate cancer. As genotypic information was not available for all SNPs in the genetic risk scores, we also used SNPs that had been imputed using IMPUTE2 software [21]. As a sensitivity analysis, allele frequencies and ORs for prostate cancer outcomes using imputed and genotyped data were compared by cross tabulation (Table  S1). As results were similar, imputed data were used in all subsequent analyses. All SNPs with an indication of poor imputation quality were removed (r 2 hat<0.30), as were those with a minor allele frequency of <1%, a call rate of <95%, or those that violated the Hardy-Weinberg equilibrium (P < 0.05).

Gene variants used to create the genetic risk scores
Genetic risk scores were used as instruments to proxy exposure to circulating blood lipids in a Mendelian randomization framework [22]. This analysis assumes that the genetic risk scores used in the analysis influence prostate cancer risk only via their ability to alter the specific lipid trait that they are acting as proxies for (e.g., LDL, Fig. 1). If other biochemical processes or traits are associated with the genetic risk scores (i.e., horizontal pleiotropy is present, where pathways from lipid-associated SNPs to disease are involved that are independent of lipids, see Box 1) and also directly alter prostate cancer risk, this violates a major assumption of Mendelian randomization by introducing genetic confounding. Therefore, only SNPs exclusively associated at genome-wide significance (P < 5 × 10 −8 ) with one lipid trait (either LDL or HDL or TGs, but not more than 1) were chosen for each genetic risk scores to make them as specific to one trait as possible (as in Holmes et al. [15]) (Table S2), minimizing the possibility of pleiotropic effects. Of the 118 SNPs identified as associated with only one lipid trait, 62 were either genotyped or imputed in the PRACTICAL consortium and considered eligible for inclusion in the genetic risk scores based on current biological knowledge. SNPs selected for the genetic risk scores were tested for evidence of linkage disequilibrium (LD) using the SNAP pairwise online tool (http:// www.broadinstitute.org/mpg/snap/ldsearchpw.php), an open access resource which uses pairwise LD data based on phased genotype data from the International HapMap project. A threshold r 2 value of ≥0.85 was used to indicate LD; where SNPs were in LD, the SNP with the largest effect on the lipid trait was selected for the genetic risk scores. SNP genotypes were coded as 0, 1, or 2 depending on exposure to the risk allele. Dosage values (ranging from 0 to 2) were generated for imputed SNPs. LDL, HDL, and TG scores were composed of 11, 36, and 15 SNPs, respectively. Genetic risk scores were created by summing the number of "risk" alleles that each of the 22,249 prostate cancer cases and 22,133 controls were exposed to, such that the greater the number of "risk" alleles a man had, the higher the score. "Risk" alleles were those that were positively related to serum LDL or TG, or negatively related to serum HDL. Published effect sizes (the effect Box 1. Pleiotropy Vertical pleiotropy: A genetic locus is linked to a cascade of events. This is not generally a problem for Mendelian randomization studies. Horizontal pleiotropy: A genetic locus is related to multiple phenotypes. This violates Mendelian randomization assumptions. of the risk allele on the trait in SD) were applied to each SNP and summated to give a "weighted" genetic risk score for each trait [23], for each individual man in each study, so that when estimating the effect of a unit increase in the genetic risk score directly translated to the effect of a SD change in the trait upon the outcome.
In a subsidiary analysis we examined the association of a SNP in 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) (rs12916) in relation to prostate cancer outcomes. rs12916-T has been used previously to mimic statin intervention in order to estimate the causal association of statin use with type 2 diabetes and adiposity measures [24]. ORs for prostate cancer outcomes were reported per rs12916-T allele.

Validation of genetic risk scores and investigation of potential pleiotropy in the ALSPAC cohort
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a birth cohort established to investigate environmental and genetic factors in health and development [25]. ALSPAC data were used to validate the genetic risk scores as instruments for circulating lipid exposure using linear regression. Ten of the 11 SNPs in the LDL score could be included in score validation analyses using ALSPAC data (due to poor imputation quality in ALSPAC, rs1801689 could not be included in the LDL score validation). Thirty-five of the 36 HDL genetic risk score SNPs could be validated in ALSPAC (rs1084651 was poorly imputed) and 14 of the 15 TG genetic risk score SNPs were available for validation in the ALSPAC cohort (rs11649653 was poorly imputed).

Statistical analysis
The genetic risk scores were entered into logistic regression models to estimate the effect per genetically instrumented SD increase in the lipid trait on prostate cancer outcomes.
Outcomes investigated were: all prostate cancer (case vs. control status), grade (high [≥7] versus low [≤6] Gleason score), and stage (advanced versus localized TNM or SEER staging). We conducted the analyses within each of the individual studies that make up the PRACTICAL consortium and then combined the results into a summary odds ratio (OR) for each outcome per unit increase in the genetic risk scores by fixed effect meta-analysis using the "metan" command in Stata v.13 [26]. To test that the instruments (genetic risk scores) were not associated with confounders, we investigated whether each genetic risk score was associated with available covariables that could be potential confounding factors (age, diagnostic prostate specific antigen (PSA) levels, and family history of prostate cancer). To account for potential confounding by population stratification, we adjusted for the top eight principle components (variables concerning the population's genetic architecture). All analyses were performed in Stata v.13 (Stata Corp LP, 2013, College Station, TX).

Results
The genetic risk scores were validated in ALSPAC participants at the age of 7 years ( Table 2). The genetic risk score for LDL explained 0.3% of the variability in circulating LDL and not only was most strongly associated with LDL (linear regression coefficient; 0.56 mmol/L LDL; F stat: 14.11; P = 2 × 10 −4 ), but was also weakly associated with HDL (linear regression coefficient; 0.19 mmol/L HDL; F stat: 5.32; P = 0.02). The HDL genetic risk score was the strongest instrument, explaining 0.9% of variability in circulating HDL and was associated exclusively with HDL (linear regression coefficient; −0.25 mmol/L HDL; F stat: 37.60; P = 9.5 × 10 −10 ). The TG genetic risk score was most strongly associated with TG and explained 0.2%  Table 1). The percentage of high-grade cancers reported varied between studies (27.9-84.2%), as did the proportion of advanced stage cancers (3.5-46.7%). There was little evidence to support an association between genetic risk scores and PSA at recruitment. Family history of prostate cancer was weakly, and imprecisely, associated with the LDL (OR: 0.50, 95% CI: 0.21, 1.19; P = 0.12) and HDL (OR: 1.62, 95% CI: 1.02, 2.58; P = 0.04) genetic risk scores. Four of the top eight principal components were associated with at least one of the genetic risk scores. Age at interview was not associated with the genetic risk scores, with the exception of the TG score (linear regression coefficient: −2.76, 95% CI: −5.52, −0.0003; P = 0.05) ( Table 3).
The ORs for prostate cancer outcomes stratified by grade and stage are summarized in Table 4. There was weak evidence to suggest an effect for a genetically instrumented SD increase in LDL between high-versus lowgrade cancer cases (OR: 1.50; 95% CI: 0.92, 2.46; P = 0.11). There was little evidence to suggest an association between the HDL and TG genetic risk scores and prostate cancer grade: When high-and low-grade cancers were compared, a genetically instrumented SD decrease in HDL gave an OR of 1.03 (95% CI: 0.79, 1.34; P = 0.82), and a genetically instrumented SD increase in TG gave an OR of 0.93 (95% CI: 0.57, 1.52; P = 0.77). For cancer stage, a genetically instrumented SD increase in LDL gave an OR for advanced versus localized cancers of 0.91 (95% CI: 0.51, 1.64; P = 0.77). A genetically instrumented SD decrease in HDL gave an OR for advanced versus localized cancers of 1.02 (95% CI: 0.74, 1.39; P = 0.92). The OR for advanced versus localized prostate cancer per SD genetically instrumented increase in TG was 1.68 (95% CI: 0.95, 3.00; P = 0.08).
EPIC-Norfolk was excluded from the analysis, as the cohort was included in the referenced GWAS discovery panels. As a sensitivity analysis, we included EPIC-Norfolk in the meta-analysis. The results for EPIC-Norfolk were in agreement with the combined estimates for the odds of prostate cancer by the genetic risk scores (Fig. S1). As some associations were present between the genetic risk scores and family history or age at diagnosis (Table 3), we performed a sensitivity analysis adjusting for these variables. The results were unaltered following adjustment for age with the TG score and for family history with the LDL and HDL scores (data available on request).
The rs12916-T variant in HMGCR was weakly associated with a decreased risk of prostate cancer overall (OR: 0.97; 95% CI: 0.94, 1.00; P = 0.03; I 2 = 0%) (Fig. 3). Studies with more than 20% missing data were excluded from each analysis. Linear regression models take clustering by substudy into account. LDL, low-density lipoprotein; HDL, high-density lipoprotein; TG, triglyceride; OR, odds ratio. 1 Higher LDL/TG scores reflect increasing circulating LDL/TG, higher HDL scores reflect lower circulating HDL. 2 Family history of prostate cancer (in father or brother): compares Yes versus No (logistic regression).
C. J. Bull et al.

Discussion
We did not find evidence to suggest an association between circulating LDL, HDL, or TG (proxied by genetic risk scores) and overall prostate cancer. Although power to investigate advanced/high-grade cancer is limited (as evidenced by wide confidence intervals), our results may indicate a potential role for LDL and TG in prostate cancer progression. These findings are clinically important, as they highlight a distinction between indolent disease and more aggressive cancers. Variation in HMGCR has been exploited previously to demonstrate the implications of statin treatment on type 2 diabetes and bodyweight [24]. Our results from an analysis using a variant in HMGCR (rs12916-T) to proxy statin intervention suggest that statins may hold potential in prostate cancer prevention, but our inference is tentative and requires further investigation in larger sample sizes. Associations observed between the LDL and HDL genetic risk scores and family history findings may be an artifact of multiple testing. However, these findings warrant further investigation with larger numbers to increase the precision of the point estimates.
Meta-analyses of observational studies and randomized controlled trials present heterogeneous findings for associations of cholesterol with prostate cancer, making it difficult to conclude whether cholesterol plays a role in prostate cancer. The most recent and comprehensive assessment of observational data did not show any association between HDL and LDL and prostate cancer, but could not draw definitive conclusions on high-grade prostate cancer due to limited data [5]. Our Mendelian randomization approach has several advantages over conventional observational epidemiology: it eliminates the problem of reverse causality, as prostate cancer status cannot alter

A B C
one's germline genetic makeup; genetic risk scores represent an individual's exposure to lipid traits over their lifetime, reducing biological and technical sources of measurement error that arise from one-off blood sampling at one point in a person's life; and confounding by behavioral, lifestyle, and other related intermediate traits should be minimized as individuals are effectively randomly allocated to a low or high level of exposure based on their genotype, randomly generated at conception (Mendel's second law of independent assortment) [27]. However, Mendelian randomization is susceptible to genetic confounding if the SNPs used as instruments for the trait of interest have effects on other phenotypes besides the specific lipid of interest, and it is these other phenotypes which lead to prostate cancer (horizontal pleiotropy) [16]. There is also potential for confounding due to population stratification; however, as we have adjusted for principle components in our regression models, this should be minimized. We cannot be sure that the LDL score is exclusively associated with LDL as validation of the score in ALSPAC revealed possible pleiotropy with circulating HDL. However, the HDL score which is strongly associated with HDL, and not LDL or TG in ALSPAC, was not associated with prostate cancer; therefore, the weak association noted between the LDL score and high-grade prostate cancer is likely an LDL, not an HDL effect. As F statistics for the genetic risk scores relate to ALSPAC and not PRACTICAL (the dataset for the outcome), conventional thresholds, such as F > 10 are not relevant for this study. The dataset for the outcome is far larger than the dataset in which we tested the genetic risk score-trait association; therefore, it may be that the genetic risk scores are in fact stronger instruments for circulating lipid traits in PRACTICAL than we estimated.  A recent meta-analysis of 14 prospective studies reported that blood LDL and HDL were not associated with either overall prostate cancer or high-grade cancers; however, subgroup analysis was only performed on a limited number of studies and it is likely that the study was underpowered to detect any effect of these traits on cancers stratified by stage and grade [5]. Our findings suggestive of a possible role for LDL in high-grade prostate cancer and for TG in advanced stage prostate cancer are supported by Andreassen et al., who found evidence of an association between these traits and prostate cancer with the use of an alternative genetic epidemiologic method [28]; however, this analysis was also conducted with data obtained from the PRACTICAL consortium. Using conjunction false discovery rate analysis, they were able to combine summary statistics from GWAS for the identification of genetic overlap between the two phenotypes (blood lipids and prostate cancer). Similarly, they found no pleiotropic enrichment for HDL in prostate cancer.
A meta-analysis of 27 observational studies reported a 7% relative reduction in total prostate cancer risk with statin therapy [8]. Our results for the rs12916-T SNP and prostate cancer outcomes support similar conclusions; however, it is possible that the observed effects are via the increase in type 2 diabetes phenotypes associated with this allele [24], as an inverse association between type 2 diabetes and prostate cancer has been reported [29].
Cholesterol is thought to have multiple procancer effects at the cellular level: it is involved in cellular proliferation, inflammation, membrane organization, and steroidogenesis. Our findings are supported by preclinical work showing increasing concentrations of LDL support the proliferation of prostate cancer cell lines, but not normal epithelial cells, suggesting cholesterol metabolism is reprogrammed in prostate cancer [30]. Furthermore, as signal transduction proteins are located in cholesterol-rich membranes, it makes sense that oncogenic signaling might be regulated in a cholesterol-sensitive manner [31]. As statins are largely retained in liver, any effects on prostate cancer are not likely directly on the tissue, but via their ability to reduce circulating cholesterol [32]. Statins have also been associated with similar cancer phenotypes in cellular and animal models, indicating that the effects of statins observed in populations are likely due to on-target effects of statins (i.e., their ability to reduce circulating cholesterol via HMG-CoA reductase inhibition) rather than off-target effects, although these have been reported [33][34][35]. Further research on this subject is needed, as previous work has found that statins differentially affect prostate cancer depending on their subtype [36] and the cancer phenotype [32].
A large number of genes robustly associated with serum lipids were used in the generation of genetic risk scores and their association (at genome-wide significance level) with just one of the lipid traits may mitigate against results being due to pleiotropy with other lipids. However, key SNPs likely regulate multiple traits, so would therefore have been excluded from our analysis. This highlights a complication in investigating complex traits and the biological complexity of homeostatic mechanisms: it is hard to encompass the variability in a given lipid trait while keeping the genetic risk score specific; this is likely reflected by the magnitude of the r 2 statistics in our score validation (Table 1). We would have liked to adjust for statin use in our analyses; however data were not available. By including patients who are taking statins in our analysis we may have reduced the true effects of the genetic risk scores on prostate cancer outcomes.
We conclude that this study presents tentative evidence of a potential role for LDL and TG levels in prostate cancer etiology, but not for HDL. These inferences assume Mendelian randomization assumptions (such as absence of pleiotropy) hold. Future work will involve developing stronger and more specific instruments for circulating lipid traits, and making use of new instrumental variable methods, such as Egger regression which controls for pleiotropy [37]. If confirmed, these findings are potentially of translational importance as they indicate that lowering LDL cholesterol may be beneficial in reducing risk of high-grade prostate cancer. Table S1. Genotyped versus imputed SNPs. Table S2. SNPs used to generate allele scores for an unfavorable lipid profile. Figure S1. Meta-analysis OR prostate cancer per unit increase in genetic risk score (SD trait).