Mendelian randomization for studying the effects of perturbing drug targets

Drugs whose targets have genetic evidence to support efficacy and safety are more likely to be approved after clinical development. In this paper, we provide an overview of how natural sequence variation in the genes that encode drug targets can be used in Mendelian randomization analyses to offer insight into mechanism-based efficacy and adverse effects. Large databases of summary level genetic association data are increasingly available and can be leveraged to identify and validate variants that serve as proxies for drug target perturbation. As with all empirical research, Mendelian randomization has limitations including genetic confounding, its consideration of lifelong effects, and issues related to heterogeneity across different tissues and populations. When appropriately applied, Mendelian randomization provides a useful empirical framework for using population level data to improve the success rates of the drug development pipeline.


Introduction
The majority of small molecule and biologic drugs exert their effects by perturbing protein targets 1 . The identification of such targets is therefore central to drug discovery. Despite increasing investment in research and development within the pharmaceutical industry 2 , overall drug development failure rates remain high 3-8 , most notably for targets that represent novel mechanisms. Such failures result in increased costs and reduced availability of novel agents 9 .
With the recent growth in genetic data 10 , there has been substantial progress in the identification of genes that are linked to human health and disease. Genetic data can potentially be used for identifying and prioritizing novel drug targets and indications 2 . For example, genome-wide association studies (GWAS) have corroborated approximately 70 of the 670 known effects of licensed drugs through associations at the loci of the genes coding for their corresponding target proteins 11 . Studies of drug development programs have also shown that targets with genomic support have a higher rate of success 2,12-15 .

Mendelian randomization
Through the random allocation of genetic variants at conception, genetic studies in human populations can imitate the design of randomized controlled trials (RCT) 16,17 . Such investigation uses genetic variants as instrumental variables for studying the effect of an exposure on an outcome, and has been referred to as Mendelian randomization (MR) 18 . Phenotypic observational studies are limited in their ability to draw causal inferences due to bias from confounding and reverse causation 18 . In contrast, MR uses the random allocation of genetic variants from parents to offspring during conception to guard against these biases.
MR requires the following instrumental variable assumptions: the genetic variant i) is associated with the exposure (relevance), ii) has no common cause with the outcome (independence), and iii) only affects the outcome via the exposure (exclusion restriction) 19,20 . The first of these is testable; the remaining assumptions are untestable but falsifiable. Assumption iii), the exclusion restriction, requires that the genetic variant affects the outcome only through the exposure and not through any other, horizontally pleiotropic, pathway 18,21 . Further assumptions are also required to obtain valid point estimates, for instance, that the influence of the exposure on the outcome is the same for all individuals (effect homogeneity) or that the exposure is a monotonic (always increasing or always decreasing) function of the instrument for all individuals in the population (monotonicity) 19 . In addition, the interpretation of MR findings can have particular nuances, as previously described 22 .
Where the exposure under study is perturbation of a drug target, MR can be used to explore drug effects (Figure 1) 23,24 .

Figure 1. MR makes use of genetic variants located within or close to a gene encoding a drug target (e.g. at HMGCR encoding the drug target of statins) that lead to downstream effects similar to the desired drug response (e.g. lowering of low-density lipoprotein [LDL] cholesterol) in order to explore effects on clinical outcomes (e.g. risk of coronary artery disease). SNP: single-nucleotide polymorphism.

Amendments from Version 1
The following author was added: Robert A Scott. This author was omitted in the initial submission as an accident. All co-authors are aware (from previous drafts of the paper) that this author should be included. A conflict of interest statement was also added.
Any further responses from the reviewers can be found at the end of the article.

For drug target MR specifically, genetic variants such as single-nucleotide polymorphisms (SNPs) related to the function or expression of the drug target protein can be used as instrumental variables to study the effect of perturbing that drug target 25,26 . These variants are typically in or near the gene that encodes the drug target (cis-variants). Such MR can be used in drug development to investigate the likely efficacy and safety of perturbing novel drug targets 27,28 , as well as to explore the repurposing potential and adverse effects of existing drugs 25 .
The identification and validation of appropriate genetic variants as instrumental variables for an exposure is critical for the design and interpretation of all MR analyses 29 . While previous work has offered practical advice on selecting instruments for MR studies considering disease biomarkers 24 , the field is continuing to evolve rapidly 30 . The growth in genetic association study data that extends to tissue-specific gene expression 31 , circulating proteins 32 , metabolites 33,34 and cytokines 35 , has been coupled with increased efficiency of MR studies using automated software, databases, statistical packages and readily available code 30,36-38 . However, there is still no consensus on the strategy for identifying genetic instruments and exploring potential drug effects with MR. Here we discuss practical considerations while also offering illustrative examples for the most relevant points. We describe issues relating to selection of genetic variants as proxies for drug target perturbation, evaluation of the plausibility of genetic variants as proxies for drug target perturbation, generation and interpretation of MR estimates, and limitations of MR for investigating drug target perturbation. Finally, we offer a step-by-step framework for how to conduct a drug target MR study (Box 1).

Instrument selection
MR investigations of drug effects have mainly studied small molecule, peptide and biotherapeutic drugs 39,40 , where genetic instruments are selected as variants that mimic perturbation of their protein targets. Instrument selection can be considered in two parts: i) identifying the gene or group of genes corresponding to the drug target proteins and ii) selecting genetic variants to proxy perturbation of the drug targets. These steps are discussed in detail below, followed by consideration of drugs that have targets made up of multiple proteins.
Identifying genes corresponding to drug target proteins
The key difference between conventional MR for an exposure and MR for the investigation of drug effects is that for the latter the instrument can be constructed in relation to the gene corresponding to the drug target, rather than genetic variants from across the genome (Table 1). The first step of this process is therefore to identify the drug target of interest and its corresponding gene. Resources such as DrugBank (which is freely available for non-commercial purposes) provide information about existing drugs, including their mechanism of action, targets and their corresponding gene(s), and indications 41 . Where the target of a drug is known, information regarding the corresponding gene can also be obtained from other databases such as Ensembl and UniProt 42,43 .
Selecting genetic variants to proxy drug targets
Several factors need to be considered when selecting genetic variants to proxy the effects of drug target perturbation. If MR is being used to investigate effects of perturbing the target of a drug with an existing indication, then instruments can be selected based on their location at the corresponding gene and association with that indication. For instance, Gill et al. selected genetic variants to proxy antihypertensive drug class effects as those located at the gene corresponding to the drug target that also related to systolic blood pressure in a GWAS 44 . If the indication is not known, one possible approach is to use quantitative trait loci for expression of the gene encoding the drug target of interest (in relevant tissues or cell contexts) as instruments for drug target perturbation. An important limitation of gene expression is that variants affecting gene expression may not necessarily also affect protein expression, and vice versa (Figure 2) 45 . Furthermore, gene expression quantitative trait loci have been reported to account for little of the heritability of complex diseases 46 . Therefore, protein expression quantitative trait loci may make better instruments for proxying drug effects than gene expression data, if they are available in relevant tissues and contexts.
Box 1. Step-by-step guide for conducting Mendelian randomization (MR) analyses of drug target perturbation

1. Determine the drug targets of interest

Table 1. Conventional MR compared with MR investigating drug effects.

| | Conventional MR | MR investigating drug effects |
| --- | --- | --- |
| Aim of the analysis | To investigate the effect of an exposure on an outcome | To investigate the effect of perturbing a drug target on an outcome |
| Genomic location of instruments | Genome-wide | Often restricted to the locus of the gene encoding the drug target under study |
| Selection of genetic instruments | Variants associated with the exposure under study | Variants associated with perturbation of the drug target under study |
| Statistical analysis | Typically uses uncorrelated variants; higher risk of pleiotropic effects on the outcome through pathways unrelated to the exposure | More frequent use of methods to account for correlation between instrument variants; lower risk of pleiotropic effects on the outcome through pathways unrelated to the drug target |

As mentioned above, selected instruments for drug target perturbation are often restricted to cis-acting genetic variants, those in or close to the gene of interest. In general, such variants are more likely to have effects specific to the protein of interest than genetic variants that are not located within the gene locus (trans-acting). This point is well highlighted by the example of C-reactive protein, for which MR analyses investigating effects of its circulating levels produce very different results depending on whether the instruments are selected from the CRP locus or from throughout the genome 24 . Related considerations include how proximal cis-variants should be to the corresponding gene, and whether enhancer or promoter regions for the gene should be counted as cis-acting 47 . While some evidence currently supports that genetic variants affecting gene expression typically lie within 200 kb of the gene locus 48 , there is no established consensus on the issue of proximity. The pertinent requirement is that the variant be related to the function or expression of the drug target. In the past, single-region MR analyses have sometimes used only the top variant in the region as a genetic instrument 28,49,50 . However, this approach can be suboptimal when the region contains variants that have multiple conditionally independent associations with the exposure. For example, genetic association studies have suggested that the SHBG region encoding sex-hormone binding globulin (SHBG) may harbor up to nine variants independently associated with circulating SHBG concentration 51 , so using only the top variant may limit the statistical power of such an MR analysis.
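The proximity criterion discussed above can be sketched in code. The following is a minimal illustration only: the record fields (`chrom`, `pos`, `p`), the gene coordinates, the variant identifiers, the 200 kb window and the p-value threshold are all assumptions chosen for the example, not recommendations, since there is no consensus on how close a cis-variant must be.

```python
def select_cis_instruments(assocs, chrom, gene_start, gene_end,
                           window_bp=200_000, p_threshold=5e-8):
    """Keep variants inside the gene +/- window that pass a p-value threshold.

    assocs: list of dicts with hypothetical keys 'id', 'chrom', 'pos', 'p',
    holding associations with a trait that proxies drug target perturbation.
    """
    lower = max(0, gene_start - window_bp)
    upper = gene_end + window_bp
    return [
        a for a in assocs
        if a["chrom"] == chrom and lower <= a["pos"] <= upper and a["p"] < p_threshold
    ]


# Hypothetical association records around a hypothetical gene on chromosome 5.
example_assocs = [
    {"id": "var_a", "chrom": "5", "pos": 900_000, "p": 1e-12},    # in window, significant
    {"id": "var_b", "chrom": "5", "pos": 700_000, "p": 1e-30},    # outside window
    {"id": "var_c", "chrom": "5", "pos": 1_010_000, "p": 1e-3},   # in window, not significant
    {"id": "var_d", "chrom": "1", "pos": 1_010_000, "p": 1e-20},  # wrong chromosome
]
selected = select_cis_instruments(example_assocs, "5", 1_000_000, 1_030_000)
print([a["id"] for a in selected])
```

Widening or narrowing `window_bp` is a simple sensitivity analysis for the choice of cis window.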
The degree to which variants at the same locus should be allowed to correlate with each other through linkage disequilibrium (LD) while still being modelled as independent also warrants attention. Unaccounted correlation between the variants used can result in underestimation of the standard error of MR estimates, yet there is no recommended LD threshold. To circumvent this issue, methods are available to adjust for LD between genetic variants used as instruments, which may help confirm the robustness of the findings and maximize statistical power 52-54 .
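One way of adjusting for LD, sketched here under stated assumptions, is a fixed-effect inverse-variance weighted (IVW) estimate fitted by generalized least squares, in which the covariance of the variant-outcome estimates is built from their standard errors and the LD correlation matrix. The function name and all numbers are hypothetical; the sketch assumes the LD matrix comes from a suitable reference panel, is positive definite, and that all associations are aligned to the same effect allele.

```python
import numpy as np


def ivw_correlated(beta_x, beta_y, se_y, ld):
    """Fixed-effect IVW estimate allowing for correlated variants.

    beta_x: variant associations with the exposure (drug target proxy)
    beta_y, se_y: variant associations with the outcome and their standard errors
    ld: matrix of signed correlations (r, not r squared) between the variants
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    ld = np.asarray(ld, dtype=float)

    omega = np.outer(se_y, se_y) * ld               # covariance of the outcome estimates
    omega_inv_bx = np.linalg.solve(omega, beta_x)   # Omega^-1 beta_x without explicit inverse
    precision = beta_x @ omega_inv_bx
    estimate = (omega_inv_bx @ beta_y) / precision
    return estimate, float(np.sqrt(1.0 / precision))


# Hypothetical summary statistics for three correlated cis-variants.
bx = [0.10, 0.12, 0.08]
by = [0.05, 0.06, 0.04]
se = [0.01, 0.01, 0.01]
ld = [[1.0, 0.6, 0.3],
      [0.6, 1.0, 0.6],
      [0.3, 0.6, 1.0]]

print(ivw_correlated(bx, by, se, ld))         # accounts for LD
print(ivw_correlated(bx, by, se, np.eye(3)))  # treats the variants as independent
```

In this toy example the point estimate is the same in both calls, but the standard error is larger once the correlation is modelled, which illustrates why ignoring LD can overstate precision.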
Investigating drugs with multiple targets
Many drugs do not have a target that is encoded by a single gene. For example, the calcium channel blocker class of antihypertensive drugs has targets that are made up of proteins coded by several different genes 44,55 . At present there is no consensus on the best way to combine data from multiple genes corresponding to a single target into an instrument. Previous studies have selected genetic variants related to the individual genes and combined their data to investigate the effect of perturbing the drug target, while applying clumping to ensure independence as described above 44,55 .
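The clumping step mentioned above can be illustrated with a greedy procedure: variants pooled across the genes of a multi-protein target are ranked by p-value, and a variant is kept only if its pairwise r squared with every variant already kept is below a threshold. This is a minimal sketch rather than the procedure of any particular software; the variant identifiers, the r squared values and the 0.1 threshold are hypothetical, and pairs missing from the lookup are assumed to be uncorrelated.

```python
def clump(variants, r2, threshold=0.1):
    """Greedy LD clumping.

    variants: list of (variant_id, p_value) pooled across the target's genes
    r2: dict mapping frozenset({id_1, id_2}) to pairwise r squared
    Returns the ids of retained variants, most significant first.
    """
    kept = []
    for variant_id, _ in sorted(variants, key=lambda v: v[1]):
        # Missing pairs (e.g. variants at different genes) are treated as r2 = 0.
        if all(r2.get(frozenset((variant_id, k)), 0.0) < threshold for k in kept):
            kept.append(variant_id)
    return kept


# Hypothetical variants: var_a and var_b sit at one gene and are in strong LD.
pooled = [("var_b", 1e-12), ("var_a", 1e-20), ("var_c", 1e-9)]
pairwise_r2 = {
    frozenset(("var_a", "var_b")): 0.80,
    frozenset(("var_a", "var_c")): 0.02,
    frozenset(("var_b", "var_c")): 0.05,
}
print(clump(pooled, pairwise_r2))
```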

Instrument evaluation
Once the instrument has been selected, it can be evaluated to ascertain its validity for the analysis of interest. MR analyses exploring drug effects can be biased if the genetic variants incorporated as instruments have "horizontal" pleiotropic effects, where there are pathways from the variant to the specific outcome under consideration that do not pass through the exposure of interest 56 . In contrast, "vertical" pleiotropy lies on the causal pathway between the pharmacological mechanism and outcome 56 . Vertical pleiotropy does not bias MR estimates and is often of interest as it can provide insight into causal mechanisms and mediation. As with MR generally, one of the most useful approaches for evaluating instrument validity is to investigate its relation to a known effect of the exposure under consideration 57 . This approach is feasible for MR used to predict the effect of perturbing targets for which there are drugs with established indications and known associations with biomarkers 58 . For example, Walker et al. selected genetic variants to proxy antihypertensive drugs from gene expression data and validated these instruments through their associations with systolic blood pressure, prior to applying MR analyses investigating the outcome of interest, Alzheimer's disease 55 . An instrument may also be examined in relation to potential confounders, in order to investigate violations of the independence and exclusion restriction assumptions necessary for MR 56 . Berry et al. illustrated such an approach during their evaluation of genetic proxies for vitamin D status 59 . In this study, the association of variants with social, dietary and lifestyle factors was investigated, to identify potential sources of confounding.
Complementary data may also be used for instrument evaluation. For example, MR studies designed to investigate the effect of genetically predicted variations in interleukin-6 (IL6) signaling would be expected to show that the selected instruments associate with molecules that are downstream of the pathway 49 . Genetic association estimates for the serum levels of several of these molecules are available, including IL6 and IL6 receptor (IL6R), C-reactive protein (CRP) and fibrinogen 60 . Hence, if the selected genetic instruments are valid proxies for IL6 signaling, they may be expected to show consistent effects across these molecules. An alternative example is provided by Wurtz et al. who demonstrated consistency between the metabolic changes associated with starting statins and metabolomic associations of the HMGCR variant rs12916 that was used to proxy statin effect 61 .

Analysis
Given a set of genetic instruments, the statistical methods used for MR investigation of drug target perturbation are similar to those used for MR more generally 62 . Interpretability is often facilitated by scaling of genetic associations to unit change in a trait related to drug target perturbation. For example, for analyses considering associations of variants in the HMGCR gene that are used to proxy statin drug effects, estimates may be scaled to change in low-density lipoprotein cholesterol levels 50,63-65 . As another example, for analyses investigating IL6R signaling using variants in the IL6R gene, effects may be scaled to downstream changes in CRP levels 49,60 . Care must be taken in the interpretation of such scaled estimates however, because although MR estimates may be directionally concordant to the effect of drug target perturbation on the biomarker, their magnitudes may not be comparable 66 .
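The scaling described above can be illustrated for a single variant with the ratio (Wald) estimate: the variant-outcome association divided by the variant-biomarker association, with a standard error from a second-order delta method approximation that assumes the two estimates are uncorrelated, as in a two-sample design. The numbers below are invented for illustration and are not estimates for any real variant.

```python
import math


def wald_ratio(beta_biomarker, se_biomarker, beta_outcome, se_outcome):
    """Ratio estimate of the outcome effect per unit change in the biomarker."""
    ratio = beta_outcome / beta_biomarker
    variance = (se_outcome ** 2 / beta_biomarker ** 2
                + beta_outcome ** 2 * se_biomarker ** 2 / beta_biomarker ** 4)
    return ratio, math.sqrt(variance)


# Hypothetical per-allele associations: the biomarker in standard deviation (SD)
# units and the outcome on the log odds scale.
log_or_per_sd, se = wald_ratio(beta_biomarker=-0.07, se_biomarker=0.005,
                               beta_outcome=-0.035, se_outcome=0.010)

# Express the result as an odds ratio per 1-SD lower biomarker level.
odds_ratio = math.exp(-log_or_per_sd)
ci = (math.exp(-log_or_per_sd - 1.96 * se), math.exp(-log_or_per_sd + 1.96 * se))
print(odds_ratio, ci)
```

As noted above, a scaled estimate of this kind indicates direction more reliably than magnitude.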
Statistical approaches used to evaluate potential bias from horizontal pleiotropy in MR analyses can also be used in MR investigating drug target perturbation 62 . However, variants selected as instruments for drug target perturbation are often selected from within a specific locus rather than from throughout the genome, and may be limited in number. Statistical sensitivity analyses for investigating horizontal pleiotropy typically require large numbers of genetic variants, and so may not be suitable for many drug target MR analyses 29 . Assessment of heterogeneity between MR estimates produced by variants in a single locus is still possible however, and can be used to inform on potential bias related to horizontal pleiotropy 53,67 .
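A heterogeneity assessment of this kind can be sketched with Cochran's Q computed over the variant-specific ratio estimates. The sketch below assumes the variants are uncorrelated (for example after clumping) and uses first-order inverse-variance weights; correlated variants within a locus would need the LD structure to be modelled. All input values are hypothetical.

```python
def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q for heterogeneity between variant-specific ratio estimates.

    Returns (Q, degrees of freedom, IVW estimate). Under homogeneity, Q is
    approximately chi-squared distributed with the returned degrees of freedom.
    """
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_x, se_y)]  # first-order weights
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1, ivw


# Hypothetical example: the third variant points in the opposite direction.
q, df, ivw = cochran_q([0.10, 0.12, 0.08], [0.05, 0.06, -0.04], [0.01, 0.01, 0.01])
print(q, df, ivw)
```

A Q value much larger than its degrees of freedom suggests that at least one variant may be acting through a different pathway.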
In an effort to better explore the target region and increase statistical power, genetic variants that have weaker associations with perturbation of the drug target may be considered as instruments 27,68 . Despite the potential benefits of this approach 69 , care must be taken to avoid weak instrument bias 54,70 . Under a two-sample design, weak instrument bias will attenuate MR estimates towards the null 71 .
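Instrument strength is commonly screened with an F statistic, which for a single variant in summary data can be approximated by the squared ratio of its exposure association to its standard error. The sketch below uses that approximation; the threshold of 10 is a widely used rule of thumb adopted here as an assumption, and the variant identifiers and values are hypothetical.

```python
def f_statistic(beta_exposure, se_exposure):
    """Approximate per-variant F statistic from summary statistics."""
    return (beta_exposure / se_exposure) ** 2


def weak_instruments(variants, threshold=10.0):
    """Return ids of variants whose F statistic falls below the threshold.

    variants: list of (variant_id, beta_exposure, se_exposure)
    """
    return [vid for vid, beta, se in variants if f_statistic(beta, se) < threshold]


# Hypothetical exposure associations for two candidate instruments.
candidates = [("var_a", 0.06, 0.005), ("var_b", 0.02, 0.010)]
for vid, beta, se in candidates:
    print(vid, round(f_statistic(beta, se), 1))
print(weak_instruments(candidates))
```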
MR can be used to assess a wide range of outcome traits and thus investigate potential effects of perturbing the drug target on these traits 72 . Such studies are often conducted as hypothesis-free, phenome-wide association analyses (PheWAS) 73,74 , and can be helpful for exploring potential adverse effects or identifying previously unknown re-purposing opportunities. For example, Schmidt et al. conducted a PheWAS of the PCSK9 locus to assess potential adverse effects of PCSK9 inhibitor drugs 75 .
In addition to using MR, it is also possible to generate genetic evidence supporting a causal effect of drug target perturbation on an outcome by identifying proportionality of genetic associations with traits proxying drug target perturbation and the outcome, at the corresponding drug target gene locus. Such investigation is referred to as genetic colocalization, and can help distinguish causation from genetic confounding (such as may arise due to horizontal pleiotropy). Popular colocalization methods include coloc 76 , moloc 77 , eCAVIAR 78 and HEIDI 79 . However, a limitation of many colocalization approaches is that they assume there is only a single causal variant at the considered locus.
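To make the single-causal-variant assumption concrete, the following sketch computes colocalization posterior probabilities from approximate Bayes factors, in the style of coloc. It is an illustration, not a reimplementation of that package: the prior probabilities (`p1`, `p2`, `p12`), the prior standard deviation of effect sizes and the summary statistics are assumptions chosen for the example. The five hypotheses are H0 (no association with either trait), H1 and H2 (association with one trait only), H3 (distinct causal variants) and H4 (a shared causal variant).

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor for association at one variant."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits at one locus."""
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need the same variants for both traits, at least two")
    shared = [a + b for a, b in zip(labf1, labf2)]
    # Distinct causal variants: sum over all pairs of different variants.
    distinct = [a + b for i, a in enumerate(labf1)
                for j, b in enumerate(labf2) if i != j]
    log_h = [
        0.0,
        math.log(p1) + logsumexp(labf1),
        math.log(p2) + logsumexp(labf2),
        math.log(p1) + math.log(p2) + logsumexp(distinct),
        math.log(p12) + logsumexp(shared),
    ]
    norm = logsumexp(log_h)
    return [math.exp(v - norm) for v in log_h]


# Hypothetical locus with five variants; both traits peak at the third variant.
se = 0.02
trait1 = [log_abf(b, se) for b in [0.01, 0.02, 0.20, 0.02, 0.01]]
trait2 = [log_abf(b, se) for b in [0.00, 0.01, 0.15, 0.01, 0.00]]
print(coloc_posteriors(trait1, trait2))
```

A high posterior probability for H4 supports a shared causal variant, whereas a high probability for H3 suggests that the two associations arise from different variants at the locus.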
Triangulation, the practice of integrating evidence from several different methodological approaches and data sources that each differ in their susceptibility to bias, is another important aspect of interpreting the analysis 80 .

Limitations
Because genetic variants are allocated at conception, MR estimates reflect the lifelong effects of perturbing a drug target, which may not be equivalent to the effects of interventions given at a specific point in time and for a shorter period (Figure 3). While these differences make it unlikely that MR estimates will accurately reflect the size of effect of a pharmacological intervention, they are still a useful indication of the presence and direction of causal effects 58 .
A further limitation of MR for studying the effects of drug target perturbation is that it may not account for post-transcriptional and post-translational modification in the pathway from a gene to a biologically functional protein.
Well-conducted MR analyses may be able to inform broadly on drug class effects, but not necessarily provide information on the effects of a specific pharmacological agent. For example, dihydropyridine and non-dihydropyridine subclasses of calcium-channel blocker antihypertensive drugs have distinct pharmacological effects. Genetic variants that affect blood pressure via calcium-channel blockade can estimate the effects of calcium-channel targeting drugs in general, but cannot differentiate the relative effects of dihydropyridine versus non-dihydropyridine subclasses 44 . Furthermore, MR in this context is applied to drug targets and not compounds, so it can be used to investigate the effects of perturbing a drug target, but is unlikely to offer insight into molecule-specific effects 28 .
Drug effects also vary in different tissues and populations, and similarly MR estimates for the effects of perturbing drug targets may only be valid if genetic association data from the relevant tissues or populations are used. This limitation can have implications for both identifying instruments and using MR to study drug effects, as highlighted in an example that used gene expression data to identify instruments for antihypertensive drug classes in the investigation of repurposing potential for the prevention of Alzheimer's disease 55 . Here, it is not clear whether the same genetic variants relate to gene expression in vascular, cardiac and brain tissue. Furthermore, to date, most genotyped samples have come from European ancestry populations. While this approach minimizes the risk of population stratification and false-positive GWAS signals, consideration of distinct ancestral groups is likely to offer novel insight. For example, genetic evidence on the effects of alcohol comes from variants in the ALDH2 gene, which are common in Asian, but not European populations 84 .

Conclusion
Over the last decade, MR has become a widely used epidemiological tool for estimating the causal effects of risk factors on clinical outcomes. On top of this well-studied application, there are now multiple examples highlighting its power for investigating drug effects. Despite its explicit assumptions, modern developments in statistical methodology and the widespread availability of multiple levels of omics data have provided the necessary resources to more reliably and efficiently use MR in order to study drug effects. As such, it has found a growing niche within the broader framework for exploring therapeutic targets, efficacy, adverse effects and repurposing potential. Given the high failure rates of clinical trials and that drug targets with genetic support are more likely to make it through the development pipeline 13,15,23 , MR can provide evidence for prioritizing agents to move forward in development.

Data availability
No data is associated with this article.

Is the topic of the review discussed comprehensively in the context of the current literature? Yes

Is the review written in accessible language? Yes
Are the conclusions drawn appropriate in the context of the current research literature? Yes

[...] conception. As such, MR studies usually only concern survivors of their genetic make-up and also of the disease of interest, i.e., they may be selected on genetic variants and the disease of interest, meaning they are open to selection bias. In this situation estimates for the drug target on the disease of interest might be attenuated, reversed, or spurious. In addition, inadvertently selecting survivors of their genetic variants and competing risk of the disease of interest will similarly affect the estimate for the drug target on the disease of interest, even if nobody has died of the disease of interest before recruitment. Examples of this issue include the genetic variants corresponding to PCSK9 inhibitors appearing to have no effect on stroke 1 and the genetic variant corresponding to statins, i.e., rs12916, appearing to have no effect on stroke. 2 These findings conflict with RCTs, and are likely the result of selection bias, as described above.
Unlike RCTs, MR is best suited for investigations of the effects of drugs intended to reduce the risk of disease but is not necessarily suitable for studies of drug effects in patients. An MR study of a disease treatment drug in patients would be open to selection bias because patients are typically survivors of their genetic make-up and of the disease of interest. This point has been explained in relation to genetic studies, 3 and solutions are under development for both genetic studies and MR. Researchers often want to use MR to assess effects of drugs in a patient sample just as they can in an RCT, so it needs to be explained when such an MR study would and would not be likely to give valid results.

2. Criteria for suitable samples for selecting genetic instruments for a drug need to be given. For example, should the sample be population-representative, in young people, from a trial of the relevant drug, or from controls only of a case-control study?

3. There are also some other points that need clarification. Paragraph 3, the major sources of bias in phenotypic observational studies are now understood to be confounding and selection bias, as explained here. 4,5 So, selection bias needs to be added to the second to last sentence of this paragraph. The last sentence of the same paragraph then needs to be amended slightly, because MR is open to selection bias, as explained above.

1. Paragraph 4, the elaboration of the exclusion restriction should explain what "horizontal pleiotropy" is because it has not been mentioned before. Please also make it clear that this assumption is violated if any open path from genetic variant to outcome exists that does not affect the exposure. 6 Instrument evaluation first paragraph, please similarly clarify the first sentence.

2. In the Limitations section, or elsewhere, please explain the differences between an RCT and an MR study in terms of selection bias and types of questions that can be answered by each design. Ensure Figure 3 reflects these differences.

Hong Kong
Review of the manuscript (Gill et al., 2021)
This is a clearly written paper concerning the use of genetics to infer drugs' effects on health outcomes. The paper summarises landmark studies and commentaries in this area of research and has covered the majority of the points. I have some comments for the authors' consideration.

Major comments
Although the use of genetics to infer effect of medication is promising, one of the challenges is the identification of a relevant variant to mimic medication use.  Figure 1 is a useful illustration, the last panel may be misinterpreted as decreasing LDL is associated with a greater risk of CAD. As such, Figure 1 should be amended for clarity. Secondly, as authors correctly acknowledge, restricting instruments to cis variation is the standard practice to mitigate the potential for genetic confounding, but it should also be noted that this still leaves open the possibility for genetic confounding, particularly in regions containing multiple protein-coding genes. Multivariable MR conditioning for the effects of the genetic instruments on the surrounding gene products may represent a way to assess whether findings may be confounded by proximal gene products when such data is available, an approach implemented by Porcu et al. (2019) (https://www.nature.com/articles/s41467-019-10936-0). 1 Similarly, in the section titled "Selecting genetic variants to proxy drug targets", a qualifying statement may be warranted regarding the selection of genetic variants based on association with known indications. If pQTL data is available, this approach may better serve instrument validation while a selection of variants is supported by evidence of association with protein expression or function. If no such data is available, evidence of association with known indications can be used, but it warrants a cautious approach with careful consideration of the cis window and other protein-coding genes therein. Thirdly, authors should further comment on the careful use of trans-instruments to approximate drug targets as was executed by Georgakis  have a role in triangulation of evidence." 
Perhaps this should be briefly commented on in the context of technological biases between different proteomic and metabolomic technologies (for example https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6277005/) 3 , thus further emphasizing the importance of ensuring the robustness of MR results using alternative instruments. As proteome-wide pQTL data become increasingly available to the public, such sensitivity analyses should become more commonplace. Overall, this review is exquisitely written, thoughtful, and appropriately referenced. Accordingly, we have no major concerns and only offer the aforementioned minor comments.