Selection of Genetic Instruments in Mendelian randomisation studies of sleep traits

Summary This review explores the criteria used for the selection of genetic instruments of sleep traits in the context of Mendelian randomisation studies. This work was motivated by the fact that instrument selection is the most important decision when designing a Mendelian randomisation study. As far as we are aware, no review has sought to address this to date, even though the number of these studies is growing rapidly. The review is divided into the following sections which are essential for genetic instrument selection: 1) Single-gene region vs polygenic analysis; 2) Polygenic analysis: biologically- vs statistically-driven approaches; 3) P-value; 4) Linkage disequilibrium clumping; 5) Sample overlap; 6) Type of exposure; 7) Total (R²) and average strength (F-statistic) metrics; 8) Number of single-nucleotide polymorphisms; 9) Minor allele frequency and palindromic variants; 10) Confounding. Our main aim is to discuss how instrumental choice impacts analysis and compare the strategies that Mendelian randomisation studies of sleep traits have used. We hope that our review will enable more researchers to take a more considered approach when selecting genetic instruments for sleep exposures.


Introduction
Sleep is a complex phenotype regulated by homeostatic and circadian processes [1] and characterised by multiple dimensions such as duration, quality and timing or chronotype [2]. Studies have shown that these specific sleep traits are moderately heritable, with twin studies estimating that between 44% and 50% of their variability is genetically determined [3,4], while SNP-based heritability studies have shown that heritability of self-reported traits ranges from 5% to 15% [5][6][7][8]. Moreover, these dimensions have been consistently associated with several adverse health outcomes. For example, inadequate sleep duration, poor quality, and inappropriate timing are associated with adverse health consequences [9]. However, as most research has used observational epidemiology to study these associations, whether these links are causal remained elusive until very recently.
Mendelian randomisation (MR) is a method that uses genetic variants to assess causal relationships [10]. The MR method addresses two questions: whether an observational association between an exposure and an outcome is causal and, if so, what the magnitude of the effect is [11,12]. MR is increasingly used to overcome some limitations of traditional observational epidemiology, such as unmeasured confounding and reverse causality [10], and the analysis is facilitated by MR packages, such as the widely used "MendelianRandomization" package for the R open-source software environment [13] or the "mrrobust" Stata package [14]. Recently, numerous MR studies have examined the causal relationship between genetic instruments of sleep traits and different health outcomes.
In the context of MR, a genetic variant can be considered an instrumental variable (IV) for a given exposure if it satisfies the following assumptions: i) it is associated robustly with the exposure of interest, ii) it does not influence the outcome through a pathway other than the exposure (horizontal pleiotropy) and iii) it is not associated with the outcome due to confounding [12]. Genetic variants used as IVs in MR are usually single-nucleotide polymorphisms (SNPs), which are common variations at a single position in the DNA sequence [10].
MR studies have steadily grown as genetic variants reliably associated with different exposures have increased over the last decade, thanks to genome-wide association studies (GWAS) [11]. GWA studies now test millions of genetic variants for their association with a given trait. Thus, finding genetic polymorphisms to use in an MR study is becoming more feasible. However, selecting optimal genetic instruments can be challenging [61].
Although several guides exist for conducting MR studies [12,61,62], these are not widely adopted in the field of epidemiology of sleep; thus authors using genetic instruments for sleep traits have taken different approaches to the selection process. This review explores the criteria used for instrument selection in MR studies of sleep traits, discussing how this choice impacts analysis and some steps in the selection process that are often overlooked. We aim to demonstrate the importance of a careful selection of instruments to conduct an MR study. Nonetheless, it is worth mentioning that the selection process will always depend on the aim of the research and the specific exposure under study, and while we focus on MR studies of sleep traits, many of the issues discussed here apply to other behavioral phenotypes as well. In addition, even though MR has been particularly useful for understanding the causal role of sleep phenotypes on several health outcomes, other causal methods must also be used for replication and triangulation purposes. A summary of the main points to consider when selecting sleep genetic instruments in MR is presented in Figure 1.

Criteria used to select genetic instruments in MR for sleep traits

1 Single-gene region vs. polygenic analysis
The first step in instrument selection is to decide whether the analysis will be performed using variants from a single gene region or multiple regions (a polygenic analysis). When a particular region has been reported to have a specific biological link to the exposure, the selection usually focuses on these variants [12]. This approach has the advantage of specificity, leading to a more plausible MR [63]. However, for complex risk factors such as sleep, no single gene region encodes this risk factor [64]. In fact, numerous genomic variants have been discovered by sleep GWAS in adults, indicating that sleep is a highly polygenic trait. For example, for insomnia, 554 risk loci have been reported in a recent study [65]. Thus, a polygenic analysis is often used in MR studies of sleep traits.
A polygenic analysis supposes the inclusion of multiple variants [12]. If the variants are all valid instruments, power is maximised because each SNP contributes incrementally to affect levels of the biomarker [61,63]. In the case of sleep, as common individual genetic variants confer small effects, the polygenic approach will typically have greater power to detect a causal effect than the single gene region approach [12].
2 Polygenic analysis: biologically-driven vs statistically-driven approach
For a polygenic analysis, one of two approaches may be chosen for selecting genetic variants: a biologically-driven or a statistically-driven approach [12]. The former implies selecting variants from regions with a highly plausible biological link with the exposure of interest [12,61]. The advantage of this approach is that these instruments may be less susceptible to horizontal pleiotropy [61]. However, biological understanding is rarely infallible [12], and the biological basis of sleep in humans is not fully understood [66]. Thus, instrument selection is often performed using a statistically-driven approach [67] or a combination of both approaches [12].
The statistically-driven approach exploits the increasing availability of SNPs associated with specific exposures in GWAS [61]. For this reason, authors tend to search for the latest and largest GWA study available and select SNPs robustly associated with the exposure of interest (MR assumption 1). However, it is important not to assume that the latest and largest study will always yield the best instruments. For example, most published GWAS of sleep traits have been performed in European samples and are also not sex-specific. Nonetheless, some GWAS have been performed in other ethnic groups, including Hispanic/Latino Americans [68] and multi-ancestry samples [69][70][71][72][73][74]. Furthermore, some have employed sex-stratified analyses for obstructive sleep apnea and insomnia, which display marked sexual dimorphism in disease prevalence [69,75]. However, further work is needed to better understand sex-related sleep differences, which have been associated with the influence of sex hormones on sleep regulation but have been understudied [76].
When using a statistically-driven approach, it is crucial to evaluate the reported SNPs carefully. Briefly, as described more thoroughly later in this review, some important criteria for instrument selection include: 1) evaluating the number of variants to incorporate and their p-value, minor allele frequency (MAF) and whether they are palindromic; 2) selecting independent variants; 3) avoiding sample overlap between the discovery GWAS and the data under study (where possible); 4) prioritising GWAS with well-measured/defined phenotypes and determining whether to use a continuous or a binary exposure; 5) choosing variants based on their total and average strength; and 6) taking into account confounders of the genetic instrument-outcome relationship. If the instruments are not suitable, they could be selected from a different GWA study, which could even mean choosing them from an older one. In addition, combining the SNPs into a single instrument is another option if various studies report adequate variants. Where possible, it is best practice to choose SNPs found both in the discovery dataset and in a replication cohort, as these are likely to be more reliable.
In Table 1, we present the latest GWAS of sleep phenotypes (for a detailed list of SNPs reported in the GWAS see Supplementary Table 1). Of note is that some sleep traits are still lacking robust instruments. This is the case of sleep quality and multidimensional sleep (whereby rather than a series of single separate characteristics, sleep is thought of as a multidimensional construct with domains including regularity, satisfaction, alertness/sleepiness, timing, efficiency, and duration, among others) [77,78].

P-value
A common statistical approach, and usually the first step in MR studies, is to evaluate the level of statistical significance of the genetic variants associated with the exposure of interest and to include all variants at a given level of significance. The conventional threshold is p<5×10⁻⁸ [12]. This threshold is the equivalent of p<0.05 after a Bonferroni correction for all the independent common SNPs across the human genome and is therefore referred to as the "genome-wide level of significance". Using this threshold has been shown to lead to robust results [79]. Nonetheless, given the recent mega-GWAS made possible by access to large biobanks, there have been proposals to change it to p<5×10⁻⁹ to decrease the chances of false-positive associations. However, the latest MR studies of sleep traits have selected variants using the traditional threshold [30,44,51,52].
In the field of sleep, some of the first GWAS published discovered genetic variants for restless leg syndrome, a neurological disorder that causes involuntary leg movements during sleep [80,81]. Recently, a large GWAS reported that some of the variants previously associated with restless leg syndrome did not reach genome-wide significance, emphasising the need for stringent thresholds [82]. Even though it is important to consider the p-value threshold, this is not the only factor to consider when selecting variants for an MR study.
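The arithmetic behind the threshold and its use as a filter can be sketched as follows. This is a minimal illustration, not the procedure of any cited study: the SNP identifiers and p-values are invented, and the figure of one million independent tests is an assumed round number chosen because it reproduces the conventional threshold.

```python
# Hypothetical GWAS summary rows: (SNP identifier, p-value for the exposure).
# All identifiers and values are made up for illustration.
gwas_rows = [
    ("rs_example_1", 3.2e-12),
    ("rs_example_2", 4.9e-8),
    ("rs_example_3", 6.0e-8),
    ("rs_example_4", 2.0e-9),
]

ALPHA = 0.05
# Assumption: roughly one million independent common SNPs across the genome.
N_INDEPENDENT_TESTS = 1_000_000


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


def select_by_p_value(rows, threshold):
    """Keep SNPs whose p-value is strictly below the threshold."""
    return [rsid for rsid, p_value in rows if p_value < threshold]


genome_wide = bonferroni_threshold(ALPHA, N_INDEPENDENT_TESTS)  # about 5e-8
selected_conventional = select_by_p_value(gwas_rows, genome_wide)
selected_stricter = select_by_p_value(gwas_rows, 5e-9)
```

With these invented values, the stricter threshold drops a variant that the conventional threshold retains, which is the trade-off between false positives and instrument size discussed above.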
The following sections will discuss other steps to assess whether variants are valid genetic instruments.

Linkage disequilibrium clumping
Linkage disequilibrium (LD) refers to the correlation between SNPs at different positions. This phenomenon occurs because of the physical proximity of variants on the chromosome [10]. In GWAS, the reported variants are often 'clumped' to near independence using distance-based or correlation-based thresholds [67].
The distance-based approach consists of pruning the variants to include those separated by a certain distance (usually 500,000 base pairs = 500 kilobases). In the correlation-based approach, only variants whose pairwise correlation falls below a certain threshold (usually r²<0.01, 0.1 or 0.2) are retained [67]. Implementing the correlation-based approach, Broberg et al. (2021) [19], in their study about the association between insomnia and pain, decided to use an r²=0.6 as their primary threshold and an r²=0.1 as their secondary threshold. Cullel et al. (2021) [24] and Zhou et al. (2021) [42] clumped genetic variants considering both approaches, an r²<0.001 and a distance of 10,000 kb, which is more conservative.
It is important to consider LD when selecting the variants as it could violate core MR assumptions. Genetic variants that are correlated with the variants used may have effects on competing risk factors. LDkit (a graphical user interface software) or PLINK (an open-source C/C++ toolset) could be used for calculating linkage disequilibrium [83,84]. Testing the association of the variants with potential confounders could reduce concerns about making invalid inferences due to LD [67].
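To make the combination of a correlation threshold and a distance window concrete, the following is a minimal sketch of a greedy clumping procedure. It is an illustration only and is not the implementation used by PLINK or LDkit. The SNP records, the pairwise r² table and the helper names are hypothetical, and pairs missing from the table are assumed to be uncorrelated. The defaults mirror the conservative r²<0.001 and 10,000 kb settings mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: int
    pos: int    # base-pair position
    p: float    # p-value for association with the exposure


def clump(snps, r2_lookup, r2_threshold=0.001, window_bp=10_000_000):
    """Greedy clumping sketch.

    SNPs are visited from the smallest to the largest p-value. A SNP is kept
    only if no already-kept SNP on the same chromosome lies within window_bp
    of it and is correlated with it at r2 >= r2_threshold.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s.p):
        in_ld_with_kept = False
        for index_snp in kept:
            if index_snp.chrom != snp.chrom:
                continue
            if abs(index_snp.pos - snp.pos) > window_bp:
                continue
            # Assumption: pairs absent from the table have r2 = 0.
            r2 = r2_lookup.get(frozenset((index_snp.rsid, snp.rsid)), 0.0)
            if r2 >= r2_threshold:
                in_ld_with_kept = True
                break
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


# Hypothetical input data.
snps = [
    Snp("rs_a", 1, 1_000_000, 1e-12),
    Snp("rs_b", 1, 1_200_000, 1e-9),   # close to rs_a and correlated with it
    Snp("rs_c", 1, 30_000_000, 1e-10),  # same chromosome, outside the window
    Snp("rs_d", 2, 1_100_000, 1e-8),    # different chromosome
]
r2_lookup = {frozenset(("rs_a", "rs_b")): 0.8}

independent = clump(snps, r2_lookup)
independent_ids = [s.rsid for s in independent]
```

In this toy input, rs_b is dropped because it lies within the window of the more significant rs_a and exceeds the r² threshold, while the other three variants are retained.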

Europe PMC Funders Author Manuscripts

Sample overlap
When selecting a GWA study, it is essential to understand in detail the sample studied. This is because when using genetic variants discovered in the analytical sample, a bias called "winner's curse" may occur. This bias implies overestimating the strongest variant in the data under analysis [85]. An overestimation will generally occur when the associations with confounders are stronger than expected by chance. Thus, an overlap between the genetic variant discovery dataset and the data under analysis may overestimate the variant-outcome associations and lead to false-positive results [67]. To overcome this issue, Liu et al. (2022) [44] excluded data from participants in the UKB from their COVID-19 outcome dataset since their exposures (sleep and circadian phenotypes) were derived from this biobank.
The ideal situation to avoid this bias is having two non-overlapping datasets, an approach called "Two-sample MR" [12]. MR-Base, a platform that integrates a database of GWAS results with an application programming interface, a web application and R packages, allows the automation of two-sample MR [86]. However, different datasets are not always available with the data or sample size necessary to perform the analysis. To mitigate potential issues with sample overlap, there are several alternatives thought to balance the risk of an imprecise estimation [67]. One option is to calculate the bias due to sample overlap, which can be done with the formulae from Burgess, Davies & Thompson (2016) [87]. Henry et al. (2019) [45] did this in their MR study about the impact of sleep duration on cognitive outcomes.
In their study about the association of insomnia with depressive symptoms and subjective well-being, Zhou et al. (2021) [42] also calculated sample overlap finding a bias ranging from 3% to 14%.
Another possible solution is to perform the MR analysis using a reduced genetic instrument replicated in an independent cohort, which could be a good option as a sensitivity analysis for studies that are unable to bypass sample overlap. In our own MR study, which examined the association between genetically-instrumented habitual daytime napping (using 92 SNPs) and cognitive function and brain volume, we replicated our findings using a reduced instrument consisting of 17 SNPs that were replicated in an independent cohort (23andMe) with no sample overlap with UKB (our analytical sample) [88]. Additional analyses with this reduced instrument were largely consistent with our main findings. We are unaware of other studies using genetic instruments of sleep traits taking this approach. However, a study which investigated the relationship between glycaemia and cognitive function, brain structure and incident dementia, used a reduced genetic instrument for diabetes to avoid the "Winner's curse" bias [89].

Type of exposure
When deciding which GWA study to select, it is important to prioritise well-measured/defined phenotypes used for identifying the genetic instruments. One aspect to consider is whether the phenotype was measured using self-reported data or an objective method (e.g. accelerometer-derived data). Many of the GWAS of sleep traits available are based on self-report questions, but some used and/or have been replicated with accelerometer-derived data, polysomnography or electronic medical records [69,90,91]. Moreover, those using self-reported data sometimes have support from objective measures. For example, Dashti et al. (2019) [92] tested whether the 78 loci found for self-reported habitual sleep duration (using a question on hours of sleep) in their GWA study were also associated with accelerometer-derived sleep estimates. Another study by Dashti et al. (2019) [27] found that the variants were also valid when sleep duration was determined by bed and wake times. Ideally, genetic instruments discovered and replicated based on objective data should be selected.
Moreover, it is essential to understand how the exposure was analysed. For example, Lane et al. (2019) [93] performed two parallel GWAS for frequent and any insomnia symptoms based on participants' responses to the question "Do you have trouble falling asleep at night, or do you wake up in the middle of the night?". For frequent insomnia, they considered participants who responded "usually" as cases and "never/rarely" as controls, with those reporting "sometimes" being excluded. For any insomnia, they considered participants who responded "sometimes" or "usually" as cases and "never/rarely" as controls. On the contrary, a GWA study by Jansen et al. (2019) [8], using the same question, defined insomnia cases as participants who answered "usually", while participants who answered "never/rarely" or "sometimes" were defined as controls. In this example, insomnia symptoms are analysed in three different ways using the same underlying question. Understanding how the exposure was measured is crucial for adequately interpreting results.
Another crucial aspect of genetic instrument selection is whether the exposure is continuous or binary. It is well-established that continuous measures should be used where possible in MR [94]. However, using continuous exposures has the caveat that sometimes MR studies aim to test whether a particular disease status (e.g., insomnia) might be causally related to a specific outcome. Furthermore, some sleep traits are often considered binary: chronotype (evening vs. morning types), napping (frequent vs. infrequent nappers), and duration (longer vs. shorter sleepers), amongst others. In the case of using a binary exposure, it is important to be aware of its limitations. Burgess and Labrecque's paper (2018) [94] explained that the problem arises when using a binary exposure that dichotomises a continuous variable (e.g. short/long sleep arises from dichotomised sleep duration). In the cases of MR studies using these types of exposures, the results should be conceptualised in terms of the underlying continuous risk factor.

Total (R²) and average strength (F-statistic) metrics
Selection of genetic instruments is often conducted by considering each variant's effect size to avoid weak-instrument bias. This bias can occur when the genetic instruments explain a small proportion of the variance in the exposure. Weak instruments may lead to non-robust results and bias the estimates towards the confounded observational estimate [95]. Some of the most commonly used effect indicators are the proportion of the phenotypic variance explained by all of the genome-wide significant SNPs (R²) and the F-statistic obtained from regressing the exposure on the genetic instrument (in a multivariable linear regression) [62]. The R² provides information about the total strength of the genetic instrument, and usually, the larger, the better. Swerdlow et al. (2016) [62] argue that the R² is the most useful effect metric when selecting genetic instruments for MR analysis. However, the F-statistic provides information about the average strength of the instrument, with an F>10 indicating that substantial weak instrument bias is unlikely [95].
Several options for obtaining F-statistics are available. If individual-level data are available for the exposure, the 'Individual-level data regression' approach can be performed. However, if individual-level data are not available and the R² from the exposure GWA study is obtainable, the Cragg-Donald F-statistic method may be used [95]. This method uses the R², sample size (n), and the number of instruments (k) to calculate the statistic: F = ((n − k − 1)/k) × (R²/(1 − R²)) [67]. Liu et al. (2021) [31] used this formula, reporting an F-statistic of 143.24 in their study about the association between genetically-instrumented insomnia and cardiovascular diseases. When the R² is unknown, the 't-statistic' summary-level method can be used (F = β²/SE²). In this case, the F-statistic will be an approximation because it uses the sample size for the discovery GWA study, not the one from the data under analysis. Finally, the "MendelianRandomization" R package allows the calculation of the F-statistic [13].
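The two summary-level calculations described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input values are invented and are not taken from any cited study, and averaging the per-SNP statistics is one simple way to summarise them rather than a prescribed method.

```python
def f_from_r2(r2, n, k):
    """F = ((n - k - 1) / k) * (R^2 / (1 - R^2)).

    r2: proportion of exposure variance explained by the instrument (0 <= r2 < 1)
    n:  sample size
    k:  number of genetic variants in the instrument
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if k < 1 or n <= k + 1:
        raise ValueError("need k >= 1 and n > k + 1")
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))


def f_from_summary(beta, se):
    """Per-SNP approximation F = beta^2 / SE^2 from summary statistics."""
    if se <= 0:
        raise ValueError("se must be positive")
    return (beta / se) ** 2


def mean_f(betas_and_ses):
    """Average of the per-SNP F-statistics for a list of (beta, se) pairs."""
    values = [f_from_summary(beta, se) for beta, se in betas_and_ses]
    return sum(values) / len(values)


# Hypothetical inputs: an instrument of 50 SNPs explaining 1% of the variance
# in a sample of 100,000 participants.
example_f = f_from_r2(r2=0.01, n=100_000, k=50)

# Hypothetical per-SNP summary statistics (beta, standard error).
example_mean_f = mean_f([(0.05, 0.01), (0.03, 0.01)])
```

Either value can then be compared against the F>10 rule of thumb mentioned above.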

Number of SNPs
MR studies including large numbers of genetic variants are rapidly increasing. This growth is related to the proliferation of GWAS and the desire to obtain more precise estimates. However, as previously discussed, not all variants are valid IVs [96], and an enlarged set of genetic instruments is not always better [12]. Selecting a large number of variants could lead to a larger R² but a weaker F-statistic and greater chances of including pleiotropic variants, violating a core MR assumption. Including more variants also allows the use of more robust methods, including common sensitivity analyses such as the MR-Egger test. Conversely, fewer variants will lead to a lower R² but potentially a greater F-statistic, which could lead to an instrument with insufficient power [96].
To understand how the strength of the instrument depends on the number of SNPs, we present the study by Liu et al. (2022) [30] on the relationship between sleep traits and glycated haemoglobin (HbA1c) (see Table 2). In this study, the F-statistic for all exposures was higher than 10 (which indicates an appropriate average strength), while the R² ranged between 0.06% and 2.09%. In the case of long sleep duration, including fewer variants (five) led to a low total strength (R²=0.06%) and a good average strength (F-statistic=41).

Minor allele frequency and palindromic variants
MAF is the proportion of minor alleles for a specific SNP in a given population [67]. In other words, it is the frequency at which the second most common allele occurs. Usually, GWAS identify common variants [97]; however, SNPs with a wide distribution of MAFs can sometimes be included. Some MR studies exclude variants with a low MAF because causal estimates from those variants may have low precision [96,98]. For example, Chen et al. (2021) decided to remove variants with a MAF<1% in their study about the association between sleep traits and low bone mineral density. However, excluding variants with low MAF could mean removing variants associated with the exposure of interest. For example, low-frequency variants in PERIOD3 have been associated with chronotype [7] and familial advanced sleep phase syndrome [99].
Another potential problem is palindromic variants because they can introduce ambiguity into the identification of the effect allele. A palindromic SNP occurs when the two possible alleles are complementary base pairs (A/T or C/G), so that the alleles read the same on the forward and reverse strands. Additional care should be taken with palindromic variants because studies might report effects of the same SNP using different strands (e.g. a study reports an SNP with A/G alleles and another with T/C alleles). In those cases, the ambiguity can be resolved if the effect allele frequency is reported and the MAF is substantially below 50% [100]. For example, if a specific SNP has alleles A/T, with allele A frequency being 0.11 in the GWA study and 0.91 in the data under study (both coding this allele as the effect allele) and both studies have the same ethnic origin, this means that authors used different reference strands. In this case, it is necessary to switch the direction of the effect in either the discovery GWA study or the analytical sample, a procedure called variant harmonisation [86].
However, if it is not possible to verify that alleles are correctly orientated, it may be necessary to take some precautions [67]. There are options to deal with this problem: replace the variants with suitable, non-palindromic linkage disequilibrium proxies, perform sensitivity analyses to evaluate the impact of these variants on the results or exclude them [100]. For example, in a study by Alimenti et al. (2021) [16] about causal links between habitual sleep duration/napping and macronutrient composition, palindromic SNPs with MAF close to 0.50 were excluded and the remaining palindromic instruments were aligned based on their MAF.
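The harmonisation logic described in this section can be sketched as follows. This is a simplified illustration, not the algorithm of MR-Base or of any cited study. The record layout, the function names and the `ambiguity_margin` of 0.08 around a frequency of 0.5 are all assumptions made for the example, and the sketch presumes both datasets come from populations with similar allele frequencies.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def minor_allele_frequency(effect_allele_frequency):
    """MAF is the smaller of the two allele frequencies of a biallelic SNP."""
    return min(effect_allele_frequency, 1.0 - effect_allele_frequency)


def is_palindromic(allele_1, allele_2):
    """True for A/T and C/G SNPs, whose alleles are each other's complement."""
    return COMPLEMENT[allele_1] == allele_2


def harmonise_outcome_beta(exposure, outcome, ambiguity_margin=0.08):
    """Return the outcome effect aligned to the exposure effect allele.

    Each record is a dict with keys effect_allele, other_allele, eaf and beta.
    Returns None when the SNP cannot be aligned and should be excluded.
    """
    ea, oa = exposure["effect_allele"], exposure["other_allele"]
    out_ea, out_oa = outcome["effect_allele"], outcome["other_allele"]
    beta = outcome["beta"]

    if is_palindromic(ea, oa):
        if {out_ea, out_oa} != {ea, oa}:
            return None
        # Frequencies close to 0.5 cannot reveal which strand was used.
        distance_from_half = min(abs(exposure["eaf"] - 0.5),
                                 abs(outcome["eaf"] - 0.5))
        if distance_from_half < ambiguity_margin:
            return None
        # Express the outcome frequency and effect for the allele labelled `ea`.
        if out_ea == ea:
            out_eaf = outcome["eaf"]
        else:
            out_eaf, beta = 1.0 - outcome["eaf"], -beta
        # If the same label is minor in one study and major in the other,
        # the studies are assumed to have used different strands.
        same_side = (exposure["eaf"] < 0.5) == (out_eaf < 0.5)
        return beta if same_side else -beta

    # Non-palindromic SNP: try the reported strand, then the complement.
    for a1, a2 in ((out_ea, out_oa), (COMPLEMENT[out_ea], COMPLEMENT[out_oa])):
        if (a1, a2) == (ea, oa):
            return beta
        if (a1, a2) == (oa, ea):
            return -beta
    return None


# The A/T example from the text: allele A at 0.11 in one study and 0.91 in the other.
exposure_at = {"effect_allele": "A", "other_allele": "T", "eaf": 0.11, "beta": 0.03}
outcome_at = {"effect_allele": "A", "other_allele": "T", "eaf": 0.91, "beta": 0.2}
flipped_beta = harmonise_outcome_beta(exposure_at, outcome_at)
```

For the A/T example, the sketch reverses the sign of the outcome effect, matching the switch of direction described above. A palindromic SNP whose frequency falls within the margin around 0.5 is returned as `None`, which corresponds to the exclusion option.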

Confounding
The third MR assumption states that the genetic variant-outcome association is unconfounded [101]. Violations of this assumption could be due to at least two different types of confounding. One is confounding by ancestry (e.g., if SNPs associated with sleep duration have higher/lower frequencies in different ancestry groups in the sample under study and additionally, cultural differences have an impact on the outcome under study), which could be controlled by restricting the sample to a single ancestry group, and/or adjusting for principal components of ancestry. A second source of confounding occurs if SNPs associated with the exposure of interest are also associated with common confounders of the relationship under study. One of the advantages of MR is that it exploits the fact that genotypes are not generally associated with confounders. However, such associations may occur, especially when using weak instruments or small samples [67]. Thus, it is important to test whether the genetic instruments are associated with confounders of the exposure-outcome relationship [10].
To address this issue, authors must first identify common confounders previously reported between their exposures and outcomes. For example, in the case of the association between obstructive sleep apnea and hypertension, weight and age are proposed as two of the main confounding factors in this putative relationship [102]. For the long sleep-mortality association, some authors have argued that depression is most likely to confound this relationship [103]. Therefore, it is essential that, regardless of the exposure of interest, a literature review is carried out to identify the confounders to be considered.
Then, authors often statistically test associations between their genetic instrument and variables reported in the literature as potential confounders in the exposure-outcome association. This is crucial as MR aims to give causal estimates that are not biased due to confounding [67]. In the MR study performed by Henry et al. (2019) [45], the authors explored the validity of their instruments by testing associations of potential confounders (such as sex, age, educational attainment and use of sleep-inducing medication) with their sleep duration genetic score.
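One simple way to carry out such a check is to build a weighted genetic score and test its correlation with each candidate confounder. The sketch below is a minimal illustration under stated assumptions and does not reproduce the analysis of any cited study: the weights, genotypes and ages are invented, the function names are hypothetical, and a real analysis would typically use regression models with appropriate covariates and a much larger sample.

```python
import math


def weighted_score(dosages, weights):
    """Sum of effect-allele counts (0, 1 or 2) multiplied by per-SNP weights."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical per-SNP weights (e.g. exposure GWAS effect sizes).
weights = [0.02, 0.05, 0.01]
# Hypothetical effect-allele counts for six participants (rows) and three SNPs.
genotypes = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 1, 1], [0, 2, 2], [2, 0, 0]]
# Hypothetical candidate confounder: age in years.
age = [54, 61, 47, 58, 66, 50]

scores = [weighted_score(row, weights) for row in genotypes]
r_score_age = pearson_r(scores, age)
t_score_age = t_statistic(r_score_age, len(age))
```

A large absolute t statistic for a given confounder would flag the instrument for closer inspection, for example by identifying and removing the variants driving the association.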

Conclusions
In this article, we explored the criteria used for selecting genetic instruments for sleep traits in the context of MR, discussing how instrumental choice impacts analysis. We also presented GWAS of sleep phenotypes since 2016 and discussed MR studies using genetic sleep instruments to date. We are convinced that instrument selection is the most important decision when designing an MR study and that this is becoming even more important as the number of sleep genetic variants found in GWAS increases. We hope this review will aid researchers in designing robust MR studies and continue to elucidate our understanding of the causal role of sleep on health outcomes.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material. Sleep Med. Author manuscript; available in PMC 2024 January 11.

Figure 1. Flowchart with the main points to consider when selecting sleep genetic instruments in MR.
Paz et al. Sleep Med. Author manuscript; available in PMC 2024 January 11.