What genome-wide association studies reveal about the association between intelligence and physical health, illness, and mortality

The associations between higher intelligence test scores from early life and later good health, fewer illnesses, and longer life are recent discoveries. Researchers are mapping the extent of these associations and trying to understand them. Part of the intelligence-health association has genetic origins. Recent advances in molecular genetic technology and statistical analyses have revealed that intelligence and many health outcomes are highly polygenic, and that modest but widespread genetic correlations exist between intelligence and health, illness and mortality. Causal accounts of intelligence-health associations are still poorly understood. The contribution of education and socioeconomic status (both of which are partly genetic in origin) to the intelligence-health associations is being explored.


Intelligence, and health and death
Until recently, an article on DNA-variant commonalities between intelligence and health would have been science fiction. Thirty years ago, we did not know that intelligence test scores were a predictor of mortality. Fifteen years ago, there were no genome-wide association studies. It was less than five years ago that the first molecular genetic correlations were performed between intelligence and health outcomes. These former blanks have been filled in; however, the fast progress and accumulation of findings in the field of genetic cognitive epidemiology have raised more questions. Individual differences in intelligence, as tested by psychometric tests, are quite stable from later childhood through adulthood to older age [1,2]. The diverse cognitive test scores that are used to test mental capabilities form a multi-level hierarchy [1][2][3]; about 40% or more of the overall variance is captured by a general cognitive factor with which all tests are correlated, and smaller amounts of variance are found in more specific cognitive domains (reasoning, memory, speed, verbal, and so forth). Twin, family and adoption studies indicated that there was moderate to high heritability of general cognitive ability in adulthood (from about 50-70%), with a lower heritability in childhood [4]. It has long been known that intelligence is a predictor of educational attainments and occupational position and success [1].
Relatively recently, the 'ultimate validity' of intelligence test scores was discovered, that is, that higher intelligence significantly predicts later death (in other words, longer life). First, an Australian Vietnam Veterans study found that higher young-adult intelligence predicted lower risk of accidental deaths up to early middle age [5]. Then, a population-representative Scottish study found that intelligence test scores at age 11 years predicted deaths from all causes up to older age (the mid-70s) [6]. The association between intelligence test scores from early life and mortality from all causes has been widely replicated [7][8][9]. Intelligence from childhood and adulthood is associated with most of the major causes of death, with the exception of non-smoking-related cancers [10,11]. Broadly speaking, a one-standard-deviation advantage in intelligence in youth is associated with a risk of mortality up to older age that is lower by 20-25% or more; the effect sizes are hardly attenuated at all by adjusting for childhood socioeconomic status, though they are partly attenuated after adjusting for education and adult socio-economic status, which are possible mediators of the association [6,7,8,9,10,11].
In addition to mortality, intelligence test scores are associated with lower risk of many morbidities, such as cardiovascular disease, cerebrovascular disease, hypertension, cancers such as lung cancer, stroke, and many others, as obtained by self-report and objective assessment [12][13][14]. Higher intelligence in youth is associated at age 24 with fewer hospital admissions, lower general medical practitioner costs, lower hospital costs, and less use of medical services, and intelligence appeared to account for the associations between education and such health outcomes [15,16]. Higher intelligence is related to a higher likelihood of engaging in healthier behaviours, such as not smoking, quitting smoking, not binge drinking, having a more normal body mass index and avoiding obesity, taking more exercise, and eating a healthier diet [16][17][18].
The flood of intelligence versus mortality/illness/health-behaviours findings was captured by the term 'cognitive epidemiology' [19]. From early on until now, there have been speculations about the possible causes of these associations [6,10,14,20]. Briefly, there is acknowledgement that the causes of the associations are probably multiple, such as there being a constitutional (perhaps partly genetic) association between intelligence and health, and/or that intelligence's influence might act via more education, higher health literacy, and more affluent social class. Here, we examine evidence for possible genetic links between intelligence and health.

Genetic contributions to health, and to intelligence
There are at least three reasons to conduct genetic studies of phenotypes. First, one wants to understand the genetic architecture of a phenotype, that is, the nature of the genetic variants that contribute to variation in the phenotype. For example, a single mutation might have a large effect, as is the case in Mendelian diseases. By contrast, continuous traits might be more likely to be polygenic; that is, to have some of their variance caused by small contributions from many genetic variants. Second, having discovered the genetic architecture, one is interested in the specific genes in which variants have causal effects, that is, one wants to understand the molecular genetic mechanisms of variation. Third, knowing that there is some genetic contribution to a phenotype, one can ask how good a predictor the genotypic information is; that is, how well can one predict some variation in a phenotype from only genotypic information? Much recent progress has been made along these lines for illnesses and for intelligence.
Before the mid-2000s, genetic studies were done by three main methods. First, pedigree-based (twins, adoptees, and families) studies of relatives' phenotypic associations were used to estimate the heritability of phenotypes, and genetic correlations among them. Limitations of pedigree methods include the fact that several assumptions must be made in doing the modelling, and that one does not learn about the specific genes involved. Second, candidate gene studies tested hypotheses concerning whether certain genetic variants were associated with phenotypic differences. For example, the possession of the e4 allele of the gene for Apolipoprotein E (APOE) is associated with an increased risk of developing Alzheimer's disease. Limitations of the candidate gene method include the fact that most candidate gene findings are not replicated (APOE e4 possession is an exception to this), and that it is difficult to choose a candidate genetic variant from the millions that are known. Third, genetic linkage analysis was used to track genetic markers in families where specific phenotypes were common, to identify regions of the genome that segregate with the phenotype. The main limitations of this method are that large families are required and it identifies relatively large regions of the genome, rather than specific genetic variants or genes.
This changed with the advent and rise of genome-wide association studies (GWASs) [21] (see Box 1). Sample sizes for GWASs often began with a few thousand, but, as the polygenic architecture of many traits became clear (that is, the associations between individual genetic variants and phenotypes typically had very small effect sizes), it was necessary to form consortia so that the Ns of studies rose to the tens and then hundreds of thousands. Some GWAS consortia are now approaching and passing one million participants.
The typical finding (there are exceptions) in health and cognitive GWASs is that many genetic variants of small effect contribute to phenotypic variation. In 2017, a survey of the first ten years of GWASs' discoveries enumerated the SNPs that were associated with, for example, Crohn's disease, diabetes, blood lipid levels, heart function, height, bone density, red blood cell traits, metabolic traits, blood platelets, breast cancer, rheumatoid arthritis, blood metabolites, menarche, Alzheimer's disease, kidney function, lung function, and education [21]. Often, the number of genetic loci in which significant SNP associations are found runs to dozens or even hundreds for a single phenotype.
In 2011, the first apparently decently sized GWAS of intelligence appeared (N approximately 3500), and found no significant SNPs [22]. By the time the sample size was about 100 times greater, the number of independent genomic regions that were associated with intelligence was about 150 or more [23,24,25]. Figure 1 shows results from a recent GWAS of intelligence. Many of these SNPs are located in regions of the genome that have previously been associated with physical and mental illnesses. Therefore, we now know many actual DNA variants that have significant associations with intelligence test scores; there are probably thousands in total. Although it found no significant SNPs, the 2011 paper [22] did make a difference; it was the first study to estimate the heritability of intelligence from DNA data alone and in unrelated subjects. This used a then-new method, called GREML and run in the GCTA framework [26], which compared people's overall genetic similarity (based on common SNPs) with their phenotypic similarity (see Box 1). The common-SNP-based heritability of intelligence is estimated to be about 25% [24]. It is typical for this common-SNP-based heritability to be about half of that estimated from twin studies [27]. It is thought that this 'missing heritability' arises because some heritability is contributed by types of genetic variants other than the causal variants that are in linkage disequilibrium with common SNPs. Some new techniques are helping to find these and close the gap between twin-based and SNP-based heritability [28].

Genetics and the intelligence-health relationship
Three things are clear. First, higher intelligence in early life is a significant predictor of better health behaviours, fewer and later illnesses, and longer life. Second, many of the relevant health and illness outcomes, as well as health behaviours, have many SNPs associated with them, and have a detectable level of common-SNP-based heritability. Third, the same goes, genetically, for intelligence. Relatively new methods (a bivariate extension of GREML run in GCTA [29], and LD regression [30,31]; see Box 1) have allowed estimates of the genetic correlations between phenotypes. That is, we can test the extent to which the polygenic signature obtained by using the summary results from GWAS contributes to any two phenotypes, including between intelligence and health.

Box 1 Methods for investigating shared genetic aetiology between intelligence and physical health, illness and mortality

Genome-wide association study (GWAS)
GWAS is used to identify genetic variants associated with phenotypes. For diseases, large numbers of cases and controls are genotyped using testing arrays, most commonly from the companies Illumina or Affymetrix. The arrays contain up to 1 million genetic variants spread throughout the genome. For quantitative traits, large numbers of individuals on whom the trait is measured are genotyped. Using reference datasets, for example, HapMap, 1000 Genomes and the Haplotype Reference Consortium (HRC), several million genetic variants are then imputed to give greater genomic coverage and to harmonize datasets genotyped using different genetic testing arrays, containing different variants. Logistic (for case-control) or linear (for quantitative traits) regressions are then performed between each genetic variant and the phenotype. As millions of regressions are performed for each phenotype, a P-value of < 5 × 10⁻⁸ is usually considered genome-wide significant. As GWASs do not analyse every variant in the genome, they do not usually identify causative variants, but rather indicate regions of the genome that are implicated in a particular phenotype. If the same regions of the genome are identified in GWASs of multiple phenotypes, this may indicate that the phenotypes share genetic aetiology.
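The per-variant regression step for a quantitative trait can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the genotypes and phenotype are simulated, the function name `gwas_linear` is hypothetical, covariates and imputation are omitted, and P-values use a normal approximation rather than the exact test a real GWAS package would apply.

```python
import math
import numpy as np

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold quoted in the text


def gwas_linear(genotypes, phenotype):
    """Regress the phenotype on each variant separately; return (betas, p_values)."""
    n_people, n_snps = genotypes.shape
    y = phenotype - phenotype.mean()
    betas = np.empty(n_snps)
    p_values = np.empty(n_snps)
    for j in range(n_snps):
        x = genotypes[:, j] - genotypes[:, j].mean()
        sxx = x @ x
        beta = (x @ y) / sxx                      # simple-regression slope
        resid = y - beta * x
        se = math.sqrt((resid @ resid) / (n_people - 2) / sxx)
        z = beta / se
        betas[j] = beta
        # Two-sided P-value from the normal approximation (adequate for large N).
        p_values[j] = math.erfc(abs(z) / math.sqrt(2))
    return betas, p_values


# Simulated data (an assumption): 200 SNPs, only SNP 17 affects the trait.
rng = np.random.default_rng(0)
n_people, n_snps, causal = 5000, 200, 17
freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)
phenotype = 0.3 * genotypes[:, causal] + rng.normal(size=n_people)

betas, p_values = gwas_linear(genotypes, phenotype)
hits = np.flatnonzero(p_values < GENOME_WIDE_ALPHA)
print("genome-wide significant SNP indices:", hits.tolist())
```

The simulated effect is far larger than the very small per-variant effects typical of real polygenic traits, which is why a sample of 5000 suffices here but not in practice.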

Genome-wide complex trait analysis (GCTA): genomic-relatedness-based restricted maximum-likelihood (GREML), single component
GREML [26] is used to estimate the proportion of variance in a phenotype that is due to the linkage disequilibrium between genotyped variants and unknown causal variants. It gives a lower-boundary estimate of the heritability of a phenotype, as it does not include variance accounted for by genetic variants that are not well tagged by variants on the array, for example, rare variants.
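The idea of relating genetic similarity to phenotypic similarity in unrelated people can be sketched as follows. This is not GREML itself: the sketch swaps in Haseman-Elston regression, a simpler method-of-moments estimator that regresses phenotype cross-products on genomic relatedness, in place of restricted maximum likelihood. The data are simulated, all function names are hypothetical, and every SNP is treated as both genotyped and causal, so the tagging issue described above does not arise.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """Standardise each SNP, then average cross-products over SNPs."""
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return z @ z.T / genotypes.shape[1]


def haseman_elston_h2(grm, phenotype):
    """Slope of phenotype cross-products on relatedness over distinct pairs."""
    y = (phenotype - phenotype.mean()) / phenotype.std()
    upper = np.triu_indices(len(y), k=1)      # each pair of people once
    relatedness = grm[upper]
    similarity = np.outer(y, y)[upper]
    slope, _intercept = np.polyfit(relatedness, similarity, 1)
    return slope


# Simulated data (an assumption): true SNP heritability of 0.5.
rng = np.random.default_rng(1)
n_people, n_snps, true_h2 = 1000, 500, 0.5
freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)
z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
effects = rng.normal(scale=np.sqrt(true_h2 / n_snps), size=n_snps)
phenotype = z @ effects + rng.normal(scale=np.sqrt(1 - true_h2), size=n_people)

grm = genomic_relationship_matrix(genotypes)
h2_estimate = haseman_elston_h2(grm, phenotype)
print(f"estimated SNP heritability: {h2_estimate:.2f}")
```

The estimate should land near the simulated value, with sampling error that shrinks as the number of people grows.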

Bivariate GCTA
Bivariate GCTA [46] is an extension of GCTA-GREML that allows the genetic correlation between two phenotypes to be determined. Genetic correlation describes the proportion of the variance that two phenotypes share that is due to genetic causes. High genetic correlation between two phenotypes indicates that the phenotypes share genetic aetiology. This method requires the actual genotyping data from the sample and for the sample to have been measured on the two phenotypes in question.

Linkage disequilibrium (LD) regression
LD regression [30,31] is a method that can be used to determine the genetic correlation between two phenotypes, using only summary statistics from GWASs. LD regression estimates the genetic effect on a trait by measuring the extent to which the observed effect sizes from a GWAS can be explained by LD. The covariance between the genetic effects in two phenotypes can be indexed in a similar way; normalizing this genetic covariance by the heritabilities of the two traits will estimate the genetic correlation between them. This method does not require genotyping data, and can produce genetic correlations for two phenotypes that were GWAS-ed on different samples. For example, LD regression can be used to compute the genetic correlation between intelligence and longevity by using summary statistics from a GWAS of mortality and a GWAS of intelligence, both conducted on independent samples.
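For orientation, the relationships described in words above are commonly written as follows in the LD score regression literature. The notation is an assumption of this sketch and does not come from this article: N is the GWAS sample size, M the number of SNPs, ℓ_j the LD score of SNP j, a a term absorbing confounding, ρ_g the genetic covariance, N_s the number of overlapping individuals, and ρ their phenotypic correlation.

```latex
% Single trait: expected association statistic for SNP j given its LD score
\mathbb{E}\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1

% Two traits: expected product of z-scores for SNP j
\mathbb{E}\left[z_{1j} z_{2j} \mid \ell_j\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}

% Genetic correlation: genetic covariance normalized by both heritabilities
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

The slope of each regression on the LD score carries the heritability or genetic covariance, while sample overlap affects only the intercept, which is consistent with the method working on GWASs from different samples.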

Polygenic scores (PGS)
PGS analysis uses summary GWAS data for a given phenotype to test whether polygenic liability to that phenotype is associated with the same or a different phenotype measured in an independent sample. It allows the amount of variance in one phenotype attributed to the polygenic score for the same or a second phenotype to be calculated. A PGS for a particular phenotype can be calculated for each individual in a sample by summing, across SNPs, the known effect size of each SNP (obtained from a GWAS of that phenotype) multiplied by the number of reference alleles present for that SNP in that individual. PGSs can be calculated using PRSice [47]. For example, the polygenic score for risk of coronary artery disease is associated with cognitive ability in older adults [48].
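The weighted sum described above can be written as a short sketch. The effect sizes and allele counts are made-up numbers, the function names are hypothetical, and steps that dedicated software typically handles (such as LD clumping and P-value thresholding) are omitted.

```python
import numpy as np


def polygenic_scores(allele_counts, effect_sizes):
    """One score per person: sum over SNPs of effect size x allele count."""
    return allele_counts @ effect_sizes


def variance_explained(scores, phenotype):
    """Squared correlation between the score and a measured phenotype."""
    return np.corrcoef(scores, phenotype)[0, 1] ** 2


# Hypothetical GWAS effect sizes for three SNPs.
effect_sizes = np.array([0.02, -0.01, 0.03])

# Hypothetical allele counts (0, 1 or 2) for three people (rows) at those SNPs.
allele_counts = np.array([
    [0, 1, 2],
    [2, 2, 0],
    [1, 0, 1],
])

scores = polygenic_scores(allele_counts, effect_sizes)
print(scores)
```

In a real analysis, `variance_explained` would be applied to scores and a phenotype measured in a sample independent of the discovery GWAS.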

Mendelian Randomization (MR)
The methods described above will indicate whether or not two phenotypes share genetic aetiology, but do not reveal the direction of causation. Once shared genetic aetiology between two phenotypes is established, MR methods can be used to investigate whether one phenotype directly influences the other phenotype, or whether genetic variants independently affect both phenotypes. Bi-directional MR allows each phenotype to be used as the exposure or the outcome in turn, potentially providing support for the direction of effect. In MR, genetic variants (often variants that are genome-wide significantly associated with the relevant phenotype) are used as instrumental variables (IV) for the exposure. Unlike the exposure itself, these genetic variants should be largely independent of confounding factors and reverse causation. The IV is used to estimate whether the exposure causally influences the outcome. There are three basic assumptions of MR: first, the genetic variants are associated with the exposure; second, the genetic variants are only associated with the outcome of interest via their effect on the exposure; and third, the genetic variants are independent of confounders of both the exposure and the outcome. Biological pleiotropy, whereby a genetic variant independently influences multiple traits, may violate the second assumption. The more SNPs that make up the IV, the more likely it is that biological pleiotropy will be present. Methods have now been developed that test and correct for biological pleiotropy [49]. Two-sample MR allows the exposure and the outcome to be measured in different samples, and therefore the effect of the IV on the exposure and outcome can be obtained from GWAS summary data [50].
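A two-sample MR estimate from summary data can be sketched as below. The source does not name a specific estimator; this sketch assumes the widely used inverse-variance-weighted (IVW) combination of per-SNP Wald ratios, with made-up summary statistics and hypothetical function names. It does not test or correct for pleiotropy.

```python
import numpy as np


def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimate: SNP-outcome effect over SNP-exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted estimate of the exposure's effect on the outcome."""
    weights = 1.0 / se_outcome**2
    denom = np.sum(weights * beta_exposure**2)
    estimate = np.sum(weights * beta_exposure * beta_outcome) / denom
    standard_error = np.sqrt(1.0 / denom)
    return estimate, standard_error


# Made-up summary statistics for four SNPs used as instruments.
beta_exposure = np.array([0.10, 0.20, 0.05, 0.15])   # from the exposure GWAS
true_effect = -0.5                                   # assumed causal effect
beta_outcome = true_effect * beta_exposure           # from the outcome GWAS (no noise)
se_outcome = np.array([0.01, 0.02, 0.01, 0.015])

ratios = wald_ratios(beta_exposure, beta_outcome)
estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(ratios, estimate, standard_error)
```

Because the toy outcome effects are exactly proportional to the exposure effects, every Wald ratio and the IVW estimate recover the assumed effect; with real data the ratios scatter, and strong heterogeneity among them is one sign of possible pleiotropy.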
Polygenic signatures for many diseases were soon shown to be associated with intelligence [32]. Twin studies had suggested that part of the intelligence-mortality association might be genetic in origin, though there was disagreement about how much genetics contributed [33,34]. However, more recent studies have used genomic data.
The list of significant molecular genetic correlations between intelligence and physical health variables is now long [23,24,25,32]. Table 1 gives some examples. With regard to mortality, longevity has been used; parental age at death has also been used, as a proxy, because most relevant studies have not carried on long enough for many participants to have died. There is a positive correlation of 0.36 between intelligence and parental age at death. There are inverse genetic correlations between intelligence and both heart disease and hypertension, with effect sizes between -0.1 and -0.2. There is a small (<0.1) association with cholesterol, with higher 'good' cholesterol going with higher intelligence and the reverse for the 'bad' cholesterol. There is a moderate-sized inverse genetic association between intelligence and Alzheimer's disease. There is a positive genetic association, of 0.27, with intracranial volume, which is an indication of maximal brain volume in the life course. There are significant positive genetic correlations between intelligence and birth weight, lung function, happiness, and short-sightedness. There are significant negative genetic correlations between intelligence and body mass index, poor self-rated health, lung cancer, osteoarthritis, insomnia, smoking, waist-hip ratio, and long-sightedness. It must be stressed that these correlations are based on GWASs conducted on different samples; that is, the people on whom intelligence was measured were not the people on whom the health-based phenotype was assessed. Associations are interesting, but they do not explain why the correlations exist, or the direction of causation, which require further study and more new GWAS-based methods.

Understanding the intelligence versus physical health association, including the part played by genetics
As described above, genetic correlations have been identified between intelligence and many diseases, and physical health traits; moreover, polygenic scores for diseases and health traits predict intelligence. However, it is not clear if these findings are due to: (1) genetic variants influencing health traits/diseases, and then those health traits/diseases influencing intelligence; (2) genetic variants influencing intelligence, and then intelligence influencing health traits/diseases; or (3) genetic variants influencing general bodily system integrity [20] that influences both intelligence and health traits/diseases. (1) and (2) may be due to mediated pleiotropy, which can be tested for using a relatively new technique called Mendelian Randomization (MR) (see Box 1).

Figure 1 (caption fragment): Chromosome number is shown on the X-axis, with each dot representing one of the more than 12 million imputed single nucleotide polymorphisms. The red line represents genome-wide significance, that is, P < 5 × 10⁻⁸.

Genetics and intelligence-health correlations. Deary, Harris and Hill
Using a bi-directional two-sample MR approach, we identified no causal association between intelligence or educational attainment (a proxy measure of intelligence) and the physical health traits of body mass index (BMI), systolic blood pressure, height, coronary artery disease and type 2 diabetes [35], using data from the UK Biobank (N approximately 110,000) and large GWAS consortia. However, a larger, more recent study found MR-based evidence for potentially causal genetic effects of intelligence on larger intracranial volume, lower risk of Alzheimer's disease, lower body mass index, and greater likelihood of quitting smoking [25]. An MR study investigating the effect of education on obesity in about 2000 Finns concluded that education could be a protective factor against obesity, as measured using BMI [36].
Another study using education data from the SSGAC consortium and coronary heart disease data from CARDIoGRAMplusC4D (total sample size = 543,733) found that higher education was causally associated with reduced risk of coronary heart disease, lower likelihood of smoking, lower BMI and a more favourable blood lipid profile [37]. Sensitivity tests indicated that the results were unlikely to be driven by biological pleiotropy. A two-step MR study investigated the influence of vitamin B12 intake during pregnancy on cord blood DNA methylation and whether there is a causal influence on offspring's cognition in the Avon Longitudinal Study of Parents and Children (ALSPAC) [38]. A small causal effect of vitamin B12-responsive DNA methylation changes on children's cognition was identified. MR analysis has suggested that genetically predicted intelligence and education both had associations with Alzheimer's disease [39].

Table 1 note: Samples contributing to the three papers are not independent. Variables with no significant genetic correlation with intelligence in any of the three studies were not included. P values shown are nominal; some do not survive Bonferroni correction or correction for false discovery rate, as indicated in the three papers.
Another part of understanding the genetic contribution to intelligence-health correlations concerns other predictors of health inequalities, and intelligence's correlations with them. Intelligence, we saw earlier, is related to education and socio-economic status (SES), and those were known to be related to health inequalities before intelligence was known to have health associations. Although education and SES are principally thought of as social-environmental variables, both have been found to be partly heritable by twin-based and molecular genetic studies, both have high genetic correlations with intelligence, and both have genetic correlations with health outcomes; in addition, Mendelian Randomization results show bidirectional genetic effects between intelligence and education [25,40,41,42,43,44,45].

Conclusion
Intelligence has predictive power for many health outcomes. Part of that association is genetic. The genes involved, and the causal pathways of the associations, are being explored.

Conflict of interest statement
Nothing declared.