Genes to predict VO2max trainability: a systematic review

Cardiorespiratory fitness (VO2max) is an excellent predictor of chronic disease morbidity and mortality risk. Guidelines recommend that individuals undertake exercise training to improve VO2max and reduce chronic disease. However, there are large inter-individual differences in exercise training responses. This systematic review aimed to identify genetic variants that are associated with VO2max trainability. Peer-reviewed research papers published up until October 2016 in four databases were examined. Articles were included if they examined genetic variants, incorporated a supervised aerobic exercise intervention, and measured VO2max/VO2peak pre- and post-intervention. Thirty-five articles describing 15 cohorts met the criteria for inclusion. The majority of studies used a cross-sectional retrospective design. Thirty-two studies researched candidate genes, two used Genome-Wide Association Studies (GWAS), and one examined mRNA gene expression data in addition to a GWAS. Across these studies, 97 genes to predict VO2max trainability were identified. Studies found the phenotype to depend on several of these genotypes/variants, with higher responders to exercise training carrying more positive response alleles than lower responders (a greater gene predictor score). Only 13 genetic variants were reproduced by more than two authors. Several other limitations were noted throughout these studies, including the robustness of significance for identified variants, small sample sizes, limited cohorts focused primarily on Caucasian populations, and minimal baseline data. These factors, along with differences in exercise training programs, diet and other environmental mediators of gene expression, likely influence the ideal traits for VO2max trainability. Ninety-seven genes have been identified as possible predictors of VO2max trainability. 
To verify the strength of these findings and to identify if there are more genetic variants and/or mediators, further tightly-controlled studies that measure a range of biomarkers across ethnicities are required.


Background
The worldwide prevalence of chronic diseases, such as cardiovascular disease, cancers, stroke and diabetes, is rising [1]. Low cardiorespiratory fitness is strongly associated with chronic diseases and premature mortality [2][3][4][5][6][7]. To alleviate the health and economic burden associated with low cardiorespiratory fitness, health guidelines across the world recommend that individuals undertake regular exercise [1].
Exercise training can increase cardiorespiratory fitness and decrease chronic disease via a number of mechanisms [7]. Adaptations include improvements to cardiac size, stroke volume (the volume of blood pumped from the left ventricle per beat), cardiac output (the volume of blood pumped from the heart per minute), pulmonary blood flow and respiratory function, the supply of oxygen-rich blood to working muscles (increased number of capillaries and blood volume), muscle mitochondrial function and content, oxidative enzyme capacity, vascular wall health and function, and biomechanical efficiency [2,7]. It has been suggested that improvements in cardiorespiratory fitness in response to exercise training vary greatly between individuals, with some people responding well or very well ('responders' or 'high-responders') to exercise training, whereas others show only mild increases in cardiorespiratory fitness following similar training ('low-responders') [4,5,8-11]. Importantly, these responses need to be compared with within-subject random variation to ascertain true inter-individual differences [12]. The ability to change cardiorespiratory fitness is a multifactorial trait influenced by environmental factors (such as exercise training) and genetic factors [4,5,11]. Because cardiorespiratory fitness is one of the best integrative predictors of morbidity and mortality risk, it may be important to understand how genetics predicts the variability in response to exercise training. This knowledge could lead to targeted, personalised exercise therapy to decrease the burden of chronic disease.
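One common way to put this comparison into practice (an approach assumed here for illustration, not one described by the reviewed studies) is to estimate the typical error (TE) of VO2max measurement from repeated baseline tests, TE = SD(test 2 − test 1)/√2, and to label a participant a responder only when their training-induced change exceeds a chosen multiple of TE. The function names, the cut-off k = 2 and the data below are hypothetical.

```python
from math import sqrt
from statistics import stdev


def typical_error(test1, test2):
    """Typical error of measurement from two repeated baseline tests: SD of differences / sqrt(2)."""
    diffs = [b - a for a, b in zip(test1, test2)]
    return stdev(diffs) / sqrt(2)


def classify_responders(changes, te, k=2.0):
    """Label each training-induced change against +/- k x TE (k is an assumed cut-off)."""
    threshold = k * te
    labels = []
    for delta in changes:
        if delta > threshold:
            labels.append("responder")
        elif delta < -threshold:
            labels.append("adverse responder")
        else:
            labels.append("within measurement error")
    return labels


# Hypothetical VO2max values (L/min) from two baseline tests, then three training-induced changes
te = typical_error([3.0, 3.2, 2.8, 3.5], [3.1, 3.1, 2.9, 3.4])
print(round(te, 3))
print(classify_responders([0.40, 0.10, -0.20], te))
```

Without a control group or repeated baseline measurements, TE cannot be estimated, and an apparent 'low-responder' may simply reflect measurement noise.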
The gold standard measure of cardiorespiratory fitness is maximal oxygen uptake (VO2max), quantified as the maximal amount of oxygen the body can use in 1 min during dynamic work with a large muscle mass [13]. Research into human variation in VO2max was first undertaken over forty years ago, with several authors identifying a strong genetic influence on VO2max in twins [14,15]. Subsequent studies have identified significant familial aggregation for VO2max trainability. For example, authors have found greater variance between pairs of monozygotic (MZ; identical) twins than within pairs for VO2max training response after standardized aerobic training interventions [16,17]. The strongest evidence to date on this topic comes from the HEalth, Risk factors, exercise training And GEnetics (HERITAGE) family study [18]. Four hundred seventy-three Caucasian adults from 99 nuclear families completed 20 weeks of Moderate Intensity Continuous Training (MICT). The average increase in VO2max was 400 mL O2/min, with a range from −114 to +1097 mL/min. The variance in training response was two and a half times greater between families than within families, with a 47% heritability estimate for VO2max training response [18]. A major limitation of these findings, however, is that there was no comparator control group.
Since this familial longitudinal research, the Human Genome Project has completed sequencing of the human genome, resulting in significant advances in genetic analysis capabilities. This has led to a better understanding of genetic variation in large populations. Analyzing genetic variants at the population level using techniques such as candidate gene analysis, GWAS, whole-genome and exome sequencing and RNA expression analysis (RNA-seq or microarrays) has raised the possibility of 'personalized genomics', which aims to use biological profiling to provide more effective health management and treatment [5]. However, research in the field of exercise genomics is still in its infancy, and much work is needed before genomic tools could be used to personalize exercise training programs [19].
The aim of this study was to systematically review the literature and identify genetic variants that have been associated with VO2max trainability following an aerobic exercise training intervention. Given the infancy of this research field, the results should only be used to provide a basis for future research. Such research should aim to confirm previous findings and investigate mediators that can influence gene expression. Importantly, future genetic studies in this area should attempt to investigate the physiological functions that contribute to improving VO2max training response and overall health outcomes. Findings from ongoing research may assist clinical professionals in providing personalized, evidence-based medicine centered on phenotype, contributing to the fight against chronic disease.

Methods
A comprehensive search of four databases (PubMed, Embase, CINAHL, Cochrane) was completed from their inception until October 2016. Studies focusing on genes and their VO2max/VO2peak response to supervised aerobic training were sought with the following search terms: genetic profiling, polymorphism, single nucleotide polymorphisms, SNPs, genetic variants, predictor genes, trainability, endurance training, cardiovascular fitness, cardiorespiratory fitness, VO2max, VO2peak, aerobic power, aerobic fitness, aerobic capacity. A full list of search terms can be found at the end of this review.
Two authors (CW and JC) agreed on the criteria for inclusion. Articles were included if they were original, peer-reviewed research; included an aerobic intervention with a minimum of 75% supervision; included genetic variant testing; included a maximal measurement of VO2max/VO2peak using direct gas analysis during an incremental test (pre- and post-intervention); were conducted on humans; and were written in English.
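The inclusion criteria above can be expressed as a single screening predicate, which makes the rule explicit when it is applied to every record. This is a minimal sketch: the `Article` record and its field names are hypothetical and do not reflect the extraction grid actually used.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record fields, one per inclusion criterion
    original_peer_reviewed: bool
    aerobic_intervention: bool
    supervised_fraction: float          # proportion of the intervention that was supervised (0-1)
    genetic_variant_testing: bool
    max_test_direct_gas_pre_post: bool  # maximal incremental test with direct gas analysis, pre and post
    human_participants: bool
    language: str


def meets_inclusion_criteria(a: Article) -> bool:
    """Return True only if every inclusion criterion is satisfied."""
    return (
        a.original_peer_reviewed
        and a.aerobic_intervention
        and a.supervised_fraction >= 0.75
        and a.genetic_variant_testing
        and a.max_test_direct_gas_pre_post
        and a.human_participants
        and a.language == "English"
    )


candidate = Article(True, True, 0.8, True, True, True, "English")
print(meets_inclusion_criteria(candidate))
```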
Using an extraction grid, one author (CW) conducted the initial screening. After duplicates were removed and the titles and abstracts of articles were scanned, those meeting the inclusion criteria were reviewed. Data recorded from the review consisted of the authors' names and place of study, study design, study sample, tissue source, genotyping method, gene and variant examined, genotype, gene expression (if examined), intervention, possible mediators (such as medications and health concerns), and the influence of the investigated genetic variant on VO2max change. Further articles were retrieved by snowballing the reference lists of included articles. Articles included in the review are listed in Table 1.
A summary of key findings from the included articles is provided in Tables 2 and 3. Limitations were assessed by two authors (CW and JC) based on the intervention, genotyping method, study design and sample. Table 4 was developed to highlight which predictor genes for VO2max trainability merited further exploration. A third author (MW) examined Tables 1, 2, 3 and 4 to ensure that all genetic variants, genomic coordinates and genotypes were described with consistent annotation.

Results
Of the 1635 articles identified, 35 met the inclusion criteria (see Fig. 1). A summary of these articles is provided in Tables 1, 2 and 3. From the 35 articles, 97 genetic variants were identified as being significantly associated with VO2max trainability (Table 4).

Study characteristics
Across the studies, DNA samples from 4212 individuals were used. Tissue sources were predominantly blood leucocytes, lymphoblastoid cell lines and buccal cells. Genotypes were primarily identified through PCR-RFLP (polymerase chain reaction restriction fragment length polymorphism analysis) for candidate genes and Illumina HumanCNV370-Quad BeadChips for GWAS analysis (which can capture over 370,000 SNPs per participant).
Most reviewed studies (n = 32) used a single-group longitudinal design. However, one study compared three groups using a longitudinal design [28]. One study used retrospective data from two Randomized Controlled Trials (RCT) [20]; and one was a double-blind study [39].

Candidate gene studies
The candidate gene association approach requires a prior hypothesis that the genetic polymorphisms of interest are causal variants, or are in strong linkage disequilibrium (LD) with a causal variant, and would therefore be associated with a particular exercise-related phenotype at a rate significantly different (higher or lower) from that predicted by chance alone. This approach is effective in detecting genetic variants that are either directly causative or belong to a shared haplotype that is causative [54]. The thirty-two candidate gene studies selected genes based on their molecular function and possible association with VO2max trainability (Table 2).
Notes to Table 3: There were no other possible mediators (such as medications or health concerns) or other significant findings noted in the three studies in this table. Where possible, gene variants were annotated using the reference sequence (GRCh37/hg19). *Of the 39 SNPs identified via GWAS, 21 (*) explained 49% of the VO2max trainability variance (after regression analysis); the 15 most significant were then examined using data from the HERITAGE African-American, DREW and STRRIDE studies, and the replicated variants are in italics. +Eleven SNPs from a regression analysis explained ~23% of the estimated VO2max variance; 90% of RNA expression remained unchanged by exercise training. (++) Found in the study by Bouchard (2011) but not included in the regression analysis because they were not considered significant at the 0.00015 level. Top 20 GWAS-associated genes are based on second-best SNP p-values. #Candidate genes identified through the CANDID software based on literature search, GWAS association data, sequence conservation and gene expression; this yields a 'final score' rather than a p-value. Bolded text indicates moderate-to-strong related biological mechanisms that influence VO2max trainability. **(+) = significantly higher training response; (0) = no significant difference in training response between genotypes; (−) = significantly lower training response.

Genes associated with muscular subsystems
VO2peak can be influenced by muscle efficiency, and it has been hypothesized that genes encoding muscular subsystems may contribute to the genetic variability in VO2peak training response [33]. Twelve genes and 21 genetic variants related to muscular phenotypes were investigated in 935 (76 female) cardiac patients from the CARAGENE study [33]. Three of the 21 genetic variants were significantly associated (p < 0.05) with an increase in VO2peak following 3 months of MICT (2-3 × 90-min sessions per week at 80% HRmax). These variants were GR:c.68G > A (G/A genotype; n = 55 people with this genotype) in the glucocorticoid receptor gene (GR; rs6190), CNTF:c.115-6G > A (AA genotype, n = 21) in the ciliary neurotrophic factor gene (CNTF; rs1800169) and the AMPD1:c.133C wild type (CC genotype, n = 652) of the adenosine monophosphate deaminase gene (AMPD1; rs17602729). Furthermore, patients carrying a greater number of these variants showed a larger change in relative VO2peak (Area Under the Curve (AUC): 0.63; 95% Confidence Interval (CI): 0.56-0.70; p < 0.01). More specifically, those with a gene predictor score (GPS) of one or fewer positive response alleles had an average increase in VO2peak of 16.7%, whereas those with four or more positive response alleles had an average increase of 25%, with each positive response allele contributing approximately 1% (13.5 mL/min) to the increase in VO2peak. Caucasians aged between 17 and 65 years from the HERITAGE study who were homozygous (TT genotype) for the AMPD1:c.133C > T (p.(Gln45*)) (rs17602729) variant (n = 6) had a lower VO2max training response (<121 mL/min; p = 0.006) than those with the CT and CC genotypes (n = 497) following 20 weeks of MICT (3 × 50 min per week at 55-75% HRmax) [46].
The glutathione S-transferase P1 (GSTP1) c.313A > G variant has been associated with an impaired ability to remove excess reactive oxygen species. This is hypothesised to increase the exercise training response through greater activation of cell signalling pathways, resulting in positive muscle adaptations [45]. In a study of the response of 62 Polish women (19-24 years old) to 12 weeks of MICT (3 × 60 min per week at 50-75% HRmax), participants carrying the GSTP1:c.313A > G variant (GG + GA genotypes; n = 30) showed a 2 mL/kg/min greater improvement in VO2max than those with the AA genotype (n = 5) following training (absolute p = 0.029, relative p = 0.026, effect size = 0.06) [45].
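The comparisons in this section rest on different genetic models: GG + GA versus AA for GSTP1 corresponds to a dominant model for the G allele, TT versus CT + CC for AMPD1 corresponds to a recessive model for the T allele, and a gene predictor score counts alleles additively. The sketch below shows these three codings and a simple carrier versus non-carrier mean difference; the function names and example data are hypothetical, and a real analysis would add a significance test and covariates.

```python
from statistics import mean


def code_genotype(genotype: str, effect_allele: str, model: str) -> int:
    """Code a biallelic genotype such as 'GA' under an additive, dominant or recessive model."""
    copies = genotype.count(effect_allele)  # 0, 1 or 2 copies of the effect allele
    if model == "additive":
        return copies
    if model == "dominant":
        return int(copies >= 1)
    if model == "recessive":
        return int(copies == 2)
    raise ValueError(f"unknown model: {model}")


def group_mean_difference(genotypes, responses, effect_allele, model):
    """Mean response of coded carriers (1) minus non-carriers (0); binary models only."""
    if model == "additive":
        raise ValueError("use a dominant or recessive model for a two-group comparison")
    coded = [code_genotype(g, effect_allele, model) for g in genotypes]
    carriers = [r for c, r in zip(coded, responses) if c == 1]
    others = [r for c, r in zip(coded, responses) if c == 0]
    return mean(carriers) - mean(others)


# Hypothetical VO2max changes (mL/kg/min) compared under a dominant model for the G allele
print(group_mean_difference(["GG", "GA", "AA", "AA"], [6.0, 5.0, 3.0, 4.0], "G", "dominant"))
```

The choice of model matters: with few homozygotes (for example, n = 6 TT carriers), a recessive comparison rests on a very small group.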

Genes associated with electrolyte balance
The gene encoding the electrogenic transmembrane Na+/K+-ATPase may contribute to VO2max trainability by affecting electrolyte balance and membrane excitability in working muscles [24]. Examining Caucasian data from the HERITAGE study, it was found that those homozygous for a recurrent 3.3-kb deletion in exon 1 of the ATP1A2 gene (n = 5) had a 41% (45 mL/min) lower training response than heterozygotes (n = 87) [24]. This exon encodes part of the alpha-2 subunit of the Na+/K+-ATPase protein. This genotype also had a 48% (197 mL/min) lower VO2max training response than homozygotes (n = 380) for a repeated 8.8-kb sequence in exon 1 of the ATP1A2 gene following 20 weeks of MICT (p = 0.018) [24]. VO2max gains were 29% (130 mL/min) and 39% (160 mL/min) greater in offspring homozygous for a 10.5-kb deletion in exons 21-22 (n = 14) than in heterozygotes (n = 93) and homozygotes (n = 187), respectively (p = 0.017) [24].
The angiotensin-converting enzyme (ACE) gene contributes to the regulation of blood pressure and fluid and salt balance [55]. Elite endurance athletes are more likely to carry the insertion (I) allele [56], which is related to lower ACE activity and a reduced blood pressure response during exercise, whereas sprint/power athletes are more likely to carry the deletion (D) allele and the DD genotype [57], and consequently have higher ACE activity. Caucasians from the CARAGENE study with the homozygous II genotype (frequency of 0.23 and 0.18 for men and women, respectively) had a 2.1% greater VO2max training response (p = 0.047) than those with the DD genotype (frequency of 0.30 and 0.36 for men and women, respectively) [31]. When participants taking ACE inhibitors were excluded, the improvement increased by 3% (p = 0.013) [31]. On the other hand, VO2max trainability was 14-38% greater (p = 0.042) in HERITAGE Caucasian offspring with the DD genotype (n = 81) [25]. 
Three studies found no association between ACE or angiotensinogen genetic variants and VO2max training response: in 53 Caucasians (average age 19 years) following 12 weeks of military training [47]; in 147 multiethnic 19-24-year-old adults following 8 weeks of military training [39]; and in 83 Brazilian policemen (average age 26 years) following 17 weeks of MICT (3 × 60 min per week at 50-85% VO2peak) [48].

Genes associated with oxidative phosphorylation and energy production
Mitochondrial DNA (mtDNA) encodes several enzyme subunits involved in oxidative phosphorylation and may be a key factor in endurance and cardiorespiratory fitness [56]. Research on mtDNA variants in 41 inactive Japanese men (mean age 20.6 years) failed to find a significant difference in trainability after 8 weeks of MICT (3-4 × 60 min per week at 70% VO2max) [49]. In contrast, 3 men (17-25 years) with an mtDNA variant in the NADH dehydrogenase subunit 5 gene (ND5) had a lower VO2max training response than those with other mtDNA variants (a gain of ~0.22 L/min less; p < 0.05) following 12 weeks of MICT (3-5 × 45 min per week at 85% HRRmax) [50].
The muscle creatine kinase (CKM) gene has been associated with reduced fatigue through increased adenosine triphosphate (ATP) production [26,27]. Using data from the HERITAGE study, parents and offspring homozygous for the 1170-bp allele (n = 12) had a lower VO2max training response (3 times and 1.5 times lower, respectively; p < 0.05) than those with other CKM genotypes (n = 148). This genotype explained 9% and 10% of the inter-individual variation in VO2max change, respectively [26]. Nominal evidence of genetic linkage was found in siblings (n = 277): siblings who shared two alleles (1170 bp or 985 + 185 bp) identical by descent (IBD) at the CKM locus had more similar changes in VO2max than siblings sharing fewer alleles IBD (p = 0.04) [27].
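The sibling analysis above rests on the idea that siblings sharing more alleles IBD at a locus linked to trainability should have more similar training responses. A classic way to test this, assumed here for illustration because the review does not state the method used in [27], is Haseman-Elston regression: regress the squared sib-pair difference in VO2max change on the proportion of alleles shared IBD (0, 0.5 or 1). A negative slope is consistent with linkage. The data below are hypothetical.

```python
from statistics import mean


def haseman_elston_slope(ibd_proportion, change_sib1, change_sib2):
    """OLS slope of squared sib-pair trait differences on the proportion of alleles shared IBD."""
    y = [(a - b) ** 2 for a, b in zip(change_sib1, change_sib2)]
    x = list(ibd_proportion)
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical sib pairs: IBD sharing and VO2max changes (mL/min) for each sibling
ibd = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
sib1 = [500, 400, 300, 450, 350, 310]
sib2 = [300, 300, 290, 250, 300, 300]
print(haseman_elston_slope(ibd, sib1, sib2))
```

A real analysis would also estimate IBD sharing from marker data and test whether the slope differs significantly from zero.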
An earlier study focusing on inherited variation in muscle-specific genes found no association between CKM or adenylate kinase (AK1) variants and VO2max trainability in 295 Caucasians (18-30 years old) in a randomized controlled trial comparing 15 weeks of endurance training with maximal power contraction interval training [40]. Similarly, no association was found between the CKM gene and VO2max trainability in 937 Caucasian patients with coronary artery disease following 3 months of MICT (2-3 × 90-min aerobic sessions per week at 80% HRmax) [29].
The beta-2-adrenergic receptor (ADRB2) gene helps support oxygen delivery to working muscles via adrenergic receptor signalling [30]. In participants from the CARAGENE study, no association was found between ADRB2 genotypes or haplotypes and VO2max trainability [30].
The 5′-aminolevulinate synthase 2 (ALAS2) gene is highly expressed in erythroid cells and is essential for hemoglobin and myoglobin synthesis [53]. Seventy-two Chinese participants (18-22 years old), allocated to one of 13 ALAS2 genotypes with compound dinucleotide repeat lengths (157-184 bp), completed a 4-week 'HiHiLo' training program (varying between low- and high-altitude training at 75% VO2max) [53]. Baseline hemoglobin levels and the change in VO2max with training were significantly higher in subjects (n = 25) with dinucleotide repeats ≤ 166 bp (p < 0.05). No significant associations were found between VO2max trainability and genotypes of other genes related to oxygen transport and utilization in 102 young Chinese soldiers following 18 weeks of 3 × 5000-m runs per week [35,37,38]. These genes include mitochondrial transcription factor A (TFAM) [35] and the hemoglobin-beta locus (HBB) [38].

Hypothesis-free studies
Over the last decade, technological advances have allowed researchers to genotype millions of genetic variants (e.g. SNPs) in each individual, making it feasible to investigate the contribution of common variants to traits. As a result, unbiased, hypothesis-free genome-wide association studies (GWAS) of exercise- and health-related traits have emerged.
Three studies have used GWAS to identify genes associated with the VO2max response to exercise training [20,21,28]. These are outlined in Table 3.
The first combined two clinical trials with data from the HERITAGE study [28]. RNA expression profiling and VO2max testing were performed on 24 healthy, inactive Caucasian men (average age 24 years) before and after a 6-week training intervention (4 × 45-min cycling sessions per week at 70% VO2max). Muscle biopsies were collected from the vastus lateralis, and the RNA expression of genes, measured with oligonucleotide arrays, was correlated with changes in VO2max. Pearson correlations were used to identify the relationships between the median logit-normalised probe sets and the number of times they were selected. In the 24 subjects, using a median correlation cut-off greater than 0.3, 29 genes were selected more than 22 out of 24 times. The summed expression of these 29 genes had a significant linear relationship with VO2max change following endurance training (r² = 0.58, p < 0.00001). Across the group, VO2max improved on average by 14%, ranging from −2.8% to 27.5% (p = 0.0001). More than 20% of the group had a response of less than 5%. A gene set enrichment analysis found that the oxidative phosphorylation gene set was upregulated (False Discovery Rate (FDR) = 1.1%), which was associated with an increased reliance on lipids during training (RER decreased on average by 10% post-training, p < 0.0001). To determine whether these predictor genes would be similar in a different sample, a 12-week blind study of 17 young, active Caucasian men was conducted. Each week, training consisted of 1 day of testing, 2 sessions of interval training (3 × 3-min intervals at 40-85% Pmax) and 2 cycling sessions of 60-120 min (55-60% Pmax). The 29 predictor genes were also significantly associated with VO2max trainability in this group (p = 0.02). The haplotypes of these predictor genes were then genotyped using candidate genes identified from the HERITAGE study. 
Six genetic variants, in SMTNL2, DEPDC6, SLC22A3, METTL3, ID3 and BTNL9, were associated with VO2max trainability (p < 0.01 each). A stepwise regression model using 25 variants from the predictor set and 10 variants from the HERITAGE study (Table 3) found that eleven SNPs (included in Table 4; seven from the RNA predictor set and four from the HERITAGE project) accounted for 23% of the variance in residual VO2max gains, corresponding to approximately 50% of the genetic variability in VO2max trainability. Reciprocal RNA expression validation found that three of the four HERITAGE candidate genes enhanced the original RNA transcript predictor model. Overall, the expression of more than 90% of genes did not change with training. However, OCT3 was downregulated in high responders and H19 was upregulated in low responders (FDR < 5%). BTNL9, KLF4 and SMTNL2 also showed small but inconsistent changes in expression (i.e. dissimilar in high vs low responders) (FDR < 5%).
A GWAS examining 324,611 variants in the HERITAGE study was completed to identify possible predictor genes associated with VO2peak [20]. Based on single-variant analysis, 39 variants (Table 3) were associated with gains in VO2peak at p < 1.5 × 10^−4, although none of these achieved genome-wide significance [19]. The strongest predictor of training response was found in the acyl-CoA synthetase long-chain family member 1 (ACSL1) gene (4:g.185725416A > G; rs6552828), which accounted for 7% of the variance in training response (p = 1.31 × 10^−6). After a stepwise multiple regression analysis of the thirty-nine variants, 21 were suggested to account for (or at least contribute to) 49% of the variance in VO2max trainability (included in Table 4; p < 0.05). The strongest predictors were SNPs associated with PR domain-containing protein 1 (PRDM1); glutamate receptor, ionotropic, N-methyl-D-aspartate 3A (GRIN3A); the N-methyl-D-aspartate receptor (NMDA); potassium voltage-gated channel subfamily H member 8 (KCNH8); zinc finger protein of cerebellum 4 (ZIC4); and ACSL1. An unweighted 'predictor score' based on the contribution of these 21 variants to VO2max was created, in which each variant was scored '0' for a homozygote for the low-response allele, '1' for a heterozygote and '2' for a homozygote for the high-response allele. Individuals with a score of 9 or less (n = 36) had an average VO2max improvement of 221 mL O2/min, whereas those with a score of 19 or more (n = 52) had an average increase of 604 mL/min. The 15 most significant variants were tested for replication in African-Americans from the HERITAGE study, women in the Dose Response to Exercise (DREW) study (n = 112), and men and women in the Study of a Targeted Risk Reduction Intervention through Defined Exercises (STRRIDE) (n = 183) [20]. 
Variants in NDN (15:g.24008071T > C; rs824205) and DAAM1 (14:g.59477414C > T; rs1956197) were replicated in the DREW study, the ZIC4 variant (3:g.146957166T > C; rs11715829) was replicated in the STRRIDE study, and the CAMTA1 (7:g.7015105T > C; rs884736) and RGS18 (1:g.192059022G > A; rs10921078) variants were replicated in African-Americans from the HERITAGE study. Four variants in the genes supervillin (SVIL), neuropilin 2 (NRP2), titin (TTN) and carboxypeptidase (CPVL) identified by Timmons et al. [28] were also found by Bouchard et al. [20]; however, with p = 0.008, these variants were not included in the multivariate regression analysis.
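The unweighted predictor score described above is the sum of 0/1/2 codes across the 21 variants (range 0-42), and the groups reported (9 or less, 19 or more) are bands of that sum. A minimal sketch with hypothetical function names and genotype codes; the 'intermediate' band is added here only so that every score maps to a label.

```python
def predictor_score(codes):
    """Sum of per-variant codes: 0 = low-response homozygote, 1 = heterozygote,
    2 = high-response homozygote."""
    if any(c not in (0, 1, 2) for c in codes):
        raise ValueError("each code must be 0, 1 or 2")
    return sum(codes)


def score_band(score):
    """Bands matching the groups reported for the 21-variant score."""
    if score <= 9:
        return "low (<= 9)"
    if score >= 19:
        return "high (>= 19)"
    return "intermediate (10-18)"


# Hypothetical participant: high-response homozygote at 10 variants, low-response at the other 11
codes = [2] * 10 + [0] * 11
print(predictor_score(codes), score_band(predictor_score(codes)))
```

Because the score is unweighted, a variant explaining 7% of the variance counts the same as one explaining far less, which is a simplification.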
Using the HERITAGE cohort, an extended analysis of 2.5 million variants was performed [21]. To reduce bias associated with outlier variants, genes were ranked for association with changes in VO2max using the p-value of their second most significant variant. Even in this extended analysis, the ACSL1 gene had the most significant variant (4:g.185725416A > G; rs6552828), confirming the findings of Bouchard et al. [20], who identified the most significant variant at each gene (Table 3). The following genes and variants were also replicated in both studies: CAMTA1 (rs884736), RYR2 (rs7531957), g.63226200G > A (rs6090314), C12orf36 (rs12580476) and CD44 (rs353625) [20,21].
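Ranking genes by the p-value of their second most significant variant means that a gene cannot rank highly on a single outlying SNP alone. A minimal sketch of that ranking, assuming per-SNP results have already been mapped to genes; skipping genes with only one tested SNP is an assumption, and all gene and SNP identifiers below are placeholders.

```python
from collections import defaultdict


def second_best_p_by_gene(snp_results):
    """Map each gene to its second-smallest SNP p-value.

    snp_results: iterable of (gene, snp_id, p_value) tuples.
    Genes with only one SNP are skipped (an assumption).
    """
    by_gene = defaultdict(list)
    for gene, _snp_id, p in snp_results:
        by_gene[gene].append(p)
    return {gene: sorted(ps)[1] for gene, ps in by_gene.items() if len(ps) >= 2}


def rank_genes(snp_results, top=20):
    """Return up to `top` genes ordered by ascending second-best p-value."""
    scores = second_best_p_by_gene(snp_results)
    return sorted(scores, key=scores.get)[:top]


results = [
    ("GENE1", "snp1", 1e-6), ("GENE1", "snp2", 1e-3), ("GENE1", "snp3", 0.5),
    ("GENE2", "snp4", 1e-8), ("GENE2", "snp5", 0.2),  # one outlying SNP only
    ("GENE3", "snp6", 1e-4), ("GENE3", "snp7", 2e-4),
    ("GENE4", "snp8", 1e-9),                          # single SNP: skipped
]
print(rank_genes(results))
```

In this toy example GENE2 has the smallest single p-value but ranks last, because its support rests on one SNP.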
The gene prioritisation tool 'CANDID' was then used to rank candidate genes for changes in VO2max [21]. This was based on: 1) a weighted analysis of gene expression in targeted tissues; 2) GWAS p-values for change in VO2max; 3) literature related to candidate genes; and 4) cross-species sequence conservation [21]. The top-ranking candidate genes from the GWAS and CANDID tool (Table 1) were then investigated for possible biological mechanisms underlying changes in VO2max. As a result, variants were allocated into four groups: 1) broad effects on exercise-related processes (such as the electron transport chain, physical fitness, skeletal development and other cardiorespiratory markers); 2) moderately strong scores for selected exercise-related processes; 3) a mixture of high and low scores across several exercise-related processes; and 4) low scores across all exercise-related processes.
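CANDID combines several lines of evidence into one 'final score' per gene. The sketch below shows only the general idea of a weighted combination of criterion scores on a common 0-1 scale; it is not CANDID's implementation, and the equal weights, the capped −log10 transform of the GWAS p-value, the gene names and the example scores are all assumptions.

```python
import math

# Hypothetical equal weights for the four criteria listed above
WEIGHTS = {"expression": 1.0, "gwas": 1.0, "literature": 1.0, "conservation": 1.0}


def gwas_score(p_value, cap=8.0):
    """Map a GWAS p-value to [0, 1] via -log10(p), capped at `cap` (assumed transform)."""
    return min(-math.log10(p_value), cap) / cap


def final_score(criteria, weights=WEIGHTS):
    """Weighted mean of criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[k] for k in criteria)
    return sum(weights[k] * criteria[k] for k in criteria) / total_weight


genes = {
    "GENE_X": {"expression": 0.8, "gwas": gwas_score(1e-4), "literature": 0.2, "conservation": 0.9},
    "GENE_Y": {"expression": 0.1, "gwas": gwas_score(1e-2), "literature": 0.3, "conservation": 0.4},
}
ranking = sorted(genes, key=lambda g: final_score(genes[g]), reverse=True)
print(ranking)
```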
Variants and their involvement in pathways related to VO2max response were then examined [21]. Of the sixteen pathways identified, those related to pantothenate and coenzyme A (CoA) biosynthesis, PPAR signalling and immune function signalling had the highest level of 'burden' (variants contributing to trainability). Variants related to long-chain fatty acid transport (including ACSL1) and fatty acid oxidation strongly influence VO2max training response via lipid metabolism processes and the tricarboxylic acid cycle, both of which affect the availability of adenosine triphosphate and, in turn, the training response.

Predictor genes
Of the 35 articles analysed (candidate gene and GWAS studies), 97 predictor genes were identified as possible contributors to VO2max trainability (Table 4). These genes were based on what the authors deemed significant, or most significant, in their particular study. Thirteen of these predictor genes were replicated in at least two studies (bolded in Table 4). The traits for VO2max trainability (e.g. which genotype was related to the training effect and whether it was a low- or high-responding genotype) were not outlined for each variant, and hence this will require confirmation in future studies.

Discussion
This systematic review aimed to summarize genetic variants that have been identified as influencing VO2max trainability. We reviewed 35 studies that reported 97 genes associated with an exercise training-induced improvement in VO2max. It has been estimated that VO2max trainability has a significant heritable component of around 50% [39].
While most of the articles examined in this review focused on one or a few candidate genes/markers (n = 32), exercise-related phenotypes are complex, polygenic traits (i.e. influenced by many genes working together), with each genetic variant likely to contribute a small percentage (typically less than 1%) of the overall change in VO2max [33,39,61]. Relying on a single variant as a predictor is therefore misguided; rather, it has been suggested that a gene predictor score (GPS) based on numerous variants is more likely to distinguish higher from lower responders for VO2max trainability. For example, a score of '0' represents a homozygote for a low-response variant, '1' represents a heterozygote and '2' represents a homozygote for a high-response variant [20]. A higher summed score indicates a greater possible VO2max training response (and vice versa). A similar model has been proposed for elite athletes, to estimate the probability of an individual having a theoretically 'optimal' polygenic profile for endurance sports. This 'optimal' profile was quantified with a so-called 'total genotype score' (TGS, ranging from 0 to 100, with '0' and '100' being the worst and best genotype combinations, respectively), calculated with a simple algorithm combining candidate polymorphisms [62,63].
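A commonly used formulation of the TGS scores each polymorphism 0, 1 or 2 (as for the GPS) and rescales the sum to 0-100 by dividing by the maximum possible sum. The review does not give the algorithm, so the formula below, TGS = 100 × ΣGS / (2n) for n polymorphisms, is an assumption, and the example genotype scores are hypothetical.

```python
def total_genotype_score(genotype_scores):
    """Total genotype score on a 0-100 scale: 100 * sum(GS) / (2 * n),
    where each GS is 0, 1 or 2 and n is the number of polymorphisms."""
    n = len(genotype_scores)
    if n == 0:
        raise ValueError("at least one polymorphism is required")
    if any(gs not in (0, 1, 2) for gs in genotype_scores):
        raise ValueError("each genotype score must be 0, 1 or 2")
    return 100 * sum(genotype_scores) / (2 * n)


print(total_genotype_score([2, 1, 0, 1]))  # hypothetical profile of four polymorphisms
```

Rescaling makes scores comparable between panels with different numbers of polymorphisms, but, like the GPS, it weights every polymorphism equally.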
These predictor genes, along with muscle RNA and protein expression data, provide a sound platform to further explore the cellular mechanisms underlying VO2max trainability. Further research will need to consider several limitations identified in the literature to date. The lack of replication between articles, and the conflicting results for certain variants, may result from several main limitations, typically in study design. Firstly, most of the articles used a hypothesis-driven candidate gene approach (n = 32), several used retrospective data from similar cohorts (n = 19), and many lacked a control group and randomization (n = 31). While it is understandable that high-throughput SNP microarray and gene sequencing technologies were not available in the past, examining only one or a few gene variants (when the human genome is estimated to contain about 40 million common variants) makes it almost impossible to generate meaningful information. Similarly, the lack of a control group makes it challenging to distinguish between individual response to an intervention and within-subject random variation [64]. Secondly, most exercise training studies involve a relatively small number of participants (typically n = 20 to 30, with the exception of the HERITAGE and CARAGENE studies), which results in a lack of statistical power when associating genotype with phenotype. Many of the studies also failed to apply a robust significance criterion (p < 0.05 occurs approximately 10^6 times in the genome by chance). Thirdly, a lack of racial diversity (74.5% Caucasian) further limits the generalisability of the variants detected. Finally, many of the training studies were not tightly controlled in terms of nutrition, participant baseline data (at study entry), physical activity status and other lifestyle factors.
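Two of the statistical points above can be made concrete. Assuming independent tests (which overstates the effective number of tests, because nearby variants are in LD), the expected number of chance findings is the number of tests × α, and a Bonferroni-corrected threshold is α divided by the number of tests. The third function gives an approximate power for a two-group comparison using a normal approximation; the effect size and group sizes are hypothetical.

```python
from statistics import NormalDist


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of null tests reaching p < alpha by chance."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold keeping the family-wise error rate at alpha."""
    return alpha / n_tests


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of a
    standardized mean difference (ignores the negligible opposite tail)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * (n_per_group / 2) ** 0.5 - z_crit)


print(expected_false_positives(1_000_000))  # chance 'hits' at p < 0.05 across 10^6 tests
print(bonferroni_threshold(1_000_000))      # corrected per-test threshold
print(round(approx_power(0.5, 15), 2))      # moderate effect, 15 per group, alpha = 0.05
print(approx_power(0.5, 15, alpha=5e-8))    # same design at a genome-wide threshold
```

With 15 participants per group, even a moderate effect is detected less than a third of the time at α = 0.05, and almost never at a genome-wide threshold.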
Future research also needs to consider epigenetic variation in gene activity, which can occur in response to external factors such as additional physical activity, drugs, diet and environmental toxins [61,65]. Such epigenetic modifications can affect all adaptations to exercise training [10]. Beyond nutrition and baseline physical activity status, many other between-article differences in participants and protocols were not taken into consideration, including age, training duration, intensity and volume (e.g. MICT vs. HIIT), body weight, body fat percentage, medications, clinical versus healthy populations, sleep, psychological status and the gut microbiome. Each of these factors could act through epigenetic mechanisms (e.g. DNA methylation and histone acetylation) that influence gene expression and molecular function, and thereby the VO2max training response [61,66]. Whether genetic or epigenetic factors account for a larger share of the variability in adaptation in a given situation requires further exploration.
To address these limitations, larger-scale studies are required to ascertain whether the 97 predictor genes identified in this review replicate across diverse cohorts (e.g. different ethnicities, ages and genders). The Athlome Project Consortium, which includes the Gene SMART study, is an example of a current larger-scale investigation examining 'omic markers' of training response, elite performance and injury rates/predisposition in a variety of populations [67]. Ideally, future studies will complement and expand on this research, and will consider alternative exercise training intensities and volumes, lifestyle factors, general health, diet, medications and health history when designing interventions and analyzing data.
Furthermore, the role of the gut microbiome, and its influence on metabolism and physiology, needs to be explored. For example, the gut microbiota (which carry their own genomes) can interact with the cellular environment of host tissues to regulate gene expression [61]. Poor diet, stress, illness, antibiotic use, environmental toxins and poor lifestyle choices can increase inflammation within the gut, causing dysbiosis; this appears to contribute to chronic diseases and other illnesses irrespective of genotype, age and gender [68,69]. Interestingly, a recent human cross-sectional study found VO2max to be related to gut microbial diversity [70], suggesting a link between VO2max and gut microbes. Prebiotics and probiotics, resistant starch and a Mediterranean diet (dietary diversification) can alter the gut microbiome [68]. Investigating how the gut microbiome and the human genome interact to influence VO2max positively is therefore warranted.
With these points in mind, the analysis of stool samples, together with epigenetic, transcriptomic and proteomic analyses, may help to identify the aerobic training or lifestyle intervention best suited to upregulating or downregulating the genes, signaling pathways and molecular responses required for a greater VO2max training response. Tightly-controlled studies examining various mediators (training intervention, diet, lifestyle) and molecular biomarkers across diverse populations will help to capture accurate information on the ideal traits for VO2max trainability.

Conclusion
In total, 97 genes that may predict VO2max trainability were identified. Phenotype depends on several of these genotypes/variants, which together may account for approximately 50% of an individual's VO2max trainability. Higher responders to exercise training have more positive response alleles (a greater gene predictor score) than lower responders. Whilst these findings are exciting, further randomized controlled research with larger and more diverse cohorts is needed. Additional exploration is required to identify further genetic variants and the mediators (training intensity and volume, diet, drugs, other lifestyle factors) that can affect gene expression, molecular function and training response. Findings from this review and future research may help clinicians provide precision, evidence-based medicine centered on phenotype, contributing to the fight against chronic disease.