Detection of Functional Overreaching in Endurance Athletes Using Proteomics

No reliable biomarkers exist to identify athletes in various training states including functional overreaching (FOR), non-functional overreaching (NFOR), and overtraining syndrome (OTS). Participants (N = 10, age 38.3 ± 3.4 years) served as their own controls and in random, counterbalanced order either ran/cycled 2.5 h (70.0 ± 3.7% VO2max) three days in a row (FOR) or sat in the lab (rest) (separated by three weeks; 7:00–9:30 am, overnight fasted state). Participants provided fingerprick samples for dried blood spot samples (DBS) pre- and post-exercise/rest, and then during two recovery days. DBS proteins were measured with nanoLC-MS in data-independent acquisition (DIA) mode, and 593 proteins were identified and quantified. Proteins were considered for the FOR cluster if they were elevated during one of the two recovery days but not more than one of the exercise days (compared to rest). The generalized estimating equation (GEE) was used to identify proteins linked to FOR. A total of 13 proteins was linked to FOR and most were associated with the acute phase response and innate immune system activation. This study used a system-wide proteomics approach to define a targeted panel of blood proteins related to FOR that could form the basis of future NFOR- and OTS-based studies.


Introduction
Successful training leading to enhanced performance involves cycles of overload and adequate recovery [1][2][3]. A primary goal during training is to avoid the combination of excessive overload and inadequate recovery leading to "overreaching", defined as a short-term decrement in performance with or without related physiological and psychological symptoms in which restoration of performance takes several days to several weeks [1].
"Functional" overreaching occurs when athletes deliberately use a short-term period (e.g., training camp) to increase the training load resulting in short-term performance decrements without serious, long-lasting psychological or other negative symptoms [2]. "Functional overreaching" (FOR or short-term overreaching) will eventually lead to an improvement in performance after recovery. Non-functional overreaching (NFOR or extreme overreaching) occurs when athletes train beyond their ability to recover with concomitant performance decrements and psychological disturbances that include decreased vigor and energy, increased fatigue, and loss of desire to train [1]. NFOR can result in a prolonged recovery time with sleep disturbance, elevated resting heart rate, illness, and psychological stress. A hallmark feature of FOR and NFOR is the inability to sustain intense exercise for a prolonged period of time.
Lab ( Figure 1). Participants reported to the lab in an overnight fasted state, provided a fingerprick sample, and completed the Training Distress Scale (TDS), a 19-item self-reported questionnaire that calculates training distress and performance readiness [19]. At 7:00 am, participants started running/cycling on laboratory treadmills or their own bicycles on CompuTrainer Pro Model 8001 trainers (RacerMate, Seattle, WA, USA) at 70% VO 2max , or sat in the lab for 2.5 h in an adjoining room to the lab. Heart rate, rating of perceived exertion (RPE), oxygen consumption (Cosmed CPET metabolic system), and ventilation were measured and recorded every 30 min during the 2.5 h exercise bouts. Participants consumed 2-3 mL/kg water every 15 min, and no other food or beverages were consumed during the 2.5-h bouts. An additional fingerprick sample was collected immediately post-exercise. Participants repeated these procedures for two additional days (thus, three 2.5-h exercise bouts or rest periods on Monday, Tuesday, Wednesday), and then returned to the lab to provide fingerprick blood samples and TDS responses on Thursday and Friday mornings at 7:00 am. After a 3-week period, participants crossed over and repeated the counterbalanced procedures. During the 3-day period when participants sat in the lab, moderate but not intensive training regimens were allowed. questionnaire that calculates training distress and performance readiness [19]. At 7:00 am, participants started running/cycling on laboratory treadmills or their own bicycles on CompuTrainer Pro Model 8001 trainers (RacerMate, Seattle, WA, USA) at 70% VO2max, or sat in the lab for 2.5 h in an adjoining room to the lab. Heart rate, rating of perceived exertion (RPE), oxygen consumption (Cosmed CPET metabolic system), and ventilation were measured and recorded every 30 min during the 2.5 h exercise bouts. Participants consumed 2-3 mL/kg water every 15 min, and no other food or beverages were consumed during the 2.5-h bouts. An additional fingerprick sample was collected immediately post-exercise. Participants repeated these procedures for two additional days (thus, three 2.5-h exercise bouts or rest periods on Monday, Tuesday, Wednesday), and then returned to the lab to provide fingerprick blood samples and TDS responses on Thursday and Friday mornings at 7:00 am. After a 3-week period, participants crossed over and repeated the counterbalanced procedures. During the 3-day period when participants sat in the lab, moderate but not intensive training regimens were allowed. Research design with study participants (N = 10) randomized to 3-day periods of 2.5 h/day running/cycling or sitting and two days resting recovery, with crossover to the counterbalanced condition after a 3-week washout period. Fingerprick blood samples were collected pre-and postexercise/sitting sessions during each 3-day period, and at 7:00 am the following two mornings (overnight fasted state). The Training Distress Scale (TDS) was administered at 7:00 am each of the five mornings in the lab.

Proteomics Procedures
In this study, dried blood spot (DBS) specimens were collected via fingerprick onto standard blood spot cards (Whatman ® protein saver cards, Sigma-Aldrich, St. Louis, MO, USA) and dried overnight. Samples were shipped to Biognosys (Schlieren, Switzerland) for global proteomics analysis [20,21]. A puncher was used to remove the middle of the DBS, and proteins were solubilized, reduced and alkylated, and digested into peptides using trypsin. Samples were cleaned up using C18 columns and dried down. To reduce variance, all 16 samples for each athlete (batch) were randomized and subsequently measured consecutively by mass spectrometry (MS). MS sensitivity and precision were monitored using a pooled sample of all DBS samples, with injections before, after, and twice during each batch. MS analyses were performed on a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) coupled to a nanoLC autosampler. 1 µ g of DBS peptide of each sample was injected and peptides were separated with reverse phase nanoLC chromatography. All samples were measured with data independent acquisition mode (DIA).

Data Processing
The DIA files were processed using Spectronaut ™ software (Biognosys). Spectronaut was also used to calculate the false discovery rate (FDR) of identified peptides and a cut-off of 0.01 was taken across all samples. For each protein, the three most abundant peptides were used for quantitative analysis (in case more than three peptides per protein were identified). Research design with study participants (N = 10) randomized to 3-day periods of 2.5 h/day running/cycling or sitting and two days resting recovery, with crossover to the counterbalanced condition after a 3-week washout period. Fingerprick blood samples were collected pre-and post-exercise/sitting sessions during each 3-day period, and at 7:00 am the following two mornings (overnight fasted state). The Training Distress Scale (TDS) was administered at 7:00 am each of the five mornings in the lab.

Proteomics Procedures
In this study, dried blood spot (DBS) specimens were collected via fingerprick onto standard blood spot cards (Whatman ® protein saver cards, Sigma-Aldrich, St. Louis, MO, USA) and dried overnight. Samples were shipped to Biognosys (Schlieren, Switzerland) for global proteomics analysis [20,21]. A puncher was used to remove the middle of the DBS, and proteins were solubilized, reduced and alkylated, and digested into peptides using trypsin. Samples were cleaned up using C18 columns and dried down. To reduce variance, all 16 samples for each athlete (batch) were randomized and subsequently measured consecutively by mass spectrometry (MS). MS sensitivity and precision were monitored using a pooled sample of all DBS samples, with injections before, after, and twice during each batch. MS analyses were performed on a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) coupled to a nanoLC autosampler. 1 µg of DBS peptide of each sample was injected and peptides were separated with reverse phase nanoLC chromatography. All samples were measured with data independent acquisition mode (DIA).

Data Processing
The DIA files were processed using Spectronaut™ software (Biognosys). Spectronaut was also used to calculate the false discovery rate (FDR) of identified peptides and a cut-off of 0.01 was taken across all samples. For each protein, the three most abundant peptides were used for quantitative analysis (in case more than three peptides per protein were identified).

Statistics
The statistical methodology used to detect protein responses to exercise involved the use of Generalized Estimating Equations (GEE) and Generalized Linear Mixed Models (GLMM) at single protein levels. The interaction between time of the measurement and condition status (exercise, rest) was used as a categorical predictor. In the current study, GEE was the preferred choice given the research design (2-arms, randomized, crossover) and the sample of ten athletes. For the statistical power simulation test, GLMM was used. In this study, the data were first corrected for potential batch effects, prior to normalization relative to the maximum value in each row. To account for technical variability, the significance threshold (p-value) was set to 0.01 if the coefficient of variation (CV) in the z-score of a given protein in technical repeat samples exceeded 15% across athletes and time points. Otherwise the p-value significance level was set at p ≤ 0.05. The simulation generated artificial, normally distributed data using the technical variance of the original data. Different levels were derived from twelve proteins randomly selected from the subset of athletes (from 10 to 50 with steps of 10), with the assumption that the effect would be notable in 80% of the athletes at each time point. Bonferroni correction was applied assuming a final panel of five proteins. After estimating the GEE and GLMM models, pairwise comparisons between each time-by-condition level were calculated, and the Tukey correction for multiple comparison was applied to adjust the significance level. The GEE analysis of the original data and the GLMM analysis of simulated data delivered identical results for given proteins.

Protein-Protein Interaction Network Analysis
Proteins expressed acutely following each of the three 2.5-h exercise sessions, and those expressed on day 1 and/or day 2 of recovery were mapped onto STRING v10 to build two protein-protein interaction networks. STRING v10 (search tool for the retrieval of interacting genes/proteins) is a database of known and predicted physical/functional protein associations based on genomic context, high-through put experiments, co-expression and previous knowledge (http://string-db.org/) [22].

Results
Study participant data (N = 10 males) are summarized in Table 1. Metabolic monitoring of the three 2.5-h exercise sessions showed that the athletes averaged a heart rate of 140 ± 4.3 beats/min (79.6 ± 2.0% maximal heart rate or HR max ), oxygen consumption of 29.4 ± 0.9 mL kg −1 min −1 (70.0 ± 1.2% maximal oxygen consumption or VO 2max ), and ventilation of 65.1 ± 1.9 L/min. The rating of perceived exertion (RPE) averaged 15.4 ± 0.3 units at the end of each session (rating of "hard"). No significant differences were found between the runners (N = 3) and cyclists (N = 7) for the data listed in Table 1 and exercise performance data, and all analyses were conducted for the combined group of these 10 athletes. The simulation using GLMM modelling with 34 proteins showed that a relatively low number of participants was needed to show significant changes in protein up and down regulation, supporting the use of 10 athletes in this randomized, crossover trial. Data consistency was monitored through several pooled samples, and the results supported a reproducible protein quantitation.
The total Training Distress Scores (TDS) were higher at 7:00 am (pre-lab sessions) in the exercise compared to rest trials on the second and third days, and the first day of recovery (interaction effect, p < 0.001) ( Figure 2). The total Training Distress Scores (TDS) were higher at 7:00 am (pre-lab sessions) in the exercise compared to rest trials on the second and third days, and the first day of recovery (interaction effect, p < 0.001) ( Figure 2). The DIA approach using the dried blood spot samples (16 samples per athlete) resulted in the identification of 6499 precursors. Protein inferences (discarding the non-unique peptide hits and assigning the peptides to the proteins of origin) resulted in a total number of 593 proteins (intra-batch technical median CV of 22%, batch correction using the R package Combat) (see Table S1) [20,21]. Of the 593 identified proteins, 60 increased significantly immediately post-exercise on day 1 (of the 3day exercise period) compared to the rest condition. Of these, 30 were related to immune function. Table 2 lists 15 of the proteins that increased significantly immediately post-exercise after each of the three days of exercise compared to rest, and most were related to immune function. The median technical CV of the listed proteins was 6%, with a range of 3-8%, except for O95810 with 23%. Figure  3A-H) compares exercise and rest intensity data for eight of these 15 immune-related proteins.  The DIA approach using the dried blood spot samples (16 samples per athlete) resulted in the identification of 6499 precursors. Protein inferences (discarding the non-unique peptide hits and assigning the peptides to the proteins of origin) resulted in a total number of 593 proteins (intra-batch technical median CV of 22%, batch correction using the R package Combat) (see Table S1) [20,21]. Of the 593 identified proteins, 60 increased significantly immediately post-exercise on day 1 (of the 3-day exercise period) compared to the rest condition. Of these, 30 were related to immune function. Table 2 lists 15 of the proteins that increased significantly immediately post-exercise after each of the three days of exercise compared to rest, and most were related to immune function. The median technical CV of the listed proteins was 6%, with a range of 3-8%, except for O95810 with 23%. Figure 3A-H) compares exercise and rest intensity data for eight of these 15 immune-related proteins.    STRING protein-protein interactions using proteins listed in Table 2 are depicted in Figure 4. Biological process pathways identified in STRINGS for the 15 proteins included four gene sets involved with killing of cells of other organisms, four to six with the defense response to bacterium, fungus, and other organisms, six to seven with chemotaxis and locomotion, and seven to eight with the immune system response (all FDR < 0.015).
Proteomes 2018, 6, x 9 of 17 STRING protein-protein interactions using proteins listed in Table 2 are depicted in Figure 4. Biological process pathways identified in STRINGS for the 15 proteins included four gene sets involved with killing of cells of other organisms, four to six with the defense response to bacterium, fungus, and other organisms, six to seven with chemotaxis and locomotion, and seven to eight with the immune system response (all FDR < 0.015).  Table 2. The thickness of the network lines indicates the strength of data support (https://string-db.org).
"Chronic proteins" in this study were defined as proteins that did not increase or decrease acutely after the three exercise versus rest bouts, but increased only on the morning of day 1 and/or day 2 of recovery. Of the 593 proteins, 71 chronic proteins were identified through GEE modelling. Three other criteria were applied to narrow down this list of proteins to those most strongly associated with the recovery period from the FOR exercise period. An overview of this approach is described in Figure 5, with emphasis on recovery day expression, those that showed a clear exercise versus rest pattern when graphically displayed, and proteins that had literature support and biological plausibility. After application of these criteria, 13 chronic proteins were included as listed in Table 3. Of these, at least 11 were related to immune function. Intensity data for five of the 13 proteins from Table 3 are presented in Figure 6A-E.   Table 2. The thickness of the network lines indicates the strength of data support (https://string-db.org).
"Chronic proteins" in this study were defined as proteins that did not increase or decrease acutely after the three exercise versus rest bouts, but increased only on the morning of day 1 and/or day 2 of recovery. Of the 593 proteins, 71 chronic proteins were identified through GEE modelling. Three other criteria were applied to narrow down this list of proteins to those most strongly associated with the recovery period from the FOR exercise period. An overview of this approach is described in Figure 5, with emphasis on recovery day expression, those that showed a clear exercise versus rest pattern when graphically displayed, and proteins that had literature support and biological plausibility. After application of these criteria, 13 chronic proteins were included as listed in Table 3. Of these, at least 11 were related to immune function. Intensity data for five of the 13 proteins from Table 3 are presented in Figure 6A-E.
Proteomes 2018, 6, x 9 of 17 STRING protein-protein interactions using proteins listed in Table 2 are depicted in Figure 4. Biological process pathways identified in STRINGS for the 15 proteins included four gene sets involved with killing of cells of other organisms, four to six with the defense response to bacterium, fungus, and other organisms, six to seven with chemotaxis and locomotion, and seven to eight with the immune system response (all FDR < 0.015).  Table 2. The thickness of the network lines indicates the strength of data support (https://string-db.org).
"Chronic proteins" in this study were defined as proteins that did not increase or decrease acutely after the three exercise versus rest bouts, but increased only on the morning of day 1 and/or day 2 of recovery. Of the 593 proteins, 71 chronic proteins were identified through GEE modelling. Three other criteria were applied to narrow down this list of proteins to those most strongly associated with the recovery period from the FOR exercise period. An overview of this approach is described in Figure 5, with emphasis on recovery day expression, those that showed a clear exercise versus rest pattern when graphically displayed, and proteins that had literature support and biological plausibility. After application of these criteria, 13 chronic proteins were included as listed in Table 3. Of these, at least 11 were related to immune function. Intensity data for five of the 13 proteins from Table 3 are presented in Figure 6A-E.    Intensity * * * *    Table 3 are depicted in Figure 7 (https://string-db.org). P01834 (Ig kappa chain C region) and P04220 (Ig mu heavy chain disease protein) for humans were not listed in STRING. Of the nine proteins networked in Figure 5 (all FDR < 0.001), seven were included in the gene set for the immune defense response (biological process), four in the acute phase response, three in complement activation, and three in the humoral immune response mediated by circulating immunoglobulins.

P05155
Plasma protease C1 inhibitor Crucial role in regulation of complement activation Q14624 Inter-alpha-trypsin inhibitor heavy chain H4 Glutaredoxin-1 Glutathione activity; cell redox homeostasis STRING protein-protein interactions using the proteins listed in Table 3 are depicted in Figure  7 (https://string-db.org). P01834 (Ig kappa chain C region) and P04220 (Ig mu heavy chain disease protein) for humans were not listed in STRING. Of the nine proteins networked in Figure 5 (all FDR < 0.001), seven were included in the gene set for the immune defense response (biological process), four in the acute phase response, three in complement activation, and three in the humoral immune response mediated by circulating immunoglobulins.  Table 3. The thickness of the network lines indicates the strength of data support (https://string-db.org). P01834 (Ig kappa chain C region) and P04220 (Ig mu heavy chain disease protein) for humans were not listed in STRING.  Table 3. The thickness of the network lines indicates the strength of data support (https://string-db.org). P01834 (Ig kappa chain C region) and P04220 (Ig mu heavy chain disease protein) for humans were not listed in STRING.

Discussion
Using a randomized, crossover design, ten male runners and cyclists sat in the lab or exercised intensely for 2.5 h each morning in an overnight fasted state for three days in a row. Total distress scores showed that the 3-day exercise period increased psychological distress to levels expected during functional overreaching [19]. Dried blood spot (DBS) samples were collected pre-and post-exercise/rest, and at 7:00 am the following two mornings, and the use of this practical blood collection method allowed the acquisition of a large number of samples with minimal discomfort to the athletes. DBS samples offer many advantages, especially in an athletic setting, as a sample format including ease and safety of transport and handling [23]. Analytes preserved within the DBS are stable for long time periods at ambient conditions and can be eluted in solvents for later proteomics analysis.
Global proteomics procedures from the DBS samples showed that of 593 proteins identified, 60 proteins increased significantly after the 2.5 h exercise bout on day 1 (of the 3-day exercise period) and 15 after each of the three days of exercise compared to rest. STRING protein-protein interactions showed that these 15 proteins expressed acutely post-exercise were involved with an activated immune system response including pathogen defense and immune cell chemotaxis and locomotion. Some of these proteins involved in the acute response to exercise have been identified previously, especially neutrophil elastase and protein S100-A8 [16,17,24,25]. S100-A8/A9 (calprotectin) is released primarily from activated neutrophils, is found in high levels in human blood samples (~5 mg/L), promotes phagocyte migration and functions as an alarmin and endogenous danger-associated molecular pattern (DAMP), and is regarded as a cardiovascular disease (CVD) risk factor when systemically expressed [26]. S100-A12 is a ligand for receptor advanced glycation end products (RAGE), promotes phagocyte chemotaxis, and is a potent stimulator of acute inflammation [27,28].
Polymorphonuclear neutrophils are the first cells recruited to inflammatory sites following exercise [28], and neutrophil elastase is one of three serine proteases stored in granules that act in combination with reactive oxygen species to help degrade engulfed microorganisms and debris [29]. Neutrophils read chemotactic peptides released from damaged cells or bacteria, and then respond by increasing the nucleation and polymerization of actin filaments. Profilins are small (12 to 15 kDa) and abundant proteins that have been found in all eukaryotic cells tested [30] and are involved in the dynamic turnover and restructuring of the actin cytoskeleton. Thus, post-exercise increases in profilin-1 and actin (cytoplasmic 1) appear to represent the increase in neutrophil actin filament polymerization that supports migration to involved tissues [31].
The primary purpose of this investigation was to identify a targeted panel of post-FOR chronically expressed proteins that could be utilized and validated in future overtraining-based investigations. Our analysis found that 13 of the 593 identified proteins did not increase acutely post-exercise, but increased on the morning of day1 and/or day 2 of recovery. STRING protein-protein interactions showed that most of these proteins were involved in the immune defense response including the acute phase response, complement activation, and humoral responses mediated by circulating immunoglobulins. Similar to the findings of the current study, others have shown that targeted protein biomarkers such as myeloperoxidase (MPO) and various acute phase proteins including serum amyloid A (SAA), complement factors, and alpha-1-acid glycoproteins are elevated after ultramarathon events or prolonged and intensive exercise training periods [9,18,25,[32][33][34][35][36][37][38].
The acute phase response is a systemic reaction to environmental insults including severe stress, infection, trauma, and late-stage cancer, and involves the hepatic production of many proteins including SAA, C-reactive protein, complement proteins, antiproteases, transport proteins, and those involved with the coagulation and fibrinolytic system [39,40]. During the acute phase response, plasma levels of SAA rise to very high levels, are produced by hepatocytes and tissue macrophages, and trigger multiple signaling pathways related to phagocyte migration and inflammation [39]. Similar to the findings of the current study, SAA was elevated 48 h following the 246-km Spartathlon race in a group of ultradistance runners [9,33]. In this study, IL-6 increased immediately after the Spartathlon race and then returned to normal in contrast to SAA that remained high during the 2-day recovery period [9,33]. IL-6, which rises to high levels following stressful exercise bouts, stimulates the production of most acute-phase proteins by hepatocytes that influence one or more stages of inflammation [40]. Animal-based studies have focused on the value of measuring SAA and other acute phase proteins in monitoring physiological responses to intensified training stress [10][11][12][13][14][15][16][17]. One study of 20 Arabian horses showed that higher compared to lower pre-race serum SAA levels were strongly linked to an inability to complete 120-and 160-km endurance events [10].
In the current study, MPO rose strongly on the second day of recovery from the 3-day period of intensified exercise. MPO is a lysosomal protein stored in azurophilic granules of the neutrophil and during degranulation is released into the extracellular space. MPO is a biomarker of neutrophil activation and inflammation following strenuous exercise [15][16][17] and is systemically elevated in patients with coronary artery disease [25,41]. MPO was increased for at least 19 days in 42 triathletes following an Ironman triathlon, although this study lacked a suitable control group [25].

Conclusions
Prior studies utilized a limited number of proteins, and most were based on acute shifts in racing dogs and horses. The chief contribution of the current study was the use of a system-wide proteomics approach to define clusters of blood proteins from DBS samples that were (1) expressed acutely post-exercise and (2) chronically during 2-day recovery from a 3-day period of intensified exercise (FOR). Of 593 proteins identified, 60 proteins increased significantly after the 2.5 h exercise bout on Day 1 and 15 after each of the three days of exercise compared to rest. Thirteen of the identified proteins did not increase acutely post-exercise, but increased on the morning of day 1 and/or day 2 of recovery. Most of these proteins (acute and chronic) signaled an exercise-induced activation of innate immune function, supporting prior research demonstrating the heavy involvement of the immune system in restoring homeostasis after intense exercise [42,43]. The next step in this line of research is to test the targeted panel of FOR-related proteins defined in this study in NFOR-and OTS-based investigations with high-level athletes. The "chronic" period included in this study after FOR (2 days of recovery) could be extended for weeks and months within the NFOR and OTS context. The ultimate goal is to refine the targeted proteomics panel so that when combined with various other tools such as the TDS and workload assessment will result in a highly predictive process that will assist the coach in individualizing training regimens to prevent NFOR and OTS in athletes.

Conflicts of Interest:
The authors declare no competing financial interest. Groen and Pugachev are founders and owners of ProteiQ (Berlin, Germany).

Abbreviations
The following abbreviations are used in this manuscript: