GC–MS-based urinary organic acid profiling reveals multiple dysregulated metabolic pathways following experimental acute alcohol consumption

Metabolomics studies of diseases associated with chronic alcohol consumption provide compelling evidence of several perturbed metabolic pathways. Moreover, the holistic approach of such studies gives insights into the pathophysiological risk factors associated with chronic alcohol-induced disability, morbidity and mortality. Here, we report on a GC–MS-based organic acid profiling study on acute alcohol consumption. Our investigation — involving 12 healthy, moderate-drinking young men — simulated a single binge drinking event, and indicated its metabolic consequences. We generated time-dependent data that predicted the metabolic pathophysiology of the alcohol intervention. Multivariate statistical modelling was applied to the longitudinal data of 120 biologically relevant organic acids, of which 13 provided statistical evidence of the alcohol effect. The known alcohol-induced increased NADH:NAD+ ratio in the cytosol of hepatocytes contributed to the global dysregulation of several metabolic reactions of glycolysis, ketogenesis, the Krebs cycle and gluconeogenesis. The significant presence of 2-hydroxyisobutyric acid supports the emerging paradigm that this compound is an important endogenous metabolite. Its metabolic origin remains elusive, but recent evidence indicated 2-hydroxyisobutyrylation as a novel regulatory modifier of histones. Metabolomics has thus opened an avenue for further research on the reprogramming of metabolic pathways and epigenetic networks in relation to the severe effects of alcohol consumption.


1. Sample preparation, organic acid extraction and GC-MS analysis

Extraction and derivatization of organic acids
The 5 mL aliquots used for the analyses were thawed at room temperature and vortexed before use.
The volume of urine used for the organic acid analysis was based on the creatinine value of each of the samples:
• For creatinine values higher than 100 mg%, use 0.5 ml urine
• For creatinine values less than 100 mg% but higher than 5 mg%, use 0.5 ml urine
• For creatinine values less than 5 mg% but higher than 2 mg%, use 2 ml urine
• For creatinine values less than 2 mg%, use 3 ml urine
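The creatinine bands above can be expressed as a small lookup. This is only a sketch of the rule as stated in the text: the function name is hypothetical, and the handling of values exactly equal to 100, 5 or 2 mg% is an assumption, since the text does not specify it.

```python
def urine_volume_ml(creatinine_mg_percent: float) -> float:
    """Urine volume (ml) to extract, following the creatinine bands in the text."""
    if creatinine_mg_percent > 100:
        return 0.5
    if creatinine_mg_percent > 5:
        return 0.5  # value stated in the text for the 5-100 mg% band
    if creatinine_mg_percent > 2:
        return 2.0
    # Assumption: boundary values (exactly 100, 5 or 2 mg%) fall in the lower band.
    return 3.0


for creatinine in (150, 40, 3, 1):
    print(creatinine, "mg% ->", urine_volume_ml(creatinine), "ml")
```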
The correct volume of urine was transferred to a 6 ml Kimax tube and 6 drops of 5 M HCl were added to adjust the pH of the urine to below 2.
The volume of internal standard (IS) added was determined by multiplying the creatinine value (in mg%) by five times the volume of urine used. A mix of malonic acid (RT ~14 minutes) and 4-phenylbutyric acid (RT ~20 minutes) was used as internal standard. The IS was prepared by dissolving 52.5 mg of each in 50 ml Milli-Q water. The solution was sonicated for 30 minutes to ensure that all compounds were properly dissolved. 4-Phenylbutyric acid was used owing to its absence in normal urine and in known pathological conditions. In addition, it elutes almost in the middle of the organic acid profile and theoretically co-elutes with very few, if any, other organic acids. Malonic acid was added as a second IS reference peak in the GC chromatogram.
The first extraction was done by adding 6 ml of ethyl acetate to each of the samples. The tubes were capped and checked for leakage by inverting them before being mixed on a rotary wheel for 30 minutes. The samples were then centrifuged at 3000 rpm for 3 minutes and the organic phase (top phase) was aspirated into a clean 6 ml Kimax tube.
The second extraction was done by adding 3 ml of diethyl ether to the aqueous phase (lower phase) and the tubes were once again capped and checked for leakage. The samples were then mixed on the rotary wheel for 10 minutes and centrifuged at 3000 rpm for 3 minutes. The organic phase was aspirated and added to the ethyl acetate phase and the aqueous phase was discarded.
Approximately 2 ml of anhydrous sodium sulphate (Na2SO4) was added to each sample and the tubes were capped and vortexed to mix. Proper dispersion of the Na2SO4 ensures that all the water is removed from the organic phase. The samples were then centrifuged at 3000 rpm for 1 minute and the organic phase was decanted into a clean 3 ml Kimax tube. The samples were finally evaporated to dryness in a heating block at 37°C under nitrogen gas for approximately 45 minutes.
Using a Hamilton syringe, the derivatization reagents N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA), trimethylchlorosilane (TMCS) and pyridine were added to the dried samples. The volume of BSTFA added was determined by multiplying the creatinine value (in mg%) by three times the volume of urine used. The volume of TMCS and pyridine added was determined by multiplying the creatinine value (in mg%) by 0.6 times the volume of urine used. The tubes were capped and incubated at 70°C for 45 minutes.
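The creatinine-scaled volumes of internal standard and derivatization reagents described in this section follow the same arithmetic, which the sketch below makes explicit. The class and function names are hypothetical. Two assumptions are made: the text does not state the unit of the resulting volumes, so the results are left in unnamed "protocol units", and the 0.6 factor is applied to TMCS and pyridine separately.

```python
from dataclasses import dataclass


@dataclass
class ReagentVolumes:
    internal_standard: float
    bstfa: float
    tmcs: float
    pyridine: float


def reagent_volumes(creatinine_mg_percent: float, urine_ml: float) -> ReagentVolumes:
    """Scale reagent volumes by creatinine value times urine volume."""
    base = creatinine_mg_percent * urine_ml
    return ReagentVolumes(
        internal_standard=5.0 * base,  # creatinine x 5 x urine volume
        bstfa=3.0 * base,              # creatinine x 3 x urine volume
        tmcs=0.6 * base,               # assumption: 0.6 factor applies to each reagent
        pyridine=0.6 * base,
    )


# Hypothetical sample: 50 mg% creatinine, 0.5 ml urine extracted.
print(reagent_volumes(50.0, 0.5))
```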
The derivatized samples were then transferred to 1.5 ml GC-MS vials with inserts, capped, and placed in the GC-MS autosampler for GC-MS analysis. Blank (hexane), QC and repeat samples were included in the injection sequence, as described in the main text.
The internal standards 4-phenylbutyric acid and malonic acid as well as the anhydrous Na2SO4 were purchased from BDH. Ethyl acetate and diethyl ether were obtained from Merck Chemicals. The derivatization reagents N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA), trimethylchlorosilane (TMCS) and pyridine were purchased from Sigma Chemical Company.

GC-MS analysis
An Agilent GC-MS system was used in this study and consisted of a model 7890A gas chromatograph, a model 5975C mass selective detector, a Hewlett Packard 5970C quadrupole mass spectrometer and Agilent ChemStation. GC separation was achieved on a WCOT fused silica capillary column [30 m × 0.32 mm (i.d.)] coated with SE30 CB, film thickness 0.30 µm (Macherey-Nagel).
Samples were introduced into the column via a splitless injector at 280°C. The initial oven temperature was kept at 60°C for 2 minutes and then programmed to rise 6°C/min to a final temperature of 280°C. This temperature was maintained for 5 minutes. Helium was used as carrier gas at a pressure of 60 kPa and at a constant flow rate of 1 ml/min. The column was inserted directly into the ion source with an interface temperature of 280°C. The mass spectra of all GC peaks were generated by a mass spectrometer operated in the electron impact (EI) mode, with electron energy of 70 eV. The MS source and quadrupole temperatures were 250°C and 150°C respectively.
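The oven programme above implies a total run time and an oven temperature at any given moment, which the following sketch works out. The function names are hypothetical, and the elution temperatures printed for the two internal standards assume that the approximate retention times quoted earlier (about 14 and 20 minutes) are measured from injection under this same programme.

```python
def oven_temperature_c(t_min: float, start_c: float = 60.0, initial_hold_min: float = 2.0,
                       ramp_c_per_min: float = 6.0, final_c: float = 280.0) -> float:
    """Oven temperature (deg C) at time t_min for a single-ramp programme."""
    if t_min <= initial_hold_min:
        return start_c
    return min(final_c, start_c + ramp_c_per_min * (t_min - initial_hold_min))


def total_run_time_min(start_c: float = 60.0, initial_hold_min: float = 2.0,
                       ramp_c_per_min: float = 6.0, final_c: float = 280.0,
                       final_hold_min: float = 5.0) -> float:
    """Initial hold + ramp duration + final hold, in minutes."""
    return initial_hold_min + (final_c - start_c) / ramp_c_per_min + final_hold_min


print(f"total run time: {total_run_time_min():.1f} min")
for rt in (14.0, 20.0):  # approximate RTs of malonic and 4-phenylbutyric acid
    print(f"RT {rt:.0f} min -> oven at {oven_temperature_c(rt):.0f} deg C")
```

Under these assumptions the ramp takes 220/6 ≈ 36.7 minutes, so one injection occupies roughly 43.7 minutes of oven time.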

Qualitative analysis of metabolites in QC samples (metabolite identification)
After GC separation, each peak at an indicated RT was analysed in EI mode in order to investigate the fragmentation pattern of each compound. The m/z values selected for feature identification were either the base peak ion or one of the more abundant characteristic fragment ions.
Deconvolution and data analyses were conducted using AMDIS software (Version 2.71) linked to the NIST Mass Spectral Search Program for the NIST/EPA/NIH Mass Spectral Library (Version 2.0F, built Oct. 8, 2008). Where authentic standards were available, their respective response factors were used, and for those compounds where no authentic standards were available, a response factor of 1 was assumed. The analytical settings of the AMDIS software were as follows: minimum factor: 60%; threshold: "off"; scan direction: "high to low"; and type of analysis: "Use of an internal standard for RI". The deconvolution settings were: component width: 12; adjacent peak subtraction: 1; resolution: medium; sensitivity: low; and shape requirements: low. The first hit of identified compounds and integrated area of the peaks were exported to Microsoft Excel.

Details of the 172 identified features
Supplementary Table S2 gives the details of each of the 172 identified and classified variables. A reference value for the normal concentration range in urine for adults of each identified variable is also given. The original list of 172 variables contained substances that were excluded from the metabolomics analysis to assess the influence of acute alcohol consumption.

Supplementary
Only the 120 included substances are numbered in column 4 of the table. The reasons for the exclusion of the other 52 substances are given in the final column of the table.
Categories for exclusion of variables were:
1. Exogenous contaminants that are not recognized biological substances.
2. Substances of a questionable biological function or origin, based on literature assessments.
3. Artifacts formed from chemicals (e.g. urea) present in the urine.
4. Artifacts due to the reaction conditions (e.g. formation of lactones, used for correction of the parent substance).
5. Artifacts due to formation of an additional TMS derivative (used for correction of the parent substance).
6. Substances showing multiple peaks in the GC-profile (used for correction to only one substance).

Power
Only a limited number of subjects could be included in an intervention study of this nature, i.e. exposure to a substance (alcohol) with known health risks. It was therefore necessary to ensure that the small sample size would not prevent the quantification of the effect of the intervention. A power calculation was performed assuming the following: (i) The effect of the intervention would be large, with an expected effect size of at least 0.8 when calculated based on Cohen's d-value [1]. This was considered a valid assumption given the design of the study, i.e. consuming a relatively large volume of alcohol on an empty stomach. (ii) The dependence between samples from the same individual would be taken into account by using the dependent samples, or paired t-test, to assess the significance of the effect of the intervention. Though this method is a parametric univariate method comparing only two groups, it was deemed appropriate as we could transform the data to achieve normality, we did not know beforehand how many metabolites would be extracted, and the most dramatic effect of the intervention was expected to be between time 0 (before consumption of the alcohol) and 1 hour after consumption based on a pilot study. (iii) A significance level of 5% for a two-tailed test and a sample size of 12. We did not want to speculate on the metabolites that would be extracted and hence could not speculate on the direction of the expected change. The two-tailed hypothesis was therefore selected.
The power under these conditions is plotted against the effect size in Supplementary Fig. S1, from which it is evident that power values above 0.7 are achieved for effect sizes of 0.8 or higher. This was acceptable in the current context. The power curve given a one-sided hypothesis is also included for reference purposes to show the increased power.
Figure S1. Power curves for a dependent samples t-test for a sample size of 12 and a 5% significance level, when setting a two-sided (a) and one-sided (b) alternative hypothesis.
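The power figures quoted for the paired t-test can be checked approximately by simulation. The sketch below is not the calculation used to produce the power curves; it is a Monte Carlo stand-in that assumes normally distributed paired differences with unit standard deviation, so that the mean difference equals Cohen's d. The function name, seed and number of simulations are arbitrary choices, and the critical t-values for 11 degrees of freedom are hard-coded from standard tables.

```python
import math
import random

T_CRIT_TWO_SIDED = 2.201  # t quantile at 0.975, df = 11
T_CRIT_ONE_SIDED = 1.796  # t quantile at 0.95, df = 11


def simulated_power(effect_size: float, n: int = 12, n_sim: int = 20000,
                    seed: int = 1, one_sided: bool = False) -> float:
    """Fraction of simulated paired t-tests (n pairs) that reject at the 5% level."""
    rng = random.Random(seed)
    crit = T_CRIT_ONE_SIDED if one_sided else T_CRIT_TWO_SIDED
    rejections = 0
    for _ in range(n_sim):
        # Paired differences ~ N(effect_size, 1), i.e. Cohen's d = effect_size.
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
        t = mean / (sd / math.sqrt(n))
        if (t > crit) if one_sided else (abs(t) > crit):
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print("two-sided:", simulated_power(0.8))
    print("one-sided:", simulated_power(0.8, one_sided=True))
```

For an effect size of 0.8 the two-sided estimate should land slightly above 0.7, in line with the text, with the one-sided estimate higher.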

Data pre-processing
The collection and analysis of biological samples culminated in 12 subjects observed across 5 time points and measured for 120 metabolites. A 50% zero filter was applied, taking time into account; that is, variables were excluded if they contained more than 50% zero-values at all 5 time points. Two variables were removed based on these criteria. The remaining zero-values were imputed for each variable by randomly generating numbers from the tail of a beta distribution fitted to the non-zero observations. The resulting random numbers were all smaller than half of the minimum observed value. The data were then log transformed and centred prior to further statistical analysis.
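The pre-processing steps for a single variable can be sketched as follows. This is a simplified illustration, not the procedure used in the study: the beta-distribution imputation is replaced by a uniform draw below half of the minimum non-zero value, the natural logarithm is assumed, and centring is assumed to be on the variable's mean over all observations. Function names and the lower bound of the uniform draw are hypothetical.

```python
import math
import random


def fails_zero_filter(values_by_time, max_zero_fraction=0.5):
    """True if the variable has more than 50% zeros at every time point."""
    return all(
        sum(v == 0 for v in values) / len(values) > max_zero_fraction
        for values in values_by_time
    )


def impute_log_centre(values_by_time, seed=0):
    """Impute zeros, log transform and centre one variable (time x subject)."""
    rng = random.Random(seed)
    nonzero = [v for values in values_by_time for v in values if v > 0]
    ceiling = min(nonzero) / 2.0
    # Stand-in for the beta-tail imputation: uniform draw below half the minimum.
    logged = [
        [math.log(v if v > 0 else rng.uniform(0.1 * ceiling, ceiling)) for v in values]
        for values in values_by_time
    ]
    n_obs = sum(len(values) for values in logged)
    grand_mean = sum(sum(values) for values in logged) / n_obs
    return [[x - grand_mean for x in values] for values in logged]


# Hypothetical variable: 2 time points x 3 subjects.
variable = [[0.0, 2.0, 4.0], [8.0, 0.0, 16.0]]
if not fails_zero_filter(variable):
    print(impute_log_centre(variable))
```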

Statistical methods for variable selection
It was decided to deconstruct the three-dimensional data tensor into less complex cross-sections in order to select a subset of statistically important metabolites to aid biological interpretation of the observed perturbations.
The assumption was made that the most biologically acute effect of the intervention would peak one hour after alcohol consumption. The cross-section between time 0 (before consumption of the alcohol) and one hour after consumption was therefore analysed using univariate and multivariate statistical methods.
The univariate nonparametric Wilcoxon signed-rank test was used to assign significance levels (p-values) to changes in metabolite levels from time 0 to time 1. The dependent t-test was also applied, after data transformation to ensure normality, but was not used, as current research has shown the Wilcoxon signed-rank test to be a suitable alternative with marginally more power for a non-normally distributed data set of smaller sample size [2].
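For a sample as small as 12 pairs, the Wilcoxon signed-rank test can be computed exactly by enumerating all sign assignments. The sketch below illustrates the test itself under common conventions (zero differences dropped, average ranks for ties, two-sided p-value); it is not the software used in the study, and the function name is hypothetical.

```python
from itertools import product


def wilcoxon_signed_rank_exact(before, after):
    """Return (W+, exact two-sided p-value) for paired observations."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # average rank shared by tied values
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical data: all 12 subjects increase from time 0 to time 1.
time0 = [float(i) for i in range(12)]
time1 = [x + i + 1 for i, x in enumerate(time0)]
print(wilcoxon_signed_rank_exact(time0, time1))
```

With 12 pairs that all change in the same direction, the smallest attainable two-sided p-value is 2/4096, which is about 0.0005.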
Multivariate partial least squares-discriminant analysis (PLS-DA) is a method that constructs a regression model to predict group membership by projecting variance in the metabolites measured, as well as in the observed group membership, to new spaces. PLS-DA models can easily overfit and produce models with inflated predictive ability unless extensively validated through test data and permutation testing. For this reason, the PLS-DA results were used cautiously, as the small sample size did not allow for proper validation. The results used were limited to the variable importance in projection (VIP) values, which are produced by the PLS-DA model for each metabolite as an indication of its predictive strength. Given our concern here, it is again important to emphasize that the aim of this selection was not to model the observed data, but rather to gain a deeper understanding of the predominant biological changes occurring due to the intervention. Ranking and selecting metabolites according to VIP values enabled us to identify metabolite combinations which dominated the observed change in metabolomic state. Ultimately, we returned to the raw data (means and standard deviations) to interpret the resulting list against established metabolic pathways, including proposals for their extension or modification, which resulted in a representation of a global metabolite profile reflecting the metabolic consequences of the alcohol consumption.
PCA is similar to PLS-DA in that observed data are also projected, but to new spaces that maximize variation along fewer hyperplanes, not taking the group membership into consideration.
A subset of metabolites was subsequently identified to gain a deeper understanding of the predominant biological changes occurring due to the intervention. The selection was not made for the purpose of modelling the effect of the intervention in time nor to predict group membership. Changes in metabolite levels were then ranked based on their multivariate VIP values. The significance of changes in high-ranking variables was then established through fold change (FC) values and non-parametric Wilcoxon signed-rank test (WRT) p-values. The selection criteria were as follows: VIP ≥ 1.0, WRT p ≤ 0.05 and |FC| ≥ 1.5.
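The three selection criteria combine as a simple filter. In the sketch below the metabolite names and numbers are made up purely for illustration and are not results from the study. The signed fold-change convention (ratio for increases, negative reciprocal ratio for decreases) is an assumption suggested by the use of |FC|; the text does not define it.

```python
def signed_fold_change(mean_before: float, mean_after: float) -> float:
    """Assumed convention: after/before for increases, -(before/after) for decreases."""
    if mean_after >= mean_before:
        return mean_after / mean_before
    return -(mean_before / mean_after)


def select_metabolites(stats, vip_min=1.0, p_max=0.05, fc_min=1.5):
    """Names of variables meeting VIP >= 1.0, WRT p <= 0.05 and |FC| >= 1.5."""
    return [
        name
        for name, s in stats.items()
        if s["vip"] >= vip_min and s["p"] <= p_max and abs(s["fc"]) >= fc_min
    ]


# Illustrative, made-up values only.
example = {
    "metabolite_A": {"vip": 1.4, "p": 0.003, "fc": 2.1},
    "metabolite_B": {"vip": 1.2, "p": 0.20, "fc": 1.8},
    "metabolite_C": {"vip": 0.7, "p": 0.01, "fc": -3.0},
    "metabolite_D": {"vip": 1.1, "p": 0.04, "fc": -1.6},
}
print(select_metabolites(example))
```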

Correlation analysis
Associations between metabolites shortlisted for biological interpretation were assessed using Spearman's rho, or rank-order correlation coefficient. This method is nonparametric, as it takes the ranks of the data as inputs. The correlation coefficients (r-values) produced were regarded as biologically relevant if |r| ≥ 0.5 [1].
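Spearman's rho is the Pearson correlation computed on ranks, which the following sketch implements with average ranks for ties. The function names and the example data are hypothetical; the 0.5 threshold is the one stated in the text.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho, "relevant" if abs(rho) >= 0.5 else "not relevant")
```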

Statistical methods for repeated measures data
The data tensor evaluated here has a very specifically designed structure with the same individuals being measured over and over, hence a repeated measures design over 5 points in time. Although the Wilcoxon signed-rank test accounts for groups of data not being independent but paired, standard PCA and PLS-DA applications do not. To understand the importance of this information for these models, a multi-level PCA and PLS-DA on the within-subject variation was also performed.
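One common way to isolate within-subject variation before a multi-level PCA or PLS-DA is to subtract each subject's own mean over time from that subject's measurements. The sketch below shows only this centring step, under the assumption that the data are arranged as subject × time × metabolite; it is not necessarily the exact procedure used in the study, and the function name is hypothetical.

```python
def within_subject_variation(tensor):
    """Subtract each subject's mean over time, per metabolite.

    tensor[subject][time][metabolite] -> same shape, centred within subject.
    """
    centred = []
    for subject in tensor:
        n_times = len(subject)
        n_vars = len(subject[0])
        means = [
            sum(subject[t][m] for t in range(n_times)) / n_times
            for m in range(n_vars)
        ]
        centred.append(
            [[subject[t][m] - means[m] for m in range(n_vars)] for t in range(n_times)]
        )
    return centred


# Hypothetical data: 2 subjects x 2 time points x 2 metabolites.
data = [
    [[1.0, 10.0], [3.0, 14.0]],
    [[5.0, 0.0], [9.0, 2.0]],
]
print(within_subject_variation(data))
```

After this step, differences in baseline level between subjects no longer contribute to the variance that the model decomposes.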
Supplementary Fig. S2 provides the scores plots corresponding to the cross-sections of the data used in the main text. The PCA scores plots are comparable with those included in the main text, showing some differentiation one hour after the alcohol consumption (Fig. S2A), followed by complete separation after 2 and 3 hours. The separation is, however, retained after 4 hours in the multi-level PC model. That said, a slight yet progressive return after 3 and 4 hours to the profile at time 0 is still evident when considering the decreasing variance explained by the first principal component. The PLS-DA plots again showed a complete separation for all four times following alcohol consumption relative to time 0. Finally, the entire data tensor was modelled using PCA. This was achieved by unfolding the data along the time dimension [3]. This approach allows for the projection of data to fewer dimensions, while retaining the maximum amount of variation and understanding of the changes observed in time. This can then be observed and compared over all individuals, by averaging scores, or for each individual to allow for comparison. In addition, the contribution of each metabolite to the projection can be established based on its loadings, but more noteworthy, based on the directional loadings vectors as reflected in the biplots (Fig. 4 in the main text).

The vehicle/hippuric acid effect
The consumption of the flavoured water as vehicle is an intervention in its own right, given the relatively high concentration of sodium benzoate, used as a preservative. We have previously described the effect of the vehicle consumption [4]. From this investigation it became clear that hippuric acid dominates the urine profile, which we took into consideration when using flavoured water as vehicle for alcohol consumption: we applied a paired PLS-DA on the original data, excluding the information on hippuric acid, and compared the VIP values from this analysis to those from the unpaired PLS-DA which included hippuric acid. Both analyses produced 27 metabolites with a VIP > 1.0, with only the 12 metabolites defined as important indicators, and listed in Table 1, being common to both analyses when applying the additional criteria of WRT p ≤ 0.05 and |FC| ≥ 1.5. These results imply that the presence of hippuric acid in the data set did not influence the metabolites listed in Table 1, and, by implication, did not affect the outcome of the study.

Statistical software
Supplementary Table S3 lists the data analysis software used to perform the different statistical analyses discussed throughout this section.

INFORMED CONSENT TO PARTICIPATE IN EXPERIMENT WITH ETHICAL APPROVAL FROM THE NORTH-WEST UNIVERSITY

TITLE:
An investigation into the metabolic responses to acute alcohol consumption

AIM OF EXPERIMENT:
To determine the metabolic perturbations in young males in a fasted state, due to the consumption of a fixed acute dose of alcohol and/or NAD, as investigated by a metabolomics methodology. The North-West University's Centre for Human Metabolomics aims to investigate perturbations associated with human metabolism by means of a metabolomics approach. Metabolomics is a biochemical technique, involving a comprehensive study of low molecular weight biomolecules, commonly known as metabolites. Biochemical analysis of biological samples (such as urine or blood) provides a large comprehensive list of metabolites. The data are analysed by means of bioinformatics, a field of science incorporating statistical multivariate analysis, providing information used to determine/distinguish any potential irregularities within the metabolic profile. The focus is the identification of very specific metabolites that can statistically discriminate between normal and abnormal metabolic situations. These metabolites are known as biomarkers.

PURPOSE OF THE EXPERIMENT
This experiment constitutes part of the investigator's M.Sc. thesis involving the study into the metabolic perturbations associated with alcohol consumption. This experiment is aimed at determining the changes in the metabolite profile that occur after the consumption of an acute alcohol dose and what, if any, effect the administration of NAD together with the alcohol has to prevent these metabolic changes. To minimize variation, a homogeneous, defined experimental group, namely young males between the ages of 20 and 30 years in an overnight fasted state, will be used. The biological specimens used in this experiment will be a blood sample taken before the start of the experiment, as well as urine samples taken at defined intervals of time, followed by a metabolomics analysis.

PAYMENT OR REIMBURSEMENT
Participants will not be paid for their participation and will not contribute to the costs of the study. An amount of R100 will, however, be paid to each of the participants for travel expenses and other inconveniences resulting from participation in the study, as well as R50 for a light meal of choice after completion of the experiment.

CONFIDENTIALITY
All research records are confidential unless the law requires disclosure. No name or other personal identifying information of the participants will be used in any reports or publications resulting from this study. Data from this study will be used in an anonymous statistical analysis and reported as such by the NWU. No patient's identification details will be reported or made known to other parties.

VOLUNTARY PARTICIPATION AND CONDITIONS OF WITHDRAWAL
Your participation in this study is completely voluntary. You may choose not to participate in this study to which you are otherwise entitled.

CONSENT
I, ______________________________________, have read and understood the preceding information describing this research study and my questions have been answered to my satisfaction. I voluntarily consent to participate in this research study. I do not waive my legal rights by signing this consent form. I will receive a signed and dated copy of this consent form.