A benchmarking protocol for breath analysis: the peppermint experiment

Sampling of volatile organic compounds (VOCs) in breath has shown promise for the detection of a range of diseases, but results have proved hard to replicate owing to a lack of standardization. In this work we introduce the ‘Peppermint Initiative’. The initiative seeks to disseminate a standardized experiment that allows comparison of breath sampling and data analysis methods, and to share a set of benchmark values for the measurement of VOCs in breath. Pilot data are presented to illustrate the standardized approach to interpreting the results obtained from the Peppermint experiment. This pilot study was conducted to determine the washout profile of peppermint compounds in breath, identify appropriate sampling time points, and formalise the data analysis. In two pilot experiments, ten and five participants, respectively, were recruited to undertake a standardized intervention: ingesting a peppermint oil capsule that engenders a predictable and controlled change in the VOC profile of exhaled breath. After a pre-ingestion breath sample was collected, five further samples were taken at 2, 4, 6, 8, and 10 h after ingestion. Samples were analysed using multi-capillary column ion mobility spectrometry (MCC-IMS) and thermal desorption gas chromatography mass spectrometry (TD-GC-MS). A regression analysis of the washout data was used to determine sampling times for the final peppermint protocol, and the time for the compound measurement to return to baseline levels was selected as a benchmark value. A measure of the quality of the data generated by a given technique, based on a comparison of data fidelity, is also proposed. This study protocol has been used for all subsequent measurements by the Peppermint Consortium (16 partners from seven countries). So far, 1200 breath samples from 200 participants have been collected using a range of sampling and analytical techniques. The data from the consortium will be disseminated in subsequent technical notes focussing on results from individual platforms.


Introduction
The analysis of volatile organic compounds (VOCs) in exhaled breath offers a non-invasive method for the discovery of prospective biomarkers with the potential for developing clinical and research applications. Exhaled breath contains several hundred VOCs [1], and the complete panel of endogenously-derived breath VOCs has yet to be described. Nevertheless, breath biomarkers have been proposed for a range of diseases [2].
Breath research encompasses a wide range of sampling and analytical techniques. The diversity of approaches, and the variety of targeted diseases, have made comparison and assimilation of research outcomes and data difficult, and this likely contributes to the current lack of replication of research findings [3][4][5]. Agendas for standardizing different methods (sampling, analytics, reporting) and for benchmarking have been described for specific diseases [6]. The many factors that need addressing in standardization include: participant/patient preparation; the chemistry of the materials used to construct the sample pathway; diurnal cycle considerations; sample size/sampling dynamics (volume, flow, duration); breathing patterns and manoeuvres; management of exogenous confounders (environmental, lifestyle, medication/drug intake, dietary); the sampling environment (temperature, pressure, relative humidity); sample storage/stability; quality assurance and quality control factors; and data processing and archiving protocols, amongst others. Standardization efforts have been undertaken elsewhere, for example in the metabolomics field, where the Metabolomics Standards Initiative (MSI) provides comprehensive guidelines on how samples should be taken, stored, and analysed, and sets minimum reporting standards for results [7,8]. Although recent appraisals of the MSI suggest that adherence to its minimum reporting standards is lower than expected, and comparative data stewardship in the breath research community is still in its infancy, the MSI provides a framework and approach to follow.
The Peppermint Initiative was established within the Sampling and Standardization focus group of the International Association of Breath Research (IABR) [9] and aims to propose and recommend a set of benchmark values determined through a collaborative and standardized Peppermint breath experiment. The experimental protocol must be straightforward so that it can be followed by any researcher or practitioner performing breath VOC experiments with their own methodologies. The Peppermint breath experiment was developed from preparatory studies by members of the Peppermint Initiative in which the pharmacokinetic washout of terpenoid compounds in breath after an intervention was studied [10][11][12]. The concept of the Peppermint experiment is based on inducing a transient and well-characterized perturbation in the VOC breath profile of a participant through ingestion of a food supplement capsule containing peppermint oil [13]. Although the washout profiles of the peppermint oil constituents vary between participants, we hypothesize that statistical analyses of these profiles provide useful information about the reproducibility (precision) and analytical sensitivity (limit of detection) of the sampling and analysis methods employed. Further, examination of the fidelity of the analytical features (isolated components, analytical peaks) will provide a useful comparative evaluation of the methodology and approach being tested. This introduction to the Peppermint Initiative describes the Peppermint breath experiment, reviews the formation and working practices of the partners in the initiative, and presents pilot data to illustrate the concepts and intended outcomes.
A series of companion publications will report the benchmarking results for a range of breath analysis techniques that currently include gas chromatography mass spectrometry (GC-MS) and comprehensive GC × GC-MS, gas chromatography ion mobility spectrometry (GC-IMS), proton transfer reaction mass spectrometry (PTR-MS) and selected ion flow tube mass spectrometry (SIFT-MS). On completion of the first round of Peppermint experiments a synoptic evaluation and examination of the outcomes will also be presented that is anticipated to inform, amongst other things, power calculations for study designs and a deeper examination of the pharmacokinetics of the peppermint washout process in humans.
It is not anticipated that the current Peppermint Initiative will result in any standardized sampling or analysis protocols for breath in the immediate future, especially as different applications and techniques will require tailored sampling parameters and protocols; equally, the Peppermint Initiative was not established to prescribe or advise on sampling or analytical protocols. The Peppermint Initiative was formed to provide a benchmark for assessing the performance of breath sampling/analytical techniques, offer a broad comparative performance assessment of the current breath-analytical landscape, and encourage international, cross-platform and interdisciplinary collaboration¹ to take the first steps in tackling the issue of standardization in breath research.

Pilot study and definition of the Peppermint experimental protocol
Two experiments were carried out to identify sampling time points and compounds of interest for the Peppermint experiment. In the first, ten participants were recruited to establish the scale and nature of the changes in exhaled breath caused by ingestion of a peppermint capsule.
A 200 mg peppermint oil food supplement capsule (product no. 10115320, Boots UK Ltd, UK) was ingested by participants with 100 to 150 cm³ of water at time = 0 h, and breath samples were collected at −0.5, 2, 4, 6, 8 and 10 h. Figure 1 shows an example of how the level of a participant's exhaled menthone (one of the main constituents of the peppermint oil capsule) changed following ingestion of a peppermint capsule.
For the second study, five participants were recruited, and the same sampling time points and peppermint capsules were used. Breath samples were collected from the participants (30 samples in total) and analysed using a commercial standalone ion mobility spectrometer coupled to a multi-capillary column (MCC-IMS, Bio-Scout BD model, BS Analytics GmbH, Germany) with an integrated spirometric direct breath sampler (SpiroScout, Ganshorn, Germany). The experimental parameters are summarised in supplemental table S2.

¹ The Peppermint Initiative is an ongoing initiative that encourages participation by research laboratories working in breath analysis. New members are welcome; interest in participation can be directed to the corresponding author of this article.
The protocol development work summarised here was undertaken in accordance with the Declaration of Helsinki and was approved by the Loughborough University Independent Ethics Committee (Ethics No: G09-P5). All staff were trained and proficiency-tested in breath analysis before participant recruitment began.

Identification of peppermint oil related compounds in breath
In the preparatory TD-GC-MS analysis of breath, eight compounds were attributed to ingestion of a peppermint oil capsule (table 1), all of which were observed in breath for at least 6 h after ingestion. All of these compounds except p-menthadien-7-ol and dehydro-1,8-cineole are peppermint oil constituents identified by headspace GC-MS analysis in previous work carried out by consortium members, which reported over 20 compounds together with their relative concentrations [13].

Data analysis, modelling, and data fidelity
Some participants reported mild indigestion-related discomfort from swallowing a peppermint oil capsule, which passed within 1 h of ingestion. No participant asked to withdraw from the study. Figure 2 presents the peppermint oil washout profiles of menthone (fold change relative to the reference sample, over time), as determined using MCC-IMS with online breath sampling during the pilot study. The absolute levels of the washout profiles varied between participants, as shown by the high relative standard deviation (%RSD), but the kinetics of the washouts were similar. The different profiles may also reflect differences in metabolism. Data on factors such as body mass index (BMI; kg m⁻²), age, and diet were collected to explore possible sources of variability in the peppermint washout profiles. Presenting data as a fold change (I/I₀, where I₀ is the intensity in the reference sample) enables data sets from different platforms to be compared in a straightforward way.
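The fold-change normalization and the %RSD used to describe between-participant spread are simple to compute. The following is a minimal sketch, assuming intensities are raw counts from a single platform and that %RSD uses the sample standard deviation (the text does not say which estimator was used); the function names are illustrative.

```python
from statistics import mean, stdev


def fold_change(intensities, reference_intensity):
    """Express each intensity I as a fold change I/I0 relative to the
    pre-ingestion reference intensity I0 (same compound, same platform)."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return [i / reference_intensity for i in intensities]


def percent_rsd(values):
    """Relative standard deviation (%): sample standard deviation / mean x 100,
    e.g. across participants at one sampling time point."""
    return 100 * stdev(values) / mean(values)


# Hypothetical example: a reference of 200 counts and three later samples.
print(fold_change([2400, 1800, 1000], 200))  # [12.0, 9.0, 5.0]
print(percent_rsd([10, 20, 30]))             # 50.0
```

Because the fold change is dimensionless, profiles recorded on instruments with very different absolute response scales can be overlaid directly, which is the point made in the paragraph above.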
Washout profiles were modelled with a power relationship,

$$\frac{I}{I_0} = A\,t^{B}$$

where I is the intensity (counts), I₀ is the intensity in the reference (pre-ingestion) sample, t is the time (h), and A and B are fitted coefficients. Taking base-10 logarithms transforms the model into a linear form:

$$\log_{10}\!\left(\frac{I}{I_0}\right) = \log_{10}A + B\,\log_{10}t$$

The descriptive statistics of the washout profiles provided useful information about the precision and performance of the analytical platform. Plotting the fold change, log₁₀(I/I₀), against log₁₀(t) for the washout of a peppermint oil VOC, in this case menthone, yielded straight lines with R² values between 0.979 and 0.997 (figure 3). The curves also provide information on the sensitivity (or limit of detection) of a method. The x-axis intercept, where log₁₀(I/I₀) = 0, gives the estimated time for the compound to return to its initial concentration:

$$t_{\mathrm{w}} = 10^{-\log_{10}A/B} = A^{-1/B}$$

Extrapolating the power relationships fitted to the pilot data in figure 3 therefore predicts how long it will take menthone in breath to return to its initial (pre-ingestion, reference) concentration. (Note that these data are illustrative and not intended to be used as benchmarks; the benchmarks require the consolidated data from the full-scale peppermint studies with larger numbers of participants.) For the pilot data plotted in figure 3, the mean washout time (± 99% confidence interval) was 39.3 (± 18.6) h, as indicated by the solid horizontal bar and arrow. Figure 2 highlights the biological variability between participants, who show different individual uptake and elimination rates. Combining the data from all the peppermint studies will provide a larger sample size, enabling a detailed examination of the nature of this variability; this will be addressed in a future synoptic paper.
It is important to note that variability in an individual's uptake and elimination of peppermint volatiles also extends to the time taken for the exhaled concentration of peppermint volatiles to reach its maximum. Some participants will have a washout profile with fewer than four data points after the maximum exhaled peppermint concentration. Such profiles do not yield enough data to generate a reliable washout model and should be excluded from the benchmarking evaluation. Given this variability, a precautionary approach of standardizing the ranges of BMI, age, and diet in the participant cohort may reduce the effect of differences in the participants' peppermint uptake and washout behaviour on the benchmarking evaluation. The influence of these factors on the washout of ingested volatiles is not well defined; this aspect of the Peppermint Initiative will be studied and disseminated once all the benchmark datasets are available and have been analysed (see the comment on the synoptic paper above).
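The exclusion rule for profiles with too few post-maximum points can be expressed as a small filter. This sketch assumes that "after the maximum" means strictly after the peak sample, that samples are ordered by time, and that ties for the maximum resolve to the earliest sample; these conventions and the function name are illustrative, not taken from the consortium's processing code.

```python
from typing import Optional, Sequence

# Minimum number of post-peak points stated in the text for a reliable model.
MIN_POST_PEAK_POINTS = 4


def washout_points_after_peak(
    times: Sequence[float],
    fold_changes: Sequence[float],
    min_points: int = MIN_POST_PEAK_POINTS,
) -> Optional[tuple[list[float], list[float]]]:
    """Return the (time, fold change) points strictly after the maximum, or
    None if fewer than min_points remain (profile excluded from benchmarking).
    """
    if len(times) != len(fold_changes) or not times:
        raise ValueError("times and fold_changes must be non-empty and equal length")
    # Index of the maximum fold change; max() returns the first index on ties.
    peak = max(range(len(fold_changes)), key=lambda k: fold_changes[k])
    after_t = list(times[peak + 1:])
    after_fc = list(fold_changes[peak + 1:])
    if len(after_t) < min_points:
        return None
    return after_t, after_fc


# Hypothetical profiles at the final protocol's post-ingestion times (min).
t = [60, 90, 165, 285, 360]
print(washout_points_after_peak(t, [12.0, 9.0, 5.0, 3.0, 2.5]))  # 4 points kept
print(washout_points_after_peak(t, [8.0, 12.0, 5.0, 3.0, 2.5]))  # None (late peak)
```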
Figure 3 also illustrates how potential differences between data from different laboratories may be evaluated rapidly: it includes a single washout curve (labelled 'F') obtained at a different laboratory with an alternative instrument and a different sampling approach. The data from experiment F are clearly different from the other pilot data. A complete Peppermint experiment dataset from the analytical system used for the F data would nevertheless be needed to characterize its variability before an informed comparative evaluation could be reached. At face value, however, the differences between the datasets from the two techniques are sufficient to justify caution when combining data from the two systems, and to justify running a full Peppermint experiment to establish a more rigorous comparison.
While the washout data (figures 2 and 3) reveal much about the precision and overall sensitivity of a technique towards a targeted compound, they do not disclose the full range of attributes of all the data obtained in the study: the washout data track a peppermint marker but disclose little else about the sample being analysed. Further information about the characteristics of all the data acquired during the benchmark study is therefore valuable in evaluating the overall attributes of the method being used. The approach proposed below can be especially useful for breathomics studies, where sensitivity, resolution, and the richness of the information contained within the samples are crucial for marker discovery. Figure 4 shows an example of how the cumulative frequency distribution and the intensity distribution of all analytical features may usefully be combined; the data were obtained with an MCC-IMS method from a participant's sample taken at 60 min, when the exhaled abundances of peppermint markers were at their highest. Combining the intensity distribution of the features with their cumulative frequency distribution against retention time reveals the nature of the 'breathome' recorded. Highlighting the peppermint washout markers α-pinene, an unidentified feature, and eucalyptol (indicated by solid markers, in order of increasing retention time), along with their extracted signals, places the peppermint washout data in the broader context of the individual's breathome for this sample. Consolidating all the data from a peppermint experiment will provide a systematic overview of the performance of the methodology under controlled conditions, with a panel of peppermint volatiles providing a frame of reference.
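The two distributions combined in figure 4 can be approximated from a peak table. This sketch assumes each detected analytical feature is available as a retention time and an intensity; the function names and the 0.5-decade intensity bin width are illustrative choices, not the consortium's processing pipeline.

```python
import math
from collections import Counter


def cumulative_feature_curve(retention_times):
    """Sorted retention times paired with the cumulative fraction of features
    eluted up to and including each one (a cumulative frequency distribution)."""
    if not retention_times:
        raise ValueError("no features")
    rts = sorted(retention_times)
    n = len(rts)
    return [(rt, (k + 1) / n) for k, rt in enumerate(rts)]


def log_intensity_histogram(intensities, bin_width=0.5):
    """Number of features per log10-intensity bin, keyed by the bin's lower edge."""
    counts = Counter(
        math.floor(math.log10(i) / bin_width) * bin_width for i in intensities
    )
    return dict(sorted(counts.items()))


def cumulative_fraction_at(retention_times, rt):
    """Fraction of all features eluting at or before retention time rt, e.g. to
    place a peppermint marker within the sample's overall feature set."""
    if not retention_times:
        raise ValueError("no features")
    return sum(1 for x in retention_times if x <= rt) / len(retention_times)


# Hypothetical peak table: retention times (s) and intensities (counts).
rts = [30.0, 10.0, 20.0, 40.0]
print(cumulative_feature_curve(rts))      # [(10.0, 0.25), (20.0, 0.5), (30.0, 0.75), (40.0, 1.0)]
print(log_intensity_histogram([50, 300, 500, 550, 5000]))  # {1.5: 1, 2.0: 1, 2.5: 2, 3.5: 1}
print(cumulative_fraction_at(rts, 25.0))  # 0.5
```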
Experiment F was a comparative peppermint breath experiment undertaken with different instrumentation and a different sampling method from those used for the pilot data. The difference between the washout observed in experiment F and the pilot data illustrates why multiple peppermint washout experiments are needed, and how data from this test may be used to compare the analytical performance of different breath analysis techniques, methods, and studies.
Combining and summarising the cumulative frequency and feature intensity distributions with the washout data from a peppermint experiment provides an overview of the performance of a technique for breath analysis and enables inter-/intralaboratory differences in the fidelity of breath data to be ascertained. Ultimately, such data enable inter-technique comparisons based on a common approach to monitoring a controlled intervention, which considers the inherent biological variability present in all breath data.

Peppermint Initiative and final experimental design
The results from the pilot work led to the final protocol design and the formulation of the data processing and presentation approaches needed to compare different data sets in a systematic way, enabling the production of consistent benchmark values.

Peppermint Initiative description
All partners in the Peppermint Initiative obtained approval from their local institutional ethics review board to undertake these studies. Written informed consent was obtained from all volunteer participants, and all study protocols complied with the Declaration of Helsinki.
The 16 participating research groups and a summary of the initial techniques used for the first phase of the Peppermint study are listed in table 2.
The Peppermint Initiative consortium is coordinated by Loughborough University, with an oversight team additionally comprising Radboud University, Fraunhofer IVV, and the University of Manchester. Regular (teleconference) meetings between the consortium partners have been held to coordinate the organization, management and operation of the study, data curation methods, data management, dissemination of results, publication strategy, and recruitment of new participating laboratories.

Study documentation
A description of the peppermint breath experiment, DESCRIPTION_OF_THE_BENCHMARK_STUDY_PEPPERMINT_FOR_THE_IABR_TASK_FORCE (appendix A), was provided to each participating group. The document contained all the relevant information regarding participation in the benchmark study, including the aims of the Peppermint Initiative, the sampling protocol, and the data curation and management plans.
To assist each institute in obtaining ethical approval, an example research proposal (appendix B) was shared with each group.
A participant information sheet (appendix C), describing the aims and requirements of the study, was given to each volunteer before participation in the peppermint breath experiment.
A participant informed consent form (appendix D) was provided together with the participant information sheet; volunteers were required to sign it before taking part in the study, confirming their informed consent in writing.
A participant questionnaire (appendix E) was used to collect metadata on the participant, including height, weight, age, sex, and food intake on the day of the experiment.
All documents (excluding the consortium agreement) were supplied as templates, with highlighted sections in each document to be customized for each institute. In addition, each partner was permitted to further adapt these documents to best suit the needs of their institution. Copies of these documents are provided as supplementary materials to this paper (appendices A-E).

Methodology
Thirty-six bottles, each containing 60 × 200 mg peppermint oil capsules from the same production batch (Boots UK Ltd., product number 10 115 320, batch no. 200 207), were acquired and distributed to the participating groups of the Peppermint Initiative.
To account for inter-participant variability a study group of ten participants is recommended. Participants are requested to exclude peppermint and peppermint-associated products from their diet and personal care routines for 24 h prior to participation in the study and until completion of the experiment.
Each participant provides a reference breath sample for analysis and then ingests a peppermint capsule washed down with 100 to 150 cm 3 of tap water (t = 0 min). It is important that rigorous time-management is maintained, and that the peppermint capsule is taken within 30 min of the reference sample.
Each participant provides five additional breath samples at 60 min, 90 min, 165 min, 285 min, and 360 min post-ingestion.
In addition to the breath samples, at least one environmental air sample should be collected for each participant after one of the sampling points (selected at random), while both researcher and participant are still present in the room. An air supply/instrument blank sample should also be taken after completion of a sampling series (i.e. after the 360 min sample).
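The timing rules of the final protocol can be checked mechanically during a study day. This is a sketch under stated assumptions: the offsets (60, 90, 165, 285 and 360 min) and the 30 min reference window come from the protocol, whereas the function names are hypothetical, and restricting the random choice of the ambient air sample to the five post-ingestion samples is our assumption (the protocol does not say whether the reference sample is eligible).

```python
import random
from datetime import datetime, timedelta

# Post-ingestion sampling offsets (min) from the final Peppermint protocol.
POST_INGESTION_MIN = (60, 90, 165, 285, 360)
# The capsule must be taken within 30 min of the reference sample.
MAX_REFERENCE_LEAD = timedelta(minutes=30)


def sampling_schedule(ingestion: datetime) -> list[datetime]:
    """Clock times of the five post-ingestion breath samples."""
    return [ingestion + timedelta(minutes=m) for m in POST_INGESTION_MIN]


def reference_within_window(reference: datetime, ingestion: datetime) -> bool:
    """True if the reference sample precedes ingestion by at most 30 min."""
    lead = ingestion - reference
    return timedelta(0) <= lead <= MAX_REFERENCE_LEAD


def pick_ambient_air_point(rng: random.Random) -> int:
    """Randomly choose the post-ingestion sample (offset in min) after which
    the environmental air sample is collected (assumed eligible set)."""
    return rng.choice(POST_INGESTION_MIN)


ingestion = datetime(2019, 5, 1, 9, 0)  # hypothetical capsule time
print([t.strftime("%H:%M") for t in sampling_schedule(ingestion)])
# ['10:00', '10:30', '11:45', '13:45', '15:00']
print(reference_within_window(datetime(2019, 5, 1, 8, 40), ingestion))  # True
```

Passing an explicit `random.Random` instance keeps the random choice reproducible if a seed is recorded alongside the participant metadata.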
This protocol was based on a one-dimensional central composite experimental design applied to a representative washout profile obtained from a preparatory investigation (figure 1).
Evaluation of the washout data of the peppermint oil constituents combined with a global summary of the VOC features isolated under these conditions provides a standardized test of a methodology, and enables comparison against benchmark data.

Data management
The Peppermint Initiative relies on data evaluation and sharing. Within this study, each participating team acquired different types of datasets, including raw data, processed data, summary results, informed consent forms, and participant metadata. The study coordinator provided each group with a secure file store in a cloud repository (see table 3). No personal information was shared or stored, to ensure the anonymity of participants. All data were assigned a study code by the study coordinator and anonymized before release. Consolidated data will be available in *.csv format for wider use, on request to the corresponding author, once the research results have been published.
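The anonymization and *.csv release step can be sketched as follows. Everything here is hypothetical: the column names, the participant-code format (P01, P02, ...) and the study code are illustrative assumptions, not the consortium's actual templates. The point is only that personal identifiers are replaced by codes and any field outside an explicit allow-list is dropped before export.

```python
import csv
import io

# Hypothetical column layout for a consolidated, shareable processed-data file.
SHARED_FIELDS = ("study_code", "participant_code", "time_min", "compound", "fold_change")


def anonymize(rows, study_code):
    """Replace participant identifiers with sequential codes (P01, P02, ...)
    and keep only the non-personal fields listed in SHARED_FIELDS."""
    codes = {}
    out = []
    for row in rows:
        pid = row["participant_id"]
        if pid not in codes:
            codes[pid] = f"P{len(codes) + 1:02d}"
        out.append({
            "study_code": study_code,
            "participant_code": codes[pid],
            "time_min": row["time_min"],
            "compound": row["compound"],
            "fold_change": row["fold_change"],
        })
    return out


def to_csv(rows):
    """Serialize anonymized rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SHARED_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


raw = [
    {"participant_id": "Jane Doe", "email": "j@example.org", "time_min": 60,
     "compound": "menthone", "fold_change": 12.0},
    {"participant_id": "John Roe", "email": "r@example.org", "time_min": 60,
     "compound": "menthone", "fold_change": 7.5},
]
print(to_csv(anonymize(raw, "LAB01")))
```

An allow-list (rather than deleting known personal fields) is used so that any unexpected column in a partner's file is excluded by default.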

Summary of the Peppermint Initiative
The Peppermint Initiative aims to establish a benchmark to ascertain performance of breath sampling and analytical methods, and to survey comparative performances of current breath research across the community. The pilot study presented here enabled the preparation and development of the methodology for the Peppermint experiment. Further, these initial trials highlighted the challenges associated with creating a benchmarking method and accompanying data work-up for breath analysis. Comparing different sampling approaches and analytical techniques requires sufficient sample numbers to account for both technical and biological variability. Alignment of data processing as well as quality assurance and control approaches across the Peppermint Initiative consortium are essential.
To date, 16 teams have provided data from 200 individual volunteer participants. These comprise 200 sets of six breath samples describing intra-subject variability profiles, 200 sets for comparing inter-subject variation, and 200 ambient air VOC profiles, which together provide an objective, multi-centre verified review of the comparative analytical characteristics of different techniques and sampling approaches.
Moreover, this initial dataset and the accompanying protocol provide a benchmark standard for assessments that will support and verify quality assurance and control activities undertaken in the field of breath research.

Table 3. An overview of information and data provided by each participating group.
Governance: Participant questionnaires; Signed consortium agreement; Signed ethical approval form; Signed informed consent forms from participants (personal information is redacted and replaced with a participant code).
Data: Raw data (file format: instrument-dependent); Processed data (*.csv file); Sample method data (*.docx file describing the sampling method protocol); Instrument method data (*.docx file describing instrumental parameters); Quality control/assurance measures.

Conclusions
To achieve the continuous improvement in sampling, analysis, data processing and modelling needed for the development and adoption of breath tests, quality assurance in breath analysis is essential. In contrast to other approaches in biomarker discovery, a method for reliably pooling breath samples has yet to be described. Addressing this challenge requires recognition of the ephemeral nature of breath samples; the Peppermint experiment is an attempt to resolve this difficulty by providing an approach to data sharing and performance evaluation for breath researchers. Adoption of the Peppermint experiment enables informed comparison of analytical quality against a consensus benchmark.
Understanding the cause or causes of differences arising from the use of similar technologies is essential to the development and quality of the research, and of the researchers involved. This requires careful analysis of the sample as a whole. Possible factors driving variability include participant phenotype, participant compliance with the experimental protocol, environmental contamination, experimental procedure, researcher proficiency, instrumentation setup, or data processing, amongst others. Without benchmarking, it is difficult to assert a level of reliability in individual datasets or associated findings. If the breath research community is to enhance confidence in its reported discoveries and commit to the delivery of reproducible findings, continuous effort in verifying and sharing benchmark data will be needed for the foreseeable future.