Differentiating cancer types using a urine test for volatile organic compounds

Background. In the human body, volatile organic compounds (VOCs) are produced by different tissues, secreted into different body fluids and subsequently excreted. Here we explore a non-invasive method for the detection of liver, prostate and bladder cancers. Methods. We recruited 140 cases: 31 hepatocellular carcinomas (HCC), 62 prostate carcinomas, 29 bladder carcinomas and 18 non-cancer cases. The male to female ratio was 5:1 and the mean age was 72 years. Urinary VOCs were detected using the solid-phase microextraction (SPME) technique. Results. The VOCs sensitivity for detection of HCC with normal alpha fetoprotein (AFP) was 68% (SE 0.06, 95% CI 0.54 to 0.81, P < 0.005), and for HCC cases with raised AFP it was 83% (SE 0.05, 95% CI 0.73 to 0.93, P < 0.0001). The VOCs sensitivity for prostate cancer detection was 70% (SE 0.049, 95% CI 0.60 to 0.79, P < 0.0002) and for bladder cancer detection it was 81% (SE 0.052, 95% CI 0.70 to 0.91, P < 0.0001). Conclusions. SPME urinary VOCs analysis was able to differentiate between controls and each of hepatocellular, prostate and bladder cancers. This suggests that urinary VOCs are cancer specific and could potentially be used as a diagnostic method.


Introduction
Cancer is the second leading cause of death worldwide and was responsible for an estimated 9.6 million deaths in 2018. Globally, about one out of six deaths is due to cancer. Approximately 70% of cancer deaths occur in low to middle income countries. Deaths from cancer are projected to continue rising, with an estimated 13.1 million deaths worldwide in 2030 [1][2][3][4][5][6]. Hepatocellular, bladder and prostate cancers are very common.
The burden from cancer needs to be reduced. This can be achieved by timely and accurate detection, which gives a high chance of treatment and subsequent cure [7][8][9]. However, diagnosis of cancer remains difficult. Non-invasive radiological scans, for example computed tomography (CT), magnetic resonance (MR) and positron emission tomography (PET), can identify the site of the cancer, but these are expensive and not easily available or accessible to all patients. Moreover, they are not fully diagnostic because they usually require confirmation by a tissue sample. Obtaining histology of the affected organ carries a higher complication risk as it often requires invasive surgical or endoscopic techniques [10]. It is for these reasons that alternative non-invasive, low-cost modalities are urgently required for cancer diagnosis.
In this study, we explored a simple non-invasive method by analysing the patterns of the urinary volatile organic compounds (VOCs) in cancer patients. Urine is an important biological medium and can be obtained easily from cancer patients. In the human body, metabolites that are potentially VOCs permeate from the cell membranes, which primarily consist of phospholipids, carbohydrates and proteins. The cell membranes of different cell types are distinct. On this basis, we know that cancer cell formation is accompanied by gene and/or protein changes, which lead to oxidative stress, peroxidation of the cell membranes and production of different chemicals within the cells. This intrinsic switch within a cancer cell alters its metabolic profile. Following this cellular damage, the VOCs are released from the tissue and find their way into the systemic circulation owing to the vascular nature of most cancers. Once in the systemic circulation, cancer metabolites reach the kidneys and, following filtration, the metabolic by-products are eventually excreted in the urine. The VOCs have distinctive bio-signatures or smell prints, and hence their detection can lead to the differentiation between normal and malignant tissues [11][12][13][14][15]. The metabolomics profile describes the health and disease status of an individual, and this approach is becoming an attractive tool for the development of precision medicine. Methods for VOC detection were summarised by Arasaradnam et al [16].

Material and methods
We conducted prospective recruitment for the study. Written informed consent was obtained from all participants prior to enrolling into the study. Patients were recruited before receiving any form of cancer treatment, for example chemo/radiotherapy or surgery. Patients who were less than 18 years old or pregnant were excluded from the study. In total, 140 cases were recruited from University Hospital Coventry and Warwickshire between October 2014 and June 2019. There were 31 hepatocellular carcinoma (HCC) cases, 62 prostate carcinoma cases, 29 urinary bladder carcinoma cases and 18 non-cancer cases. The male to female ratio was 5:1 and the mean age was 72 years (range 42-94). All cancer patients were diagnosed by conventional clinical methods, including various imaging techniques to locate the tumour and/or biopsy for final diagnosis. Controls included patients who were suspected of having cancer but had negative investigations (table 1).
Following collection of 5 ml of urine from the participants, the samples were frozen at −80 °C within 2 h. Prior to analysis, samples were left to thaw in a water bath at 23 °C for 1 h. Urine was then placed into a 50 ml Falcon conical centrifuge tube with a modified cap which had two slots to allow two sensor tabs to be inserted. These tabs absorbed VOCs from the sample headspace for a period of 300 s (figure 1).
VOCs were analysed by applying the solid-phase microextraction (SPME) technique to the urine samples. In this technique, special absorptive or adsorptive ('sorptive') materials are used to extract analytes of interest by capturing them in their matrices. The sorptive materials are then heated to a suitable temperature to cause the analyte to desorb from the coating for analysis. SPME is established as a means of preconcentrating samples for analysis by gas chromatography, where the SPME matrix is in the form of a coating on a fibre that is introduced into the injection port [17]. This yields information about the identity and concentration of volatile compounds and is applicable to both gas and liquid samples.
Here, a new variation of the technique, developed by SensAm Ltd, UK, was utilized. Unlike a gas chromatograph, which separates chemical components from a mixture before applying them to an appropriate detector, the strategy was to capture and preconcentrate volatiles from urine headspace onto a flat tab that contains a proprietary semiconducting polymer (figures 2(a)-(b)). The volatiles are then desorbed onto a detector comprising an array of metal oxide gas sensors, which generates a pattern of data that can be used to discriminate complex mixtures without separation into individual chemical components; this forms the basis of 'electronic nose' technology [18]. These polymer tabs are single-use devices that have been pre-activated by heating during manufacture. They do not require regeneration and thus do not suffer from the recovery and carry-over problems found with SPME fibres used for gas chromatography.
The commercial metal oxide sensors (MOS) used as detectors in the system are based on tin oxide doped with various metals that confer some selectivity to various chemical species. When heated to a temperature of about 300 °C, they function as semiconductors. Atmospheric oxygen residing on the MOS surface is reduced by the target gases, allowing more electrons in the conduction band of the metal oxide material. The sensors exhibit a change in electrical resistance due to adsorption of gases or vapours, and this change is proportional to the concentration adsorbed. Using a range of sensors with different chemical selectivities, incorporated into an array, allows a pattern of information to be generated that can be processed by multivariate techniques to allow discrimination of complex mixtures of volatile chemicals [18,19].

Data processing
When the sensor tabs were inserted into the SensAm device (figure 2(b)) for VOCs analysis, they were automatically heated in the device to 120 °C. Responses from the array of gas sensors to the desorbed vapours were captured over a period of 180 s, digitized and stored. A schematic diagram of the data processing steps adopted is shown in figure 3.

Feature selection and data analysis
The portion of the gas sensor response corresponding to the detection of desorbed volatiles was extracted and averaged to produce n = 5 samples straddling the response profile. This was normalised to minimise concentration effects. The autoscaling method followed equation 1.
In equation 1, A is the feature matrix for n samples over p sensors, A_ij is the ith sample of the jth sensor, A_j contains all n samples for sensor j, and A_i contains the p sensor values for the ith sample. This constituted a pattern of relative responses of the entire sensor array to a particular analyte sample.
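The normalisation step can be illustrated with a short sketch. The exact form of equation 1 is not given in the text here, so the following is only one plausible reading, stated as an assumption: each sample is first converted to relative responses across the p sensors (to reduce concentration effects), and each sensor column is then autoscaled to zero mean and unit standard deviation. The function names and the example matrix are hypothetical.

```python
from statistics import mean, pstdev

def relative_response(A):
    """Divide each row (one sample across p sensors) by its row sum.

    Assumption: this is one way to remove overall concentration effects,
    not necessarily the authors' equation 1.
    """
    return [[a / sum(row) for a in row] for row in A]

def autoscale(A):
    """Column-wise autoscaling: zero mean and unit standard deviation per sensor."""
    cols = list(zip(*A))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [[(a - m) / s for a, m, s in zip(row, mu, sd)] for row in A]

# Hypothetical feature matrix: 3 samples (rows) x 3 sensors (columns).
# Row 2 is row 1 at twice the concentration.
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [3.0, 3.0, 6.0]]

R = relative_response(A)   # rows 1 and 2 now share the same pattern
Z = autoscale(R)           # each sensor column has mean 0
```

In this toy example the first two rows differ only in overall intensity, so their relative-response patterns coincide, which is the sense in which normalisation minimises concentration effects.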
For each patient sample, databases of the patterns were created, and these formed the basis for further data processing. To visualise the data from all the samples that were processed, the method of principal components analysis (PCA) was used [20].
PCA visualises the data while retaining as much of the variance as possible. It does this through an orthogonal linear transformation of the data into principal components. The greatest variance is explained by the first principal component, the second greatest by the second principal component, and so on. PCA makes no assumptions about which classes the data belong to. To discriminate between different classes of data, a neural network based on radial basis functions was therefore employed, chosen for its speed and robustness of performance in many practical applications [21,22].
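The projection described above can be sketched with a singular value decomposition. This is a minimal illustration using NumPy on random data, not the software used in the study; the 20 x 8 matrix merely mimics a set of patterns from an eight-sensor array, and the function name `pca` is an assumption.

```python
import numpy as np

def pca(X, n_components=2):
    """Project mean-centred data onto its first principal components.

    Returns the scores and the fraction of variance explained by each
    retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each sensor column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # coordinates for a PCA plot
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratio per component
    return scores, explained[:n_components]

# Hypothetical data: 20 patterns from an 8-sensor array.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
scores, explained = pca(X)
```

The two columns of `scores` are what would be plotted against each other in a two-dimensional PCA plot; class labels play no part in the computation.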
Radial basis function networks (RBFN) are a variant of three-layer feedforward neural networks. They contain an input layer, a hidden layer and an output layer, where the transfer function in the hidden layer is called a radial basis function (RBF). A class name was assigned to each individual pattern in the databases: Normal, Prostate, Bladder or Liver. The neural network was trained on 50% of the samples, and the rest were used to test its prediction accuracy on previously unseen samples. The output nodes of the neural network were scaled to provide an output between 0 and 1, representing the probability of an input pattern belonging to a certain class. To generate receiver operating characteristic (ROC) curves, the neural network was trained on a series of test cases versus controls, and ROC curves were generated for each of the test cases from the outputs of the neural network when tested against previously unseen patterns, using established algorithms [23,24]. These produce reports on Sensitivity (the ability of the test to correctly identify those patients with the disease, also known as Recall), Specificity (the ability of the test to correctly identify those patients without the disease), the Positive Predictive Value (how likely it is that the patient has the disease if the test is positive, also known as Precision) and the Negative Predictive Value (how likely it is that the patient does not have the disease if the test result is negative). Precision and recall are used to produce an F-score, which is a measure of a test's accuracy. Here the precision is the number of correctly identified positive results divided by the number of all positive results, including those not identified correctly, and the recall is the number of correctly identified positive results divided by the number of all samples that should have been identified as positive [25].
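The reported quantities follow directly from the four cells of a 2 x 2 confusion table. The sketch below uses the standard definitions, in which recall is the same quantity as sensitivity and precision is the same quantity as the positive predictive value. The counts are invented for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic test metrics from confusion-table counts."""
    sensitivity = tp / (tp + fn)   # recall: diseased patients correctly identified
    specificity = tn / (tn + fp)   # non-diseased patients correctly identified
    ppv = tp / (tp + fp)           # precision: positive tests that are true positives
    npv = tn / (tn + fn)           # negative tests that are true negatives
    # F-score (F1): harmonic mean of precision and recall.
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }

# Hypothetical counts: 12 diseased (8 detected), 8 controls (6 correctly negative).
m = diagnostic_metrics(tp=8, fp=2, tn=6, fn=4)
```

Because the F-score uses only precision and recall, it does not depend on the number of true negatives.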
The neural network was also tested to discriminate all cancers individually against controls as well as to discriminate each type of cancer.
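A radial basis function network of the kind described above can be sketched in a few lines. This is an illustrative toy, not the implementation used in the study: the Gaussian basis function, the choice of one centre per class (the class mean of the training data), the fixed width, the least-squares fit of the output weights and the clipping of outputs to the range 0 to 1 are all assumptions, and the data are synthetic.

```python
import numpy as np

def rbf_features(X, centres, width):
    """Hidden layer: Gaussian radial basis activations for each centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_rbfn(X, Y, centres, width):
    """Output layer: least-squares weights mapping activations to one-hot labels."""
    H = rbf_features(X, centres, width)
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W

def predict_rbfn(X, centres, width, W):
    """Class scores clipped to [0, 1]; the largest score gives the class."""
    return np.clip(rbf_features(X, centres, width) @ W, 0.0, 1.0)

# Synthetic two-class data standing in for 8-sensor patterns.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 8)),
               rng.normal(3.0, 0.3, size=(20, 8))])
y = np.array([0] * 20 + [1] * 20)

# 50% of the samples for training, the rest held out for testing.
idx = np.arange(len(y))
train, test = idx[::2], idx[1::2]

centres = np.stack([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
W = train_rbfn(X[train], np.eye(2)[y[train]], centres, width=2.0)
scores = predict_rbfn(X[test], centres, 2.0, W)
accuracy = float((scores.argmax(axis=1) == y[test]).mean())
```

Holding out half of the patterns, as in the study, means `accuracy` is measured only on samples the network has not seen during training.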

Results and discussion
The urine test was able to clearly differentiate between controls and each cancer group as shown in the PCA plots (figures 4, 5 and 6).
For HCC, the sensitivity of conventional alpha fetoprotein (AFP) alone in our study was 54.8%, with a raised result (AFP cut-off >10 kU l⁻¹) in only 17 cases. When compared with AFP, the urine test showed good discrimination in the diagnosis of HCC (figure 7). The urine test sensitivity in detection of HCC cases with raised AFP was 83% (ROC curve area 0.83, SE 0.05, 95% CI 0.73 to 0.93, P < 0.0001). The sensitivity for detection of HCC with normal AFP was 68% (ROC curve area 0.68, SE 0.06, 95% CI 0.54 to 0.81, P < 0.005). All prostate cancer cases had raised prostate specific antigen (PSA) of ≥4 ng ml⁻¹ and were proven histologically. The VOCs also showed good sensitivity in detection of prostate carcinoma (figure 8). The sensitivity was 70% (ROC curve area 0.70, SE 0.04, 95% CI 0.60 to 0.79, P < 0.0002). The VOCs sensitivity in detection of bladder carcinoma was 81% (ROC curve area 0.81, SE 0.05, 95% CI 0.70 to 0.91, P < 0.0001) (figure 9). Box plots were produced to compare the relative response of each of the eight metal oxide sensors in the cancer groups and controls. Visually, sensor 4 and sensor 8 were the two sensors with the lowest relative responses in all groups. For HCC, sensor 5 was predominantly responsive. However, we did not identify any unique relative response between different cancers or controls. Relative responses from the eight sensors are shown in figure 10.
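For readers unfamiliar with how an ROC curve area and its standard error are obtained from classifier outputs, a minimal sketch follows. The area is computed as the proportion of case-control pairs in which the case receives the higher score. The standard error uses the Hanley and McNeil approximation, which is an assumption here because the text does not state which method was used. The scores are invented for illustration.

```python
from math import sqrt

def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of the AUC (Hanley and McNeil)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return sqrt(var)

# Hypothetical network outputs for 4 cases and 4 controls.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
auc = roc_auc(cases, controls)
se = hanley_mcneil_se(auc, len(cases), len(controls))
ci = (auc - 1.96 * se, auc + 1.96 * se)   # normal-approximation 95% CI
```

With so few samples the interval is very wide, which illustrates why larger cohorts are needed before an ROC area can be interpreted with confidence.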
Using the data-processing flow shown in figure 3, table 2 shows the results obtained from training a radial basis function neural network against the imbalanced data sets available (table 1) to investigate whether these cancers could be discriminated from each other on the basis of the urine VOCs that were captured. Unlike an ROC curve, which essentially evaluates a binary classifier, this method allowed us to address a multi-class problem and discriminate between the different cancers tested in this study. Testing the trained network against previously unseen samples showed that the average potential accuracy could be as great as 93.7%, based on the limited number of samples available. These results should be taken as cautiously indicative at this time, as the neural network could be overtrained, and a population study with a large cohort of samples would be required for verification.
The methodology in our study is unique in detecting relative responses of metabolic by-products present in the urine of different cancer patients. Our study identified that the use of VOCs from urine in the diagnosis of these cancers has potential and that it can be cancer specific. Other studies demonstrated the presence of VOCs in the exhaled breath of different cancer patients and were also able to differentiate between different histological cell lines [26][27][28]. This points to the role of different VOCs in carcinogenesis, based on the hypothesis that the metabolic by-products of cancer cells are different [29,30]. An in vitro study has identified the presence of VOCs in the headspace from HCC tissue [31]. We took this further by demonstrating the presence of VOCs in the urine of HCC patients and the potential use of VOCs in HCC detection. Our centre uses a common immunoassay for AFP detection, with the lower range of detection being up to 10 kU l⁻¹. It is accepted that values >10 kU l⁻¹ are considered raised and suspicious of HCC. Our HCC group had 17 patients with raised AFP, giving AFP a sensitivity for HCC detection in this group of 54.84% (95% CI 36.03% to 72.68%). Urinary VOCs detection of HCC was better, at 68% in patients with normal AFP and 83% in those with a raised AFP. This indicates that the pattern of urinary VOCs holds promise for HCC detection, given that AFP is not recommended for routine use in HCC diagnosis/surveillance [32,33].
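The AFP sensitivity quoted above is the proportion 17/31. A confidence interval for such a proportion can be checked with the sketch below, which assumes an exact (Clopper-Pearson) binomial interval; the text does not state which interval method was used, so this is an assumption rather than a reproduction of the authors' calculation. The function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. 1 - cdf = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

sensitivity = 17 / 31                  # raised AFP in 17 of 31 HCC cases
lower, upper = clopper_pearson(17, 31)
```

The width of the interval reflects the small group size: with only 31 HCC cases, the sensitivity of AFP is estimated imprecisely.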
The currently used biomarker for detection of prostate cancer is PSA, a protein that is produced by normal prostate cells. PSA is specific to the prostate, but not to prostate cancer, and it can be increased in prostatic infections and benign prostatic hyperplasia [34,35]. PSA is used as a screening method for prostate cancer because rising levels are associated with prostate carcinoma. A PSA cut-off of 4 ng ml⁻¹ has a sensitivity of 67.5%-80%, which implies that 20%-30% of cancers are missed when only the PSA level is obtained. This could also result in unnecessary biopsies from patients; hence a lower cut-off of 3 ng ml⁻¹ is now needed for a biopsy [36]. We identified that the urinary VOCs were able to demonstrate a sensitivity of 70%. A recent study showed that there were up to 282 VOCs in prostate carcinoma patients and that, using a prediction model relying on 11 VOCs, the sensitivity was 96% and the specificity 80% [37].
Another study showed a diagnostic accuracy of 65%, a result based on a model of four VOCs [38]. The reported differences in sensitivities can be explained by the use of different technologies and prediction models. Our findings add to the literature that VOCs in the urine can effectively be used in the diagnosis of prostate cancer.
The third group we looked into were urinary bladder cancer patients. Bladder cancer has many histological subtypes, including transitional cell carcinomas (90%), squamous cell carcinomas (5%) and adenocarcinomas (less than 2%) [39]; hence it is heterogeneous in nature. We found that the VOCs sensitivity in detection of bladder cancer cases was 81%. Our results were comparable to those of two studies of the relation between bladder cancer and urinary VOCs, which reported 70% overall accuracy (70% sensitivity and 70% specificity) and 100% sensitivity with 94.6% specificity, respectively [40,41].
On close visual observation of the eight relative responses by the MOS in each cancer group (figure 10), we note that the values of the relative responses are normally distributed and spread out. We were limited in part from identifying the exact chemical composition in the study. However, the relative response from each sensor indicates that these cancers could be differentiated because the mixture of chemicals differs between each cancer group and controls. The use of a mass spectrometer could help in identifying these chemicals.
Most VOCs studies were performed on well-selected cases. Future VOCs analysis will therefore need to be prospective, in patients who are at risk of developing cancer over a period of time.
In most studies, VOCs analysis was performed on breath, frozen urine and/or frozen faecal samples. A study has shown that urine VOCs appear stable over time when stored-even up to four years when frozen [42]. However, the sampling conditions of biological specimens need standardizing. For example, a recently published study by McFarlane et al [43] recommended a 12-h maximum duration at room temperature prior to storage of the biological sample. Therefore, interested VOCs research groups should form a consensus around biological specimen conditions.
The benefits of VOCs detection in urine are that samples can easily be produced by patients on demand and that it could provide a real-time check of metabolic activity. VOCs may also be affected by diet and medications, and future studies should address this relationship further. Our study suggests that the future is bright for VOCs analysis in cancer diagnostics and perhaps even in monitoring response to treatment.

Conclusion
Urinary VOCs analysis has future potential as a diagnostic method for cancers, particularly as it appears to be cancer specific. Naturally, this will require further validation in population-based cohorts as well as correlation with existing clinical markers. There is also potential to track response to treatment through non-invasive means. Characterisation of the precise metabolic by-products in these cancers will shed light on their biology but is unlikely to improve the diagnostic potential of the method.

Supplementary materials
All data is available in this manuscript.

Author contributions
AB was involved in data curation and patient recruitment, and participated in sample analysis and writing the original report. FK and KP were involved in sample and data analysis and in reviewing and editing the manuscript. RA was involved in conceptualization and supervised the project. RA acts as a guarantor for the study.

Acknowledgments
We acknowledge the help of the consultant staff in the urology department of University Hospital Coventry and Warwickshire in identifying cases. We also thank Sister Davina Hewitt and Sister Samantha Hyndman for their help with recruitment of bladder and prostate cancer cases. We also acknowledge Sean James (head of Arden tissue bank) and Parmjit Dahaley and Andrew White (tissue bank biomedical assistants) for their help in storage and transfer of samples.

Funding
AB was supported partly by the Medical and Life Sciences Research Fund (MLSRF) which is a registered UK independent charity. MLSRF had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript.

Conflicts of interest
The authors declare no conflict of interest.

Ethical statement
Ethical approval was granted by Coventry and Warwickshire Research Ethics Committee, UK as part of the FAMISHED (Food and Fermentation using Metagenomics in Health and Disease) multicentre study (09/H1211/38). The study protocol conformed to ethical guidelines of the 1975 Declaration of Helsinki as reflected by the institution's human research committee. Written informed consent was obtained from all participants prior to enrolling into the study.