A study on volatile organic compounds emitted by in-vitro lung cancer cultured cells using gas sensor array and SPME-GCMS

Volatile organic compounds (VOCs) in human exhaled breath have been shown to be a useful source of information for early lung cancer diagnosis. To date, however, the production and origin of the significant VOCs of cancer cells are still debated. This study therefore conducts in-vitro experiments on related cell lines to verify that VOCs can provide information about the cells. The performance of e-nose technology was evaluated with different statistical methods to determine the best classifier. The gas sensor study was complemented by solid phase micro-extraction gas chromatography mass spectrometry (SPME-GCMS). For this purpose, lung cancer cell lines (A549 and Calu-3) and control cell lines, namely a breast cancer cell line (MCF7) and a non-cancerous lung cell line (WI38VA13), were cultured in growth medium. The study provides a list of possible VOCs that may be specific biomarkers for lung cancer, even at the 24th hour of cell growth. In addition, the Linear Discriminant Analysis-based One versus All-Support Vector Machine classifier achieved high performance in distinguishing lung cancer cells from breast cancer cells and normal lung cells. The findings indicate that the specific VOCs released by cancer cells can act as an odour signature and could potentially be used for non-invasive screening of lung cancer with gas sensor array devices.


Background
Cancer is one of the leading causes of mortality worldwide, mainly because it is commonly detected at a very late stage. The American Cancer Society [1] estimated that about 1,685,210 new cases of cancer would be diagnosed and 595,690 cancer-related deaths reported in the United States in 2016. It also reported that lung cancer (LC) is the second most common cancer affecting men (14%) and women (13%), behind only prostate cancer (21%) and breast cancer (29%) respectively [1]. In Malaysia, LC has been reported to be the second most common cancer in men and the third most common in women, with 2,100 Malaysians diagnosed each year [2]. Diagnosis of lung cancer at an early stage, particularly when the tumour is discovered at its local site, has been shown to improve the survival rate of patients [3,4]. Hence it is critical that high-risk patients are screened. However, the established and widely used screening techniques, such as chest radiography and cytological examination, often give poor results in detecting small and resectable cancers [5].
Currently, low dose computed tomography (LDCT) used as an early stage lung cancer screening technique has been shown to reduce the number of lung cancer deaths [6]. Yet this method exposes patients to considerable risk, as the radiation involved can lead to several complications [4,7]. Generally, conventional methods are invasive and might delay therapy if cancer is found [8,9]. In addition, only selected hospitals with the right expertise and facilities can perform such screening tests. Thus, a new screening approach based on cell biology theory [4], which analyses the volatile organic compounds (VOCs) linked to lung cancer, has been receiving considerable attention from researchers. This screening technique is reported to be non-invasive, reliable and inexpensive [10,11].
Changes in metabolic pathways (gene or protein changes) in cancerous cells during tumour growth may lead to peroxidation of the cell membrane and the production of certain VOCs [12,13]. These VOCs can be detected directly in the headspace of the cancer cells [8,14] or in the exhaled breath of cancer patients [10,15,16]. In the case of exhaled breath, VOCs generated by the cancer cells are carried by the blood and exchanged through the alveoli in the lung [17]. The potential of VOCs in the breath of lung cancer patients as a diagnostic or screening tool has been extensively studied for several years [18]. However, to give clinicians information on the cellular and biochemical origin of the VOCs when deciding on a specific cancer treatment, the analysis should also be compared with cancer cells (either in-vivo or in-vitro) [19,20].
Many studies have used in-vitro cultured cells as a model system to demonstrate discrimination between tumour and normal cells with spectrometric techniques [21-32]. However, the results are somewhat equivocal and more studies are needed to identify VOC biomarkers of lung cancer [32]. Only a few studies have used an array of sensors to distinguish types of lung cancer cells based on in-vitro cultured cell line samples [8,33,34], as shown in Table 1. These reports show substantial results in terms of sensor performance. However, choosing the right classification algorithms for the e-nose, with the aid of SPME-GCMS analysis, is crucial to strengthen the findings and advance the aim of non-invasive cancer diagnosis [35,36].
In this study, the VOC signatures of two lung cancer cell lines, A549 and Calu-3, are investigated. A normal lung cell line and a breast cancer cell line are used as control samples to differentiate the lung cancer-related VOCs. To date, no reported work is known to have investigated the VOC patterns released by both lung and breast cancer cultured cell lines under the same conditions and environment and at different growth stages.
This paper presents new results distinguishing the VOCs generated by two types of cancer cell lines, namely lung cancer (A549 and Calu-3) and breast cancer (MCF7), as well as a normal lung cell line (WI38VA13), at different proliferation stages using the Cyranose320 e-nose device. Also presented are the results of five different classifiers used with the e-nose for VOC classification. To the best of the authors' knowledge, this paper is also the first to investigate the use of Naïve Bayes (NB) and One versus All-Support Vector Machine (OVA-SVM) to classify the VOCs emitted by in-vitro cell lines using an e-nose. Table 2 shows the parameters used in this study.
The Cyranose320 is an e-nose based on an array of 32 conducting polymer coated carbon black sensors; the pattern of change in the resistance of the sensor array is used to identify smells [37]. This feature helps to detect even slight differences in the headspace or in the complex volatile organic compounds (VOCs) emitted in exhaled breath [38] or by in-vitro cultured cells [34,39-41]. The Cyranose320 was used to detect and discriminate the volatiles collected from the different cell lines with the aid of pattern recognition methods.
The VOCs collected were classified using different multiclass classifiers to find the one that best exploits the Cyranose 320 in distinguishing the lung cancer cells from the control samples. GCMS-SPME analysis was also performed for each sample. This pre-concentrating volatile compound extraction method was able to determine the specific compounds emitted by each type of cell [34]. The compounds were identified using the NIST library and compared with the e-nose data. The significance of these preliminary results and their support for the application in clinical lung cancer screening are then discussed.

Cell culture preparation
Cancerous lung cell lines A549 (ATCC® CCL-185™) and Calu-3 (ATCC® HTB-55™), the normal lung cell line WI38VA13 (ATCC® CCL-75.1™) and the breast cancer cell line MCF7 (ATCC® HTB-22™) were obtained from the American Type Culture Collection and maintained at the Cell and Tissue Culture Engineering Lab (CTEL), Department of Biotechnology Engineering, IIUM. Table 3 shows the characteristics of the cell lines used in this project. As Table 3 shows, A549 and Calu-3 share the same histology, adenocarcinoma, but are reported to be of different origin. The VOC signatures of both A549 and Calu-3 are therefore also covered in this work. The A549, WI38VA13 and MCF7 cells were revived and cultivated in DMEM (Dulbecco's Modified Eagle's Medium) supplemented with 10% (v/v) FBS (Fetal Bovine Serum), while the Calu-3 cell line was grown in Eagle's Minimum Essential Medium (EMEM) with 10% (v/v) FBS. The cells were grown in 25 cm² T-flasks and incubated in a carbon dioxide (CO2) incubator at 37°C/5% CO2 [22,23,36].
Upon reaching 70-90% confluence, the cells were harvested and seeded into new flasks at an initial density of 1×10^5 cells/ml in 5 ml of medium for each cell line. The culture conditions were as reported in our previous work [39]. Blank medium samples, DMEM (without cells) and EMEM (without cells), were also prepared in triplicate as control samples and incubated together with A549, Calu-3, MCF7 and WI38VA13. The same cell culture preparation and environmental conditions were maintained for both the e-nose and the SPME-GCMS measurements. The odour samples were taken after 24 h of incubation using an SPME fiber (Divinylbenzene/Carboxen/Polydimethylsiloxane), while the Cyranose320 measurements were taken at the 24th, 48th and 72nd hours of incubation.

E-nose headspace sampling
The prepared samples, in fully sealed T-flasks, were placed in the biosafety cabinet, and the flasks were then connected to the inlet of the Cyranose 320 for data collection. The e-nose sampling setup is shown in Fig. 1, and Table 4 shows the configuration of the data collection process. The baseline purge was set to 10 s before data collection. The odour samples were drawn for 180 s, long enough for the odour to reach all 32 sensors inside the Cyranose320 so that they could detect the VOCs. The sniffing process was repeated 5 times.

Data analysis
The collected data were analysed using SPSS 17.0 and MATLAB R2012a to evaluate the e-nose performance. Each individual sample was described by a unique set of measurements known as features. The Cyranose 320 used in this work contains 32 conducting polymer sensors and hence creates 32 features for each odour sample. Each feature forms a dimension in a space known as the feature space. For each sample, including the blank mediums, the experiments were replicated in three flasks, and each sniffing was repeated 5 times at the 24th, 48th and 72nd hours. For the e-nose analysis, the datasets from two flasks were used for training and the third for testing, giving a training-to-testing ratio of 2:1. This study uses 18 classes for classification (six sample classes multiplied by three incubation times). Figure 2 shows an example of five complete cycles of the response of sensor 12 of the Cyranose 320. Figure 3 shows a block diagram summarising the data analysis conducted in this study.
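As a rough illustration of the dataset layout described above, the following sketch enumerates the measurements and the flask-wise split. It assumes that every sniff yields one 32-feature vector and that the third flask is the held-out one; the record layout and class names are hypothetical and not taken from the study.

```python
from itertools import product

SAMPLES = ["A549", "Calu-3", "MCF7", "WI38VA13", "DMEM", "EMEM"]
HOURS = [24, 48, 72]      # incubation times at which the e-nose sampled
FLASKS = [1, 2, 3]        # three replicate flasks per sample
SNIFFS = range(1, 6)      # five repeated sniffs per flask
N_SENSORS = 32            # one feature per Cyranose 320 sensor

# One class per (sample, incubation time) pair: 6 x 3 = 18 classes.
classes = [f"{sample}_{hour}h" for sample, hour in product(SAMPLES, HOURS)]

# One record per sniff (assumption: each sniff gives one feature vector).
records = list(product(SAMPLES, HOURS, FLASKS, SNIFFS))

# Assumption: flasks 1 and 2 are used for training, flask 3 for testing.
train = [r for r in records if r[2] in (1, 2)]
test = [r for r in records if r[2] == 3]

print(len(classes), len(records), len(train), len(test))  # 18 270 180 90
```

Splitting by flask rather than by individual sniff keeps repeated sniffs of the same flask out of both sets at once, which is one reasonable reading of the 2:1 ratio.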

Signal pre-processing
The Savitzky-Golay filter was selected to remove noise from the gas sensor signal while preserving the height, width, amplitude and overall profile of the response [37,39]. The datasets were normalized using the fractional difference method as in Eq. (1) [42]:

$$\frac{\Delta R}{R_o} = \frac{R - R_o}{R_o} \qquad (1)$$

where R_o is the baseline and R is the steady-state sensor response to the gas sample. The fractional method helps to reduce signal drift [43]. All data were further normalized using the global sensor autoscaling method, which scales the data to zero mean and a standard deviation of one [42,44].
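The two normalization steps can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the resistance values are invented, the population standard deviation is an assumption, and the Savitzky-Golay smoothing step is omitted (a library routine such as `scipy.signal.savgol_filter` could be applied to the raw signal beforehand).

```python
from statistics import mean, pstdev

def fractional_difference(r_steady, r_baseline):
    """Return (R - R_o) / R_o for one sensor reading."""
    return (r_steady - r_baseline) / r_baseline

def autoscale(values):
    """Scale one sensor's values to zero mean and unit standard deviation.

    Assumption: the population standard deviation is used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical baseline and steady-state resistances of one sensor, four sniffs.
baseline = [100.0, 100.0, 102.0, 101.0]
steady = [105.0, 110.0, 107.1, 111.1]

responses = [fractional_difference(r, r0) for r, r0 in zip(steady, baseline)]
scaled = autoscale(responses)
```

Because the fractional difference divides by the baseline, a slow drift that shifts both R and R_o by a similar proportion largely cancels out before autoscaling.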

Feature extraction
Feature extraction is essential to bring out the discriminating information that improves classification performance [38]. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two commonly used feature extraction techniques [45,46]. In the present study, both techniques were applied to find the better method for reducing dimensionality while losing as little information about the dataset as possible. The components from PCA and the discriminants from LDA were used to visualise class separability. PCA provides an unbiased projection, which gives better information on the clustering behaviour of each class, while LDA maximizes the between-group variance and minimizes the within-group variance. The LDA data were then used as the input for the different classifiers, since they provide the highest possible discrimination between classes and help to classify the data accurately [47-50].

Proposed classification algorithms
To date, various classification algorithms have been proposed for cancer detection, particularly with e-noses. In this study, the effectiveness and robustness of the e-nose in distinguishing lung cancer cell lines were tested using several classification algorithms, namely LDA with the Fisher criterion, k-Nearest Neighbour (KNN), Probabilistic Neural Network (PNN), Naïve Bayes (NB) and multi-class Support Vector Machine (SVM). The statistical significance of all 32 independent sensors was evaluated by comparing the mean scores of the 18 groups using the Wilks' Lambda method. A multi-class odour classification model (LDA-based classifier) was then proposed to evaluate the robustness of the e-nose system in classifying cancerous cell samples.
The LDA classification was conducted using the leave-one-out approach for error estimation. The Fisher criterion has been reported to cope with non-normally distributed data [51] and was therefore employed in this work.
PNN, an implementation of kernel discriminant analysis, organizes its operations into a multi-layered feed-forward network with four layers [52]. Although the PNN algorithm requires a large memory for training, it requires less training time [52,53]. The spread value (σ) was determined using 10-fold cross validation; a value of 0.1 was found appropriate for the dataset, giving acceptable classification accuracy [54].
KNN, on the other hand, is known as the simplest classifier; it uses the characteristics of neighbouring samples to determine the class of a data sample. This classifier can rapidly evaluate unknown inputs by calculating the distance between a new sample and the mean of the training samples in each class, weighted by their covariance matrices [23]. A k-value of 1 and the Euclidean distance metric were selected, as these parameters gave the maximum accuracy [24].
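The selected settings (k = 1, Euclidean distance) correspond to the plain nearest-neighbour rule, which can be sketched as below. The two-dimensional points stand in for LDA discriminant scores and are purely hypothetical; this is an illustration of the rule, not the implementation used in the study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(train_points, train_labels, query):
    """Return the label of the training point closest to the query (k = 1)."""
    nearest = min(range(len(train_points)),
                  key=lambda i: euclidean(train_points[i], query))
    return train_labels[nearest]

# Hypothetical 2-D discriminant scores for two cell lines.
train_points = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.8)]
train_labels = ["A549", "A549", "MCF7", "MCF7"]

print(predict_1nn(train_points, train_labels, (0.1, 0.2)))  # A549
```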
Meanwhile, naïve Bayes (NB) is a simple probabilistic classifier which applies Bayes' theorem with a naïve independence assumption. It is known as an efficient and effective technique for creating models with predictive capabilities [55], as the algorithm has few free parameters, does not require large amounts of training data and is computationally fast in decision making [56,57]. In this study, NB classification with a normal (Gaussian) distribution was chosen and the prior probabilities of the classes were set to empirical.
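A Gaussian naïve Bayes classifier with empirical priors can be sketched in a few lines. The function names, the small variance floor and the toy data are assumptions made for illustration only; the study itself used MATLAB.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate an empirical prior and per-feature mean/variance per class."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor (assumption) avoids division by zero for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)  # prior = class frequency
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior under independence."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

# Hypothetical 2-D discriminant scores.
X = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0), (3.0, 3.1), (3.2, 2.9)]
y = ["lung", "lung", "lung", "breast", "breast"]
model = fit_gaussian_nb(X, y)
```

Working with log probabilities instead of raw products keeps the score numerically stable when many features are multiplied together.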
Finally, SVM is a linear classifier which finds the best separating hyperplane between two classes in a higher-dimensional space [58]. However, the SVM can be used directly for binary classification only. For cases with more than two classes, a multi-class SVM can be implemented by dividing the single multi-class problem into multiple binary classification problems. There are three types of multi-class SVM, namely one versus all (OVA), one versus one (OVO) and Directed Acyclic Graph (DAG)-SVM [59]. The OVA-based SVM was used in this work to classify the 18 classes. The classifier was trained with RBF kernel functions whose parameters were obtained by an optimization method [60]. Various pairs of box constraint (C) and sigma (σ) were tested, and the final values obtained for this dataset were C = 2^10 and σ = 2^-3.
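The one-versus-all decomposition and the RBF kernel can be illustrated without training a full SVM. The sketch below assumes the kernel form exp(-||x - y||^2 / (2σ^2)), which is one common convention and may differ from the one used in the study; the decision scores passed to `ova_predict` are hypothetical outputs of already-trained binary SVMs, and the box constraint C = 2^10 only enters at training time, so it does not appear here.

```python
import math

SIGMA = 2 ** -3  # RBF width reported in the text

def rbf_kernel(x, y, sigma=SIGMA):
    """RBF kernel, assuming the form exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))

def ova_targets(labels, positive_class):
    """Relabel a multi-class problem as one binary problem (+1 versus -1)."""
    return [1 if label == positive_class else -1 for label in labels]

def ova_predict(decision_values):
    """Choose the class whose binary SVM returns the largest decision value."""
    return max(decision_values, key=decision_values.get)

labels = ["A549", "MCF7", "A549", "Calu-3"]
print(ova_targets(labels, "A549"))  # [1, -1, 1, -1]

# Hypothetical decision values from three of the binary classifiers.
print(ova_predict({"A549": -0.2, "MCF7": 0.7, "Calu-3": 0.1}))  # MCF7
```

With 18 classes, the OVA scheme trains 18 such binary problems, one per class, and assigns a test sample to the class with the highest score.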

Performance evaluation
The performance of each classifier is presented using the accuracy (ACC) achieved, defined as the percentage (%) of correct classifications over the total cases presented. However, since accuracy alone might not give the best picture of classification performance, the sensitivity (SEN), specificity (SPE), precision (PREC) and Matthews Correlation Coefficient (MCC) of each class were also calculated to provide more relevant and interpretable information about the results [61,62]. These measures are built from four commonly used terms, namely true positive (TP), true negative (TN), false positive (FP) and false negative (FN) [63].
The application of MCC to the multiclass case was originally reported in [64], where it was used to measure classification correlation. The value of MCC varies between -1 and 1, where 1 indicates perfect prediction, -1 indicates extreme misclassification in the confusion matrix and 0 indicates random correlation [62,65]. This paper reports the accuracy, sensitivity, specificity, precision and MCC for all 18 classes for the best results.
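The per-class measures above follow directly from the four counts. The sketch below uses the standard binary (one class versus the rest) definitions, which is an assumption about how the per-class values were obtained; it is not the multiclass generalisation of MCC from [64], and the counts in the example are invented.

```python
import math

def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def class_metrics(tp, tn, fp, fn):
    """Compute ACC, SEN, SPE, PREC and MCC for one class versus the rest."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "SEN": _ratio(tp, tp + fn),
        "SPE": _ratio(tn, tn + fp),
        "PREC": _ratio(tp, tp + fp),
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
    }

# Hypothetical counts for one class with 5 test sniffs out of 90 in total.
metrics = class_metrics(tp=4, tn=84, fp=1, fn=1)
```

In this invented example the accuracy is about 0.98 while the sensitivity is only 0.80 and the MCC about 0.79, which shows why accuracy alone can be misleading when one class is a small fraction of the test set.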
Gas chromatography mass spectrometry-solid phase micro extraction (GCMS-SPME)

GCMS-SPME headspace sampling
SPME-GCMS was used to identify the headspace VOCs released by each type of cultured cell line (A549, Calu-3, WI38VA13 and MCF7) and by the blank mediums. A preheated solid phase micro extraction (SPME) device was used to collect the VOCs released from the cells. Its inner needle, the SPME fiber, was coated with Divinylbenzene/Carboxen/Polydimethylsiloxane (DVB/CAR/PDMS). This coating was chosen as it has been optimized to extract volatile and semi-volatile molecules over a wide range of molecular weights [66]. The needle was exposed to the headspace of the cells cultured in the 25 cm² T-flask for 15 min, as shown in Fig. 4. At the end of the extraction time, the fiber was immediately inserted into the sample port of the Agilent 7890 GCMS.
A DB-WAX capillary column (30 m × 250 μm × 0.25 μm) was used, with an injector temperature of 250°C to allow thermal desorption of the VOCs. The oven temperature was initially set to 50°C and held for 0.5 min, then ramped at 10°C/min up to 180°C and held for 1 min, and finally ramped at 15°C/min to 250°C and held for 5 min. The helium carrier gas flow rate was 1 ml/min. The total analysis took 24.17 min. The MS analyses were done in full scan mode (TIC mode) with a scan range of 40 to 200 a.m.u., and electron impact ionization was performed at 70 eV [30].
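The stated total run time can be checked against the oven programme with a short calculation. The helper name is hypothetical; the temperatures, rates and hold times are those given above.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to ramp the oven from start_c to end_c at a fixed rate."""
    return (end_c - start_c) / rate_c_per_min

run_time = (
    0.5                            # initial hold at 50 °C
    + ramp_minutes(50, 180, 10)    # 13 min ramp to 180 °C
    + 1.0                          # hold at 180 °C
    + ramp_minutes(180, 250, 15)   # about 4.67 min ramp to 250 °C
    + 5.0                          # final hold at 250 °C
)

print(round(run_time, 2))  # 24.17
```

The sum of the holds and ramps reproduces the 24.17 min reported for the total analysis.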

Identification of VOCs
The potential VOCs were identified by spectral match only in this study [29,64]. The identity of each compound was determined using the Agilent ChemStation software by searching the NIST Mass Spectral Library 11, which makes use of the retention time and m/z of the VOCs of interest. Each chromatogram was integrated and the peaks were matched and aligned to obtain a matrix containing all peaks found in the whole set of measurements. Peaks or compounds missing from any of the replicate samples were eliminated. Peaks with less than 80% match to the NIST library (qualitative) or with a peak area of less than 3000 were excluded [27]. Peaks identified as arising from the column, empty flask and fiber (siloxanes) were also excluded [19,29]. Differences in the relative abundances of the identified VOCs were assessed using the t-test and considered significant at p < 0.05.

Fig. 4 The GCMS-SPME odour sampling procedure. The SPME-coated needle was exposed to the headspace of the cultured cells. The experiment was conducted in an incubator (37°C/5% CO2).

Table 5 shows a representative result of the Wilks' Lambda test on the day 1 dataset, indicating the contribution of each discriminant function (df) to the variation. Functions with a p-value below 0.05 (p < 0.05) were chosen, as this corresponds to the ability of the function to discriminate between the groups. Figures 5 and 6 show 3D scatter plots visualising the variability between the VOCs of the cell lines detected by the e-nose, using LDA and PCA respectively. Fig. 5 shows that the samples of A549, Calu-3, MCF7, WI38VA13 and the blank mediums were well separated with 100% discriminant function. The test data samples closely matched the distribution of the different groups of cell lines in the training data. Significant clustering of the lung cancer cells, the breast cancer cells and the control samples was observed. This indicates that the different cell lines emit different VOC profiles and that the e-nose is able to detect these variations. The two non-small cell lung cancer cell lines, A549 and Calu-3, lay very close together but were distinctly separated. The scores of the other samples were well distributed within their respective groups, with visible separation for the combination of all days.

E-nose performance
PCA was performed on the data, and the eigenvectors and eigenvalues were calculated using the correlation matrix. Eigenvectors with an eigenvalue higher than 1.0 can be selected as principal components (PCs), and those with a lower value can be excluded. In this study, the first three PCs, with eigenvalues higher than 1.0, were selected for the datasets at the 24th, 48th and 72nd hours. As Fig. 6a shows, the samples were well separated. The first three principal components (PC1, PC2 and PC3) in Fig. 6a together account for 93.56% of the total variance, which indicates that the cell lines are separable. To emphasise the ability of the sensors to distinguish the different lung cancer types, the PCA plot for Calu-3 and A549 is enlarged in Fig. 6b. The ability of the sensors to distinguish the two types of lung cancer from each other might be due to the specific VOCs emitted by the cell lines, since A549 and Calu-3 cells originate from the epithelium and a pleural effusion, respectively.
However, the PCA grouping shows that the features within each group were more spatially spread out than with LDA. The clusters of A549 and Calu-3 (lung cancer cells) were observed to be significantly separated from the MCF7 (breast cancer cell) and WI38VA13 (normal cell) clusters. Overall, the features extracted by LDA indicate good separability of the different samples. The LDA-based features were therefore used to test the five different classifiers.
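The eigenvalue rule used to select the principal components (keep components whose eigenvalue exceeds 1.0) and the resulting percentage of explained variance can be sketched as follows. The eigenvalues are hypothetical and belong to an imaginary four-sensor correlation matrix, not to the 32-sensor data of this study.

```python
def select_components(eigenvalues, threshold=1.0):
    """Keep eigenvalues above the threshold and report the variance explained.

    For a correlation matrix the eigenvalues sum to the number of variables,
    so each one measures how many variables' worth of variance a PC carries.
    """
    ordered = sorted(eigenvalues, reverse=True)
    kept = [value for value in ordered if value > threshold]
    explained_percent = 100.0 * sum(kept) / sum(ordered)
    return kept, explained_percent

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to 4).
kept, explained_percent = select_components([0.3, 2.4, 0.1, 1.2])
```

Here two components pass the threshold and together explain about 90% of the variance, in the same way that the three retained PCs are reported to cover 93.56% in Fig. 6a.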

Classification results
The LDA-based features were used to test the five classifiers (LDA, PNN, KNN, NB and OVA-SVM) using 10- [...] high accuracy, sensitivity, specificity, precision and MCC in the testing phase. In contrast, the LDA classifier achieved the lowest performance and many samples were wrongly classified. Although the LDA-based OVA-SVM showed the best performance, the accuracy, sensitivity, specificity, precision and MCC values of the PNN algorithm were consistently high for every class. The prediction quality value (MCC) of DMEM using the LDA-based PNN algorithm was only 0.3 lower than that of the SVM. In support of this, a study by F. Moderasi (2014) suggested that the PNN algorithm can be used as an appropriate alternative to SVM, as its training process is easier than that of the SVM algorithm [67].
The performance of NB was lower than that of the SVM, KNN and PNN classifiers because it is a generative classifier, and generative classifiers are generally not as accurate as discriminative ones [68]. However, NB is still preferred in medical diagnosis applications because it is simple to build, easy to train and able to deal with missing information [56,57]. According to K. Huang (2005), NB performance can be improved by training the NB classifier in a discriminative way [68]. This method can therefore be considered in future work to obtain better results from the NB classifier.
When the performance of the LDA-based OVA-SVM was examined by incubation time, the classification accuracy was found to improve significantly, reaching approximately 99% for the features of the 24th-hour incubation period. The performance was also observed to improve for the samples at the 48th and 72nd hours of cell growth. This may indicate that the VOCs of each sample increased with prolonged incubation.
The lower performance of OVA-SVM at the 24th hour compared with the 2nd day data may be due to insufficient time for the metabolites or compounds to be released by the cells into the headspace. It may also be due to the relatively low cell numbers, which result in lower production of VOCs than at the 48th and 72nd hours of incubation. This corresponds to a previous study on in-vitro lung cancer cells, in which the number of compounds in the headspace was directly proportional to the number of cells. The problem could be overcome by using more concentrated cell seeding, which might also help to differentiate between the other cell lines at an early stage of growth [69].

Identification of the VOCs of lung cancer cell lines and normal cell lines by SPME-GCMS analysis
The VOCs related to lung cancer cell metabolism were investigated using SPME-GCMS analysis. The headspaces of the cultured lung cancer cells were compared with the headspaces of medium with breast cancer cells, medium with normal lung cells and medium without cells. The complete list of identified VOCs, based on the average peak of the total chromatograms of three replicates of each sample, is given in Table 11. These 32 selected compounds are presumed to be emitted by both the background culture media and the metabolic activity of the cells.
The statistical significance of the relative abundances of the VOCs released by the lung cancer cell lines and the blank mediums was evaluated using the t-test, with a p-value below 0.05 considered statistically significant. This analysis was conducted to eliminate confounding VOCs that are due to the different substrates rather than to cell metabolism. The results are shown in Table 12. The same analysis was also conducted on the VOCs released by the other cancer cell line (MCF7) and the normal lung cell line (WI38VA13). The compounds and their significant differences are tabulated in Table 13.
Among the 32 VOCs detected, 20 are related to the lung cancer cell lines. Of these, 18 were significantly more abundant in the headspace of the lung cancer samples than in the blank medium (Table 12), and nine of those 18 were absent from the blank samples. This indicates that these nine VOCs have a specific association with lung cancer cell metabolism.
To eliminate the influence of the culture media VOCs on the VOCs of lung cancer, the VOCs found exclusively in the blank medium (statistically not significant) were removed from the further analysis of the properties of the cancer cell lines. Furthermore, aromatic compounds such as styrene, dimethyl silanediol, benzene and ethylbenzene are more likely linked to contaminants [19,50,70,71], so these compounds were also excluded from further analysis.
Overall, 11 VOCs identified as statistically significant in the previous analysis were used for discrimination against the normal lung cell line and the breast cancer cell line. The abundance of each VOC related to the lung cancer cells was compared with that in the normal lung cell and breast cancer samples, and the results are tabulated in Table 13.
As seen in Table 13, four VOCs, namely dodecane, decanal, 2-ethyldodecanol and heneicosane, are specific to lung cancer cells and absent from the control samples. The VOCs whose abundance decreases significantly in the lung cancer cells are propylbenzene, nonanal, 3,4-dimethylheptane, 2,4-dimethylundecane and 2-ethylhexanol. Decane was observed to increase significantly in the cancer-related cell samples compared with the normal lung cell line, indicating that this compound is more related to cancerous volatiles. These results indicate that the headspaces of the lung cancer cell lines are characterized by a specific VOC signature.

Discussion
VOC analysis offers a promising alternative approach to cancer diagnosis in the medical field. However, its clinical use is still limited by the lack of validation of cancer-related metabolites and of the sensing performance of VOC sensors. In this work, the VOCs emitted by two different lung cancer cell lines and by the control cell lines, a breast cancer cell line and a normal lung cell line, were analyzed using commercial CP gas sensors (Cyranose 320) and GCMS-SPME. This work highlights the potential of these analysis techniques to provide meaningful information for the clinical diagnosis of lung cancer. The Cyranose 320 e-nose was used to analyze the headspace of conditioned cultured cell lines (in-vitro) under proliferative conditions for 3 days, to discriminate the VOC patterns released into the headspace of the cell lines during the normal and proliferation stages. The e-nose results show that the cancer cell lines can be classified with high accuracy using their VOC patterns even at an early stage of cell proliferation (24th hour of incubation).

++: percentage of peak area more than 50%; +: percentage of peak area less than 50%; -: not detected (peak area < 1%)

The ability of the Cyranose320 to discriminate the VOCs of the cell samples with high accuracy even at the 24th hour of incubation motivated the GCMS-SPME analysis, which allows the identification of the specific VOCs associated with cancer cell growth. This was achieved by comparing the VOCs from the lung and breast cancer cells with those of the blank mediums. Comparison of the chromatograms indicated significant differences between the cell culture samples based on several compounds. In total, four specific VOCs were identified as lung cancer-related volatiles, namely heneicosane, dodecane, 2-ethyldodecanol and decanal.
The GCMS results also show that a higher alkane, heneicosane, was found in both lung cancer cell lines, A549 and Calu-3, at levels statistically different from the control samples. This indicates that heneicosane has high potential as a lung cancer-related biomarker. Some studies have proposed heneicosane as a candidate biomarker in the breath of lung cancer patients [28,72,73]. However, the origin of heneicosane in lung cancer cells remains unclear.
Another higher alkane, dodecane, was observed to increase significantly in Calu-3 during the incubation period. A few studies on lung cancer biomarkers have suggested that n-dodecane is associated with lung cancer in adenocarcinoma tissues [29] and in patients' breath, especially the breath of patients with EGFR-mutated adenocarcinoma [74]. Dodecane has also been found to be related to breast cancer [75].
Among the detected VOCs, one specific compound, decane, which is also a higher alkane, was observed to be emitted by all three cancer cell lines. Similar results were obtained by Yishan. W and B G.Hyun, who found decane in the lung cancer tissue of patients [29,72]. Another study by Chen. X, using different lung cancer cells, also found decane to be one of 11 compounds with higher concentrations than in normal cells [76]. Decane has also been considered a lung cancer biomarker in patients' breath [77,78], and a significant difference has been found in the concentration of decane in patients' breath before and after surgery [79]. Still, the origin of decane in breast cancer cells has not been reported in any previous study. According to a study by Meggie. H (2010), representative hydrocarbons are potential biomarkers of lung cancer, and these compounds are probably the outcome of oxidative stress [80]. Alkanes are mostly produced by lipid peroxidation by reactive oxygen species (ROS), which is supported by a few studies stating that alkanes and methylated alkanes are found in lung cancer [50,70,71,80] and breast cancer [31,34,81].

Table 12 VOCs discriminating the headspace of lung cancer cell lines and blank mediums. Analysis of the abundances of VOCs in the headspace of lung cancer cell lines using GCMS-SPME. VOCs increased (emitted) and/or decreased (consumed) by lung cancer are reported with respect to the blank medium. A p-value < 0.05 was considered statistically significant.

Table 13 VOCs discriminating the headspace of lung cancer cell lines and control cell lines. Analysis of the abundances of VOCs in the headspace of lung cancer, breast cancer and normal lung cell lines using GCMS-SPME. A p-value < 0.05 was considered statistically significant. The (↑) and (↓) show the trend of abundance increasing and decreasing in lung cancer cell line samples, respectively.
Decanal was the VOC specific to the A549 cell line, distinguishing it from the other cell lines and the blank medium. A study in 2011 reported that decanal was used as a biomarker to detect non-small cell lung cancer using an electronic nose, with 95% sensitivity and 70% specificity [82]. Decanal was also one of the primary contributors used to separate non-small cell lung cancer from small cell lung cancer, with 100% sensitivity and 75% specificity, in a 2012 study by Barash O. [33]. Calu-3, in turn, emitted only one specific VOC, 2-ethyldodecanol.
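For readers less familiar with the sensitivity and specificity figures quoted above, the following minimal Python sketch shows how the two quantities are derived from confusion-matrix counts. The function name and the counts are hypothetical, chosen only so that the arithmetic yields 95% and 70%; they are not data from the cited studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: cancer samples flagged as cancer     fn: cancer samples missed
    tn: controls correctly left unflagged    fp: controls flagged as cancer
    """
    sensitivity = tp / (tp + fn)  # fraction of true cancer samples detected
    specificity = tn / (tn + fp)  # fraction of true controls correctly rejected
    return sensitivity, specificity


# Hypothetical counts (assumption): 20 cancer samples and 20 controls.
sens, spec = sensitivity_specificity(tp=19, fn=1, tn=14, fp=6)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A test with high sensitivity but moderate specificity, as in this illustration, misses few cancer samples but flags a larger share of controls.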
The VOCs clearly emitted by the MCF7 cells in this study were 3,4-dimethylheptane, hexadecane and 2-phenyl-2-butanone. This finding is in line with one study that found hexadecane in the breath of breast cancer patients [31]. However, no previously published study on volatiles from breast cancer has reported 3,4-dimethylheptane or 2-phenyl-2-butanone. The normal cell line WI38VA13 emitted four different VOCs: amphetamine, xylene, 2,4-dimethylundecane and heptadecane. 2-Ethyldodecanal, 3,4-dimethylheptane, 2-phenyl-2-butanone, amphetamine, xylene, 2,4-dimethylundecane and heptadecane have not been reported to date as biomarkers in any in-vitro study; thus, the significance of these compounds remains unclear. In addition, the VOC collection time used here differs from that of previous studies: the VOCs were collected after 24 h of cell growth, to ensure that the compounds were collected at the proliferation stage.
Nonanal and 2-ethylhexanol were found to be significantly more abundant in WI38VA13 cells than in A549 and Calu-3. In contrast to the results observed in this study, the detection of nonanal has been reported as significant in lung cancer [83,84] and has been used to separate adenocarcinoma from squamous cell carcinoma [74]. As for 2-ethylhexanol, the results here agree with previous studies on lung cancer detection, in which it was never found to be a biomarker. This indicates that these compounds might be associated with cell metabolism. The WI38VA13 cells also share aromatic compounds with DMEM, which might explain the overlap of the DMEM group with the WI38VA13 group in the PCA and LDA analyses shown in Figs. 5 and 6a.
In summary, the VOCs that exist in the lung cancer cell lines but not in the control samples, and those that exist in higher concentrations in the former, may be considered possible biomarkers, as shown in Table 14. Decanal, dodecane, 2-ethyldodecanal and heneicosane may potentially be used to discriminate lung cancer cells from other types of cancer or normal cell lines. Decane, on the other hand, can potentially be used as a specific biomarker for cancer. These findings suggest that the identified VOCs are able to offer more information regarding in-vitro cultured cell line metabolism and aid

Conclusion
This study presents the possibility of using VOCs as biomarkers for cancer cells. Certain VOCs were verified to be specific to cancer cells when compared with the normal samples. The headspace of in-vitro cultured cell lines was analyzed using a Cyranose320 e-nose, which consists of an array of sensors, and GCMS coupled with SPME. Several classifiers, namely LDA, NB, KNN, PNN and OVA-SVM, were used to validate the ability of the e-nose to discriminate the cancer cells from the normal samples and blank mediums. The investigation was carried out to identify cell line VOCs at three different proliferation stages under normal laboratory conditions.
The results from this study show that the Cyranose320 was able to discriminate the VOCs released by the various cancer and healthy cells as well as the blank mediums. The classifiers tested achieved high levels of accuracy. The LDA-based OVA-SVM recorded the best performance, with 100% successful classification even at the early stage of cell growth (the 24th hour of incubation), and maintained this performance at the 48th and 72nd hours.
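As a rough illustration of the one-versus-all (OVA) scheme behind the best-performing classifier, the Python sketch below trains one binary scorer per class and assigns a sample to the class with the highest score. It is not the pipeline used in this study: a simple centroid-based linear scorer stands in for the SVM, the two-dimensional feature values are synthetic placeholders for LDA-projected sensor responses, and all function names are hypothetical.

```python
import math
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def train_one_vs_all(samples, labels):
    """Fit one linear scorer per class: that class versus all other classes.

    Assumption: a centroid-based linear rule stands in for the SVM.
    """
    models = {}
    for cls in sorted(set(labels)):
        pos = centroid([x for x, y in zip(samples, labels) if y == cls])
        neg = centroid([x for x, y in zip(samples, labels) if y != cls])
        # Unit-length weight vector pointing from the "rest" centroid
        # towards the class centroid, so scores are comparable across classes.
        w = [p - n for p, n in zip(pos, neg)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
        # Bias places the decision boundary midway between the two centroids.
        b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, pos, neg))
        models[cls] = (w, b)
    return models


def predict(models, x):
    """Return the class whose one-versus-all scorer gives the highest score."""
    scores = {
        cls: sum(wi * xi for wi, xi in zip(w, x)) + b
        for cls, (w, b) in models.items()
    }
    return max(scores, key=scores.get)


# Synthetic, well-separated 2-D features (assumption, not measured data).
samples = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),        # labelled "A549"
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),    # labelled "MCF7"
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0), (1.0, 11.0),    # labelled "WI38VA13"
]
labels = ["A549"] * 4 + ["MCF7"] * 4 + ["WI38VA13"] * 4

models = train_one_vs_all(samples, labels)
correct = sum(predict(models, x) == y for x, y in zip(samples, labels))
accuracy = correct / len(samples)
print("training accuracy:", accuracy)
```

With more than two classes, the OVA design reduces the problem to as many binary decisions as there are classes, which is why a binary classifier such as an SVM can be applied to several cell lines and blank mediums at once.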
The VOC patterns collected by the e-nose were validated by GCMS-SPME. The results show that particular cell lines produced specific VOCs. This study provides a list of possible VOCs that are believed to be specific biomarkers for lung cancer, even at the 24th hour of cell growth. The list of potential VOCs obtained from this study was compared with previous studies, as shown in Table 14. It is also concluded that the e-nose, in conjunction with GCMS-SPME, can serve as a non-invasive screening tool at an early stage. This combination is particularly useful for helping the clinician interpret any overlapping groups in the e-nose results.
In addition, this study shows that existing tools such as GCMS-SPME and an e-nose-based gas sensor array system have the potential to improve cancer VOC detection systems through optimized sensor selection. Sensors with higher selectivity and sensitivity are essential for capturing the specific biomarkers. Therefore, further studies on optimizing the sensor system and on in-vivo samples (e.g. breath samples) are underway, with the ultimate goal of developing a complementary tool for clinical testing.