Demonstrating the application of Raman spectroscopy together with chemometric technique for screening of asthma disease.

Medical biophotonic tools provide new sources of diagnostic information about the state of human health that are used in managing patient care. In the current study, Raman spectroscopy, together with a chemometric technique, has been successfully demonstrated for the screening of asthma. Raman spectra of sera samples from asthmatic patients as well as healthy (control) volunteers were recorded at 532 nm excitation. In healthy sera, three highly reproducible Raman peaks assigned to β-carotene were detected. Their sensitive detection is facilitated by the resonance Raman effect. In contrast, in asthmatic patients' sera, the peaks assigned to β-carotene are either diminished or suppressed, and other new Raman peaks appear. These new peaks most probably arise from an elevated level of proteins and could be used to differentiate between asthma and non-asthma samples. Furthermore, a partial least squares discriminant analysis (PLS-DA) model was developed and applied to the Raman spectra of diseased as well as healthy samples, which it classified successfully. The correlation coefficient (r²) of the model was determined as 0.965. Similarly, the root mean square errors in cross-validation (RMSECV) and in prediction (RMSECP) are 0.09 and 0.25, respectively. PLS-DA has the potential to be incorporated into the code of a microcontroller attached to a hand-held Raman spectrometer for screening of asthma, a disease of great concern for clinicians, especially in children.


Introduction
Asthma is a heterogeneous inflammatory disease of the airways (airway blockage) accompanied by wheezing, breathlessness and chest tightness [1,2]. These symptoms can vary over time and in severity. The blockage and inflammation of the airways during the onset of the disease are due to the release of inflammatory cytokines such as interleukin (IL)-4, IL-9 and IL-13 by T-helper type 2 (Th2) cells and effector cells, which assemble toxic inflammatory molecules that eventually cause blockage of the airways [3,4]. The disease affects people of all ages and ethnic groups. Asthma is not considered a single disease but rather a syndrome that can be triggered by multiple biological mechanisms. Classification of asthma is based purely on the severity of symptoms, the forced expiratory volume in one second and the peak expiratory flow rate [5,6]. There are six different endotypes of asthma, four of which are related to severe asthma [7,8]. These four types comprise allergic asthma, late-onset asthma, aspirin-exacerbated airway disease and allergic bronchopulmonary mycoses [9].
Approximately 300 million people suffer from this disease worldwide, and this number is expected to increase due to urbanization [10]. Several environmental and internal factors, such as obesity, pollutants and previous exposure to respiratory infections, trigger the development of asthma in adults and in children. In early childhood-onset asthma, about 95% of children develop the disease before the age of six years. Factors associated with early childhood asthma include allergens, bacterial exposure and viral infections [11]. In allergic asthma, the incidence is as high as 12 cases per 1000 persons per year [12]. Adult-onset or late-onset asthma appears from the age of 12 to ≥ 65 years [13,14]. In contrast to childhood asthma, very little is known about the prevalence of adult-onset asthma and the risks associated with it. Studies have shown that patients with adult-onset asthma have a poor prognosis and rapidly lose lung function, accompanied by more severe airflow obstruction [3,15,16].
Conventionally, the diagnosis of asthma is based entirely on a history of wheeze, shortness of breath and cough, but these symptoms vary from patient to patient [17]. Methods currently used for the clinical diagnosis of asthma include spirometry, sputum induction and peak expiratory flow measurement, among others, each of which has some disadvantages [18]. On the other hand, body fluid screening shows promising results because body fluids are very important for biomarker identification [19]. Identifying biomarkers would be of great clinical advantage, as it would provide a clear picture of the pathobiological pathways involved in asthma and of the extent to which the disease can be controlled. Currently, exhaled nitric oxide is the most commonly used biomarker of inflammation in asthma [20]. Moreover, tissue-based methods such as bronchoalveolar lavage (BAL) and bronchoscopy with bronchial biopsy are known to be the gold standard for assessing airway inflammation and remodeling in asthma. Nevertheless, the invasiveness of these techniques limits their use in daily clinical practice [21]. Furthermore, asthma in children under five years of age is identified differently: doctors rely only on symptoms suggesting that a child may have asthma, which can sometimes be misleading.
During the last decade, several biophotonic techniques have attained the status of medical instruments, while many others are still in the research and development phase and show very useful results in detecting many diseases [22]. Among these, Raman spectroscopy is regarded as one of the most sensitive methods for providing information about the biochemical nature of tissue samples in real time and in an automated manner. It has been widely used for the characterization of lung cancer from saliva, and of breast cancer, atherosclerosis and other types of cancer from sera samples [23,24]. In the context of asthma, Raman spectroscopy has also been applied in an early-phase pilot study of serum-based disease diagnosis. In that study, sera samples from 44 asthmatic patients and 15 reference subjects were used to acquire Raman spectra at 785 nm excitation [25].
The current study demonstrates the applicability of Raman spectroscopy at 532 nm excitation together with chemometric techniques to identify the molecular changes in human blood sera in response to the onset of asthma. The spectral differences observed between the blood sera of thirty healthy volunteers and eighty asthmatic patients are incorporated into machine learning algorithms that can in turn be executed in real time to yield a classification. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were used to develop a predictive model that generalizes the features of the data by clustering them into groups based on their differences as well as their similarities. Hence, Raman spectroscopy together with chemometric techniques is exploited to monitor tissue physiology and to extend the same knowledge to tissue diagnosis.

Sample collection and preparation
Blood samples from eighty patients with confirmed asthma, of different ages, genders and conditions, were collected at the Pakistan Institute of Medical Sciences hospital, Islamabad. For comparison, control samples from thirty healthy volunteers who were not taking any medication were also collected. Written consent was obtained from all blood donors. The demographic and clinical characteristics of the patients, with complete history, were collected with a questionnaire covering parameters such as name, age, sex and disease condition. A 5 cc blood sample was collected from each patient in gel tubes. Samples were left standing for 30 minutes for clot formation. All blood samples were then centrifuged at 3500 rpm for 10 minutes using a Hittich Centrifuge D-7200 for serum extraction. The serum obtained was transferred to separate tubes and stored at −16°C until the Raman spectra were recorded. Safety precautions were strictly followed during the entire procedure, from blood collection to sera extraction. The age and gender distribution of the 110 samples collected for this study is given in Table 1.

Raman spectra recording
A 30 µl drop of each serum sample was placed on a glass slide and kept at room temperature until the water had evaporated naturally. A Raman spectrum of each sample was recorded using a Raman spectrometer (µRamboss, DONGWOO OPTRON, South Korea) with a spectral resolution of 4 cm⁻¹, as explained in our previous study [26]. A laser diode emitting a continuous-wave (CW) beam at 532 nm was used for excitation. The laser power at the sample surface was 40 mW, which was low enough to avoid photodegradation and thus maintain the integrity of the sera. For each sample, at least five spectra were recorded by focusing the laser light at different transverse positions. The integration time was set to 5 seconds. A 100x microscope objective with a numerical aperture of 0.7 was used both for focusing and for collecting the backscattered light. The spot size of the tightly focused beam on the sample surface was of the order of a micron. Raman spectra of all samples were acquired in the spectral range from 600 to 1800 cm⁻¹.

Data processing
All Raman spectra of biological samples contain noise owing to the chemically complex nature and widely varying targets of the source. Beyond that, the greatest challenge in obtaining Raman spectra from biological samples is to suppress the intrinsic fluorescence background that arises from natural fluorophores. Various software tools are now commercially available for subtracting the background from raw Raman signals. For pre-processing, a computer code was developed in the MATLAB environment [26]. Before analysis, the noise contribution was removed from the entire spectral data set. The spectra were de-noised by wavelet decomposition and reconstruction using the 'wden' function with soft thresholding based on Stein's principle of unbiased risk. After de-noising, the spectral data from both types of sera sample were smoothed with a Savitzky-Golay filter using a five-point span and third-order polynomial fitting. Such smoothing functions remove noise efficiently without attenuating the features of the Raman spectra. For background removal (fluorescence subtraction), the built-in 'msbackadj' function in MATLAB (2016) was used with a window width of 200 separation units and a third-degree polynomial for baseline estimation.
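The smoothing and normalization steps can be illustrated with a minimal Python sketch. It is not the MATLAB code used in this study: it covers only the five-point, third-order Savitzky-Golay smoothing and the vector normalization applied to the spectra shown in the figures, and it omits the wavelet de-noising and the baseline subtraction. The function names and the handling of the two edge points (left unchanged) are assumptions made for illustration.

```python
import math

# Standard convolution weights of a 5-point Savitzky-Golay smoothing filter
# for a quadratic or cubic polynomial fit: (-3, 12, 17, 12, -3) / 35.
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(intensities):
    """Smooth a spectrum with a 5-point, 3rd-order Savitzky-Golay filter.

    Assumption: the two points at each end are returned unchanged, because
    a full 5-point window is not available there.
    """
    smoothed = list(intensities)
    for i in range(2, len(intensities) - 2):
        window = intensities[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return smoothed


def vector_normalize(intensities):
    """Scale a spectrum to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in intensities))
    if norm == 0.0:
        return list(intensities)
    return [v / norm for v in intensities]


# Hypothetical toy spectrum: a single noisy peak on a flat background.
raw = [0.0, 0.1, -0.1, 0.2, 1.1, 2.9, 1.0, 0.1, -0.1, 0.0]
processed = vector_normalize(savgol5(raw))
```

A cubic input passes through this filter unchanged, which is the property that lets a third-order fit preserve narrow Raman features better than a plain moving average.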

Development of multivariate model
Principal component analysis (PCA) is an unsupervised statistical technique that converts a set of values of possibly correlated variables into linearly uncorrelated variables called principal components (PCs). Using this chemometric technique, any given data can be displayed in groups based on their differences as well as their similarities. The basic principle of the technique is dimensionality reduction: a data set containing a large number of variables is converted into a smaller number of variables. In Raman spectral analysis, dimensionality reduction of the variables in a data set is mostly done using PCA. In this way, PCA captures the correlations in the data and helps to achieve better and quicker results. The PC with the largest variance is called the first PC.
The second PC must be orthogonal to the first and captures the second-largest variance. In other words, the second PC records the variance that was left out by the first PC, and so on [27]. PLS-DA is another recently developed predictive model that generalizes the features obtained from PCA. Such models are useful tools when prediction is the goal rather than understanding the underlying relationship between the variables. PLS regression predicts responses (Y) from variables (X). The optimum projection of X is achieved through PCs, and the responses (Y) are then predicted by applying a regression based on these PCs. In PLS regression, the response Y and the variables X are decomposed so that their associated variations are described with a smaller number of PCs. The basic aim of this decomposition is to maximize the covariance between X and Y. In our case, X contains the Raman spectral intensities of all samples, whereas Y is a column matrix that contains the clinical results of all samples. The samples of asthmatic patients are labeled '1', and those of healthy volunteers (negative samples) are labeled '0'.
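The PLS regression step described above can be sketched with the NIPALS algorithm for a single response (PLS1), written here in plain Python. This is an illustrative sketch under stated assumptions, not the MATLAB implementation used in this study: the function names are hypothetical, the spectra are reduced to three made-up intensity channels, and only mean-centering is applied. Each component is the direction in X with maximum covariance with the 0/1 label vector, which is then removed (deflated) before the next component is computed.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, y, n_components):
    """Fit a PLS1 model by NIPALS. X: list of spectra, y: list of 0/1 labels."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [yi - y_mean for yi in y]                              # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break  # nothing left in y to explain
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in E]  # scores
        tt = dot(t, t)
        load = [dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = dot(f, t) / tt              # regression of y on the scores
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def predict_pls1(model, x):
    """Predict the response for one spectrum by repeating the deflation."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Made-up three-channel "spectra": healthy (0) samples are strong in the first
# channel, asthmatic (1) samples are strong in the last channel.
X_train = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.2], [1.0, 0.2, 0.1],
           [0.2, 0.3, 0.9], [0.1, 0.2, 0.8], [0.3, 0.3, 1.0]]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_pls1(X_train, y_train, n_components=2)
train_predictions = [predict_pls1(model, x) for x in X_train]
```

For a fixed number of components, the prediction is linear in the spectrum, so the fitted model can equivalently be stored as a single regression vector plus an offset.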
To perform PLS-DA, the data are split by entering in a graphical user interface (GUI) the number of samples to be used for testing the developed model. To avoid over-training the model, it is necessary to select the minimum number of PCs that explains most of the variance. In this study, we used the Kaiser, Scree and Parallel factor methods to achieve this goal. These methods showed that two PCs are sufficient because they cumulatively explain 88% of the variance.
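The idea of choosing the number of PCs from the cumulative explained variance can be sketched as follows. The sketch does not implement the Kaiser, Scree or Parallel factor criteria used in this study. It only computes the fraction of variance carried by each PC, using power iteration with deflation on the covariance matrix, and counts how many PCs are needed to reach a target fraction. The function names, the iteration count and the toy data are assumptions, and the approach is practical only for a small number of variables.

```python
import math


def covariance_matrix(X):
    """Sample covariance matrix of the columns of X (rows are samples)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                C[a][b] += d[a] * d[b] / (n - 1)
    return C


def top_eigenpair(C, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(C)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            return 0.0, v  # no variance left
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
    return eigenvalue, v


def explained_variance_ratios(X, n_components):
    """Fraction of the total variance explained by each of the first PCs."""
    C = covariance_matrix(X)
    p = len(C)
    total = sum(C[j][j] for j in range(p))
    ratios = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(C)
        ratios.append(eigenvalue / total)
        # Deflation: remove the variance captured by this PC.
        C = [[C[a][b] - eigenvalue * v[a] * v[b] for b in range(p)] for a in range(p)]
    return ratios


def components_needed(ratios, target):
    """Smallest number of PCs whose cumulative ratio reaches the target."""
    cumulative = 0.0
    for k, r in enumerate(ratios, start=1):
        cumulative += r
        if cumulative >= target:
            return k
    return None  # target not reached with the PCs supplied


# Toy data with variance 8/3 along the first axis and 2/3 along the second.
toy = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_ratios = explained_variance_ratios(toy, 2)
```

For the toy data the first PC carries 80% of the variance, so a target of 88% is reached only once the second PC is included.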

Spectral data analysis
Raman spectra recorded from biological samples such as blood sera comprise many peaks, ranging from large to small. The analysis of Raman spectra is based on the differences in intensity or position of the various Raman bands between normal and abnormal body fluids or tissue samples. Each peak in a Raman spectrum represents a specific vibration of a biomolecule, while its intensity primarily reflects the concentration within the sample of the molecular bond represented by this vibration. The full Raman spectrum corresponds to a specific fingerprint, which can be used to differentiate samples. Figure 1 shows the vector-normalized Raman spectra of all asthmatic and normal blood sera samples. Similarly, Fig. 2 shows the vector-normalized mean Raman spectra of asthmatic and normal sera. The Raman spectra of normal blood sera are shown in blue, whereas those of asthmatic blood sera are shown in red.
In normal blood sera, three prominent Raman peaks at 998, 1156 and 1515 cm⁻¹ can be seen that are assigned to carotenoids [28]. These three Raman peaks are highly reproducible and conserved, and are strongly enhanced at 532 nm excitation due to the resonance Raman effect. In asthmatic patients' sera, the Raman peaks at 1156 and 1515 cm⁻¹ are either diminished or suppressed. In contrast, new peaks arise at shift positions such as 749, 830, 930-950, 1190, 1230, 1330, 1449 and 1656 cm⁻¹, as can be seen in Fig. 1 and Fig. 2. The appearance of these additional peaks in the Raman spectra of asthma patients' sera is most probably due to an elevated level of the serum proteins albumin and globulin relative to carotenoids. It is well known that these proteins make up 98% of the material in serum. The detailed assignments of all Raman peaks are given in Table 2.
The Raman peak at 998 cm⁻¹ is due to the symmetric ring breathing mode of phenylalanine in proteins and to β-carotene [29]. Figure 1 and Fig. 2 show a slight change in intensity at this peak position between normal and asthmatic sera samples. In the presence of the other prominent Raman peaks at 1156 and 1515 cm⁻¹ corresponding to β-carotene [30], the peak at 998 cm⁻¹ is assigned to β-carotene, whereas it is assigned to phenylalanine when protein peaks are present and carotene peaks are absent. Our results clearly show a decrease in β-carotene in diseased samples compared with normal ones. The primary function of β-carotene is the prevention of DNA damage with reference to lipid peroxidation. Studies have also revealed that β-carotene is involved in the pathophysiology of many diseases, including diarrhea, acute respiratory infections, heart disease, immunological disorders and asthma [31]. There is a strong relationship between reduced serum vitamin A (β-carotene) levels and chronic airway obstruction in adults [32]. Similarly, some of the cells involved in airway inflammation in asthma produce reactive oxygen species (ROS). The significant decrease in the β-carotene level in sera samples of asthmatic patients could therefore be due to its antioxidant nature. On the other hand, the Raman peaks near 750 (Trp), 830 (Tyr), 930-950 (C-C of the peptide backbone), 1000 (Phe), 1449 (CH₂) and 1656 cm⁻¹ (amide I of the peptide backbone) are attributed to proteins. Typical Raman spectra of serum were recently shown in studies of extracellular vesicles [33]. There, the spectral contributions of carotenoids were lower because the resonance enhancement is lower at 785 nm excitation than at 532 nm excitation. In asthmatic serum samples, the band near 1000 cm⁻¹ is mainly assigned to Phe in proteins.
Overall, the molecular assignments of the Raman peaks of normal and asthmatic patients' sera, along with the variations in their peak intensities (increase/decrease), are shown in Table 2. The combination of these Raman bands indicates a lower carotenoid-to-protein ratio in asthmatic blood sera, which could help clinicians differentiate between asthma and non-asthma samples on the basis of the Raman technique.

Chemometric data classification and analysis
In recent years, great efforts have been made to apply multivariate techniques to spectroscopic data analysis. The spectral differences acquired using Raman spectroscopy were further verified with multivariate statistical techniques such as PCA and PLS-DA, which enable the user to classify even minute variations in biological samples. One attractive feature of these models is that they consider the full Raman spectrum for characterization rather than a few noticeable bands. This ensures the detection of minute spectral variations in samples. Figure 3 depicts the scatter plot of healthy and asthmatic samples in the PCA domain. When PC1 is plotted against PC2, the data set is clearly separated into two classes along the PC1 axis. Furthermore, Fig. 3 shows that the within-class spread of the data set is low, particularly for the asthma patients' data, while the separation (between-class distance) is also within an acceptable limit. Both properties are desirable for discriminating data points using chemometric techniques. Similarly, Fig. 4 shows the PC loading vectors of healthy and asthmatic blood sera. PCA differentiates between the two classes based on their loading vectors. The loading plots depict several peaks that contribute to the PC scores. For example, at shift positions 998, 1150 and 1510 cm⁻¹, the PC1 loadings have negative values, showing that the intensities of the Raman scattering signal at these peaks, which are assigned to carotenoids, are higher for healthy samples than for asthmatic samples. Similarly, at the spectral peaks at 750, 830, 998, 1449 and 1656 cm⁻¹, the PC2 loadings have positive values, showing that the intensities of the Raman scattering signal at these peaks, which are assigned to protein bands, are higher for asthmatic samples than for healthy samples. These results are in good agreement with the original Raman spectra.
As PC2 scores do not contribute significantly to the separation of healthy and asthmatic blood sera (see Fig. 3), protein variations (as seen in the PC2 loadings in Fig. 4) are less relevant. PCA can be considered a deconvolution of the data set into the spectral contributions of carotenoids in PC1 and of proteins in PC2. Moreover, the values predicted by PLS-DA for the testing and training data sets are plotted in Fig. 5. Samples positive for asthma were labelled '1' and samples negative for asthma were labelled '0', as displayed along the x-axis. The prediction values yielded by PLS-DA are plotted along the y-axis. A dashed line at 0.5 marks the cut-off, and black dashed lines at 0.4 and 0.6 delimit a grey region. If PLS-DA yields a value greater than 0.6 for a sample, it is declared predicted-positive, whereas if PLS-DA yields a value less than 0.4, it is declared predicted-negative. A sample is considered to lie in the grey region and to be inconclusive if PLS-DA yields a value between 0.4 and 0.6. For the testing data set, the sensitivity, specificity and accuracy were each found to be 100%. The correlation coefficient (r²) of the PLS-DA model was determined as 0.965. Similarly, the root mean square errors in cross-validation (RMSECV) and in prediction (RMSECP) are 0.09 and 0.25, respectively.
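The decision rule and the performance figures described above can be sketched in a few lines of Python. The thresholds 0.4 and 0.6 are taken from the text. Everything else is an assumption made for illustration: the function names, the choice to exclude inconclusive (grey-region) samples from sensitivity, specificity and accuracy, and the example scores, which are made up and are not data from this study.

```python
import math


def decide(score, low=0.4, high=0.6):
    """Return 1 (predicted-positive), 0 (predicted-negative) or None (grey region)."""
    if score > high:
        return 1
    if score < low:
        return 0
    return None


def evaluate(scores, labels, low=0.4, high=0.6):
    """Sensitivity, specificity and accuracy over the samples with a firm decision.

    Assumption: grey-region samples are counted separately and left out of
    the three ratios.
    """
    tp = tn = fp = fn = grey = 0
    for score, label in zip(scores, labels):
        decision = decide(score, low, high)
        if decision is None:
            grey += 1
        elif decision == 1 and label == 1:
            tp += 1
        elif decision == 0 and label == 0:
            tn += 1
        elif decision == 1:
            fp += 1
        else:
            fn += 1
    decided = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "accuracy": (tp + tn) / decided if decided else None,
        "inconclusive": grey,
    }


def rmse(scores, labels):
    """Root mean square error between predicted values and 0/1 labels."""
    return math.sqrt(sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores))


# Made-up prediction values for three asthmatic (1) and three healthy (0) samples.
example_scores = [0.95, 0.70, 0.50, 0.10, 0.30, 0.80]
example_labels = [1, 1, 1, 0, 0, 0]
summary = evaluate(example_scores, example_labels)
error = rmse(example_scores, example_labels)
```

Reporting the number of inconclusive samples alongside the three ratios keeps the grey region visible instead of silently forcing those samples into one class.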

Conclusions
The present study demonstrates the application of Raman spectroscopy together with chemometric techniques by considering the molecular changes in human blood sera in response to the onset of asthma. The spectral differences observed between the blood sera of healthy volunteers and asthmatic patients are incorporated into machine learning algorithms that can in turn be executed in real time to yield a classification. The model developed by PLS-DA consists of a regression vector, which can easily be incorporated into the code of a microcontroller attached to a hand-held Raman spectrometer to screen for the disease in real time in a cost-effective manner. Hence, one can conclude from the current study that Raman spectroscopy together with chemometric techniques could help to distinguish between asthma patients and healthy controls. Furthermore, the development of such a technique could be helpful for screening asthma, particularly in early childhood, which is of great concern for clinicians. A possible option for improving the sensitivity is to mix blood sera with gold or silver nanoparticles, which enhance Raman signals through the surface-enhanced Raman scattering (SERS) effect, as shown for colorectal cancer detection [34]. It is difficult to compare the results with the previous Raman study of sera from asthma patients [25] because of differences in sample preparation, the acquisition of Raman spectra at 785 nm excitation and data analysis by principal component based linear discriminant analysis. In this regard, further studies will extend the knowledge of how to use Raman spectroscopy together with chemometric techniques for the diagnosis of other chronic or infectious diseases. The applicability of the same approach to other diseases will be a step towards generalization of the tool, which is the ultimate goal set by the scientific community.