Quantitative analysis of human blood serum using vibrational spectroscopy

Introduction
Although concepts of biomedical applications of vibrational spectroscopy have been discussed for more than three decades, and numerous studies have demonstrated feasibility in, for example, histological and cytological studies, clinical translation has been slow, prompting consideration of achievable, strategic targets [1,2]. In this context, applications for bodily fluid analysis have attracted increasing attention in recent years [3,4]. In a general clinical context, bodily fluids (e.g. plasma, serum, saliva or urine) are emerging as an important source of samples for disease diagnosis and therapeutic monitoring, as their collection is relatively simple, largely non-invasive, and cost effective [5][6][7][8][9].
Blood plasma/serum remains the primary clinical specimen of interest, and was studied even before genes were known to exist [10]. It contains more than 300 types of proteins, as well as carbohydrates, lipids and amino acids, and up to 114,000 known metabolites at varying concentrations [11]. It is a rich source of biomarkers for disease diagnostics, and imbalances in endogenous plasma/serum constituents are themselves of considerable clinical importance. In the high molecular weight fraction (HMWF), levels of albumin, the most abundant plasma protein, are typically elevated in cases of acute dehydration [12], and decrease dramatically in critically ill patients (hypoalbuminaemia) [13], in sepsis and after major surgical stress [14,15]. Many clinical studies have associated elevated levels of fibrinogen with stroke [16], cardiovascular disease [17,18], and thrombosis and pulmonary embolism [19]. Among the lower molecular weight fraction (LMWF) constituents, levels of glucose must be monitored on a regular basis in diabetes patients [20], while levels of urea are important for evaluating kidney or liver function, and even heart failure [21][22][23]. Quantitation of exogenous agents such as viral loads can be critical for assessing the severity of an active viral infection, for prognosis, and to guide antiviral therapy [24]. Therapeutic drug monitoring (TDM) in the blood stream is becoming increasingly important to understand the pharmacokinetics of chemotherapeutic agents and to individualise dose regimens per patient [25].
Blood collection is a routine clinical process which is largely standardised, although some variations in collection protocols can occur. For blood plasma collection, differing anticoagulants, including heparin and ethylenediaminetetraacetic acid (EDTA), are used in order to prevent blood clotting. It has been shown, however, that the choice of anticoagulant is noticeable in the resultant infrared spectrum, which may conceal underlying biological information [26,27]. The poorly soluble protein, fibrinogen, has also been seen to cause a large scattering background in Raman measurements of liquid samples [28,29]. For these reasons, many spectroscopic studies have been conducted on human serum, rather than plasma samples.
There have been a number of reviews in the recent past of vibrational spectroscopic analysis of biofluids, particularly with a view towards clinical applications [4,30-32]. In the following, a review of the methodologies specifically for quantitative analysis of blood serum constituents using both infrared absorption and Raman spectroscopies is presented, drawing in particular from the experience of the authors, to assess challenges and highlight developments in the field. Clinically relevant imbalances in endogenous constituent components, such as high molecular weight proteins, as well as low molecular weight constituents such as glucose and urea are considered. Potential clinical applications in monitoring viral loads and therapeutic drug monitoring are discussed.

Materials and methods
Specific examples of experimental data are drawn from previously published work, with appropriate reference and copyright permissions. Summary details are provided below. In the case where measurements of human patient samples are referred to, specifics of compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki) are provided in the original referenced publications.

FTIR spectroscopy
FTIR transmission spectra [33] were recorded using a Perkin Elmer Spotlight 400 N FTIR imaging system. The system is equipped with an AutoImage microscope system operating with a ×40 Cassegrain objective. FTIR spectra were collected in transmission mode over the nominal free-scanning spectral range with an interferometer speed of 1.0 cm/s, using a liquid nitrogen cooled mercury cadmium telluride (MCT-A) line detector, with a pixel size of 25 µm × 25 µm and a spectral resolution of 4 cm⁻¹.
Background measurements of 120 scans per pixel were acquired on a blank substrate, whereas 8 scans per pixel were recorded from the sample, deposited on a CaF₂ window and air dried before recording.

ATR-FTIR spectroscopy
ATR-FTIR spectra were recorded with either a Perkin Elmer Spotlight 400 N Universal Attenuated Total Reflectance (UATR) accessory [33] or a Bruker Vector 22 equipped with a UATR module [34]. Sample penetration is both wavenumber and sample dependent, but is typically on the order of 1 µm. Prior to recording, a background spectrum was recorded in air and automatically subtracted by the software.
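The wavenumber and sample dependence of the sampling depth can be illustrated with the standard expression for the penetration depth of an evanescent wave, d_p = λ / (2π n1 sqrt(sin²θ - (n2/n1)²)). The following is a minimal sketch only: the refractive indices (2.4 for a diamond crystal, 1.5 for a dried organic sample) and the 45° angle of incidence are assumed illustrative values, not specifications of the instruments used in the cited studies, and the function name is hypothetical.

```python
import math

def penetration_depth_um(wavenumber_cm1, n_crystal=2.4, n_sample=1.5, angle_deg=45.0):
    """Evanescent-wave penetration depth (micrometres) at a given wavenumber.

    Defaults are assumptions for illustration: diamond crystal (n ~ 2.4),
    organic sample (n ~ 1.5), 45 degree angle of incidence.
    """
    wavelength_um = 1.0e4 / wavenumber_cm1          # 1 cm = 1e4 micrometres
    sin_theta = math.sin(math.radians(angle_deg))
    root = sin_theta ** 2 - (n_sample / n_crystal) ** 2
    if root <= 0:
        raise ValueError("Angle is below the critical angle: no total internal reflection.")
    return wavelength_um / (2.0 * math.pi * n_crystal * math.sqrt(root))

# Depth decreases towards higher wavenumber (shorter wavelength).
depths = {nu: penetration_depth_um(nu) for nu in (1000, 1640, 3000)}
for nu, d in depths.items():
    print(f"{nu} cm^-1: d_p = {d:.2f} um")
```

Under these assumptions the depth is of the order of one to a few micrometres across the mid-infrared, and it shrinks towards the high wavenumber region, which is why relative band intensities in ATR spectra differ from those in transmission spectra.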

Raman spectroscopy
A Horiba Jobin-Yvon LabRAM HR800 spectrometer was used for all data presented. The spectrometer, housed in a stabilised room temperature (18 °C) environment, was coupled to an Olympus IX71 inverted microscope. A ×60 water immersion objective (LUMPlanF1, Olympus) was employed, providing a spatial resolution of ~1-2 µm at the sample, with a laser power of 35-40 mW. The confocal hole was set at 100 µm for all measurements, the specified setting for confocal operation. Either a 532 nm [28,35-38] or 785 nm [33] laser was used as source, and the system was spectrally calibrated to the 520.7 cm⁻¹ spectral line of silicon, by an automated instrument software routine, to ensure consistency within and between datasets. In all experiments, a 300 lines/mm grating was used, providing a spectral dispersion of approximately 1.5 cm⁻¹ per pixel with the 785 nm laser line, and approximately 1.0 cm⁻¹ per pixel with the 532 nm laser line. The detector used was a 16-bit dynamic range Peltier cooled CCD detector, and the backscattered Raman signal was typically integrated for 2 × 30 s [33] or 3 × 80 s [28,35-38] over the spectral range from 400 to 1800 cm⁻¹.

Materials
Gelatin (BDH, Ireland, 44045) [33] and glycine (Cooper, France) [34] were analysed in both the powder form and after dilution in distilled water to systematically varied concentrations. Solutions were either analysed in the liquid state, or after deposition on a CaF₂ window (Crystran Limited, UK), or ATR crystal, after which they were air dried. γ globulins (G4386), albumin (A9511), urea (F3879) and β-carotene (C9750-5G) were purchased from Sigma Aldrich, Ireland. Solutions of urea were prepared in distilled water at concentrations varied over a physiologically relevant range (1-1000 mg/dL).
Methotrexate (MTX, A6770) and Busulfan (Bu, B058) [38] were purchased from Sigma Aldrich, Ireland. Stock solutions of 0.1 mg/mL Bu in methanol and 1 mM MTX in 0.1 M NaOH were prepared. Commercial human serum was spiked with Bu and MTX over therapeutically relevant concentration ranges, to achieve final concentrations of 0-0.05 mg/mL for Bu and 0-100 µM for MTX. The spiked concentrations of Bu in serum are expressed in mg/mL and those of MTX in µM, to be consistent with previous studies [39,40].
Sterile filtered human serum from normal mixed pool (off the clot) was purchased from TCS Biosciences (Ireland), or Sigma Aldrich (Ireland), or was donated by the University Hospital (CHU) Bretonneau de Tours (France), as part of the Cancéropôle du Grand-Ouest consortium [34]. D-glucose (Fisher Scientific, UK) was employed to spike serum samples over the concentration ranges from 0.0 mg/dL (control) to 220 mg/dL [41] or 440 mg/dL [35], to explore the dynamic range and sensitivities of the respective spectroscopic techniques. The concentrations were selected to encompass a wide range of physiological relevance, simulating hypoglycaemia (<60 mg/dL), normal levels (70-110 mg/dL) and hyperglycaemia (>120 mg/dL), in order to evaluate the potential of the protocols for clinically relevant human serum monitoring.
Depending on the experiment, samples were prepared either by deposition of 20 µL of the serum on a CaF₂ window (Crystran Limited, UK), followed by air drying, or by filtration of 0.5 mL of the serum using Amicon Ultra-0.5 mL centrifugal filter devices (Merck, Germany). In all cases, the procedure for washing the centrifugal devices prior to serum analysis was adapted from Bonnier et al. 2014 [42], according to manufacturer's guidelines. Either 3 kDa, 10 kDa or 50 kDa devices were employed, individually or in combination, and typically 0.5 mL of the serum was placed in the device and centrifuged at 14,000 x g for 30 min. The filter device was then placed upside down in a new Eppendorf and spun down at 1000 x g for 2 min. Two fractions were obtained following each filtration: the first representing constituents with a molecular weight higher than the cut off point of the filter used (concentrate), and the second corresponding to the fraction passed by the membrane and collected in the vial (filtrate). For liquid state Raman measurements [28,35-38], the substrate used was a Lab-Tek plate (catalogue number 154534) with a 0.16-0.19 mm thick glass bottom, 1.0 borosilicate cover glass, purchased from Thermo Fisher Scientific, Ireland.

Patient samples
Patient serum samples (n = 25) were donated by the University Hospital CHU Bretonneau de Tours (France) [35,41]. Initially, the samples were collected during routine blood check-ups, 1 mL of the vial remains being provided for further spectroscopic analysis. Albumin, γ globulins (IgG, IgM and IgA), total protein, urea and glucose concentration levels were obtained by routine biochemical analysis using a COBAS analyser (Roche Diagnostics), following the CHU guidelines for routine biochemical analysis. Further patient details and serology analyses have been provided in the original manuscripts [35,41]. Samples were fractionated, as appropriate, using Amicon Ultra-0.5 mL centrifugal filter devices (Merck, Germany).

Spectral preprocessing
Depending on the analysis, different pre-processing steps were performed using Matlab (Mathworks, USA). In the case of FTIR and ATR-FTIR analyses shown, the spectra collected from the air dried whole serum and the filtered samples (LMWF) were processed using baseline correction (rubber-band) followed by vector normalisation [34].
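The two infrared pre-processing steps named above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the Matlab code used in the cited studies: the rubber-band baseline is taken here as the lower convex hull of the spectrum, the function names are hypothetical, and the test spectrum is synthetic.

```python
import numpy as np

def rubberband_baseline(x, y):
    """Lower convex hull of the points (x, y), linearly interpolated back onto x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    hull = []  # indices of lower-hull vertices (monotone chain)
    for i in range(len(xs)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = ((xs[i2] - xs[i1]) * (ys[i] - ys[i1])
                     - (ys[i2] - ys[i1]) * (xs[i] - xs[i1]))
            if cross <= 0:      # i2 lies on or above the chord i1 -> i: drop it
                hull.pop()
            else:
                break
        hull.append(i)
    baseline_sorted = np.interp(xs, xs[hull], ys[hull])
    baseline = np.empty_like(baseline_sorted)
    baseline[order] = baseline_sorted
    return baseline

def vector_normalise(y):
    """Scale a spectrum to unit Euclidean (L2) norm."""
    return y / np.linalg.norm(y)

# Synthetic example: one Gaussian band on a sloping background.
wavenumbers = np.linspace(1000.0, 1800.0, 401)
band = np.exp(-0.5 * ((wavenumbers - 1650.0) / 20.0) ** 2)
drift = 0.2 + 0.0005 * (wavenumbers - 1000.0)
raw = band + drift
corrected = raw - rubberband_baseline(wavenumbers, raw)
normalised = vector_normalise(corrected)
```

Because the hull never rises above the spectrum, the corrected spectrum is non-negative by construction, and vector normalisation then removes overall intensity differences, for example those caused by variations in deposit thickness.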
In the case of Raman analyses of spiked and patient serum samples, the raw data were smoothed using the Savitzky-Golay method (polynomial order of 5 and window of 13). The rubber-band method [43] was found to be appropriate for baseline correction of the smoothed reference spectra of all the analytes (used for EMSC, see below) and of the smoothed spectra of varying concentrations of albumin in distilled water. An adapted Extended Multiplicative Signal Correction (EMSC) algorithm, details of which, as well as the (Matlab) source code, are available in [44], was applied to the raw dataset to remove the spectral interferents from the data. The algorithm fits the measured spectra with a weighted sum of a reference spectrum, spectra of the interferents to be subtracted, and a baseline polynomial, also to be subtracted. EMSC was applied to remove the underlying water spectrum from all the datasets. The weighting of the contribution of water to each spectrum can also be used to scale the analyte spectra, assuming a constant water contribution to all sample spectra [35]. In the case of the analysis of total protein, γ globulins and albumin content [37], EMSC was also applied to remove the β-carotene contribution. Since the Lab-Tek plate has a thin glass bottom, no glass correction was required. Raman spectra of the pure serum, γ globulin (>100 mg/mL), albumin (>100 mg/mL), glucose (>450 mg/mL), urea (>100 mg/mL) and β-carotene (>100 mg/mL), prepared with a minimal amount of water, were used as reference spectra for EMSC.
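The smoothing and EMSC steps described above can be sketched as below. This is a simplified NumPy illustration, not the adapted algorithm of [44]: the spectrum is fitted by least squares as a weighted sum of a reference spectrum, interferent spectra and a low order polynomial, and the interferent and polynomial terms are then subtracted. The function names, the polynomial order of 2, the edge padding used in the smoothing, and the synthetic band shapes are all assumptions made for the example.

```python
import numpy as np

def savgol_smooth(y, window=13, order=5):
    """Savitzky-Golay smoothing: local least-squares polynomial fit (edges padded)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True).astype(float)
    weights = np.linalg.pinv(design)[0]   # weights returning the fit at the window centre
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")

def emsc_correct(spectrum, reference, interferents, wavenumbers, poly_order=2):
    """Fit spectrum ~ a*reference + sum(b_k*interferent_k) + polynomial, then
    subtract the interferent and polynomial terms and divide by a."""
    # Rescale the axis to [-1, 1] so the polynomial columns stay well conditioned.
    t = 2.0 * (wavenumbers - wavenumbers.min()) / np.ptp(wavenumbers) - 1.0
    poly = np.vander(t, poly_order + 1, increasing=True)
    design = np.column_stack([reference, *interferents, poly])
    coeffs, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    n_int = len(interferents)
    unwanted = design[:, 1:] @ coeffs[1:]
    corrected = (spectrum - unwanted) / coeffs[0]
    return corrected, coeffs[0], coeffs[1:1 + n_int]

# Synthetic example: hypothetical analyte band on a broad water-like band plus baseline.
wavenumbers = np.linspace(400.0, 1800.0, 701)
analyte = np.exp(-0.5 * ((wavenumbers - 1003.0) / 8.0) ** 2)
water = np.exp(-0.5 * ((wavenumbers - 1640.0) / 60.0) ** 2)
measured = 0.5 * analyte + 2.0 * water + 0.1 + 1e-4 * (wavenumbers - 400.0)

corrected, scale, water_weight = emsc_correct(measured, analyte, [water], wavenumbers)

rng = np.random.default_rng(0)
smoothed = savgol_smooth(measured + rng.normal(0.0, 0.01, measured.size))
```

In this sketch `water_weight` plays the role of the fitted water contribution, which, as noted above, can serve as an internal standard for scaling if the water content is assumed constant across samples.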

Spectral analysis
The different data analysis steps were performed using Matlab (Mathworks, USA). The principal methodology for quantitative analysis of both spiked and human patient serum samples was that of Partial Least Squares Regression (PLSR). PLSR is a multivariate statistical method which aims to establish a model that relates the variations of the spectral data to a series of relevant targets, in this case analyte concentrations, or clinical parameters. The PLSR model attempts to elucidate factors which account for the systematic majority of variation in predictors 'X' (spectral data) versus associated responses 'Y' (target values of analyte concentration) [45]. PLSR enables the construction of a regression model, which can then be used to predict the concentration of the target based on a spectral measurement. Constructed based on the spectra of samples of known analyte content, either solutions of varying concentrations in distilled water or those of the patient serum, the model is then validated using a rigorous cross validation procedure which evaluates its performance in accurately predicting the analyte concentrations [35]. This approach involves randomly dividing the set of observations into subsets of approximately equal size, with ~50 % of the spectral data randomly selected as the test set, while the remaining ~50 % is used as the training set. The cross-validation process is then repeated n times (the folds), whereby all observations are used for both training and testing, and each observation is used for testing exactly once. The results from the folds can then be averaged to produce a single estimate. The Root Mean Square Error of Cross Validation (RMSECV) is calculated from the n iterations to measure the performance of the model for the unknown cases within the calibration set. The correlation between the concentration and spectral intensity is given by the R² value.
The number of latent variables used for building the PLSR model is optimised by finding the value that corresponds to the minimum of the RMSECV. Using an unseen dataset, the predictive capacity of the model can be verified by the Root Mean Square Error of Prediction (RMSEP).
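The procedure described above (PLSR, cross-validation in folds, and selection of the number of latent variables at the minimum RMSECV) can be sketched as follows. This is a generic NumPy implementation of single-response PLS (NIPALS) written for illustration, not the Matlab routines used in the cited studies; the function names, the 5-fold scheme and the synthetic spectra are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Single-response PLS regression (NIPALS). Returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # weight vector
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        q_a = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)         # deflate X and y
        yk = yk - q_a * t
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def rmsecv(X, y, n_lv, n_folds=5, seed=0):
    """k-fold cross-validation: every observation is predicted exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predictions = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef, intercept = pls1_fit(X[train], y[train], n_lv)
        predictions[test_idx] = X[test_idx] @ coef + intercept
    return np.sqrt(np.mean((predictions - y) ** 2)), predictions

# Synthetic data: analyte band scaling with concentration, plus a varying interferent and noise.
rng = np.random.default_rng(1)
axis = np.linspace(400.0, 1800.0, 200)
analyte = np.exp(-0.5 * ((axis - 1125.0) / 30.0) ** 2)
interferent = np.exp(-0.5 * ((axis - 1003.0) / 30.0) ** 2)
conc = rng.uniform(0.0, 100.0, 40)
X = (0.01 * np.outer(conc, analyte)
     + np.outer(rng.uniform(0.5, 1.5, 40), interferent)
     + rng.normal(0.0, 0.01, (40, 200)))

# Choose the number of latent variables at the minimum RMSECV.
scores = {n_lv: rmsecv(X, conc, n_lv)[0] for n_lv in range(1, 7)}
best_lv = min(scores, key=scores.get)
best_rmsecv, cv_pred = rmsecv(X, conc, best_lv)
r2 = 1.0 - np.sum((conc - cv_pred) ** 2) / np.sum((conc - conc.mean()) ** 2)
print(f"latent variables: {best_lv}, RMSECV: {best_rmsecv:.2f}, R2: {r2:.3f}")
```

An RMSEP would be computed in the same way as the RMSECV, but on spectra that were held out entirely from model construction and from the choice of the number of latent variables.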

Endogenous constituents of human serum
In describing their protocol for analysis of protein secondary structure using FTIR, Yang et al. emphasise the importance of measuring such biological macromolecules in an aqueous environment [46]. An optically thin cell, whose thickness is defined by spacers of order <10 µm, with windows of CaF₂ or other such IR transparent material, is employed to minimise the pathlength, and the contributions of the aqueous environment, which overlap the amide I bands of the proteins (~1640 cm⁻¹), can be digitally removed by subtraction. Relatively high protein concentration solutions (>3 mg/mL) are recommended, to minimise the effects of under or over subtraction, which can introduce artefacts to the analysis, and cautious recording of the appropriate "blank", of identical composition and under identical conditions, is also emphasised.
Although similar transmission cells can also be employed in FTIR microscopy, and notably have been employed for live cell analysis [47], even at sub-cellular resolution [48], they are cumbersome for clinical applications, and in biofluid measurement, a droplet deposited on a suitable substrate is commonly analysed, in either transmission or (trans) reflection mode. The water of an aqueous droplet deposited on a transparent substrate is strongly absorbing, however, and therefore the droplet is usually allowed to dry [49]. This process gives rise to the so-called "coffee-ring effect", illustrated in Fig. 1 for the case of a human serum droplet deposited onto a CaF₂ disc, measured by FTIR transmission [33]. In the specific case of human serum, this effect should more correctly be referred to as the Vroman effect, named after the Dutch haematologist who studied the patterns and sequences of protein deposition on surfaces from human blood serum [50]. The result of this process is that the deposit is spatially highly physically inhomogeneous. In the case of the undiluted serum droplet, the thickness of the deposit is such that the absorbance is saturated in the centre and the edges, and the averaged spectrum is dominated by the more transparent intermediate regions. The deposit is also chemically inhomogeneous, as the different serum proteins have differing affinities for adsorption onto the substrate surface, which results in varying rates of deposition and, even when diluted, the measured absorbance spectrum is seen to vary significantly across the cross section of the edge, resulting in apparent shifting of peaks, as indicated by the dashed lines in Fig. 1 (Bottom) [33,51].
In Attenuated Total Reflection mode, the FTIR source is not transmitted through the sample, but rather the sampling is by the evanescent wave of a high refractive index crystal, such as diamond or germanium, which extends ~1-5 µm into a sample deposited on it [49].
Due to the low penetration depth, problems of saturation of the absorbance by thick samples are largely avoided. Nevertheless, it should be noted that, for dried samples deposited from a solution, the quantitative relationship between the measured absorbance and the analyte concentration in the source solution is rapidly lost, as the deposited layer becomes thicker than the spatial extent of the evanescent field. This is shown clearly in Fig. 2(a), for the case of ATR-FTIR spectra of aqueous solutions and dried samples of glycine [34]. Whereas a linear, Beer-Lambert relationship between measured absorbance and analyte concentration is observed for the aqueous drops, this is lost for dried deposits.
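The contrast between the two cases can be illustrated numerically. The sketch below uses entirely hypothetical numbers, not the data of Fig. 2(a): the aqueous absorbance is taken as proportional to concentration, while the dried-deposit absorbance follows an assumed simple saturation model, A ∝ 1 - exp(-2d/d_p), in which the deposit thickness d grows in proportion to the source concentration and d_p is the penetration depth of the evanescent field.

```python
import numpy as np

def linear_fit_r2(conc, absorbance):
    """Least-squares straight line and its coefficient of determination."""
    slope, intercept = np.polyfit(conc, absorbance, 1)
    fitted = slope * conc + intercept
    ss_res = np.sum((absorbance - fitted) ** 2)
    ss_tot = np.sum((absorbance - absorbance.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

conc = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])   # hypothetical concentrations, mg/mL

# Aqueous drop: analyte density within the evanescent field scales with concentration.
aqueous = 0.004 * conc

# Dried deposit (assumed model): thickness proportional to concentration, with the
# signal saturating once the layer is thicker than the evanescent field.
d_p_um = 1.0                      # assumed penetration depth, micrometres
thickness_um = 0.2 * conc         # assumed thickness per unit concentration
dried = 0.5 * (1.0 - np.exp(-2.0 * thickness_um / d_p_um))

_, _, r2_aqueous = linear_fit_r2(conc, aqueous)
_, _, r2_dried = linear_fit_r2(conc, dried)
print(f"R2 aqueous: {r2_aqueous:.3f}, R2 dried: {r2_dried:.3f}")
```

Under these assumptions the straight-line fit is essentially exact for the aqueous series but poor for the dried series, since the dried signal stops increasing once the deposit exceeds a few penetration depths.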
Furthermore, a comparison of the absorbance spectra of dried and solvated analytes clearly illustrates a fundamental phenomenon, that the change in the local molecular environment and potentially conformation, as well as molecular aggregation, between the solvated and dried state can significantly impact on the measured spectrum, as shown in Fig. 2(b), also for the case of ATR-FTIR spectra of aqueous solutions and dried samples of glycine [34].
The issues of sample inhomogeneity can be at least partially alleviated by micropipetting and/or ensuring the whole drop is sampled, and there have been studies that show excellent specificity/sensitivity of classification of diseased states using ATR-FTIR analysis of dried human blood samples, for cancer detection, recently reviewed by Sala et al. [52], including ovarian cancer [53] and breast cancer [54], as well as Alzheimer's disease [55,56]. Raman spectroscopic signatures were identified in patients, with and without hepatocellular carcinoma (HCC) [57], and signatures of extensive fibrosis in the liver, characteristic of the early developmental stages of HCC, have similarly been identified using ATR-FTIR [58]. Ollesch [...] brain tumour patients and controls, but also effectively predict tumour grade, highlighting the great potential of ATR-FTIR spectroscopy of human serum for determination of the severity of brain tumours [60].
Notably, in such classification tasks, a quantitative evaluation of any one, or multiple, analytes is not required. Nevertheless, multianalyte, dried serum analysis has previously been reported using mid-infrared spectroscopy in transmission mode, for the simultaneous quantitation of eight serum analytes: total protein content, albumin, triglycerides, cholesterol, glucose, urea, creatinine and uric acid [61]. More recently, measuring samples of whole dried blood to demonstrate the ability of ATR-FTIR to identify and quantify malaria parasites, Roy et al. also spiked samples with glucose (0-400 mg/dL) and urea (0-250 mg/dL), and achieved relative RMSECVs of 16 % and 17 %, respectively, using a PLSR analysis [62]. The study demonstrated the capacity of ATR-FTIR spectroscopy to perform multianalyte/disease diagnosis, enabling the simultaneous quantification of glucose and urea as well as malaria parasitemia, using a single spectrum from a single drop of dried blood on a glass microscope slide. Spalding et al. [63] were able to demonstrate quantitative analysis of protein levels in pooled human serum samples spiked with varying concentrations of human serum albumin (HSA) and immunoglobulin G (IgG) using ATR-FTIR spectroscopy of dried samples. Using a validated PLSR method, for the IgG spiked samples, a linearity of R² as high as 0.998 and an RMSEV of 0.49 ± 0.05 mg/mL were achieved. To demonstrate the potential for quantification in a clinical setting, analysis of patient samples was performed, yielding R² values of 0.992 and a corresponding RMSEV of 0.66 ± 0.05 mg/mL. Notably, the sample preparation protocol was optimised to the measurement of 10 % diluted, air dried samples.
Figure caption (partial): No preprocessing has been applied. Additionally the dotted line represents the contribution of the water to the signal collected from the solution. The spectra have been offset for clarity [34].
The dynamic range of the constituent biochemical concentrations in human serum remains a challenge to their quantification by many techniques. By depleting the abundant high molecular weight proteins, which otherwise dominate the signatures collected, the ability to monitor changes in the concentrations of the low molecular weight constituents is enhanced. The technique of fractionation is currently extensively used, both in general serology [64][65][66] and in vibrational spectroscopic analysis for improved quantitative analysis of low molecular weight biomarkers [33,63,67].
The process of fractionation using centrifugal filtration for improved quantitative analysis of low molecular weight biomarkers using ATR-IR spectroscopy was specifically explored by Bonnier et al. [33,34,41]. A single filtration using, for example, a 10 kDa filter produces two fractions: a fraction which remains in the filter, of molecular weight greater than 10 kDa, which is concentrated by a factor of 5-10, and a filtrate, containing the low molecular weight fraction <10 kDa. Using a cascade of multiple filters (e.g. 100 kDa, followed by 50 kDa) results in multiple concentrates and filtrates, for analysis of both the high and low molecular weight fractions [37]. Initially employing glucose as a model spike in human serum, it was demonstrated that fractionating the serum prior to ATR-FTIR spectroscopic analysis of dried samples can considerably improve the precision and accuracy of quantitative models based on PLSR [41]. In the case of patient samples, previously clinically screened for glucose levels, the Root Mean Square Error for the Validation set (RMSEV) was improved by a factor of 5 following fractionation, yielding an average relative error in the predictive values of ~3 mg/dL (less than 1%).
Currently available clinical techniques offer significantly higher performance in glucose level monitoring, with a standard deviation of 0.72 mg/dL (0.04 mmol/L), and therefore it is difficult to conceive of ATR-FTIR achieving better results. However, an RMSEV of 3.1 ± 0.13 mg/dL places the precision of ATR-IR analysis after fractionation of the human serum within a clinically relevant range, enabling, for example, the identification of patients with abnormal glucose levels (either hypo- or hyperglycaemia), in a rapid, low cost fashion.
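To make the comparison of these figures concrete, the sketch below converts glucose values between mg/dL and mmol/L (using the molar mass of glucose, 180.16 g/mol) and assigns a predicted value to the glycaemic ranges quoted in the Materials section (<60, 70-110 and >120 mg/dL). The function names, the "borderline" label for values falling between the quoted ranges, and the choice of a ±2 × RMSEV margin are illustrative assumptions, not part of the cited studies.

```python
GLUCOSE_MG_DL_PER_MMOL_L = 18.016   # 180.16 g/mol -> 1 mmol/L = 18.016 mg/dL

def mg_dl_to_mmol_l(value_mg_dl):
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return value_mg_dl / GLUCOSE_MG_DL_PER_MMOL_L

def glycaemic_category(glucose_mg_dl):
    """Assign a value to the ranges quoted in the Materials section."""
    if glucose_mg_dl < 60:
        return "hypoglycaemia"
    if 70 <= glucose_mg_dl <= 110:
        return "normal"
    if glucose_mg_dl > 120:
        return "hyperglycaemia"
    return "borderline"             # assumed label for the gaps between quoted ranges

def is_unambiguous(glucose_mg_dl, rmsev=3.1, k=2.0):
    """True if the category is unchanged at both ends of a +/- k*RMSEV margin.

    k = 2 is an arbitrary illustrative choice.
    """
    low = glycaemic_category(glucose_mg_dl - k * rmsev)
    high = glycaemic_category(glucose_mg_dl + k * rmsev)
    return low == high == glycaemic_category(glucose_mg_dl)

for predicted in (45.0, 90.0, 112.0, 200.0):
    print(predicted, round(mg_dl_to_mmol_l(predicted), 2),
          glycaemic_category(predicted), is_unambiguous(predicted))
```

With an RMSEV of about 3 mg/dL, the margin is narrow compared with the width of the normal range, which is the sense in which the precision is sufficient to flag abnormal glucose levels even though it does not match that of established clinical analysers.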
It should be noted that, because of the low sampling depth, it is also possible to use ATR-FTIR for measurement of analytes in water solution. Fig. 3 shows the example of gelatin in water solution, chosen as a model compound, measured by ATR-FTIR [33]. Although the water absorption masks that of the analyte in the high wavenumber regime (3800-2800 cm⁻¹), gelatin features are clearly discernible in the fingerprint region, from ~1600-1000 cm⁻¹, emerging with increasing concentration. The strength of the absorption follows a Beer-Lambert like behaviour, as a function of concentration, over the range of concentrations studied [33], as shown also for the case of aqueous solutions of glycine, in Fig. 2(a) [34]. However, using the example of glycine spiked in serum, Bonnier et al. demonstrated that removal of water, by drying, and of the HMWF, by centrifugal filtration, leads to an improvement of the sensitivity of the technique, allowing blinded concentrations of glycine to be identified at levels up to 50 times lower than in measurements in the unfractionated liquid form [34].
Raman spectroscopy is potentially more amenable to the quantitative measurement of analytes in aqueous solutions, as the water bands are relatively weaker, in comparison to IR. As an illustration, Bonnier et al. performed a direct comparison of the techniques in aqueous solutions of gelatin, human serum, and centrifugally filtered human serum (Fig. 4) [33]. The ability to measure biomolecules in their native environment eliminates potentially detrimental effects of aggregation on the spectral profile, the need for spatial averaging over the area of a dried deposit, and problems with saturation in optically thick samples. Measurement can be performed by focussing the Raman source laser into the fluid (rather than onto the surface), and long focal depths sample an increased number of analyte molecules, although there is a trade-off with the collection efficiency of the objective, which decreases with decreasing numerical aperture, in the backscattering geometry. In a study to optimise the spectral acquisition process, significantly improved signal to noise/background was achieved using an inverted geometry, in which the light was delivered to (and collected from) the liquid sample on a CaF₂ disc by a ×60 immersion objective, via a water droplet. Centrifugal filtration of the serum using a 3 kDa filter concentrated the bulk of the serum constituents, further enhancing the spectroscopic signatures (Fig. 5). Parachalil et al. adapted the set-up by introducing a commercial, cover slip (0.16-0.19 mm) bottomed vessel (Lab-Tek plate) as the substrate [28]. The use of glass precludes the use of a 785 nm laser [68,69], and thus a 532 nm laser was chosen as the source. This set-up also has the added advantage of providing high quality, consistent Raman spectra from sample volumes as low as 1 mL.
Figure caption (partial): Lyophilized gelatin and G: gelatin 10 mg/mL (2 × 150 s, ×60 water immersion objective) after water subtraction (intensity ×8 for comparison). Spectra have been offset for clarity. II: Raman spectra recorded in the high wavenumber region of gelatin solutions at a concentration of 200 mg/mL (pink), 100 mg/mL (green), 25 mg/mL (red) and 10 mg/mL (blue). For comparison, a spectrum of distilled water has been added (light blue). No offset or correction has been applied [33].
Fig. 5. Raman spectra recorded from human serum using a Horiba Labram HR 800. I: A: Using the inverted set up after centrifugal filtration; B: In the standard upright position using a ×60 immersion objective and C: Using the inverted microscope coupled to the immersion ×60 objective. II: Examples of different settings used for the analysis of human serum using Labram HR 800. A: In the standard upright position; B: In the standard upright position using a ×60 immersion objective and C: Using the inverted microscope coupled to the immersion ×60 objective. III: A: Raman spectrum recorded using the inverted set up from distilled water; B: Raman spectrum recorded from the human serum; C: Raman spectrum recorded from the human serum after ultrafiltration using the Amicon Ultra-0.5 centrifugal filter devices, 3 kDa. No offset or correction has been applied [33].
As a direct comparison of the relative merits of the techniques, Parachalil et al. [35] reproduced the sample fractionation protocol of glucose spiked human serum and patient samples, and applied the same PLSR analysis algorithm as the ATR-FTIR study of Bonnier et al. [41] to Raman analysis of liquid samples. Measured in its aqueous form, the primary background to the 10 kDa fraction of the (spiked or patient) serum is that of water, which was removed using an adapted EMSC algorithm [44] and the reference spectrum of glucose, measured in a high concentration (>45 mg/mL) glucose solution (Fig. 6B). Also shown in Fig. 6A is the reference Raman spectrum of urea, measured in a highly concentrated solution (>100 mg/mL), the most prominent feature of which (~1000 cm⁻¹) is in close proximity to the strong features of glucose, between ~1000 and 1200 cm⁻¹. Although excellent PLSR models were achievable for aqueous solutions of glucose, over the extended concentration range of 0-1000 mg/dL, in the analysis of glucose content of the spiked human serum, interference of urea features in the PLSR analysis was clearly evident, and a satisfactory PLSR model for glucose was only achievable for the reduced spectral range of 1030 cm⁻¹ to 1400 cm⁻¹. Fig. 7 illustrates the results of (A) the EMSC correction of Raman spectra of the 10 kDa filtrate of the human patient serum samples, (B) the process of varying the number of latent variables towards optimisation of the PLSR model, (C) the PLSR model regression co-efficient, and (D) the PLSR model, which shows (spectrally) predicted versus clinically measured patient glucose levels. The spectral features of the regression co-efficient match well the features of the reference Raman spectrum of glucose (Fig. 6B), a critical consideration for confidence in the analysis process. Table 1 compares the results of the analyses, clearly indicating that Raman analysis provides a considerable (almost factor of 2) improvement over the equivalent ATR-FTIR measurement. A potentially more important differentiator between the two techniques is the requirement for drying in the ATR-FTIR technique, which adds considerably to the sampling time and, of particular importance in a clinical context, the complexity of the workflow [67,70].
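The agreement between a regression co-efficient and a reference spectrum, and the effect of restricting the spectral range, can be quantified in a simple way. The sketch below computes the Pearson correlation between the two vectors within a chosen window; this is a hypothetical check written for illustration, not the procedure used in [35], and the band positions and widths of the synthetic "glucose-like" and "urea-like" features are assumptions.

```python
import numpy as np

def window_similarity(wavenumbers, regression_coef, reference, low, high):
    """Pearson correlation between regression vector and reference within [low, high]."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    a = regression_coef[mask] - regression_coef[mask].mean()
    b = reference[mask] - reference[mask].mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic example: the regression vector contains the target band and, with
# opposite sign, a neighbouring interferent band near 1000 cm^-1.
wavenumbers = np.linspace(400.0, 1800.0, 1401)
glucose_like = np.exp(-0.5 * ((wavenumbers - 1125.0) / 10.0) ** 2)
urea_like = np.exp(-0.5 * ((wavenumbers - 1003.0) / 10.0) ** 2)
regression_coef = 3.0 * glucose_like - 2.0 * urea_like

full_range = window_similarity(wavenumbers, regression_coef, glucose_like, 400.0, 1800.0)
restricted = window_similarity(wavenumbers, regression_coef, glucose_like, 1030.0, 1400.0)
print(f"full range: {full_range:.3f}, restricted: {restricted:.3f}")
```

In this synthetic case the similarity is noticeably higher in the restricted window, since the interfering band lies outside it; a low similarity over the full range would be one indication that the model is drawing on features other than those of the target analyte.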
It should be noted that a similar direct comparison of the techniques of mid-IR and Raman spectroscopy, together with multivariate data analysis, for the quantitative analysis of the serum of 247 blood donors was previously performed by Rohleder et al. [71]. The IR analysis was undertaken on dried droplets, whereas the Raman analysis was of the liquid serum and/or serum filtrates. In their investigation of the quantification of glucose, urea, uric acid, LDL cholesterol, HDL cholesterol, total protein, cholesterol and triglycerides, Raman and mid-infrared spectroscopy delivered similar accuracies for the prediction of physiologically relevant analyte levels; for the example of glucose, RMSEPs of 14.7 and 17.1 mg/dL were obtained for mid-infrared and Raman spectroscopy, respectively [71]. As long ago as 1999, Berger et al. undertook a Raman microscopic analysis of whole human blood and serum samples in liquid form, to simultaneously quantify the content of six target analytes, namely glucose, cholesterol, triglyceride, urea, total protein and albumin [72]. For the case of glucose, an RMSEP of 26 mg/dL was achieved. However, in neither case was fractionation of the serum undertaken to deplete the HMWF analytes, and comparison of the results with those of Table 1 clearly demonstrates the benefits of serum processing.

Table 1
Comparison of the results of ATR-FTIR [38] and Raman spectroscopic analysis [32] of patient sample set for monitoring the glucose levels.

More recently, Parachalil et al. further explored the suitability of Raman microspectroscopy as a bioanalytical tool, when coupled with fractionation and multivariate analysis, to analyse and quantify constituent components of both the HMWF (total protein content, γ globulins and albumin) and LMWF (urea and glucose) of human patient serum, in the native liquid form [37]. Multiple centrifugal filtration steps were carried out, as illustrated schematically in Fig. 8, and the Raman spectra of the resultant processed serum samples are shown in Fig. 9.
Using PLSR analysis, the γ globulin and total protein analysis models, based on unfiltered patient serum, produced R² values of 0.88 and 0.82, and RMSECV of 126 mg/dL and 115 mg/dL, respectively. After fractionation of the patient serum samples by centrifugal filtration using 100 kDa and 50 kDa filters, a similar analysis produced an R² value of 0.91 and RMSECV of 90 mg/dL for albumin, which is comparable to the values previously reported for a model of aqueous solutions of albumin over a similar concentration range [28]. In the case of urea, R² and RMSECV values of 0.92 and 1.73 mg/dL were achieved for the low molecular weight (<10 kDa) filtrate of patient samples, when the full spectral range of 400–1800 cm⁻¹ was employed. Reducing the spectral range of the analysis to 800–1030 cm⁻¹ considerably improved the prediction accuracy and sensitivity, resulting in an R² value of 0.97 and RMSECV of 1.14 mg/dL. Table 2 summarises the results of the analyses. Notably, using 532 nm as source, the spectrum of the whole serum is dominated by features of carotenoid compounds (Fig. 9). By subtracting the carotenoid contributions using the adapted EMSC protocol, the total carotenoid content could also be quantified, although in the study, total carotenoid content was not available as a clinical parameter for PLSR analysis.
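For readers less familiar with the figures of merit quoted above, the following is a minimal sketch of how an RMSECV and R² can be obtained from a PLSR model by leave-one-out cross validation. It is not the implementation used in the cited studies: it assumes a single latent variable, computes R² from the cross validated predictions, and uses invented, noise-free "spectra"; all function names and data values are hypothetical.

```python
import math

def pls1_fit(X, y):
    """Fit a PLS1 model with a single latent variable (illustrative only)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between spectra and target.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores of each spectrum on the latent variable, then regress y on them.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q

def pls1_predict(model, x):
    """Predict the target value for one spectrum x."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score

def leave_one_out(X, y):
    """Return (RMSECV, R^2) from leave-one-out cross validation."""
    preds = []
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(pls1_predict(model, X[i]))
    ss_res = sum((pred - ref) ** 2 for pred, ref in zip(preds, y))
    y_mean = sum(y) / len(y)
    ss_tot = sum((ref - y_mean) ** 2 for ref in y)
    return math.sqrt(ss_res / len(y)), 1.0 - ss_res / ss_tot

# Hypothetical, noise-free "spectra": concentration x pure band + fixed background.
pure_band = [0.0, 1.0, 0.5, 0.0]
background = [0.2, 0.3, 0.1, 0.4]
concentrations = [10.0, 20.0, 30.0, 40.0, 50.0]  # mg/dL, invented values
spectra = [[c * s + b for s, b in zip(pure_band, background)]
           for c in concentrations]

rmsecv, r2 = leave_one_out(spectra, concentrations)
```

Because the synthetic data are perfectly linear, the sketch returns an RMSECV close to zero and an R² close to one; real serum spectra would require more latent variables, preprocessing, and a properly chosen validation scheme.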

Viremia
In addition to pathologically related variations of intrinsic constituents of blood, the detection, monitoring and quantification of exogenous factors or agents is a potentially valuable clinical application of vibrational spectroscopy, and, in this context, the screening of viral infections may be a valuable strategic target. Although a virus replicates only inside the living cells of an organism, it can be transported readily throughout the body via the bloodstream (viremia), and in the case of, for example, the malaria parasite and dengue virus, infection is via the blood stream. Infection can also lead to the upregulation of several biomolecules, transported in the blood, which are commonly used as diagnostic markers [73]. Crucially, in addition to the detection of a viral infection, quantification of viral load can be of critical importance for monitoring disease progression, in terms of the health of the individual, and containment of the outbreak. Serology based methods, such as Enzyme Linked Immunosorbent Assay (ELISA), which targets specific antibodies arising in the blood from the response of the immune system to the viral infection, or more direct virus detection based methods such as Polymerase Chain Reaction (PCR), are established clinical techniques, but these modalities are usually complex, lengthy and costly [74][75][76][77], suggesting a potential niche area for label free spectroscopic analysis.
There have been several reports of (Raman and IR) spectroscopic signatures associated with human papilloma virus (HPV) infection, particularly relevant for cervical cancer screening [78,79]. In HPV infected cell lines, a semi-quantitative correlation was drawn between expression of the protein p16 and the level of viral infection, and a PLSR predictive model was developed, based on the FTIR signatures [80], although it is notable that no signatures of the virus itself were unambiguously identified. In blood based analysis, very notable and eye-catching translational work by the group from Monash University, based on the use of ATR-FTIR spectroscopy for the determination of malaria parasitemia in whole blood samples, has recently been demonstrated in field trials in austere environments, proving the robustness and capability of serum biofluid analysis [62]. Saade et al. [81] demonstrated the potential of Raman spectroscopic analysis of liquid sera samples to discriminate between normal and hepatitis C patient samples. Ditta et al. [82] used a similar approach to differentiate spectral markers of low, medium and high hepatitis C viral loads. Saleem et al. [83] reported a Raman spectroscopic analysis of dengue virus infection in (dried) human blood serum samples, and although a PLSR model was constructed, with a reported RMSEP of 0.0099 and correlation coefficient (R²) of 0.9998, it was based only on normal/infected status, rather than a continuous variation of viral load. Khan et al. [84] used Raman spectroscopy, coupled with support vector machine analysis, to diagnose dengue infection based on positive IgM status, in samples of human serum dried on glass slides, with a high degree of accuracy (85 %) and precision (90 %), and corresponding sensitivity and specificity of 73 % and 93 %, respectively.

Fig. 8. Schematic overview of steps in fractionation of patient serum samples to separate γ globulin, albumin, and urea/glucose [37].
In a more quantitative approach, Bilal et al. [85] performed a Raman spectroscopic analysis of dengue infected human serum samples in liquid form, and, based on PLSR analysis of data of IgM positive and normal human serum samples, established a prediction model with a correlation coefficient R² of 0.929 and RMSECV for the entire predicted values of ~10 %. Using the receiver operating characteristic curve (ROC) approach, an accuracy of 96.67 %, a sensitivity of 90 % and a specificity of 100 % were reported. In a similar approach, using Raman spectroscopy coupled with PLSR analysis, Nawaz et al. [86] reported an RMSEP of 0.2 %, for hepatitis C viral load in liquid samples of human blood plasma, as measured by the current clinical gold standard PCR. Although the study was only performed on a patient cohort of 10, and the methodology optimisation not explored extensively, the result justifies further exploration of the technique for clinical translation.
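The classification figures of merit quoted in this section (accuracy, precision, sensitivity and specificity) all derive from the four entries of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and were chosen only for illustration, as the confusion matrices of the cited studies are not reproduced here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard figures of merit from confusion-matrix counts.

    tp/fn: infected samples classified as infected/normal
    tn/fp: normal samples classified as normal/infected
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts: 10 infected and 20 normal samples, one missed infection.
metrics = classification_metrics(tp=9, fp=0, tn=20, fn=1)
```

With these invented counts, one missed infection out of ten lowers the sensitivity to 90 % while the specificity remains 100 %, which illustrates why accuracy alone can mask clinically relevant false negatives in small cohorts.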
The potential role of vibrational spectroscopy in the control and monitoring of pandemic outbreaks is (at the time of writing) a very topical proposition [87]. In the current case of COVID-19, the transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is understood to be largely respiratory, and so initial population screening of, for example, nasopharyngeal aspirate may be most appropriate. However, post infection screening for blood borne antibodies could play a critical role in reintegration into society, post quarantine.

Therapeutic drug monitoring
A further potential application area of clinical relevance for vibrational spectroscopy is that of therapeutic drug monitoring (TDM). TDM refers to the clinical practice of managing a patient's drug dosage within a targeted therapeutic window, based on measurement of the concentration of the drug in the bloodstream at timed intervals. For drugs with a narrow therapeutic range, such monitoring is essential to provide individualised patient treatment, while maintaining the efficacy of drugs and minimising drug toxicity and related adverse effects [25,88]. TDM has also been increasingly advocated to improve the standard of chemotherapy, in which side effects can be substantial and life threatening [89][90][91][92]. The currently employed technique of chemotherapeutic-dosage calculation, based on dose intensity and body surface area, has been reported to be inaccurate for patients undergoing sustained chemotherapeutic treatment [93,94]. In an era of rapidly increasing healthcare costs, it is desirable to explore rapid, sensitive, and cost-effective point-of-care techniques for TDM, which could quantitatively measure the serum concentration of drugs, such that the dosing strategy can be tailored to the metabolism of an individual patient, for a personalised therapeutic regime.
As a proof of concept study, the standardised, optimised methodology of Parachalil et al. [35] was applied to determine the Limit of Detection (LOD) and Limit of Quantification (LOQ) for TDM in spiked human serum, using the examples of Busulfan (Bu), a cell cycle non-specific alkylating antineoplastic agent, and Methotrexate (MTX), a chemotherapeutic agent and immune system suppressant [38]. The Raman spectra of aqueous solutions of the respective compounds, as well as their chemical structures, are depicted in Fig. 10 (A and B). The drug concentration ranges added to the human pooled serum as spikes were chosen to encompass the recommended therapeutic ranges and toxic levels in patients [38]. Centrifugal filtration of the spiked human pooled serum with 10 kDa filters efficiently recovered the drug in the filtrate, prior to performing Raman analysis. Predictive models were built by using PLSR, the regression coefficients of which matched well the Raman spectra of the compounds, as shown in Fig. 10 (C and D), and values of LOD and LOQ were calculated directly. The LOD calculated for Busulfan was determined to be 0.0002 ± 0.0001 mg/mL, 30-40 times lower than the level of toxicity, enabling the application of this method in target dose adjustment of Busulfan for patients undergoing, for example, bone marrow transplantation. The LOD and LOQ calculated for Methotrexate were 7.8 ± 5 μM and 26 ± 5 μM, respectively, potentially enabling high dose monitoring.
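The text above does not state how the LOD and LOQ were derived from the regression models. As an illustration only, the sketch below uses one widely adopted convention, LOD = 3.3σ/S and LOQ = 10σ/S, where S is the slope of a linear calibration and σ the standard deviation of its residuals. The choice of convention, the function name and the calibration values are all assumptions, not the procedure of the cited study.

```python
import math

def lod_loq(concentrations, responses):
    """Estimate LOD and LOQ from a linear calibration (assumed convention:
    LOD = 3.3 * sigma / slope, LOQ = 10 * sigma / slope)."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_r = sum(responses) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    slope = sum((c - mean_c) * (r - mean_r)
                for c, r in zip(concentrations, responses)) / sxx
    intercept = mean_r - slope * mean_c
    residuals = [r - (intercept + slope * c)
                 for c, r in zip(concentrations, responses)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return 3.3 * sigma / slope, 10.0 * sigma / slope

# Hypothetical calibration: spiked concentration versus model response.
spiked = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
response = [0.1, 2.0, 4.1, 5.9, 8.1, 9.9]
lod, loq = lod_loq(spiked, response)
```

Under this convention the LOQ is always a fixed multiple (10/3.3) of the LOD, so both improve together when either the residual scatter is reduced or the calibration slope is increased.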

Discussion
In the quest for strategic, target applications of vibrational spectroscopy with achievable clinical translation, there has been much debate over the relative merits of infrared absorption and Raman scattering spectroscopies [1,2]. While issues of intrinsic spatial and spectral resolution may be relevant to histopathological and cytopathological applications, the main consideration for applications to biofluid analysis is that of the aqueous environment. The vibrational spectrum of water is dominated by combinations of symmetric stretch (ν₁ ~3250 cm⁻¹), asymmetric stretch (ν₃ ~3450 cm⁻¹) and bending (ν₂ ~1640 cm⁻¹) of the O–H bonds (Fig. 3), whose spectral distribution can be modulated by local hydrogen bonding interactions [95]. The broad bands overlap with, and can obscure, significant features of relevant biomolecules (e.g. Amide A, CH₂, CH₃, OH in the high wavenumber region, and the Amide I band in the low wavenumber region). O–H bonds are highly polar, and are thus very much more IR than Raman active, and therefore, relative to the water background, Raman spectroscopy is significantly more sensitive to the constituent components of biofluids than IR spectroscopy. It should be stressed, however, that IR measurement, and indeed quantitative analysis, in the aqueous environment is not impossible [33,34], but it has been demonstrated that the quantitative analysis of fractionated serum samples by ATR-FTIR is improved by precipitation of the analyte in the form of a dried deposit [41].
Significant progress has been made towards the clinical translation of ATR-FTIR platforms for diagnostic applications, based on analysis of dried plasma/serum, some notable examples being for brain cancer (recently reviewed by Sala et al. [52]), malaria [96] and Alzheimer's disease [55,56]. The potential of the technique is aided by increasingly sophisticated data analysis techniques, based, for example, on machine learning algorithms [54], and the feasibility of ATR-FTIR analysis for clinical screening of human serum for brain tumour diagnostics has been subjected to a health economics study [97]. The process is readily adaptable to automation, and tailor-made, disposable and low-cost microfabricated silicon substrates for use as ATR elements have been employed to batch process serum sample measurements for cancer diagnostics, in a clinical validation study [98].
Although ATR-FTIR of dried serum deposits has been demonstrated to perform well for classification applications, the technique struggles in terms of quantification of constituent analytes. As illustrated in Fig. 2(a), the quantitative (Beer-Lambert) relationship between the measured absorbance and the analyte concentration in the source solution is lost as the deposit thickness becomes comparable with, or exceeds, the spatial extent of the evanescent wave. The characteristics of the spectrum of the constituent biomolecules themselves are also perturbed by the change in the local environment, causing bathochromic and hypochromic shifts, as shown for the example of glycine, in Fig. 2(b). Moreover, the degree to which both these effects influence the measured spectrum can vary across the extent of the inhomogeneous deposit (Fig. 1). Averaging over an entire deposit serves to hide the effects, rather than eliminate them.
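The "spatial extent of the evanescent wave" referred to above is commonly characterised by the penetration depth d_p = λ / (2π n₁ √(sin²θ − (n₂/n₁)²)), where n₁ and n₂ are the refractive indices of the ATR crystal and the sample, and θ is the angle of incidence. The sketch below evaluates this textbook expression; the crystal and sample refractive indices and the 45° angle are assumed illustrative values, not parameters taken from the studies discussed.

```python
import math

def penetration_depth_um(wavenumber_cm1, n_crystal, n_sample, angle_deg):
    """Evanescent-wave penetration depth (micrometres) for ATR."""
    wavelength_um = 1.0e4 / wavenumber_cm1  # convert cm^-1 to micrometres
    theta = math.radians(angle_deg)
    term = math.sin(theta) ** 2 - (n_sample / n_crystal) ** 2
    if term <= 0:
        raise ValueError("angle is below the critical angle: no total internal reflection")
    return wavelength_um / (2.0 * math.pi * n_crystal * math.sqrt(term))

# Assumed values: diamond-like crystal (n = 2.4), organic deposit (n = 1.5), 45 degrees.
depth_fingerprint = penetration_depth_um(1000.0, 2.4, 1.5, 45.0)
depth_high_wavenumber = penetration_depth_um(3000.0, 2.4, 1.5, 45.0)
```

Under these assumptions the penetration depth is of the order of a micrometre and scales with wavelength, so a dried deposit thicker than this is only partially sampled, and to a different extent in the fingerprint and high wavenumber regions.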
The arguments for the quantification of analytes in their native aqueous environment therefore favour the use of Raman spectroscopy. A number of studies have demonstrated the potential of the technique for quantification of both high and low molecular weight serum constituents, in the native aqueous environment, with a degree of accuracy at least as high as, or higher than, IR analysis of dried samples [35,71,72,99]. For both IR and Raman analyses, serum fractionation, to reduce the dynamic range of molecular weights of the constituent analytes, has been demonstrated to significantly enhance the sensitivity and accuracy of the measurement [33,34,37]. Fractionation by centrifugal filtration also increases the concentration of the fraction retained in the filter, improving the sensitivity of liquid based measurements [28]. In the application of such techniques, however, it is important to ensure the fidelity of the process, in terms of separation of constituent biomolecules purely based on their molecular weight. In a simulated plasma protein mixture, fractionation using 100 kDa filters was seen to be ineffective in separating the poorly soluble fibrinogen fraction from the albumin and lower molecular weight proteins, necessitating the use of alternative fractionation techniques, in this case, ion exchange chromatography [28]. Bonnier et al. demonstrated the efficacy of the process of separating high and low molecular weight constituents by centrifugal filtration of human serum using a 10 kDa device [34]. Nevertheless, although carotenoids have a molecular weight of <10 kDa, the strong carotenoid signals observed in the Raman spectra of patient serum samples, using 532 nm as source, were still evident in the concentrate of 50 kDa filtration, and undetectable in the 10 kDa filtrate (Fig. 9 [37]).
Albumin has been shown to have a high affinity for carotenoids, for which it plays a key role against oxidative degradation [100], as well as some chemotherapeutic agents such as doxorubicin [101], for which it aids in solubilisation and transport. Therefore, although such centrifugal fractionation techniques are commonly employed in serology [64][65][66], it is important to validate their fidelity using, for example, simulated spiking experiments [37], and exploration of alternative fractionation techniques such as ion exchange chromatography [28] is warranted.
Raman microspectroscopy for analysis of aqueous human serum offers a number of experimental options, towards optimisation of the technique. Bonnier et al. undertook a systematic study of different microscope geometries, concluding that an inverted geometry, using a ×60 water immersion objective to focus a 785 nm laser source through the underside of a CaF₂ substrate, gave the best performance [33]. A water droplet between the objective and substrate provided improved optical coupling of the source laser to the sample, and of the backscattered Raman signal. The laser focus in the liquid sample is beyond the plane of the substrate, and the limited focal depth means that sample volumes as low as 1 mL can be measured. Parachalil et al. demonstrated that the CaF₂ substrate could be replaced by a commercial, glass coverslip (0.16–0.19 mm) bottomed vessel (Lab-tek plate), more amenable to clinical translation, albeit using 532 nm as source [28], to avoid the intrinsic fluorescence of glass at 785 nm [68,69]. A similar set up was employed by Medipally et al. [102], who used either 532 nm or 785 nm for analysis of blood plasma in a coverslip bottomed 96 well plate, suggesting that fluorescence from the glass is not an issue under these conditions. They also demonstrated automated analysis of multiple samples, and therefore the potential for a higher throughput technique.
Choice of wavelength for Raman spectroscopy can also be influenced by the analyte of interest. Although the majority of biomolecules are transparent in the visible region of the spectrum, and so do not absorb or fluoresce, some are intrinsically stronger Raman scatterers than others. The complementarity of IR and Raman spectroscopy is based on the fact that, while highly polar bonds (e.g. O–H) are very strong in IR spectra, polarisable moieties are strong in Raman spectroscopy [103]. This is particularly the case for π-conjugated (unsaturated) bonds, and thus the sharp phenylalanine vibration at ~1004 cm⁻¹ is seen prominently in any Raman spectrum of tissue or cells, although it is not the predominant protein residue. As the conjugation length is extended across sequential –C=C– bonds, for example in carotenoids, the polarisability increases, and with it the Raman cross section [104]. As the conjugation increases, so too does the wavelength of the lowest energy optical absorption (HOMO-LUMO), giving colour to the biomolecular constituents of skin (melanin), blood (haemoglobin), and blood serum (carotenoids). When the wavelength of the source laser matches, or is in close proximity to, the absorption of an analyte, the Raman scattering is resonantly enhanced, by up to several orders of magnitude. Thus, for example, residual blood traces have been seen to dominate the 532 nm Raman analysis of cervical smear samples [105], and carotenoids that of blood serum samples [37,102,106]. Linear conjugated chains have strong electron-vibrational coupling, and therefore do not fluoresce strongly, and so such resonance enhancement can be used to selectively enhance the Raman response of the analyte of interest, or alternatively longer wavelengths can be chosen to avoid the dominance of analytes which are not of interest.
If the spectrum of an unwanted interferent is known, it can be "digitally" removed, as previously demonstrated for wax in tissue [107,108], glass substrate contributions [68], and carotenoid signals in blood serum [37]. Analysis of carotenoid content in serum may, however, be of interest for analysis of, for example, retinal health, characterisable by the carotenoids lutein and zeaxanthin [109,110].
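As a simplified illustration of such "digital" removal, the sketch below fits a measured spectrum as a weighted sum of two known reference spectra by ordinary least squares and subtracts the fitted interferent contribution. This is a reduced stand-in for the adapted EMSC protocol mentioned in the text, which additionally models baseline and scaling terms; the function names and the five-point "spectra" are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length spectra."""
    return sum(a * b for a, b in zip(u, v))

def remove_interferent(measured, target_ref, interferent_ref):
    """Least-squares fit measured = a*target_ref + b*interferent_ref,
    then subtract b*interferent_ref. Returns (corrected spectrum, b)."""
    a11 = dot(target_ref, target_ref)
    a12 = dot(target_ref, interferent_ref)
    a22 = dot(interferent_ref, interferent_ref)
    b1 = dot(target_ref, measured)
    b2 = dot(interferent_ref, measured)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("reference spectra are collinear; weights are not identifiable")
    # Cramer's rule for the 2x2 normal equations; only the interferent weight is needed.
    weight_interferent = (a11 * b2 - a12 * b1) / det
    corrected = [m - weight_interferent * r
                 for m, r in zip(measured, interferent_ref)]
    return corrected, weight_interferent

# Hypothetical reference spectra and a synthetic mixture of the two.
analyte_ref = [0.0, 1.0, 2.0, 1.0, 0.0]
carotenoid_ref = [3.0, 1.0, 0.0, 0.0, 1.0]
mixture = [2.0 * a + 0.5 * c for a, c in zip(analyte_ref, carotenoid_ref)]

corrected, carotenoid_weight = remove_interferent(mixture, analyte_ref, carotenoid_ref)
```

Fitting both references jointly, rather than projecting onto the interferent alone, avoids removing the part of the analyte signal that overlaps with the interferent bands. The fitted interferent weight is also the quantity that would be used if the interferent content itself were of interest.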
Proximity of the laser source to the absorption resonance of an analyte can, however, be a double edged sword. Without considering any potential photochemical effects, strong absorption of an analyte present in high concentrations can result in a depletion of the laser intensity before it comes to a focus, and a reduced Raman signal, which can also be reabsorbed by the analyte, resulting in a loss of the (quantitative) linear relationship between the measured Raman signal and the analyte [111]. Excited chromophores can also re-emit light as fluorescence or phosphorescence, which can add significantly to the Raman background, at times swamping the signal. For wavelengths above ~500 nm, no such chromophores exist amongst the endogenous constituents of human serum, and no strong fluorescence is observable.
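The loss of linearity described above can be illustrated with a deliberately simple model, in which both the incoming laser light and the outgoing Raman light are attenuated according to the Beer-Lambert law over a fixed path length to the focus. This is a sketch under stated assumptions (single path length, no focusing geometry, no fluorescence), and the molar absorptivities and concentrations are invented for illustration.

```python
def detected_raman_signal(conc_molar, eps_laser, eps_raman, path_cm, scale=1.0):
    """Raman signal proportional to concentration, attenuated by absorption
    of the laser (eps_laser) and of the scattered light (eps_raman).
    Absorptivities are in L mol^-1 cm^-1; this model is an assumption."""
    transmitted = 10 ** (-(eps_laser + eps_raman) * conc_molar * path_cm)
    return scale * conc_molar * transmitted

# Doubling the concentration of a non-absorbing versus a strongly absorbing analyte.
transparent = [detected_raman_signal(c, 0.0, 0.0, 0.1) for c in (0.001, 0.002)]
absorbing = [detected_raman_signal(c, 5000.0, 3000.0, 0.1) for c in (0.001, 0.002)]
```

For the non-absorbing case the signal doubles with concentration, whereas in the absorbing case it increases by less than a factor of two (and, for these invented values, actually decreases), which is the departure from linearity that would compromise a regression model.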
For applications of Raman spectroscopy for quantification of exogenous agents, however, judicious choice of wavelength may be appropriate. In the case of viremia, the virus itself is not unambiguously identified, although it has been demonstrated that viral load can be quantified on the basis of the changes to the spectral profile of the patient serum. It may therefore be that the technique is sensitive to the effect, rather than the root cause. Choice of wavelength is therefore subject to the same considerations as analysis of endogenous serum constituents. However, an important consideration for the analysis protocol in this case is the regression algorithm. In cell cultures exposed to the drug cisplatin, it was demonstrated that PLSR using targets of the drug concentration gave different results to regression against the (sigmoidal) resultant cell viability, interpretable as the effects of the initial chemical binding of the drug in the nucleus, and the subsequent cellular response, respectively [112,113]. In its simplest form, PLSR assumes a linear relationship between regression target and spectroscopic response, but in monitoring physiological effects in human serum as a result of exposure to exogenous agents, more sophisticated data mining methodologies may be required [114,115].
In the case of TDM, choice of wavelength can be dependent on the drug in question. The LOD and LOQ of an analyte in aqueous solution are dependent on the baseline signal of the "blank", with no analyte, and the strength of the Raman signal of the analyte. Molecules such as doxorubicin are highly absorbing and fluorescent at 532 nm, and so should be measured at longer, off-resonant wavelengths such as 785 nm [116,117], whereas shorter wavelengths are appropriate for other chemotherapeutic agents such as busulfan and MTX [38], and the wavelength could be chosen to decrease the LOD to clinically applicable ranges.
Surface enhanced Raman spectroscopy (SERS) is often cited as a route towards significantly improved quantitative analysis of low molecular weight analytes. Bonifacio et al. have examined in detail the applications of colloidal SERS to analysis of human plasma and serum, and have concluded that, due to protein adsorption on the colloid surface, which prevents the formation of hotspots, intense and repeatable spectra are obtained only if proteins are filtered out from samples [118]. Nevertheless, the technique may be appropriate for highly sensitive detection and quantification of small blood borne molecules, and its potential role in TDM was reviewed in 2016 [39]. Using commercial SERS substrates, quantification of imatinib in blood plasma over the concentration range of 123-5000 ng/mL has been demonstrated [119]. However, although SERS has demonstrated promising results in detecting drugs at low concentrations in biological matrices [40,90], qualitative variations within the SERS substrate, and interference of other biomolecules with the spectra, make quantification in clinical samples a challenging task [39].
Note, however, that for the study of busulfan and MTX, the values of LOD and LOQ calculated are within therapeutically relevant ranges, encouraging for the potential for clinical translation [38]. Note also, that the fundamental rationale for TDM is to monitor the pharmacokinetics of the drug, due to absorption, distribution, metabolism, and excretion, the rate of which can vary significantly from individual to individual. Monitoring the rates of change of the serum levels of a drug, rather than the lowest depleted levels, may therefore be sufficient for effective TDM [120].
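To make the idea of monitoring rates of change concrete, the sketch below estimates an elimination rate constant and half-life from two timed serum concentrations. It assumes simple first-order (single compartment) elimination, which is an illustrative assumption introduced here and not a model taken from the cited work; the concentrations and sampling times are hypothetical.

```python
import math

def elimination_rate(conc_early, conc_late, hours_between):
    """First-order elimination rate constant (per hour) from two timed samples.
    Assumes C(t) = C0 * exp(-k * t); this kinetic model is an assumption."""
    if conc_early <= 0 or conc_late <= 0 or hours_between <= 0:
        raise ValueError("concentrations and time interval must be positive")
    return math.log(conc_early / conc_late) / hours_between

def half_life(rate_per_hour):
    """Elimination half-life (hours) for a first-order process."""
    return math.log(2.0) / rate_per_hour

# Hypothetical measurements: 8 units at the first sample, 2 units four hours later.
k = elimination_rate(8.0, 2.0, 4.0)
t_half = half_life(k)
```

Because the rate constant depends only on the ratio of the two concentrations, such an estimate can be made from measurements taken well above the LOQ, rather than at the lowest depleted levels.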
In the case of proof-of-concept Raman spectroscopic analyses, the majority have been performed using high specification, relatively expensive, research grade, microscope based instruments. In terms of initial capital costs, and field deployability, this makes the approach of Raman potentially less attractive than ATR-FTIR. Nevertheless, more compact, less expensive, but similarly high specification systems are becoming increasingly available, which would reduce the initial capital cost from hundreds, to less than a hundred, thousand Euro. Furthermore, for measurement in liquid form, the microscope serves only as a light delivery and collection vehicle, and significantly lower cost fiber based and "portable" options may be viable alternatives, reducing capital costs to ~€10–20k, comparable to those of compact ATR-FTIR instruments. Indeed, fiber based [121] and on-chip waveguide [122] systems potentially offer routes towards further enhancement of signal to noise ratios, and measurement sensitivity. The techniques of IR and Raman spectroscopy, in different manifestations, may therefore be equally viable candidates for clinical translation in terms of initial capital outlay. Both are bench top techniques, requiring no specific services in terms of voltage supply or cooling, each entails similarly minimal sample preparation, and, with an appropriately designed user interface, each could be operated by a non-specialist clinical technician, or general practitioner. Aspects of automation and increased throughput are, to a greater or lesser extent, similarly applicable to both techniques [52,67,102], although the ability to measure in an aqueous environment, without the need for drying, may be significant in terms of clinical work-flow [70]. It should also be noted that applications of such techniques are not limited to human serum, or even to human biofluids, and may have potential in other fields such as veterinary care [123].

Conclusions and future perspectives
Numerous studies have demonstrated the potential for vibrational spectroscopy to quantitatively identify and monitor changes in concentrations of both endogenous constituents of human blood serum, and exogenous agents, such as viruses and drugs, as well as their effects, in both cases, within clinically relevant ranges. Because it lends itself more naturally to measurement in the native aqueous environment, Raman spectroscopy is a more attractive candidate for such clinical serological applications. As is the case for many established clinical methodologies, serum fractionation significantly enhances the sensitivity of the measurement, but independent measurement of high and low molecular weight fractions yields datasets which can be screened for multiple analytes simultaneously, in a label free manner, making it a potentially low-cost, rapid alternative to currently established clinical methods.
Many of the studies to date have been proof of concept, however, and there has been little concerted, systematic optimisation of experimental sampling conditions. Sample preprocessing, Raman source wavelength and data analysis are all factors which could be optimised to fine tune performance. Patient cohort numbers have been relatively low, and to date the influence of potential confounding factors such as diet and medication has not been considered.
Nevertheless, the multiple proof-of-concept studies demonstrate the potential for clinical translation of quantitative serum analysis using vibrational spectroscopy, warranting larger, clinical scale trials, and in-depth assessment of the associated clinical workflow, and health economics.

Declaration of Competing Interest
The authors confirm no competing interests in relation to the manuscript.