Automation of the Peak Fitting Method in Bone FTIR Microspectroscopy Spectrum Analysis: Human and Mice Bone Study

FTIR microspectroscopy (FTIRM) is a commonly used nondestructive method to characterise thin bone sections. However, spectrum analysis methods are often highly sensitive to small variations (e.g., boundary limits), thus implying a time-consuming and redundant analysis process. To solve this issue, software has been developed based on several algorithms to automate the analysis. Furthermore, a rigorous framework has been established concerning the peak fittingmethod to obtain the systematic best potential solution. Validation of the automatic method has been performed by comparison with the manual method. Results and validation proved the reliability of the automatic process.+e developed algorithms provide the means necessary to fully compare the results between bone FTIRM studies and between different laboratories.


Introduction
Bone analysis methods are quite varied [1][2][3], ranging from the noninvasive, e.g., X-ray scanning, to the destructive, e.g., mechanical rupture tests. FTIR microspectroscopy (FTIRM) is a promising field which characterises bone samples, as it is able to provide information on the chemical composition and structural organisation of the analysed matter without damaging it on a local level of a thin section [4][5][6][7][8][9].
is method combines optic, quantum mechanics, chemistry, and biology knowledge, which leads to a complex understanding that might have an impact on the conclusion of an FTIR study [10]. In addition, the spectrum analysis is laborious if done "manually." Several different methods are currently used to analyse FTIRM spectra (peak fitting method [11][12][13], second derivative method [13,14], etc.), and each has its advantages and limitations; moreover, none are considered to be a gold standard. We chose to stick with the peak fitting method used in [11,12]. e second derivative method, although quite promising due to its objective and purely mathematical analysis, is too sensitive to noise (producing peaks that are not there initially) and quite inadequate in the case of peaks that are too close and with different full width at half maximum (FWHM) (the wider peak is then masked by the thinner one). is is why the second derivative method is not chosen (for more explanation, cf. Appendix C).
Traditionally, the peak fitting of a bone spectrum is done by selecting a region of interest (ROI) in the bone spectrum, followed by fitting the curve with a sum of Gaussians. Previously, this fitting was done by assuming both position and characteristics of the peaks (height and FWHM), giving a starting state for the least-squares fitting method. e result of the fitting process was then left to the operator for visual verification. As one can easily imagine, this technique proved to be fastidious, as correct results were typically obtained through several tries. Indeed, the regression may sometimes be highly sensitive and be executed multiple times for one ROI in order to find the correct fitting result.
In a bone spectrum (Figure 1), six distinct regions are identified as the vibrational response of different constituents: (i) e amide (I, II, and III) areas (1300, 1700) cm −1 corresponding to the response of the organic matrix in the bone tissue (with also a ] 3 CO 3 contribution at around 1400 cm −1 ) (ii) e ] 1 ] 3 PO 4 area (900, 1200) cm −1 corresponding to the symmetric and antisymmetric stretching of the phosphates (iii) e ] 2 CO 3 area (800, 900) cm −1 corresponding to the symmetrical stretching of the carbonates (CO 3 ), which also contains a large HPO 4 contribution, which is removed before analysis by using a wisely placed baseline correction (iv) e ] 4 PO 4 area (500, 650) cm −1 corresponding to the antisymmetric deformation of the phosphates (PO 4 ).
We automated the methods used in [9,11,14,15], which are biologically relevant methods (not meant to be spectroscopically accurate), to obtain the five parameters within these regions to characterise the sample: Directly measured on the spectra are as follows:  [5,11].
e manual peak fitting method might have induced subjectivity and imprecision, as there were no strict means to analyse bone spectra. Also, these methods are not always spectroscopically accurate (i.e., there are not always the right number of peaks corresponding to the exact composition of the studied material); however, these methods have been validated and are sufficient to conclude the proportion of compounds in the analysed bone. ese methods might be improved by the careful study and translation of the analysis part into an algorithm implemented into custom software. e aim of this technical note was to offer a solution to the automation process of the peak fitting method used in FTIR spectrum analysis of the bone. Software, i.e., LYOS spectrum analysis, written in Python, has been developed to fully automate the analysis process of healthy human and thin mouse bone section's infrared spectra, even though the software can be used to analyse other types of spectrum.

Materials and Methods
e validation process has been done using the following: ( After dissection, samples were fixed in 70% ethanol for two weeks, dehydrated for 48 h in 100% ethanol, substituted for 48 h in methylcyclohexane (solvent that removes lipids in the samples), and then embedded in methyl methacrylate (MMA). e sections of 2 µm thickness were obtained with a microtome Polycut E (Reichert-Jung, Leica, Germany) and stored between two slides at room temperature. e raw spectra were collected at 1 cm −1 resolution and averaged by 40 scans per scanned region in transmission mode with a Perkin-Elmer GXII AutoIMAGE microscope (Norwalk, CT, USA) equipped with a wide-band detector (mercury-cadmium-telluride) (7800-400 cm −1 ). e instrument used a Cassegrain objective of numerical aperture 0.6. e system has a spatial resolution of 10 µm at typical midinfrared wavelengths. Contribution from air was subtracted from the original spectrum. e commercial software Spectrum (v6.2.0.0055) provided by Perkin-Elmer and Grams/AI 8.0 Spectroscopy Software provided by the ermo Electron Corporation offer various preprocessing options and customisation for the automatic peak fitting process. e mandatory methods used in these software have been implemented and improved for automation in the custom Python software and are described below, such as a transformation from transmittance to absorbance values, a spectrum component removal (to extract the components of interest from a raw spectrum), and a baseline correction method (to remove the high-scale effects of a bone FTIRM measure).

Transmittance to
Absorbance. e FTIRM spectrum of a thin section of the bone is obtained in transmittance. However, the spectrum analysis is conventionally done on an absorbance spectrum. e transmittance spectrum (T, in percent) is then transformed into its absorbance (A) form by the following function: (1)

Spectrum Component Removal.
in bone sections could not be realised without an embedding medium (methyl methacrylate, MMA).
is medium creates a potential infrared-reactive component in the spectrum that must be removed in order to obtain the bone-only spectrum.
is component presents a high absorption peak around 1730 cm −1 (Figure 2). e first step is to scan the MMA to have its pure spectrum in the same range as the bone spectrum (450, 4000) cm −1 .
en, a custom algorithm of optimisation subtracts the MMA spectrum from the bone spectrum (cf. Appendix A). is process is done by selecting the boundaries of the highest MMA peak (1826, 1706) cm −1 between which is computed with the second derivative of the bone spectrum (corresponding to the curvature of the spectrum). An iterative loop consecutively guesses a coefficient by which the whole MMA spectrum is multiplied, and the subtraction result is evaluated on the convexity of the shape of the spectrum. e convexity of a curve is mathematically defined by the positivity of the second derivative. e best result is the one that has the minimum negative part. Each loop will check the negative part of the second derivative of the bone-embedded spectrum and try to minimise it by subtracting the weighted MMA-only spectrum ( Figure 3). e algorithm can be summed up as follows: Find y i which f(y i ) is minimum with y i : the spectrum subtracted by the ith weighted MMA-only spectrum

Baseline Correction.
e baseline of the bone in the (450, 4000) cm −1 is convex. e purpose is to realign this line horizontally. A multilinear baseline is chosen as an approximation of the effects of the interferences (beam spreading, sample heating, etc). Based on the convexity hypothesis, the algorithm used to find the baseline will first detect the local minima using a Savitzky-Golay smoothing [16,17] (keeping intensity and position of the local extrema of the raw spectrum) and draw the segments between two minima containing the most minima above it while forbidding the presence of minima beneath it ( Figure 4).

Peak Fitting.
e peak fitting method is one of the methods used in the conventional FTIRM spectrum analysis [18] (least-squares method with trust region reflective algorithm to find the best suited parameters). e curve fitting function used is curve_fit from the scipy.optimize module version 1.0.0. Usually, only a first guess state for each Gaussian peak is given for the manual method, allowing the peaks to place everywhere with any characteristics (position, FWHM, and height).
is first guess-only method often induces inaccurate results and forces the operator to execute the method multiple times in order to find a valid fit. e automation process is implemented by adding boundaries to find each peak's parameter. ese boundaries are carefully chosen for the solution to be unique (restricting the plurality of solutions induced by the manual method with first state guessing only). Ideally, boundaries should be chosen considering the characteristics of each sample (temperature, internal constraints, crystal properties, etc.). However, which characteristic and how each influences a peak's parameter cannot be answered precisely with our current knowledge.
us, a more empirical approach has been chosen to determine the boundaries of each peak. Based on a set of more than a thousand spectra considered to be well peak fitted, some schemes have been observed. Eight peaks were found in the amide area, five peaks in the ] 1 ] 3 PO 4 area, and four peaks in the ] 4 PO 4 area. Boundaries for each peak's parameters have been established and are registered in Table 1. For each area, the first guess of each peak located at a specific wave number is set at 90% of the absorbance in height and at 5 cm −1 in FWHM. e ranges for the peak fitting of each area are chosen to give mathematically correct results. ey are related to the biologically relevant range given in the introduction part but generally need to be wider, as mathematical information is present beyond the biologically relevant range (for example, the amide area range is (1300, 1700) cm −1 and the mathematical range used is (1300, 1787) cm −1 , the same for the ] 1 ] 3 PO 4 and ] 4 PO 4 ). e algorithm can be summed up as follows: 2 and G j : amplitude of the jth Gaussian, x 0j : position of the jth Gaussian, of the jth Gaussian with respect to the constraints to each Gaussian's parameters explicited in Table 1. e area of interest inside the amide part of the spectrum lies around the (1600, 1730) cm −1 boundary ( Figure 5). Even if (1300, 1600) cm −1 is not used in the measured parameters, it was kept to accurately fit the (1300, 1730) cm −1 area. e main peaks are the dashed-blue peak (∼1690 cm −1 ) and the dashedpurple peak (∼1660 cm −1 ). e 1660 cm −1 peak is highly influenced by the 1633 cm −1 (dashed orange), which is as well influenced by the 1600 cm −1 peak (full green peak).
In bone, the 1633 cm −1 peak must be higher than the local minima at 1600 cm −1 , and the height of the 1600 cm −1 peak greatly influences the height of the 1633 cm −1 . All the parameters are constrained with static values (unchanged regarding the spectrum) (cf. Table 1). It may happen that the MMA peak is not correctly removed due to small differences of the shape of the MMA peak between the pure MMA spectrum and the MMA component in the bone + MMA spectrum. In order to minimise the implications of the residual MMA peak, an additional peak is included for the peak fitting method, which is located at 1745 cm −1 .
is is an automatic process realised when the algorithm detects a significant peak between 1730 cm −1 and 1760 cm −1 (i.e., when the positive area of the derivate between those values is superior to the half of the negative area of the derivate).

e ] 1 ] 3 PO 4 Area (910, 1184) cm −1 .
e two peaks of interest here are the 1030 and 1110 cm −1 peak (full purple and green) ( Figure 6). ese two peaks are mainly influenced by the position of the peaks between 1060 and 1070 cm −1 . Every parameter for this area determination is a static value (cf.  (Figure 7). is peak is mainly influenced by the 575 cm −1 peak. In bone, the 575 cm −1 peak's height must be inferior to the 565 cm −1 peak to avoid the 575 cm −1 peak to spread too much on the 604 cm −1 peak area. Every parameter for this area is a static value (cf. Table 1).

Statistical Analysis.
e validation of the spectrum analysis methods has been done using the root mean square of the coefficient of variation to assess the variation in manual methods, with the Bland-Altman method to compare manual and automatic results. Note. An iteration is not always better than the previous one (e.g., iterations 1 and 2), but it tends nonetheless to converge to the expected result (e.g., iteration 22), which is the convexity of the spectrum between the 1706 cm −1 and 1826 cm −1 boundaries. e final result of the MMA removal can be observed in Figure 1 (the MMA contribution, including the 1730 cm −1 peak, in the bone spectrum is removed).

3.1.
Validation. e first step of the validation process is done by analysing the precision of the manual peak fitting method. Two types of precision are measured: (1) the intraoperator precision that measures the ability of one operator to obtain the same result multiple times on the same spectrum and (2) the interoperator precision that measures the ability to get the same result from two different operators on the same spectrum. e first operator (operator 1) is experienced with the bone FTIRM peak fitting, and is considered the reference. e second (operator 2) is an inexperienced operator who has been given the instructions to execute the peak fitting analysis.
Five different bone spectra have been peak fitted with the manual method two times by each operator (cf. Table 2). For the manual peak fitting, the root mean square of the coefficient of variation shows that results are operator-dependant. en, we compared both automatic and manual methods based on the spectra extracted from the human samples and    Journal of Spectroscopy mouse sample. e Bland-Altman graph plots the means and the differences between automatic and manual methods measured by a human operator. For the human bones, Figure 8 shows, at first sight, an almost null shift for every parameter. e ratio of valid results (i.e., results between the mean ± 1.96 * SD) is 92% for the Crystallinity Index, 93% for the Mineral Maturity, and 92% for the Collagen Maturity. For the mice tibia (Figure 9), the shift is also almost null for every parameter. e ratio of valid results under the aforementioned hypothesis is 90% of valid results for the Crystallinity Index, 100% for the Mineral Maturity, and 100% for the Collagen Maturity.

Discussion.
Gaussians have been long used to establish methods and peaks of interests in human bone, meaning changing the fitting function would end in a change of the peaks of interest. us, to maintain consistency with previous studies, Gaussians are kept for the fitting method. However, it should be noted that Lorentzians or Voigt functions would be more accurate regarding the quantum reality of the spectra [19]. e development of the automatic method not only allowed a significant gain of time but also suppressed the interoperator and intraoperator variability.
Currently, there is no gold standard to check the exactitude of the results obtained. After careful examination of the automatic peak fitting of these spectra, it has been found that the results obtained with the automatic method are valid compared with the analysis done by the expert using the manual method. e observed shifts are mainly due to the choice of boundaries within which the peak fitting method is applied. Indeed, there are no strict boundaries required for the manual peak fitting of the ] 4 PO 4 area. is induces the variation of the shift, as shown in Appendix B. is problem is solved by the automatic method that fixes those boundaries, chosen after careful analysis of several sets of boundaries.
We could argue that it is not the most exact method in terms of localisation of the peaks, compared with the manual method (which is not a gold standard method and can be different from one laboratory to another). However, the main advantage provided by the automatic peak fitting is the absence of intraoperator and interoperator variation and reproducibility. e second advantage of the automatic process is the normalisation of its preprocessing methods (MMA removal and baseline subtraction), removing the influence a human operator has in the spectrum preparation. Finally, the automatic method reduces drastically the time needed to analyse a spectrum (10 s is required for the automatic process, while the manual process takes 5 to 10 min per spectra).

Conclusions
e algorithms described in this paper offer the possibility to analyse quickly an infrared spectrum of the thin bone section.  is automatic method will allow the analysis of large sets of samples, which confer robust results.
By using machine learning methods on large data of deconvolved FTIRM spectra, the automatic method could be improved. is could bring specificity of peak fitting parameters regarding the type of the sample analysed, thus improving the exactitude and reliability of the results. e software will be available under GNU GPL license.

A. Simple Algorithm of Spectrum Component Removal
Variables used in the algorithm ( Figure 10): OrigSpectrumData: raw MMA-embedded bone spectrum. RemSpectrumData: MMA-only spectrum.  Table 2: Differences of the manual peak fitting analysis intraoperator, interoperator, maximum absolute error, and variation of results using the automatic method for each parameter, based on the analysis of 5 bone spectra. Operator 1 is an experienced operator, and operator 2 is an inexperienced operator to whom has been given the instructions to analyse the spectra. Journal of Spectroscopy 7 FinalSpectrumData: bone spectrum (removed from the MMA component).
MainPeakBoundaryInf: inferior boundary of the position of the main MMA peak.
MainPeakBoundarySup: superior boundary of the position of the main MMA peak.
Coef: coefficient used to weight the MMA-only spectrum. CoefSaved: the current best coefficient used to weight the MMA-only spectrum.
Delta: increment for the coefficient. Area: area of the negative part of the second derivative of OrigSpectrumData − RemSpectrumData * coef.
AreaSaved: current minimum area. Precision: maximum precision for the coefficient.
MaxLoop: maximum number of loop. n: current number of iteration.

B. An Example of the Choice of the Region of Interest Boundaries for the Same Spectrum for the Crystallinity Index
In Figure 11(a), if we choose the (536, 634) cm −1 boundaries, compared to the (531, 659) cm −1 (Figure 11

C. Searching the Position of Underlying Peaks in a Spectrum: Commentary on the Method of Second Derivative
e purpose of this study is to balance advantage/ disadvantage of the method using the second derivative for the analysis of spectra. e second derivative of the Gaussian function (d 2 /dx 2 (a * e −(x−b) 2 /2c 2 )) gives a peak located at the same abscissa for the maximal amplitude, but its pseudofull width at half maximum (FWHM) is around 0.53ω0 (ω0 is the FWHM of the original Gaussian).
e main advantage is that peaks are well discerned using the second derivate compared to the original Gaussian, and then, this value (0.53 ω0) permits to better discriminate peaks in spectra. An example is given in Figure 12. Taking a Gaussian (blue) with FWHM � 1, after calculation of the second derivative of this function (green), we can see that the peaks are located at the same abscissa, and the second  Figure 10 8 Journal of Spectroscopy derivative also shows a thinner peak ( Figure 12, an example of a Gaussian with its FWHM � 1 and, for visualization, its second derivative normalized by the maximum of the Gaussian). However, the second derivative brings up two major disadvantages: e emergence of two perturbations from either side of the main peak (cf. the 2 small green peaks around the main one). e maximal amplitude of the second derivative (A″) depends not only on the maximal amplitude of the original Gaussian A, but also of its FWHM (A″ ≈ 2.77 * A/ω0 2 ). In other words, thinner peaks in the original spectrum will be more present and visible in the second derivative than wider peaks. In some cases, it may happen that the wider peak in the original spectrum will be hidden (in the second derivative) by the thinner one ( Figure 13).
If we use more complex spectra, the discrimination of peaks becomes more difficult. An example is given in Figure 14. As previously mentioned, the 5 original Gaussian curves (pink) are summed into one spectrum (blue). e second derivative (green) curve shows only 3 minima, which corresponds to the thinner or isolated original peaks. e position of the peak is also slightly shifted. In an ideal analysis, we should get 5 minima corresponding to the 5 original peaks (in abscissa).
A simple example is given in Figure 13. Two simple original Gaussian curves (dotted pink, vertical dotted line showing the position of the maximal amplitude for each) are summed into one spectrum (plain blue line). e second derivative of this spectrum is shown in green. As we can see, this method does not allow the obtaining of the 2 initial pink peaks. One of the wide pink peaks has been "hidden" after using the second derivative method.
And finally, if we introduce noise and a Savitzky-Golay smoothing (commonly find in the spectra treatment), other peaks start to appear and the position of peaks slightly shifts ( Figure 15).  -20 -10 -5 0 5 10 Figure 13: Sum of two Gaussians (blue), its second derivative (green), and the individual peaks (pink). It shows that only one peak is found. e wider peak is hidden in the right part of the second derivative. Journal of Spectroscopy An example is given in Figure 15. As previously mentioned, the 5 original Gaussian curves (pink) are summed into one spectrum. is curve is degraded by adding noise, and then, a smoothing is applied (Savitzky-Golay smoothing), to correspond to the "classical" spectra we obtain in the bone (blue curve). e second derivative (green) curve shows more minima (9) than the original Gaussian curves. e position of the peak is also slightly shifted.
e method second derivative is applied to our bone sample which is shown in Figure 16.
One of the domains of the bone spectrum of the bone is shown between 900 and 1200 cm −1 (phosphate), with smoothing and baseline correction. e second derivative (green) is calculated, and outcomes enounced previously are visible on the graph. ere are 5 minima. One appears (the first one) on the left and does not correspond to the presence of a real peak, and one disappears around (1100-1050 cm -1 ).
In conclusion, the second derivative method shows great advantages (peak thinning, easier discrimination, and localization of maxima in the case of a simple curve). However, in the case of complex spectra, the "purely mathematical" aspect of this method does not take into account the real composition of the analyzed samples and may give partially false results. In our lab, we decided to use another method: the peak fitting using a fix number of peaks, previously defined in [11,12,15], and based on the composition of the material (bone).

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
ere are no conflicts to declare.  Figure 14, with the introduction of noise, and then, a Savitzky-Golay smoothing is applied.