Wavelength weightings in machine learning for ovine joint tissue differentiation using diffuse reflectance spectroscopy (DRS)

Objective: To investigate the DRS of ovine joint tissue to determine the optimal optical wavelengths for tissue differentiation and to relate these wavelengths to the biomolecular composition of tissues. In this study, we combine machine learning with DRS for tissue classification and then examine the weighting matrix of the classifier to better understand the key differentiating features. Methods: Supervised machine learning was used to analyse DRS data. After normalising the data, dimension reduction was achieved through multiclass Fisher’s linear discriminant analysis (Multiclass FLDA), and the data were classified with linear discriminant analysis (LDA). The classifier was first run with all the tissue types and the wavelength range 190 nm – 1081 nm. We analysed the weighting matrix of the classifier and then ran the classifier twice more, first using the ten highest weighted wavelengths and then using only the single highest. Our method was applied to a dataset containing ovine joint tissue including cartilage, cortical and subchondral bone, fat, ligament, meniscus, and muscle. Results: The method achieved a classification accuracy of 100% using the wavelength range 190 nm – 1081 nm (2048 attributes), and an accuracy of 90% using 10 attributes, with the exception of tissue pairs with comparable compositions such as ligament and meniscus. An accuracy greater than 70% was achieved using a single wavelength, with the same exceptions. Conclusion: Multiclass FLDA combined with LDA is a viable technique for tissue identification from DRS data. The majority of differentiating features existed within the wavelength ranges 370-470 and 800-1010 nm. Focusing on key spectral regions means that a spectrometer with a narrower range can potentially be used, with less computational power needed for subsequent analysis.

The DRS technique is in its infancy in clinical use. Because it is non-invasive, it has been integrated into fibre optic probes and incorporated into an orthopaedic device for real time guidance. Robotic and laser surgery in orthopaedics is still evolving and seeks to provide more precise surgery, new surgical techniques and the ability to maintain a high level of aseptic sterility intraoperatively [25,26]. Laser surgery does not provide the tactile feedback [27] that surgeons use to determine the type of tissue being manipulated. This can lead to iatrogenic damage [28][29][30]. Real time DRS sensing in robotic surgery systems may allow a non-contact approach to tissue differentiation. Recent studies have shown its safety, for example in the accurate placement of screws into critical tissue in open and minimally invasive spinal surgery, where it can identify the transition zone from cancellous to cortical vertebral bone [31]. Raman spectroscopy combined with machine learning has also been used for tissue identification [32]. However, DRS has less demanding hardware requirements: Raman spectroscopy requires a laser and a spectrometer with a much higher wavelength resolution than DRS does. The collection time for Raman spectroscopy may also be too slow for real-time feedback. In a clinical setting DRS would therefore be easier to implement.
There is limited data on the various spectral bands used in DRS to differentiate joint tissue encountered during orthopaedic surgery. Hence the objective of this study is to provide validated DRS data that could help pave the path for autonomous robotic surgery in the future. Our novel study uses ovine joint tissue, as similar structures are found in human joints. We then utilised machine learning for tissue classification. Machine learning is a growing field in medicine, and recent studies have looked at quantitative mapping of vital signs of tissues using random forest [33,34] and linear discriminant analysis (LDA) [7], and at quantitative characterization of human skin using random forest [35]. Various techniques have been used successfully in tissue differentiation, including support vector machine (SVM) [31] and unsupervised principal component analysis (PCA) [36]. Our study builds on this work and uses supervised LDA [37] and Multiclass Fisher's Linear Discriminant Analysis (Multiclass FLDA) [37].
In supervised techniques, such as LDA and Multiclass FLDA, the class labels are taken into consideration. Multiclass FLDA and LDA are used to transform the features into a lower dimensional space. This maximizes the ratio of the between-class variance to the within-class variance, thereby maximising class separability. This allows for direct classification without the need for an additional classifier, such as SVM, and improves our ability to interpret and verify the behaviour of the system. We then look deeper at the weighting matrix of the classifier. This gives us insight into the salient wavelengths. This complete DRS analysis of joint tissue and insight into the salient wavelengths has not been done in prior studies and paves a path for real time, interpretable tissue identification during surgery. The motivation for reducing the number of wavelengths is that it could allow for the use of spectrometers and hyperspectral cameras with lower wavelength ranges and could reduce the complexity and computational power needed for analysis.
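For reference, the variance-ratio criterion described above is commonly written as follows. This is the standard textbook form of the multiclass Fisher criterion rather than a formula taken from this study; the symbols are notation introduced here for illustration: $x_i$ is a normalised spectrum, $\mathcal{D}_c$ the set of spectra labelled with tissue class $c$, $N_c$ its size, $\mu_c$ its mean spectrum, $\mu$ the overall mean spectrum, $C$ the number of classes and $W$ the projection matrix.

```latex
\[
S_B = \sum_{c=1}^{C} N_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
S_W = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{D}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}
\]
\[
J(W) = \frac{\det\!\left(W^{\top} S_B W\right)}{\det\!\left(W^{\top} S_W W\right)}
\]
```

In this form, the columns of the maximising $W$ are the leading eigenvectors of $S_W^{-1} S_B$. Because $S_B$ has rank at most $C - 1$, seven tissue classes give at most six discriminant directions, and the entries of each direction act as per-wavelength weights.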

Data collection
Joint tissue samples were collected and analysed from multiple ovine knee-joints ex vivo (outside the body). The tissue types collected were meniscus, cortical bone, subchondral bone, muscle, fat, cartilage and ligament, all commonly found in both ovine and human knee-joints. Samples varied in size from 1 cm x 1 cm to 6 cm x 3 cm. 300 measurements of optical spectra were taken from each of these samples except cortical and subchondral bone, where there were enough samples to take 400 measurements. The samples analysed were fresh and at room temperature. The Ocean Optics USB-650 Red Tide spectrometer [38] with a custom reflectance illuminator was used because, as a compact instrument, it collects data that are more representative of what is possible in theatre (see Fig. 2). The Red Tide spectrometer has a maximum signal-to-noise ratio (SNR) of 250:1, and a wavelength range of 200-1080 nm with a resolution of ∼2 nm. The light source used was a 150 W fibre-coupled halogen lamp, connected to a light ring to achieve an even spread across the tissue sample. The distance between the fibre and the first lens was 20 mm, while the distance between the lenses was 85 mm. This created a 1 mm spot size at a focal length of 30 mm. The optics bench was handheld, and the bottom surface was placed directly in contact with the tissue. This ensured that the samples were always at the correct focal distance from the sensor. The optics bench was slowly moved manually over the tissue sample as the spectrometer recorded at approximately 30 spectra per second. The types of tissue recorded can be seen in Fig. 1. More spectra were recorded for cortical and subchondral tissues as these tissue samples were larger and more abundant. Samples were taken from three animals.
Fig. 2. Spectrometer and optics setup used to gather DRS data. Tissue samples were placed underneath the guided plate (9) and illuminated by the light from the halogen lamp (150 W). The reflected light was focussed into the fibre that guided the light to the spectrometer, where the light was analysed. The raw data from the spectrometer were collected and stored on a computer for subsequent machine learning processing to identify tissue. 'A' is a photograph of the actual setup and 'B' is a diagram of the DRS data collection.

Machine learning
All machine learning computations were performed using the Waikato Environment for Knowledge Analysis (WEKA) machine learning toolkit [39] and were based on the normalised spectra. Each spectrum consisted of 2048 wavelength channels in the 200-1080 nm range with a resolution of ∼2 nm. The spectra were normalised and a standard normal variate (SNV) [40] transform was applied to remove variations caused by the light source and to centre and scale them. The system was trained using supervised learning. First, each of the spectra collected was manually labelled with the class of tissue that it was collected from. This manual labelling was performed by clinical orthopaedic surgeons at collection time based on shape, colour, presentation, and location of collection. There were seven tissue classes, corresponding exactly to the tissue types. Dimensionality reduction was achieved through Multiclass Fisher's Linear Discriminant Analysis (Multiclass FLDA) [37]. The final step was running the data through Linear Discriminant Analysis (LDA) [41] with these selected wavelengths as attributes to learn the final model. 10-fold cross-validation was used to determine the classifier accuracy. This involves holding out 10% of the data, training the model on the remaining 90%, then testing on the held-out set. This is repeated 10 times, each time with a different 10% of the data, to yield 10 estimates of the ability of the method to build an accurate model. The classifier was run with the entire wavelength range for each tissue class in order to compare and contrast them. LDA was first used to learn a model with all 2048 wavelengths used as attributes. The resulting weighting matrix from the LDA classifier was inspected to determine, for each pair of tissues, the 10 highest weighted (most informative) wavelengths and the single most informative wavelength. LDA was then re-run twice, once with the top 10 wavelengths and once with the top 1 wavelength, for the purpose of comparison.
This process is shown in Fig. 3.
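The preprocessing and validation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WEKA pipeline used in the study: the function names `snv` and `k_fold_indices` are hypothetical, the sample standard deviation is assumed for the SNV scaling, the toy spectra are invented, and the folds are a plain shuffled split (whether the original folds were stratified by tissue class is not stated).

```python
import math
import random


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation (assumed convention)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    variance = sum((x - mean) ** 2 for x in spectrum) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(x - mean) / std for x in spectrum]


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices, split them into k near-equal folds and
    yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Toy example (invented values): two "spectra" with the same shape that
# differ only by a gain and an offset, as a brighter lamp might cause.
a = [1.0, 2.0, 4.0, 3.0]
b = [2.0 * x + 5.0 for x in a]
snv_a, snv_b = snv(a), snv(b)  # equal up to floating-point rounding

# Ten train/test splits over 100 hypothetical spectra: 90 train, 10 test.
splits = list(k_fold_indices(100, k=10))
```

After SNV the two toy spectra coincide, which is the sense in which the transform removes intensity variations from the light source. Each spectrum appears in exactly one test fold, so the ten accuracy estimates are computed on non-overlapping held-out data.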

Experiments and results
Each spectral dataset consisted of 2048 wavelength channels. Each of these wavelength channels was regarded as an attribute towards the identification of the associated tissue class. An accuracy of 100% (95% confidence interval of 3.2%) was achieved when the full set of data was used in the classification pathway; Table 1 shows the corresponding confusion matrix. The confusion matrix is a standard machine learning tool [42,43] for assessing the results from a classifier. It gives us insight into what the model is getting right and what types of errors it is making. Classification accuracy is the ratio of correct predictions to total predictions made. The correct predictions are True Positive (TP) and True Negative (TN), and the incorrect predictions are False Positive (FP) (identified as belonging to a class when it should not be) and False Negative (FN) (not identified as belonging to a class when it should be). Accuracy is given by
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The salient wavelengths were identified [44,45] based on this data to gain further insight into how this combination of optical spectroscopy and LDA could be optimised for the task of biological tissue classification. We consider salient wavelengths to be the wavelengths that contribute the most to the identification of tissue. There are several algorithms for extracting salient wavelengths from spectroscopic data, such as those introduced by Li et al. [46] and Araujo et al. [47]. We elected to use a rather simple method available through the WEKA toolkit, largely due to performance, with the added benefit of being more accessible and comparable. Salient wavelengths were found for pairs of tissue by carrying out an LDA analysis based only on these two tissue types. From the weighting of the eigenvectors with the largest eigenvalues, the salient wavelengths for the tissue pair were found [37].
The selection did not rely only on the largest difference between class means, because wavelengths with a larger difference in means might also have a much greater variance and thus be less discriminatory; hence both interclass and intraclass variation were taken into consideration. This was carried out for all combinations of tissue types, with the results presented in Tables 2 and 3. Table 2 shows the percentage accuracy of the classification pathway for one class against another using the ten highest weighted wavelengths. Greater than 90% accuracy was achieved using 10 attributes except for cartilage vs cortical, cartilage vs fat, cartilage vs meniscus, fat vs meniscus, and ligament vs meniscus. Table 3 shows the wavelength weighted the highest in the classification pathway and the subsequent accuracy if this wavelength was the only attribute used for classification. The use of a single wavelength achieved an accuracy greater than 70% except for cartilage vs fat, cortical vs fat, and meniscus vs muscle. Figure 4 shows the DRS of ovine joint tissue with key wavelengths from Table 3 highlighted using a bar on the graph. This shows that the majority of differentiating features are within the wavelength ranges 370-470 and 800-1010 nm.
Table 3. Wavelength (nm) weighted the highest and subsequent percentage accuracy if only this wavelength is used for classification.
Fig. 4. The normalised optical spectra between the wavelengths 250-1100 nm for 7 types of ovine joint tissue. DRS data and machine learning features for ovine joint tissue, with the salient wavelengths for tissue differentiation from Table 3 marked with bars (blue and red bars for the short and long wavelength regions respectively).
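The wavelength ranking and the pairwise accuracy reported above can be illustrated with a short Python sketch. It assumes that "highest weighted" means largest absolute discriminant weight, which the text does not state explicitly; the function names `top_k_wavelengths` and `accuracy` are hypothetical, and every number in the example is invented for illustration rather than taken from Tables 2 or 3.

```python
def top_k_wavelengths(wavelengths_nm, weights, k):
    """Return the k wavelengths whose discriminant weights have the
    largest absolute value, strongest first (assumed ranking rule)."""
    if len(wavelengths_nm) != len(weights):
        raise ValueError("one weight is needed per wavelength channel")
    ranked = sorted(
        zip(wavelengths_nm, weights),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return [wavelength for wavelength, _ in ranked[:k]]


def accuracy(confusion):
    """Accuracy from a square confusion matrix (rows = true class,
    columns = predicted class): correct predictions / all predictions.
    For two classes the diagonal holds TP and TN."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical weights for five channels (illustrative values only).
wavelengths_nm = [400.0, 470.0, 650.0, 930.0, 970.0]
weights = [0.80, -0.35, 0.05, -0.60, 0.20]
best_two = top_k_wavelengths(wavelengths_nm, weights, 2)  # [400.0, 930.0]

# Hypothetical two-tissue confusion matrix: 90 + 85 correct out of 200.
pair_accuracy = accuracy([[90, 10], [15, 85]])  # 0.875
```

Ranking by absolute value treats large negative and large positive weights as equally informative, since the sign only indicates which of the two tissues a higher reflectance at that wavelength favours.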

Discussion
Bone is a hierarchically-organized tissue [48] that contains approximately 25% organic matrix, 5% water, and 70% inorganic mineral compound (calcium phosphate compounds, mainly hydroxyapatite). The organic matrix is composed of collagen fibres assembled together with noncollagenous proteins (NCPs) [27]. The ends of long bones are usually covered by cartilage, which is composed of specialised cells (chondrocytes) [49]. These produce an abundant extracellular matrix, very rich in proteoglycans, water and fibres, such as collagen and elastin [50]. Cartilage contains relatively few cells, which occupy 10-20% of its volume. The remainder is extracellular material that is highly hydrated and contains up to 80% water by weight [51]. Ligament and meniscus have very similar biochemical compositions [52,53]. They are both composed of approximately two thirds water by weight. The remaining one third is organic matter, mainly existing as type I collagen.
Studies looking at joint tissue identification and differentiation have identified spectral regions of interest. The wavelengths of 470 nm and 780 nm were used to successfully discriminate between cortical bone and approaching blood-filled muscle tissue when drilling through a bovine femur sample [49]. The reflectance and scattering are similar between the two tissue types. However, in blood-filled soft tissue, the 470 nm wavelength is heavily absorbed by the blood compared to the 780 nm wavelength, resulting in measurable differences. Lipid has been identified as having a characteristic peak centred at approximately 930 nm. A peak at 760 nm can be ascribed to water and haemoglobin (Hb). A broad peak centred at approximately 970 nm is due to water [54]. The bone spectrum appears distinctly 'red' and includes the characteristic absorption peaks of haemoglobin at 542 nm and 576 nm [51].
Our study identified two main regions of the spectra that were utilised in tissue discrimination: the 370-470 nm range and the 800-1000 nm range. Hb, lipid and water have strong absorption peaks in these regions and could be major contributors to the differentiation. Hb has an absorption peak at 400 nm, lipid has an absorption peak at 930 nm [55] and water has an absorption peak at 970 nm [56].
Greater than 90% accuracy was achieved using 10 attributes except for cartilage vs cortical (76%), cartilage vs fat (78%), cartilage vs meniscus (77%), fat vs meniscus (86%), and ligament vs meniscus (75%). Current commercially available robotic systems in orthopaedics are for arthroplasty and for screw placement in spinal surgery [57,58]. All these systems depend on the surgeon to expose and identify tissue. Bony resections in arthroplasty have the following boundaries: cartilage to subchondral bone, and subchondral bone to cortical bone and vice versa, while protecting the nearby ligaments. We achieved >90% accuracy in identifying these boundaries, which opens the possibility of automating certain aspects of surgery. The importance of tissue type identification will vary with the surgery being performed and even with the steps within a procedure. New surgical techniques will evolve as robotic research progresses, and this paper adds to the growing research in this field.
The living tissue of a particular person is subject to variations in blood, water, collagen and fibre content. The variations are significant between people, extraction locations and even extraction timing [1]. Further studies are needed to evaluate human tissue and the influences of these factors on DRS. In addition, studies are needed to evaluate the individual spectra of each tissue composition and better understand the biomolecular composition of joint tissues contributing to tissue differentiation.

Conclusion
Our study has demonstrated key regions in DRS useful in ovine joint tissue differentiation. The majority of differentiating features are within the wavelength ranges 370-470 and 800-1000 nm. Focusing on key spectral regions means that a spectrometer with a narrower range could be used, reducing the associated computational power needed for analysis. This makes the technology more accessible and better suited for real-time applications, such as those required in robotic surgery. Multiclass FLDA combined with LDA is a viable technique for tissue identification from DRS data and, to our knowledge, this is the first study to examine the weighting matrix of the classifier to better understand the key differentiating features.

Disclosures
The authors declare that there are no conflicts of interest related to this article.