Machine learning powered framework for detection of micro- and nanoplastics using optical photothermal infrared spectroscopy

Despite the breadth of scientific literature on micro- and nanoplastics (MNPs), there is still no standardized procedure for detecting MNPs, so results cannot be compared across published studies. This work proposes combining machine learning with advanced optical photothermal infrared (O-PTIR) spectroscopy to develop an efficient and reliable detection framework for MNPs. Spectra of microplastics (MPs) and non-MPs were first collected and used to build a classification model, from which four important wavenumbers were selected. A simplified support vector machine (SVM) model was then developed using only these four wavenumbers. It showed good predictive ability, with a test accuracy of 0.9133. The method can improve both the speed and the reliability of MNP detection and has great potential for routine analysis, ultimately contributing to the standardization of detection methods.


Introduction
The analytical techniques most commonly used for micro- and nanoplastic (MNP) characterization are Fourier-transform infrared (FTIR) spectroscopy and Raman spectroscopy [1]. Plastic particles are identified by their vibrational spectrum, which is characteristic of each polymer type. However, from a spectroscopic perspective, most published studies lack a rigorous spectral analysis. Take the teabag study by Hernandez et al. [2] as an example: the authors presented FTIR spectra of the teabag leachate, assigned them to a plastic type based only on visual spectral similarity, and then counted particles under scanning electron microscopy (SEM) on the assumption that all particles were plastic.
Particle-wise analysis can be realized in two ways. The first approach locates, counts and measures particles based on a visible-light image or a single-wave image, then collects the spectrum of each particle and automatically compares it with a reference spectral library to identify its material type. This approach is fast and has a good detection limit, but it relies heavily on software for particle tracking and polymer identification. Moreover, locating particles from either an optical image or a single-wave image may not be suitable in some cases. The second approach acquires a spectral imaging dataset that contains an individual spectrum for each "pixel". This improves the reliability of the detection result, but it requires sophisticated data analysis, and data collection can be slow, taking from a few hours to days, especially at high spatial resolution (such as a step size of 1 µm).
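The library-matching step of the first approach can be sketched as follows. The text does not say which similarity metric the software uses; this sketch assumes Pearson correlation, a common choice, and uses a hypothetical two-entry library of synthetic Gaussian bands (`polymer_A`, `polymer_B`). The function name `match_library` is illustrative only. Practical workflows typically also require the best score to exceed an acceptance threshold before an identification is accepted.

```python
import numpy as np


def match_library(query: np.ndarray, library: dict) -> tuple:
    """Return (name, score) of the library spectrum most correlated with `query`.

    Assumes every spectrum is sampled on the same wavenumber grid. The score is
    the Pearson correlation coefficient, so it ignores offset and scale and
    compares band shape only (1.0 = identical shape).
    """
    best_name, best_score = None, -np.inf
    for name, reference in library.items():
        score = float(np.corrcoef(query, reference)[0, 1])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical library on a 1801 to 769 cm^-1 grid in 2 cm^-1 steps.
wn = np.arange(1801, 768, -2)


def band(centre, width=10.0):
    """Gaussian absorption band used to fake reference spectra."""
    return np.exp(-0.5 * ((wn - centre) / width) ** 2)


library = {
    "polymer_A": band(1635) + band(1541),
    "polymer_B": band(1720) + band(1100),
}
query = 0.5 * library["polymer_A"] + 0.1  # scaled and offset copy of polymer_A
name, score = match_library(query, library)
print(name, round(score, 3))  # polymer_A 1.0
```

Because correlation is insensitive to scale and baseline offset, the scaled, shifted query still matches `polymer_A` perfectly; real spectra would score below 1.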
To address these analytical challenges, this work aims to develop a novel machine learning (ML) powered detection framework that analyzes plastic particles from the micro- to the nanoscale efficiently and accurately. The study leverages the recently developed optical photothermal infrared (O-PTIR) spectroscopy, which overcomes the diffraction limit of conventional infrared spectroscopy and improves its spatial resolution to the submicrometer scale.
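The resolution gain can be illustrated with the Abbe limit, d = λ/(2NA). In O-PTIR the infrared absorption is read out by a visible probe beam, so the achievable resolution follows the probe wavelength rather than the IR wavelength. The numbers below assume a 532 nm probe and an objective NA of 0.78; both are illustrative assumptions not stated in this work, and NA cancels when the two limits are compared.

```python
def wavenumber_to_wavelength_um(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    return 1e4 / wavenumber_cm


def abbe_limit_um(wavelength_um: float, na: float) -> float:
    """Abbe lateral resolution limit d = lambda / (2 NA)."""
    return wavelength_um / (2 * na)


NA = 0.78         # hypothetical objective numerical aperture
PROBE_UM = 0.532  # assumed visible probe wavelength (532 nm)

probe_limit = abbe_limit_um(PROBE_UM, NA)
for wn in (1801, 769):  # ends of the spectral range used in this work
    ir_limit = abbe_limit_um(wavenumber_to_wavelength_um(wn), NA)
    print(f"{wn} cm^-1: IR limit {ir_limit:.2f} um vs probe limit {probe_limit:.2f} um")
```

Under these assumptions, the IR-limited resolution ranges from about 3.6 to 8.3 µm across the range, while the probe-limited value is about 0.34 µm.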

Sample preparation
Reference bulk nylon plastic pieces (1.6 mm thick, 1.5 cm × 1.5 cm) were used. After the pieces had been immersed in boiling water for four minutes, the leachate was filtered through an aluminium oxide filter (diameter: 25 mm; pore size: 0.2 µm; Anodisc 25, product 6809-6022, Cytiva).

O-PTIR data collection
A customized 3D-printed filter holder was used to hold the filter during O-PTIR scanning. The O-PTIR spectra have a spectral resolution of 2 cm⁻¹ and cover the range from 1801 to 769 cm⁻¹. Single-wave images and optical images can also be obtained using PTIR Studio software (Photothermal Spectroscopy Corp., Santa Barbara, CA, USA).
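The 517 spectral variables mentioned in the results are consistent with sampling this range every 2 cm⁻¹. Assuming that data-point spacing (the text states only the spectral resolution), the wavenumber axis and the column positions of the four wavenumbers selected later can be computed as:

```python
import numpy as np

# Descending wavenumber axis, 1801 to 769 cm^-1 in 2 cm^-1 steps (assumed spacing).
wavenumbers = np.arange(1801, 768, -2)
print(wavenumbers.size)  # 517

# Column index of each selected wavenumber in a (n_spectra, 517) data matrix.
selected = [1077, 1711, 1635, 1541]
idx = [int(np.flatnonzero(wavenumbers == w)[0]) for w in selected]
print(idx)  # [362, 45, 83, 130]
```

All four selected wavenumbers are odd numbers and fall exactly on this grid.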

ML-powered detection framework
As shown in Fig. 1, the proposed framework starts with the collection of 1038 MP and 1052 non-MP spectra. Standard normal variate (SNV) was used to pre-treat the spectra before model development. A partial least squares discriminant analysis (PLS-DA) model was then developed to discriminate between MPs and non-MPs; it produces a regression vector that visualizes the contribution of each spectral variable, from which several important spectral variables can be determined. Using these selected wavenumbers (for example, w1, w2 and w3 in Fig. 1), a simplified support vector machine (SVM) model was developed. In the application scenario, instead of acquiring full spectra, the framework saves a large amount of time by collecting O-PTIR single-wave images only at the selected wavenumbers and feeding these data to the pre-trained SVM model, which labels each pixel as MP or non-MP. The result for a particle is then determined by a majority vote over the labels of all pixels within that particle.
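The pre-treatment and the particle-level decision can be sketched as below. SNV follows its standard definition (each spectrum centred by its own mean and divided by its own standard deviation). The function names, the `(H, W, k)` image-stack layout, the integer particle map with 0 as background, and the tie rule (ties count as non-MP) are assumptions for illustration, since the text does not specify how particles are segmented or how ties are resolved.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def classify_pixels(images: np.ndarray, model) -> np.ndarray:
    """Label every pixel of a stack of single-wave images.

    images: array of shape (H, W, k), one channel per selected wavenumber.
    model: any fitted classifier exposing .predict on an (n_pixels, k) array.
    """
    h, w, k = images.shape
    return model.predict(images.reshape(h * w, k)).reshape(h, w)


def particle_majority_vote(pixel_labels: np.ndarray, particle_ids: np.ndarray) -> dict:
    """Assign each particle the majority label (1 = MP, 0 = non-MP) of its pixels.

    particle_ids: integer image of the same shape; 0 = background, 1..N = particles.
    A particle is MP only with a strict MP majority, so ties become non-MP (assumption).
    """
    result = {}
    for pid in np.unique(particle_ids):
        if pid == 0:
            continue
        votes = pixel_labels[particle_ids == pid]
        result[int(pid)] = int(votes.sum() * 2 > votes.size)
    return result


labels = np.array([[1, 1, 0],
                   [0, 0, 0],
                   [1, 0, 0]])
particles = np.array([[1, 1, 1],
                      [0, 0, 0],
                      [2, 2, 2]])
print(particle_majority_vote(labels, particles))  # {1: 1, 2: 0}
```

Particle 1 has two of three pixels labelled MP and is therefore MP; particle 2 has one of three and is therefore non-MP.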

Model performance
Around two-thirds of the samples were used as the training set and the remaining one-third as the test set. The performance of each binary classifier was assessed by the classification accuracy and the Matthews correlation coefficient (MCC). MCC is a reliable statistical rate that yields a high score only if the prediction performs well in all four confusion-matrix categories (true positives, false negatives, true negatives and false positives); for this reason it has been argued to be more informative than accuracy [3]. Confusion matrices were further used to evaluate the quality of the classifier output on the test set.
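The split and the metrics can be reproduced with scikit-learn. A stratified split with a fixed seed is an assumption (the text says only "around two-thirds"), and the toy labels below are made up to show the arithmetic; they are not the paper's results. MCC is computed from its definition, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and checked against `sklearn.metrics.matthews_corrcoef`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef
from sklearn.model_selection import train_test_split


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else float((tp * tn - fp * fn) / denom)


# Split sizes for 1038 MP + 1052 non-MP spectra (labels: 1 = MP, 0 = non-MP).
y = np.array([1] * 1038 + [0] * 1052)
idx_train, idx_test = train_test_split(
    np.arange(y.size), test_size=1 / 3, stratify=y, random_state=0
)
print(len(idx_train), len(idx_test))  # 1393 697

# Toy predictions to show the metrics (not the paper's results).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
# With labels=[0, 1], rows are true classes and columns predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(accuracy_score(y_true, y_pred))  # 0.75
print(mcc_from_counts(tp, tn, fp, fn))  # 0.5
print(np.isclose(matthews_corrcoef(y_true, y_pred), mcc_from_counts(tp, tn, fp, fn)))
```

scikit-learn rounds a fractional `test_size` up, so 1/3 of 2090 spectra gives a 697-spectrum test set.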

Modelling performance
Table 1 summarizes the test-set performance of the PLS-DA model developed from the full spectral profile (i.e., 517 variables). A high accuracy of 0.92 was obtained, and the sensitivity and specificity are both reasonably high, indicating the strong capability of the classifier. The confusion matrix in Fig. 3 shows that 20 MP point spectra were wrongly classified as non-MP and 36 non-MP spectra were mistakenly assigned to MP, so the errors are relatively balanced between the two classes. Based on the regression vector, four wavenumbers contributing most to the discrimination between MP and non-MP spectra were identified: 1077, 1711, 1635 and 1541 cm⁻¹. The SVM model built from these four wavenumbers also shows good predictive ability, with a high accuracy of 0.9133 (Table 1), only slightly lower than that of the PLS-DA model. The confusion matrix in Fig. 3 further demonstrates strong predictive capability using only the four selected wavenumbers.

Conclusion
This work proposes an ML-based framework for detecting micro- and nanoplastics. The models showed good predictive ability (test accuracy of 0.92 for the full-spectrum PLS-DA model and 0.9133 for the four-wavenumber SVM model), indicating the feasibility of the proposed method for real-life applications. Collecting data only at the important wavenumbers can substantially reduce acquisition time, making the method suitable for routine monitoring of plastic products on the market.

Fig. 1. The schematic diagram of the ML-powered detection framework.

Table 1. The model performance summary of the test set.