Machine learning and its applications for plasmonics in biology

SUMMARY
Machine learning (ML) has drawn tremendous interest for its capacity to extract useful information that may be overlooked with conventional analysis techniques and for its versatility in a wide range of research domains, including biomedical sensing and imaging. In this perspective, we provide an overview focused on the uses and benefits of ML in areas of plasmonics in biology. ML methodologies for processing data from plasmonic biosensing and imaging systems by supervised and unsupervised learning to achieve enhanced detection and quantification of target analytes are described. In addition, deep learning-based approaches to improve the design of plasmonic structures are presented. Data analysis based on ML for classification, regression, and clustering by dimension reduction is presented. We also discuss ML-based prediction and design of plasmonic structures and sensors using discriminative and generative models. Challenges and the outlook for ML for plasmonics in biology are summarized. Based on these insights, we are convinced that ML can add value to plasmonics techniques in biological sensing and imaging applications to make them more powerful with improved detection and resolution.


INTRODUCTION
Surface plasmon (SP) refers to electron density waves formed at the interface between a metal and a dielectric material. SP can be spatially localized in a confined volume and produce enhanced electromagnetic (EM) fields, leading to diverse applications such as single-molecule detection, 1,2 photovoltaic devices, 3,4 and light-emitting devices. 5 The resonance of SP shifts depending on the ambient conditions, which can be used to measure biochemical events such as bioaffinity interactions and to quantify biomolecular concentration and kinetics. [6][7][8] For this reason, many novel sensing and imaging techniques have appeared in biology and biomedical applications; e.g., SP resonance (SPR) biosensing, [9][10][11] surface-enhanced Raman spectroscopy (SERS), [12][13][14] and surface-enhanced infrared absorption spectroscopy (SEIRA). [15][16][17] SP can be used to image molecular events based on metal-enhanced fluorescence with directional radiation of fluorophores. [18][19][20] In labeled imaging techniques, an enhanced light signal has been achieved with super-resolution by localization of SP using plasmonic nanostructures, such as nanoholes, disks, and random arrays, [21][22][23][24][25] and this may modulate optical characteristics for applications in imaging cell internalization of a virus, bacterial motility, intracellular mitochondrial movement, and structured illumination microscopy. [26][27][28] Label-free imaging has been
Figure 1. Schematic of ML for plasmonics in sensing and imaging applications. Plasmonics-based sensor and imaging systems provide biomolecular sensing and imaging platforms with good sensitivity. ML can be adopted to analyze diverse types of data acquired from the systems. It can also be utilized for prediction of EM responses of plasmonic structures and to design desired optical characteristics.
First, ML provides a myriad of methodologies to analyze results obtained from plasmonic sensing, with data processing categorized into classification, regression, and dimension reduction/clustering (Figure 2). The three components do not work independently and can be employed together for effective data analysis.
Second, ML can be utilized to investigate the relationship between plasmonic nanostructure and EM responses (Figure 3). EM simulation can be conducted to obtain various optical characteristics of plasmonic structures. However, conventional simulation methods require a long computation time and high memory usage, especially for calculation of 3D structures. An ML model, by contrast, needs a one-time training procedure, after which it can predict EM characteristics with much faster computation and less memory, although ML may not always be efficient, depending on the application, because generation of training data and optimization of an ML model can take a very long time. In addition to estimation of EM response, structures and materials for plasmonic nanostructures with desired characteristics may be designed. This process, which may be referred to as inverse design, is important to achieve ultrahigh sensitivity by amplifying light-matter interaction. 34,35 Deep learning in particular has opened up a new way for plasmonic structure design with its distinct advantage of extracting useful information from massive datasets. Various approaches based on modification of neural network architecture and algorithms have been established, surpassing earlier design methods.
In this perspective, we first introduce ML algorithms that include supervised, unsupervised, and deep learning. Diverse ML approaches to process data from plasmonic sensors and imaging systems are described based on classification, regression, and dimension reduction/clustering. We then discuss methodologies used to investigate the relationship between plasmonic structure and its optical responses. The methodologies include prediction of optical responses associated with plasmonic structures; for instance, reflection, transmission, absorption, and scattering intensity spectrum. Methods to design plasmonic structures to fit desired optical responses are introduced, with an emphasis on neural network-based approaches. This is followed by challenges for use of ML and the outlook of these approaches in the future. This perspective can benefit readers and nurture advanced applications of ML for plasmonic sensors and imaging systems.
Figure 2. Schematic of ML implementation for analysis of data measured from plasmonic sensing and imaging. Data, such as SPR sensorgrams, SERS signals, and scattering spectra, can be fed into ML models for detection, quantification, and diagnosis using classification and regression. The dimension of the data can be reduced for feature extraction, visualization, and signal enhancement. Dimension-reduced information can be utilized as training data for classification and regression, which achieves improved performance for the task.

ML ALGORITHMS
Supervised/unsupervised learning
In supervised learning, a labeled dataset is provided to the machine, which learns the relationship between input and output data. Linear regression, k-nearest neighbor (k-NN), and support vector machines (SVMs) are representative examples of supervised learning. Linear regression, which is relatively simple to comprehend and implement, identifies an approximate linear relation between different variables. k-NN is an instance-based model that finds the k most similar data points to estimate the output of data points of interest. 36 It can be operated without training, but its performance suffers when few data are provided. SVM performs classification by calculating the hyperplane that maximizes the distance between the plane and the closest data points. SVM works effectively when the number of samples is relatively small compared with the number of features.
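As a minimal sketch of the k-NN idea described above, the following example classifies two query points by majority vote among their k nearest labeled neighbors. The data are synthetic five-feature vectors standing in for sensor readouts; the function name `knn_predict`, the cluster positions, and the choice k = 3 are assumptions made for illustration and are not taken from the cited studies.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=3):
    """Predict integer labels by majority vote among the k nearest neighbors."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    # Indices of the k closest training points for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote over the labels of those neighbors
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])

# Synthetic, well-separated classes (hypothetical stand-ins for two analytes)
rng = np.random.default_rng(0)
class0 = rng.normal(0.0, 0.3, size=(20, 5))
class1 = rng.normal(2.0, 0.3, size=(20, 5))
train_X = np.vstack([class0, class1])
train_y = np.array([0] * 20 + [1] * 20)

queries = np.array([[0.1] * 5, [1.9] * 5])
labels = knn_predict(train_X, train_y, queries, k=3)
```

No training step is involved; all labeled data are kept and searched at prediction time, which is why performance depends strongly on how many labeled examples are available.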
In unsupervised learning, the machine finds hidden patterns in unlabeled data for classification and clustering. Principal-component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are two example algorithms of unsupervised learning. PCA is a linear transformation that can be used for dimension reduction and feature extraction by determining the directions in which the variation in data is at a maximum. 37 PCA, along with many other techniques for cluster analysis and regression, originated in statistical analysis, although it is often classified as an ML technique. t-SNE is a non-linear dimensionality reduction method that provides efficient visualization of data while maintaining dataset cluster distributions. 38
Figure 3. Schematics of ML implementation for design of plasmonic nanostructures. ML can be used to estimate EM responses of plasmonic nanostructure (forward) and to design a plasmonic structure with desired EM responses (inverse). The structure parameters for the design are not limited to specific parameters, such as size, gap, and rotation angle; rather, arbitrary patterns are possible.
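PCA as described above can be sketched with a singular value decomposition of the mean-centered data matrix. The example below is illustrative only: the data are synthetic (50 samples with 100 features generated from two hidden factors), and the function name `pca_project` is an assumption rather than part of any cited work.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the directions of maximum variance."""
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    scores = centered @ components.T
    # Fraction of total variance carried by each retained component
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, components, explained

# Synthetic data: two latent factors mixed into 100 features plus weak noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(50, 100))

scores, components, explained = pca_project(X, n_components=2)
```

Here the two retained components capture nearly all of the variance because the synthetic data were built from two factors; for measured spectra, the number of components to keep would have to be chosen from the explained-variance profile.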

Deep learning
Artificial neural networks (ANNs) were inspired by biological neural networks and consist of a collection of interconnected nodes that are typically organized into multiple layers. The connections are used to transfer signals from one node to another. Deep learning models are ANNs that have multiple layers between input and output, and they can be classified into two types: discriminative and generative models. Discriminative models are used for classification or regression, whereas generative models are used to create new data points that are indistinguishable from the original dataset. Multilayer perceptron (MLP) is a basic form of discriminative ANN that consists of at least three layers, including input, hidden, and output layers. Convolutional neural networks (CNNs) are discriminative models frequently used for image analysis. Local spatial features of data can be extracted by convolution with shared-weight filters that are optimized during the training procedure. Generative models, such as generative adversarial networks (GANs) 39 and variational autoencoders (VAEs), 40 approximate specific probability distributions and create new data samples based on the underlying distribution. VAEs are autoencoders that are trained under regularization so that the latent space can generate new data efficiently. On the other hand, GANs typically consist of two parts: a generating part, which is trained to produce new data samples that are indistinguishable from the original dataset, and a discriminating part, which is trained to identify whether the data come from the original dataset or the generative model. The two parts of a GAN develop by competing against each other and attempting to reach a Nash equilibrium, where the discriminator cannot differentiate generated data from real data; diverse approaches have been introduced to improve the stability of GAN learning. 41,42
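The three-layer MLP structure mentioned above (input, hidden, and output layers) can be illustrated by a forward pass alone. The sketch below is not a trained model: the layer sizes, the ReLU and softmax choices, and the random weights are all assumptions made only to show how signals propagate from one layer to the next.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x, params):
    """Input layer -> one hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(x @ params["W1"] + params["b1"])
    return softmax(hidden @ params["W2"] + params["b2"])

# Hypothetical sizes: 8 input features, 16 hidden nodes, 3 output classes
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(0.0, 0.1, size=(8, 16)),
    "b1": np.zeros(16),
    "W2": rng.normal(0.0, 0.1, size=(16, 3)),
    "b2": np.zeros(3),
}
batch = rng.normal(size=(4, 8))        # four example inputs
probs = mlp_forward(batch, params)     # class probabilities per input
```

In practice the weights would be optimized by back-propagation on labeled data; for regression, the softmax output would be replaced by a linear output layer.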

ML-BASED DATA ANALYSIS
Classification
The basic form of classification is binary classification, which divides data into two categories and is effective for a wide range of applications. For example, the concentration of cytokines has been analyzed with a CNN applied to dark-field images of silver nanocubes conjugated with a cytokine. 43 The scattering spots of nanocubes were detected with a CNN that classified the pixels of images into two categories, existence and non-existence, by yielding the binary output of the neural network as 1 and 0, respectively (Figure 4A). Binary classification has also been conducted to evaluate exposure to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) on plasmonic genosensors using several architectures of CNNs that were pre-trained with an ImageNet dataset, 44 to estimate the presence of neurotransmitters, 45 and to identify bacterial strains. 46 There is also a need for classification into multiple categories beyond binary classification. For instance, the number of nanoparticles has been estimated by analyzing SPR microscopy (SPRM) images with a CNN. The CNN classified images into several categories based on the number of nanoparticles, allowing quantification of analytes in highly complex SPRM images with overlapping parabolic point spread functions. 52 A CNN has been used for assessment of the degree of DNA damage, 53 detection of cancer cells 54 and liver diseases (Figure 4B), 47 and determination of virus size 55 and extracellular vesicles 56 by classifying them into several categories. Although CNNs were originally developed for two-dimensional data, they may be used to analyze one-dimensional data as well, with proper conversion to two-dimensional data. However, achieving generalization capability is challenging. 57 Thus, various approaches based on one-dimensional CNNs have been proposed for some applications 58,59 and employed to recognize different substances from SERS spectra. 60,61

In addition to CNNs, k-NN has been used to classify SPR sensorgrams into several groups in terms of the type and number of substances; the classified result was processed with further diagnosis protocols. 62 k-NN can be used for identification of specific/cross-reactive binding of an antibody from SPR images 63 and characterization of bacterial species (Escherichia coli) by classifying SERS spectra based on strains, culture conditions, and purification methods. 64 On the other hand, SVM has been used for cell cytometry by classifying lens-free holograms aided by gold and silver nanoparticles 65 and for differentiation of cancer cell-derived exosomes and various bacterial species from SERS spectra (Figure 4C). 66,48 SVM was originally designed for binary classification, but multiclass classification can be carried out by splitting the task into multiple binary classifications with a one-versus-all scheme. 67
Regression
Continuous values, e.g., the concentration and size of analytes, can be predicted by regression. For instance, various approaches, including MLP, CNNs, and Gaussian processes, were applied to estimate the concentration of glucose from SEIRA spectra 68 and drug molecules, including methylxanthine, 69 E. coli O157:H7, and rhodamine of a single molecule from SERS spectra. 70,71 These approaches have yielded reliable results even when the concentration of analyte and data from the sensors were nonlinearly related. The size as well as concentration of extracellular vesicles has been estimated by applying CNNs to SPRM images (Figure 4D). 49 Regression can be performed for analysis of multiple parameters as well as a single parameter. For instance, scattering spectra captured by plasmonic sensors are affected by several environmental parameters (e.g., the thickness and refractive index of a sample layer), which may cause confusion in sensing experiments. 
To address this issue, MLP can efficiently extract relevant information from scattering patterns in the non-resonant wavelength bands and estimate the values of multiple parameters that affect spectra. 72 MLP also allows reconstruction of unknown spectra from encoded information that was obtained by measuring the transmitted intensity of a plasmonic nanostructure, achieving a compact and lightweight solution compared with traditional spectroscopy (Figure 4E). 50 Two-dimensional data, such as images, can be a candidate output for regression because an image represents a set of values in two-dimensional space. For example, a dark-field scattering image of gold nanorods in a living cell with a low signal-to-noise ratio (SNR) was converted to an image of scattering light spots, which facilitates clear identification of scattering signals. This was achieved with U-net, an encoder-decoder-based neural network that is well suited for image-to-image conversion. 73 A phase profile of SPR sensor data was estimated from the amplitude of a single back focal plane image of SPRM. The recovered phase profile was applied to refractive sensing to achieve robustness to noise and an enhanced detection limit compared with conventional SPR intensity measurements. 74 Generative models can also be used for regression; for instance, converting out-of-focus SPRM images into focused ones by training on images measured at different focal distances. 75
Dimension reduction/clustering
Dimension reduction can be employed for feature extraction and efficient visualization of data. The features extracted by dimension reduction can be used as training data for a classification or regression model, overcoming the curse of dimensionality, which refers to the issues related to analysis of data in a high-dimensional space. 76,77 For example, SERS spectra of single-stranded DNA (ssDNA) sequences were processed with PCA. 
The dimension-reduced data were fed into a regression model for estimation of chemical composition, yielding results superior to those from the regression model alone. 78 Features of holographic images of cells were extracted by dimension reduction with PCA and fed into SVM for classification of cell types. 65 Dimension reduction can be utilized for visualization of high-dimensional data with an additional clustering algorithm; e.g., k-means clustering. 79 Clustering refers to the process of partitioning data into several groups so that data points in the same group are more similar to each other than to those in other groups. High-dimensional data are usually reduced to two or three dimensions to plot patterns more explicitly. For example, dimension reduction of high-dimensional data and clustering of reduced-dimension data have been used for analysis of optical characteristics of a meta-plasmonic sensor (Figure 4F). 51 In the dimension reduction process, 197-dimensional reflectance data were reduced to 8 dimensions. The dimension of 197 corresponds to the number of measurement points for incident angles in the range of 40° to 89° at an interval of 0.25°. The dimension-reduced data were subsequently mapped into 2 dimensions with t-SNE. Finally, the 2-dimensional embedded data were grouped with k-means clustering, which achieved efficient visualization of the optical characteristics of plasmonic sensing and eventual identification of suitable candidates for a practical sensor environment. t-SNE and k-means clustering were also used for visualization of scanning electron images of a plasmonic genosensor. 44 UMAP, a method for non-linear dimension reduction, has been adopted for improved data visualization of SERS spectra of E. coli. 64 Dimension reduction can be utilized to enhance weak signals from a sensor. 
PCA has been applied to enhance the weak signal of SERS, and the signal was reconstructed with 3 times higher intensity by retaining the 3 most significant principal components while eliminating the remaining components. 80
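The PCA-based signal enhancement described above, in which only the most significant principal components are retained, can be sketched as a low-rank reconstruction. The example below uses synthetic spectra built from three Gaussian bands with additive noise; the band positions, noise level, and the function name `pca_denoise` are assumptions, and the sketch does not attempt to reproduce the intensity gain reported in the cited study.

```python
import numpy as np

def pca_denoise(spectra, n_keep=3):
    """Reconstruct spectra from the n_keep most significant principal components."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the leading components and discard the rest (mostly noise)
    low_rank = (U[:, :n_keep] * S[:n_keep]) @ Vt[:n_keep]
    return low_rank + mean

# Synthetic spectra: random mixtures of three bands on a 200-point axis
rng = np.random.default_rng(0)
axis = np.arange(200.0)
bands = np.stack([np.exp(-((axis - c) / 10.0) ** 2) for c in (50.0, 100.0, 150.0)])
weights = rng.uniform(0.0, 2.0, size=(60, 3))
clean = weights @ bands
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)

denoised = pca_denoise(noisy, n_keep=3)
err_noisy = float(np.mean((noisy - clean) ** 2))
err_denoised = float(np.mean((denoised - clean) ** 2))
```

Comparing `err_denoised` with `err_noisy` shows how much noise is removed in this synthetic case; with measured data, the clean reference is unavailable and the number of retained components has to be justified separately.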

ML-BASED PREDICTION AND DESIGN
Prediction of EM responses from plasmonic structure
Among traditional ML algorithms, regression, which is used to uncover correlations between variables and predict continuous output variables, has been demonstrated to estimate EM responses of plasmonic structures given their geometry. In particular, regularization is usually adopted for regression to prevent overfitting by adding a penalty on the coefficients to the cost function. Ridge regression, which applies L2 regularization, has been used to predict the resonance wavelength of gold concave nanocubes with k-fold cross-validation (Figure 5A). 81 Various optimization methods have been employed; for instance, genetic algorithm, 24,82-84 particle swarm optimization, 85 and adjoint methods. 86 These traditional regression methods are generally easier to implement and allow easier interpretation of the relationship between variables compared with ANNs (to be described subsequently). Therefore, traditional regression models may be appropriate when they yield results comparable with ANNs and/or the relationship between structural parameters and EM responses is straightforward.
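Ridge regression with k-fold cross-validation, as mentioned above, can be sketched in closed form. The example below is a toy illustration under stated assumptions: three normalized geometric features, a linear relation to a resonance wavelength with hypothetical coefficients, and a small grid of candidate penalties. None of these values come from the cited work.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def kfold_cv_mse(X, y, alpha, k=5, seed=0):
    """Mean squared validation error averaged over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errors = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = ridge_fit(X[train], y[train], alpha)
        errors.append(np.mean((X[folds[i]] @ w + b - y[folds[i]]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: resonance wavelength (nm) vs. three normalized geometry features
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 3))
true_w = np.array([120.0, -40.0, 15.0])
y = 600.0 + X @ true_w + rng.normal(0.0, 2.0, size=100)

alphas = [1e-3, 1e-1, 1e1, 1e3]
cv_error = {a: kfold_cv_mse(X, y, a) for a in alphas}
best_alpha = min(cv_error, key=cv_error.get)
w, b = ridge_fit(X, y, best_alpha)
```

The cross-validation error selects the penalty strength: a penalty that is too large shrinks the coefficients toward zero and underfits, which is visible in `cv_error` for the largest candidate.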
MLP is widely used to predict various EM responses from the geometry of plasmonic nanostructures. For this task, the input for MLP is set to be the geometry of plasmonic nanostructures, such as size, gap, periods, and rotation angles, and the output to be EM responses. For instance, MLP has been utilized to estimate reflection spectra of gold nanodisk arrays, 88 wavelengths where SPR of gold concave nanocubes occurs, 81 and resonance wavelengths of reflection for gold nanograting. 89 A CNN can also be employed to predict EM responses. It helps to circumvent the limitation of MLP, which predicts EM responses for constrained structural geometry; for instance, a dimer structure with varying widths, lengths, and gaps. Two- or higher-dimensional data that can represent an arbitrary shape of a structure can be fed into a CNN as an input. Then the features of the data can be extracted by various trained kernels with an additional fully connected layer. For instance, a CNN has been utilized to estimate absorption spectra (Figure 5B) 87 and scattering cross-section 90 given an arbitrary shape of plasmonic structure.
The ANNet model, which can train a multilayer neural network without gradient computation, has been employed to circumvent the difficulties of acquiring a large training dataset and achieved a much reduced training time. 91 ANNet successfully predicted leakage radiation characteristics from bright-field images of random nanostructures and showed fewer failure instances than U-net. 92
Design of structure/material of plasmonic sensors
Multiple structures may correspond to one EM response, making the design of a structure ill-posed and the overall training process extremely challenging. 93 Novel approaches, rather than a basic form of the neural network, 94 were therefore introduced to circumvent this issue. A bidirectional neural network has been proposed for design of a plasmonic structure with desired transmission spectra by forming two consecutive networks, a geometry-predicting network (GPN) and a spectrum-predicting network (SPN), as presented in Figure 6A. 95 A GPN was built to predict the geometry of a structure given its transmission spectrum, and an SPN to predict the transmission spectrum of a given structure. The two networks were trained simultaneously: the spectrum in the training data was fed into the GPN, and the output of the GPN was fed into the SPN. The weights of the networks were then updated by back-propagation with a loss function that includes the errors between the outputs of the two networks and the ground truth. This approach has been shown to provide far better performance than separately trained GPNs and SPNs. Similar methods, with some alteration, were also applied to the design of plasmonic nanoparticles and dimers given spectral cross-section or scattering 96,97 and gold chiral structure given circular dichroism (Figure 6B). 
98,99 In addition to the bidirectional neural network, an iterative neural network was introduced, in which a forward network was trained to predict EM responses given the geometry of a structure, and then the input parameters (structure geometry) were optimized by minimizing the error between the output of the forward network and the desired EM response. 100,101 Generative models, such as GANs and VAEs, have been employed to address issues of discriminative models; e.g., the difficulty of dealing with enormous degrees of freedom in structure and the fixed outcomes given specific input data. A GAN-based network, which consists of a generator, critic, and simulator, has been utilized to design unit cell patterns of gold structure and produce desired transmission spectra considering the polarization of the incident and transmitted light (Figure 6C). 102 The generator provides completely arbitrary patterns for a structure that fulfills desired spectra, beyond the constraints of a few parameters; e.g., width, height, period, thickness, and gap. A GAN has been used indirectly in the design of photonic crystal fiber-based SPR sensors by augmenting training data that were fed into MLP. 103 A conditional GAN 104 in conjunction with a genetic algorithm has been introduced to the design of a dielectric spacer for a gap plasmon-based aluminum structure. The proposed network alleviated the requirement of generating numerous training data from numerical simulations by performing an inverse design and optimization. 105 A conditional deep convolutional GAN (cDCGAN) can be utilized by integrating a CNN and conditional GAN to achieve a stable Nash equilibrium solution. 106 The design of the silver structure was demonstrated by employing the cDCGAN, providing a high degree of design freedom with arbitrary patterns of a structure. 
107 A VAE has been employed to design patterns with desired reflectance spectra without a pre-trained forward model, where the latent variables allowed diverse solutions to the inverse design, achieving one-to-many mapping consistent with physical intuition. 108 A deep learning model that can learn physical principles is highly desired for more general nanostructure design problems. Such a model can overcome weakness under structural constraints and decrease the computational load for simulation of optical responses of plasmonic nanostructures.
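The iterative inverse-design scheme described in this section (a fixed forward model whose input geometry is optimized to match a desired response) can be sketched as follows. This is a toy illustration, not the method of the cited studies: the trained forward network is replaced by a hypothetical analytic function mapping one size parameter to a Lorentzian-shaped spectrum, and the gradient is obtained by finite differences rather than back-propagation. All numerical values are assumptions.

```python
import numpy as np

WAVELENGTHS = np.linspace(500.0, 900.0, 200)  # nm, hypothetical sampling grid

def forward_model(size_nm):
    """Stand-in for a trained forward network (assumption, not a real model)."""
    resonance = 400.0 + 4.0 * size_nm   # hypothetical size-to-resonance rule
    width = 40.0                        # hypothetical linewidth in nm
    return width ** 2 / ((WAVELENGTHS - resonance) ** 2 + width ** 2)

def spectrum_error(size_nm, target):
    """Mean squared error between predicted and desired spectra."""
    return float(np.mean((forward_model(size_nm) - target) ** 2))

def inverse_design(target, initial_size, lr=300.0, steps=200, eps=1e-3):
    """Gradient descent on the input geometry with the forward model fixed."""
    size = initial_size
    for _ in range(steps):
        # Central finite-difference estimate of d(error)/d(size)
        grad = (spectrum_error(size + eps, target)
                - spectrum_error(size - eps, target)) / (2.0 * eps)
        size -= lr * grad
    return size

target_spectrum = forward_model(80.0)   # desired response (known answer: 80 nm)
designed_size = inverse_design(target_spectrum, initial_size=70.0)
```

The starting guess is chosen close enough that the initial and target resonances overlap. With several geometric parameters, different starting points can converge to different structures, which reflects the one-to-many nature of inverse design noted above.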

CHALLENGES AND OPPORTUNITIES
Importance of data quality
ML generates output and solutions based on training data, and the data can significantly affect performance. As a result, gathering high-quality data is vital for efficient application of ML. Training data are ideally experimental in origin and can be obtained by incorporating and optimizing hardware/software for efficient and consistent data acquisition. In some cases, however, obtaining a large amount of data can be challenging, in which case simulation can be performed and used for training an ML model. Collecting training data by simulation requires accurate physical and mathematical modeling of experimental conditions and environments. Noise components, such as Gaussian and/or speckle noise, that are inevitably present in the experiment should be considered to generate more realistic data and, hence, improve the performance of the ML model. Noise can be modeled by considering its sources; for instance, thermal and shot noise. [109][110][111] One may use transfer learning and one-/few-shot learning techniques, which were introduced to utilize data and labels from other domains for application to the target domain. No matter how one prepares training data for ML, it is imperative to have insights into the data domain and collaborate with domain experts in the specific research area to lessen the burden of obtaining high-quality training data and, thus, achieve good performance.
Figure 6. ML implementations to design a plasmonic structure for desired optical responses. (A) The structure of a bidirectional neural network consists of two networks. Adapted with permission from Malkiel et al. 95 Copyright 2018. (B) Schematic of a bidirectional neural network for the design of chiral metamaterial structures, where a joint-learning feature was adopted for the generalized system. Adapted with permission from Ashalley et al. 99 (C) The architecture of a GAN-based network for design of a gold structure with desired optical spectra. It consists of a simulator, generator, and critic, which are all CNNs with subtle structural changes. Adapted with permission from Liu et al. 102 Copyright 2018 American Chemical Society.
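The point made under data quality, that simulated training data should include realistic noise, can be sketched by adding shot and thermal noise to a simulated curve. In the example below, shot noise is modeled as Poisson-distributed photon counts and thermal noise as additive Gaussian noise; the photon budget, the noise level, the SPR-like dip, and the function name `add_measurement_noise` are all assumptions for illustration.

```python
import numpy as np

def add_measurement_noise(clean_signal, photons_at_unity=1000.0,
                          thermal_std=5.0, seed=0):
    """Return noisy photon counts for a clean signal scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    expected_counts = clean_signal * photons_at_unity
    # Shot noise: detected counts follow a Poisson distribution
    shot = rng.poisson(expected_counts).astype(float)
    # Thermal noise: modeled here as zero-mean additive Gaussian noise
    thermal = rng.normal(0.0, thermal_std, size=clean_signal.shape)
    return shot + thermal

# Hypothetical angular scan: 40 to 89 degrees in 0.25 degree steps (197 points)
angles = np.linspace(40.0, 89.0, 197)
clean = 1.0 - 0.8 * np.exp(-((angles - 65.0) / 3.0) ** 2)  # toy reflectance dip
noisy = add_measurement_noise(clean)
```

Calling the function with different seeds yields independent noisy realizations of the same simulated curve, which is one simple way to enlarge a simulated training set.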

Physicality of ML
Another challenge in ML lies in the difficulty of ensuring physicality and whether the algorithm learned physical principles that govern the characteristics of light and matter. Designing nanostructures is largely conducted by neural network-based methods because the task is a highly complex and ill-posed problem. A neural network is a black box in nature in the sense that it is difficult to obtain insights into the way the model works and to achieve good performance. Despite numerous efforts in the field of computer science to overcome the issue, it remains insufficient to reveal specific black box properties. The results obtained from the learned model may not reflect the physicality. This prevents the model from being useful in general applications and often makes it difficult to predict the data beyond training. Because of these issues, it is still difficult to assure the physicality of ML.

Application specificity of ML algorithms
We also emphasize that ML algorithms have different roles, scopes, and advantages. Thus, it is important to select a suitable algorithm for analysis of data from plasmonic biosensing and imaging systems. Although there is no unique and straightforward solution for this issue, several aspects are worth considering; for instance, the format and linearity of the data and memory usage. Specifics on selecting ML algorithms are left to guides in other studies, 112,113 and it will be helpful to test various ML algorithms a priori. Although traditional data analysis techniques have been largely limited to data with a small number of parameters, ML can simultaneously consider many features that affect sensing and imaging data by effectively examining the entire data and extracting features using an advanced and optimized algorithm. ML is more suitable for analyzing multimodal and high-throughput sensors, and high-dimensional data can be effectively reduced to low-dimensional data, yielding better reliability and accuracy. ML can be less effective given an insufficient amount of data or when performing relatively simple tasks. In this case, an approach that combines ML and conventional statistical analysis methods may be desirable. Sensor-related research, including that on plasmonic sensors, has consistently sought to enhance SNRs, and diverse ML-based methods have been developed for applications working with many other types of signals and images. [114][115][116] These methods can be adapted with proper modification for applications of plasmonics in biology and biomedical engineering by providing enhanced sensitivity and improved resolution. To date, most ML studies for plasmonic biosensing and imaging have relied on data-driven approaches without considering plasmonic physical models. However, this can be developed to incorporate theoretical aspects of plasmonics by introducing physics-informed ML, which can enforce physics laws in ML models. 
117 This approach has already been applied to diverse research areas, yielding physically consistent results with improved generalization. [118][119][120]
Code sharing and standardization
ML and deep learning grow as powerful tools through sharing of code and high-quality data and through collaboration on projects with enhanced productivity via various platforms and communities. One may begin writing analysis code for data from sensor and imaging systems by adopting and modifying ML models that are shared by many researchers instead of starting from scratch. On the other hand, in terms of data, it may be relatively challenging to import shared data and apply one's own analysis of plasmonic sensor and imaging data because experimental conditions, including equipment and noise level, are different. It is necessary to consider a way to properly adopt data; for instance, utilization of transfer learning. ML is no panacea for all applications; in this regard, it is important that users understand beforehand whether and how an application can benefit from ML.

Conclusion
In this perspective, we reviewed various approaches and applications of ML for plasmonic sensing and imaging in biology. Diverse ML-based methods, such as classification/regression, image recognition, and dimension reduction, were utilized for data processing of SERS, SPRM images and SEIRA, and exploration of the relationship between plasmonic structure and optical response. ML-based approaches provide huge benefits and, at the same time, a cumbersome burden of deciphering complicated patterns by extraction of important features from complex and multidimensional data from plasmonic biosensor and imaging systems. Sophisticated algorithms, such as tandem neural networks and generative models, have been utilized in the design of plasmonic structures ranging from nanoparticles to arbitrary shapes. We anticipate that the use of ML for plasmonic biosensing and imaging will continue to grow because of the unique advantages of efficient extraction of meaningful information based on a data-driven approach and extensibility of incorporating physical models, and in-depth collection of high-quality training data and selection of optimum ML algorithms will remain essential.