Machine learning and analytical methods for single-molecule conductance measurements

Single-molecule conductance measurements between metal nanogap electrodes have been actively investigated for molecular electronics, biomolecular analysis, and the search for novel physical properties at the nanoscale. Although the conductance obtained in single-molecule measurements fluctuates easily and is unreliable, these measurements offer the advantage of rapid, repeated acquisition of experimental data through the repeated breaking and forming of junctions. Owing to these characteristics, recently developed informatics and machine learning approaches have been applied to single-molecule measurements. Machine learning-based analysis has enabled detailed analysis of individual traces and has improved the performance of single-molecule measurements as a method of molecular detection and identification. These novel analytical methods have also improved the ability to investigate new chemical and physical properties. In this review, we focus on the analytical methods for single-molecule measurements and provide insights into the methods used for single-molecule data interrogation. We present experimental and traditional analytical methods for single-molecule measurements, provide examples of each type of machine learning method, and discuss the applicability of machine learning to single-molecule measurements.


Introduction
Machine learning has made remarkable progress in recent years and has attracted attention for its applications in a variety of fields, including chemistry and nanoscience. 1-4 New scientific insights can be gained by using machine learning to obtain more information from data. Single-molecule measurement is an area where machine learning is desirable owing to the amount of data available, the variability of the data, and the difficulty of interpretation. Single-molecule measurement is a technique for assessing the electrical conductance of a single molecule between metal nanogap electrodes (Fig. 1). [5][6][7][8][9][10][11][12][13] It originated from the theoretical proposal of the molecular diode by Aviram and Ratner, 14 which motivated research on using functional molecular junctions as devices. 8,11,[15][16][17][18][19][20][21] Notably, molecular junctions composed of functional molecules have been reported as essential components in devices such as diodes, [15][16][17][18] switches, 19,20 and transistors. 11,21 In the early stages of research on single-molecule measurements, the primary objective was to develop molecular devices, as shown in Fig. 1b and e. Since Di Ventra's group theoretically proposed DNA and RNA sequencing using single-molecule measurements, this novel application of the technique has received attention, as shown in Fig. 1c and f. [22][23][24] Research on single-molecule measurements has since progressed to the successful measurement of the conductance of nucleotides in DNA and RNA [25][26][27][28][29][30] and amino acids. [31][32][33][34][35][36][37] Because single-molecule measurements directly measure the conductance of a single molecule, they are expected to flourish as a new analytical method that is highly sensitive, rapid, and requires no pre-treatment steps. Furthermore, single-molecule measurements play a crucial role in the investigation of novel physical and chemical properties at the nanoscale.
Recent research has focused on elucidating the mechanisms of electrical 38,39 and thermal conduction at the nanoscale, 40,41 observing quantum interference, [42][43][44] and detecting and enhancing chemical reactions at the single-molecule level through the nanogap environment (Fig. 1d and g). [45][46][47] Regardless of the application, accurately measuring the conductance and identifying the molecular junction structure are critical. However, it is difficult to precisely identify and understand the structures of single-molecule junctions. The typical approach of measuring the ensemble average over a number of molecules of the order of Avogadro's number cannot be applied to single-molecule measurements. Electrical conductance measurements are the primary methods used to determine the structure of single-molecule junctions. However, the conductance of a single molecule varies widely, even for repeated measurements of the same molecule. [48][49][50][51][52][53] Moreover, the order of magnitude of conductance differs among reporting groups, [54][55][56] mainly because of variations in single-molecule junction structures and migration of metallic electrodes. [48][49][50]52,53 The molecule-electrode coupling and the energy alignment of the conduction orbital of the bridging molecule determine the single-molecule conductance. Changes in the electrode structure or in the adsorption structure of the molecule, which are easily induced by noise or external stimuli, alter the coupling and the conduction orbital levels and therefore the conductance. Consequently, in single-molecule measurements, recording only a single trace is insufficient for discussing the properties of single-molecule junctions. Both experimental methods for reliable measurements and analytical techniques for obtaining statistical data have therefore been developed for single-molecule measurements.
One example of experimental development is the exploration of more stable and well-defined contacts based on direct bonding between the molecule and the electrodes via C-C bonds. 45,57,58 On the analysis side, the recent remarkable development of machine learning technology has had a significant impact on a wide range of scientific fields, including nanotechnology. 1-3 The development of deep learning, which is trained on large amounts of data and has nonlinear and highly expressive capabilities, has been particularly noteworthy. 2 In addition to deep learning, a wide variety of machine learning analyses have become more accessible through user-friendly software and the development of new methods such as XGBoost and LightGBM. [59][60][61] Consequently, machine learning-based analysis has also attracted attention in the field of single-molecule measurements. 62 This review focuses on the development of analysis techniques for single-molecule measurements, particularly those utilising informatics approaches, which have been advancing rapidly in recent years.
First, the experimental technique is briefly described. The most common method is the break junction (BJ) method, which includes the mechanically controllable break junction (MCBJ) [63][64][65][66][67] and the scanning tunnelling microscope (STM)-BJ method, 68 represented in Fig. 2a and b, respectively. The MCBJ method involves breaking metal wires on an elastic substrate to create a nanogap, while the STM-BJ method measures the conductance of a molecule between the substrate and the STM tip. The first report of single-molecule conductance using the MCBJ method showed only a few single-molecule conductance measurements. 66 The MCBJ's ability to form stable and controllable nanogaps led to the development of methods to measure the vibrational state of molecules by changing the voltage, such as point-contact spectroscopy (PCS) and inelastic tunnelling spectroscopy (IETS). 67,69-71 These techniques require low-temperature environments for experiments, limiting them to basic research. However, after the STM-BJ method was reported in 2003, single-molecule measurements became feasible in a commercially available setup, and the conductance of various molecules was subsequently measured. 68 In this report, the authors not only used STM but also performed a statistical treatment of conductance through repeated breaking and formation of junctions, thereby increasing the reliability of conductance measurements. There are two types of conductance measurement in the BJ method. One is the I-z method, which measures the current (I) during the process of breaking the junction by continuously increasing the nanogap distance (z), as shown in Fig. 2c. The other is the I-t method, which measures the conductance as a function of time (t) after nanogap formation while keeping the nanogap distance constant, as shown in Fig. 2d.
Recently, in addition to break junctions with metallic contacts, the I-t method using graphene electrodes has attracted attention because it provides stable measurements through direct C-C bonding. 45,57,58 Methods other than conductance measurements have also been developed to investigate electronic structures, such as current-voltage (I-V) characteristic measurements, 72-76 thermoelectric voltage measurements, [77][78][79][80] and electrochemical measurement techniques. 9,81 Raman spectroscopy is used for spectroscopic measurements of vibrational states, 82,83 and shot noise is used for conduction channel measurements. 84,85 These measurement techniques have increased the amount of information obtained from single-molecule measurements. However, these elaborate experiments are costlier than simple conductance measurements.

Histogram-based analysis
In single-molecule conductance measurements, a plateau observed in the conductance trace is commonly interpreted as an indication of single-molecule conductance. However, similar plateaus are also observed in blank measurements. Although plateaus are more frequently observed in samples containing molecules, a single trace alone is insufficient to determine their presence. Therefore, histogram-based analysis is the most fundamental and important statistical analysis method for single-molecule measurements. [5][6][7][8][9][10][11][12][13] Conductance histograms are typically created by accumulating conductance traces during the rupture process, and the single-molecule conductance is then determined from the peak positions of the histogram. Although the single-molecule conductance is the most fundamental information, the peak width also contains information about the molecular junction. A series of conductance values determined from the histograms under different conditions provides more detailed information. The decay constants for a molecular series are determined from plots of single-molecule conductance against molecular length. The decay constant depends on the conduction orbital level of the molecular backbone and the broadening of the conjugated system. 86,87 Experiments are also often performed at varying temperatures. The temperature dependence of the single-molecule conductance provides information on the conduction mechanism. 39 For example, tunnelling conduction shows no temperature dependence, while hopping conduction shows Arrhenius-type thermal activation. Therefore, the conductance histogram provides basic information about the single-molecule junction under study.
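As a minimal sketch of how a decay constant might be extracted from a conductance-length plot, the following example assumes the commonly used exponential form G = G_c exp(-βL) and fits ln(G) against L by linear regression. The function name `fit_decay_constant`, the lengths, and the conductance values are hypothetical and serve only as an illustration.

```python
import numpy as np

def fit_decay_constant(lengths, conductances):
    """Fit G = G_c * exp(-beta * L) by linear regression of ln(G) on L."""
    lengths = np.asarray(lengths, dtype=float)
    log_g = np.log(np.asarray(conductances, dtype=float))
    slope, intercept = np.polyfit(lengths, log_g, 1)
    return -slope, np.exp(intercept)

# Hypothetical molecular series: lengths in nm, peak conductances in units of G0.
lengths_nm = np.array([0.8, 1.0, 1.2, 1.4])
true_beta, true_gc = 4.0, 0.5
conductance_g0 = true_gc * np.exp(-true_beta * lengths_nm)

beta, g_contact = fit_decay_constant(lengths_nm, conductance_g0)
print(f"beta = {beta:.2f} per nm, contact conductance = {g_contact:.2f} G0")
```

In practice, the conductance values would be the histogram peak positions for each molecule in the series, and the scatter of those peaks would determine the uncertainty of the fitted decay constant.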
In addition to conductance measurements, histograms are a commonly used statistical tool for other parameters related to junction stability. These parameters include the junction plateau length, retention time, and snapback distance. [88][89][90][91] The snapback distance is the distance travelled by the electrode immediately after breakage, which is defined by the difference between the elongation distance after gold junction breakage and the distance to metal junction re-formation. Additionally, a conductance-stretch length 2D histogram can provide a statistical representation of the overall trace shape. 92,93 The 2D histogram displays conductance on the vertical axis and elongation distance on the horizontal axis, which allows for a visualization of the statistical behaviour; 2D histograms reveal the presence of conduction states or illustrate the decay of conductance with increasing distance.
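The construction of 1D and 2D histograms described above can be sketched as follows. This is an illustration on synthetic data, not the procedure of any cited study: the helper `synthetic_trace`, the plateau level, the noise amplitude, and the bin ranges are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_trace(n_points=200, plateau_log_g=-3.0):
    """Hypothetical I-z trace: a noisy plateau followed by a tunnelling decay."""
    z = np.linspace(0.0, 1.0, n_points)  # stretch distance (nm)
    log_g = np.where(z < 0.5, plateau_log_g, plateau_log_g - 6.0 * (z - 0.5))
    return z, log_g + rng.normal(0.0, 0.1, n_points)

traces = [synthetic_trace() for _ in range(500)]
z_all = np.concatenate([z for z, _ in traces])
log_g_all = np.concatenate([g for _, g in traces])

# 1D conductance histogram on a logarithmic conductance axis, log10(G/G0).
bins_g = np.linspace(-7.0, -1.0, 121)
counts, edges = np.histogram(log_g_all, bins=bins_g)
centres = 0.5 * (edges[:-1] + edges[1:])
peak_log_g = centres[np.argmax(counts)]  # peak position of the histogram

# 2D histogram: stretch distance (horizontal axis) versus conductance (vertical axis).
bins_z = np.linspace(0.0, 1.0, 51)
hist2d, _, _ = np.histogram2d(z_all, log_g_all, bins=[bins_z, bins_g])
```

The array `hist2d` can be displayed as a heat map. Real traces would normally be aligned at a common reference point, such as the rupture of the metal contact, before accumulation.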

2D correlation histogram
Additionally, the 2D correlation histogram (2DCH) has proven to be a powerful tool for understanding molecular junctions with multiple conduction states. [94][95][96][97] First, each of the n conductance traces is converted into an m-bin conductance histogram. From this n × m matrix, an m × m correlation matrix is generated, which is then displayed as a 2D heat map in the 2DCH. The values in the 2DCH range from 1 for a strong positive correlation, through 0 for no correlation, to −1 for a strong negative correlation. The 2DCH makes it easy to determine the correlation between two conduction states, that is, how frequently one state occurs when the other state is observed. Examples for simulated test data are shown in Fig. 3. The two datasets exhibit similar conductance histograms. In the 2DCH, however, there is a clear difference in the cross region between the two conductance states. This information is valuable for inferring the relationship between conduction states during the breaking process.
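The steps above can be sketched with NumPy as follows. The two synthetic datasets are hypothetical and are not the data of Fig. 3: in `exclusive`, each trace shows only one of two conductance states, whereas in `together` both states appear in every trace with a common plateau length. The function name `correlation_histogram` and all numerical values are assumptions made for illustration.

```python
import numpy as np

def correlation_histogram(traces, bin_edges):
    """Return the m x m correlation matrix of per-trace conductance histograms."""
    # n x m matrix: one m-bin histogram per trace.
    counts = np.array([np.histogram(t, bins=bin_edges)[0] for t in traces], dtype=float)
    # Pearson correlation between bins (columns); empty bins have zero variance.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(counts, rowvar=False)
    return np.nan_to_num(corr)  # undefined correlations are set to 0

rng = np.random.default_rng(1)
bin_edges = np.linspace(-5.0, -1.0, 41)  # 40 bins in log10(G/G0)
HIGH, LOW = -2.05, -4.05                 # hypothetical conductance states

def plateau(level, n_points):
    return rng.normal(level, 0.03, n_points)

# Dataset 1: each trace shows either the high or the low state.
exclusive = [plateau(HIGH if rng.random() < 0.5 else LOW, 100) for _ in range(300)]

# Dataset 2: each trace shows both states with a shared, varying plateau length.
together = []
for _ in range(300):
    n = rng.integers(20, 100)
    together.append(np.concatenate([plateau(HIGH, n), plateau(LOW, n)]))

corr_exclusive = correlation_histogram(exclusive, bin_edges)
corr_together = correlation_histogram(together, bin_edges)

high_bin = np.searchsorted(bin_edges, HIGH) - 1
low_bin = np.searchsorted(bin_edges, LOW) - 1
print(corr_exclusive[low_bin, high_bin], corr_together[low_bin, high_bin])
```

Both datasets give an accumulated histogram with two peaks, but the cross region of the correlation matrix is negative for the first dataset and positive for the second.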

Machine learning
Although histograms are commonly used to analyse single-molecule measurements, they do not capture all information during the breaking process of single-molecule junctions, as represented in Fig. 4. To compensate for the large variability between individual measurements, statistical analysis is applied to single-molecule current profiles. Machine learning algorithms improve the accuracy of discrimination, regression, and clustering with multi-dimensional features. Statistical models are trained and used to identify significant individual measurements to improve data quality. As mentioned, single-molecule measurements can be collected repeatedly by breaking and forming the junction, despite the variability of the individual conductance traces. This feature makes single-molecule measurements a promising research area for machine learning applications. In particular, deep learning algorithms can optimise a large number of parameters to improve accuracy, 2 which makes them a good fit for the large amounts of data generated in single-molecule measurements.
Typical machine learning categories are described in Fig. 5. Machine learning is broadly categorised into supervised and unsupervised learning. Supervised learning is used to predict labels and numerical values for unknown data based on a data set with known labels and values. On the other hand, unsupervised learning is used to provide interpretation for data sets without explicit labels or values. In the following sections, we provide examples of the application of machine learning to single-molecule measurements and its use in related research fields.

Clustering
Clustering is a method used to partition unlabelled data into multiple groups. Commonly used algorithms include k-means, Gaussian mixture models (GMMs), and DBSCAN. 116 The k-means method takes the centre of gravity of each cluster as its centroid and assigns each data point to the cluster with the closest centroid. Application examples of k-means include the quantisation of measurement images and the clustering of nanoparticle sizes from mass spectra of nanoparticles. 117,118 GMMs cluster data by representing the probability density with multiple Gaussian distributions. GMMs are used for clustering FRET fluorescence responses and molecular structures obtained from molecular dynamics calculations. 119,120 The DBSCAN algorithm clusters data according to the local density of points, evaluated using a distance in the feature space. DBSCAN has been applied to cluster nanopore current data. 121 k-means is a simple and widely applied algorithm, but because the data are clustered by distance from the centre, it is not suitable for data with different variances between clusters or data that are not spherically distributed. GMMs remain effective when each cluster has a different variance. Both k-means and GMMs require the number of clusters to be defined in advance. DBSCAN instead requires a neighbourhood distance to be specified in advance, and it can also label data points as outliers or noise.
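The centroid-based assignment used by k-means can be sketched in a few lines of NumPy. This is a plain implementation of the standard Lloyd iteration on hypothetical two-dimensional features, not the code of any cited study; the function name `kmeans` and the synthetic data are assumptions.

```python
import numpy as np

def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with randomly chosen data points.
    centroids = points[rng.choice(len(points), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the centre of gravity of its assigned points.
        updated = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1), centroids

# Hypothetical feature vectors forming two well-separated, spherical groups.
rng = np.random.default_rng(42)
group_a = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
group_b = rng.normal([5.0, 5.0], 0.5, size=(100, 2))
features = np.vstack([group_a, group_b])

labels, centroids = kmeans(features, n_clusters=2)
```

Because the assignment depends only on the Euclidean distance to each centroid, this sketch shares the limitation noted above: elongated clusters or clusters with very different variances are not separated reliably.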
The pioneering study of the application of clustering to conductance traces of single-molecule measurements is the multiparameter vector-based classification (MPVC) process reported by Lemmer et al., 109 in which conductance traces are treated as vectors and features for clustering are extracted by transforming each trace into three characteristic quantities. A reference trace is initially selected, and each trace is transformed into a feature vector with three components: the Euclidean distance, which represents the magnitude of the difference between the trace and the reference vector; the normalised inner product, which indicates the similarity of the shapes; and the degree of fluctuation relative to the reference vector. The vectors are then clustered using an unsupervised learning algorithm. In this study, clustering was performed using the Gustafson-Kessel fuzzy clustering algorithm, which successfully distinguished, in the 3D feature space, simulated conductance traces that peaked at the same location in a conductance histogram. With fuzzy clustering, data can be assigned to multiple clusters. For the experimental data of oligophenylene ethynylene molecules, which did not exhibit a distinct peak in the conductance histogram of all traces owing to a low bridging rate, the application of MPVC enabled the identification of populations that displayed a distinct plateau. This technique is applicable not only to conductance traces but also to current-voltage characteristic curves obtained by sweeping the bias voltage during junction formation. By clustering the I-V curves of molecules with tripodal anchors, three states with varying conductive and rectifying properties were distinguished, and their respective structures were identified by comparison with theoretical calculations. 100 The MPVC method is visually intuitive because it utilises mapping to a three-dimensional feature space that reflects the shape of the trace.
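The three-component feature vector can be illustrated with the following sketch. The Euclidean distance and the normalised inner product follow directly from the description above. The third component is an assumption: here the degree of fluctuation is taken as the standard deviation of the point-wise difference from the reference, which may differ from the definition used in the original MPVC study. The function name `mpvc_features` and the example traces are hypothetical, and the traces are assumed to have equal length.

```python
import numpy as np

def mpvc_features(trace, reference):
    """Map a trace to (Euclidean distance, normalised inner product, fluctuation)."""
    trace = np.asarray(trace, dtype=float)
    reference = np.asarray(reference, dtype=float)
    difference = trace - reference
    distance = np.linalg.norm(difference)
    cosine = trace @ reference / (np.linalg.norm(trace) * np.linalg.norm(reference))
    fluctuation = difference.std()  # assumed definition, for illustration only
    return np.array([distance, cosine, fluctuation])

# Hypothetical reference trace and two test traces of the same length.
reference = np.linspace(1.0, 0.1, 50)
same_shape = 2.0 * reference  # same shape, larger magnitude
noisy = reference + np.random.default_rng(0).normal(0.0, 0.05, 50)

print(mpvc_features(reference, reference))
print(mpvc_features(same_shape, reference))
print(mpvc_features(noisy, reference))
```

The resulting three-component vectors can then be passed to any clustering algorithm, such as the fuzzy clustering used in the study.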
Nonetheless, some drawbacks exist, such as the difficulty of choosing reference vectors and of handling traces of different lengths when they are compared. Common clustering algorithms require feature vectors with identical dimensions. Hence, each conductance trace must be transformed into a vector of the same dimensions. Fig. 6 displays other clustering schemes for single-molecule measurements. The most widely applicable conversion method is to create a conductance histogram from a single trace, which can easily convert a trace of any length into a vector whose dimension equals the number of bins. 103,104,111 Vectors representing the histogram are clustered using algorithms such as k-means and spectral clustering. This method cannot distinguish between traces where the single-molecule conductance transitions from a high-conductance state to a low-conductance state and traces where the transition occurs in the opposite direction. In numerous measurements of individual molecules, the conductance tends to decrease as the distance increases. However, certain molecules, such as alkanes, display a rise in conductance just before breaking, which is attributed to gauche defects that otherwise lower the conductance. 79,122 It is essential to consider that information such as a conductance increase may be lost depending on the choice of feature vectors during the analysis. Another method using deep learning has been proposed, in which traces are treated as ordinary two-dimensional images. 99,106,110 This image-based method can directly capture changes in conductance.
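The following sketch illustrates both properties of the single-trace histogram feature discussed above: traces of different lengths map to vectors of the same dimension, and the order of the conductance states is lost. The bin range, the conductance levels, and the function name `histogram_feature` are hypothetical.

```python
import numpy as np

bin_edges = np.linspace(-6.0, 0.0, 61)  # 60 bins in log10(G/G0)

def histogram_feature(log_g_trace):
    """Convert a trace of any length into a normalised 60-dimensional vector."""
    counts, _ = np.histogram(log_g_trace, bins=bin_edges)
    return counts / counts.sum()

# Hypothetical traces: high-to-low transition, its reverse, and a shorter trace.
high_to_low = np.concatenate([np.full(80, -2.05), np.full(120, -4.05)])
low_to_high = high_to_low[::-1]
short_trace = np.concatenate([np.full(40, -2.05), np.full(60, -4.05)])

feature_a = histogram_feature(high_to_low)
feature_b = histogram_feature(low_to_high)
feature_c = histogram_feature(short_trace)

# The two transition directions give identical feature vectors.
print(np.array_equal(feature_a, feature_b))
```

A clustering algorithm operating on these vectors therefore cannot separate the two transition directions, which is one motivation for the image-based representation.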
Clustering analysis provides valuable insights lost through simple histogram generation. In our study, we utilised Grid-based DBSCAN to analyse octanedithiol conductance traces and obtained histograms that revealed multiple distinct peaks by clustering data points within each trace. 112 This approach enabled us to infer changes in the single-molecule junction structure from conductance changes during the rupture process. Clustering, a machine learning method, employs an algorithm to classify data without relying on the researcher's intention. It exposes insights that cannot be extracted through conventional histogram-based analysis.
In unsupervised learning, there is no definitive solution, and the results are highly dependent on both pre-processing and model selection. 123 In the case of clustering, a distance metric is utilised to group data points. The normalisation applied during pre-processing also impacts the clustering results. When dealing with multiple quantities of varying physical dimensions, it is necessary to normalise the data to enable proper clustering. Without normalisation, only those quantities with large numerical variations will influence the result, leading to suboptimal clustering outcomes. Standardisation is the most common normalisation technique, which involves converting each feature's mean to zero and its standard deviation to one to mitigate size-related differences in physical quantities. Additionally, some clustering algorithms require a priori knowledge of the number of clusters, and choosing an appropriate number of clusters is critical. Performance indices such as the least-squares error are not ideal for determining the number of clusters because they tend to improve as the number increases. A commonly used method involves varying the number of clusters and selecting the value that maximises or minimises a performance index that includes a penalty term depending on the number of clusters. Examples of such performance indices are the Calinski-Harabasz index, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). 111,124 Using this approach, the molecules in chemical reactions, the number of association states of nucleobases, and the recognition of small molecules have been clarified. 124 Other methods for determining the number of clusters include identifying the point at which the slope of the error with respect to the number of clusters changes, specifying a large number of classes and assigning physical meanings to each class, 125 or determining the classes from a physical model.
Furthermore, machine learning emphasises the importance of understanding the data's characteristics. Therefore, associating the data to be clustered with physical interpretations is beneficial. 123
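The effect of standardisation can be illustrated with a short sketch. The two features, a logarithmic conductance and a plateau length in picometres, are hypothetical and were chosen only because their numerical scales differ strongly.

```python
import numpy as np

def standardise(features):
    """Scale each column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # leave constant columns unscaled
    return (features - mean) / std

rng = np.random.default_rng(0)
log_conductance = rng.normal(-3.0, 0.5, 200)       # log10(G/G0)
plateau_length_pm = rng.normal(500.0, 150.0, 200)  # picometres
raw = np.column_stack([log_conductance, plateau_length_pm])
scaled = standardise(raw)

# Share of the total variance carried by each feature before and after scaling.
raw_share = raw.var(axis=0) / raw.var(axis=0).sum()
scaled_share = scaled.var(axis=0) / scaled.var(axis=0).sum()
print(raw_share, scaled_share)
```

Before scaling, almost all of the variance, and hence almost all of the Euclidean distance between data points, comes from the plateau length. After scaling, both features contribute equally.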

Dimensionality reduction
Dimensionality reduction and feature extraction are also forms of unsupervised learning. Principal component analysis (PCA) is the most commonly used method for dimensionality reduction, 99,106,[113][114][115] in which orthogonal axes are selected in order of decreasing variance, as shown in Fig. 7a. Mathematically, PCA diagonalises the variance-covariance matrix, which is the non-standardised counterpart of the correlation matrix used in the 2DCH introduced above, and reduces the number of dimensions by adopting the eigenvectors in order of decreasing eigenvalue. The magnitude of the eigenvalue corresponding to an eigenvector represents the contribution of that component. PCA is widely used because of its ability to provide a unique solution without parameter selection and its ease of interpretation. In various related fields, PCA is used for noise reduction and characteristic feature extraction from GC/MS, 126 EELS, 127 Raman, 128 and 13C-NMR spectra. 129 In single-molecule measurements, the histograms generated from individual traces have been analysed using PCA to obtain characteristic histograms. PCA is also useful for spectral analysis and has been applied to Raman spectra measured simultaneously during single-molecule measurements. 115 Other dimensionality reduction methods include sparse PCA, which emphasises differences, and non-negative matrix factorisation (NMF), which always decomposes data using vectors whose components are non-negative. 116 Although NMF is mathematically incapable of defining a unique solution, it is convenient for interpreting physical data, such as conductance histograms and spectra, that consist only of non-negative values. Nonlinear representation methods such as t-SNE and UMAP are also useful for understanding complex data. 106,110 In these algorithms, the closer two data points are in the original feature space, the closer they remain after dimensionality reduction.
Hence, these methods are powerful for the visualisation of multi-dimensional data. Neural networks have also been applied for dimensionality reduction. Autoencoders are neural networks that are trained so that the output layer reproduces the input layer. Fig. 7b shows the network structure of an autoencoder. The intermediate layer of an autoencoder has a lower dimension than the input layer, so that, once the network has been trained to reconstruct the input training data at the output, the intermediate layer provides a nonlinear dimensionality reduction. 95,107 A deep learning-based noise reduction algorithm, known as Noise2Noise, 130 has successfully reduced noise in nanopore measurements, leading to improved comprehension. 131 Noise reduction and feature extraction through dimensionality reduction are effective tools for data visualisation and improve data interpretability.
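As a minimal sketch of PCA by eigendecomposition of the variance-covariance matrix, the following example projects synthetic five-dimensional data onto the two directions of largest variance. The function name `pca` and the data are hypothetical. Each row could equally be the histogram of a single conductance trace.

```python
import numpy as np

def pca(data, n_components):
    """PCA via eigendecomposition of the variance-covariance matrix."""
    centred = data - data.mean(axis=0)
    covariance = np.cov(centred, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]    # largest first
    components = eigenvectors[:, order]
    scores = centred @ components
    explained = eigenvalues[order] / eigenvalues.sum()
    return scores, components, explained

# Hypothetical data: 300 samples whose variance lies mostly along one direction.
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
latent = rng.normal(0.0, 3.0, 300)
data = np.outer(latent, direction) + rng.normal(0.0, 0.1, (300, 5))

scores, components, explained = pca(data, n_components=2)
print(explained)  # fraction of the total variance captured by each component
```

The ratio of each eigenvalue to the sum of all eigenvalues gives the contribution of the corresponding component, as described above.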

Clustering of dimension-reduced data
Dimensionality-reduction techniques are commonly used as pre-processing methods for clustering to address the curse of dimensionality, which refers to the difficulties, such as increased computational cost and less informative distances, that arise when clustering high-dimensional input data. While supervised learning is the typical approach to discrimination, unsupervised learning techniques, such as clustering, can be used as a form of supervised learning by transforming the density estimation problem into a supervised function approximation problem through the comparison of probability densities. 116 In single-molecule measurement research, clustering with dimensionality reduction is used to solve discrimination problems, such as chemical reaction detection. 107 An autoencoder is applied for dimensionality reduction. The input and output are both I-z traces, and the intermediate layer with the fewest nodes has fewer nodes than the dimensions of the input data. The loss function between the input and output is minimised to obtain an intermediate layer that encodes the input layer, and this layer serves as the dimension-reduced feature for clustering using k-means. In the concentration-ratio identification of mixed solutions and the identification of chemical species during chemical reactions, the clustering method is applied for classification when the order of the probability densities is known. This technique enables the conversion of a conventional all-accumulated histogram with no clear peaks into two histograms with distinct peaks. In a Diels-Alder reaction system at a molecular junction, the ratio of one class decreases and that of the other class increases with time, representing the progress of the chemical reaction. In this way, machine learning is used to analyse the individual trace information that disappears during histogram creation.
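The autoencoder step can be illustrated with a deliberately small NumPy sketch. It is not the network of the cited study: the single tanh bottleneck layer, the layer sizes, the learning rate, the synthetic traces, and the function names `train_autoencoder` and `encode` are all assumptions chosen to keep the example short.

```python
import numpy as np

def train_autoencoder(x, n_hidden=2, n_steps=500, learning_rate=0.05, seed=0):
    """Train a one-hidden-layer autoencoder by gradient descent on the MSE loss."""
    rng = np.random.default_rng(seed)
    n_points = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_points, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, (n_hidden, n_points))
    b_dec = np.zeros(n_points)
    losses = []
    for _ in range(n_steps):
        code = np.tanh(x @ w_enc + b_enc)  # bottleneck layer
        reconstruction = code @ w_dec + b_dec
        error = reconstruction - x
        losses.append(np.mean(error ** 2))
        # Backpropagation of the mean squared reconstruction error.
        grad_recon = 2.0 * error / error.size
        grad_code = grad_recon @ w_dec.T
        grad_pre = grad_code * (1.0 - code ** 2)
        w_dec = w_dec - learning_rate * (code.T @ grad_recon)
        b_dec = b_dec - learning_rate * grad_recon.sum(axis=0)
        w_enc = w_enc - learning_rate * (x.T @ grad_pre)
        b_enc = b_enc - learning_rate * grad_pre.sum(axis=0)
    return (w_enc, b_enc), losses

def encode(x, encoder):
    """Return the dimension-reduced representation of each trace."""
    w_enc, b_enc = encoder
    return np.tanh(x @ w_enc + b_enc)

# Hypothetical data: 200 traces of 50 points belonging to two classes.
rng = np.random.default_rng(1)
class_sign = np.where(np.arange(200) < 100, -1.0, 1.0)
traces = class_sign[:, None] * np.linspace(1.0, 0.0, 50)[None, :]
traces = traces + rng.normal(0.0, 0.1, traces.shape)

encoder, losses = train_autoencoder(traces)
codes = encode(traces, encoder)  # features that could be passed to k-means
print(losses[0], losses[-1])
```

The two-dimensional `codes` would then be clustered, for example with k-means. A practical implementation would more likely use a deep learning framework with several layers and a validation set.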
There is a wide range of techniques for selecting features, reducing dimensionality, and clustering, and these have been extensively compared in the context of clustering I-z traces. 106 To evaluate these methods, several traces from the OPE experimental data were used to generate test traces with multiple classes. Various methods were employed, including 2D histograms, the method with reference vectors reported by Lemmer et al., 109 PCA, MDS, Sammon mapping, t-SNE, UMAP, and an autoencoder as feature extraction and dimensionality reduction techniques, and SOM, FCM, k-means, hierarchical clustering, OPTICS, GMM, and GAL as clustering methods. The Fowlkes-Mallows index was used to assess the similarity of the clustering results obtained from all combinations of the methods. The results indicated that GAL and GMM were the best clustering methods. Hierarchical clustering was not as effective, although it is sometimes easier to interpret from a physical and chemical property standpoint. Therefore, it is important to select an appropriate method based on the intended purpose. Feature selection was found to have a greater impact than the choice of clustering algorithm, with 2D histograms performing better than raw data. Nonlinear dimensionality reduction methods, such as t-SNE and UMAP, were found to achieve higher accuracy. 106 These results highlight the importance of utilising analysis methods with complex representations of feature selection for the analysis of single-molecule measurement data.
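For reference, the Fowlkes-Mallows index compares two partitions by counting pairs of data points: it is the number of pairs placed together in both partitions divided by the geometric mean of the numbers of pairs placed together in each partition. The sketch below is a generic NumPy implementation; the function names and the example labels are hypothetical.

```python
import numpy as np

def count_pairs(n):
    """Number of unordered pairs that can be formed from n items."""
    return n * (n - 1) / 2.0

def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index between two partitions of the same data."""
    a = np.unique(labels_a, return_inverse=True)[1]
    b = np.unique(labels_b, return_inverse=True)[1]
    contingency = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(contingency, (a, b), 1)  # counts of points per (cluster a, cluster b)
    together_in_both = count_pairs(contingency).sum()
    together_in_a = count_pairs(contingency.sum(axis=1)).sum()
    together_in_b = count_pairs(contingency.sum(axis=0)).sum()
    if together_in_a == 0 or together_in_b == 0:
        return 0.0
    return together_in_both / np.sqrt(together_in_a * together_in_b)

true_classes = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]  # same partition with swapped labels
partial = [0, 0, 1, 1, 1, 1]     # one point assigned to the wrong cluster

print(fowlkes_mallows(true_classes, relabelled))
print(fowlkes_mallows(true_classes, partial))
```

The index does not depend on how the clusters are numbered, which makes it suitable for comparing the output of a clustering algorithm with known classes of test traces.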

Supervised learning
In supervised learning, the model is provided with both the training data and the correct answer, and it uses this information to predict the objective variable for unknown data. Regression, 132-135 which predicts continuous values, and classification, 33,136-147 which predicts categorical values, such as chemical species prediction, are included in supervised learning.

Regression
In related fields, regression has been employed to predict the performance of organic solar cell devices 148,149 and the toxicity of nanoparticles. 150 These prediction models aid in the establishment of efficient experimental procedures. Simple linear regression, the most straightforward regression method, is widely used, including for single-molecule measurements. Machine learning regression employs multi-dimensional input data, such as feature vectors obtained by converting molecular structures into molecular fingerprints. 151 Furthermore, graph neural networks, which are trained directly on graphs and make predictions from them, have been developed in the machine learning field. 152,153 Because molecular structures can be represented as graphs, the ability to predict physical properties directly from molecular structures, without the need for a chemist's input, is evolving.
In the field of single-molecule conduction, machine learning plays an increasingly important role in theoretical calculations to reduce calculation costs. 132,135 Convolutional neural networks trained on the outcomes of molecular dynamics (MD) and non-equilibrium Green's function (NEGF) calculations are employed to predict the experimental conductance more efficiently than conventional, first-principles, direct calculations of transport properties. Additionally, the regression of experimental data for single-molecule measurements has been reported, with a machine learning-based regression model constructed to predict the conductance of a molecule. 134 To represent information about the molecule, descriptors such as the gradient of kinetic energy, surface integral of kinetic energy, van der Waals surface area, bond stretch energy, and LUMO are used as explanatory variables. A set of 117 molecules was subjected to support vector machine (SVM)-based training and regression, yielding a correlation coefficient between the predicted and experimental conductance of 0.95 for the training set and 0.78 for the blind test set. Support vector regression is an extension of the SVM classifier, which determines discriminative boundaries to maximise the margin from each data point. SVM usually requires tuning of hyperparameters such as the regularisation parameter and the kernel selection. SVM is effective when the sample size is not large. Furthermore, a regression model using XGBoost was developed to predict the snapback distance and plateau length based on the trace geometry and conductance, with the aim of gaining insight into the phenomenon of single-molecule junction breakage. 133 XGBoost has also shown the highest accuracy among the many algorithms applied in NMR chemical shift prediction models. 154
The regression model was constructed using five features, including the maximum conductance at the metal junction formation and the length near 1 G0, rather than more general features such as the single-trace histogram described in the clustering section. The machine learning techniques utilised in this study, namely, XGBoost, AdaBoost, and Random Forest, train many weak regressors with low accuracy on a subset of features and make predictions by combining the outputs of these regressors. The importance of each feature was determined based on the errors of the weak regressors. The importance comparison of the feature values in this study revealed that the distance to a metal junction rupture is a crucial factor in snapback distance prediction, whereas the plateau length is not a particularly significant parameter. This supports a model in which the molecular junctions are formed before the gold atom junction breaks. Thus, supervised learning can be applied not only to quantitative prediction but also to the interpretation of physical phenomena.

Classification
Another common type of supervised learning involves classifying data into categories. Machine learning classification is widely employed in image recognition and other applications. 155,156 As an analytical technique, supervised learning that predicts a definite correct answer is a powerful means of object identification. Nanopore measurement is a technique that enables the measurement of a single particle by detecting the changes in ionic current that occur when the particle passes through a nanopore. [157][158][159] The data obtained through the I-t method of single-molecule conductance measurement, which involves continuously measuring the conductance while maintaining a fixed gap distance, are analogous to the current data obtained in nanopore measurements, because in both cases current changes are observed only when a single molecule or particle passes through the gap or nanopore. In nanopore measurements, machine learning classification is used to categorise viral species by training on their current profiles. 160,161 DNA sequences are also analysed by applying a recurrent neural network (RNN) to the current changes measured as DNA passes through the nanopore. 162 Machine learning has proven effective in identifying signals obtained from single-molecule measurements of DNA nucleobases and amino acids. Lindsay et al. utilised the STM-BJ and I-t methods of single-molecule measurements with molecular modifications to measure the conductance of nucleobases, achieving high accuracy in identifying DNA nucleobases. 136 Single-molecule measurements, which directly measure the tunnelling current through a single molecule in a gap, have the potential to be applied to a variety of molecules. Conductance differences among amino acids were also observed using single-molecule measurements. 32 The same method was applied to amino acids, and using SVM, 33 they successfully distinguished D-Asn from L-Asn, Gly from mGly, and Leu from Ile with accuracies of 0.87, 0.95, and 0.80, respectively. 
This method classified the single-molecule signals by analysing clusters of signals rather than individual signals. One of the ultimate goals of single-molecule measurements of DNA, RNA, and amino acids is to develop sequencing methods that identify individual signals rather than groups of signals. Our research group identified individual single-molecule signals by classification with a supervised learning method, random forests. 137 For the machine learning analysis, the features were the averages of the current signal in each of the regions obtained by partitioning the signal along the time axis. The four DNA nucleotides were classified with an F-measure of 0.83. The F-measure is a performance measure for machine learning classification and is defined as the harmonic mean of precision and recall (sensitivity). This statistical process allowed the identification of single signals derived from single molecules. Furthermore, the targets of this method extend beyond the four DNA nucleobases to include modified bases that are expected to be cancer markers, oligo DNA with varying base lengths, and neurotransmitters. [141][142][143][144] This method also enables single-molecule identification in cases where discrimination by conductance alone is ineffective because multiple molecules with similar conductance are present. Using this method, a mixture of dG and modified bases, the latter being cancer markers, was measured, and the obtained signals were individually discriminated using a machine-learning classifier trained on each solution as shown in Fig. 8. 143 Concentration ratios were determined from the predicted class ratios. Mixed solutions of modified bases and dG with concentration ratios of 1 : 3 and 3 : 1 were measured, yielding predicted ratios of 1 : 4.0 and 2.7 : 1, respectively.
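The feature construction and the evaluation metric described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the number of time segments, the function names (`segment_means`, `f_measure`, `macro_f_measure`, `predicted_ratio`), and the toy labels are all hypothetical. The F-measure is computed in its standard form, the harmonic mean of precision and recall, and the macro average over classes is one possible way to summarise a multi-class result.

```python
def segment_means(signal, n_segments=10):
    """Average the current in n_segments consecutive windows along time."""
    n = len(signal)
    if n < n_segments:
        raise ValueError("signal is shorter than the number of segments")
    features = []
    for k in range(n_segments):
        start = k * n // n_segments
        stop = (k + 1) * n // n_segments
        window = signal[start:stop]
        features.append(sum(window) / len(window))
    return features

def f_measure(true_labels, predicted_labels, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f_measure(true_labels, predicted_labels):
    """Unweighted mean of the per-class F-measures."""
    classes = sorted(set(true_labels))
    return sum(f_measure(true_labels, predicted_labels, c) for c in classes) / len(classes)

def predicted_ratio(predicted_labels, class_a, class_b):
    """Ratio of predicted signal counts, used as a concentration-ratio estimate."""
    count_b = predicted_labels.count(class_b)
    if count_b == 0:
        raise ValueError("no signals predicted for class_b")
    return predicted_labels.count(class_a) / count_b

# Toy values for illustration only (not measured data).
features = segment_means([1.0, 1.0, 3.0, 3.0], n_segments=2)
truth = ["dG", "dG", "dG", "dA"]
guess = ["dG", "dG", "dA", "dG"]
score = f_measure(truth, guess, positive="dG")
```

Because every signal is reduced to the same number of segment averages, signals of different durations yield feature vectors of equal length, which is what a classifier such as a random forest requires.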
Neural networks, which have high expressive power when trained on big data, have also been utilised for the identification of single-molecule measurement data. Venkataraman et al. converted each conductance trace obtained by the I-z method of STM-BJ into a conductance histogram and trained a neural network on the conductance histogram features, resulting in 93% accuracy in discriminating molecular traces from tunnel traces. 140 In this study, the neural network classifier was trained with more than 100 000 traces and achieved high classification accuracy. This study demonstrates the potential of machine learning for the efficient analysis of large amounts of data. Classification using recurrent neural networks (RNNs) has also been reported. 146 In the I-z traces of the BJ method, metal and molecular junctions break at irregular points, so the lengths of the data cannot be aligned. However, RNNs are applicable to variable-length input data such as those used in speech identification. 163 In this method, RNNs were trained on normalised minimum cross-sectional time series data from MD simulations, and the classes of experimental conductance traces were predicted. The results showed that RNNs classify variable-length traces and provide a tool for recognising characteristic motifs in traces that are difficult to find using simple data-selection algorithms. Table 1 summarises the results of machine learning of current data from single-molecule measurements. In general, deep learning is considered capable of handling big data. However, it is difficult to determine a general algorithm because the classification results depend on the nature of the data, such as the dimension and the similarity between classes. It is also important to increase the interpretability of the data or reduce computational costs, even if this slightly reduces discrimination accuracy. A clear purpose and the choice of a method suited to that purpose are necessary.
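The conversion of a single trace into a fixed-length histogram feature, as described above, might look like the following sketch. The bin range, the number of bins, the normalisation, and the function name `trace_histogram` are assumptions made for illustration; the cited study may have used different settings.

```python
import math

def trace_histogram(conductance, log_min=-6.0, log_max=1.0, n_bins=70):
    """Normalised histogram of log10(G/G0) for one trace.

    conductance: sequence of conductance values in units of G0.
    Non-positive values and values outside [log_min, log_max) are ignored.
    """
    counts = [0] * n_bins
    width = (log_max - log_min) / n_bins
    for g in conductance:
        if g <= 0:
            continue
        idx = int((math.log10(g) - log_min) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]

# Two toy traces of different length (values are illustrative, not measured).
short_trace = [0.5, 0.5, 3e-4, 3e-4, 3e-4, 3e-4]
long_trace = short_trace * 5
h_short = trace_histogram(short_trace, n_bins=7)
h_long = trace_histogram(long_trace, n_bins=7)
```

Both traces map to vectors of the same length regardless of how many points they contain, so a histogram representation sidesteps the alignment problem at the cost of discarding the time ordering that an RNN can exploit.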

Weakly supervised learning
We identified two main types of machine learning, supervised and unsupervised learning. However, there exists a third category called weakly supervised learning that integrates features from both the aforementioned categories. One example of weakly supervised learning is the positive and unlabelled data classification (PUC) approach. 164,165 Its objective is to identify data in the same manner as in supervised learning. However, unlike in supervised learning, where the training data are fully labelled with known correct answers, in weakly supervised learning the training is conducted without complete labels, that is, objective variables. The PUC is trained using two types of samples. The first sample includes only one class, the positive signals, and the origin of the data is known, as in supervised learning. The second sample contains a mixture of positive and negative signals; however, the data are unlabelled and the two classes are indistinguishable. The PUC is trained on both samples and used to classify the unlabelled data into the positive and negative classes, as represented in Fig. 9a. As the specific characteristics of single-molecule junctions are often unknown based on the available samples, this approach is useful because of its ability to identify unknown data within the available data. 124,137,144 In this algorithm, a classifier is first trained to predict the probability that a data point is labelled; the class of the unlabelled data is then learned and predicted by weighting the data with these predicted probabilities. The algorithm can identify protein records that should be included in an incomplete specialized molecular biology database. 164 Fig. 9b shows an example of the application of PUC in single-molecule measurements. Our research group has developed a novel approach to enhance the accuracy of DNA nucleotide identification by eliminating signals that are present even in blank measurements. 137 During single-molecule measurements, a telegraphic noise-like signal, which may be attributed to changes in the electrode structure or contamination, is occasionally observed even in blank measurements. These noise signals are also presumed to be present in DNA nucleotide measurements. However, it is difficult to distinguish between noise signals and signals derived from the sample. To address this issue, we utilised the PUC method to remove noise signals accurately. In this approach, the noise signals are classified as positive and the sample-derived signals are classified as negative. The blank data contained only the positive class, whereas the sample data included both the positive and negative classes, which were unlabelled and unknown. Noise-derived signals were removed from the sample data by learning and discriminating using PUC. PUC-based noise reduction improves the discrimination accuracy described in the supervised learning section. Because the conductance of noise signals is typically lower than that of molecular signals in such measurements, molecular signals can also be identified with a current criterion applied to the extracted peak currents. However, the machine learning-based approach reduces the arbitrariness associated with criteria selection. Moreover, this method can detect sample-derived signals that would otherwise be false negatives. Our PUC-based analysis has the potential to identify signals that cannot be analysed using conventional methods.
[Table 1 footnotes: (a) Signals, clusters, and traces denote I-t pulse signals, clusters of I-t pulse signals, and I-z traces, respectively. (b) The performance index is the F-measure.]
Furthermore, it is important to note that noise signal contamination is not solely caused by the nature of single-molecule measurements. Since single-molecule measurements provide the advantage of directly measuring molecules, a direct measurement technique for biological samples is highly desirable. 
Consequently, single-molecule measurements are often conducted in the presence of various molecules other than the target molecule, which contribute to noise signals. Our research group successfully identified neurotransmitters in biological samples using PUC to learn from samples derived from both biological and pure substances. 144 By training the PUC with signals from pure solutions as positive data and signals from biological samples as unlabelled data, the neurotransmitter signals in the biological samples were extracted. The extracted neurotransmitter signals were then discriminated using supervised learning to obtain the concentration ratios of neurotransmitters in the biological samples. This approach is promising for analysing complex biological samples and enables the direct detection of target molecules.
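To make the PUC idea concrete, the following is a minimal sketch of one well-known formulation of positive-unlabelled learning: a classifier is trained to separate labelled from unlabelled data, and its output is rescaled by the estimated probability that a true positive is labelled. It is a toy, one-feature illustration on synthetic Gaussian data, not the analysis used in the studies above. The logistic model, the constant `c` estimated on the training positives (a simplification; a held-out set would normally be used), the 0.5 cut-off, and all names are assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, labels, lr=0.1, n_iter=2000):
    """One-feature logistic regression p(s=1|x) = sigmoid(a + b*x), batch gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for x, s in zip(xs, labels):
            err = sigmoid(a + b * x) - s
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def pu_posterior(labelled_positive, unlabelled):
    """Estimate p(positive | x) for each unlabelled point.

    s = 1 marks "labelled" data and s = 0 marks "unlabelled" data.
    c approximates p(s=1 | positive) as the mean classifier output on the
    labelled positives; dividing by c rescales p(s=1|x) to p(positive|x).
    """
    xs = labelled_positive + unlabelled
    s = [1] * len(labelled_positive) + [0] * len(unlabelled)
    a, b = fit_logistic(xs, s)
    c = sum(sigmoid(a + b * x) for x in labelled_positive) / len(labelled_positive)
    posterior = [min(1.0, sigmoid(a + b * x) / c) for x in unlabelled]
    return posterior, c

# Synthetic example: "positive" = noise-like signals, as in a blank measurement.
random.seed(1)
blank = [random.gauss(0.0, 1.0) for _ in range(200)]            # labelled positives
sample_noise = [random.gauss(0.0, 1.0) for _ in range(100)]     # hidden positives
sample_molecule = [random.gauss(4.0, 1.0) for _ in range(100)]  # hidden negatives
unlabelled = sample_noise + sample_molecule

posterior, c = pu_posterior(blank, unlabelled)
# Keep only the signals judged unlikely to be noise.
kept = [x for x, p in zip(unlabelled, posterior) if p < 0.5]
```

The same structure applies when the roles are reversed, for example with pure-solution signals as the labelled positives and a biological sample as the unlabelled set; only the interpretation of the retained class changes.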
PUC is a powerful analytical technique for identifying novel states. Its application to data obtained from single-molecule measurements has enabled quantitative evaluation of the aggregation ratio between small molecules and nucleobases. 124 In a solution containing a mixture of small molecules and nucleobases, both aggregated and unaggregated molecules are present. The experimental isolation of the aggregated state is difficult. The signals of the small molecules and of the nucleobases, each measured separately, were treated as positive data, and the signals of the mixed solution were treated as unlabelled data. PUC can then classify and detect the signals of the aggregated state, which is present only in the mixed solution. Through this analysis, we confirmed, at the single-molecule level, that the aggregation ratio was larger for small molecules with more hydrogen bonding sites for guanine. The number of associated states was determined by clustering the signals of guanine and guanine recognition molecules using unsupervised learning, and the optimal number of clusters was determined using the BIC. Single-molecule measurements with machine learning-based analysis provide insights into molecular interactions at the microscopic level and support the development of molecular design guidelines for new drugs. Therefore, machine learning has the potential to not only classify known labelled signals, but also contribute significantly to the discovery of unknown states.
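Model selection with the BIC, mentioned above for choosing the number of clusters, can be sketched as follows. The criterion itself is standard (number of parameters times ln n, minus twice the maximised log-likelihood, lower being better); the log-likelihood values, the parameter counts for a one-dimensional Gaussian mixture, and the function names are hypothetical and serve only to show the arithmetic.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

def select_by_bic(candidates, n_samples):
    """candidates maps n_clusters -> (log_likelihood, n_params); return the best n_clusters."""
    return min(candidates, key=lambda k: bic(candidates[k][0], candidates[k][1], n_samples))

# Made-up log-likelihoods for mixtures with 1, 2, and 3 one-dimensional Gaussian
# components (3K - 1 free parameters: K means, K variances, K - 1 weights).
candidates = {1: (-2000.0, 2), 2: (-1800.0, 5), 3: (-1795.0, 8)}
best_k = select_by_bic(candidates, n_samples=1000)
```

In this made-up case the third component improves the likelihood only marginally, so the parameter penalty makes the two-cluster model the preferred one.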

Conclusions and future perspective
In summary, the use of machine learning to develop analytical techniques for single-molecule measurements has substantially increased the amount of information obtained beyond the conductance that is typically determined using conventional histogram-based analysis. Discriminating between multiple similar states obtained from the measurements of a single type of molecule, extracting characteristic features from multiple measurement data, and identifying the molecular species measured with single-molecule measurements have been achieved through machine learning-based analysis. These methods play a major role in understanding single-molecule conduction and utilising single-molecule measurements as a new biomolecule detection technique. The development of analytical techniques is essential for the ultimate goals of single-molecule measurement, such as the creation of molecular devices, the investigation of novel phenomena at the nanoscale, and the development of novel molecular detection methods, owing to the advantage of single-molecule resolution.

Unsupervised learning
Measurements of various physical properties, including thermal and vibrational spectra, are performed in single-molecule experiments. The impact of noise is expected to be significant in these measurements because the measured quantities are extremely small. Numerous noise reduction methods have been developed in the field of informatics and applied to chemical measurements. [126][127][128][129][130][131]166,167 Basic techniques involve dimensionality reduction via PCA, [126][127][128][129] whereas more advanced techniques include dimensionality reduction using autoencoders and the Noise2Noise method mentioned above. 167,168 These noise reduction methods can also be applied to single-molecule experiments.
Since single-molecule measurement data often consist of various types of current traces, clustering is useful for understanding single-molecule phenomena. Because clustering is a heuristic method with no explicit answer, it is essential to select appropriate features. It is desirable to establish an appropriate feature selection method according to the physical properties of the target molecules. Furthermore, clustering methods that directly calculate the distance or similarity between current traces are expected to be utilised for proper interpretation.

Supervised learning
Classification and identification of single-molecule current data are expected to expand to a variety of measurement targets. Further research is necessary for practical applications. It is desirable to identify nucleobases and amino acids in DNA, RNA, and protein sequences. In addition to the conventional identification of individual molecules, identification technology for molecules in sequences is essential. Furthermore, since generalization performance needs to be improved for a wide range of applications, it is necessary to eliminate differences among devices. For this purpose, refinement of the device fabrication process or learning from large-scale data that include device differences will be effective. Physical insights gained from the feature dependence of the discrimination accuracy would also be helpful for versatile application.
Regression models are useful for searching for high-performance single-molecule junctions as molecular devices and for identifying the origin of unknown signals found with PUC. Although a regression model for predicting single-molecule conductance has already been reported, 134 a more precise model is necessary and can be achieved by training on a larger dataset. To enable the identification of unknown substances, it is necessary to develop a machine learning model capable of analysing complex data, a data assimilation method, and a large database of single-molecule conductance measured using a precise single-molecule measurement methodology.
The purpose of applying regression models is not only to predict precise values from large data sets, but also to efficiently search for optimal conditions from small data sets. In related fields, Gaussian process regression has been reported to obtain high-yield or optimized results with a minimal number of experimental or computational trials. 169,170 Applying Gaussian process regression to the optimisation of measurement conditions and device fabrication methods could lead to more efficient and stable experiments.

Weakly supervised learning
Single-molecule junctions possess two metal-molecule interfaces, with the metal surface displaying properties distinct from the bulk, such as catalytic activity. 13 Additionally, an immense electric field is applied by a bias voltage in the nanometre-scale gap. 47 The unique nature of single-molecule junctions is expected to yield unparalleled chemical reactions. A previous study using the STM-BJ method reported that the rate of a Diels-Alder reaction was enhanced by the electric field across the nanogap. 47 In that study, identification of the molecules before and after the reaction was based solely on conductance. The implementation of machine learning-based analytical techniques will improve discrimination accuracy and facilitate the discovery of unknown phenomena. Techniques for identifying novel phenomena, such as PUC, are applicable not only to determining the rates of reactions that also occur in bulk but also to discovering new chemical reactions. Furthermore, these methods are helpful in identifying molecules that perform specific roles in a sample comprising a multitude of molecules.

Application of novel methods
In related fields, there have been efforts to efficiently search for optimal experimental conditions through the utilisation of reinforcement learning. 171,172 These applications have the potential to aid in the discovery of appropriate experimental conditions for single-molecule measurements and facilitate the generation of more reliable data.
Machine learning has significantly improved the accuracy of discrimination in single-molecule measurements. In addition to expanding the applications of analytical techniques, exploring suitable experimental environments for measurement and analysis is becoming increasingly important. We previously demonstrated that modifying nanogap electrodes improves identification accuracy, even for molecules that cannot be distinguished using conventional machine learning methods alone. 36 Further progress in both statistical analysis methods and novel, precise measurement techniques will be necessary to achieve these goals.

Author contributions
Conceptualization, Y. K. and M. T.; Visualization, Y. K. and J. R.; Writing - original draft, Y. K.; Writing - review & editing, Y. K. and M. T. All authors have approved the final version of the manuscript.

Conflicts of interest
There are no conflicts to declare.