Machine learning techniques applied for detection of nanoparticles on surfaces using Coherent Fourier Scatterometry

We present an efficient machine learning framework for the detection and classification of nanoparticles on surfaces, measured in the far field with Coherent Fourier Scatterometry (CFS). We study silicon wafers contaminated with spherical polystyrene latex (PSL) nanoparticles (with diameters down to $\lambda/8$). Starting from the raw data, the proposed framework performs the pre-processing and particle search. Unsupervised clustering algorithms, namely K-means and DBSCAN, are then customized to define the groups of signals attributed to a single scatterer. Finally, the particle count versus particle size histogram is generated. The challenging cases of high scatterer density, noise and drift in the dataset are treated. We take advantage of prior information on the size of the scatterers to minimize false detections and, as a consequence, provide higher discrimination ability and more accurate particle counting. Numerical and real experiments are conducted to demonstrate the performance of the proposed search and cluster-assessment techniques. Our results illustrate that the proposed algorithm can detect surface contaminants correctly and effectively.


Introduction
Much research on detection and localization of deep-subwavelength objects based on optical scattering has been done, covering a wide range of particle types such as viruses, bacteria, dust and nanofabricated features [1][2][3][4]. Regardless of the various approaches, the physical principle that underlies these studies remains the same. By analysing the light that is scattered to the far field after being reflected from a surface containing nanoparticles or other types of contamination, one tries to get information on the density, size and material of these nanoparticles [5]. In the context of the semiconductor industry, we can think of unwanted contamination on silicon wafers at the nanometer scale. This contamination can occur at different stages of the lithography process, and it is important to check blank and patterned wafers as well as the mask (reticle) itself. Reticle quality and reticle defects continue to be a top industry risk [6]. To ensure quality and high yield in semiconductor manufacturing, contamination due to isolated particles in the size range from 20 nm to 1 µm in diameter should be detected and, if possible, localised and removed.
The main techniques to study these nanometer-size features are scanning electron microscopy (SEM) and dark- and bright-field microscopy. For electrically conductive materials, surface analysis in reflection mode is straightforward with SEM. If the scattering objects are buried inside the structures, transmission electron microscopy (TEM) or scanning TEM (STEM), using a beam or a focused spot of electrons, can be used [7]. With these techniques, sub-nanometer resolution can be achieved; however, they are hard to implement in the production line and are generally considered to be slow. In addition, if a relatively high beam current and acceleration voltage for the electrons are used, analysis with SEM can also produce cracks on the surface or permanent thermal damage.
Dark-field techniques, where only the scattered portion of the light is captured, are in turn powerful tools for high-throughput analysis. State-of-the-art systems work with bare wafers, smooth and rough films, and deliver defect detection sensitivity aimed at the 7 nm logic and advanced memory device nodes [8]. Since the directly reflected light is eliminated from the measured field, the incident power has to be high to produce enough scattering and a sufficient signal-to-noise ratio (SNR). Hence, similar to SEM, in dark-field measurements there is a potential to alter or damage the sample under study due to thermal effects [9].
Bright-field techniques, where both the reflected and the scattered light from the surface are measured, solve the issue of sample damage since they use very low incident power. However, similarly to dark field, the small inherent scattering and consequently low SNR limit the sensitivity. In this context, it is hard to detect tiny particles, with diameters < 100 nm, in bright-field mode using visible wavelengths.
To solve this issue and to allow for the detection of such particle sizes, researchers have proposed various methods, including interferometric ones such as label-free interferometric reflectance imaging (IRIS) or interferometric scattering microscopy (iSCAT) [10], and non-conventional sensing with optical forces in optical pseudoelectrodynamics microscopy (OPEM) [2]. Another family of techniques suitable for studying nanotechnology materials in the far field is based on Quantitative Phase Imaging (QPI) [11]. The method of optical interferometric microscopy, in particular, has demonstrated an outstanding result in detecting 20 nm wide defects in patterned wafers [12]. A volumetric (3D) analysis for processing focus-resolved images of defects is enabled via the combination of scattered-field optical microscopy and through-focus scanning optical microscopy; the results include the detection of sub-20 nm patterned defects [13]. Alternatively, one can obtain high sensitivity at low illumination power by measuring the light that is scattered from the particles to the far field in a smart way, such that the SNR can be improved compared to dark-field techniques. This is the core of the technique used in this paper, namely Coherent Fourier Scatterometry (CFS): it is low cost, robust, and suitable for the detection of polystyrene latex (PSL) nanoparticles down to 50 nm in diameter, and possibly even smaller ones [14][15][16][17].
For the detection of very small particles using CFS, it is crucial to optimize the entire system. Reliable numerical tools have been developed to understand the parameters that influence the scattering process, such as polarization and beam shaping, and how the data should be collected. Experimentally, besides a robust design, improvements to the detection system (such as noise suppression by introducing a heterodyne detection system and beam shaping [14,18]) have been implemented. Finally, the data processing is of extreme importance, and this is the main subject of this paper.
The scatterometry data become useless if the algorithm that treats them cannot effectively discriminate between the different particle sizes present on a particular surface. One complicating factor is that, besides the inherent noise related to the detection of light, in a production environment data can be corrupted by several other sources of noise and artefacts. In addition, the presence of contamination spanning a wide size range severely complicates the analysis of individual particles. In the worst-case scenario, if the measured data is examined in the wrong way, it can lead, for example in the case of lithography, to a drop in system productivity. Finally, taking into account the growing amount of data, techniques such as CFS lack tools to process raw data sets automatically and effectively. Recently, to overcome the challenges of detecting and classifying smaller particles, machine learning methods, including a regularized matrix-based imaging framework [19], principal component analysis [20] and convolutional neural networks [21], have been applied to image-based defect detection.
The objective of this paper is to develop a full framework for particle size classification in scatterometry data, consisting of pre-processing, signal search and histogram formation, with an algorithm that can be directly targeted at data corrupted with noise and drift, as well as at samples containing mixed particle sizes. For this framework, we relied on established noise-removal and unsupervised clustering techniques and adapted them to detect the nanoparticles. We developed a parametrized search by thresholding that picks the differential signal shape (raw data from the scatterometer) and relates it to the sought information (size distribution and location of the particles). Using these techniques, we show that the nanoparticles can be accurately quantified, even in the case of high densities. With sufficient resolution, a sample containing a mixture of nanoparticles of 60, 80 and 100 nm has been analysed in conditions where the data set had considerable noise and drift due to the scanning issues related to the CFS technique. Our framework enables the demanded automatic analysis of the scatterometry data and facilitates the validation of the detection results.
The paper is organized as follows. In Section 2, a brief overview of the measurement process and data is presented. Section 3 contains a description of sub-problems for the data set analysis. Section 4 describes the proposed algorithms of pre-processing, search, feature extraction and unsupervised clustering, and their computational complexity. Section 5 shows the experimental results with the framework implementation and compares the accuracy of several classification algorithms incorporated in the scheme. Sections 6 and 7 finalize the paper with the discussion and conclusions, respectively. A summary of the functions used in this paper is given in Table 1.

Measurements with CFS are done by raster-scanning a ≈ 1 µm tightly focused spot (λ = 405 nm, NA = 0.9) over the surface of interest. The Fourier plane of the objective, obtained in reflection, is imaged on the balanced detector. The sample is mounted on a 3D piezo-electric stage whose position can be controlled with sub-nm precision (P-629.2CD by Physik Instrumente). To reduce the amount of recorded data, and to increase speed and SNR, the Fourier plane is divided into two halves (perpendicular to the scan direction), which are subtracted from each other using a balanced split detector (see Figure 1). In this way, for every X-Y scanning position, only one current value is obtained and stored as one point in the 2D scatterometry map. The differential detection allows a high SNR because the contribution from the rough background is minimized. If a clean part of the surface is analyzed, the acquired signal is virtually zero. For areas containing particles, the specularly reflected light from the surface and the light scattered by the particle interfere at the detector. The total far field in the presence of the particle becomes asymmetric as the particle is scanned through the focused beam. This implies that the left half of the field in the pupil is different from the right half, generating a nonzero photocurrent at the split detector. The recorded signals from the photodetector are the basis for the scattered maps (2D X-Y distributions) [22]. One of the significant advantages of the CFS approach is its high sensitivity in localizing the centre of the particle in both the transverse (X-Y) and longitudinal (Z) planes. When the probe is focused on the interface, scanning across a spherical nanoparticle on the surface renders the so-called balanced pulse (positive and negative lobes of equal intensities, see Figure 2 B)). The zero-crossing of this pulse corresponds to perfect alignment between the centre of the nanoparticle and the focused spot [23] (green point in Figure 2 B)). Defocus produces an unbalance of the signal as well as a drop in the SNR, as has been demonstrated in Ref. [23].

Figure 2 caption: A) On the left, the first scanning line coincides with the edge of the nanoparticle, and consequently the differential signal will appear in 5 consecutive lines, with the red lines providing small amplitude of the signal. On the right, if there is an offset between the first scanning line and the edge of the particle, the signal due to this particle will be spread over fewer lines. B) A typical amplitude distribution of the recorded differential signal in time as one line containing the particle is scanned in the X direction: first a maximum and then a minimum, with the positive lobe equal to the negative lobe (balanced signal). The time axis is related to the X axis by $x = v t$, with $v$ being the scanning speed and $L$ the length of one scan line. Red lines constrain the features of width $\tau$ and amplitude max − min of the differential signal. C) An example of a particle response as a collection of subsequent scans in X (separated by Δy), called a scattered map.
When analysing the surface in a raster-scanning fashion, one needs to choose the proper displacement step Δy between the parallel lines of scanning. The bigger the step, the less time it takes to cover the complete area, but the downside is that particles can be missed. The simplified picture showing the relationship between the scanning step Δy and the particle size is shown in Figure 2 A. Naturally, if one wants to detect contamination of e.g. 50 nm, the step between scanning lines should be set lower than the particle diameter, for instance Δy = 10 nm. For relatively big particles, one can expect that every time the probe interacts with the particle, high-enough scattering will be produced, and thus the estimate for the number of lines where the particle is visible equals $N = D/\Delta y$, with $D$ being the diameter of the particle. N.B. the SNR is the defining factor for the effective number of lines, and the influence area of the particle is bigger than its physical size, according to its scattering cross-section. Yet, the outlined picture highlights the idea that there is a certain minimum and maximum expected number of useful signals that would emerge from different settings of Δy. For tiny particles < 80 nm, using the wavelength λ = 405 nm, one can expect that the scanning lines that go through the edges of the particle have much lower SNR (see Figure 2 A and C), because the amplitude of the scattering becomes small as the focused spot moves away from the center of the particle. Furthermore, mismatches between the position of the particle and the scanning step might occur (dashed lines in Figure 2 A), and in this case even fewer scan lines containing signals due to the particle are obtained. The rule of thumb is to have at least two signals that come from an isolated particle and are distinguishable from the background. Finally, the nanoparticles are generally classified based on their dimensionality, where the size of the calibrated sphere is associated with the features of max − min amplitude and time-width of the measured differential signal (Figure 2 B).
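As a minimal numeric illustration of the line-count rule above (the expected number of scan lines is roughly the particle diameter divided by the line step), consider the following sketch; the function name is ours, not from the paper:

```python
# Illustrative sketch (not from the paper's code): upper-bound estimate of the
# number of scan lines in which a particle of a given diameter is visible,
# for a chosen displacement step between scan lines.
def expected_lines(diameter_nm: float, step_nm: float) -> int:
    if step_nm <= 0:
        raise ValueError("scan step must be positive")
    return int(diameter_nm // step_nm)

# A 50 nm particle scanned with a 10 nm line step appears in about 5 lines:
print(expected_lines(50, 10))  # -> 5
```

In practice, as noted above, the effective count is set by the SNR and can be lower when the particle sits between scan lines.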

Sub-problems
The task of detecting and classifying particles using scatterometry data can be split into sub-problems. In this section, we discuss these sub-problems: pre-processing of the data, finding the particle-like signals, estimation of the width, and cluster assessment.

Pre-processing
The goal of the pre-processing task is to prepare the raw sampled data for the further steps. Commonly, a DC bias and sometimes baseline fluctuations in the signal at the detector can occur due to vibrations and other experimental factors. For the removal of the various electronic noise contributions, low-pass (LP) filtering, notch filtering and wavelet-based subtraction were applied.

Selection of suitable Amplitude and Width
We use two parameters for the object detection: the amplitude ((max − min)/2) and the width $\tau$ of the complete differential signal due to a single particle in the time domain (see Figure 2 B). We look for an algorithm that is robust to non-particle signals that can be present in the data. Examples of such signals include environmental vibrations or large defects present on the surface of the material, which we consider as false detections.
In the scan direction X, multiple particles may be present on one scan line, since the density of particles can be high in some areas of the surface. Multiple maxima and minima need to be determined on one scan line, sorted, and the relative distance between the different signals needs to be determined. Each "transition" between a maximum and a minimum is associated with the corresponding zero-crossing position at the middle of the signal. Finally, fine adjustment is needed to define the width of the particle-like signal accurately; this is done by parametrizing it such that it can be distinguished from noise or another signal in the data set. Next, one needs to take care of particle signals appearing at the border of the scan line. In this case, if only one of the two lobes (positive or negative) was sampled, the algorithm should estimate the complete width of the pulse.
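The maximum-minimum pairing and the zero-crossing estimate can be sketched as follows; this is a simplified numpy illustration (not the authors' MATLAB routine), with the amplitude threshold A and width tau as assumed hyperparameters:

```python
import numpy as np

# Sketch: locate one particle-like pulse on a single scan line by finding the
# largest maximum, the following minimum, and the zero-crossing between them.
def find_pulse(line: np.ndarray, A: float, tau: int):
    i1 = int(np.argmax(line))              # candidate positive lobe
    i2 = i1 + int(np.argmin(line[i1:i1 + tau]))  # following negative lobe
    amp = (line[i1] - line[i2]) / 2.0      # (max - min)/2 amplitude feature
    if amp < A or (i2 - i1) >= tau or i2 <= i1:
        return None                        # fails amplitude or width test
    mid = int(np.ceil((i1 + i2) / 2))      # zero-crossing ("middle") estimate
    return i1, i2, mid

# synthetic balanced pulse: positive lobe at index 10, negative lobe at 14
line = np.zeros(40)
line[10], line[14] = 1.0, -1.0
print(find_pulse(line, A=0.5, tau=10))  # -> (10, 14, 12)
```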

Multiple-line particle identification
A particle-like signal is distinguished from a false detection if the centroids of the signals (position of the zero-crossing between maximum and minimum) have the same X position over multiple lines (see Figure 2 C). A false detection is identified when the particle-like signal is observed in only one scan line, and it is then removed from the data. Finally, per signal group, the clearest particle-like signal and its features (see Figure 2 C) are stored for the histogram. The pulse with the biggest max − min is a good representative because it corresponds to the centre of the particle in the X and Y directions.
There are many well-known algorithms for cluster determination, such as hierarchical clustering, K-means and DBSCAN [24][25][26]. Almost every clustering algorithm can be tuned to penalize one error more than another according to the requirements. For instance, we can use the predefined vertical step Δy to set the expected number of zero-crossings associated with a single particle. Additionally, the clusters of zero-crossings (pairs of X and Y coordinates) can have a characteristic spread.

Algorithm
In this section, we show the specific tools and algorithms we have used to solve the sub-problems described above. We also mention the computational efficiency and some other aspects of the algorithms.

Pre-processing
Among the various noise sources that might be present in our experiment [14], the power-line interference and the baseline wandering can strongly affect the further detection and classification of particle signals. The 50-60 Hz local power-line frequency (bandwidth < 1 Hz) can be mostly removed by analogue hardware during acquisition, and the remaining noise is removed digitally using a notch filter. However, the baseline wandering is not easily suppressed by analogue circuits. Hence, we take the notch-filtered waveform and subtract the wavelet-decomposed version of the same signal to recover the clean particle signal (more details in Appendix: The pre-process filters). This step effectively introduces a point-by-point correction to the wandering profile. Finally, an averaging (LP) filter is applied to remove glitches. This approach can be considered more rigorous because it relies on the sampling frequency used in the experiment. The routine is based on a contribution from reference [27]. A less accurate way of dealing with the offset in the data is MatLab's detrend function, which removes the best straight-line fit from the data in a vector of the sampled points.
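The point-by-point baseline correction can be illustrated with a simplified numpy sketch. Here a wide moving average stands in for the wavelet decomposition used in the paper; the window length is our assumption and must be much longer than a particle pulse:

```python
import numpy as np

def remove_baseline(signal: np.ndarray, window: int) -> np.ndarray:
    """Subtract a slowly varying baseline estimated by a moving average."""
    kernel = np.ones(window) / window
    baseline = np.convolve(signal, kernel, mode="same")  # low-frequency trend
    return signal - baseline  # point-by-point correction of the wandering

# a linear drift (baseline wandering) plus a short particle-like pulse
t = np.linspace(0.0, 1.0, 1000)
signal = 0.5 * t
signal[500:505] += 1.0
clean = remove_baseline(signal, window=201)
# far from the pulse the drift is removed; the pulse itself survives
```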

Hyperparameters:
$A$, $\tau$, $h$, $w$. These are user-defined parameters: the expected threshold amplitude $A$ and width $\tau$, plus the zeroing extents, denoted here $h$ and $w$. Since multiple expected amplitudes and widths are passed iteratively, the results from the previous search should not carry over to the consecutive one. Let us consider the 2D measured data, with each row representing a single scan line (Y) and each column a sampling point along the scan (X of Figure 2 C). The differential signal at the detector for $N = 4$ scanning lines and $M = 4$ samples in the horizontal scan direction is given in Eq. 1:

$$S = \begin{pmatrix} s_{11} & s_{12} & s_{13} & s_{14}\\ s_{21} & s_{22} & s_{23} & s_{24}\\ s_{31} & s_{32} & s_{33} & s_{34}\\ s_{41} & s_{42} & s_{43} & s_{44} \end{pmatrix} \qquad (1)$$

The parameters $h$ and $w$ represent the half-width and length of the region to be zeroed w.r.t. the reference sampling point. Hence, for the measured data, if $s_{33}$ is the reference position (centre of the particle), with $h = 1$ and $w = 3$ the dataset becomes:

$$S' = \begin{pmatrix} s_{11} & s_{12} & s_{13} & s_{14}\\ s_{21} & 0 & 0 & 0\\ s_{31} & 0 & 0 & 0\\ s_{41} & 0 & 0 & 0 \end{pmatrix}$$

Thus, by $h$ and $w$, the user can zero the lines that are close to the reference detected particle. Per line, the algorithm looks for the multiple maxima and minima and checks whether their absolute values pass the amplitude threshold $A$. The retrieval of secondary maxima and minima increases the overall accuracy of the algorithm. By default, we assume every particle-like signal to be a forward signal (Figure 2 B)). The reverse signals can be stored separately or included in the estimation process. Some key reasonings are highlighted in the following bullet points and also shown in Figure 3.
• Define whether the signal is forward (the maximum appears before the minimum); choose between ignoring or flipping the reverse pulses. Check whether the distance between the two indices, $(i_2 - i_1) < \tau$, fits the time-width condition. Store True or False for this second condition.
• When both conditions are true, a particle is roughly detected. We calculate the position of the particle signal's zero-crossing by taking the average between the maximum and minimum positions, mid = ceil(($i_1$ + $i_2$)/2) (the ceil function rounds towards plus infinity), perform the fine adjustment (next section) and remove the signal from the data set. A global parameter represents the half-distance, in indices, over which samples are replaced with zeroes about the middle of the signal. The rule of thumb in this case is that the region to be zeroed should not exceed the time width of the particles being sought.
• If only the amplitude condition is satisfied, the indices of the multiple minima (above threshold) that belong to the current line are checked to see whether another one falls closer to $i_1$ than $i_2$. If another minimum falls closer, reassign $i_2$ and repeat the width check. If both conditions are then satisfied, remove the signal from the data set and apply the zeroing.
• The multiple-particle search routine finds the numerous maxima (above threshold), sorts them in descending order (see Alg. 1) and, maximum by maximum, follows the steps outlined previously. If there are multiple particles on a single line, the algorithm returns the X positions corresponding to the particles' middles.
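The bullet points above can be sketched as one loop; this is an assumed simplification of the paper's routine (numpy instead of MATLAB, and `half_zero` is our name for the zeroing half-width):

```python
import numpy as np

# Sketch of the iterative line search: take maxima above the threshold in
# descending order, pair each with the nearest following minimum, accept the
# pair if it fits the time-width tau, record the zero-crossing, and zero the
# surrounding region so the same pulse is not detected twice.
def search_line(line, A, tau, half_zero):
    line = line.copy()
    mids = []
    while True:
        i1 = int(np.argmax(line))
        if line[i1] < A:
            break                              # no maxima above threshold left
        i2 = i1 + int(np.argmin(line[i1:i1 + tau]))
        if line[i2] <= -A and 0 < i2 - i1 < tau:
            mid = int(np.ceil((i1 + i2) / 2))  # middle (zero-crossing)
            mids.append(mid)
        lo, hi = max(0, i1 - half_zero), min(len(line), i1 + half_zero + 1)
        line[lo:hi] = 0.0                      # remove the treated region
    return sorted(mids)

line = np.zeros(100)
line[20], line[24] = 1.0, -1.0                 # particle 1
line[60], line[63] = 0.8, -0.9                 # particle 2
print(search_line(line, A=0.5, tau=10, half_zero=8))  # -> [22, 62]
```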
Throughout this paper, we will use the terms "zero-crossing" and "middle" interchangeably, following the corresponding variable name in the MatLab software. The fine-adjustment algorithm goes from the initial middle estimate to a refined one.
Fine adjustment is the part of the search process performed right after the width and amplitude conditions are satisfied. We assume that, from the middle position, the particle's signal occupies the same number of samples on both sides (spherical object). The initial guess for the left margin of the signal is middle − $\tau$/2. To make sure that the signal does not go outside the indexing in Matlab, i.e. when middle − $\tau$/2 <= 0, we rewrite the left margin as index 1; in this case, in the procedure that follows, we should rely on the right margin to be defined accurately, and the left margin is then recomputed based on it. Analogously, the right margin is calculated as middle + $\tau$/2. If a signal appears close to the right border, we rewrite the right margin as the last index of the sampled voltage vector. The crucial part of the fine-adjustment step is to cut out the region of the signal for a zoom-in study, i.e. from the left to the right margin. The secondary minima of the cut-out region are checked to see whether one falls closer to the middle position than the initial estimate. If there is a closer point, it is redefined as the new middle. The reason for this is the observation that typically there is a small dip in the signal preceding the quick rise in amplitude of the particle pulse. Further, we reassign the right margin of the signal to have the same separation as the left one, and one can notice that in our procedure the estimator can generalize outside the original size of the sampled vector.
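A compact sketch of this refinement step (a simplification of the routine described above; the variable names and the local-minimum criterion are ours):

```python
import numpy as np

def fine_adjust(line, mid, tau):
    """Refine the zero-crossing estimate inside a cut-out of width ~tau."""
    left = max(mid - tau // 2, 0)               # clamp left margin
    right = min(mid + tau // 2, len(line) - 1)  # clamp right margin
    window = line[left:right + 1]               # zoom-in region
    # local minima inside the cut-out (the small dip preceding the pulse rise)
    dips = [left + i for i in range(1, len(window) - 1)
            if window[i] < window[i - 1] and window[i] < window[i + 1]]
    if not dips:
        return mid
    return min(dips, key=lambda i: abs(i - mid))  # closest dip becomes mid'

# a pulse whose rise is preceded by a small dip at index 11
line = np.zeros(40)
line[11] = -0.2   # the characteristic dip
line[14] = 1.0    # positive lobe
print(fine_adjust(line, mid=13, tau=8))  # -> 11
```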

Clustering of data from one single particle
The steps of the algorithm presented previously result in an array of coordinate pairs (X, Y) that correspond to the positions of the zero-crossings of each signal, with dimensionality 2×$K$, where $K$ is the number of zero-crossings. Since one particle results in a few signals on consecutive lines of the scan, one should recognise a group of particle-looking signals as a centroid that represents this specific particle. In this way, the particle-size distribution histogram will identify one particle on the sample corresponding to one cluster of signals. The centroid of the cluster will correspond to the line with the highest max − min of the cluster, and consequently the particle's center.
We modify the well-known machine learning algorithms K-means and DBSCAN to recognize the clusters of particle-looking signals, and use prior information that can help to spot the isolated particles. One complicating factor that can be present in the data is the random drift between the lines when sampling. The drift manifests itself in a shift of the signal zero-crossing position (see Fig. 2 C) in the X direction between consecutive lines. In the Appendix: Modified K-means, DBSCAN and comparison, we define several algorithms that can account for the drift in the data. We compare the computational complexity of the modified K-means to that of DBSCAN. Besides, we highlight the sensitivity of the algorithms to the initialization parameters.
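A minimal DBSCAN in numpy (a plain textbook version, not the modified variant of the Appendix) illustrates how the zero-crossing coordinates are grouped, one cluster per particle, with single-line detections rejected as noise; `eps` and `min_pts` are the usual DBSCAN hyperparameters:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Label each (X, Y) zero-crossing; -1 marks noise / false detections."""
    n = len(points)
    labels = np.full(n, -1)
    # pairwise Euclidean distances between all zero-crossing coordinates
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    cluster = 0
    for p in range(n):
        if labels[p] != -1:
            continue                              # already assigned
        neighbours = np.where(dist[p] <= eps)[0]
        if len(neighbours) < min_pts:
            continue                              # not a core point
        labels[neighbours] = cluster              # seed a new cluster
        stack = list(neighbours)
        while stack:                              # grow via density-reachability
            q = stack.pop()
            nq = np.where(dist[q] <= eps)[0]
            if len(nq) >= min_pts:                # q is a core point
                fresh = [i for i in nq if labels[i] == -1]
                labels[fresh] = cluster
                stack.extend(fresh)
        cluster += 1
    return labels

# two particles, three consecutive-line signals each, plus one lone signal
points = np.array([[10.0, 0], [10, 1], [10, 2],
                   [50.0, 0], [50, 1], [50, 2],
                   [80.0, 5]])
print(dbscan(points, eps=1.5, min_pts=2))  # clusters 0 and 1, last point noise
```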

Results
Throughout this section, we experimentally study three different samples of PSL particles spin-coated on silicon wafers. The first sample includes particles with diameters of 50 nm, the second 100 nm, and the third a mixture of 60, 80 and 100 nm. The details on sample preparation are outlined in the Appendix: Preparation of the samples.

Pre-processing and search.
In large-scale IC manufacturing, typically double-side polished wafers are used. The block of pure crystalline silicon is diced and polished right before the deposition of the resist. Due to a lack of precision in the wafer holder, unstable rotation and heat deformation, the polishing can affect the flatness of the wafer. Additionally, the thickness of the wafer is not uniform across the sample [28]. This effect mostly occurs at the edges of the wafer. Nevertheless, the scanners need to provide information over the entire wafer under study. For sensing or particle detection applications using CFS, the probing light should be focused on the interface between the air and the top surface. Due to several experimental factors during the scanning, the baseline (the differential signal when no particle is present) may fluctuate or drift from the expected zero value. Hence, occasionally, the data set might include DC offsets mixed with low-frequency noise (baseline wandering) [29,30]. This problem can be corrected as shown in the data presented in Figure 5 A): raw data (top), and with baseline correction (bottom). Further, the scattered map from the bottom of Figure 5 A) is analysed with the search algorithm (Section 4.2) to produce the corrected data shown in Figure 5 B). Here we analyzed a random area from the calibrated sample, and the histogram nicely peaks at the position $\tau$ = 7.05 ms that corresponds to a PSL particle 50 nm in diameter (see inset with the calibration curve), which agrees with the recipe of the first sample. For an area that contained only a few particles, one can notice a relatively high number of counts, and this is because all the localized zero-crossings contribute to the output histogram. The SNR for this dataset is low (7.14 dB), while the algorithm can still localize the particle detections, including the one that resides at the border of the scan, thus generalizing beyond the input data.
N.B. The particle classification in CFS is based on the width of the time-domain particle signal. The quantitative limit of the post-processing framework for discrimination between different-size particles is defined by the accuracy of the fine-adjustment routine of Section 5.1, more specifically, by the ability to find the minima closest to the rising edge of the differential signal. If we assume infinite sampling of the signal and low noise, there are virtually no limitations on how accurately the position of the minima can be defined, aside from those emerging from the numerics or computational effort [31]. On the practical side, there is a limitation in the manufacturing of the monodisperse PSLs: the target particle diameter has an uncertainty in the range of 1-2 nm [32].

Comparing the accuracy of clustering routines on a data set with drift.
The drift originates from the sampling at the detector being asynchronous with respect to the piezo-stage movement. When the piezo controller passes the initialization signal to the computer, the jitter and USB connection produce a random time delay before the sampling actually starts. One can mitigate the problem by introducing a constant waiting time (an empirical estimate) at the piezo before the voltage is increased (Figure 6 A)). Yet, the random delay will, in general, not be equal to the introduced waiting time. When faster scanning is performed, the drift in the data set gets worse. Figure 6 B)-E) shows the same isolated nanoparticle scanned at different speeds: 100, 90 and 50 ms per line, demonstrating an increasing amount of distortion in the data set.
We take the data corrupted with the drift and compare the accuracy (Eq. 3) of the two clustering algorithms as the average result of 100 random initializations:

$$\text{Accuracy} = \left(1 - \frac{|N_d - N_a|}{N_a}\right) \cdot 100\%, \qquad (3)$$
where $N_d$ is the number of detected clusters, hence isolated particles, and $N_a$ the actual number of particles on the sample. This formula ignores the difference between over- and under-estimates of $N_d$. We will use the drift-free "image" as a ground truth for this comparison, providing the number $N_a$. The drift-free "image" is achieved by establishing a new synchronization approach, with a trigger pulse generated at the piezo controller through an analogue output at each beginning and end of a scanning line. The first test is to use the global parameters set according to the reasoning outlined in Section 4.4 and the Appendix: Modified K-means, DBSCAN and comparison. Recommended hyperparameters come from showing the program once how a "good cluster" looks. A number of 100 random initializations were needed to get an idea of how the K-means algorithm suffers from random initialization; specifically, the starting number of clusters and their positions are randomly initialized. On the contrary, DBSCAN, regardless of initialization, always converges to the same result (see Figure 7). This test reveals that both algorithms can achieve relatively high accuracy, > 70%. As expected, the accuracy on the data that contains less drift is higher and has less uncertainty. On average, the accuracy does not exceed 84% for DBSCAN, and the algorithm produces the same number of clusters at every iteration. When the input data is shuffled, the only "non-deterministic" behaviour is in the label assigned to a cluster, but not in the composition of the cluster itself. This behaviour was first highlighted in the original DBSCAN paper [26], where the authors claimed that the convergence result is independent of the order in which the points of the database are visited, except in "rare" situations. These "rare" situations occur when border points belong simultaneously to two clusters. Such a border point is assigned to the cluster that is considered first, to avoid overlap. In other words, there is always the same number of density-reachable points from a reference point, hence the number of assigned clusters is constant.
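The accuracy metric can be sketched in a few lines. The exact form of Eq. 3 is the paper's; the absolute-relative-error form below is our assumption, chosen because over- and under-estimates of the cluster count are stated to be penalised equally:

```python
# Hedged sketch of the cluster-count accuracy; this absolute relative-error
# form is an assumption, consistent with the statement that over- and
# under-estimates are penalised equally.
def count_accuracy(n_detected: int, n_actual: int) -> float:
    """Accuracy in percent; 100.0 when the detected count is exact."""
    return 100.0 * (1.0 - abs(n_detected - n_actual) / n_actual)

print(count_accuracy(10, 10))  # -> 100.0
print(count_accuracy(9, 10))   # under-count: 90.0
print(count_accuracy(11, 10))  # over-count:  90.0 (penalised the same)
```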
In the next test, for both algorithms, the global parameters were manually adapted to yield higher accuracy (Figure 7 C)). As for the adjustments: the desired number of clusters in K-means can be set higher than the elbow method recommends, and for the DBSCAN algorithm the neighbourhood-radius parameter $\epsilon$ is crucial. This test demonstrates that, with the aid of completely manual tuning, a higher accuracy, > 80%, can be achieved for any type of data set. Even more, the parameter $\epsilon$ in DBSCAN can be chosen to recover 100% accuracy on the data set with the minor drift. N.B. The average convergence time is 0.01 seconds for the DBSCAN algorithm and 47 seconds for the K-means algorithm on a Dell Inspiron 7577 laptop.

Benefit of the centroids re-assignment
For the semiconductor industry, specifically for the lithography process, it is crucial that cleaning can be performed whenever contamination above a certain size is present on the sample. For instance, very small particles are of minor importance for the pellicle layer above the UV mask, and cleaning action needs to be taken only if bigger particles are present. In the absence of the pellicle, on the contrary, one should care only about small contamination landing on the mask [33]. In this regard, the quantitative description of the surface provided by the surface scanner becomes very important, and the confusion between different sizes of scatterers on the sample should be minimal. For our system, the width of the signal changes as the spherical particle is scanned: it is largest when the scan line passes through the center of the particle and smaller in consecutive lines around the particle's center, as shown in Figure 8. In a first approximation, all detected signals can be fed to the histogram, as was done in Section 5.1. This approach would work properly if the data set included a single particle size or if the contaminants differed considerably in size. Realistically, samples contain a wide range of particle sizes. If the pulses from the edges of a particle scan are included in the estimation histogram, they contribute to the inter-class confusion (classes represent diameters). In Figure 9 we demonstrate the outputs of the signal search algorithm and the corresponding clusters defined by the DBSCAN algorithm. This algorithm was chosen because its convergence time is shorter than that of the modified K-means and it achieved higher accuracy in the previous test. The region of the sample under study is a good representative of a multi-class sample where, in addition to the nominal 60 and 80 nm PSL particles, there are isolated particle-looking signals that are treated as outliers by the algorithm, as well as contamination by bigger particles of ≈ 100 nm in diameter. The convergence result of the K-means algorithm for the same data set is presented in the Appendix: The re-assignment of signal centroids by the K-means algorithm. The first-approximation histogram includes side detections from the 100 and 80 nm classes contributing to the 60 nm class, as well as features between the classes, so that it appears as if only a single class were present in the data (see Fig. 9 C). When all signals that correspond to one particle are clustered and only the highest peak-to-peak pulse from each cluster is assigned to a particle (Fig. 9 B), we observe three separable classes in the histogram (Fig. 9 D), showing that this strategy solves the problem of particle size confusion.

Discussion
The clustering approach has a downside, namely the risk of losing useful signals that correspond to very tiny particles. Such particles may produce only one or two scan lines containing signals with sufficient SNR if the selected step Δ between the scan lines is too big. To improve the sensitivity of the algorithm further, a separate routine could reconsider the outliers. This step could include a collection of matched filters operating in the time domain to select signals with the expected duration. Alternatively, one can try to establish spectral differences between particle and non-particle signals (a multiple-wavelength approach).
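As an illustration of the matched-filter idea for reconsidering outliers, the following sketch correlates a scan line with a bipolar template pulse of the expected duration. The template shape, the noise level and all sample positions are illustrative assumptions, not values taken from the experiment:

```python
import numpy as np

def matched_filter(signal, template):
    # Cross-correlate the line signal with a zero-mean, unit-norm
    # template of the expected duration; a strong peak in the response
    # flags a candidate particle signal missed by the clustering step.
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    return np.correlate(signal, t, mode="same")

# Toy example: a bipolar (differential-like) pulse buried in noise.
rng = np.random.default_rng(0)
template = np.concatenate([np.hanning(20), -np.hanning(20)])
signal = 0.1 * rng.standard_normal(500)
signal[240:280] += template            # embed one pulse at sample 240
response = matched_filter(signal, template)
peak = int(np.argmax(np.abs(response)))
print(peak)                            # index near the embedded pulse centre
```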
While this study considered the detection of polystyrene particles, the technique can also be applied to extract features from measurements of particles of different materials. Scatterometry is not an imaging technique, but other features (such as material) can be recovered if one can model them and obtain more diversity in the experimental data. For example, instead of only looking at the time span of the particle signal (related to the size of the particle), one can also use its magnitude, which depends on both the diameter and the material of the particle [34]. Another example is the work of Potenza et al. [35], who, using a similar technique, were able to recover the complex refractive index of the particles and in this way reveal their material. DBSCAN can yield higher accuracy than the K-means subroutine when the scale of the data is well understood, and its convergence is fast. Nevertheless, there is still room for the K-means routine, because DBSCAN is much more sensitive to its hyperparameters, up to the point of completely failing to define clusters from the initial data (see the Appendix: Modified K-means, DBSCAN and comparison). K-means, on the contrary, can be considered the more robust algorithm: it yields relatively high accuracy and will always define a certain number of clusters for any initialization. The K-means algorithm is scalable to large data sets, while DBSCAN can suffer from the curse of dimensionality [36]. A final point to consider is that, when working on data sets with severe scanning drift, high particle density and a wide range of particle sizes, DBSCAN can fail to cluster the data [37].
Throughout the IC manufacturing process, large amounts of data need to be mined in a fully automated mode [38]. With a growing amount of data, the line-by-line analysis of the data set can become computationally slow. Moreover, the total number of hyperparameters is significant: the search and clustering routines together have up to 6 user-supplied parameters. Hence, in future work we will explore the potential of methods for handling big data, such as deep learning and convolutional neural networks (CNNs) [39,40].

Conclusions
We have demonstrated that Coherent Fourier Scatterometry is capable of generating 2D maps with the locations and sizes of PSL nanoparticles on a surface, down to particles with a diameter of 50 nm, using low-power illumination at a wavelength of 405 nm (the input power on the substrate is approximately 0.026 mW). CFS relies on differential detection to minimize the contribution of the rough background and uses photocurrent measurements in a raster-scanning regime, generating a wealth of 2D data sets.
In this paper, we have developed a generalized framework that accurately extracts features of the differential signal produced by the scattering from a nanoparticle and uses these features for particle localization and size determination. We have combined pre-processing with search algorithms based on thresholding features such as the peak-to-peak amplitude and the time width of the signal. The proposed method makes use of unsupervised clustering techniques to separate particles at high density on the samples. We adapt the DBSCAN and K-means algorithms and use them together with a simple prior.
We have tested the framework on data sets with a high density of particles, in the presence of large experimental noise and drift. The algorithm reached an accuracy of 84% with semi-automatically set hyperparameters, and 100% with manually tuned parameters. DBSCAN is the go-to solution because it works much faster than K-means; the latter, however, is more robust because it is less sensitive to changes in the input parameters.
Finally, we would like to stress that while we tested the framework for the particular case of experimental data obtained with CFS, the method can be generalized to other experiments that involve measurements with differential detection, such as coherent time-addressed optical CDMA systems [41] and ferromagnetic resonance spectrometers (VNA-FMR) [42]. In these techniques, the data set may include mechanical vibrations or other experimental fluctuations similar to the drift studied in this paper. We believe that the proposed framework is an essential addition to the nanoparticle-detection experimental community.

In Eq. 6, φj,k(t) is the scaling function, ψj,k(t) is the wavelet function, and a(k) and d(k) are the scaling and detail coefficients, respectively. In this paper, the Daubechies 6 scaling and wavelet functions were used, because they have proved to be excellent in the analysis of signals that contain baseline wandering [43,44]. To compute the a(k) and d(k) coefficients, low-pass (LP) and high-pass (HP) filters are recursively applied to the signal. When the signal is processed for the first time, the HP-filtered data gives the detail coefficients and the LP-filtered data the scaling coefficients at level 1. The more times the filters are applied, the more detail levels of the signal representation are obtained. In this paper, we used a decomposition level of 10 and applied a translation factor of 8 to the scaling and wavelet functions. The baseline wandering is removed by subtracting the reconstructed baseline from the signal. Finally, a simple moving average filter is applied according to Eq. 8.
The output signal is the result of averaging the points of the input signal, with N = 5 points used in the moving average.
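Eq. 8 is a plain moving average; a minimal sketch, with the window length N = 5 as in the text:

```python
import numpy as np

def moving_average(x, n=5):
    # Eq. (8): each output sample is the mean of n neighbouring input
    # samples; 'same' mode keeps the signal length unchanged
    # (zero-padded at the edges).
    return np.convolve(x, np.ones(n) / n, mode="same")

x = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0])
print(moving_average(x))   # the isolated spike is spread over 5 samples
```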

Modified K-means, DBSCAN and comparison.
Given, for instance, the coordinates (X, Y), K-means clustering converges to a certain number of clusters; for each cluster we know the distance between each point and the position of the cluster centroid (the mean). The first two algorithms are used to treat the outliers in the clusters:

Here the metric, the standard deviation and the average are computed according to Eqs. (9-11) for a random-variable vector M made up of N scalar observations. An example of such an algorithm (Alg. 2) applied to an arbitrary cluster is shown in Figure 10 A).
The idea is to remove the points that are too far from the mean; we use a threshold of 10% decrease in the standard deviation (STD) to reject the outliers. The initial 8 points in the cluster are sorted in descending order by their distance from the mean. When the first two points are removed, the metric stays above 0.1, but removing the third brings it below 0.1, so the cluster is reduced to its most packed 6 points. Once the outliers are removed, we want to reject the clusters that are overly spread, for instance due to drift. In our approach, the spread of a particular cluster has to stay below a threshold, and it is computed according to Algorithm 3.
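The outlier-rejection step (Algorithm 2) can be sketched as follows. The stopping rule follows the 10% STD-decrease criterion described above; the exact bookkeeping of the paper's pseudocode is not reproduced in the text, so the details are our reading:

```python
import numpy as np

def reject_outliers(points, rel_decrease=0.10):
    """Trim a cluster by removing its farthest points while doing so
    still shrinks the spread noticeably (sketch of Algorithm 2).

    points: (n, 2) array of (X, Y) signal positions in one cluster.
    A point is dropped while removing it reduces the STD of the
    distances to the cluster mean by more than rel_decrease (10%).
    """
    pts = np.asarray(points, dtype=float)
    while len(pts) > 2:
        d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
        order = np.argsort(d)                  # farthest point is last
        std_all = d.std()
        std_trim = d[order[:-1]].std()
        if std_all == 0 or (std_all - std_trim) / std_all <= rel_decrease:
            break                              # metric <= 0.1: stop trimming
        pts = pts[order[:-1]]                  # drop the farthest point
    return pts

cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])
print(len(reject_outliers(cluster)))           # the distant point is removed
```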

Algorithm 3 Compute spread
The idea here is that the user selects a single cluster that, with high confidence, corresponds to an isolated particle, and passes the corresponding recommended spread limit. The example of thresholding by spread in Figure 10 B) shows that such a limit is met by the set of black points but not by the red set. Finally, based on the geometric considerations outlined in Section 2 of the paper, we add a prior on the number of points that can contribute to a single cluster: for a target particle diameter, the number of zero-crossing points cannot exceed the diameter divided by the scan-line step Δ. Alternatively, one can perform numerical simulations in which the line-by-line scanning of the focused spot is done over a range of particle sizes and, by combining this with estimates of the characteristic noise of the technique, assess the maximum number of lines. Further research on this issue would be of interest; however, it is beyond the scope of this study.
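A minimal sketch of the spread test, assuming the spread is the standard deviation of the point-to-centroid distances of the centred cluster (the exact formula of Algorithm 3 is not reproduced in the text, so this is one plausible reading; the threshold value 0.056 is the one quoted in the Fig. 11 caption):

```python
import numpy as np

def cluster_spread(points):
    # Centre the cluster at zero (first step of Algorithm 3) and
    # return the STD of the point-to-centroid distances as the spread.
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)
    return np.linalg.norm(centred, axis=1).std()

# A drift-elongated (red) cluster has a larger spread than a compact
# (black) one, so only the compact one passes the threshold.
compact = np.array([[0.00, 0.00], [0.01, 0.00], [0.00, 0.01], [0.01, 0.01]])
drifted = np.array([[0.00, 0.00], [0.20, 0.05], [0.40, 0.10], [0.60, 0.15]])
threshold_spread = 0.056
print(cluster_spread(compact) < threshold_spread,
      cluster_spread(drifted) < threshold_spread)
```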
Next, the two popular algorithms of K-means and DBSCAN are described and compared to become the cluster initializers.

Classical K-means with prior

Hyperparameters: the maximum number of points per cluster and the spread threshold.

We modify the K-means [25,45] such that it can accurately establish the isolated particles. For validation purposes, the density of spheres is crucial; thus the algorithm needs to overcome its inherent tendency to overestimate the number of clusters. One difference from the original K-means is that we introduce outliers. The outliers include: A) clusters with a single point; B) empty clusters; C) distant points previously included in a cluster. Option C) is treated by Algorithm 2. The second difference is the conditioning of the assigned clusters. Clusters are considered valid if the number of points per cluster stays below the allowed maximum (Figure 3 C)) and the spread, computed for the set of points with the mean moved to zero and the distances normalized to unity (Algorithm 3), stays below the spread threshold. After K-means convergence (one epoch), the points that fail these conditions are passed through the K-means again. The algorithm stops when all points are assigned to either a cluster or an outlier. In every epoch of the K-means, the optimal number of clusters is defined by the elbow method based on the average of 3 random initializations. Hence, our implementation of the K-means is described in pseudocode (Algorithm 4).

DBSCAN

Hyperparameters: the inclusion radius (eps) and the minimum number of points per cluster (MinPts).

The density-based clustering algorithm (DBSCAN) [26,46] has the straightforward advantage of taking care of obscure points: all points that are not reachable from any other point are outliers or noise points. The input for MinPts is straightforward, as it can be any number between 1 and the total number of points. For the recommendation of the inclusion radius, we use the following routine:

• Normalize the complete data set to unity, such that X = X/max(X) and Y = Y/max(Y).
• Select a set of points that with high confidence forms a cluster, via visual inspection. Center this cluster at zero (first two lines of Algorithm 3).
• Determine the average distance between points. This includes computing the Euclidean distance between each pair of observations, separately for X and Y, and taking the average of each vector.
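The recommendation routine above can be sketched as follows; the final step of combining the two per-axis averages into a single inclusion radius is our assumption:

```python
import numpy as np
from itertools import combinations

def recommend_eps(X, Y, idx):
    """Recommend the DBSCAN inclusion radius from one hand-picked cluster.

    X, Y : 1-D arrays with the full data set of zero-crossing positions.
    idx  : indices of the points that visually form one good cluster.
    """
    # Normalise the complete data set to unity.
    Xn, Yn = X / X.max(), Y / Y.max()
    # Select the reference cluster and centre it at zero.
    cx = Xn[idx] - Xn[idx].mean()
    cy = Yn[idx] - Yn[idx].mean()
    # Average Euclidean distance between each pair of observations,
    # computed separately for X and for Y.
    dx = np.mean([abs(a - b) for a, b in combinations(cx, 2)])
    dy = np.mean([abs(a - b) for a, b in combinations(cy, 2)])
    # Combine the two averages into one radius (our assumption).
    return 0.5 * (dx + dy)
```

For example, with a tight cluster at the first three points of `X = Y = [1, 2, 3, 100]`, `recommend_eps(X, Y, [0, 1, 2])` returns an eps on the order of 0.01 in normalized units.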
As a result of K-means or DBSCAN convergence, one can pick the converged clusters and either: A) pull the features of the single pulse with the highest peak-to-peak amplitude in a cluster; B) average the corresponding time spans τ of the points in a cluster (Eq. 12); or C) use the full peak-to-peak amplitude of the signal itself.
where the averaging runs over the number of points assigned to the cluster. Correspondingly, the centroids of the clusters are stored for mapping the particle positions.
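Option A), together with the storage of the cluster centroids, can be sketched as follows (the label convention -1 for outliers follows DBSCAN; variable names are ours):

```python
import numpy as np

def cluster_representatives(points, labels, vpp):
    """For each converged cluster, keep the single signal with the
    highest peak-to-peak amplitude as the particle representative, and
    store the cluster centroid as the particle position. Outliers
    (label -1, DBSCAN convention) are ignored.
    """
    points, labels, vpp = map(np.asarray, (points, labels, vpp))
    reps, centroids = {}, {}
    for lbl in np.unique(labels):
        if lbl == -1:
            continue                              # skip outliers
        members = np.flatnonzero(labels == lbl)
        reps[lbl] = members[np.argmax(vpp[members])]
        centroids[lbl] = points[members].mean(axis=0)
    return reps, centroids

points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 10.0], [50.0, 50.0]])
labels = [0, 0, 0, 1, -1]
vpp = [1.0, 3.0, 2.0, 5.0, 9.0]
reps, cents = cluster_representatives(points, labels, vpp)
print(reps)   # index of the strongest pulse per cluster
```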

On the computational complexity of the two algorithms
The classical K-means algorithm has complexity O(nki), where n is the number of input points, k is the desired number of clusters, and i is the number of iterations needed for convergence. It has also been observed that approximately i ∝ n [47]; hence, the effective time complexity becomes O(n²k). K-means is a greedy algorithm, since it can produce both empty and over-populated clusters. Another drawback is its strong dependence on the initialization of the cluster centers. Owing to the quadratic time complexity, it should not be used in extremely large data applications [48]. The implementation of the K-means with prior in this paper has O(n²k) time complexity per epoch. In the DBSCAN implementation, for each of the n points of the input data we have at most one region query; thus the average run-time complexity of DBSCAN is n queries of O(log n) each, i.e. O(n log n).

Sensitivity of K-means and DBSCAN algorithms
A sensitivity analysis was used to explore how the accuracy of the algorithms changes with slight variations of the hyperparameters. The green point in every plot represents the preferred initial value that yields the highest accuracy, while the offset from this point defines the sensitivity (see Fig. 11).
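The sensitivity sweep can be mimicked on a toy problem. The sketch below replaces DBSCAN with a trivial 1-D gap-based clusterer (our simplification, not the paper's algorithm) and sweeps the cut-off radius, using the counting accuracy in the assumed form 1 - |k - N| / N, clamped at zero:

```python
import numpy as np

def count_clusters_1d(x, eps):
    # Toy stand-in for DBSCAN on 1-D data: after sorting, a gap larger
    # than eps starts a new cluster (single-linkage with cut-off eps).
    xs = np.sort(x)
    return int(1 + np.sum(np.diff(xs) > eps))

# Three well-separated groups of points on a line.
x = np.concatenate([np.linspace(0.0, 0.1, 5),
                    np.linspace(1.0, 1.1, 5),
                    np.linspace(2.0, 2.1, 5)])
true_n = 3
for eps in (0.02, 0.05, 0.5, 1.5):
    n = count_clusters_1d(x, eps)
    acc = max(0.0, 1.0 - abs(n - true_n) / true_n)
    print(f"eps={eps:<5} clusters={n:<3} accuracy={acc:.2f}")
```

Too small an eps splits every point into its own cluster, too large an eps merges everything into one; only a window of eps values recovers the true count, which is exactly the sensitivity behaviour Fig. 11 probes.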

Fig. 1. Schematic of the experimental setup, showing the light path and the differential detection principle. To obtain the scatterometer maps, the sample is scanned in the x and y directions. For every X-Y position, the differential signal at the balanced detector is recorded.

Fig. 2. A) 2D raster scanning procedure showing scan lines in the X direction separated by Δ = 10 nm in the Y direction. The geometrical size of the studied particle is 50 nm.

Fig. 3. Block diagram of the signal search algorithm, which starts from a single signal line.

Fig. 4. A sketch showing the margins of the signal separating it from the background.

Fig. 5. A) Top: the side view (along Y) of the raw sampled data, in which baseline wandering is present. Bottom: the corresponding data after the baseline wandering is removed. B) Top view of the same scan, with the red points representing the detected zero-crossings. C) Histogram representing the particle size distribution, based on the time widths of the detected pulses. The inset shows the calibration of the particle size as a function of the time width of the signal. D) Example of a line from the data set; the dashed line is an initial guess for the time width, and the left (L) and right (R) boundaries are returned by the fine-adjustment step. The scan speed per line is such that a scan width of 20 microns in X takes 100 ms.

Fig. 6. A) The sampling is done asynchronously, and the start sampling point (red cross) fluctuates in time (above). The primitive of the voltage waveform moving the piezo forth and back along one axis (below). The time constant tries to match the start of the piezo movement (rising edge of the waveform) with the beginning of the sampling. B)-E) Example of an isolated nanoparticle with different amounts of drift present: the non-drift image B), and an increasing amount of drift for 100, 90 and 50 ms scanning time per line in C), D) and E), correspondingly.

Fig. 7. The true number of particles (ground truth) in red, compared with the DBSCAN (in blue) and modified K-means (in black) results for the three levels of drift present. "100 drift" represents the least distorted data set, "90 drift" a data set with average distortion, and "50 drift" the data set with severe distortion. Recommended hyperparameters in A) and tuned hyperparameters in C). B) Result of an 81% accurate convergence by DBSCAN for the case of 50 drift.

Fig. 8. Sketch of the signal from an isolated spherical particle visible over three consecutive line scans. The red region represents the increase in the time width τ of the signal when the scan line passes through the centre of the particle, compared to the other consecutive lines at ±Δ (signals as dotted lines).

Fig. 9. A) The zero-crossings of the differential signals found by the search algorithm and B) the corresponding isolated particles from the converged DBSCAN. The data set includes minor drift; one scanning line of Δ = 25 takes 100 ms. C) Histogram when all particle-looking signals are taken into account and D) when the signals corresponding to one particle are clustered and only one signal (the highest peak-to-peak amplitude) is taken to represent one particle detection.

Fig. 10. A) Per cluster, the outliers are removed according to the predefined 10% decrease in STD. Only the first two points are moved to the outliers, because the 3rd point lies close to the remaining points. B) The global spread threshold in red and, per cluster, the estimated spread in green, which is either below or outside the expected range. C) The maximum number of points per cluster; a valid cluster contains fewer points than this maximum.

Fig. 11. Sensitivity analysis of isolated changes of the hyperparameters for the K-means and DBSCAN algorithms. The accuracy as a function of the spread threshold and of eps, given a fixed maximum of 13 points per cluster and MinPts = 4, in A) and B) correspondingly, and as a function of the maximum number of points per cluster and of MinPts, given a spread threshold of 0.056 and eps = 0.013, in C) and D).

Table 1. Glossary of the main functions used in the manuscript.