Deep learning-guided surface characterization for autonomous hydrogen lithography

As the development of atom scale devices transitions from novel, proof-of-concept demonstrations to state-of-the-art commercial applications, automated assembly of such devices must be implemented. Here we present an automation method for the identification of defects prior to atomic fabrication via hydrogen lithography using deep learning. We trained a convolutional neural network to locate and differentiate between surface features of the technologically relevant hydrogen-terminated silicon surface imaged using a scanning tunneling microscope. Once the positions and types of surface features are determined, the predefined atomic structures are patterned in a defect-free area. By training the network to differentiate between common defects we are able to avoid charged defects as well as edges of the patterning terraces. Augmentation with previously developed autonomous tip shaping and patterning modules allows for atomic scale lithography with minimal user intervention.


Introduction
With the miniaturization of complementary metal-oxide-semiconductor technology approaching its fundamental limit, attention has been focused on developing alternatives built at the atomic level [1][2][3]. If these devices are to be commercially viable, they must be built in a way that allows parallelized and automated fabrication. Scanning Probe Microscopy (SPM) has provided a means for several different varieties of atom-scale device fabrication including memory systems using a Cu/Cl system [4] or dangling bonds (DBs) on hydrogen-terminated silicon (H-Si) [5], spin-based logic using Fe atoms on a Cu(111) surface [6], single-atom transistors using phosphorus dopants in silicon [7], and binary atomic wires and logic gates using DBs on the H-Si surface [8]. Despite the progress made in the design of these and other device concepts [4,9,10,11], reliable device fabrication is usually limited by patterning accuracy or variability in the fabrication process. DBs on the H-Si surface have been shown to be rewritable [5,12,13] as well as stable at room temperature [14,15], making them an excellent candidate for atom-scale devices.
The H-Si surface has found applications in the study of surface chemistry including self-directed growth of ordered multi-molecular lines [16,17] and reaction energetics [18]. The controllable desorption of hydrogen from the H-Si(100)−2×1 surface using the probe tip of a scanning tunneling microscope (STM) [19] allowed for more precise studies of surface chemistry [20] including fabrication of rudimentary devices [21,22]. With the continued study of DBs on H-Si surfaces, more complex and functional devices have been developed; however, complete automation of the hydrogen lithography process has been limited by three major factors. First, the probe requires continuous monitoring to ensure an atomically sharp patterning condition. This step was recently automated using machine learning [23]. Second is automated hydrogen lithography error detection and correction through recently realized controlled hydrogen repassivation techniques [12,13]. Third is the automated characterization and localization of defects on the H-Si(100)−2×1 surface to assign an ideal patterning area by avoiding certain charged and uncharged defects, which is the subject of the present work.
Defects found on hydrogen-terminated samples can take the form of sub-surface or surface charge centers, which can affect the operation of nearby electric-field-sensitive atomic devices [24], or of non-charged surface irregularities, which limit the space available for patterning. Manually locating and characterizing defects is labor intensive, and the effort required depends on the random distribution of these defects and the cleanliness of the terminated sample. Initial attempts to automate surface defect recognition relied on fast Fourier transforms [25]; however, the characterization of defects was limited to a few of many different species. More recently, machine learning has been applied to assist in classification and analysis of surface structures using SPM [26][27][28][29][30], but it has yet to be applied to surface features of the H-Si(100)−2×1 surface. Here, we implement an encoder-decoder type convolutional neural network (CNN) [31][32][33] to locate and classify features on the surface. By using semantic segmentation [34,35], the neural network is trained to recognize a variety of charged and uncharged defects commonly found on the H-Si(100) surface. After integrating the model with existing patterning [5] and probe tip forming [23] suites, full automation of the patterning process is achieved.
Crystalline silicon is tetravalent and forms a diamond lattice; each silicon atom shares four bonds, two above and two below the atom. At the (100) surface, two of these bonds are unsatisfied, so the crystal reorganizes to a lower energy configuration. The addition of atomic hydrogen to the silicon surface during the annealing process results in the formation of one of three possible phases. The likelihood of forming each phase can be controlled by the annealing temperature at which the sample is prepared. On a silicon surface with (100) orientation, the 2×1 phase forms at ∼377°C, the 3×1 phase at ∼127°C, and the 1×1 phase below ∼20°C [14,36,37]. The phase most commonly used for DB patterning is the 2×1 reconstruction, where each surface atom pairs with a neighboring surface atom to create a dimer pair. The dimer pairs are assembled in rows which run parallel to each other across the surface. The unsatisfied bond of each silicon atom can either be terminated with hydrogen or left vacant, creating a DB. Although the preparation of the H-Si(100)−2×1 phase is well understood, some surface defects decorate the otherwise clean surface. We are able to image the defects as well as clean H-Si(100) using an STM.
In order to train the CNN to recognize these surface defects, they must be labeled in a pixel-wise manner in the STM images. Our neural network is trained with seven different label classes. The first is regular, clean H-Si(100)−2×1 (figure 1(a)). There are two types of charged defects: 'type 2' [24,38] (figure 1(b)), the origin of which is yet to be confirmed, and 'DB', a dangling bond [19,39] (figure 1(c)). The remainder of the known surface defects are understood to exist in a neutral charge state and consist of diversely reconstructed H-Si, adatoms, and adsorbed molecules. Figure 1(d) shows a 'dihydride' in which two silicon atoms each bind to two hydrogen atoms instead of forming a dimer pair [40]. Figure 1(e) shows a 'step-edge', which is a drop in the surface height by one atomic layer. Dimer rows run perpendicular to the original direction above the step, and the boundary of the step edge is often marked with 1×1 or 3×1 reconstruction [41][42][43]. Figures 1(f)-(j) show several different defects that either appear too infrequently in our training data to properly train the neural network, or are found too close to each other to properly segregate during data labeling (figure 1(k)). Defects of this type were assigned the label 'clustered'. It is hoped that future implementations of the network will allow for further segregation of the 'clustered' class into individual defect classes. Figure 1(l) shows the final label class, an adsorbed species, molecule, or cluster of atoms of unknown origin labeled as 'impurity'. These defects are thought to be something other than H-Si and can usually be reduced by eliminating any potential contaminants during sample preparation. Further discussion on the origins and structure of these defects, including an investigation of individual defects of the 'clustered' class, will be presented in a future work [44].
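As an illustration of pixel-wise labeling, the seven classes can be stored as an integer label map and expanded to a one-hot array for training. The following is a minimal sketch only: the class ordering and the names `CLASSES` and `one_hot` are hypothetical choices made for this example and are not specified in the text.

```python
import numpy as np

# Hypothetical class ordering; the text names the seven classes but not their indices.
CLASSES = ("H-Si", "type 2", "DB", "dihydride", "step-edge", "clustered", "impurity")


def one_hot(label_map, n_classes=len(CLASSES)):
    """Expand an (H, W) integer label map into an (H, W, n_classes) one-hot array."""
    return np.eye(n_classes, dtype=np.float32)[label_map]


if __name__ == "__main__":
    labels = np.array([[0, 2], [4, 6]])   # a tiny 2 x 2 "image" of class indices
    print(one_hot(labels).shape)          # one channel per class
```

Each pixel then carries exactly one active channel, which is the target format a softmax output with seven channels is trained against.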

Methods
All experiments were performed using an Omicron Low Temperature STM operating at 4.5 K and ultrahigh vacuum (4 × 10⁻¹¹ Torr). Tips were electrochemically etched from polycrystalline tungsten wire and resistively heated in ultrahigh vacuum to remove surface adsorbates and oxide, and sharpened to a single atom apex using field ion microscopy [45]. In situ tip processing was performed by controlled tip contact with the surface [13,46,47]. Tip shaping parameters were the same as in [23].
Samples used were highly arsenic doped (1.5 × 10¹⁹ atoms cm⁻³) Si(100). Samples were prepared by degassing at 600°C overnight followed by flash annealing at 1250°C. The samples were then terminated with hydrogen by exposing them to atomic hydrogen gas at 330°C. It should be noted that these sample preparation guidelines were only loosely followed for all samples shown in this paper in order to ensure that a significant number of surface defects were present on the sample.
Image and data acquisition was done using a Nanonis SPM controller and software. All training data was acquired at an imaging bias of either 1.3 or 1.4 V with a tunneling current of 50 pA. An empty states imaging bias was used exclusively because of the enhanced contrast around certain defects. Specifically, distinctions between charged and uncharged defects, as well as dihydrides, are much easier to notice in the empty states images than in the filled states images, as shown in figure S2. The patterning automation routine was programmed in Python and LabVIEW using the Nanonis programming interface library.

Results and discussion
Neural network architecture
The architecture of the CNN was implemented to support semantic segmentation of the images. Semantic segmentation allows for both the localization and classification of objects in images. This can be used in many applications where the network must make a distinction between different objects in an image, including use in self-driving cars [49][50][51] and medical image analysis [52][53][54]. In our case, a distinction is made between the pixels that make up each of the labeled defects. We trained various CNN architectures (figure S6) and implemented the one that shows the greatest performance in defect recognition (figure 2; labeled model 8 in the SI). An encoder-decoder type architecture is used, which allows for higher order feature extraction while minimizing the number of trainable parameters [31,33,55]. Each encoder layer consists of two sets of a convolutional layer (3 × 3 kernel), batch normalization layer, and a 'relu' activation layer, followed by a max-pooling layer (2 × 2 kernel). The number of convolutional filters doubles with each encoder layer, starting with 32 filters and reaching a maximum of 128 filters. Following the encoder layers, a series of decoder layers are applied to bring the output of the network to a size which matches the input. Each decoder layer consists of an upsampling layer (2 × 2 kernel), convolutional layer (3 × 3 kernel), batch normalization layer, another convolutional layer (3 × 3 kernel) and batch normalization layer, followed by a relu activation layer. The final output layer consists of a convolutional layer which uses 7 filters (3 × 3 kernel) followed by a softmax activation, which produces a one-to-one mapping of the surface for each of the labeled classes.
Supplementary figure S1 is available online at stacks.iop.org/MLST/1/025001/mmedia.
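To make the layer bookkeeping concrete, the sketch below traces the feature-map size and counts the convolutional parameters through three encoder and three decoder stages. It is an illustration under stated assumptions, not the trained model: the decoder filter counts (taken here to mirror the encoder), the single-channel input, and size-preserving ('same') padding are assumptions, and the names `conv_params` and `trace_network` are hypothetical. Batch normalization parameters are not counted.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution."""
    return (k * k * c_in + 1) * c_out


def trace_network(size=128, in_channels=1,
                  encoder_filters=(32, 64, 128),
                  decoder_filters=(128, 64, 32),  # assumed mirror of the encoder
                  n_classes=7):
    """Return per-stage (name, filters, spatial size) and the conv parameter count."""
    stages, params, channels = [], 0, in_channels
    for f in encoder_filters:
        params += conv_params(channels, f) + conv_params(f, f)  # two conv layers
        size //= 2            # 2 x 2 max-pooling halves each spatial dimension
        channels = f
        stages.append(("encoder", f, size))
    for f in decoder_filters:
        size *= 2             # 2 x 2 upsampling doubles each spatial dimension
        params += conv_params(channels, f) + conv_params(f, f)
        channels = f
        stages.append(("decoder", f, size))
    params += conv_params(channels, n_classes)  # final 7-filter convolution
    stages.append(("output", n_classes, size))
    return stages, params


if __name__ == "__main__":
    for name, filters, size in trace_network()[0]:
        print(f"{name:8s} filters={filters:4d} size={size}x{size}")
```

Because three pooling steps are undone by three upsampling steps, the output has the same spatial size as the input whenever the input side length is a multiple of 8.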

Data set and training
The network training data set was compiled from 28 images (100 × 100 nm² with a resolution of 1024 × 1024). Each of the 28 images was divided into 64 smaller images (128 × 128). Each of the smaller images in the training set was rotated by 90°, 180°, and 270° as well as flipped along one axis and subsequently rotated, increasing our training data by a factor of 8. Images were divided into training, testing, and validating images at a ratio of ∼2/3:1/6:1/6, respectively (corresponding to 9560:2384:2393 images). Although all images used in the training set are of the same size, the network was designed to take images of varying sizes as inputs. The only restriction on the input images is that they all have the same resolution of 1024 pixels/100 nm. This ensures that each convolutional filter will extract the same feature profiles on a variety of STM images. We utilized the Adam optimization algorithm [57] with a learning rate of 0.01. Subsequent model retraining was done using a learning rate of 0.001, which very slightly improved network performance in this case. The networks were trained using a categorical cross entropy loss function. The network quality was assessed using a soft Dice loss function to reduce the effect of the large class imbalance found in our training data [58].
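The tiling and eight-fold augmentation described above can be sketched with NumPy as follows. This is a minimal example rather than the code used in this work; the function names `tile_image` and `dihedral_augment` are hypothetical, and in practice the same transformation must be applied to each image and its label map.

```python
import numpy as np


def tile_image(image, tile=128):
    """Split a square image into non-overlapping tile x tile sub-images."""
    h, w = image.shape
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]


def dihedral_augment(img):
    """Return the 8 variants: rotations by 0/90/180/270 degrees of the image and of its mirror."""
    variants = []
    for base in (img, np.fliplr(img)):
        for k in range(4):
            variants.append(np.rot90(base, k))
    return variants


if __name__ == "__main__":
    scan = np.random.rand(1024, 1024)        # stand-in for one 100 x 100 nm scan
    tiles = tile_image(scan)
    augmented = [v for t in tiles for v in dihedral_augment(t)]
    print(len(tiles), len(augmented))        # tiles per scan, variants per scan
```

One 1024 × 1024 scan yields 64 tiles and 512 augmented tiles, so 28 scans give 28 × 64 × 8 = 14 336 images before splitting.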
Figure 2. A representation of the CNN architecture used in this study. This architecture is selected by comparing the performance of different CNNs (table S1) as well as traditional machine learning methods (the SI includes our attempt at using SIFT features [56] to classify surface defects). It consists of 3 convolutional encoder layers followed by 3 convolutional decoder layers. The final set of images is passed through one final convolutional layer followed by a softmax activation, giving 7 separate images corresponding to each of the 7 labels. The output displayed here marks the clean H-Si in black.
Neural network performance
A subset of the outputs of our fully trained model can be seen in figure 3. The clean H-Si label was left out because of overlapping boundaries with the defects. More examples of the predicted label outputs, including clean H-Si, can be seen in figures S3 and S4. The overall Dice score of the model is 0.86, calculated as a weighted average of the Dice score for each label. A full confusion matrix showing all individual Dice scores, including those of other network architectures, can be seen in figure S7. A large portion of the 0.14 inaccuracy can be attributed to the fact that we had multiple users labeling data without a standardized defect size in place. This effect can be seen when comparing the labeled test data set with the predicted labels (figure S5). The edges of the labeled data are straight, while the predicted label edges show a much rougher border. One would expect that if each of the defect types were traced with a constant label size, the predicted edges would better replicate the labeled edges. This effect can be seen in the confusion tables (figure S7). Lower scores are observed for defects with a high edge-to-surface pixel ratio (type 2, DBs, dihydrides) compared to defects with a lower edge-to-surface pixel ratio (H-Si). For our purposes, this does not present an issue as the size of the defects is much larger than the variation in the predicted defect edges.
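For reference, a per-class soft Dice score and its weighted average can be computed as in the sketch below. The weighting is an assumption: the text states only that the overall score is a weighted average, so weighting by the number of ground-truth pixels per class is chosen here for illustration. The function names and the smoothing constant `eps` are likewise hypothetical.

```python
import numpy as np


def soft_dice_per_class(y_true, y_prob, eps=1e-7):
    """Soft Dice per class for (H, W, C) one-hot labels and softmax outputs."""
    intersection = (y_true * y_prob).sum(axis=(0, 1))
    denominator = y_true.sum(axis=(0, 1)) + y_prob.sum(axis=(0, 1))
    return (2.0 * intersection + eps) / (denominator + eps)


def weighted_dice(y_true, y_prob):
    """Average the per-class scores, weighted by ground-truth pixel count (assumed)."""
    scores = soft_dice_per_class(y_true, y_prob)
    weights = y_true.sum(axis=(0, 1))
    return float((scores * weights).sum() / weights.sum())


if __name__ == "__main__":
    labels = np.array([[0, 1], [2, 0]])
    y_true = np.eye(3)[labels]
    print(weighted_dice(y_true, y_true))   # identical maps give a score of 1
```

Since each class is scored on its own overlap, a small class such as DBs is not swamped by the far more numerous clean H-Si pixels in the per-class values, which is the motivation for using Dice with imbalanced data.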

Augmentation with scanning probe lithography
With the successful development of the neural network, it was implemented in the automation of hydrogen lithography. Figure 4 summarizes the current automation process. The user inputs a pre-designed pattern they wish to create (inset of figure 4(b)) and scales the coordinates of the scanner with the surface lattice parameters such that the DB pattern aligns with H-Si atoms on the surface. The user initiates the program and the SPM controller takes a scan of the sample with a resolution matching the training data. The image is fed to the neural network and an output image containing each of the defects is returned. In order to decrease the electrostatic interactions of local charged defects with the DB pattern [24], an effective radius of ∼5 nm is applied to all type 2 defects on the surface. This increased spacing does not need to be applied to DBs since they are now routinely erasable. The same radius is applied to step-edges as well, to ensure all subsequent DB patterns are made on the same step terrace. The program then identifies the area on the sample furthest from any defects that would support the pattern (white box in figure 4(a)). Once found, a smaller scan of the viable area is taken to confirm the dimer direction matches that of the pattern. If not, the pattern is aligned with the next best viable area until the dimer direction is correct. The smaller scan is then used to shift the pattern such that each DB lies directly above its corresponding H-Si atom on the surface. The program begins patterning by positioning the tip above the specified H-Si atoms and applying an initial bias pulse of 1.8 V with a pulse width of 10 ms. If the DB creation is unsuccessful, the pulse bias is increased in 0.1 V steps (up to a maximum of 2.5 V) until the DB is created. The tip continues this process for all desired DBs until the pattern is complete (figure 4(b)).
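The search for the area furthest from any defects can be sketched as a distance-map calculation on the predicted label image. The example below is a simplified illustration and not the routine used in this work: the class indices in `buffered` (standing for type 2 and step-edge), the square pattern footprint, and the names `distance_to` and `choose_site` are assumptions. The brute-force distance map is written for clarity; a library routine such as `scipy.ndimage.distance_transform_edt` would be much faster on full-size scans.

```python
import numpy as np


def distance_to(mask):
    """Euclidean distance in pixels from every pixel to the nearest True pixel.

    Brute force for clarity; returns inf everywhere if the mask is empty.
    """
    h, w = mask.shape
    rows, cols = np.mgrid[0:h, 0:w]
    d2 = np.full((h, w), np.inf)
    for r, c in np.argwhere(mask):
        d2 = np.minimum(d2, (rows - r) ** 2 + (cols - c) ** 2)
    return np.sqrt(d2)


def choose_site(labels, pattern_px, buffered=(1, 4), clean=0,
                radius_nm=5.0, px_per_nm=1024 / 100):
    """Return (row, col) of the pattern centre farthest from defects, or None."""
    radius_px = radius_nm * px_per_nm
    d_any = distance_to(labels != clean)                          # every defect is avoided
    d_buf = distance_to(np.isin(labels, buffered)) - radius_px    # extra ~5 nm margin
    clearance = np.minimum(d_any, d_buf)

    # The pattern must lie fully inside the scan frame.
    half = pattern_px // 2
    inside = np.zeros(labels.shape, dtype=bool)
    inside[half:labels.shape[0] - half, half:labels.shape[1] - half] = True
    clearance = np.where(inside, clearance, -np.inf)

    r, c = np.unravel_index(np.argmax(clearance), clearance.shape)
    # A square pattern fits if its half-diagonal is clear of all defects.
    if clearance[r, c] < pattern_px * np.sqrt(2) / 2:
        return None
    return int(r), int(c)
```

Ranking candidate centres by clearance also gives a natural ordering for the next best viable area if the dimer direction of the first choice does not match the pattern.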
Tip shaping takes place if the DB is not successfully created after the bias pulse has reached its maximum value. A full flow chart of the patterning program can be seen in the SI (figure S8) as well as additional patterning examples (figures S10-S13). The same procedure could be applied to more complicated fabrication schemes.
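The pulse ramp with the tip-shaping fallback can be summarized in a short control loop. The sketch below is illustrative only: `apply_pulse`, `db_created`, and `reshape_tip` are hypothetical callbacks standing in for instrument control (they are not functions of the Nanonis programming interface), and both the retry limit `max_reshapes` and the restart of the ramp at 1.8 V after tip shaping are assumptions not stated in the text.

```python
def pattern_single_db(apply_pulse, db_created, reshape_tip,
                      start_v=1.8, max_v=2.5, step_v=0.1,
                      width_s=0.010, max_reshapes=3):
    """Ramp the pulse bias until a DB is created; reshape the tip if the ramp fails.

    Returns (success, bias_used, number_of_tip_reshapes).
    """
    n_steps = int(round((max_v - start_v) / step_v))
    for n_reshapes in range(max_reshapes + 1):
        for i in range(n_steps + 1):
            bias = round(start_v + i * step_v, 3)   # avoid floating-point drift
            apply_pulse(bias, width_s)              # hypothetical instrument call
            if db_created():                        # hypothetical success check
                return True, bias, n_reshapes
        if n_reshapes < max_reshapes:
            reshape_tip()                           # hypothetical tip-shaping call
    return False, None, max_reshapes


if __name__ == "__main__":
    pulses = []
    result = pattern_single_db(lambda v, w: pulses.append(v),
                               lambda: pulses[-1] >= 2.0,   # simulated threshold
                               lambda: None)
    print(result, pulses)
```

Passing the instrument actions in as callbacks keeps the ramp logic testable without hardware; in the simulated run above, the DB is taken to form once the bias reaches 2.0 V.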

Conclusion
Continuing on the path to fully develop atomically-precise fabrication tools, we have successfully implemented a routine that can assess the quality of a sample, identify a suitable area that is free of defects, and execute a hydrogen lithography procedure. The routine is based on a CNN which uses semantic segmentation to locate and differentiate between certain charged and uncharged defects that inhibit the manufacturing process or potentially alter the operation of patterned devices. We have demonstrated the applicability of our approach by training the neural network with images of defects commonly found on the H-Si(100)−2×1 surface. Hydrogen lithography was shown by patterning an 8 DB structure on the H-Si surface. It is envisioned that defect-free regions adequate for fabrication of functional logic or memory units comprised of roughly one hundred atoms will exist and that interconnections between such units will be custom routed so as to avoid defects. In this way, defect-free surface areas will be connected to form larger, effectively defect-free circuit blocks. In addition to avoiding defects, erasure of certain defects identified using the neural network is expected to become fully automated in future work. The techniques shown here are applicable to any type of device fabrication or lithography using any form of SPM, as well as to subsets of semiconductor device fabrication where the quality of the materials used must be assessed to optimize the fabrication process.