Learning motifs and their hierarchies in atomic resolution microscopy

Characterizing materials to atomic resolution and first-principles structure-property prediction are two pillars for accelerating functional materials discovery. However, we are still lacking a rapid, noise-robust framework to extract multilevel atomic structural motifs from complex materials to complement, inform, and guide our first-principles models. Here, we present a machine learning framework that rapidly extracts a hierarchy of complex structural motifs from atomically resolved images. We demonstrate how such motif hierarchies can rapidly reconstruct specimens with various defects. Abstracting complex specimens with simplified motifs enabled us to discover a previously unidentified structure in a Mo─V─Te─Nb polyoxometalate (POM) and quantify the relative disorder in a twisted bilayer MoS2. In addition, these motif hierarchies provide statistically grounded clues about the favored and frustrated pathways during self-assembly. The motifs and their hierarchies in our framework coarse-grain disorder in a manner that allows us to understand a much broader range of multiscale samples with functional imperfections and nontrivial topological phases.

Another example is POM, which comprises structural motifs of transition metal oxyanions, that can form a diverse range of three-dimensional (3D) enabling frameworks for catalysis, memory storage, and even nanomedical applications (21,22). For such complex systems, one often turns to exploratory high-resolution microscopy to discover novel structures in the laboratory.
Automatically discovering interpretable unseen structural motifs from atomic resolution images remains an unsolved challenge. Consequently, our ability to rapidly describe novel and complex structures from high-resolution but noisy, incomplete images is severely handicapped. This in turn breaks the real-time feedback for efficiently seeking relevant regions in complex samples, and for making timely decisions on sample characterization and preparation.
Machine Learning (ML) models can be trained to recognize specific types of structural motifs (23)(24)(25)(26)(27)(28)(29) presented at a particular range of resolution, rotations, translations, and noisy imaging conditions. In many cases, training these supervised models on experimental measurements still requires a sufficiently large corpus of labeled data, which often come from laborious labeling by a human expert or idealized forward models (e.g., simulators) with ground truth. However, it is a well-known issue (30)(31)(32) that applying these ML models to slightly different samples and/or presentations can produce wrong predictions. Furthermore, while feature-extraction has been automated for some atomic resolution micrographs (33)(34)(35), in some cases these output features are abstract and not readily interpretable as structural motifs (33,36).
Clustering is an established and powerful form of unsupervised learning that does not require labels.
These models rapidly yield feature classes that are readily interpretable. Furthermore, such clustering can be noise-robust when feature classes are formed by signal averaging noisy and incomplete observations of potential class members. Yet these unsupervised ML models do not efficiently model the spatial context of the derived features. Just like this paragraph is not merely a "bag-of-words" (37), a high-resolution sample is more than a collection of structural motifs. The spatial context surrounding each motif, and the arrangement of neighboring motifs are crucial. Such spatial context gives us crucial information about how these motifs self-assemble, and shed light on mechanisms that encourage or frustrate them.
For such context-aware motif learning, we are inspired by artificial deep neural networks' (DNN) powerful ability to model complex relationships between features: Convolutional Neural Networks can learn spatial relationships between nearby pixels in a hierarchical fashion (38,39); and Natural Language Processing models can learn compositional rules of words, phrases, and fragments (40,41). However, training such supervised DNNs needs labeled data, which is typically derived from laborious manual labeling of atomic resolution micrographs. The label-free alternative using unsupervised DNNs learns features that are not optimized to be human-interpretable, and hence not primed for insightful co-discovery with humans.
Without using DNNs, it is possible to augment a clustering-based unsupervised classifier with the ability to learn hierarchical relationships between structural motifs. Hence, the representations learned by such a model are readily interpretable as motifs and their contextual hierarchies. Here we adopt a classify-then-compose framework that analogously decodes the fundamental motifs ("building blocks") of a micrograph using unsupervised learning, from which more complex motifs are hierarchically and interpretably constructed. Such a bottom-up approach can be useful for context-aware learning of unseen, complex structural motifs with minimal to no supervision. The learning objective is twofold: rapidly obtain human-interpretable features for structural motifs obtained from a noisy atomic resolution image, and rapidly characterize the compositional rules of these motifs.

Results
The first step of this framework is to extract structural motifs within the sample, some of which might be previously unknown. Importantly, periodic atomic structures can be described by a finite number of atom-centered motifs (Fig. S1). In realistic samples and imaging conditions, however, each of these motifs will have additional internal parameters that describe imaging uncertainties (e.g. measurement errors), strain fields, or other latent factors ( Fig. S2 and Fig. S3). Complex motifs can in turn be constructed from simpler motifs, to describe longer-range order within complex samples, and systematically create a taxonomy for classifying increasingly complex motifs.
We illustrate the four steps of our learning framework using the annular dark-field scanning transmission electron microscopy (ADF-STEM) image of monolayer MoSe2 in Figure 1A. The first step is patch extraction, where we programmatically extract all fixed-size image patches that are centered on all the visible atom columns (Fig. 1B) by seeking regions of high local contrast or symmetry. The second step is feature extraction, where the features in these patches are projected onto the Zernike polynomial (ZP) bases (Fig. 1C). In the third step, these Zernike features are grouped into different motifs using a multistage force-relaxed (FR) clustering algorithm (Fig. 1D to 1G), which we describe below (also see Methods and Materials, Movie. S1). And finally, a hierarchy of increasingly complex motifs is automatically constructed (Fig. 1H), from which the original sample is reconstructed (Fig. 1I). This hierarchy building towards more complex motif combinations is elaborated in Figure 2.
The second step of this framework (Fig. 1C) uses ZPs to reduce the representation of the configurations and shapes of the atom columns in pixelated image patches. This is in contrast to the dimensionality reduction for motif building by Belianinov et al.(34), where only atomic column configurations were preserved. This Zernike representation offers three key advantages. First, the completeness and orthogonality of ZPs guarantee that any square-integrable function on the unit circle can be decomposed into a linear combination of ZPs with coefficients named Zernike moments without redundancy (Fig. S4). In Figure 1C these Zernike moments ( ) are uniquely indexed by either the (p,q) tuple, or using the equivalent single index j (Table S1). Second, ZPs decomposition effectively rejects uninformative high spatial frequency noise (Fig. S5). We also observed that ZPs are demonstrably more efficient than other methods in reconstructing the ground truth ( Fig. S6) and improving clustering performance ( Fig. S7 and Fig. S8) with different types and amounts of noise. Finally, rotational symmetries within image patches are self-evident in the Zernike representation, which can be selectively turned on or off to determine relationships between class averages (compare Fig. 1D and 1F against Fig. 1E and 1G). Hence, Zernike features are robust against rotational uncertainty. Our time and space complexity analysis (Fig. S9) also shows that computation Zernike moments via matrix approximation (Fig. S10) are about 7.8 times faster and 1.5 times more memory efficient than PCA (scikit-learn implementation).
Classifying noisy unseen motifs together based on similarities between their dominant Zernike features creates average motifs with even less noise, making downstream labeling more robust.
Automatically creating such class averages is critical in the pre-processing of very noisy cryoelectron micrographs(42) and x-ray diffraction patterns (43). To do this classification flexibly we introduce a force-relaxed (FR) clustering scheme, which generalizes the efficient uniform manifold approximation and projection (UMAP) scheme as a multistage force-based clustering algorithm ( Fig. S11, Movie. S1). Following the comparison workflow in Fig. S12, our FR scheme shows comparable performance with t-SNE (44) and UMAP (45) with balanced datasets (Fig. S13), and outperforms these two with imbalanced datasets (Fig. S14). The resultant FR layout ( Fig. 1F and   1G) can be flexibly adapted to datasets with either discrete motif classes or manifolds of motifs by tuning the relative strengths and schedules of these forces (see Materials and Methods).
Our image patches typically enclose the neighboring atomic columns. This is an efficient way of extracting single-atomic column defects without supervision, which readily appear at the FR layouts ( Fig. 1F and 1G). These motif labels that are automatically derived from this classification, plus the relative spatial locations between pairs of motifs, can be hierarchically composed to discover simple relationships between structural motifs (Fig. 1H).
A hierarchy of the structural motifs found in Fig. 1 can be automatically constructed with the steps described in Fig. 2. This construction is most readily done when these motifs fall within a lattice.
We perform a second round of unsupervised classification where cells on this lattice are classified by their motif compositions (Fig. 2B). The resultant motif-cells, which deviate from the perfect crystal, are in turn programmatically connected to similar adjacent motif-cells (Fig. 2B, 2C). A hierarchy can be automatically formed from these connected (but spatially isolated) motif-cells, where the relationships between and occurrence rates of these motif-cells become apparent (Fig.   2D). Details are discussed in the Methods section.
The analyses above only require several seconds on a modest desktop computer (Table S2). Hence, it can be easily adapted to give live feedback at electron microscopes.
This framework readily generalizes to crystalline samples with higher defect densities ( Figure 3).
An example of such is a monolayer WS2 doped with Fe and Te atoms (Fig. 3A), where the FRclustering of the Zernike features rapidly identifies more motif classes (Fig. 3D). Using only the spatial relationships between these motifs, more complex motifs can be hierarchically composed ( Fig. 3E): Fe-centered defects that are surrounded by three S columns, which are in turn fenced in by W-columns; Te-centered motifs that are surrounded by three W-columns, which are in turn fenced in by S-centered columns. The original ADF-STEM micrograph can be readily reconstructed with these motifs (Fig. 3B and 3C), where motifs of the background crystalline WS2 monolayer are hidden for clarity. A hierarchy of motif-cells, similar to but more complicated than that in Fig. 2, can be reconstructed for this sample (see Fig. S15).
Our framework can create powerful annotations for understanding structures that are too disordered for manual classification. An example of this is the large family of POMs, where multiple polyhedral transition metal oxyanion units link together to form complex three-dimensional structures that hold promise for novel catalytic (21) and biomedical applications (22). Figure 4A to 4C, illustrates how our framework rapidly identified three types of pentagonal motifs in the Mo-V-Te-Nb-oxide POM and their relative abundances. These motifs resemble five MO8(M=Mo, Nb) octahedra arranged in a pentagon but with the following differences: 59.4% of these pentagons are empty; 36.3% likely surround either niobium (Z=41) or molybdenum (Z=42) atomic columns; and 4.3% possibly surround vanadium (Z=23) atomic columns (Fig. S16).
Remarkably, these three fundamental pentagonal motifs tessellate the plane resembling a type-4 monohedral pentagonal structure(46) (Fig. 4D and 4E). This tessellation, which canonically involves only a single species of irregular pentagon, is instead supported here by a spectrum of subtly distorted pentagons (Fig. S17).
Hierarchically composing higher-level motifs from the two major types of single-pentagon motifs ( Fig. 4F, Fig. S18) reveals how these pentagonal units might have spontaneously assembled. Level 2 motifs comprise four pentagons that occupy the corners of a larger square; here the putative Vanadium-centered pentagons (light blue motifs in Fig. 4C) are omitted. The blue arrows directed towards the corners of these level 2 motifs indicate if the pentagons at the specified corners are filled. In a completely disordered sample, one would expect 2 4 =16 such level 2 motifs that occur with equal probability. Yet Figure 4F shows that a lower configurational information entropy structure where >83% of these level 2 motifs are dominated by only 6 of these motifs. This lowentropy signature suggests the preferential attachment of pentagonal, but distorted, units during their solution-phase growth (21).
Even larger level 3 square motifs can be formed where each corner is now occupied by a level 2 square motif. Again, in a completely random and disordered structure, there should be 2 16 =65,536 level 3 motifs of equal probability. And yet again, nearly a quarter of the field of view (Fig. 4G,   Fig. S19) is dominated by only two distinct level 3 motifs that conspire to form a previously unreported ordered structure with an approximate 3:1 ratio. From Figure 4G it is clear that the growth of this new ordered structure was entropically frustrated by a large family of competing motifs rich in Mo and Nb columns (Fig. S20). This frustration could have been further abetted by the putatively V-enriched motifs, which appear in the regions between these ordered structures as indicated in turquoise dots (Fig. 4H).
When the variations in the sample are more continuous than combinatorial, constructing motifs can be uninformative. Consequently, building hierarchies from such motifs are potentially misleading.
A good example of such samples is the family of van der Waal heterostructures whose electronic and structural properties can be manipulated by introducing relative rotations between atomically thin ordered layers. Figure 5 shows an ADF-STEM micrograph of two hexagonal MoS2 layers that were mutually rotated by approximately 3.15 degrees ( Experimentally realized twisted bilayers will contain imperfections. These imperfections, through the variations between domains or perturbation (47), can be readily quantified in our framework.

Discussion
Deeply rooted in our framework is the critical notion of structural motifs. Motifs coarse-grain and hence simplify the staggeringly large space of possible atomic configurations in our sample. Each motif class can accommodate variations in features due to uninformative nuisance parameters (e.g., noise, scanning errors, detector noise, etc.) or geometric parameters (e.g., small strain fields). With sufficiently high-resolution images, some of these parameters can be unambiguously identified and coarse-grained away. Thereafter, insights about a sample's state and dynamics can be distilled more readily.
This coarse-graining is intimately linked to the fact that interpreting a motif class is more robust against noise and aberrations compared to single noisy image patches. When the signal is sufficiently larger than the noise, such coarse-graining reveals structural motifs (e.g., Fig. 3) despite local strain fields (see POM structure distortion map, Fig. S8). Admittedly, however, for very noisy measurements it will be impossible to discern structural features in the motifs from noise.
Additionally, the Zernike polynomial representation offers robustness against the types of scanning drift that were encountered in the experimental images presented here. But again, this robustness will be compromised when this scanning drift becomes sufficiently severe.
These motifs are the essential building blocks for organizing the structural complexity of samples into interpretable hierarchies. In a perfectly periodic structure, the number of atom-centered motifs at any level of this hierarchy equals the number of unique and visible atomic columns within the unit cell (Fig. S1). However, the multiplicity of these motifs will increase with point defects (Fig.   S23) and inter-domain boundaries ( Fig. S24 and S25). Further introducing random disorder to this periodic structure (e.g., from vacancies, interstitials, translations, and distortions, etc) causes the number of complex motifs to rapidly rise as we build towards more complex hierarchies in our Mo-V-Te-Nb-oxide sample ( Fig. S20 and S26). The rate of this rise is a measure of a sample's configurational entropy and loss of long-range order. This entropy also specifies how higher-level motifs can be efficiently and combinatorially represented with lower-level fundamental motifs (Fig.   2, S15, and S27).
The dominant hierarchy of motifs in Figure 3F is strongly influenced by the dominant self-assembly pathways in the sample. In contrast, the sample also reveals a much larger gamut of thermodynamically accessible but less likely higher motifs ( Fig. S20 and S26). The exercise of constructing these hierarchies may reveal crucial clues about preferential attachment and freeenergy barriers during multi-step nucleation and growth of ordered phases (49). Additionally, these motif hierarchies also provide statistically-grounded structural waypoints to guide (or check) abinitio calculations of viable structures, dynamics, and downstream function (2,3). These ideas can be readily extended to other atomic-resolution structures, such as those collected using scanning tunneling microscopes (Fig. S28).
To conclude, we have described a framework based on Zernike features and force-relaxed clustering to extract and represent motifs from complex atomic-resolution micrographs, and accelerate downstream labeling. Importantly, this framework continues to exploit the spatial context between simple motifs to learn a hierarchical composition of higher motifs that can reconstruct an image from the bottom up. By explicitly inducing this motif hierarchy in a sample, we can quantify and interpret the degree and types of complexity in a sample. Ultimately, these motifs help us coarse-grain disorder and/or complexity in materials to a degree where new knowledge readily emerges and in turn inspires insightful hypotheses. Hence, our techniques offer a novel and rapid approach to extracting multi-level structural information from atomic-level microscopy images and establishing statistically multiscale structure-property links, paving the way to rapid and automatic discovery of next-generation nanomaterials with complex and unknown features.

Materials and Methods
Growth of WS2 thin films with Te and Fe dopants Tellurium (Te) powder was placed into a quartz boat at a temperature of T(Te) ≈ 450 °C. WO3 and FeS2 powders were put in a ceramic boat inside the quartz tube at the center of heating zone. A Si/SiO2 substrate with a clean surface was put on the boat. The growth temperature was set at about 800 °C, and the growth time was 30 min. The flow rate of the Argon (Ar) carrier gas was 90 standard cubic centimeters per minute (sccm).

Growth of MoSe2 thin films via MBE
SiO2 substrates were degassed in the same chamber for 1 h and annealed at 500 °C for 10 min. Mo and Se powders were evaporated from an electron-beam evaporator and a Kundsen cell, respectively. During growth, the temperature of the SiO2 substrates were maintained at 500 °C, with a flux ratio between Mo and Se of ∼1:10 and chamber pressure kept at ∼9 × 10-10 Torr. Monolayer and bilayer MoSe2 can be obtained when the growth temperature is set at 250 °C.

STEM Sample Preparation
As-grown transition metal dichalcogenide (TMDC) films were first identified by optical microscopy. Cu QUANTIFOIL ® TEM grids were placed onto the target region of TMDC thin films followed by an isopropyl alcohol (IPA)-assisted polymer free lift-off method. The TEM grids were annealed in ultrahigh vacuum chamber (~1×10-9 Torr) at 180 °C for 10 h prior to STEM imaging to eliminate surface contamination.

STEM Characterization
STEM-ADF imaging were carried out on an aberration-corrected JEOL ARM-200F equipped with a cold-field-emission gun at 80 kV if otherwise stated. Two sets of detector acceptance angle were adopted. A higher detector range (68mrad-280mrad) was used for MoSe2 characterization. A lower detector angle range (30mrad-68mrad) was adopted for WS2 imaging for improved contrast of S vacancy sites. A dwell time of 19 µs/pixel was set for scanning imaging mode. HAADF-STEM imaging of the POM structure was performed on UltraSTEM 200 (operated at 200 kV).

Synthetic Data Generation
Synthetic datasets in this paper are used to evaluate popular dimension reduction methods (PCA, t-SNE and UMAP) against the Force-relaxed clustering. We have listed the details of synthetic data in Table S3. Simple synthetic motifs in Fig. S1-2, S4-5 are directly calculated by adding different two-dimensional Gaussian functions. In addition to binary classes of motifs with different n-fold symmetry, synthetic bilayer patterns used to evaluate KL divergence (Fig. S13) are generated by convolution with corresponding Gaussian kernels in the Fourier space using sufficient up-sampled grids. Then the generated moiré patterns are then rescaled to adapt to the resolution of experimental data.

Identification of Feature Points
Identification of feature points follows three steps: smoothing, maximum filtering and locating the points. The workflow is detailed in Fig. S29. The key to successfully extracting feature points is to obtain smooth versions of raw images (Step 1). Depending on image quality and image conditions, different smoothing schemes are adopted. For images with high signal-to-noise ratio (SNR), Fourier space filtering was implemented to keep 10% of lowest frequency components. For images with low SNR, Singular Value Decomposition (SVD) based method was applied to the image. We adopted the SVD based method for images in this work unless stated otherwise. The smooth image was dilated by a local maximum filter (Step 2). The feature points are the locations where input image is equivalent to the dilated version (Step 3). In some cases, this identification scheme conservatively overcounts the number of feature points, and features points can be further reduced via symmetry response of Zernike features or selected in Force-relaxed clustering scheme.

Determination of Motif Patch Size
Patch sizes have to be sufficiently large to capture meaningful local symmetries that are persistent in the sample. These symmetries would be efficiently and interpretably encoded in the Zernike representation of these image patches. Empirically, we found this encoding satisfactory when the side-length of the image patches matches the average length of the repeating unit. The latter can be automatically detected using the radial distribution function (see workflow in Fig. S30. Patch sizes smaller than this recommendation tends to offload local symmetry information to the hierarchies that their consequent motifs form, which may not be as readily interpretable.

Dimensional Reduction with Principal Component Analysis
Let the initial set of feature vectors extracted from the raw image be denoted = { 1 , 2 , ⋯ , , ⋯ , | ⊆ }, where n labels the number of features (image patches), and m is length of feature vector that stores Zernike moments. We then use linear principal component analysis (PCA) to identify the main components of covariance between these features. Thereafter, we reduce the features' dimensionality by projecting them into the PCA components that capture the largest feature-feature variations. In this paper, we project all Zernike features ( ≤ 65, = 66) obtained from STEM images into the two largest PCA components ( ↦ = { 1 , 2 , ⋯ , , ⋯ , | ⊆ =2 }, where d labels PCA components).

Force-relaxed Clustering
Our two-stage relaxed clustering comprises a repulsion-dominated stage, followed by an attractiondominated stage. The repulsion-dominated stage allows adequate separation distance between structural motifs whose features are mutually most dissimilar. The attraction-dominated stage then adjusts the strength of attraction force to make sure each motif cluster is compact, and that clear decision boundaries can be drawn between clusters. The forces used in this paper and others (44,45,50) are described in Table S4  If a feature (e.g., in Fig. 1C) is another feature's k-nearest neighbour (determined once from the feature matrix ), attractive forces will pull them together in this PCA-reduced space; otherwise randomly chosen ′ non-neighbour pairs are mutually repelled via a stochastic scheme. Exemplary results are shown in Fig. 1F and 1G, where = 10, ′ = 5. These forces iteratively move featurepairs (Y) in the PCA-reduced layout and are inversely proportional to their mutual separation on this plane and weighted by a similarity metric between the feature pairs in their original Zernike space (X). To provide additional control, we alter the relative strength of these attractive and repulsive forces into a repulsion-dominated stage, followed by an attraction-dominated stage. For numerical stability, these forces are gradually relaxed from a maximum in both stages. The hyperparameters involved are illustrated in Fig. S11.
The evolution of the features that resulted in Fig. 1F and 1G reveals how the two-stage relaxed clustering creates decision boundaries between structural motifs. Features suffer a rapid global dilation during the initial repulsion-dominated stage. During this stage, approximate class boundaries quickly emerge while keeping their k-similar features nearby. This dilation is slowed by a force-relaxation schedule. Subsequently, during the attraction-dominated stage similar features coalesce into well-separated centroids without a global contraction of all the features. This ordering of the stages is crucial for creating clear decision boundaries between features, which can be optimized with the hyperparameters in Table. S3, and Supplementary Text. Further comparison is further discussed in Fig. S13-14.

Automatic Construction of Motif-cell Hierarchy
Here we describe the procedure used to programmatically produce the motif-cell hierarchies in Figures 2 and S15. A variant of this approach is used for the POM dataset ( Figures S26 and S27).
1. Use the feature points on an image to define cells that tessellate the entire image. This can be done in two ways: automatically locating the lattice vectors in a quasi-periodic sample which gives a regular grid of cells; or a Voronoi construction where the resultant cells might not be periodic.
2. Each of the lowest level structure motifs labeled by FR-clustering are associated with its encompassing cell. This produces motif-cells similar to those seen in Fig. 2B. 3. Single-cell motif-cells (e.g., rhombuses in Fig. 2B) are rapidly classified by the set membership of their motifs. These unique single-cell motifs-cells form the lowest-level of the hierarchy that we will build below. In a mostly crystalline sample, the most frequently observed single-cell motifcells will belong to the periodic unit cells of the crystal. Here we ignore these dominant motif-cells to focus on how non-dominant motif-cells self-assemble. 4. If the cells' positions are described by lattice vectors ⃗ + , then each cell's position can be uniquely represented using only its lattice index (i.e., ( , )).

Each non-dominant motif-cell is "grown" by automatically connecting it to adjacent motif-cells in their von Neumann neighborhood. A higher-level motif-cell is created when all the neighbors of its constituent motif-cells have all been connected onto this larger motif-cell.
6. These larger, higher-level, connected motif-cells are ordered by levels according to the number of cells each contains (Fig. 2D). Higher-level motif-cells with the same number of cells are compared to remove duplicates up to an overall 90-degree rotation.

Supplementary Materials
Supplementary Text Figs. S1 to S30 Tables S1 to S4 Movies S1

This PDF file includes:
Supplementary Text Figs. S1 to S30 Tables S1 to S4 References

Other Supplementary Materials for this manuscript include the following:
Movies S1  Conveniently, STEM images of crystalline samples contain only a finite number of atom-centered motifs, which often correspond to the discrete nature of the low dimensional representation of these motifs. This discrete property is closely related to the fact that crystal structure can be concisely described by a unit cell that only contains a finite number of atoms. We used synthetic data composed of an array of Gaussian blobs to validate this statement. Fig. S1A displays a synthetic image of a crystal sample with plane group 4 , which contains two types of motifs as indicated in the inset image and Fig. S1 (B and C).    Table. S3. Here, | | and are given by

Zernike Polynomials and Zernike moments
where 0 is the Kronecker delta. Fig. S4 shows a visual depiction of the first 10 Zernike polynomials. The order of the polynomial is determined by the indices and . In fact, the double indices (p, q) are ordered into a single index , i.e., is equivalent to . Table S1 shows the conversion between ( , ) to , for which the mathematical relation is expressed below, The azimuthal component of is either ( ) or ( ). In fact, it can be generalized to a complex form using Euler's formula = ( ) + ( ). Hence, Zernike polynomials in complex-valued form can be expressed as 8 The truncated ZP representations can effectively reduce noise. Much of the high spatial frequency measurement noise in high-resolution micrographs is predominantly contained in higher-order Zernike moments. Furthermore, the ≤ 10 Zernike projection of an image patch already captures a wide range of possible shapes and arrangements of atomic columns and defects. Therefore, by truncating ZPs beyond p=10 effectively allows us to reject higher spatial frequency measurement noise. As shown in Figure S5, with a very low value of PSNR (1.81), the reconstruction patch still shows the main three-fold symmetry which is consistent with the Zernike moments plot. Fig. S5 shows that the low spatial frequency features of noisy images are at least three orders of magnitude more detectable than the putative input peak signal to noise ratio in different symmetry configurations.
Comparison of ZPs and PCA from patch reconstruction To enrich the discussion towards why we choose ZPs and provide further evidence of the usefulness of our method, we have added more comparison analysis to validate our point. We have basically two paths to compare ZPs and PCA: compare the reconstruction results from PCA and ZPs both truncated to the same number of components; or compare clustering performance of features reduced using PCA or ZPs. Fig. S6 shows the reconstruction results using PCA and ZPs when truncated to the same number of components in three different noise models. After applying only a moderate noise in all three scenarios, we observed that ZPs (orange rectangle) outperform PCA (blue rectangle) in recovering the spatial location and contrast of the ground truth patches. We have to point out that PCA is an effective image denoising method when implemented in overlapping patch-based singular value decompositions (SVD) method. Poisson noise is significantly reduced when overlapping patches are grouped as input into the SVD-related denoising algorithm(1). In representing features from patches using PCA, it contains variations from noise components if pre-pre-processing/denoising is lacking.
Comparison of ZPs and other dimension reduction methods via cluster performances The advantages of ZPs over PCA are even clearer if evaluated based on clustering performance scores via a scheme shown in Fig. S7. The synthetic dataset (of different noise levels) is firstly represented via different dimension reduction techniques, then the reduced features are clustered using the KMeans algorithm. The predicted labels from Kmeans combined with ground truth labels are used to calculate different clustering performance scores including adjusted mutual information (AMI), Fowlkes-Mallows index (FMI) and Silhouette coefficient. S8 shows a complete comparison of these scores in different noise settings using various representation methods. In all three noise models, fixed-bases methods (e.g., ZPs, PZPs(2) and Bessel (3)) outperforms PCA and kernel PCA. In particular, ZPs show the best clustering performance in Poisson setting.

Fig. S10. Computation of Zernike moments (features) via matrix approximation.
We used matrix approximation to compute Zernike features with a reduced dimension of p. A total number of n image patches was flattened to form a matrix with a shape of (n, m), where m is the number of pixels in one patch. A set of p Zernike polynomials with the same shape of image patch is also flattened to form a matrix with a shape of (p, m). The reduced features with a shape of (n, p) can be approximated via matrix pseudo-inverse operation.
Computation Zernike moments can be greatly sped up via matrix approximation instead of directly integrating. As illustrated in Fig. S10, the linear decomposition property of ZPs entitles us to simplify the computation through a matrix pseudo inverse operation. Fig. S9 shows that computation of Zernike moments via matrix approximation are about 7.8 times faster and 1.5 times more memory efficient than PCA (scikit-learn implementation).
Force-relaxed Clustering

Basic Definitions
Let = { 1 , 2 , 3 , ⋯ , } be a set of features such that ⊆ . In this work, represents the Zernike moments extracted from the ℎ image patch. The set of such features then suffer a second low dimensional transformation : ↦ = { 1 , 2 , 3 , ⋯ , | ⊆ }, and ≪ . Here, we use principal component analysis (PCA) with = 2 to generate the initial layout → . Different low dimensional transformation can also be adopted.
The neighborhood information of can be stored in a weighted graph = ( , ), where are vertices (features) and are edges (between valid pairs of features) of the graph. Edges are drawn between two vertices only if they have a non-zero element in their adjacency matrix, ∈ × that we define below. The element (1 ≤ ≤ and 1 ≤ ≤ ) of the adjacency matrix P is the weight of the edge ( , ), which measures the similarity between vertices associated with features and .
An isomorphic graph can be constructed with the reduced features as vertices: = ( , ). Notice that both graphs and share the same set of edges.

Construction of an Adjacency Matrix P from X Using k-Nearest Neighbor Method
Given a set of vertices , there are many ways to construct the edges of graph ( , ). In this work, we use the -nearest neighbor method ( -NN).
Given an input hyperparameter , we are able to compute the set of -nearest neighbors of as { 1 , … , }. For each , we define a minimum distance and a normalization distance from the neighborhood. The minimum distance where ( , ) is a default distance metric between two features and . In this work, between two features vectors and is defined as ∈ we apply the attractive forces between and its k-neighbors of the features (blue subbands), then apply repulsive forces between and randomly selected non-neighbors (orange subbands). The displacement of force relaxation is tuned by ( ), which starts from 1 in each stage, then linearly falls to nearly zero at the end of the stage.

Deriving the Gradient of UMAP Cost Function
UMAP introduces two fuzzy sets of input ( ) and output data ( ), with two sets of weights being equivalent to and respectively. The divergence function for UMAP is denoted as the cross entropy of the two fuzzy set: Here, the input weights are pre-calculated from and treated as constants. The output weights are given by, The gradient of the objective function with respect to is: The two terms within the brackets in the last equation can be interpreted as attractive and repulsive forces acting on the features respectively.
Comparison of FR, t-SNE and UMAP  We used clustering performance scores to evaluate and compare FR, t-SNE and UMAP. Different from a classification task, evaluating clustering performance is not as trivial as computing the precision and recall according to the labels. Specifically in the context of clustering, our metric should take the cluster separation into consideration rather than counting the absolute values of predicted labels. Similar to comparing different representation schemes (Fig. S8), we used Silhouette coefficient, AMI and FMI to evaluate the performance of FR, t-SNE and UMAP ( Fig.  S13 and S14).
Following the workflow in Fig. S12, we found all three embedding techniques have comparable performance in the synthetic dataset with equal-sized clusters. In the uneven-sized cluster scenario, FR outperforms UMAP and t-SNE in Silhouette, AMI, and FMI scores.             S29 illustrate a three-step workflow to locate feature points. The key to successfully extracting feature points is to obtain smooth versions of raw images (Step 1). Depending on image quality and image conditions, different smoothing schemes are adopted. For images with high signal-tonoise ratio (SNR), Fourier space filtering was implemented to keep 10% of lowest frequency components. For images with low SNR, Singular Value Decomposition based method was applied to the image. We adopted the SVD based method for images in this work unless stated otherwise. The smooth image was dilated by a local maximum filter (Step 2). The feature points are the locations where the input image is equivalent to the dilated version (Step 3).

Workflow to determine the patch size
Empirically the side length of patches is estimated to be the radius of the first dominant peak in radius distribution function (RDF) of the underlying sample, which can be approximated from Fourier transform of atomic resolution images. When the patch sizes are too large, the downstream FR algorithm will only capture the major clusters as small satellite clusters cannot be distinguished because they share high similarity with the corresponding major cluster; When the patch sizes are too small, the patch does not cover sufficient region and the local environment (spatial) Synthetic Dataset description: By default, each synthetic motif with n-fold rotational symmetry has a shape of 128x128 which contains (n+1) Gaussian blobs. These (n+1) Gaussian blobs form a regular n-gon with the extra Gaussian blob occupying the center. The surrounding Gaussian blobs have a distance of 32 pixels away from the center of the motif. All Gaussians share the same sigma value of 7. In all blue motif classes, all Gaussian blobs have a maximum amplitude of 1.0, while in the orange motif classes, the central blob has a maximum amplitude of 0.80. In case of applying Poisson noise to the above data, the motif is scaled and converted to integer type.
A synthetic dataset can be specified by a unique set of parameters { 1, 2, , , , }, and the definitions are these parameters are declared as follows (default values in parentheses):