AutomAl 6000: Semi-automatic structural labeling of HAADF-STEM images of precipitates in Al-Mg-Si-(Cu) alloys

The 6xxx Al alloy series, when subjected to age hardening, can present a rich collection of precipitate structures, most extending as needles, laths or rods along the <100> directions in the aluminium matrix. These precipitates significantly influence the properties of the bulk material, most notably ductility and strength, and are therefore the subject of extensive research aimed at better understanding the precipitation sequence of the alloys during aging. The atomic structures of the majority of the phases that appear in this system can be described as bundles of atomic columns following the matrix periodicity along <100> Al, where most columns are occupied by solute elements. It has been found that the columns in the 6xxx system precipitates are commonly characterized by a set of simple structural principles or rules that allow identification of the species and the relative displacement (longitudinal jumps) of the atomic columns that comprise the entire cross-section structure of the needles. With some exceptions, these rules have proved useful not only for analyzing the phases in this system, but also for characterizing disordered precipitates, matrix/precipitate interfaces, and hybrid precipitates containing multiple phases. Modern research in the field of nano-scale material science is equipped with increasingly sensitive tools and techniques, of which atomic resolution aberration-corrected high-angle annular dark field scanning transmission electron microscopy (HAADF-STEM) plays an important part. With the goal of analyzing such HAADF-STEM images of particle cross-sections in the Al-Mg-Si-(Cu) system, we have developed the stand-alone software AutomAl 6000. AutomAl 6000 features, among other things, a method for column detection, an algorithm based on the symbiosis of a statistical model and the rules formulated in a digraph-like framework, and an approachable graphical user interface.
From a 2D projected HAADF-STEM image of a precipitate cross-section, the software can semi-autonomously determine the 3D column positions and column species wherever the rules are applicable. In turn, AutomAl 6000 can be used to display atomic overlays of the precipitates, and to analyze or display various other representations of the data. Links to the active repository, as well as other resources, can be found at http://automal.org.


Preface and acknowledgments
This thesis is the conclusion of my MSc study in physics at NTNU. The central result of my work is the software program AutomAl 6000. This thesis provides the theoretical and methodological background of the software. Some parts of the software have been challenging to describe in written text, and the program should be tested and tried out. However, be advised that the software is not yet fully finalized at the time of writing, partly because this project has become very large for a one-man programming team. Time constraints and many overrun deadlines have put some limits on what I was able to achieve in time, including the "fullness" of this thesis. It has also been a sometimes stressful challenge to manage the sheer size of this project and its many aspects. Some parts of this thesis suffer a bit in quality as a consequence. I will, however, stay with NTNU for a bit longer in a small time-limited position, where my assignment will be to make the source code more finalized and ready to perhaps be adopted by the community as an open-source project, if the interest is there. We are also considering writing a paper to accompany the software, if time permits. With this in mind, accept my submission here as a snapshot of a still ongoing project.
I would like to thank my project supervisor, Randi Holmestad, Professor at the department of physics, NTNU, for approaching me with the subject for this project, and for the encouragement and assistance throughout the project. It has been very rewarding and motivating to work with the topics that have presented themselves through this project, and I am very grateful for the opportunity I was given to work with the subject matter, and to meet many of the people that make up the very competent community of TEM, aluminium and material science groups at NTNU and SINTEF.
I would also like to thank Calin Daniel Marioara, Senior Research Scientist at SINTEF industry, for spending his time teaching me some of the essential knowledge that I required to even begin approaching the challenges of this work. He has also provided feedback and assistance during the semester and inspired many of the implemented and proposed functionalities of the software associated with this work. I would also like to mention Sigmund Jarle Andersen, Senior research scientist at SINTEF industry. His work, among others, on the structural principles has been paramount for the feasibility of the work that is presented in this thesis. He also made himself available for discussions, and has provided some interesting insights into some of the details that have exposed themselves through this work.
Several people appear to have conceived of the idea of creating the type of software developed here. Of those, I would especially like to mention Sigurd Wenner, researcher at SINTEF industry, whose PowerPoint slides constituted the main source of inspiration for the definition of the problem at hand.
The work I have done on this project is somewhat theoretical, or rather methodical, in nature, and I have spent very little time on experimental techniques. There are several individuals here at NTNU and SINTEF who have been helping me with their experimental insights and knowledge, images, feedback on the software and much more. This help has been invaluable to me. Of these individuals, I would especially like to mention Jonas Kristoffer Sunde, Adrian Lervik, Elisabeth Thronsen and

Introduction
In many modern fields of research, analyzing data from experiments often requires the use of software. Applying general purpose software or writing simple special purpose programs is commonplace in many research projects, but sometimes a more substantial effort on the development of analytical tools can be warranted. This thesis outlines such an effort, where a new stand-alone, open-source software tool entitled AutomAl 6000 has been developed. AutomAl 6000 is a specialized tool for analyzing atomic resolution High Angle Annular Dark Field Scanning Transmission Electron Microscopy (HAADF-STEM) images of Al-Mg-Si-(Cu) precipitates in 6xxx aluminium alloys. From an input HAADF-STEM image, AutomAl 6000 can determine, given some manual input, the 3-d positions and atomic species of each atomic column in the image, which can be visualized with an atomic overlay. This overall process is possible due to the structural principles that accompany the precipitation sequences in these alloy systems, see section 2.2.3. The extraction of atomic structure data from HAADF-STEM images by the methods herein involves casting all the useful information in a suitable framework which has been termed atomic graphs, see section 3.2. Having the atomic data ordered in this manner also lends itself naturally to additional novel ways of quantitative analysis of precipitates in terms of graph parameters, see section 4.
To successfully perform the tasks that AutomAl 6000 either claims or desires to include in its capabilities, there are many challenges that must be approached, resulting in an inherently multi-faceted project. The problems that most deserve highlighting are summarized below.
• Column detection. Before structural analysis can take place, a method that can automatically locate the atomic columns in HAADF-STEM images is required.
• Atomic graphs. A mathematical framework for both the structural principles observed in most Al-Mg-Si-(Cu) precipitation structures, and a flexible statistical model. Atomic graphs also serve as a somewhat intuitive visual interface between a user of AutomAl 6000 and the state or structure of the underlying data.
• Statistical modelling. A basic framework for developing statistical models used in the methods of AutomAl 6000.
• Column characterization. Using the atomic graph framework together with a statistical model, an algorithm has been developed, that can solve the over-arching problem of transforming 2-dimensional column positions in the image plane, to 3-dimensional column positions, as well as the atomic species of the columns.
• Graphical user interface and other secondary methods and tools. To make the methods we have developed easily accessible to everyone, we have made a Graphical User Interface (GUI). There are also several methods and tools that are natural extensions to the software that have been implemented.
Due to the somewhat unusual format of this Master's project, the output generated during the project period consists of the following parts, which should all be considered together, with emphasis on this thesis and the source code:
• This thesis, which is focused on developing some amount of rigour and suitable language to systematically explain the thinking that has gone into the development of the methods of AutomAl 6000.
• A webpage that contains software downloads, guides and technical documentation, as well as select updates on the project activities.
• A git repository 3 with the source code of AutomAl 6000. The particular link given here is directed to a repository that is intended as a snap-shot of the source code at the time this thesis is submitted for consideration. In the near future, the active repository can be found under the "links" section of the webpage.
It should be noted that, with AutomAl 6000 being a piece of software that is still under development and constantly subject to change, this thesis is unlikely to give an accurate description of the software for very long. Thus, implementation details are in general avoided in this text. Instead, the goal of this text is to present a language of methodology and to substantiate the thinking that has emerged from the process that has led to the current version at the time of writing.
The theory section, section 2, of this thesis presents a basic introduction to the HAADF-STEM experimental technique, aluminium alloys and the structural principles observed in the 6xxx precipitation sequence. Furthermore, the theory section also provides a brief introduction to the language of graph theory, as well as a short discussion of data modelling techniques and nomenclature. The several facets of the methodology of AutomAl 6000 are then presented in the method section, section 3. Some discussions arise naturally from several of the topics related to the development of AutomAl 6000, which are presented in section 4. In the conclusion section, section 5, the experience acquired during this project is summarized. The final section, section 6, presents the most pressing focus points as the project moves into its final stage. Appendix A gives an overview of the source code of AutomAl 6000.
There are no experimental results produced by this project. The images presented in this thesis, with which AutomAl 6000 has been developed, are borrowed from individuals at the department of physics at NTNU and SINTEF industry. These images are presented here with permission.

Theory
This section provides a brief insight into the topics that underpin the AutomAl 6000 software. After a short introduction to HAADF-STEM, a focused look into aluminium precipitates follows. Next, the language of graph theory is introduced. Finally, this section ends with an overview of data analysis and statistics.

Brief introduction to HAADF-STEM imaging
Theoretical and experimental details of HAADF-STEM are not paramount to the understanding of the methods of this thesis. However, since these images are the basis of all the analysis done, a brief and simplistic outline is given here. More details can be found elsewhere [1].
In STEM mode, the incident electron beam is focused into a very sharp, convergent beam which is scanned across the surface of a very thin sample, typically less than 50 Å. To obtain high-resolution images, the sample has to be oriented exactly along a zone axis, which means that atoms are aligned in columns exactly along the beam direction. Furthermore, the probe size has to be less than the distance between the atomic columns. To obtain such a small probe, the microscope has to be corrected for spherical aberrations in the probe forming lens. The microscope used to obtain the images analyzed in this work is the JEOL ARM 200, with a probe size <1 Å at 200 kV high voltage.
HAADF is a special imaging mode which collects transmitted electrons that are scattered to high angles through Rutherford and thermal diffuse scattering. Put simply (and almost correctly), electron scattering to high angles is inelastic and incoherent, while Bragg scattering to lower angles is elastic and coherent [1]. Typical collection angles for the HAADF detector in the back focal plane are from 40 to 200 mrad. The fraction of the incident electrons that is scattered to the detector is approximately proportional to the square of the atomic number of the species in the column that is being hit by the beam, and also depends on how close to the column the beam is. This intensity, which is the number of electrons scattered to the detector, is often referred to as Z-contrast. Each pixel in a HAADF-STEM image is a representation of the scattering measured from that point of the sample. The scan then proceeds to an adjacent area of the sample, and so on, until the entire image is produced. Figure 2.1 shows a simple schematic of the HAADF-STEM setup.
There are many experimental challenges that one must face to produce good images. As mentioned, to obtain atomic resolution, aberration correction is required. Furthermore, minute vibrations can cause the sample to drift slightly during the scan, causing a characteristic type of image distortion. To achieve images that show clear atomic columns, it is also important that the TEM operator adjusts the sample such that the beam direction is perfectly aligned with the main zone axis. In the images analyzed in this thesis, we always study the <001> zone axis of the aluminium matrix. Physical properties of the specimen that is being studied can also sometimes influence the visual results, one example being columns, structures or even precipitates that do not fully penetrate the sample. Finally, high quality samples are important for good results.
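To give a feeling for the Z-contrast described above, the following minimal sketch compares the expected relative brightness of the species relevant to this thesis, under the simplifying assumption that the intensity is exactly proportional to the square of the atomic number. The function name `relative_z_contrast` is hypothetical and is not part of AutomAl 6000. Real images also depend on column occupancy, sample thickness and probe position.

```python
# Atomic numbers of the species relevant to Al-Mg-Si-(Cu) precipitates.
ATOMIC_NUMBER = {"Mg": 12, "Al": 13, "Si": 14, "Cu": 29}


def relative_z_contrast(species, reference="Al"):
    """Return Z**2 of `species` divided by Z**2 of `reference`.

    Assumption: intensity is taken as exactly proportional to Z**2,
    which is only an approximation of real HAADF-STEM contrast.
    """
    return ATOMIC_NUMBER[species] ** 2 / ATOMIC_NUMBER[reference] ** 2


contrast = {species: relative_z_contrast(species) for species in ATOMIC_NUMBER}
for species, value in contrast.items():
    print(f"{species}: {value:.2f} relative to Al")
```

Under this approximation Mg, Al and Si columns differ only modestly in brightness, while Cu columns stand out clearly.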

Structural principles in Al-Mg-Si-(Cu) precipitates
This section provides a brief insight into the landscape of aluminium precipitation. The following assumes some very basic familiarity with crystallography nomenclature, like crystal lattice, crystal directions or Miller indices [3].

Aluminium
Aluminium (Al) has a crystalline Face Centered Cubic (FCC) structure with lattice constant a = 404.95 pm, see figure 2.2 (a). In a HAADF-STEM image, we are imaging the structure along one of the equivalent <001> directions, which means that we see the projection of the FCC structure, and atomic positions that appear as closest neighbours in the projection are in fact separated by $\frac{1}{2}a$ in the beam direction. In figure 2.2 (b) this projection is illustrated. Since the samples that are examined with atomic resolution STEM typically are 20-50 atomic layers thick, the atomic positions seen in the images are referred to as columns.
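The projection described above can be checked with a short sketch that builds a few conventional FCC unit cells and projects them along [001]. This is only an illustration, not code from AutomAl 6000. All names are hypothetical, and coordinates are given in units of the lattice constant a.

```python
import itertools

# Fractional coordinates of the four atoms in the conventional FCC unit cell.
FCC_BASIS = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]


def projected_columns(n_cells=2):
    """Map each projected (x, y) position to the set of z-heights modulo a."""
    columns = {}
    for i, j, k in itertools.product(range(n_cells), repeat=3):
        for bx, by, bz in FCC_BASIS:
            xy = (i + bx, j + by)
            # Only the height within the unit cell matters for the column type.
            columns.setdefault(xy, set()).add(bz % 1.0)
    return columns


columns = projected_columns()
# Every column sits at one single height modulo a: either 0 or 1/2.
heights = {xy: next(iter(z)) for xy, z in columns.items()}
```

In this sketch the projected columns form a square grid with spacing a/2, and columns that are closest neighbours in projection, such as those at (0, 0) and (1/2, 0), differ in height by a/2.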

Precipitation sequence and phase
When an alloying element is introduced into the FCC structure, it will influence how the material changes during thermomechanical treatment processes. Depending on the treatment, the alloying elements will often start to cluster together into molecule-like structures. This initial clustering then develops into precipitates. The intermediate stable or meta-stable phases that appear as precipitates during thermomechanical processing are often seen in the context of a precipitation sequence. In the Al-Mg-Si sequence, which is often written with arrows representing some thermomechanical process, SSSS is the supersaturated solid solution, where the alloy elements are homogeneously mixed in the aluminium matrix, occupying aluminium FCC lattice points. The stable and meta-stable phases are named β'', β', U1 and so on, and they are characterized by the structure of the precipitate. These precipitates extend like needles, laths or rods along the <001> directions in the Al matrix [4]. Another precipitation sequence transpires in the presence of Cu.
The "library" of known stable and meta-stable phases is not necessarily exhaustive, and new phases are still being described. It should also be mentioned that certain areas within a precipitate are sometimes of no particular phase, but rather take on a disordered structure. Particles that contain the structures of two or more different phases are termed hybrid precipitates. Some examples of familiar phases that appear in the 6xxx Al-Mg-Si-(Cu) systems [5] are shown in figure 2.3.

Structural principles
The current understanding of how the particular precipitation structures occur is explained by solute elements with larger atomic radii accepting more Nearest Neighbours (NNs), while smaller atoms accept fewer [6]. In pure Al FCC, each Al position in the Al matrix has 12 NNs that are separated by $\frac{1}{\sqrt{2}}a$. Taking a plane through the Al position and 4 of the NNs, there are 4 NNs in the same plane, and 4 NNs to each side of the plane. Seen in projection then, there are 4 NNs which occupy the opposite plane, see figure 2.4a, where the opposite plane NNs are indicated by white lines. Let the symmetry of each column be the number of opposite plane NNs in projection. Now, introduce a line defect by shifting one Al column into the interstitial position, $\frac{1}{2}a$ along a <001> direction. In figure 2.4b, this is illustrated with the central column now occupying the $Z = \frac{a}{2}$ plane. With this central column in the interstitial position, the symmetry of the surrounding columns changes. Now let the columns that have changed symmetry be occupied by solute elements, assuming that columns with a 3-symmetry are favoured by the smaller atoms (Si, Cu), 4-symmetry by Al, and 5-symmetry by the larger atoms (Mg). This is illustrated in figure 2.4c. Finally, let the structure relax to compensate for the new atomic radii, which produces the familiar β'' eye, as seen in figure 2.4d. This designation of atomic species by column symmetry can be seen as a structural principle in the Al-Mg-Si-(Cu) system. Although the principle has several counterexamples in the system, it has proven useful and insightful for the analysis of precipitates. Not only does the principle hold for most of the familiar phases (footnote 6) in Al-Mg-Si-(Cu), but it often also holds in highly disordered precipitates, as well as hybrid precipitates consisting of two or more phases that are combined together. Even the interfaces that appear between phases and the Al matrix will typically adhere to this structural principle.
By using this principle, together with their experience of the phase library of the system, researchers can overlay HAADF-STEM images of precipitates with column species and Z-height indicators, often without the need for other information than the image itself. Note that a single column might contain several different atomic species, but this model attempts to capture the majority species, as well as structural function. In this scheme then, Al matrix columns that are enriched with Cu for instance, would still be considered an Al column, despite the bright signature of Cu. As will be discussed in section 3.2.4, it is this principle AutomAl 6000 has put into a systematic framework, which enables it to overlay HAADF-STEM images.
Footnote 6: The most notable exceptions being β , Q and U 1.
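The neighbour counting behind the structural principle can be illustrated with a small sketch for the undisturbed FCC lattice. It enumerates the 12 nearest neighbours of a site, separates them by plane, and counts the opposite plane neighbours in projection, which gives the symmetry of an Al matrix column. The lookup `FAVOURED_SPECIES` is a hypothetical way of writing down the rule stated above, and is not taken from the AutomAl 6000 source code.

```python
import itertools
import math


def fcc_nearest_neighbours():
    """Return the 12 nearest-neighbour offsets of an FCC site, in units of a."""
    offsets = []
    for offset in itertools.product((-0.5, 0.0, 0.5), repeat=3):
        # Nearest neighbours have exactly two non-zero components of +-1/2.
        if sum(component != 0.0 for component in offset) == 2:
            offsets.append(offset)
    return offsets


neighbours = fcc_nearest_neighbours()
nn_distance = math.sqrt(0.5)  # a / sqrt(2), in units of a

# Neighbours in the same plane (normal to the beam) and in the opposite planes.
same_plane = [n for n in neighbours if n[2] == 0.0]
opposite_plane = [n for n in neighbours if n[2] != 0.0]

# Seen in projection, neighbours above and below the plane coincide in pairs.
projected_opposite = {(dx, dy) for dx, dy, dz in opposite_plane}
symmetry = len(projected_opposite)

# Hypothetical lookup for the structural principle: symmetry -> favoured species.
FAVOURED_SPECIES = {3: ("Si", "Cu"), 4: ("Al",), 5: ("Mg",)}
print(symmetry, FAVOURED_SPECIES[symmetry])
```

For the perfect matrix this gives a symmetry of 4, in agreement with the designation of 4-symmetry columns as Al.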

Graph theory
Graph theory provides an elegant framework that can be used to solve many different types of problems, and can sometimes even reveal connections between problems that otherwise might have seemed disjoint. This section provides a short introduction to some of the basic concepts and definitions from graph theory that will be useful later. Most of the material presented here is based loosely on the introductory level textbook "Graphs and Digraphs" by Chartrand et al. [7], which is only one of many extensive texts on the subject, see for instance [8]. Some of the notational conventions are slightly altered to better fit the style that is deployed in this text. Quickly citing the many definitions of graph theory can make for a very "dense" read. To combat this, the contents of this section are divided into small titled paragraphs, so as to facilitate both reading and back-referencing.

Set-building notation
Graph theory takes advantage of the relational language of set theory. This is a vast topic, but is presented here in its bare brevity and strictly limited to the notation that will be deployed.
Sets are collections of unique objects. If A is a set of a certain type of objects, and a is an object of that type, then a is either in A or not in A. Membership is notated as $a \in A$ if a is in A, and $a \notin A$ if a is not in A. The cardinality of a set A is the number of elements in A, notated $|A|$. The union of two sets A and B is a new set C that contains all the objects that are in A or B or both, notated $C = A \cup B$. The intersection of two sets A and B is a new set C that contains all the objects that are in both A and B, notated $C = A \cap B$. The disjunctive union (sometimes termed the symmetric difference) of two sets A and B is a new set C that contains all the objects that are in A or B, but not in both, notated $C = A \triangle B$. The difference between two sets A and B is a new set C that contains all the elements that are in A, but not in B, notated $C = A - B$.
Many of the relationships between sets can be described efficiently with set-builder notation [9]. The idea is to define a range of objects, and each of those objects that satisfies a certain rule or equation is included in a set. The notation uses curly brackets around the range and the rule, separated by the "|" delimiter, so as to signify that it is a set, like so: set = {range|rule}. As an example, the intersection of two sets A and B can be defined using set-building notation as
$$A \cap B = \{\forall x \in A \cup B \mid x \notin A \triangle B\},$$
which can be read as "for all x in the union of A and B that are not in the disjunctive union of A and B". The symbol "∀", which reads "for all", can also be omitted here, because it is implied with this notation. Another example, defining the difference between A and B, is
$$A - B = \{x \in A \mid x \notin B\}.$$
Set-building notation is especially useful in conjunction with programming implementations, because it is easy to translate into code. An illustration of this with python, again using the difference between two sets, is shown in listing 1.
From this understanding of the set-building notation as a sort of pseudo code, we see that the range can be any kind of iterable, and the rule can be any test that always evaluates to true or false on that range. Since the curly brackets indicate a set, the notation will always return a set. If an object x is found more than once in the range, then x will still only occur once in C, because both mathematical sets and python sets disallow duplicated objects.
    def difference(A, B):
        C = set()             # start with an empty set C
        for x in A:           # for every x in the range A,
            if x not in B:    # if x is not in B,
                C.add(x)      # then include x in C if x is not already in C.
        return C
Listing 1: Example implementation of {x ∈ A | x ∉ B} with python.
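As a compact alternative to an explicit loop, Python's set comprehensions mirror the set-building notation almost symbol by symbol. The sketch below expresses both of the examples above in this way, using Python's built-in operators `|` (union) and `^` (symmetric difference). The example sets are arbitrary.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# {x in A ∪ B | x not in A △ B}: the intersection of A and B.
intersection = {x for x in A | B if x not in A ^ B}

# {x in A | x not in B}: the difference between A and B.
difference = {x for x in A if x not in B}

# The range may be any iterable; repeated objects occur only once in the result.
deduplicated = {x for x in [1, 1, 2, 2, 3] if x > 1}

print(intersection, difference, deduplicated)
```

The comprehension keeps the range to the left of `if` and the rule to the right, in the same order as the {range|rule} notation.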

The general graph language

Vertices
A vertex $v_i$, or vertices in plural, is thought of as a point or node, and is the fundamental building block of a graph. In the visual representation of graphs, vertices are often represented by points or dots, but note that vertices have no spatial location, and are only located in a graph through their connections with other vertices.

Arcs
An arc $a_j$ is a set of two vertices, indicating a relationship between those vertices. In directed arcs, the 2-element vertex subset is ordered.
The maximum and minimum vertex degrees in a graph G are denoted $\Delta(G)$ and $\delta(G)$ respectively. In an un-directed graph, the first theorem of graph theory states that if G is a graph of size m and order n, then
$$\sum_{i=1}^{n} \deg v_i = 2m.$$
Another interesting graph parameter that we will mention here is the average degree of G. In an un-directed graph, it follows from the first theorem of graph theory that
$$\bar{d}(G) = \frac{1}{n} \sum_{i=1}^{n} \deg v_i = \frac{2m}{n}.$$
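The degree relations above are easy to verify numerically. The following sketch uses a small, arbitrary un-directed graph stored as a set of 2-element frozensets. The graph and all names are hypothetical examples and do not represent the data structures of AutomAl 6000.

```python
# A small un-directed example graph: each arc is an unordered pair of vertices.
vertices = {"v1", "v2", "v3", "v4"}
arcs = {frozenset(pair) for pair in
        [("v1", "v2"), ("v2", "v3"), ("v3", "v1"), ("v3", "v4")]}


def degree(vertex, arcs):
    """Number of arcs that contain the given vertex."""
    return sum(1 for arc in arcs if vertex in arc)


degrees = {v: degree(v, arcs) for v in vertices}
n, m = len(vertices), len(arcs)          # order and size of the graph

max_degree = max(degrees.values())       # Delta(G)
min_degree = min(degrees.values())       # delta(G)
average_degree = sum(degrees.values()) / n

# First theorem of graph theory: the degrees sum to twice the size.
print(sum(degrees.values()) == 2 * m, average_degree == 2 * m / n)
```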

Connectedness
Two vertices $v_k$ and $v_l$ are said to be connected if there exists a $v_k - v_l$ path P in G. A graph where every two vertices are connected is a connected graph. Note that two connected vertices are not necessarily adjacent.
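One common way to test connectedness in practice is a breadth-first search from an arbitrary vertex. The sketch below is a generic textbook approach, not necessarily the one used in AutomAl 6000, and assumes an un-directed graph given as a collection of vertex pairs. The function name `is_connected` is hypothetical.

```python
from collections import deque


def is_connected(vertices, arcs):
    """Return True if every two vertices are joined by a path (un-directed)."""
    if not vertices:
        return True
    adjacency = {v: set() for v in vertices}
    for u, w in arcs:
        adjacency[u].add(w)
        adjacency[w].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        # Visit every neighbour that has not been reached yet.
        for w in adjacency[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == set(vertices)


path_graph = [("v1", "v2"), ("v2", "v3")]
print(is_connected({"v1", "v2", "v3"}, path_graph))
print(is_connected({"v1", "v2", "v3", "v4"}, path_graph))
```

In the first call, v1 and v3 are connected through v2 without being adjacent. In the second call, the isolated vertex v4 makes the graph disconnected.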

Graph operations
Graph operations can be sorted into two classes: unary operations, which create a new graph from a graph, and binary operations, which create a new graph from two distinct graphs. This work is mostly concerned with unary operations. Elementary operations are unary operations that make small local changes to a graph, like deleting or adding an arc. Some operation definitions which are specific to this thesis are given in section 3.2.5.

Classes of graphs
There are many ways to classify graphs. Only the most elementary classifications are defined here.
Thus far, the discussion has implicitly been concerned with undirected graphs. That is, the arcs have no direction, i.e. $a_j = v_k v_l = v_l v_k$. In contrast to un-directed graphs, in a directed graph, also called a digraph, the arcs have a specific direction, i.e. $a_j = v_k v_l \neq v_l v_k$. Digraphs are discussed in more detail in the next section. Other classes include multigraphs, which are graphs that can contain more than one identical arc. Pseudographs are graphs that allow loops, that is, arcs that connect a vertex to itself, i.e. $a_j = v_k v_k$. Multigraphs and pseudographs are sometimes considered as sub-classes of digraphs. Pure digraphs that don't allow repeated arcs or loops are thus sometimes labelled simple digraphs for clarity. In this text the word "digraph" implies "simple digraph".
The adjacency matrix representation of graphs
There is a way to represent graphs in a binary matrix form. Termed the adjacency matrix of G, it is the n × n matrix M with binary valued elements
$$M_{kl} = \begin{cases} 1 & \text{if } v_k v_l \in A(G), \\ 0 & \text{otherwise.} \end{cases}$$
Excluding pseudographs, the diagonal elements are always 0, since a vertex cannot be adjacent to itself. Furthermore, for an un-directed graph, M is symmetric about the diagonal, but this is not generally the case for a digraph. Another useful property of M is that, for an un-directed graph, the sum of the elements in row i (or column i) gives the degree of vertex $v_i$. The study of the properties of the adjacency matrix forms the basis for spectral graph theory.
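A minimal sketch of the adjacency matrix, written with plain nested lists, may help to make these properties concrete. Vertices are identified by integer indices, and the example arcs are arbitrary. None of the names below are taken from AutomAl 6000.

```python
def adjacency_matrix(order, arcs, directed=False):
    """Build the n x n binary adjacency matrix from arcs given as (k, l) pairs."""
    M = [[0] * order for _ in range(order)]
    for k, l in arcs:
        M[k][l] = 1
        if not directed:
            M[l][k] = 1   # an un-directed arc is seen from both vertices
    return M


def is_symmetric(M):
    """True if M equals its own transpose."""
    size = len(M)
    return all(M[i][j] == M[j][i] for i in range(size) for j in range(size))


arcs = [(0, 1), (1, 2), (2, 0), (2, 3)]
M_undirected = adjacency_matrix(4, arcs)
M_directed = adjacency_matrix(4, arcs, directed=True)

degrees = [sum(row) for row in M_undirected]                # row sums
out_degrees = [sum(row) for row in M_directed]              # row sums
in_degrees = [sum(column) for column in zip(*M_directed)]   # column sums
```

For the directed version, the row sums give out-degrees and the column sums give in-degrees, which anticipates the digraph nomenclature of the next section.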

Digraphs
Using some of the fundamentals of graph theory, this section proceeds to discover the nomenclature of digraphs, which is similar to that of un-directed graphs.
A digraph G is a finite set of vertices $\{v_i\}$ together with a possibly empty set of ordered pairs of distinct vertices $\{a_j\}$ of G. In similar fashion as with un-directed graphs, one writes $G = (V(G), A(G))$, where $V(G) = \{v_i\}$ and $A(G) = \{a_j\}$ with $a_j = v_k v_l$, given that $v_k$ and $v_l$ are elements of $V(G)$. As with un-directed graphs, the order n of a digraph G is the number of vertices in the vertex set, $n = |V(G)|$, and the size m of a digraph G is the number of arcs in the arc set, $m = |A(G)|$.
If $A(G)$ contains the arc $a_j = v_k v_l$, then $v_k$ is said to be adjacent to $v_l$, and $v_l$ is said to be adjacent from $v_k$. Alternatively, $v_k$ is a direct predecessor of $v_l$, and $v_l$ is a direct successor of $v_k$. This text will stick with the terms "adjacent to/from". The arc $v_l v_k$ is the inverse arc of $v_k v_l$ and vice versa.
The set of all vertices that are adjacent
The first theorem of digraph theory now reads
$$\sum_{i=1}^{n} \mathrm{od}\, v_i = \sum_{i=1}^{n} \mathrm{id}\, v_i = m,$$
where od and id denote the out-degree and the in-degree of a vertex.
A digraph G is a symmetric digraph if and only if, for every arc $a_j$ in $A(G)$, $A(G)$ also contains the inverse of $a_j$. The adjacency matrix of a symmetric digraph is symmetric about the diagonal. As is explored further in section 3.4, transforming a simple digraph into a symmetric digraph, by applying simple elementary graph operations under certain restrictions, is the central idea of this work. Figure 2.6 shows a simple example of a digraph, illustrating some of the definitions given here.
A vertex with out-degree equal to 0 is called a sink, while a vertex with in-degree equal to 0 is called a source. Vertex $v_3$ is an example of a source. To illustrate another concept,
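The digraph definitions above translate directly into a few lines of Python when the arc set is stored as a set of ordered pairs. The sketch below uses an arbitrary three-vertex digraph, which is not the digraph of figure 2.6, and hypothetical function names. It checks the first theorem of digraph theory and identifies the arcs that prevent the digraph from being symmetric.

```python
def out_degree(vertex, arcs):
    """Number of arcs leaving the vertex."""
    return sum(1 for k, _ in arcs if k == vertex)


def in_degree(vertex, arcs):
    """Number of arcs entering the vertex."""
    return sum(1 for _, l in arcs if l == vertex)


def asymmetric_arcs(arcs):
    """Arcs whose inverse arc is missing from the arc set."""
    return {(k, l) for k, l in arcs if (l, k) not in arcs}


def is_symmetric_digraph(arcs):
    """True if every arc has its inverse in the arc set."""
    return not asymmetric_arcs(arcs)


vertices = {1, 2, 3}
arcs = {(1, 2), (2, 1), (3, 1)}   # arbitrary example digraph

total_out = sum(out_degree(v, arcs) for v in vertices)
total_in = sum(in_degree(v, arcs) for v in vertices)

# One elementary operation, adding the inverse arc, makes the digraph symmetric.
symmetrized = arcs | {(l, k) for k, l in asymmetric_arcs(arcs)}
```

Here vertex 3 has in-degree 0 and is therefore a source, and the single arc (3, 1) is what breaks the symmetry. Adding inverse arcs is only one way of obtaining a symmetric digraph. Removing the offending arc is another.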

Data analysis and modelling
In a very general sense, data can be said to have nominal attributes (categorical data), or numerical attributes (interval or ratio data). If we were to model an atom, we could for instance make a very simple model where every atom has a species, radius and number of electrons. In this example, the species of the atom (as defined by the number of protons in the core) is an example of a nominal attribute. Nominal data is discrete and typically within a preset limited range of possibilities. The radius of the atom is an example of a continuous numerical attribute, and the number of electrons is an example of a discrete numerical attribute. The data format which appears in this text will typically have n datapoints, each with k numerical attributes and 1 nominal attribute. The concepts that are discussed in this chapter can be explored in more depth in [10,11].

Normal distributions, Z-scores and normal probability plots
The well-known normal distribution is remarkably effective at capturing the essence of a wide range of natural distributions. Given a real-valued random variable x, with mean µ and standard deviation σ, the normal distribution f takes the form of the continuous probability density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
The square of the standard deviation, $\sigma^2$, is known as the variance of the distribution. For a given set of n data points $\{x_h\}$, the mean and the variance can be calculated by the formulae
$$\mu = \frac{1}{n}\sum_{h=1}^{n} x_h, \qquad \sigma^2 = \frac{1}{n}\sum_{h=1}^{n} (x_h - \mu)^2. \tag{2.10}$$
The standard score $z_h$ of a data point $y_h$ is
$$z_h = \frac{y_h - \mu}{\sigma}.$$
Mapping a data-set with the standard score is a standardization method which, if applied to the normal distribution, recovers the standard normal distribution. The standard score has an application in normal probability plots, a graphical tool which can be used to assess if data is normally distributed. If the plot of $y_h$ against $x_h$ falls on a straight line, this is an indication that the distribution can be normal. However, for a positive normality check, the points should also be symmetrically distributed on the line, and consist of a single cluster.
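The quantities of this section can be computed with the Python standard library alone. The sketch below is an illustration under stated assumptions and is not the AutomAl 6000 implementation: the variance is taken in the form that divides by n, and the theoretical quantiles of the normal probability plot use the plotting positions (h + 0.5)/n, which is one of several common conventions. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def mean_and_variance(data):
    """Mean and variance of the data, with the variance divided by n (assumption)."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n
    return mu, variance


def standard_scores(data):
    """Standard score (z-score) of every data point."""
    mu, variance = mean_and_variance(data)
    sigma = math.sqrt(variance)
    return [(x - mu) / sigma for x in data]


def normal_probability_points(data):
    """Pair theoretical standard-normal quantiles with the sorted standard scores."""
    n = len(data)
    z_sorted = sorted(standard_scores(data))
    # Assumed plotting positions (h + 0.5) / n for h = 0, ..., n - 1.
    quantiles = [NormalDist().inv_cdf((h + 0.5) / n) for h in range(n)]
    return list(zip(quantiles, z_sorted))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, variance = mean_and_variance(data)
points = normal_probability_points(data)
```

Plotting the pairs in `points` gives a normal probability plot, where points close to a straight line indicate that the data may be normally distributed.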

Multinominal multivariate normal distributions
The normal distribution can be generalized to variables in higher dimensions. In the 2-dimensional case, it is called the bi-variate normal distribution. For a k-dimensional variable, it is called the multivariate normal distribution. Given a random k-dimensional variable x = [x 1 , x 2 , ..., x i , ..., x k ] T , the normal distribution takes the form where µ = [µ 1 , µ 2 , ..., µ i , ..., µ k ] T is the mean vector with elements corresponding to the components of the random variable, and Σ is the k × k co-variance matrix with elements given by Given a set of n k-dimensional data points [y 1 , y 2 , ..., y h , ..., y n ], a numerical scheme for calculating the covariances, that is stable towards catastrophic cancellation, is the two pass algorithm [source], that first calculates the means which can then be used to calculate the shifted covariances Note that the covariance matrix is symmetric about its diagonal, and that the elements along the diagonal are the variances of the individual numerical attributes. A numerical difficulty that may arise in the context of multivariate normal distributions, is when the determinant of the covariance matrix becomes very small. The inverse of the matrix is only defined for non-zero determinants, but extremely small determinants can cause numerical wildness when calculating the inverse matrix.
To combat this, one will typically standardize the data by subtracting the mean and scaling by the standard deviation. Numerical instability can still appear, however, in particular when attributes are perfectly correlated¹², which is an important consideration when designing the model (perfectly correlated data is superfluous data).
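The two pass scheme can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the n - 1 normalization (sample covariance) is an assumption, since the normalization is not stated in the text above.

```python
import numpy as np


def two_pass_covariance(points):
    """points: (n, k) array of n data points with k numerical attributes.

    Returns the mean vector (length k) and the k x k covariance matrix.
    """
    y = np.asarray(points, dtype=float)
    n = y.shape[0]
    mean = y.sum(axis=0) / n               # first pass: the means
    shifted = y - mean                     # second pass: shift by the means
    cov = shifted.T @ shifted / (n - 1)    # n - 1 is an assumed normalization
    return mean, cov
```

Because the products are formed from mean-shifted values, large common offsets in the data do not cancel against each other in the final sum.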
For data that is multinominal as well as multivariate, each category in the range of the nominal attribute gets its own multivariate distribution. For a dataset with k numerical attributes, there will be k means and k² covariances, giving a total of k + k² parameters for the model. If the nominal attribute can take on l different values, the statistical model will consist of l(k + k²) parameters. For large l and k values, the model can become challenging to present. Tables of numerical values are seldom useful, and hyperdimensional surfaces are hard to represent. One approach is to display the normal distributions of each numerical attribute separately, but this will not directly show the covariance behaviour.

¹² A matrix with all elements close to one will have a determinant near zero.

Method
The overall development process of AutomAl 6000, spanning from its beginning as a student specialization project to its current form as a Master's project, has been steered mainly by the software's capability to overlay Al-Mg-Si-(Cu) precipitates as correctly as possible, while at the same time giving the end-user researchers an efficient way to analyse the validity and quality of the resulting overlays without having to check every single column¹³. There was no clear plan at the outset for how to solve the overarching problem, so the details of the current implementation were arrived at through considerable trial and error. The eventual approach, which is outlined in this section, can be described briefly as a symbiosis between a statistical model and a technique termed untangling. The two are set together in an algorithm-friendly mathematical framework inspired by the digraph structure, resulting in a somewhat sophisticated column characterization algorithm. The source code of AutomAl 6000, which embodies the implementation of the methods laid out in this section, has grown quite large, and is therefore not explained in detail here. Rather, the focus is on developing a theoretical framework for the methodology.

Column detection
The general issue of detecting atomic columns in TEM images has received much interest over the last few years. There are many benefits to effective methodologies that can perform this task, and it is an essential starting point for any automatic or semi-automatic structural characterization method. There are already several existing methods that approach this problem. Of notable mention is Atomap [12]. Atomap is sophisticated, but in simple terms, can be said to work by fitting 2D Gaussian surfaces to the pixel intensities. Despite already existing solutions, a light-weight method for column detection for AutomAl 6000 was made. This method calculates the Center Of Mass (COM) of pixel intensities around extremal pixels.

Image pre-processing
For the column detection to work optimally, images that are to be analyzed using AutomAl 6000 should be noise filtered using software such as Gatan Microscopy Suite (GMS) by Gatan [13], which is a commercial image processor that is widely used in the TEM-field as a constituent of the standard software suites on TEM hardware. Applying an appropriate low pass filter on the Fast Fourier Transform (FFT) of the image will eliminate many of the noise frequencies of the image. Filtering out the noise in the image is necessary for column detection to work.
To apply a low pass filter in GMS, start by performing an FFT on the image. Click on the resulting FFT with the band pass tool selected, which will produce a donut shaped mask on the FFT. Adjust the inner radius of the mask to zero, and the outer radius to approximately 6.7 nm⁻¹, which will include the 200 Al reflection and exclude the 220 Al reflection. This will eliminate features that are smaller than 0.15 nm in real space. Finally, perform an inverse FFT on the masked FFT to obtain the noise filtered image.
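For readers without access to GMS, the same kind of filtering can be approximated with NumPy. The sketch below is not the GMS procedure itself: it assumes a hard-edged circular mask, and the function name and its parameters are hypothetical.

```python
import numpy as np


def low_pass_filter(image, scale_nm_per_pixel, cutoff_inv_nm=6.7):
    """Remove all spatial frequencies above cutoff_inv_nm (in nm^-1)."""
    img = np.asarray(image, dtype=float)
    # Spatial frequencies of each FFT row/column, in cycles per nm.
    fy = np.fft.fftfreq(img.shape[0], d=scale_nm_per_pixel)
    fx = np.fft.fftfreq(img.shape[1], d=scale_nm_per_pixel)
    radius = np.hypot(fy[:, None], fx[None, :])
    # Disc-shaped mask: inner radius 0, outer radius equal to the cutoff.
    mask = radius <= cutoff_inv_nm
    return np.fft.ifft2(np.fft.fft2(img) * mask).real
```

With a cutoff of 6.7 nm⁻¹, features smaller than roughly 1/6.7 ≈ 0.15 nm are suppressed, in line with the description above.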
If the scale of the image is greater than 7 pm/pixel, AutomAl 6000 will automatically upsample the image so as to double both the width and height of the image. Using bilinear up-scaling has proven to have a positive effect on the column detection in images with scales in this high range. This is because the circular samples used in the COM calculations become over-granulated (non-circular) when each pixel covers a large area. AutomAl 6000 uses the resampling method of Scipy's ndimage module [14].
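A minimal sketch of such an upsampling step is given below. It assumes that `scipy.ndimage.zoom` with `order=1` (linear interpolation) is an acceptable stand-in for the resampling call; which ndimage function AutomAl 6000 actually uses is not stated here. The function name and the halving of the stored scale are likewise assumptions.

```python
import numpy as np
from scipy import ndimage


def upsample_if_coarse(image, scale_pm_per_pixel, limit=7.0):
    """Double the image size when the scale exceeds `limit` pm/pixel."""
    if scale_pm_per_pixel > limit:
        image = ndimage.zoom(image, 2, order=1)  # order=1: (bi)linear
        scale_pm_per_pixel = scale_pm_per_pixel / 2  # twice as many pixels
    return image, scale_pm_per_pixel
```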

Center of mass implementation
The input to the column detection algorithm is an image matrix M , where the elements are normalized to be floating point numbers in the range [0, 1]. The algorithm will also create a deep copy of M , termed the search matrix M s , that it will use to delete already detected columns from the pixel intensity search space. The detection will continue until the maximum pixel value in M s is below a threshold T ∈ [0, 1].
Let the scale s of the image be given in pm/pixel. This value is automatically collected from the dm3 metadata by AutomAl 6000, but can also be overwritten by the user. The approximate atomic radius r is then determined from s by an expression involving the floor operator ⌊·⌋, which returns the integer part of its argument. The approximate atomic radius is an approximate measure of the general size (in pixels) of atomic radii in the image. This parameter determines the radius of the circular area around global maxima that will be used to calculate the COM. The overhead parameter o is given by a similar expression, and gives the radius r + o of the circular area around the found column position where the search matrix elements will be set equal to 0. The numerical values that enter into the expressions for r and o are tuned to give the best results.
For the input- and parameter-space given above, the column detection method can now be described with the following recipe:
1. Create a deep copy of M as M_s.
2. Find the coordinates (x_0, y_0) of the maximum element value of the search matrix M_s. If this value is below the threshold T, stop.
3. Calculate the coordinates (x_fit, y_fit) of the centre of mass using the values from M in the subset P_i = (x_i, y_i) that is formed by the circular area with radius r centered at (x_0, y_0). If κ is the number of elements in the subset, such that i ∈ [1, ..., κ], then

$$x_{\mathrm{fit}} = \frac{\sum_{i=1}^{\kappa} M(x_i, y_i)\, x_i}{\sum_{i=1}^{\kappa} M(x_i, y_i)}, \qquad y_{\mathrm{fit}} = \frac{\sum_{i=1}^{\kappa} M(x_i, y_i)\, y_i}{\sum_{i=1}^{\kappa} M(x_i, y_i)}.$$

4. Store (x_fit, y_fit) as a new atomic column.
5. Set all elements in the search matrix M_s that are within a circular area of radius r + o centered at (x_fit, y_fit) to 0.
6. Iterate the index counter and repeat from step 2.
Some attempts at developing automatic thresholding were made, but no consistent method was found.
In the current implementation, the threshold must be set manually. This, as well as other aspects of column detection, is discussed further in sections 4.1 and 6.4.
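The recipe above can be sketched in Python as follows. This is an illustration under stated assumptions, not the AutomAl 6000 source: the function name is hypothetical, r and o are passed in directly because their tuned expressions are not reproduced here, and the `max_columns` guard is an added safety limit.

```python
import numpy as np


def detect_columns(M, r, o, threshold, max_columns=10000):
    """Centre-of-mass column detection on an image normalized to [0, 1].

    r: approximate atomic radius in pixels, o: overhead in pixels.
    Returns a list of (x_fit, y_fit) positions.
    """
    M = np.asarray(M, dtype=float)
    Ms = M.copy()                      # search matrix (deep copy)
    yy, xx = np.indices(M.shape)
    columns = []
    while len(columns) < max_columns:
        y0, x0 = np.unravel_index(np.argmax(Ms), Ms.shape)
        if Ms[y0, x0] < threshold or Ms[y0, x0] <= 0:
            break
        # Centre of mass of M inside the disc of radius r around the maximum.
        disc = (xx - x0) ** 2 + (yy - y0) ** 2 <= r ** 2
        weights = M[disc]
        x_fit = np.sum(weights * xx[disc]) / np.sum(weights)
        y_fit = np.sum(weights * yy[disc]) / np.sum(weights)
        columns.append((x_fit, y_fit))
        # Remove the detected column from the search space.
        Ms[(xx - x_fit) ** 2 + (yy - y_fit) ** 2 <= (r + o) ** 2] = 0.0
        Ms[y0, x0] = 0.0
    return columns
```

Since the fitted position lies within r of the maximum, the disc of radius r + o always covers the maximum itself, so each iteration removes at least one pixel from the search space.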

Atomic graphs
To develop an overall algorithm that can "fit" the relational information between columns in an image to the structural principles of section 2.2.3, a framework termed atomic graph was developed. These atomic graphs are based on the digraphs of graph theory. Like digraphs, atomic graphs consist of vertices and arcs. In addition, atomic graphs have some properties that are related to the nature of this specific application. These properties are laid out here.

Basic atomic graph properties and vertex maps
As with digraphs, let an atomic graph be G = (V, A), where V = {v i } is the vertex set, and A = {a j } is the arc set. One of the differences between an atomic graph and a general digraph, is that in atomic graphs, the vertex positions have physical interpretation. Thus, for the particular problem of this thesis, distances and angles between vertices are essential, so these values are preserved. Let the position of vertex v i , be (x i , y i , z i ) relative to some arbitrary origin located within the image.
Since the vertices in an atomic graph will be representing atomic columns in projection, each vertex will have an associated relative height position in the beam direction, normal to the image plane. With one exception seen in the β phase¹⁴, this relative z-position will either be 0 pm or ½a = 202.3 pm, so let this displacement be represented with a Boolean value hereby referred to as the zeta value of the vertex.
Recall from section 2.3.2 that the distance between two vertices in a graph is an integer giving the number of vertices in the shortest path between the vertices, where "shortest" implies "fewest". In this application, spatial separation is also meaningful. Thus, let the separation Δ between two vertices v_k and v_l be

$$\Delta(v_k, v_l) = \sqrt{(x_k - x_l)^2 + (y_k - y_l)^2 + (z_k - z_l)^2},$$

and let the projected separation Δ_proj between two vertices v_k and v_l be

$$\Delta_{\mathrm{proj}}(v_k, v_l) = \sqrt{(x_k - x_l)^2 + (y_k - y_l)^2}.$$

The symmetry number n_i of v_i is the same as, and forces, the out-degree of v_i, deg⁺(v_i) = n_i. The symmetry number of a vertex is intended to reflect the number of opposite-plane NNs seen in projection, as was discussed in section 2.2.3. This is a central concept in the structural principles that are being modelled with this approach, and the symmetry numbers of the different species under consideration are shown in table 3.1.

Table 3.1: Symmetry numbers of the species under consideration (surviving entry: Un, 3).

In addition to the definitions related to general digraphs which were discussed in section 2.3.3, like in-degree, out-degree, in-neighbourhood, etc., some additional set definitions now follow. These facilitate the mapping of neighbourhoods, which informs some of the methods explored in section 3.4. Consider first a district D, which is an ordered list of vertices that are local to a particular vertex at a level above neighbourhoods, with elements D_k. The term "local" is used here in a purposefully vague sense. The column characterization method progresses by making changes to the district, so one could consider this list to contain local vertices sorted by their currently attributed relevance. At the outset however, the district D of v_i shall consist of the 10 "closest" vertices to v_i, where "closeness" is understood to be measured in projected separation. The out-neighbourhood N⁺ of v_i shall then be defined as the set containing the first n_i elements of D(v_i).
By using set-builder notation, this can be written as

$$N^+(v_i) = \{ D_k(v_i) \mid 1 \le k \le n_i \}.$$

Let the neighbourhood N of v_i be the union of the in- and out-neighbourhood,

$$N(v_i) = N^-(v_i) \cup N^+(v_i).$$

If a vertex v_l is in the intersection of the in- and out-neighbourhood of a vertex v_k, they are partners. The set of all partners P of v_i is then

$$P(v_i) = N^-(v_i) \cap N^+(v_i), \tag{3.14}$$

and let the semi-partners of v_i be

$$\bar{P}(v_i) = N(v_i) \setminus P(v_i).$$

Two final sets that are meaningful here are the out-semi-partners

$$\bar{P}^+(v_i) = N^+(v_i) \setminus P(v_i)$$

and the in-semi-partners

$$\bar{P}^-(v_i) = N^-(v_i) \setminus P(v_i).$$

The sets that have been defined here are summarized in table 3.2. This turns out to be a convenient way to build the implementation of atomic graphs. Note that the district is an ordered list (not a set) from which the sets in the map of v_i are defined.
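To make the vertex map concrete, a minimal Python sketch follows. The function and dictionary key names are hypothetical. The semi-partner sets are taken here to be the neighbours that are not partners, which is an assumed reading of the definitions, and vertices are identified by their integer index.

```python
def vertex_maps(districts, symmetry_numbers):
    """Build the map of every vertex from districts and symmetry numbers.

    districts[i]: ordered list of vertex indices local to vertex i.
    symmetry_numbers[i]: n_i, the out-degree of vertex i.
    """
    n = len(districts)
    # Out-neighbourhood: the first n_i elements of the district.
    out_n = [set(districts[i][:symmetry_numbers[i]]) for i in range(n)]
    # In-neighbourhood: every vertex that has i in its out-neighbourhood.
    in_n = [set() for _ in range(n)]
    for i in range(n):
        for j in out_n[i]:
            in_n[j].add(i)
    maps = []
    for i in range(n):
        partners = out_n[i] & in_n[i]
        maps.append({
            "out_neighbourhood": out_n[i],
            "in_neighbourhood": in_n[i],
            "neighbourhood": out_n[i] | in_n[i],
            "partners": partners,
            # Assumed definition: neighbours that are not partners.
            "out_semi_partners": out_n[i] - partners,
            "in_semi_partners": in_n[i] - partners,
        })
    return maps
```

Because every set is derived from the districts and the symmetry numbers alone, re-running this function after any change to a district keeps the whole map consistent.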

Table 3.2: Summary of the sets in the map of a vertex v_i (columns: Set, Definition; first row: District).

As can be surmised from table 3.2, once the districts and symmetry numbers are defined, the entire atomic graph is defined. However, there is more than one conceivable way to select the out-neighbourhood, generating a whole family of graphs, with one useful in particular, discussed in section 4.3.2.
Finally, introduce one further restriction on atomic graphs, which is in fact very severe: arcs in atomic graphs cannot intersect. This restriction is a result of the application, and of the interpretation of an arc as indicating a nearest neighbour atom pair in an atomic structure. One of the shortcomings of the definition of the map of v_i is that it does not protect against intersecting arcs. Checking for arc intersections is a computationally expensive task, which the current implementation solves by geometrically calculating the intersection between arcs and then checking the domain of the intersection. Discovering a method that simplifies intersection detection, for instance by using the adjacency matrix of the atomic graph, would contribute a major speed increase to the AutomAl 6000 column characterization.
Atomic graphs shall be visualized by circles representing the vertices, filled or hollow, with filled circles indicating ζ_i = 0, and hollow circles indicating ζ_i = 1. Symmetrically adjacent vertices, meaning that they are both adjacent to each other, shall be connected by a simple black line. Thus, a single black line indicates two arcs: one arc and its inverse. It is desirable that un-symmetric arcs stand out, so these are given a red color, bold face and an arrowhead indicating the "adjacent to" direction. Arcs connecting vertices that occupy the same plane are indicated by a blue bold face and arrowheads, but un-symmetric arcs take precedence, so only symmetric in-plane arcs will ever appear in blue. Some examples of atomic graph visualisations are shown in figure 3.1.

Subgraph classifications
Subgraphs can be produced from a graph by different methods, see section 2.3.2. Of special interest here, is the induced sub-graph method, which from a given vertex subset, V (H) ⊆ V (G), will produce an arc set A(H) given by equation 2.5. The induced subgraph H is then given by H = (V (H), A(H)).
The following paragraphs explain how to find certain vertex subsets, which are then assumed to imply the subgraphs induced by those vertex subsets. There are three categories of vertex subset definitions of special interest here.

Mesh centered subgraph
Given the no-arc-intersection restriction on atomic graphs and the spatial property of vertices, a mesh can be defined rather easily in atomic graphs as an area enclosed by arcs. Finding the vertex subset of a mesh is done by first defining an arc v_l v_k, and then selecting the vertex v_q from N(v_k) which minimizes the vector angle between the vectors formed by v_k → v_l and v_k → v_q. This is repeated, moving one arc forward each time, until q = l. Think of this as going from v_l to v_k and then taking the rightmost turn along any arc until v_l is reached. The induced subgraph formed by such a vertex subset is a 1st order mesh centered subgraph. A 2nd order mesh centered subgraph would include all adjacent meshes, but only 1st order subgraphs are used by AutomAl 6000, so these are not covered here. Figure 3.2 provides some examples of the procedure. For notation, let the 1st order mesh centered subgraph defined from the vertex pair (v_i, v_j) be H^(1)_mesh(v_i, v_j), where v_i and v_j are assumed to be adjacent in at least one direction.
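The mesh walk can be sketched as below. This is an illustrative reading of the procedure, not the AutomAl 6000 implementation: the function name is hypothetical, the angle is measured clockwise from the backward direction in a coordinate system with the y-axis pointing up (in image coordinates with y pointing down, the sense of rotation flips), and `neighbours[i]` is assumed to hold the neighbourhood N(v_i) as a list of vertex indices.

```python
import math


def mesh_vertices(positions, neighbours, l, k):
    """Return the vertex indices of the mesh found by walking l -> k -> ..."""
    path = [l, k]
    prev, cur = l, k
    while len(path) <= len(positions):  # guard against endless walks
        back = math.atan2(positions[prev][1] - positions[cur][1],
                          positions[prev][0] - positions[cur][0])
        best, best_turn = None, None
        for q in neighbours[cur]:
            if q == prev:
                continue
            out = math.atan2(positions[q][1] - positions[cur][1],
                             positions[q][0] - positions[cur][0])
            # Clockwise angle from the backward direction to the candidate.
            turn = (back - out) % (2 * math.pi)
            if best is None or turn < best_turn:
                best, best_turn = q, turn
        if best is None or best == l:   # dead end, or the mesh is closed
            break
        path.append(best)
        prev, cur = cur, best
    return path
```

The length of the returned list is the cardinality of the vertex subset of the mesh, which is the quantity that later sections compare against 4.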
H^(1)_mesh(v_5, v_2). Note that this subgraph can be defined by the vertex pair v_5 v_2 even though v_5 v_2 is not an arc in A(G); its inverse is, which is the minimum requirement. As a counterexample, the subgraph H

Arc centered subgraph
A 1st order arc centered subgraph is formed by the vertex subset that includes an arc v_l v_k, as well as the meshes formed by v_l v_k and v_k v_l. This is illustrated in figure 3.3. The 2nd order arc centered subgraph additionally includes all the vertices of the 1st order arc centered subgraphs formed on each arc except the original defining arc. This is illustrated in figure 3.3c. Let the notation for arc centered subgraphs be H

Vertex centered subgraph
A 1st order vertex centered subgraph H

Numerical vertex attributes
The vertices of atomic graphs have several associated numerical attributes which can be assumed to be correlated with the symmetry number or atomic species. The angles formed in the graph, as well as the pixel intensity at the corresponding image location, are among these, and they are defined below.

Alpha angles
As was discussed when introducing atomic graphs, the symmetry number n_i determines the out-degree of the vertices and places this number in the range {3, 4, 5}. This guarantees that every vertex v_i will be adjacent to at least three other vertices, forming at least three arcs. The three angles in between these arcs can be calculated, and the minimum and maximum of these angles correlate with the symmetry number n_i, which in turn is related to the species of the column. Let these three angles be the alpha angles; they are illustrated in figure 3.5. The distribution of this attribute is explored further in section 3.3.
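A minimal sketch of the alpha angle calculation is given below. It assumes that the three angles are the angular gaps between the first three arcs when sorted by direction around the vertex, so that they sum to 2π; the function name is hypothetical.

```python
import math


def alpha_angles(center, successors):
    """Return (alpha_min, alpha_max) for a vertex at `center`.

    successors: (x, y) positions of the out-neighbours; only the first
    three are used.
    """
    directions = sorted(
        math.atan2(y - center[1], x - center[0]) for x, y in successors[:3]
    )
    gaps = [directions[1] - directions[0], directions[2] - directions[1]]
    gaps.append(2 * math.pi - sum(gaps))  # the gap that closes the circle
    return min(gaps), max(gaps)
```

For a perfectly triangular environment all three gaps equal 2π/3, so the spread between the minimum and maximum is a simple measure of how far the local environment deviates from three-fold symmetry.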

Theta angles
In the vertex centered subgraph of a vertex v_i, the central angles formed between the arcs extending into or out from v_i should once again give clear indications of n_i. These angles are here termed the theta angles, with special interest in the maximum θ_max, the minimum θ_min and the average θ. In contrast to the alpha angles, which are used to get a statistical prediction from nothing more than projected distances, the theta angles are a bit more selective. Only angles that span a mesh corner that is consistent with the structural principles¹⁵ are included, meaning that the mesh must have a vertex subset cardinality of 4. Considering again figure 3.4a, this would exclude the angles ∠(v_3, v_2, v_5) and ∠(v_5, v_2, v_8), successfully preventing the erroneous data from being used in the classification. By the same token, it would also eliminate the non-erroneous angle ∠(v_2, v_8, v_5) from figure 3.4b. The distribution of the theta angles is explored in more detail in section 4.3.1.

Gamma
So far, the θ and α angles give some indication of the symmetry number n_i, which can then be related back to atomic species. Knowing the symmetry number does not guarantee knowledge about the species though. For instance, since both Cu and Si columns have n_i = 3 in the structural model, another attribute is needed to separate these. Luckily, the column intensity in HAADF-STEM images is related to the atomic number of the atomic species that occupies the column. Si and Cu differ a lot in atomic number compared to the other alloying elements of the systems under consideration here, meaning that the intensity can hopefully be used to assess species. Before intensities can be used to advantage in this characterization though, there are some hurdles to overcome.
The value of the brightest pixel in a column is termed the peak gamma. The average pixel value within the circle with radius r centered at the column position is here termed the average gamma, with r being the same approximate atomic radius as discussed in section 3.1.2. In an attempt to make the gamma values directly useful in the statistical model, define a gamma normalization standard. Since the intensities are relative to the baseline of each image, as well as to the sample thickness etc., some simple assumptions are made that, despite their lack of rigour, turn out to be useful. First measure the intensities, excluding columns in the particle, and calculate a normalization factor such that the average intensity of the matrix takes on a pre-determined value. This yields the normalized peak gamma γ_peak and the normalized average gamma γ_avg. Considering some of the factors that this procedure ignores, like Cu-rich columns in the matrix potentially shifting the normalization, or samples with different relative thickness, the resulting distributions will have a higher variance, but the method still seems useful. This is discussed further in section 3.3.
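The gamma attributes and their normalization can be sketched as follows. Several details are assumptions made for illustration only: the peak gamma is taken as the brightest pixel within radius r of the column position, the matrix average is taken over the average gamma of the matrix columns, and the target value 0.3 is an arbitrary placeholder for the pre-determined value. The function and parameter names are hypothetical.

```python
import numpy as np


def gamma_attributes(image, columns, r, is_matrix, target=0.3):
    """Return (normalized peak gamma, normalized average gamma) arrays.

    columns: list of (x, y) column positions in pixels.
    is_matrix: Boolean per column, True for columns in the Al matrix.
    """
    img = np.asarray(image, dtype=float)
    yy, xx = np.indices(img.shape)
    peak, avg = [], []
    for x, y in columns:
        disc = (xx - x) ** 2 + (yy - y) ** 2 <= r ** 2
        peak.append(img[disc].max())
        avg.append(img[disc].mean())
    peak, avg = np.array(peak), np.array(avg)
    # Scale so that the matrix columns average to the target value.
    factor = target / avg[np.asarray(is_matrix, dtype=bool)].mean()
    return factor * peak, factor * avg
```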

Structural principles in atomic graphs
When representing an atomic structure with an atomic graph, each NN in the opposite plane is included in the partner set of the corresponding vertex. Following this logic, the entire graph should include the inverse of each arc, and therefore be symmetric. The consequences of the structural principles, as discussed in section 2.2.3, in terms of their atomic graph representation, can then be formulated in the following manner.

For every vertex {v
2. For every arc {a_j = v_k v_l}, the component vertices must belong to opposite planes, such that ζ_k ≠ ζ_l.

Graph operations
Section 3.4.2 attempts to develop a column characterization algorithm on the atomic graph framework, which requires the definition of some elementary operations. Let these graph operations be categorized as weak operations if they are limited to changing the successor of arcs, and as strong operations if they are enabled to also change symmetry numbers.

Arc permutation
The methods of AutomAl 6000 sometimes call for an arc permutation, which means to change which vertex an arc is adjacent to, without changing the pivot vertex which the arc is adjacent from. In the vertex map implementation of atomic graphs, this involves permuting two elements, the original successor and the new successor, in the district of the pivot vertex, and then updating all sets in the area to reflect the change and to ensure that the definitions of table 3.2 remain true. Let the notation for an atomic graph arc permutation of the arc be P_i(j, k), where i is the index of the pivot vertex, j is the index of the original successor vertex of the arc, and k is the index of the new successor vertex. An illustration is given in figure 3.6.
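On the district level, an arc permutation amounts to swapping two list elements. The following sketch shows only that step, with a hypothetical function name; the subsequent update of the surrounding vertex maps is left out.

```python
def arc_permutation(district, j, k):
    """Swap the original successor j and the new successor k in a district.

    district: ordered list of vertex indices belonging to the pivot vertex.
    Raises ValueError if either vertex is not in the district.
    """
    a, b = district.index(j), district.index(k)
    district[a], district[b] = district[b], district[a]
    return district
```

If j was among the first n_i elements and k was not, the swap moves k into the out-neighbourhood and j out of it, which is exactly the change of successor described above.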

Weak arc termination
Weak arc termination is concerned with removing a specific arc without changing any symmetries, which in effect means to relocate the arc rather than to terminate it, despite this method's designation. To do this, AutomAl 6000 uses a set of rules in an attempt to find an alternative direct successor for the arc. If no alternative is found, the arc remains unaltered. Let v_i v_j be an arc subject to weak arc termination. Furthermore, let the search-space S for an alternative successor v_k be the set

$$S = V(H_{\mathrm{vertex}}(v_i)) \setminus N^+[v_i],$$

where H_vertex(v_i) is the 1st order vertex centered subgraph of v_i, and N⁺[v_i] is the closed out-neighbourhood of v_i. If there are elements in the intersection of the in-neighbourhood of v_i and S, then these are prioritized, as long as the "would be" arc v_i v_k will generate the desired cardinality in the subsets V(H^(1)_mesh(v_i, v_k)) and V(H^(1)_mesh(v_k, v_i)), which should be 4. If a suitable candidate within parameters is found, perform the arc permutation P_i(j, k).

Weak arc preservation
Weak arc preservation on an un-symmetric arc v_i v_j will attempt to find an arc v_j v_q that is suitable for the permutation P_j(q, i), which would create the inverse arc of v_i v_j, namely v_j v_i, thus making it symmetric. The search space S for v_q will be S = P⁺(v_j), (3.19) where P⁺(v_j) is the out-semi-partners of v_j. This set does not include v_i, because v_i v_j is assumed to be un-symmetric for this operation to be meaningful. If a vertex v_q in S is found to be such that both H^(1)_arc(v_j, v_q) and H^(1)_arc(v_q, v_j) have cardinality 4, then the arc permutation P_j(q, i) is performed.

Strong arc termination
Strong arc termination attempts to remove an arc v i v j , by reducing the symmetry number of v i , and thus also the associated atomic species, which is the defining feature of a strong operation. If this cannot be done, for instance if n i is already equal to the minimum number in the range of the symmetry number, then no action is performed.

Strong arc preservation
Strong arc preservation on an arc v i v j , attempts to preserve the arc by increasing the symmetry number n j of v j . If this is not possible, no action is performed.

Statistical model for numerical vertex attributes
AutomAl 6000 features a flexible data management module, which can be used to construct custom statistical models from selected data. In this section, the specifics of the default model which is featured with AutomAl 6000, are discussed.
Being asked to produce an atomic overlay is in essence a classification problem. For a given data point, the statistical model must produce a probability vector, giving the probability that the data point belongs to each category of the nominal attribute. It might seem obvious that the nominal classification attribute is the atomic species of the column. However, for both physical and practical reasons¹⁶, the nominal attribute which the default model classifies by is instead what is here arbitrarily termed the advanced species, a higher resolution classification system that aims to capture details that would break normality in a less considerate classification system. For instance, the distribution of Si α-angles in Q′ is better modelled with two normal distributions than with one, due to the structural difference between the Si positions that have Cu as an in-plane NN, and the Si positions without Cu as NN. The structural environment differs between these positions, systematically causing some "stretching" of the triangular symmetry, making it less symmetric in the case of in-plane Si-Cu, which in turn affects the expectation values of the α-angles. Another example is the α_max attribute for Mg, which can randomly take on one of two different expectation values, due to how α-angles are defined¹⁷. In this scheme then, each advanced species category must map to an atomic species and a symmetry number. The categorization used in the default model is shown in table 3.3. This particular scheme is supported by experience and extensive trial and error.

The default model is fitted to the data of figure 3.7, where both the overlays and atomic graphs have been manually corrected. For each advanced species category, a mean vector µ and a covariance matrix Σ are calculated using equations 2.14 and 2.15. The determinant |Σ| and the inverse covariance matrix Σ⁻¹ are then calculated with Scipy's linalg module [14].
In figure 3.8, the results of this are shown with the individual attributes fitted to 1D normal distribution curves. This model can now make predictions on new sets of numerical attributes by using equation 2.12 with the appropriate parameters for each advanced species, thus producing a probability vector. The model can also make predictions from incomplete attribute data. By substituting missing data with the appropriate means, the model gives the best possible prediction from the data. This is especially useful at the initial stages of the column characterization, when only α-angles are available. More on this in section 3.4.3.
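The prediction step, including the substitution of missing attributes by the category means, can be sketched as below. The sketch assumes that the probability vector is obtained by normalizing the per-category densities so that they sum to one, which corresponds to equal prior weight for every category; the function name and the category labels used to call it are hypothetical.

```python
import numpy as np


def predict(x, means, covariances):
    """Return {category: probability} for one attribute vector x.

    x: length-k vector, with np.nan marking missing attributes.
    means, covariances: dicts keyed by category (mean vector, k x k matrix).
    """
    x = np.asarray(x, dtype=float)
    densities = {}
    for category, mu in means.items():
        mu = np.asarray(mu, dtype=float)
        cov = np.asarray(covariances[category], dtype=float)
        # Missing attributes are replaced by the mean of this category.
        filled = np.where(np.isnan(x), mu, x)
        diff = filled - mu
        k = len(mu)
        norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
        exponent = -0.5 * diff @ np.linalg.solve(cov, diff)
        densities[category] = np.exp(exponent) / norm
    total = sum(densities.values())
    return {category: d / total for category, d in densities.items()}
```

A substituted attribute contributes zero to the exponent of its category, so it neither favours nor penalizes that category relative to the attributes that were actually measured.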

Column characterization
Structural principles and vertex statistics, both now set separately in atomic graphs, are in this section joined together to build an algorithm that can produce a column characterization¹⁸ of a HAADF-STEM image. The column characterization consists of a number of sub-algorithms. These are not all explained in methodical detail; rather, their purpose is explored. The main feature of the algorithm, termed untangling, is explained in more detail however, as well as the eventual arrangement of methods.

Spatial mapping
The spatial mapping assigns the initial district of each vertex in the atomic graph. As was discussed in section 3.2, this involves finding the 10 closest vertices of each vertex, where closeness is measured in projected separation. This implementation calculates the projected separation between all the vertices of the graph and stores the values in the projected separation matrix, which is an n × n matrix whose element (i, j) is the projected separation between v_i and v_j, where n is the order of the atomic graph. The initial district of vertex v_i will then be the vertices with column indices corresponding to the 10 smallest values along row i, excluding the diagonal element.
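A compact NumPy sketch of the spatial mapping follows. The function name is hypothetical, and the full n × n matrix is built in memory, which is simple but may not be how AutomAl 6000 organizes the calculation.

```python
import numpy as np


def initial_districts(xy, size=10):
    """Return the projected separation matrix and the initial districts.

    xy: (n, 2) array of projected vertex positions.
    """
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    separation = np.hypot(diff[..., 0], diff[..., 1])   # n x n matrix
    masked = separation.copy()
    np.fill_diagonal(masked, np.inf)    # a vertex is not in its own district
    districts = np.argsort(masked, axis=1, kind="stable")[:, :size]
    return separation, districts
```

Row i of the returned `districts` array lists the indices of the `size` closest vertices to vertex i, sorted by increasing projected separation.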

Precipitate detection
It is important for many aspects of the analysis that the particle can be distinguished from the aluminium matrix. Most importantly, the matrix pixel intensities must be assessed exclusively in order to determine the normalized gamma attributes. Any vertex that has Al as its atomic species is labelled as belonging to the precipitate if it has more than one non-Al vertex in its neighbour set.

Note the separation of the means between Mg_1 and Mg_2 in the α_max attribute, as well as the separation between the Si_1 and Si_2 means in the α_min attribute. The variance of the θ attribute is manually inflated, because the average theta angle of a vertex in a symmetric graph will always be 2π/n_i.

Zeta analysis
Determining the Boolean ζ-value of each vertex, that is, determining which plane a column belongs to, is an integral part of the problem, see section 3.2. The idea behind this sub-method is a repeated election system. Start with a specific vertex, and give this vertex a vote of 1. All other vertices then have a vote of 0. Then let every vertex give a vote to the vertices in its partner set. If a vertex has 0 votes, its partners will also receive 0 votes from it. Thus, on the first election, only the partners of the single vertex with vote 1 will receive votes. On the next round, the original vertex will receive additional votes from its partners, who now have voting power, strengthening the voting power of the original vertex in a positive feedback loop. This causes the original vertex to act as a source for the entire graph, making sure that the vertex zeta states are relative to the original vertex, and not only to their immediate local vertices. The elections are repeated several hundred times to achieve a convergent voting state across the graph. This method has been termed zeta analysis.

The discussion is limited to this short qualitative explanation because the zeta analysis is among the most recently conceived methods of AutomAl 6000, and as such is not yet refined enough for a technical discussion. The zeta analysis is very promising and performs well in most cases, but it can fail abruptly. If there is a point in the graph that changes the polarity of the voting, some areas of the graph will have inverted zeta values. This means that the more accurate the graph is, the more robust the zeta analysis is. There are perhaps some avenues of improvement to this method, which could potentially make it effective and reliable at determining the column planes, even when only projected separations are known. This could lead to a more general column characterization algorithm, and the possibility of this is discussed in section 4.3.2.
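Since the election scheme is only described qualitatively, the following Python sketch should be read as one possible interpretation rather than the AutomAl 6000 method. It assumes signed votes: each vertex passes the negative of its own vote to its partners, because partners are expected to lie in the opposite plane, and the sign of the converged vote decides the zeta value. The function name, the rescaling in every round and the number of rounds are all assumptions.

```python
import numpy as np


def zeta_analysis(partners, source=0, rounds=300):
    """Assign a zeta value (0 or 1) to every vertex by repeated voting.

    partners[i]: list of partner indices of vertex i (symmetric relation).
    Vertices that are never reached from the source keep zeta 0.
    """
    votes = np.zeros(len(partners))
    votes[source] = 1.0
    for _ in range(rounds):
        new = votes.copy()
        for i, partner_list in enumerate(partners):
            for j in partner_list:
                new[j] -= votes[i]       # opposite-plane vote (assumed)
        votes = new / np.abs(new).max()  # rescale to avoid overflow
    return [0 if v >= 0 else 1 for v in votes]
```

In this reading, an erroneous partner relation between two vertices in the same plane makes positive and negative votes compete, which is one way the inverted regions mentioned above could arise.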

Arc intersection detection and termination
For each arc adjacent to or from each vertex, this method calculates the intersection point between the arc and the arcs adjacent to or from the neighbours of the vertex, as well as the neighbours of the neighbours of the vertex. If an intersection is found, this intersection is eliminated by somewhat brutally changing the atomic species of the involved vertices.
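The geometric test for a single pair of arcs can be sketched with the standard parametric line intersection. The function name is hypothetical; arcs that merely share an end vertex are deliberately not counted as intersecting.

```python
def arcs_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 crosses segment p3-p4 at an interior point."""
    d1x, d1y = p2[0] - p1[0], p2[1] - p1[1]
    d2x, d2y = p4[0] - p3[0], p4[1] - p3[1]
    denom = d1x * d2y - d1y * d2x
    if denom == 0:
        return False                     # parallel or collinear arcs
    # Solve p1 + t * d1 = p3 + u * d2 for the parameters t and u.
    t = ((p3[0] - p1[0]) * d2y - (p3[1] - p1[1]) * d2x) / denom
    u = ((p3[0] - p1[0]) * d1y - (p3[1] - p1[1]) * d1x) / denom
    # The lines cross inside both arcs only if both parameters are in (0, 1).
    return 0 < t < 1 and 0 < u < 1
```

Checking that t and u lie strictly between 0 and 1 corresponds to checking the domain of the intersection point.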

Sub-set mapping
The map of each vertex needs to be regularly updated as changes to the graph occur. This update occurs often during column characterization, and it simply uses the definitions of table 3.2 to assign these sets throughout the graph.

Untangling
Given an atomic graph G, the goal is to alter it in such a way that all the atomic graph rules given in section 3.2.4 are obeyed. An atomic graph that correctly describes an atomic structure, should be symmetric and therefore only contain symmetric arcs. To summarize the method, which is explained in more detail below, then for every un-symmetric arc v i v j in G, analyse the 1st order arc centered subgraph. These subgraphs can be categorized according to the order of the meshes in the subgraph, and again sub-categorized according to the content of symmetric and un-symmetric arcs in the meshes. This provides AutomAl 6000 with a map of which operation to perform for each given subgraph. This has been termed untangling. If AutomAl 6000 encounters a subgraph that is not within the map, no operation is performed.
Finding all the un-symmetric arcs in a graph is straightforward, given the vertex maps. Using set-builder notation, all un-symmetric arcs v_i v_j of G can be found in the following manner:

$$\{ v_i v_j \in A(G) \mid v_i \notin N^+(v_j) \}.$$

For each of these arcs, produce the 1st order arc centered subgraph H^(1)_arc(v_i, v_j). Now, classify the subgraph according to the combination of the cardinalities of the meshes, and call this its class. Furthermore, classify the specific configuration of symmetric and un-symmetric arcs in the subgraph, and call this the configuration of the subgraph. The current version uses the classes and configurations that are shown in figure 3.9 in order to determine how to attempt to resolve the original un-symmetric arc v_i v_j. The classes are given arbitrary numbers, while the configurations are given arbitrary capital letters within each class. Each vertex in the subgraph is assigned a lowercase letter in each class, except i and j, which are reserved for the subgraph-defining arc. The classes and configurations are not labelled with their alphanumerical designations in the figure, since these are arbitrary. Figure 3.9 is meant to illustrate the general idea of untangling, rather than to document its actual content.

As can be surmised, only a minority of the configurations have an associated strong operation. The idea is that weak untangling should bring the most common graph configurations into more predictable ones, and that strong untangling will only be applied to highly predictable configurations. The strong operations are suppressed as much as possible, because they compete with, and sometimes override, the predictions of the statistical model. Weak untangling is always performed first, preferably more than once, before strong untangling is done.
There is no rigid logic behind the specific contents of the untangling map; rather, it is arrived at through consideration of actual results. There is a balance here: on the one hand, one could solve any specific subgraph by adding the appropriate operation to the map, while on the other hand, this could break generality and produce unpredictable results in other cases. The configurations that are included in the map shown here all have mostly predictable behaviours. This does not change the fact that the untangling approach is imperfect. One strategy is to extend the range of the map, perhaps even into 2nd order subgraphs, but this risks unrestrained growth in complexity. Another strategy is to circumvent untangling with other methods, which is discussed in section 4.3.2.
Fig. 3.9: Map of the arc centered subgraph classifications by class and configuration, which is used by AutomAl 6000 for untangling atomic graphs. Neither the classes nor the configurations are exhaustive, but in a practical sense they capture the most common subgraph configurations encountered in un-symmetric atomic graphs. The legend in the lower right corner explains which operation AutomAl 6000 will attempt to perform for each given configuration during weak or strong untangling.

Composite algorithm
The column characterization algorithm of AutomAl 6000 is a combination of all the methods that have been discussed previously. The sequence in which these methods are run encodes a complex behaviour, and different sequences will produce different results. The general idea is to take a very general starting point, with every vertex assigned the atomic species Unknown (Un), and thus with symmetry numbers set to 3. The alpha angles are then used to make an initial classification prediction with the default model across the graph. This classification is only expected to be as good as the graph information, so it will produce many erroneous predictions. Next, untangling is applied to improve the symmetry of the atomic graph, so that the graph better represents the actual atomic structure. The model is now supplied with more reliable numerical vertex attributes, which improve its prediction capability. Untangling is then applied again, and so on, until the method converges on its best result. However, this is only the qualitative thinking behind the particular sequence below, whose underlying dynamics are much more complex and highly nontrivial to unravel in a technical sense. The currently implemented sequence for the full column characterization algorithm, which assumes an accurate column detection, is:

A detailed example
The precise sequence presented in the previous section is best illustrated with an example. This section reviews the column characterization algorithm on a specific HAADF-STEM image, and presents the state of the atomic overlay and atomic graph at each key step. Note that these atomic overlays only show the atomic species, and not the advanced species, of each column. Although AutomAl 6000 can be configured to display advanced species overlays, the advanced species is mostly a service to the underlying statistics. The probability of a column belonging to a certain atomic species is the sum of the probabilities of all advanced species that map to that particular atomic species.
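The summation of advanced species probabilities into atomic species probabilities can be sketched as follows. The species names, the probabilities and the mapping are invented for illustration, and the function name is hypothetical.

```python
def atomic_species_probabilities(advanced_probabilities, species_map):
    """Sum advanced species probabilities per atomic species.

    advanced_probabilities: mapping advanced species -> probability
    species_map: mapping advanced species -> atomic species
    """
    totals = {}
    for advanced, probability in advanced_probabilities.items():
        atomic = species_map[advanced]
        totals[atomic] = totals.get(atomic, 0.0) + probability
    return totals


# Illustrative values only; not taken from the default model.
species_map = {'Si_1': 'Si', 'Si_2': 'Si', 'Cu_1': 'Cu', 'Al_1': 'Al'}
prediction = {'Si_1': 0.3, 'Si_2': 0.2, 'Cu_1': 0.4, 'Al_1': 0.1}
print(atomic_species_probabilities(prediction, species_map))
```

In this invented example, Si would be the displayed atomic species, even though Cu_1 is the single most probable advanced species.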
The subject image of the example is presented in figure 3.10. Figures 3.11-3.21 tell a visual story about the interplay between the different constituent methods of the column characterization algorithm.
The atomic overlay and atomic graph after steps 1 and 2 of the column characterization algorithm. (a) Atomic overlay. This example image is cropped so that details will be visible, which means that the edges of the overlay are not visible. The columns at the edges are now set to Al, and will be disregarded for the rest of the algorithm. (b) Atomic graph. Even though districts are now assigned (step 1), the map of table 3.2 has not yet been applied.
There are still many wrong predictions, but as can be seen, the alpha predictions are good enough to more or less correctly label the Al matrix. (b) The atomic graph will be identical to the previous step, because even though the symmetry numbers across the graph have now changed, the map of table 3.2 has not yet been updated.
The atomic graph is now becoming more sensible. There are still many red un-symmetric arcs in the graph, especially in the precipitate, but due to the particular sequence, these are often predictable and will thus hopefully be effectively captured by the untangling map at a later stage.
(a) In the AutomAl 6000 GUI, showing or hiding matrix columns is one of many display options, which in this case is set to not display the matrix overlay. At this point, the parts of the matrix that are correctly labelled will not change during the rest of the algorithm, because all further information will only support the current labelling, meaning that the atomic graph and overlay have already converged to their ideal prediction. The matrix is therefore hidden from the overlay to declutter it and bring forth the particle. (b) The atomic graph makes no distinction between particle and matrix vertices, so there are no changes at this stage.
Step 9 provides no visual indication in the overlay or graph, so this figure shows the overlay and graph after the sequence of steps 9, 10 and 11. That is, the transition from the previous figure (figure 3.18) to this figure shows the effect of weak untangling, followed by strong untangling. (a) Atomic overlay. Untangling is designed to induce the minimal amount of change to symmetries, and thus also to atomic species. The overlay is therefore not expected to change very much, but we can for instance observe that strong untangling has correctly changed the atomic species of some of the matrix columns that were previously mislabelled. (b) The atomic graph changes dramatically at this stage, and now provides a much better representation of the actual atomic structure. The transition from the previous atomic graph to this one provides some merit to the use of the term "untangling" for this procedure.
Fig. 3.20: Another round of zeta analysis is performed in step 12, using the much improved relational information of the untangled graph. (a) Atomic overlay. (b) Atomic graph. The voting system of the zeta analysis will only vote along symmetric arcs. The key for this to work is that no vertex has more erroneous symmetrically adjacent vertices than correct un-symmetrically adjacent vertices. The low density of un-symmetric arcs in this atomic graph is thus promising for the effectiveness of the zeta analysis.
Step 15, which repeats steps 8, 9, 10, 11 and 12, will now produce the final result of the column characterization. (a) Atomic overlay. Some mislabelling of Si versus Cu is apparent. The matrix is correctly labelled, and thus not shown in the overlay. (b) The atomic graph still has some areas that stand out for the user to analyse. Much effort is put into these methods so that the symmetric areas of the graph need not be closely inspected by the user, and can mostly be trusted as a correct analysis.
The exception to this is the labelling of Cu versus Si. The atomic graph looks identical for these two species, because they have the same symmetry number. In this case, many of the Si columns are mislabelled as Cu, which is mostly on account of the default model. (b) Atomic graph. Despite now having been reviewed by an individual versed in atomic graphs and Al-Mg-Si-(Cu) precipitates, this is still not a perfectly symmetric graph. This is because, in some real-life scenarios, the structural principles of section 2.2.3 will not always hold, which is the case here. Note especially the un-symmetric arc in the top middle of the precipitate, which indicates a vertex position with a symmetry number of 6. Since such positions are not included in the model, no vertex $v_i$ can have $\deg^+(v_i) = 6$, but these 6-fold Mg positions do in fact seem to appear with some regularity in hybrid precipitates.

Discussion
This section discusses some of the most prominent topics that have come to light during development.

Column detection
Improving the column detection method of AutomAl 6000 could improve several aspects of the software, such as the accuracy of atomic positions, the effectiveness of the column characterization and the user experience, the latter by minimizing the amount of manual intervention needed. Some specific suggestions for column detection improvements are discussed further in section 6.4.

Atomic graphs
Atomic graphs were developed out of necessity for the approach chosen for the methodology of this thesis. As such, the properties of these atomic graphs are strictly defined in service of the overarching goal of creating a successful algorithm for column characterization. However, using graphs in this way to describe atomic structure also has some interesting merits. The first theorem of digraph theory, equation 2.7, is still valid for atomic graphs. Also, an atomic graph that represents a structure which complies with the principles of section 2.2.3 always has an average vertex degree of exactly 4, which can perhaps be linked to principle 5 in the atomic graph representation of the structural principles of section 3.2.4. Tying the current atomic graph descriptions back to actual physics certainly seems possible. Precipitate number displacement (the number of columns in the precipitate versus the number of columns in the hypothetical FCC matrix which the precipitate displaces), packing fraction, polarization and possibly other properties seem possible to link to the atomic graph properties. If the column characterization can be made more general, then AutomAl 6000 could become a very advanced analytical tool that can tell a lot about an image in a very short time.
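The first theorem of digraph theory states that the out-degrees and the in-degrees of a digraph both sum to its size (the number of arcs). A minimal sketch of this check on a toy graph is given below, assuming that the graph is stored as a dictionary of out-neighbourhoods; the names and the toy graph are hypothetical.

```python
def in_and_out_degrees(out_neighbourhoods):
    """Return (in_degree, out_degree) dictionaries for a simple digraph."""
    out_degree = {v: len(set(s)) for v, s in out_neighbourhoods.items()}
    in_degree = {v: 0 for v in out_neighbourhoods}
    for successors in out_neighbourhoods.values():
        for j in set(successors):
            # Every arc contributes one to the in-degree of its successor.
            in_degree[j] = in_degree.get(j, 0) + 1
    return in_degree, out_degree


toy_maps = {0: [1], 1: [0, 2], 2: [0]}
in_degree, out_degree = in_and_out_degrees(toy_maps)
size = sum(len(set(s)) for s in toy_maps.values())
print(sum(in_degree.values()), sum(out_degree.values()), size)  # 4 4 4
```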
Some other groups seem to have deployed advanced graph theory to model atomic structures, of which notable mentions are [15,16]. These papers have used undirected graphs to handle structure data, whereas the novel idea of the work presented herein is to use digraphs, which allow for a more flexible and algorithmic approach to discovering the atomic structure. Atomic graphs might thus be a very purpose-driven construct that has little use outside the inner workings of the column characterization of section 3.4, but they might also have some more general usefulness, which is left as an open question here.

Column characterization
In general, the column characterization is not yet performing as well as it could. While the results are decent for many images, they leave a lot to be desired for others. Making the method more general, robust and well understood is an ongoing effort. The main conceived avenues for improvement lie with the statistical model, the untangling, or other as yet unexplored methods. This section attempts to shed further light on some of these subjects.

Statistical models
In section 3.3, the distributions of the default model were presented without much defense of their details. This section will examine the sample data a bit more closely. One of the complexities of the default model is of course the advanced species nominal attribute. To illustrate the reasoning behind this approach, examine the normal distributions of one single numerical attribute, the $\alpha_{\min}$ attribute. In figure 4.1a, the normal distribution of the Si atomic species nominal attribute is compared against the normal distributions of the advanced species nominal attributes Si$_1$ and Si$_2$. The "stretching" in the symmetry of the physically different structural functions is captured by the separation of the means between the advanced species attributes. In figure 4.1b, the other example which was mentioned in section 3.3 is illustrated. Here, the separation of the means in the advanced species nominal attribute is not a physical one, but a consequence of the definition of alpha angles, which can be predicted by examining figure 3.5. Figures 4.1c and 4.1d show the corresponding normal probability plots, which are relevant to the following paragraph.
Once appropriate nominal classifications have been determined, another assumption of the default model is that each numerical attribute is normally distributed within each nominal category. A check for normality is the normal probability plot, where each data point is plotted against its z-score. Figure 4.2 shows the normal probability plots of each individual nominal and numerical attribute, and reflects the data that is used to calculate the distributions in figure 3.8. Although the attribute plots mostly fall on straight lines, which is a positive test for normality, the density along each line seems to be slightly uneven. It is left as an open question here whether there are other distributions that would better model the numerical vertex attributes of the atomic graphs, in particular distributions that are not symmetric about the mean.
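The construction of a normal probability plot can be sketched with the Python standard library. The sketch below pairs each sorted data value with an expected standard score and uses the Pearson correlation of the resulting points as a rough measure of straightness. The plotting positions $(k - 0.5)/n$ are one common convention and are an assumption here, as are the function names and the synthetic sample.

```python
from statistics import NormalDist, mean


def normal_probability_points(values):
    """Pair each sorted data value with its expected standard score."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (k - 0.5) / n are one common convention.
    scores = [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return list(zip(scores, ordered))


def straightness(points):
    """Pearson correlation of the points; close to 1 for a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic values built from exact normal quantiles (illustrative only).
sample = [NormalDist(1.8, 0.2).inv_cdf((k - 0.5) / 50) for k in range(1, 51)]
print(round(straightness(normal_probability_points(sample)), 3))  # 1.0
```

A single correlation value hides where along the line the points cluster, so it would not by itself reveal the uneven densities noted above; for that, the plot itself has to be inspected.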
Some of the numerical attributes can be considered in pairs in scatter plots, which sometimes reveal effects that disappear when data is "black boxed" as just "data". Figure 4.3a shows a scatter plot of the $\alpha_{\min}$ and $\alpha_{\max}$ attributes, with the normal distributions calculated from the projection of the data on each axis. Individually, these normal distributions do not reveal the peculiar shape of the data, which is a result of the definition of the alpha angles. There are two sharp boundaries through this 2D data, which can best be explained by considering what a data point outside these boundaries would entail. Take for instance a hypothetical vertex with $(\alpha_{\max}, \alpha_{\min}) = (3.5, 1.5)$. Both of these values are reasonably within the bell curves, so the vertex could be considered an Al column, for instance. But with these values, the "middle" alpha angle would be $2\pi - 3.5 - 1.5 \approx 1.28$. This "middle" angle is smaller than the assumed minimum angle, which means that such a configuration would in fact be mapped to the data point $(3.5, 1.28)$. The maximum and minimum theta angles can be plotted in a similar manner, which is shown in figure 4.3b. Figure 4.3c shows a scatter plot of the normalized peak gamma versus the normalized average gamma. This plot illustrates the effect of "overexposed" images in the data.
(c) Scatter plot of $\gamma_{\text{peak}}$ versus $\gamma_{\text{avg}}$. Notice how the overexposed images break the linear correlation between peak and average gamma. This also causes the Cu$_1$ curve to be a lot less sharp (that is, to have a higher variance) than it could have been. A better data sample would create a better model.
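The boundary argument for the alpha angles can be checked numerically. The sketch below assumes, as in the worked example, that a vertex has three alpha angles that sum to $2\pi$; the function name is hypothetical.

```python
import math


def alpha_extremes(angle_a, angle_b):
    """Return (alpha_max, alpha_min) for three angles that sum to 2*pi.

    Assumption: the third angle is fixed by the other two, as in the
    worked example in the text.
    """
    third = 2 * math.pi - angle_a - angle_b
    angles = (angle_a, angle_b, third)
    return max(angles), min(angles)


alpha_max, alpha_min = alpha_extremes(3.5, 1.5)
print(round(alpha_max, 2), round(alpha_min, 2))  # 3.5 1.28
```

The proposed pair (3.5, 1.5) is thus never observed as a data point, because the third angle takes over the role of the minimum.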

Zeta graphs
In section 3.2, it was hinted that there is more than one conceivable way to select the members of the out-neighbourhoods from the districts. One such way, given a vertex $v_i$, is to select the first $n_i$ vertices $v_j$ of the district $D(v_i)$ where $\zeta_i \neq \zeta_j$, that is, the first $n_i$ district members that lie in the opposite plane. In set-builder notation, this defines the out-neighbourhood $N^+(v_i)$ (equation 4.1). If there are not enough such vertices in $D(v_i)$ to fill $N^+(v_i)$, then repeat the process by adding the first vertices $v_j$ that are not already in $N^+(v_i)$. When the out-neighbourhoods are defined in this way, with all the other sets of table 3.2 defined in the same way as before but based on the out-neighbourhood of equation 4.1, let the resulting graph be the zeta graph.
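The selection rule can be sketched as a short function. The sketch assumes that the district is an ordered list of vertex indices, that zeta values are 0 or 1, that the selection condition picks vertices in the opposite plane (differing zeta values), and that $n_i$ is the symmetry number of the vertex. All names are hypothetical and not taken from the AutomAl 6000 source code.

```python
def zeta_out_neighbourhood(i, district, zeta, n_i):
    """Select the out-neighbourhood of vertex i for the zeta graph.

    district: vertex indices of D(v_i), in district order
    zeta: mapping from vertex index to zeta value (assumed 0 or 1)
    n_i: number of out-neighbours to select (assumed symmetry number)
    """
    # First pass: district members whose zeta value differs from vertex i.
    chosen = [j for j in district if zeta[j] != zeta[i]][:n_i]
    # Second pass: fill any remaining slots with the first unused members.
    for j in district:
        if len(chosen) >= n_i:
            break
        if j not in chosen:
            chosen.append(j)
    return chosen


zeta = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1}
district = [1, 2, 3, 4, 5]
print(zeta_out_neighbourhood(0, district, zeta, 3))  # [2, 3, 5]
print(zeta_out_neighbourhood(0, district, zeta, 4))  # [2, 3, 5, 1]
```

Because the result depends only on the district order, the zeta values and $n_i$, the out-neighbourhoods follow directly from the vertex attributes, without any graph operations.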
The strength of the zeta graph over the regular atomic graph is that, given correct symmetry numbers and zeta values for each vertex, the zeta graph will almost automatically be the correct representation of the atomic structure just from its definition, without any need for untangling or other advanced graph analysis. This is in stark contrast to the atomic graphs, which can appear densely un-symmetric and still have correct symmetry numbers and zeta values. This can be demonstrated with a reverse-engineering example. Take a precipitate and by some method ensure that it has the correct overlay. Then reset the atomic graph by re-running the spatial mapping of section 3.4.1, which will reset the vertex districts. Then re-apply the definition of table 3.2 to make the atomic graph. In a different instance, apply the zeta graph definition to the same situation and compare the results. This demonstration has been performed to produce figure 4.4. Now, the zeta graph requires correct symmetry numbers and zeta values, but as it turns out, if at minimum the zeta values are correct, the zeta graph will still be a simpler object to extend the algorithm on than the atomic graphs. So, if the zeta analysis method of section 3.4.1 can reliably assign correct zeta values a priori, then this would be a faster, more accurate and more reliable approach than the untangling method. However, if the zeta analysis does not perform well, and some inversion occurs across the image, then this approach fails. This potential path of improvement with zeta graphs hinges on how sophisticated the zeta analysis can become, which is left as an open question here.
Even if a column characterization based on zeta graphs does not bear fruit, the zeta graphs are still a useful tool as an interface between AutomAl 6000 and its user. The review of an atomic graph requires the user to perform graph permutations on the graph, so as to manually untangle whatever problematic areas remain. If it is a "difficult" image, this review can sometimes be hard and demand some time and effort. However, when working in the zeta graph, all permutations are automatic, which means that the user must simply set correct symmetry and/or zeta values, which can be done at the push of a button, and is thus much faster. Unless the zeta graph is too far removed from the actual situation it is supposed to represent, and once one gets used to its behaviour, it provides a much smoother interaction with the AutomAl 6000 data structure.
Note that although every vertex symmetry number and zeta value is correctly labelled, the atomic graph has a very low symmetry density in the precipitate. (d) The zeta graph after the districts have been reassigned according to the spatial mapping method and the definition of the zeta graph has been applied. Note that the zeta graph is unchanged by the reordering of the districts, and remains the correct graph given correct symmetry numbers and zeta values.

Algorithms versus machine learning
Machine learning is swiftly becoming a common tool when approaching difficult problems. The method is concerned with constructing a neural network, much like a weighted graph, which will produce some output for a given data input, see for instance [17]. The range of problems that machine learning is applicable to is vast. One of the requirements for machine learning is a lot of data, appropriately configured, labelled and formatted for the training, which is most commonly done by gradient descent on the entire space of all the parameters in the graph. Even with a large set of training data, neural networks will typically suffer from diminishing returns after a certain point. This entails that doubling the available training data does not necessarily double the accuracy which the neural network is able to achieve, which puts a ceiling on how well the neural network can perform. Some approaches to atomic resolution image analysis using neural networks have already been made, notably [18,19]. One advantage of an algorithmic approach, though, is that developing an algorithm offers much more insight into what is going on. Although neural networks can be analysed in many different ways, how they actually work can often be totally or partially obscured in the sometimes very large networks (millions of parameters in some cases). In the future, databases of overlays, conceivably created with AutomAl 6000, could become large enough to enable machine learning as an alternative or symbiotic approach.

Conclusion
When a researcher is manually creating an atomic overlay of an Al-Mg-Si-(Cu) precipitate by using the structural principles, he or she will in general not consider this a difficult task, albeit a tedious one. Excluding difficult and/or unusual areas, and given reasonable familiarity with the 6000-system, the overlaying process is fairly straightforward, cognitively speaking. Translating this menial task into an algorithm has turned out to be not so simple, though. There are sometimes many advanced concepts and hidden layers in the cognitive process when a human applies their "intuition", "understanding" and/or "experience". The translation of the overlaying process into an algorithm has required the application of statistics, graph theory, specialized methods and a delicate interplay between the three. It requires significant effort from AutomAl 6000's algorithms to become confident of the symmetry of a column, that is, whether it is 3, 4 or 5. In most simple cases, a human can recognize this instantly, seemingly without much effort.
The resultant algorithm presented in this thesis is probably not the ultimate approach to this problem, but even if it cannot outperform a human in accuracy, it can of course outperform a human in speed.
The required manual review of the algorithm outputs will typically take much less of a researcher's time than manual overlaying, and this manual revision is amply aided by the visual simplicity of the atomic graphs. There is also an inherent strength in having the atomic structure ordered in a deliberate data structure, as it can be queried in almost any conceivable way. Manually counting atoms to determine the composition of a particle should no longer be necessary, and novel statistical insights seem to be within reach, perhaps through a systematic application of AutomAl 6000 at the hands of an adept researcher. Examples of studies that could have benefited from AutomAl 6000, either for the purpose of extracting composition statistics from images or to create atomic overlays, are plentiful [20,21,22,23,24,25].
Some work still remains before AutomAl 6000 can be confidently applied as an exploratory research tool. Some of the most pressing issues that remain, as well as some of the most promising ideas, are presented in this section.

Technical docstrings
A docstring is a string of text that appears in the preamble of a method or class, that clarifies the arguments, exceptions, purpose and special considerations of the method or class.
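As an illustration, a docstring for a small function might look like the sketch below. The function `count_species` is hypothetical and not part of AutomAl 6000, and the reStructuredText field style shown is just one of several common docstring conventions.

```python
def count_species(species_labels, target):
    """Count the columns labelled with a given atomic species.

    :param species_labels: Atomic species label of every column.
    :type species_labels: list(str)
    :param target: The atomic species to count, for example ``'Si'``.
    :type target: str

    :raises ValueError: If ``target`` is an empty string.

    :returns: The number of columns labelled ``target``.
    :rtype: int
    """
    if not target:
        raise ValueError('target must be a non-empty species label')
    return sum(1 for label in species_labels if label == target)


print(count_species(['Si', 'Cu', 'Si', 'Mg'], 'Si'))  # 2
```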

Exception protection
The source code for AutomAl 6000 systematically lacks exception handling, even in critical areas. This could be a source of frustration for users, due to unexplained and frequent crashes and bugs. If AutomAl 6000 is to be accepted as a useful tool by the community, making a stable version is of very high priority.
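A minimal sketch of what such protection could look like around a file-loading call is given below. The function `load_project`, the `loader` argument and the logger name are hypothetical; the point is only that a failure is logged and reported to the caller instead of crashing the program.

```python
import logging

logger = logging.getLogger('automal_sketch')  # hypothetical logger name


def load_project(path, loader):
    """Load a project file, returning None instead of crashing on failure.

    loader is any callable that takes a path and returns the loaded data.
    """
    try:
        return loader(path)
    except FileNotFoundError:
        logger.error('Could not find file: %s', path)
    except (OSError, ValueError) as error:
        logger.error('Could not read %s: %s', path, error)
    return None


# A missing file is reported in the log, and the caller receives None.
result = load_project('/nonexistent/folder/image.dm3', open)
print(result)  # None
```

In a GUI, the caller could use the `None` return value to show a message to the user rather than terminate.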

Additional export and import formats
At the time of writing, the data of AutomAl 6000 can be exported as a Comma Separated Values (CSV) file, which is a commonly used file format for passing tabular data between different software. There are however some additional export formats that would be highly useful in this case. In particular, the Scalable Vector Graphics (SVG) file format would allow users to export the overlay data into a file that could be edited with vector graphics software, such as Inkscape [27]. This would be highly beneficial, because overlay graphics could be exported with preserved layered data, which would enable researchers to generate publication-quality graphics from AutomAl 6000 overlays, which have rather limited graphical customization on their own. The SVG file format is an open standard with available documentation, and should therefore not be too hard to implement as an export format.
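To indicate how little is needed for a basic SVG export, the sketch below writes one circle per column and collects the circles in one group element per atomic species, using only the Python standard library. The function name and the column tuples are hypothetical and do not reflect the AutomAl 6000 data structure.

```python
import xml.etree.ElementTree as ET


def overlay_to_svg(columns, width, height):
    """Build an SVG string with one <g> group per atomic species.

    columns: iterable of (x, y, radius, species, colour) tuples
    """
    svg = ET.Element('svg', {
        'xmlns': 'http://www.w3.org/2000/svg',
        'width': str(width),
        'height': str(height),
    })
    groups = {}
    for x, y, radius, species, colour in columns:
        if species not in groups:
            # One group per species, so each can be edited as a unit.
            groups[species] = ET.SubElement(svg, 'g', {'id': species})
        ET.SubElement(groups[species], 'circle', {
            'cx': str(x), 'cy': str(y), 'r': str(radius), 'fill': colour,
        })
    return ET.tostring(svg, encoding='unicode')


# Illustrative columns only.
example = [(10, 20, 3, 'Si', 'red'), (30, 40, 3, 'Cu', 'yellow'),
           (50, 60, 3, 'Si', 'red')]
print(overlay_to_svg(example, 100, 100))
```

Grouping by species is a design choice made here so that all columns of one species can be recoloured or hidden together in a vector graphics editor.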
The current version of AutomAl 6000 can only import DM3 files, but there should of course also be some more options here.

Improvements to column detection
Desired improvements to column detection include a way to determine the ideal threshold value automatically at run-time, as well as a general improvement to its accuracy. Another option altogether is to allow other open-source software that is specialized in column detection to take part in the AutomAl 6000 process at this stage, either by directly importing functionality from such modules into AutomAl 6000, or by creating a conversion tool that could convert file formats such that AutomAl 6000 would be able to read the atomic positions. Atomap is a column detection software that is open source, rather sophisticated and already widely in use by the community [12]. The lack of effort to develop advanced column detection for AutomAl 6000 is partly on account of this problem being, in some sense, already solved. To conclude this short discussion, the column detection method as described in section 3.1.2, although possibly adequate for a familiar user, is a dated and naive minimal working concept, and should be considered for a partial or total rework.

Improvements to column characterization
As was discussed in section 4.3.2, the zeta analysis together with zeta graphs seems like an interesting avenue of exploration for a more general column characterization method. Before this can be explored, a very robust version of zeta analysis must be arrived at. Another immediate improvement to the column characterization would be to make more refined statistical models. In particular, the current version is not very good at distinguishing between Cu and Si. A model built on a larger data set, from which all overexposed images were removed, would decrease the variance of the Cu curves in the normalized peak gamma and normalized average gamma attributes, and thus improve the separation between Cu$_1$, Si$_1$ and Si$_2$. If the methods of the column characterization can be further generalized, they should also be extendable to phases that do not strictly adhere to the structural principles, like β, U$_2$ and vacant columns (footnote 20). Also, if the statistical models can become sophisticated enough, it should be possible to take precipitates in other alloys into account, like alloys containing Zn or Ag (footnote 21). How far these methods can be generalized is still an open question, and an exciting topic.

Optimizations
The current implementation of AutomAl 6000 has many areas that are not yet optimized, and for many of them optimization is desirable. Especially for large images, some of the GUI functionality can be unresponsive, making the software feel sluggish. It is perhaps even more important to improve the responsiveness of the GUI than it is to increase the speed of the algorithms, which is already acceptable on a decent computer (footnote 22). This is because a slow GUI will often feel more frustrating to a user than the time it takes AutomAl 6000 to run column detection or column characterization: while the algorithms are running, the user is free to do other things, which is not the case while waiting for the response to a button click. Deleting, moving or creating new vertices is particularly slow, due to how these actions are implemented in the background. Improving the responsiveness of these and similar actions is likely to improve the user experience of the software.
20 Like in the particular β arrangement, discussed in figure 13b in [5].
21 Some of these alloying elements can take on different symmetries and structural functions in the atomic structures, but if the advanced species nominal attribute captures all possibilities, they might be separable from the other categories.
22 For an image of approximately 4000 columns (which is the case for the image in figure 3.7b), expect a time of 5-10 minutes for column detection, and about 5 minutes for column characterization, on a typical office computer. Manual revision of the column detection result can be expected to take approximately 10 minutes, and manual revision of the column characterization can take anything between 1 minute and 1 hour, depending on the contents of the image.

Improvements to the GUI
Some work is still needed on the GUI. In particular, many of the elements are yet to be updated to reflect the new functionality of the new data-manager module, as well as the new graph module. Some increased customization capabilities of the graphical elements are also desirable. Although the GUI receives very little attention in this text, due to not being very relevant from the perspective of a thesis in physics or mathematics, it is still an important consideration if the sanity of a researcher using AutomAl 6000 extensively is to be preserved.

Web-page and bug-tracking
For AutomAl 6000 to become a useful tool to the community, it could do with a more professional distribution practice than it has currently. Such practices, like proper issue tracking on GitHub and a professional-looking web-page, are a bit outside the scope of this thesis work, but should be considered going forward, on account of the potential longevity of AutomAl 6000 as a useful tool.

docstring: A string of text in the preamble of a method or class, which clarifies the input/output and purpose of the method or class. A docstring should comply with a specific style guide.
elementary operation: A simple graph perturbation, also termed a graph edit.
graph: A combination of a non-empty set of vertices and a possibly empty set of two-element arc subsets of the vertex set.
hybrid precipitate: A precipitate with two or more phases present in its structure.
in-degree: The cardinality of the in-neighbourhood of a given vertex.
in-neighbourhood: The set of all vertices adjacent to a given vertex.
in-semi-partner: A vertex is an in-semi-partner of another vertex if it is in the in-neighbourhood of that vertex, but not in the out-neighbourhood of that vertex.
induced subgraph: A subgraph built from a subset of a vertex set such that the arc set includes all valid arcs.
Inkscape: Vector graphics software.
intersection: The intersection of two sets contains all elements that occur in both sets.
interstitial: In between. In the context of Al, positions that do not lie on any of the FCC lattice points.
inverse arc: If a directed arc extends from A to B, then its inverse extends from B to A. If the arc set contains the inverse of an arc, both arcs are symmetric.
iterable: Any data type that can be iterated over. Examples in Python include lists, sets, dictionaries, tuples and strings.
loop: An arc connecting a vertex to itself.
machine learning: Computer-based method for training neural networks to solve problems.
map: The collection of sets that define the graph at a local level for a given vertex.
membership: The objects in a set are members of that set, and not members of the complement of that set.
7 mesh centered subgraph A subgraph centered at a mesh. 20

Miller indices A system for indexing directions and planes in crystals. 4
multigraph A graph that can contain more than one identical arc. 10
multivariate normal distribution A normal distribution on a multidimensional random variable. 12
neighbour Two vertices are neighbours if they are adjacent, see adjacent vertices. 9
neighbourhood The set of all vertices that are adjacent to or from a given vertex. 9, 17
neural network A graph-like structure used by machine learning and deep learning methods. 55
nominal attribute A categorical data attribute. 11
normal probability plot A plot of the standard scores against the data values. If the data falls along a straight line, it is a positive test for normality in the data, with some caveats. 12, 49
normalized average gamma The average pixel value in a circular area with radius equal to the approximate atomic radius, centered at a column, and after normalization against the Al matrix.
pseudograph A graph that allows loops, that is, arcs that connect a vertex to itself. 10
search matrix A matrix used by AutomAl 6000 to define the search-space for new columns during column detection. 15
semi-partner Two vertices are semi-partners if they are adjacent by an asymmetric arc, that is, an arc that has no inverse. 18
separation The spatial distance between the centers of atoms. 17, 66
set A collection of unique objects of a specific type. 7
set-builder notation A notational method to define sets by some rule on some range. 7, 9, 17, 53
simple digraph A digraph classification which excludes multigraphs and pseudographs. 10
size The size of a graph is equivalent to the cardinality of the arc set of that graph. 8
smart align A HAADF-STEM image acquisition technique that matches and overlays information from several images. 28
species dictionary The classification system applied by AutomAl 6000. 26
spectral graph theory The study of the properties of the adjacency matrix representation of graphs. 10
standard score The standard score of a data value is the mapping of that value in the standardization of its distribution, which is obtained by subtracting the mean and dividing the result by the standard deviation. 12
strong arc preservation Creating the inverse of an un-symmetric arc by increasing the symmetry number of the direct successor of the arc. 25
strong arc termination Removing an arc by, if possible, reducing the symmetry number of the predecessor vertex of the arc. 25
strong operation An atomic graph operation that can permute districts, and also symmetry numbers or atomic species. 24, 25
subgraph A graph where the vertex and arc sets are subsets of another graph. 9
super saturated solute solution A state of an alloy when the alloying elements are homogeneously distributed in the alloy, occupying crystal positions. 5
symmetric difference See disjunctive union. 7
symmetric digraph A digraph that, for each arc, also contains its inverse. 11
symmetry The symmetry of a column is the number of opposite plane NNs as seen in projection for a given column in the model description of the structural principles. 6
symmetry number The symmetry number of a vertex determines the size of its out-neighbourhood, that is, its out-degree. 17, 53
theta angle The central angles formed in a vertex centered subgraph. 22
unary operation A graph operation that produces a graph from a graph. 9
undirected graph A graph with undirected arcs. 10, 48
union The union of two sets contains all the elements that occur in either set or both. 7, 17
untangling The process of solving an atomic graph into its most symmetric version, given a subgraph operation map. 31, 33, 43
vertex An abstract node acting as the fundamental building block of a graph. 8, 53
vertex centered subgraph A subgraph centered at a vertex. 20
vertex degree The degree of a vertex is equivalent to the cardinality of the open neighbourhood of the vertex. 9
weak arc preservation Creating the inverse of an un-symmetric arc by finding a candidate arc for weak arc termination that is adjacent from the successor of the arc to be preserved. 25
weak arc termination Removing an arc by permuting it to a new successor, if a suitable successor candidate is found. 24
weak operation An atomic graph operation that can permute districts, but not symmetry numbers or atomic species. 24
Z-contrast The number of electrons scattered by the sample, indicated as a pixel intensity in HAADF-STEM images. 3

A Source code overview
This section provides an overview of AutomAl 6000's source code, its modules, and how they are connected. More in-depth and up-to-date technical documentation can be found at http://automal.

A.1 Overview
The software has four main parts: the modules related to the GUI; a central core, which handles the project files and acts as an Application Programming Interface (API) to the GUI; the modules that implement atomic graphs and the column characterization methods; and peripheral utility modules that are used by all parts of the software. Figure A.1 shows the import structure of the source code. Note that the core functionality can operate without the GUI. This was a design goal from the start, so that AutomAl 6000's methods can be imported elsewhere, even though the software is not yet optimally configured for such use. The column detection method is contained in the core module.
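
The separation between the core and the GUI described above can be illustrated with a minimal Python sketch. All names used here (`Project`, `AtomicGraph`, `detect_columns`, `HeadlessFrontend`) are hypothetical and chosen for illustration only; they are not AutomAl 6000's actual classes or functions. The detection step is a placeholder threshold, not the column detection method of the software.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicGraph:
    """Hypothetical stand-in for the atomic graph modules."""
    vertices: list = field(default_factory=list)

    def add_vertex(self, position):
        self.vertices.append(position)


class Project:
    """Hypothetical core object: owns the data and exposes an API.

    It imports nothing from a GUI layer, so it can be used from a script.
    """

    def __init__(self, image):
        self.image = image  # 2D list of pixel intensities
        self.graph = AtomicGraph()

    def detect_columns(self, threshold):
        # Placeholder logic (an assumption for this sketch): every pixel
        # brighter than the threshold is registered as a column.
        for row_index, row in enumerate(self.image):
            for col_index, value in enumerate(row):
                if value > threshold:
                    self.graph.add_vertex((row_index, col_index))
        return len(self.graph.vertices)


class HeadlessFrontend:
    """Hypothetical frontend: a GUI would call the same core methods."""

    def __init__(self, project):
        self.project = project

    def on_detect_clicked(self, threshold=0.5):
        count = self.project.detect_columns(threshold)
        return f"{count} columns detected"


image = [[0.1, 0.9, 0.2],
         [0.8, 0.1, 0.7]]
project = Project(image)
frontend = HeadlessFrontend(project)
message = frontend.on_detect_clicked()
```

In this sketch the frontend only forwards user actions to the core, so a script can construct `Project` directly and call `detect_columns` without any frontend, which mirrors the import structure described above.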