Quantitative analysis of spectroscopic Low Energy Electron Microscopy data: High-dynamic range imaging, drift correction and cluster analysis

For many complex materials systems, low-energy electron microscopy (LEEM) offers detailed insights into morphology and crystallography by naturally combining real-space and reciprocal-space information. Its unique strength, however, is that all measurements can easily be performed energy-dependently. Consequently, to fully harvest this potential, one should treat LEEM measurements as multi-dimensional, spectroscopic datasets rather than as images. Here we describe a measurement and data analysis approach to obtain such quantitative spectroscopic LEEM datasets with high lateral resolution. The employed detector correction and adjustment techniques enable measurement of true reflectivity values over four orders of magnitude of intensity. Moreover, we show a drift correction algorithm, tailored for LEEM datasets with inverting contrast, that yields sub-pixel accuracy without special computational demands. Finally, we apply dimension reduction techniques to summarize the key spectroscopic features of datasets with hundreds of images into just two images that can easily be presented and interpreted intuitively. We use cluster analysis to automatically identify different materials within the field of view and to calculate average spectra per material. We demonstrate these methods by analyzing bright-field and dark-field datasets of few-layer graphene grown on silicon carbide and provide a high-performance Python implementation.


Introduction
Low-energy electron microscopy (LEEM) is a surface science technique where images are formed from reflected electrons of low kinetic energy, down to single electronvolts. This is achieved by decelerating the electrons before they reach the sample and projecting them onto a pixelated detector after interaction with the sample. LEEM has proven to be a versatile tool, due to its damage-free, real-time imaging capabilities and its combination of electron diffraction with spectroscopic and real-space information. This enables more advanced LEEM-based techniques such as dark-field imaging, where electrons from a single diffracted beam are used to create a real-space image, revealing spatial information on the atomic lattice of the sample [1,2].
Aside from usage as an imaging tool, LEEM is frequently used as a tool for quantitative analysis of physical properties of a wide range of materials. Multi-dimensional datasets can be created by recording LEEM images as a function of one or more parameters such as interaction energy E_0, angle of incidence or temperature [3,4]. Using this, a wide range of properties can be studied, for example, layer interaction, electron bands [5], layer stacking [2], catalysis [6], plasmons [7], and surface corrugation [8]. However, to unlock the true potential of quantitative analysis of multi-dimensional LEEM data, post-processing of images and combination with meta-data is needed. In particular, it is necessary to correct for detector artifacts and image drift and to convert image intensity to physical quantities.
To this end, we here present a modular data acquisition and analysis pipeline for multi-dimensional LEEM data, combining techniques well established in other fields such as astronomy or transmission electron microscopy (TEM), that yields high-resolution spectroscopic datasets and visualizations thereof. In particular, we start with the correction of the raw data for detector artifacts using flat field and dark current correction, as illustrated in Fig. 1(a). Combining these corrections on the images with active feedback on the detector gain enables High Dynamic Range (HDR) spectroscopy, which makes it possible to measure spectra over four orders of magnitude of intensity. Subsequently, we demonstrate that compensation of detector artifacts also enables drift correction with sub-pixel accurate image registration, yielding a fully corrected data stack (Fig. 1(b)). This creates a true pixel-by-pixel spectroscopic dataset, as shown in Fig. 1(c), i.e. every pixel contains a reflectivity spectrum of the corresponding position on the sample. Finally, we explore the potential for more advanced computational data analysis. We show that by using relatively simple dimension reduction techniques and clustering, these datasets can be intuitively analyzed and visualized, enabling semi-automatic identification of areas with different spectra.
To demonstrate these features and quantify the accuracy, we apply the drift correction algorithm to artificial data and then apply the full pipeline to a real dataset acquired on the SPECS P90-based ESCHER system in Leiden [9,10,11]. The sample of the dataset is few-layer graphene grown by thermal decomposition of silicon carbide (SiC) [12], followed by hydrogen intercalation to decouple the graphene from the SiC substrate [13,14]. Bright-field LEEM spectra can be used to distinguish the resulting mixture of bilayer, trilayer and thicker graphene, as interlayer states cause distinct minima in the reflectivity spectra [15,16]. In addition, the growth process causes strain-induced stacking domains, which can be distinguished using dark-field LEEM spectra [17,2]. The sample dataset consists of bright-field and dark-field LEEM images of the same area for a range of landing energies (sometimes referred to as LEEM-I(V)). The dark-field dataset uses a first-order diffraction spot and tilted illumination such that the incident beam has the opposite angle to the normal as the diffracted beam, as described in more detail in Ref. [2]. The data is available as open data [18] and is interpreted and investigated in detail in Refs. [2,19].

Detector Correction
No physical detector system is perfect, i.e. each detector system introduces systematic errors and noise. Knowledge of the sources of these imperfections enables the correction of most of them. The ESCHER LEEM has the classical detector layout: a chevron microchannel plate array (MCP, manufactured by Hamamatsu) for electron multiplication, a phosphor screen to convert electrons to photons and a CCD camera (a PCO sensicam SVGA) to record images of the phosphor screen.
The CCD introduces artifacts in the form of added dark counts and a non-uniform gain [20,21,22,23]. Furthermore, the MCP gain is also spatially non-uniform, for example due to overexposure damage of the MCP, resulting in locally reduced gain. Therefore, we describe the measured intensity I_CCD on the CCD as the following combination of the previously named detector artifacts and the 'true' signal I_in:

    I_CCD(x, y) = G(x, y) I_in(x, y) + DC(x, y),    (1)

where DC(x, y) is the intensity caused by dark current and G(x, y) is the position-dependent and as-of-yet unknown gain factor comprising all modifications to the gain.
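As a minimal numerical sketch of inverting this detector model, i.e. subtracting the dark counts and dividing by a dark-corrected flat field image (all array names and toy numbers below are our own, not from the reference implementation):

```python
import numpy as np

def correct_detector(raw, dark, flat):
    """Invert I_CCD = G * I_in + DC: subtract dark counts, divide by gain.

    `flat` is a flat field exposure (uniform true illumination); subtracting
    the dark image from it leaves the gain map G(x, y) up to a constant.
    """
    gain = flat - dark
    return (raw - dark) / gain

# Toy data: known gain map and dark current, uniform true signal of 1.0
rng = np.random.default_rng(0)
true = np.ones((4, 4))
gain = 1.0 + 0.1 * rng.random((4, 4))
dark = 5.0 + rng.random((4, 4))
raw = gain * true + dark          # what the camera would record
flat = gain * 1.0 + dark          # flat field: uniform illumination of 1.0

recovered = correct_detector(raw, dark, flat)
assert np.allclose(recovered, true)
```

The division normalizes the result to the flat field illumination, which is what makes mirror mode flat fields yield true reflectivity later on.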
To compensate for these detector artifacts, we employ techniques well established in astronomy (and other fields using CCD cameras) to effectively invert the relation in Eq. (1), to extract I_in(x, y) without the deleterious effects of background dark counts DC(x, y) and local gain variations G(x, y).
First, the dark current of the CCD is compensated by pixel-wise subtracting a non-illuminated dark count image, i.e. an image with the same exposure time as used for the measurement, but no electron illumination at all. A pixel-wise average of a set of such dark count images is shown in Fig. 2(a). The dark current arises from thermal excitations in the sensor and varies over time with an approximately Gaussian distribution. The mean of this distribution is dependent on the pixel, i.e. the x, y location, visible for example in Fig. 2(a) as a slight increase in the lower right corner. To suppress the thermal fluctuations in the template dark current image without introducing systematic errors, it is desirable to average over several dark count images. We assume that the per-pixel dark currents are identically distributed with a variance Var_therm, except for a spatial variance Var_spatial of the mean. This is mathematically equivalent to assuming the dark current fluctuates around its mean with both spatially dependent (but fixed in time) noise and time-dependent thermal noise. By averaging multiple dark images, we reduce the thermal variance but not the spatial variance. The remaining variance is given by:

    Var_tot(n) = Var_therm / n + Var_spatial,    (2)

where Var_tot(n) denotes the total variance of n pixel-wise averaged images. By determining Var_tot(n) and Var_tot(1) experimentally, we can isolate the thermal noise on a single image:

    Var_therm = n / (n − 1) · (Var_tot(1) − Var_tot(n)).

For the ESCHER system with its Peltier-cooled camera, we find Var_therm(16 × 250 ms images) = 114.3. Therefore, a set of 120 × 16 images (a total exposure time of 8 minutes) is sufficient to suppress the systematic errors to values smaller than the discretization error. We find that the dark count image does not significantly change over time, and therefore remeasuring dark count images is seldom needed.
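The averaging relation Var_tot(n) = Var_therm/n + Var_spatial, and the isolation of Var_therm from it, can be checked numerically; the simulated image size, variances and frame count below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
shape, n_frames = (64, 64), 16
var_therm, var_spatial = 100.0, 25.0

# Fixed spatial pattern of per-pixel means, plus fresh thermal noise per frame
mean_map = rng.normal(200.0, np.sqrt(var_spatial), shape)
frames = mean_map + rng.normal(0.0, np.sqrt(var_therm), (n_frames,) + shape)

var_tot_1 = frames[0].var()            # variance of a single dark image
var_tot_n = frames.mean(axis=0).var()  # variance of the n-frame average

# Isolate the thermal part: Var_therm = n/(n-1) * (Var_tot(1) - Var_tot(n))
est = n_frames / (n_frames - 1) * (var_tot_1 - var_tot_n)
# `est` should land near var_therm = 100 for this simulation
```

Averaging more frames shrinks only the Var_therm/n term, which is why a long enough dark stack pushes the residual error below the discretization error.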
Second, to compensate for spatial gain variations, which are mostly due to the MCP, a (conventional) flat field correction is performed, dividing the full dark count-corrected dataset by an evenly illuminated image [24]. In LEEM, at negative landing energies of E_0 ≈ −20 eV the sample behaves as a mirror, yielding an almost perfect flat field image as approximation of G(x, y) in Eq. (1). A relatively large value for the negative energy is taken to prevent artifacts from local in-plane electric field components, e.g. due to work function or height differences in the sample [25,26,27]. For the ESCHER system, it is necessary to take flat field images within hours of the measurement, as the MCP wears over time and the gun emission profile and system alignment change on relatively short timescales [28]. Furthermore, taking a flat field image at the same precise alignment as the measurement is preferred for two reasons. First, barring absolutely perfect alignment of the system as well as a perfectly uniform emission from the electron gun, the beam intensity is not spatially uniform. As illumination inhomogeneities are dependent on the precise settings of the lenses, these will also be compensated for if the flat field is recorded in the exact same configuration. Second, for proper normalization of the data, as explained in Section 3, the same magnification (projector settings) is needed.
An alternative to this mirror mode flat fielding is to average over a sufficiently large set of images of different positions on the sample and use the resulting average as a flat field image. In most cases, however, mirror mode flat fielding is preferred over such ensemble-average flat fielding, since for the latter many images of different locations are required. Even when such a set is already available, it is hard to rule out systematic (statistical) errors. Lastly, ensemble-average flat fielding cannot provide proper normalization of the data to convert to true reflectivity.

High dynamic range spectroscopy
In LEEM and LEED, large variations occur in the amplitude of the signal, both within individual images and from image to image. For example, in LEED spectroscopy, features of interest are often orders of magnitude less bright than primary Bragg peaks. This necessitates a detector system with a large dynamic range. The CCD camera of the ESCHER setup has a bit depth of 12 bits and the possibility to accumulate 16 images in hardware, yielding an effective bit depth of 16 bits for singular images.
For most materials, the reflected intensity I(E_0) changes over orders of magnitude as a function of E_0. Starting in mirror mode, the reflected intensity tends to decrease roughly exponentially for E_0 ≲ 100 eV. To obtain spectra with such a large dynamic range, the dynamic range offered by the bit depth of the CCD alone is not sufficient.
However, the gain G of the MCP, i.e. the ratio of outgoing electrons to incoming electrons, can be tuned by the voltage V_MCP applied over the MCP. This gain scales approximately exponentially in V_MCP (over a reasonable range, see next section), enabling image formation of approximately constant intensity on the CCD for a wide range of incident electron intensities. We use this property to develop a scheme to further increase the dynamic range, in which G(V_MCP) is adjusted by setting a new MCP bias for each new image, i.e. increasing the gain for images where the reflected intensity is low. Measuring V_MCP for each recorded image and calibrating G(V_MCP) makes it possible to employ the full dynamic range of the CCD camera for all landing energies, without losing the information of the absolute magnitude of the measured intensities, thus extending the range of spectroscopy without significant decrease in signal-to-noise ratio.
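The rescaling idea can be sketched as follows; the exponential gain model `mcp_gain` and its parameters are invented for illustration and stand in for the calibrated curve:

```python
import numpy as np

def mcp_gain(v_mcp, v_ref=1000.0, volts_per_decade=150.0):
    """Toy exponential gain curve, normalized to G(v_ref) = 1.

    The slope (volts per decade of gain) is an assumption for this sketch,
    not a measured property of any real MCP.
    """
    return 10.0 ** ((v_mcp - v_ref) / volts_per_decade)

# Two recordings of the same true intensity at different MCP biases:
true_intensity = 1e-3
img_low_bias = true_intensity * mcp_gain(1000.0)
img_high_bias = true_intensity * mcp_gain(1450.0)  # gain raised for a dim signal

# Dividing each image by the gain recorded with it puts both recordings
# back on one absolute intensity scale:
assert np.isclose(img_low_bias / mcp_gain(1000.0),
                  img_high_bias / mcp_gain(1450.0))
```

Because the gain division is done in floating point after readout, the camera's full bit depth is spent on each individual image while the absolute scale is preserved in metadata.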

Calibration
Hamamatsu Photonics K.K., the manufacturer of the microchannel plate in the ESCHER setup, specifies an exponential gain as a function of voltage for a part of the range of possible biases [29]. To extend the useful range beyond this limit and thus enable the use of the full bias range up to the maximal 1800 V, the gain versus bias curve was calibrated as follows:

1. First, in mirror mode, V_MCP is adjusted such that the maximum intensity in the image corresponds to the full intensity on the CCD, staying just below intensities damaging the MCP.
2. While decreasing V_MCP, images are acquired for evenly spaced bias values. The intensities of these images form the dataset for calibration of the low-bias part.
3. Returning V_MCP to the previous maximum value, E_0 is increased until the intensity of the image is so low that it is barely distinguishable from the dark count.
4. Again, V_MCP is turned up until the maximum image intensity corresponds to the maximum CCD intensity.
5. Steps 2 to 4 are repeated until a dataset is acquired starting at maximum MCP bias V_MCP. The resulting curves are shown in Fig. 3(a).
6. These datasets are then corrected for dark count as discussed above, resulting in the curves shown in Fig. 3(b). Compared to the uncorrected curves, the increase in accuracy for low intensity values, crucial for accurate calibration, is very apparent.
7. A joint fit of Eq. (3), allowing for a different amplitude A_i for each curve, is performed to the corrected data to obtain a general expression for MCP gain G as a function of V_MCP. The fit is performed using least squares on the logarithm of the original data with no additional weights, to ensure a good fit over the many orders of magnitude. The fitted curve is then normalized to a convenient value, e.g. G(1 kV) = 1. This normalization can be freely chosen, as G will be applied equally to datasets and flat field images, yielding absolute reflectivity as resulting data.
A first choice for a fitting function would be a simple exponential, but this would not account for any deviation from perfect exponential gain, visible as deviations from a straight line in Fig. 3(b). For the ESCHER setup we therefore choose to add correction terms of odd power in the exponent:

    G(V_MCP) = A exp( Σ_k c_k V_MCP^(2k+1) ),    (3)

where A is an overall amplitude and the c_k are fitted coefficients. Only odd powers were used to accurately capture the visible trends in the data. For the ESCHER setup, correction terms up to order V_MCP^19 (k ≤ 9) turned out to give a satisfactorily good approximation, as illustrated by the residuals in Fig. 3(d).
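A toy version of the log-space least squares fit might look like this; the synthetic data, number of terms and coefficients are our own, and the per-curve amplitudes A_i of the real joint fit are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
v = np.linspace(0.9, 1.8, 50)              # MCP bias in kV (illustrative range)
true_coeffs = np.array([6.0, -0.4, 0.02])  # coefficients of V, V^3, V^5

# Synthetic log-gain data with odd powers in the exponent, plus noise
log_g = sum(c * v ** (2 * k + 1) for k, c in enumerate(true_coeffs))
log_g_noisy = log_g + rng.normal(0.0, 0.01, v.size)

# Fitting log(G) makes the model linear in the coefficients, so a single
# linear least squares on a design matrix of odd powers suffices:
X = np.stack([v ** (2 * k + 1) for k in range(3)], axis=1)
fit, *_ = np.linalg.lstsq(X, log_g_noisy, rcond=None)

# The fitted curve reproduces the noiseless log-gain closely
assert np.allclose(X @ fit, log_g, atol=0.1)
```

Working on the logarithm also realizes the unweighted fit over many orders of magnitude described in step 7: in log space, every decade of intensity carries equal weight.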

Active per-image optimization of MCP bias
The resulting curve with calibration coefficients is then used to actively tune the MCP bias during spectroscopic measurements: a desired range is defined for the maximum intensity on the camera, corresponding to a maximum safe electron intensity on the MCP to prevent damage on the one hand, and a minimum desired intensity of the image on the CCD on the other hand. Whenever the maximum intensity of an image falls outside this range, the MCP gain G(V_MCP) is adjusted such that the intensity of the next image again falls in the center of this range.
Assuming the intensity changes continuously, this method ensures the use of the full intensity range of the camera for each image, while protecting the MCP against damage. Additionally, after the measurements, the calibration curve is used to calculate the real, relative intensity from the image intensity and the recorded V_MCP. By dividing this intensity by the intensity of the flat field image (taken in mirror mode and corrected for dark current and the MCP bias), we calculate a (floating point) conversion factor to true reflectivity values for each image. These ratios are added to the metadata of every image. By applying this conversion as a final step after any analysis of the data, errors due to discretization of highly amplified, and therefore low true intensity, images are minimized. Note that this procedure makes the conversion to true reflectivity possible even for datasets that contain no mirror mode images themselves, such as dark-field measurements.
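The feedback rule could be sketched as below; the gain slope, intensity thresholds and function name are illustrative assumptions, not the actual ESCHER control code:

```python
import numpy as np

VOLTS_PER_DECADE = 150.0  # assumed slope of a toy exponential gain curve

def next_bias(v_mcp, max_intensity, lo=10_000, hi=60_000):
    """Return an updated MCP bias given the current image's maximum intensity.

    If the intensity is inside [lo, hi] the bias is kept; otherwise the bias
    is changed so the next image should land at the center of the range.
    """
    if lo <= max_intensity <= hi:
        return v_mcp
    target = (lo + hi) / 2.0
    # For exponential gain, a factor f in intensity needs log10(f) decades:
    return v_mcp + VOLTS_PER_DECADE * np.log10(target / max_intensity)

in_range = next_bias(1200.0, 30_000)   # unchanged: intensity already in range
too_dim = next_bias(1200.0, 3_500)     # a factor 10 too dim: bias raised
```

With these toy numbers, `in_range` stays at 1200.0 V, while `too_dim` rises by one decade's worth of bias (150 V) to 1350.0 V.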

Comparison of results
Spectroscopic LEEM-I(V) curves of bilayer graphene on silicon carbide are measured both with constant MCP bias and with adaptive MCP bias as described above. A comparison between the resulting curves is shown in Fig. 4. While the regular, constant-MCP curve starts to lose detail around E_0 = 50 eV, i.e. after a factor of 100 decrease in signal, the adaptive measurement captures intensity variations in the spectrum almost 4 orders of magnitude lower than the initial intensity. We thus call the adaptive method high dynamic range (HDR) imaging.

Drift Correction by image registration
In LEEM imaging, the position of the image on the detector tends to shift during measurement, as shown in Fig. 1(a). This prevents per-location interpretation of the data, both for spectroscopic measurements and measurements with varying temperature. Although the shift can be minimized by precise alignment of the system, we find that a significant shift always remains, especially in tilted-illumination experiments such as DF-LEEM or angle-resolved reflected-electron spectroscopy (ARRES) [4,27,2], which makes the compensation of this image drift necessary.
This problem has been studied in depth in the field of image registration, motivated by wide-ranging applications such as stabilization of conventional video, combination of satellite imagery and medical imaging [30,31,32]. Techniques generally rely on defining a measure of similarity between a template and other images, either by some form of cross-correlation, or by identifying specific matching features in both. The image is then deformed by a fixed set of transformations (either affine, i.e. purely shifts and rotations, or non-rigid, i.e. with additional deformation), until the match between the features in the images and the features in the template is maximal. For LEEM data, the measurement drift is almost completely described by in-plane shifts, significantly reducing the space of expected transformations. A common approach in this case is to use the (two-dimensional) cross-correlation as a measure of similarity between two shifted images and to find its maximum for all images compared to a template, as the location of the maximum of the cross-correlation corresponds directly to the shift between the image and the used template.
The cross-correlation of two n × n pixel images I_1(x, y) and I_2(x, y) is defined as follows:

    (I_1 ⋆ I_2)(Δx, Δy) = Σ_{x,y} I_1(x + Δx, y + Δy) I_2(x, y),

where the coordinates wrap around, i.e. all spatial coordinates are taken modulo n. Furthermore, we can relate this to the convolution operation (denoted as •):

    I_1 ⋆ I_2 = I_1 • Ĩ_2,   with Ĩ_2(x, y) = I_2(−x, −y).

Using this, the cross-correlation can be expressed in terms of (two-dimensional) Fourier transforms F:

    I_1 ⋆ I_2 = F^(−1)[ F(I_1) · conj(F(I_2)) ],

where conj(F(I_2)) denotes the complex conjugate of the Fourier transform of image 2. This makes the cross-correlation particularly suitable as a measure of similarity, since it can be computed efficiently using the two-dimensional Fast Fourier Transform (FFT). Determining the location of the maximum of the cross-correlation yields the integer shift for which the two input images are most similar, with the height of the maximum an indication of the quality of the match. To further increase accuracy, several variants, such as gradient cross-correlation and phase-shift cross-correlation, have been shown to achieve sub-pixel accuracy for pairs of images [30,33,34,35]. For LEEM data, however, the straightforward cross-correlation approach is often hindered by the physics underlying the electron spectra, resulting in contrast changes (cf. Fig. 5(a,b)) and even inversions for different values of E_0. The problem can be slightly alleviated by using multiple templates, but in general this approach is unsatisfactory. Instead, we present another approach here: we first apply digital filters and then compare each image to all other images, similar to the algorithm by Schaffer et al. for energy-filtered transmission electron microscopy [36]. The algorithm then uses a statistical average of the found integer shifts between all pairs of images to achieve sub-pixel accuracy.
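The FFT identity above can be demonstrated directly with NumPy: cross-correlating a cyclically shifted copy of a random image with the original recovers the shift as the argmax location (modulo the image size). The toy image size and shift are our own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.random((32, 32))
shifted = np.roll(img, shift=(5, -3), axis=(0, 1))  # wrap-around shift

# cross_corr = F^-1[ F(I_1) * conj(F(I_2)) ], with I_1 the shifted image
cross = np.fft.ifft2(np.fft.fft2(shifted) * np.conj(np.fft.fft2(img))).real

# The peak of the cross-correlation sits at the applied shift
dy, dx = np.unravel_index(np.argmax(cross), cross.shape)
print(dy, dx)  # → 5 29  (i.e. -3 modulo the 32-pixel width)
```

The wrap-around in `np.roll` mirrors the modulo-n coordinates of the definition; on real data the constant-zero background after the Sobel filter keeps these wrap-around terms from corrupting the peak.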
We analyze the accuracy of this algorithm using an artificial test dataset and, by performing benchmarks on a real dataset, show that the accompanying Python implementation [37] is fast enough to process stacks of hundreds of images in mere minutes. This is followed by a discussion of the algorithm and results.
The algorithm consists of the following steps:

1. Select an area of each of the (detector-corrected) N images, suitably sized for FFTs (i.e. preferably 2^n × 2^n pixels).
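The statistical-averaging stage of the pipeline (the pairwise shifts and the least squares referred to later as step 8) can be sketched as follows; the filtering, weighting matrix W_ij and false-positive thresholding of the full algorithm are left out of this simplified version:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
true = np.cumsum(rng.normal(0, 0.8, n))  # hypothetical per-image drift (1D)
true -= true[0]

# Pairwise shifts d_ij ≈ s_i - s_j are only measured to the nearest integer:
d = np.rint(true[:, None] - true[None, :])

# Build the overdetermined system: one row per ordered pair (i, j)
rows, targets = [], []
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        row = np.zeros(n)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        targets.append(d[i, j])
A, b = np.array(rows), np.array(targets)

# Least squares over all pairs; fix the gauge so image 0 defines the origin
s, *_ = np.linalg.lstsq(A, b, rcond=None)
s -= s[0]

# Averaging over all pairs beats the 0.5 px rounding error of a single pair
assert np.abs(s - true).max() < 0.5
```

Each pairwise shift is integer-valued, but because each image participates in N−1 pairs, the rounding errors average out, which is the origin of the sub-pixel accuracy discussed below.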

Accuracy testing
To validate and benchmark the accuracy of the drift correction algorithm beyond visual inspection of the resulting drift-corrected datasets, an artificial test dataset with known shifts was created. This enables exact comparison of the results to a 'true' drift. The test dataset, as shown in Fig. 7, consists of N = 100 copies of an annulus of intensity 1.0 on a background of 0.0. The dataset is shifted over a parabolic shift in the x-direction and random shifts uniformly chosen from the interval [−0.5, 0.5] pixels in the y-direction (see Fig. 7(a)). Finally, pixel-wise Gaussian (pseudo-)random noise is added to all images. The standard deviation A of the added random noise is then varied to simulate images with different signal-to-noise ratios (SNR).
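Such a dataset can be generated along these lines; the image size, annulus radii and the Fourier-shift implementation of the sub-pixel shifts are our own choices, not taken from the reference code:

```python
import numpy as np

def annulus(size=64, r_in=10.0, r_out=20.0):
    """Annulus of intensity 1.0 on a 0.0 background."""
    y, x = np.mgrid[:size, :size] - size / 2.0
    r = np.hypot(x, y)
    return ((r >= r_in) & (r <= r_out)).astype(float)

def subpixel_shift(img, dy, dx):
    """Shift an image by a (possibly fractional) amount via the shift theorem."""
    ky = np.fft.fftfreq(img.shape[0])[:, None]
    kx = np.fft.fftfreq(img.shape[1])[None, :]
    phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(img) * phase).real

rng = np.random.default_rng(5)
N, A = 100, 0.5                        # frame count and noise amplitude
t = np.linspace(-1, 1, N)
shifts_x = 8.0 * t ** 2                # parabolic drift in x
shifts_y = rng.uniform(-0.5, 0.5, N)   # random sub-pixel drift in y
stack = np.stack([subpixel_shift(annulus(), sy, sx) + rng.normal(0, A, (64, 64))
                  for sy, sx in zip(shifts_y, shifts_x)])
```

Varying `A` then sweeps the SNR, exactly as done for the error maps of Fig. 8.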
The resulting maximum error in the found shift compared to the original, 'true' shift, as well as the resulting mean error for different values of A and σ is shown in Fig. 8, separately for the x and y directions.These results verify that, at least for synthetic datasets, the algorithm achieves sub-pixel accuracy, with the mean absolute error in pixels of about 0.1 times the relative noise amplitude A for the optimal value of smoothing σ, and the maximum absolute error just reaching 0.5 pixel for the extreme value of A = 2.As expected, the error is strictly increasing for decreasing SNR, i.e. increasing A. After an initial cutoff, visible in saturated yellow in Fig. 8, the accuracy of the algorithm is also generally decreasing for increasing smoothing width σ.However, after this initial cutoff, there is a comfortably large range of σ where the algorithm performs well.
We found that most features visible in the high-σ, high-A regime of Fig. 8(b,d) depend on the initialization of the random generator for the added pixel-wise noise and are thus not significant (cf. a second run in Supplementary Fig. A.13).

Time complexity
To benchmark the computational complexity of the algorithm, it was applied to subsequently larger parts of the real dataset, while measuring the computation times for the least squares optimization (step 8 above) and the shifting and writing of images (step 9) separately. The results show that calculating the cross-correlations takes the most time, as it scales almost perfectly quadratically in the number of images N, as shown in Fig. 9. The shifting and writing of images scales linearly and is not significant for larger datasets. The total time therefore scales nearly perfectly quadratically, with a dataset of 500 images drift corrected in less than 7 minutes of computation time (exact details of the hardware and software used for benchmarking can be found in the Supplemental Information). As such, LEEM spectroscopy datasets can be comfortably and regularly drift corrected on a desktop PC.

Figure 9: Run times of different phases of the drift correction algorithm on 256 × 256 pixel images. Linear (green) and quadratic (red) slopes are added as a guide to the eye, illustrating that the cross-correlations scale quadratically in the number of images N, while the shifting and saving of images itself is linear.

Discussion
We elaborate here on the choices made in the algorithm. The use of the magnitude Sobel filter has multiple benefits, similar to using the gradient cross-correlation: contrast inversions between areas with different spectroscopic properties nonetheless result in similar filtered images (cf. Fig. 5(c,d)). In addition, the constant zero background reduces errors from wrap-around effects caused by performing the calculation in the Fourier domain.
The exponent 4 for the weighting matrix W_ij in the least squares minimization (step 8) was empirically found to give the best results for real datasets.

As already noted by Schaffer et al., the use of cross-correlation between multiple image pairs and combining the returned integer shift values enables sub-pixel accuracy. The maximum theoretical accuracy is 1/N pixels, but it is reduced when image pairs with false positive matches are thresholded out.
Alternative methods of obtaining sub-pixel accuracy in determining shifts include a combination of upscaling and matrix-multiplication discrete cosine transforms [34] and a rather elegant interpolation of the phase cross-correlation method proposed by Foroosh et al. [30]. However, our current method is less complex and combines robustness against global drift with handling of changing contrasts, which is crucial for spectroscopic LEEM data. Although the sub-pixel precise phase correlation method seems a straightforward extension of regular cross-correlation, it is less suitable for datasets with changing noise levels and not suitable for false positive detection by normalization, both properties we found essential to handle spectroscopic LEEM data.
Contrary to Schaffer et al., we found that smaller values of the Gaussian smoothing width σ yield the best results, with larger values yielding artificial shifts around contrast inversions for real datasets and generally performing worse for the synthetic dataset, as visible in Fig. 8.
Schaffer et al. found their approach at the time (2004) not computationally feasible for large numbers of images but, as shown in the previous section, the current implementation is able to drift correct a stack of several hundred images comfortably on a single desktop computer. We want to emphasize that the use of Python gives flexibility and makes it easy to adapt the code. For instance, increasing performance even more lies within reach by performing the FFT cross-correlations and maximum search on one or more graphical processing units using one of several libraries, or by using a cluster running a dask scheduler. Further speedup would be possible by pruning which pairs of images are to be cross-correlated. An avenue not explored here is the use of pyramid methods to create a multi-step routine, where first a fast estimate of the shift is computed on a smoothened and reduced-size image before using consecutively larger images to refine the estimate [38,39].
Beyond drift correction, the same method presented here can also be applied to create precisely stitched overview images of areas much larger than the electron beam spot size. However, as no contrast inversions or large feature differences are expected for the matching areas, the added value of using a gradient filter is nullified. Additionally, the number of images that can be matched to the same template is limited, forcing a low upper bound on the sub-pixel accuracy of the optimization part of the algorithm. Instead, we found that an algorithm based on more regular phase-shift cross-correlation is sufficient for sub-pixel accurate stitching.

Dimension reduction
The sub-pixel accuracy drift correction now makes it possible to reinterpret a LEEM-I(V) dataset as a truly per-pixel set of spectroscopic curves, opening up possibilities for further data analysis. For a dataset of N images, each such curve (cf. Fig. 1) can be seen as a vector in an N-dimensional vector space of the mathematically possible spectra. Even for moderate datasets of a few hundred images this is a huge vector space. In almost all cases, however, the physical behavior of the data can be described by a model with far fewer degrees of freedom, i.e. the vector space of physically possible spectra has a much lower dimension. Therefore, it should be possible to summarize all significant behavior in a much smaller dataset, which can be analyzed (and visualized) much more easily.
Here, we use Principal Component Analysis (PCA), a linear technique based on singular value decomposition (SVD), often used for dimension reduction in data science [40,41,42]. The randomized iterative variant of PCA allows for efficient computation of the largest variance components without performing the full SVD [43]. This technique projects the spectroscopic data onto a lower dimensional subspace in such a way that maximal data variance is retained. It does so in a computationally efficient way, making it well suited for, and popular in, data science. Before applying PCA, we crop the dataset to remove any areas that lie outside of the detector for any image inside the used range of E_0. Additionally, each image is scaled to zero mean and unit variance, so that brighter images, having larger variance, do not contribute more strongly to the analysis. Many other choices for standardization of the data are possible, most of them yielding useful results, but for the scope of this paper we adhere to this standard choice.
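A toy version of this reduction, using a plain SVD in place of the randomized PCA solver and a synthetic two-material dataset of our own invention:

```python
import numpy as np

rng = np.random.default_rng(6)
n_pixels, n_energies = 500, 300

# Synthetic (pixels x energies) data: two "materials" with distinct spectra
spectra = rng.random((2, n_energies))
labels = rng.integers(0, 2, n_pixels)
data = spectra[labels] + 0.05 * rng.normal(size=(n_pixels, n_energies))

# Standardize each image (energy) to zero mean and unit variance
data = (data - data.mean(axis=0)) / data.std(axis=0)

# PCA via SVD: rows of vt are the 'eigen-spectra'; projecting the data onto
# them (the per-pixel inner product) gives the reduced coordinates
u, s, vt = np.linalg.svd(data, full_matrices=False)
coords = u[:, :6] * s[:6]
explained = (s[:6] ** 2).sum() / (s ** 2).sum()
```

For this two-material toy set, far fewer than 6 components already capture most of the variance; on real spectroscopic LEEM data the scree plot of Fig. 10(a) plays the role of `explained` here.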
After performing PCA, the lower dimensional subspace or PCA-space, is now spanned by orthogonal 'eigen-spectra', referred to as PCA components.Since the projection map onto this subspace retains most of the variance in the dataset, it is possible to build an approximate reconstruction of the full physical spectra from the reduced PCA spectra.
For spectroscopic LEEM datasets, we find that reducing down to 6 dimensions is often enough to capture more than 90% of all variance. This is shown in a so-called scree plot in Fig. 10(a) for the sample dataset of dark-field images of N = 300 energies. The dataset can be projected onto a single PCA component by taking the per-pixel inner product with the corresponding 'eigen-spectrum'. This yields images visualizing the variance retained by the respective components, as shown in the top half of Fig. 10(b) for all 6 PCA components. Below each image, the spectra corresponding to the pixels with the minimum and maximum value of this projection are shown in black and color, respectively.

Visualization
Reducing a spectrum from hundreds of dimensions to a few opens up new opportunities for data visualization. In particular, it allows for the visualization of nearly all of the variation in spectra of an entire dataset in only two images, as shown in Fig. 11. Here, the values of the six principal components (cf. Fig. 10) are displayed as the RGB color channels of two pictures per dataset. To lift the degeneracy in the sign of the PCA components (a PCA eigen-spectrum with opposite sign retains just as much variance), we choose the signs such that a positive projection onto a PCA component corresponds to being brighter in the majority of the images. This way, areas that are bright in the majority of the original images also appear bright in the visualization. To compensate for the human eye's preference for green, a scaling of the colors as proposed by Kovesi is applied, using the matrix specified in Ref. [44]. Note that the dark-field dataset represents a challenging scenario due to its extreme off-axis alignment (see [2] for full details), which causes strong image drift and relative shifts of features.

Clustering and automatic classification
In addition to the visualization possibilities explored in the previous section, the dimension reduction by PCA lowers the complexity of the data enough to enable the use of other, more quantitative data analysis techniques. In particular, reduction to fewer than ten dimensions is enough to perform unsupervised classification, or clustering, on the entire dataset. Here, we show that a relatively simple clustering algorithm, the classical k-means, also known as Lloyd's algorithm [45], applied to the PCA-reduced dataset, can already be used to distinguish the relevant, different areas. The structure of the bright-field dataset is visualized in terms of the first three PCA components in Fig. 12(a), both as a point cloud with colors corresponding to Fig. 11(a) and as density projections (gray scale) on the three planes. The resulting classification from the application of k-means to the six PCA components is visualized in the same way in Fig. 12(b), where the color of the points now corresponds to the assigned labels. These same label colors are shown in real space in Fig. 12(c). In the real-space visualization it is clear that the different layer counts (bilayer, trilayer and four-layer as purple, orange and red, respectively) are separated from a class containing the point defects and step edges (blue) and a class containing the domain boundaries in the bilayer (green). The cluster labels can now be used to calculate spectra of each area, e.g., all trilayer pixels without the defects. For this, we take the mean over all pixels belonging to one cluster for each energy. This can be done even for energies outside the range used for the initial clustering, as well as for energies where we only have partial data due to drift. The resulting spectra for the clustering in Fig. 12(c) are plotted in Fig. 12(d).
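The clustering step and the per-cluster mean spectra can be sketched as follows with scikit-learn's k-means. The cluster count and all array names are illustrative assumptions, and random arrays again stand in for a real dataset.

```python
# Sketch of k-means on the PCA-reduced data; cluster count and names are
# illustrative assumptions, and random arrays stand in for the real dataset.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_pixels, n_energies, n_clusters = 4096, 300, 5
X = rng.random((n_pixels, n_energies))     # one spectrum per pixel
scores = rng.random((n_pixels, 6))         # PCA projections of those spectra

# Lloyd's algorithm [45] on the low-dimensional representation.
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(scores)

# Mean spectrum per cluster, e.g. "all trilayer pixels without the defects".
# With partial data due to drift, np.nanmean over unmasked pixels would be used.
mean_spectra = np.array([X[labels == k].mean(axis=0) for k in range(n_clusters)])
```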
The same clustering method is applied to only the first 4 PCA components of the dark-field dataset, since components 5 and 6 show virtually no distinguishing features and correspond to very little variance [cf. Fig. A.14(a,b)]. The resulting real-space labeling of the clusters and the spectra are shown in Fig. 12(e) and (f), respectively. Here, although not perfect, the clustering algorithm manages to mostly separate the different possible stacking orders (red and orange for the bilayer and purple, brown, pink and blue for the trilayer). The green and gray areas correspond to areas where clear classification as a stacking order is not possible due to phase contrast and non-uniform illumination artifacts caused by the tilted illumination.
Thanks to the proper calibration and mutual registration of the data, this relatively simple algorithm classifies the areas in the dark-field dataset with only minor errors (e.g., the incorrect assignment of the brown trilayer spectrum in the lower left), without any input of the positions of each spectrum in the image or any input about the expected differences between spectra. We anticipate that this classification using unsupervised machine learning will be useful for identifying unknown spectra in new datasets.

Conclusion
We have shown that treating (energy-dependent) LEEM measurements as multi-dimensional datasets rather than as collections of images opens rich opportunities for detailed and quantitative insights into complex material systems that go well beyond morphological and crystallographic characterization.
Three key steps are necessary to convert a stack of raw LEEM images into a spectroscopic dataset with a greatly increased body of quantitative information. First, we compensate for common detector artifacts such as camera dark count and non-uniform detector gain, which is crucial to quantitatively interpret LEEM images. Second, by calibrating the channel plate gain and adjusting it during spectroscopic measurements, we can not only extend the dynamic range of the dataset by two orders of magnitude, but also convert image intensity into absolute reflectivity or electron intensity (provided the beam current is accurately measured). Third, we describe a drift-correction algorithm that is tailored for spectroscopic LEEM datasets where contrast inversions make many other approaches unfeasible. It relies on digital filtering and cross-correlation of every image with all other images and, without requiring large computation times, yields sub-pixel accuracy. It thus produces spectral LEEM data with high spatial resolution, i.e., true pixel-wise spectra.
This suite of techniques is already in regular use to obtain data from the ESCHER system in Leiden [9,11,10,5,46,2]. In addition, the resulting spectral datasets enable more sophisticated data classification and visualization methods that rely on the spectrum (I(V)-curve) in every pixel. We demonstrate how we can use dimension reduction on the spectra to automatically compose images from only the six strongest spectroscopic features (PCA components). This approach produces rich color images that capture most of the features of the dataset and can thus give an intuitive view of complex material systems. Furthermore, we show that a relatively simple cluster analysis on those datasets of reduced dimensionality yields a quantitative representation of this information. Different materials within a field of view are automatically identified, and statistical information such as the mean spectra and their spread per material can be extracted.
Treating LEEM measurements as multi-dimensional datasets as presented here will further strengthen the role of LEEM as a quantitative spectroscopic tool rather than as a pure imaging instrument, thus deepening its impact in the research and discovery of novel material systems. Furthermore, the presented techniques can be applied to related spectroscopic imaging techniques, such as energy-filtered PEEM [47], or even adapted for use in scanning probe techniques such as scanning tunneling spectroscopy [41,48]. To facilitate the use of the approaches discussed here, the test data as well as the Python code is available online [? ?].

Figure 1:
Figure 1: (a) A stack of raw LEEM images where images are shifted with respect to each other due to experimental drift. (b) Drift correction aligns features in the images to compensate. (c) Spectra corresponding to pixels indicated in (b).

Figure 2:
Figure 2: (a) Dark count image taken on ESCHER, averaged over 19 × 16 images with 250 ms exposure time. (b) Flat-field image with visible edges of the round microchannel plate and damaged areas (arrow). (c) Uncorrected bright-field LEEM image from the sample dataset. The field of view corresponds to 3.5 µm. (d) Dark-count- and flat-field-corrected version of the image in (c). (e) Line cut through the raw image (c) and the corrected image (d), shown in red and green, respectively. Note that the dip due to MCP damage at y = 140 (arrow) is removed and the profiles for similar areas are flattened.

Figure 3:
Figure 3: (a) Calibration curves as measured with 16 × 250 ms exposure time per image, measured on graphene on SiC. (b) Calibration curves corrected for dark count. (c) Calibration curves with matched intensity, normalized by a joint curve fit of Eq. (3), and the resulting best fit (black line). (d) Residuals of the joint fit in (c).

Figure 4:
Figure 4: Regular spectroscopic reflectivity curve (orange) of bilayer graphene on SiC, corrected for dark count and flat field, but with a single setting of VMCP (top panel). The HDR measurement of the same area with active MCP bias tuning (blue) can resolve details down to lower intensity.

Figure 5:
Figure 5: (a,b) Two bright-field LEEM images of few-layer graphene obtained at different E0. (c,d) Their Gaussian- and Sobel-filtered versions with a Gaussian standard deviation of 3 pixels highlight the edges and erase the contrast inversion. (e) The cross-correlation of the filtered images exhibits a clear maximum. Its position compared to zero (white lines) corresponds to the relative shift of the images.

2. Apply Gaussian smoothing with a standard deviation of σ pixels to reduce Poissonian noise in the images.
3. Apply a (magnitude) Sobel filter to highlight edges only, as shown in Fig. 5(c) and (d). As such, images with inverted contrast (cf. Fig. 5(a) and (b)) become similar to each other.
4. Using Eq. (6), compute the cross-correlation, as shown in Fig. 5(e). Do this for all pairs (i, j) of images.
5. Compute the location (DX, DY)_ij and value W_ij of the maximum of the cross-correlation for all image pairs (i, j). DX_ij and DY_ij form the anti-symmetric matrices of found relative shifts in either direction, while W_ij is a symmetric matrix of weights of the found matches, as shown in Fig. 6.
6. Normalize the maximum values W_ij to be used as weights in step 8.
7. Pick a threshold W_min to remove any false-positive matches between images. A threshold of W_min = 0.15, based on DX, DY and W_ij, is shown in Fig. 6 as gray shading. Set W_ij = 0 for all W_ij < W_min.
8. To reduce the N² relative shifts DX to a length-N vector of horizontal shifts dx, minimize the errors (dx_i − dx_j − DX_ij) weighted by W_ij^4, using least squares. Do the same with DY to obtain the vertical shift vector dy.
9. Apply these found shifts dx and dy to the original detector-corrected images, interpolating (either bilinearly or via Fourier) for non-integer shifts.
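The core filtering, cross-correlation, and least-squares steps above can be sketched as follows. This is a simplified illustration, not the published implementation: Eq. (6) is replaced by a plain FFT-based cross-correlation, and the normalization of W_ij and the sub-pixel refinement are omitted for brevity.

```python
# Hedged sketch of the drift-correction core, assuming a stack of images.
import numpy as np
from scipy import ndimage

def edge_filter(img, sigma=3):
    """Steps 2-3: Gaussian smoothing followed by a magnitude Sobel filter."""
    smooth = ndimage.gaussian_filter(img.astype(float), sigma)
    return np.hypot(ndimage.sobel(smooth, axis=0), ndimage.sobel(smooth, axis=1))

def relative_shift(a, b):
    """Steps 4-5: cross-correlate two (filtered) images via FFT and locate the
    maximum; returns the integer shift of `b` relative to `a` and the peak value."""
    corr = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap the periodic shifts into a symmetric range around zero.
    if dy > corr.shape[0] // 2:
        dy -= corr.shape[0]
    if dx > corr.shape[1] // 2:
        dx -= corr.shape[1]
    return int(dy), int(dx), corr.max()

def solve_shifts(DX, W):
    """Step 8: reduce the N^2 relative shifts DX_ij to N absolute shifts dx_i by
    least squares, weighting each residual (dx_i - dx_j - DX_ij) by W_ij^4."""
    N = DX.shape[0]
    rows, cols = np.nonzero(W)
    A = np.zeros((len(rows), N))
    A[np.arange(len(rows)), rows] = 1.0
    A[np.arange(len(rows)), cols] = -1.0
    w = W[rows, cols] ** 2                 # sqrt of the W^4 weight on squared errors
    dx, *_ = np.linalg.lstsq(A * w[:, None], DX[rows, cols] * w, rcond=None)
    return dx
```

The least-squares system is rank-deficient by one (all shifts can move by a common offset); `lstsq` resolves this by returning the minimum-norm solution.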

Figure 6:
Figure 6: Calculated shift matrices DX and DY and weight matrix W_ij for the bright-field dataset. Matches with a weight below W_min = 0.15 (shaded in gray) are mostly false positives. Consequently, they are set to zero weight in the algorithm.

Figure 7: Figure 8:
Figure 7: (a) Image shifts in x and y used for the synthetic dataset. (b) Image 0 of the synthetic dataset with Gaussian noise with a standard deviation of A = 1.0. (c) Gaussian- and Sobel-filtered version of (b), with a Gaussian filter width of 5 pixels, highlighting edges.

Figure 10:
Figure 10: (a) Scree plot indicating the retained variance per PCA component for the dark-field dataset. (b) Images of the first six PCA components for the dark-field sample dataset and the spectra corresponding to the maximum and minimum of the respective components occurring in the dataset.

Figure 11:
Figure 11: The first six PCA components can be used to summarize a spectroscopic LEEM measurement in two RGB pictures. (a,b) Visualization of a spectroscopic bright-field LEEM measurement of quasi-freestanding few-layer graphene on SiC. Different layer counts, stacking boundaries of two types and point defects are distinguishable. (c,d) Visualization of a spectroscopic dark-field LEEM measurement of the same area. All six different possible stacking orders for up to trilayer graphene are easily distinguishable.

Figure 12:
Figure 12: (a) Bright-field dataset visualized as a point cloud in the space of the first three PCA components. The points are colored according to the mapping in Fig. 11(a) and are projected onto the planes in gray scale. (b) Point cloud as in subfigure (a), but colored according to the computed clustering. (c) Indication of the cluster labels in the real-space image. (d) Mean bright-field spectroscopy curves for each cluster, automatically recovering layer count and domain walls. (e,f) Same as subfigures (c,d) for the dark-field dataset, revealing all stacking orders present as well as two sets of edge-case curves. Cf. Fig. 2(c,d) of Ref. [2].

Figure A.13:
Figure A.13: Maximum and mean error in x and y shift as calculated by the N² algorithm, with a different random seed compared to Fig. 8. The optimal value of the Gaussian smoothing σ as a function of added noise amplitude A is drawn in white. Contour lines are added as a guide to the eye.

Figure A.14:
Figure A.14: (a) Images of the first six PCA components for the bright-field sample dataset and the spectra corresponding to the maximum and minimum of the respective components occurring in the dataset.