Undersampling Raster Scans in Spectromicroscopy for Reduced Dose and Faster Measurements

Combinations of spectroscopic and microscopic techniques are used across many disciplines of scientific research, including material science, chemistry and biology. X-ray spectromicroscopy, in particular, is a powerful tool for studying chemical state distributions at the micro and nano scales. With the beam fixed, a specimen is typically rastered through the probe with continuous motion, and a range of multimodal data is collected at fixed time intervals. The application of this technique is limited in some areas by long scanning times to collect the data, whether because of the area/volume under study or the compositional properties of the specimen, and by material degradation due to the dose absorbed during the measurement. In this work, we propose a novel approach for reducing the dose and scanning times by undersampling the raster data. This is achieved by skipping rows within scans and reconstructing the x-ray spectromicroscopic measurements using low-rank matrix completion. The new method is robust and allows for a 5-6x reduction in sampling. Results obtained on real experimental data are presented.


Introduction
X-ray spectromicroscopy combines x-ray spectroscopy and x-ray microscopy to map changes in chemical state across a specimen on the micro and nano scales. These techniques have been broadly applied to problems across the life and physical sciences, such as chemical engineering [1], material science [2], and biology [3]. A spectromicroscopy experiment involves measuring an $n_1 \times n_2$ grid of positions at $n_E$ energy levels across the absorption edge of an element of interest. The resulting $n_1 \times n_2$ spectra represent a big data challenge; to extract meaningful information, they are typically analyzed using PCA and cluster analysis to reduce them to a mapping of representative spectra, or to a low-rank representation of the data.
While the technique has been generally successful, the application of spectromicroscopy to in-situ studies and to soft matter or biological materials is limited by two main factors. First, the total experiment time required to collect the data to a given statistical significance; second, the total radiation dose over the collection and any resulting damage to the object, or changes to its chemical state, that may occur as a result. The issue of damage due to dose and long collection times occurs across both x-ray and electron optical systems.
To alleviate this issue, a variety of approaches to reduce the number of samples have been proposed. In electron tomography, compressed sensing schemes have been investigated to solve the missing wedge problem, or to reduce the number of angles used [4-6]. Random sampling or jittered row sampling has also been used with in-painting to reduce dose and experiment time [7,8]. In the x-ray regime, sparse studies are limited, but a low-rank matrix decomposition approach using PCA analysis of spectrotomography datasets has recently been demonstrated, which merges the angular and energy measurements to reduce the overall measurement time [9].
Low-rank matrix completion is a well-known inverse problem that has been widely used in sensor networks [10], computer vision [11] and medical imaging [12,13]; see e.g. [14] for a review of these and further inverse problems. The classical formulation of matrix completion [15] aims to recover the missing elements of a low-rank matrix from an incomplete set of known entries, each of which is sampled independently at random (for example, uniformly). However, there are important differences between x-ray and electron systems that make independent sampling unattractive for reducing the acquisition time in x-ray spectromicroscopy. An electron beam is rapidly moved across the specimen and can be blanked electrostatically at a high rate to control dose and recreate random patterns. In contrast, an x-ray beam is fixed, and the specimen is moved mechanically instead; moreover, mechanical chopping of the x-ray beam would be needed to recreate such sampling patterns. These mechanical operations in the x-ray regime limit the translation of some of these schemes from equivalent electron experiments.
The use of non-uniform sampling patterns in matrix completion has been widely studied, with the main goal of reducing the approximation error. Non-uniform sampling methods include adaptive cross interpolation [16], maximizing the volume of sampled submatrices [17], adaptive importance sampling using leverage scores [18] or application-oriented distributions [19], and supervised learning using a training dataset [13]. However, all of these methods still assume some arbitrary control over the sampling pattern, rather than conventional acquisition patterns such as a raster scan.
In this paper we propose a novel approach for undersampling and reconstructing low-rank x-ray spectromicroscopy data that uses the raster sampling pattern. Our motivation is to deliver a solution that can be deployed routinely at spectromicroscopy facilities by non-experts, which requires an approach where no intervention or tuning is needed to produce results. The method should also work with the standard raster acquisition approaches at these facilities; in particular, our initial experimental focus was on optimizing fast, low-dose in-situ experiments to study the evolution of battery materials over time. Scanning along a line can be carried out with a faster mechanical movement, hence the time per pixel can be reduced compared to independent or adaptive sampling. In addition, we can lower the x-ray dose on the specimen without compromising the recovery of missing entries: we develop a procedure to generate a robust raster pattern such that each row and column of the data matrix has at least one sampled element.
Besides the sampling pattern, the performance of matrix completion depends on the cost function. In addition to the squared misfit of the sampled elements, the cost function may include regularization terms, promoting sparsity [13] or smoothness [20]. Alternatively, the entire completion problem can be turned into a Bayesian inference problem by introducing a prior distribution on the low-rank matrix factors, treated as random matrices, and a likelihood of the observed data samples [21,22]. As a by-product, sparsity-promoting priors may provide an automatic selection of the rank as the number of nonzero posterior components [23]. However, the mathematical study of spectromicroscopy is still in its infancy and lacks well-recognised priors. The only regularization assumption that is generally valid is the low-rankness of the true absorption distribution. Therefore, we start with the simple Alternating Steepest Descent (ASD) algorithm [24] with the squared misfit cost only. For a reliable selection of the rank and number of samples, we propose an algorithm, LoopedASD, which successively increases the rank from 1 to some generous value (e.g. 20), taking the lower-rank result as the initial guess in each step. This provides smoother convergence, which allows the KNEEDLE algorithm [25] (already used in spectromicroscopy for PCA analysis) to determine the final rank accurately.
Five datasets were used in this study. The first three (labelled DS1, DS2, DS3) are full datasets that can be undersampled numerically after the experiment; the final two (labelled DS4, DS5) were measured using both full and sparse raster patterns to verify the experimental implementation. The sparse experimental measurements were conducted at a range of undersampling ratios (details on the different sampling approaches can be found in Section 3). The specimens were produced by mixing Fe$_2$O$_3$, Fe$_3$O$_4$ and FeO powders. The powders were ground in a ball mill and then drop cast onto a silicon nitride membrane. The dimensions of each dataset can be seen in Table 1. The pairs (DS1, DS2) and (DS4, DS5) are data derived from the same specimen but at different spatial dimensions, i.e. DS2 is DS1 focused on a smaller area, as DS4 is of DS5. We aim to illustrate results using all five datasets, but where space is restrictive we show only the three independent datasets: DS1, DS3, DS5. The paper is organised as follows. We begin in Section 2 with a description of the x-ray spectromicroscopy model, and derive its low-rank nature. Next, we discuss general matrix completion and raster-aware sampling, which leads into our proposed matrix completion algorithm, in Sections 3 & 4. Finally, in Sections 5 & 6, we discuss the results, comparing the reconstructed sparse data against the full data.

Low rank model
When the energy of a photon increases beyond the binding energy of a core electron, we see a sharp rise in a material's absorption: an absorption edge. The absorption coefficient, $\mu$, varies, or is modulated, by the local chemical environment. Measuring around the edge, x-ray absorption near edge spectroscopy (XANES) can be used as a fingerprinting tool to identify known standards or materials; investigation of energies past the absorption edge, the Extended X-ray Absorption Fine Structure (EXAFS), can be used to extract information on nearest neighbour bonding and coordination. Further details regarding the different experimental setups and the derivation of the following formulae can be found in [26].
The variation of $\mu$ with energy depends on how an element is chemically bonded, and when more than one chemical state is present the absorption coefficients sum linearly. For a mixture of $M$ materials, the measured x-ray absorption (or optical density), $D(E)$, can be modeled as

$$D(E) = \sum_{m=1}^{M} \mu_m(E)\, t_m, \qquad (1)$$

where $\mu_m(E)$ and $t_m$, $m = 1, \dots, M$, are the distinct absorption coefficients ($\mu$m$^{-1}$) and thicknesses ($\mu$m) of the $M$ materials, respectively. With a focused x-ray beam, this experiment can be performed over $n_1 \times n_2$ positions, and at $n_E$ distinct energies. By stacking the spatial scans, the distribution of x-ray absorption spectra can be represented as a 3D tensor,

$$\mathcal{D}_{e\, p_1 p_2} = \sum_{m=1}^{M} \mu_m(E_e)\, (t_m)_{p_1 p_2}, \qquad (2)$$

where $(t_m)_{p_1 p_2}$ is the thickness of material $m$ at pixel $(p_1, p_2)$, and $\mu_m(E_e)$ is the absorption coefficient of material $m$ at the $e$-th energy level. This dataset can be flattened by considering a matrix $A \in \mathbb{R}^{n_E \times n_P}$, such that

$$A_{e\, j} = \sum_{m=1}^{M} \mu_m(E_e)\, (t_m)_j + \epsilon, \qquad (3)$$

where $e = 1, \dots, n_E$, $j = 1, \dots, n_P = n_1 n_2$ indexes over all pixels, and $\epsilon \sim \mathcal{N}(0, \sigma^2)$ represents Gaussian noise with standard deviation $\sigma \in \mathbb{R}$. With a slight abuse of notation, consider the matrices $U \in \mathbb{R}^{n_E \times M}$, $U_{e\, m} = \mu_m(E_e)$, and $T \in \mathbb{R}^{M \times n_P}$; the columns of $U$ represent the absorption coefficients of each material within the specimen, and the rows of $T$ represent the thickness/presence of each material within each pixel. This allows us to write Eq. (3) as

$$A = UT + \epsilon. \qquad (4)$$

This is significant, as it illustrates that for the majority of experiments, where the specimen is made up of only a few materials or chemical states ($M$ is low), the corresponding spectromicroscopy data is inherently low rank, since $\mathrm{rank}(UT) = M$. With the addition of noise, $A$ is approximately low rank.
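The low-rank structure of the model can be illustrated with a small synthetic example. This is a sketch under assumed toy values (random spectra and thickness maps, not real Fe-edge data); the dimensions and noise level are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the model A = UT + noise: M = 3 materials over
# n_E energies and n_P pixels (values are illustrative, not real data).
n_E, n_P, M = 120, 400, 3
U = rng.random((n_E, M))      # columns: absorption spectra of each material
T = rng.random((M, n_P))      # rows: thickness maps of each material
A_clean = U @ T               # noiseless optical densities
A = A_clean + 1e-3 * rng.standard_normal((n_E, n_P))   # measurement noise

# The noiseless data has rank M exactly; with noise, A is only
# approximately low rank, visible as a sharp drop after the M-th
# singular value.
print(np.linalg.matrix_rank(A_clean))    # 3
s = np.linalg.svd(A, compute_uv=False)
print(s[M - 1] / s[M] > 100)             # large spectral gap: True
```

The spectral gap after the $M$-th singular value is what the rank-selection step described below relies on.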
To analyse the spatial distribution of the absorption spectra, standard techniques are used to filter and decompose $A$ back to a smaller set of representative spectra; see, for instance, [27]. First, PCA is applied to reduce the noise in the data by producing a rank-$r$ approximation from the most significant components. We compute

$$A \approx A^{(r)} = U' V', \qquad (5)$$

with $U' \in \mathbb{R}^{n_E \times r}$ and $V' \in \mathbb{R}^{r \times n_P}$. The rank $r$ is chosen to capture as much variation in the data as possible with the smallest rank, and should approximate $M$: it is typically set to be the elbow point (point of maximum curvature) of the singular values of $A$, and can be selected automatically using algorithms like KNEEDLE [25].
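Elbow selection of the rank can be sketched as follows. This is a simplified stand-in for KNEEDLE, not the paper's exact procedure: it picks the point of the singular-value curve farthest from the chord joining its endpoints, a common elbow heuristic; the function name and test values are ours for illustration.

```python
import numpy as np

def elbow_rank(singular_values):
    """Pick the elbow of a singular-value curve as the point farthest from
    the chord joining its endpoints -- a simplified stand-in for the
    KNEEDLE algorithm cited in the text."""
    s = np.asarray(singular_values, dtype=float)
    n = len(s)
    x = np.arange(n, dtype=float)
    dx, dy = n - 1.0, s[-1] - s[0]
    # Perpendicular distance of each point (x_i, s_i) to the chord.
    d = np.abs(dx * (s - s[0]) - dy * x) / np.hypot(dx, dy)
    return int(np.argmax(d))   # index where the noise floor begins

# Four dominant singular values followed by a flat noise floor.
s = np.concatenate([[100.0, 40.0, 15.0, 6.0], 0.1 * np.ones(26)])
print(elbow_rank(s))   # 4
```

On this toy spectrum the heuristic returns 4, the number of dominant components before the noise floor.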
The decomposition in Eq. (5) is similar to the model in Eq. (3); however, the columns of $U'$ are abstract spectra, linear combinations of the true spectra with no physical interpretation. Similarly, the rows of $V'$ are the corresponding abstract thickness maps. Pixels are then clustered together based on the similarity of the normalised mixing factors of the PCA components. This is achieved by clustering the columns of $V'$ using standard clustering algorithms such as kmeans [28] and lvq [29]. Taking the mean spectrum from the columns of $A$ for each cluster increases the signal-to-noise ratio compared to the measurements from an individual pixel, and produces accurate x-ray absorption spectra for the dominant material in each cluster.
In [27], it is noted that the first principal component from the PCA (the component with the greatest variation in the data) often describes the average x-ray absorption data across the whole specimen. In some cases the first principal component is discarded, or scaled down, and cluster analysis is applied only to the remaining components. This is done to emphasise the more subtle features and variations in the data and to ensure the clustering results are determined by differences in materials, not the thickness of the specimen. In this paper, we refer to the process of discarding the first principal component as Reduction of Thickness Effect (RTE), which is achieved by simply removing the first row from $V'$ in Eq. (5) before applying kmeans to its columns. Generally, the comparisons between reconstructed and full data are worse when RTE is used, and we have applied it to several tests to provide worst-case results. Any cluster results that use RTE will be noted clearly. Further details on PCA, cluster analysis and RTE can be found in the supplemental document.
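The truncated-SVD factorisation and the RTE row-drop can be sketched in a few lines. The data here is a random stand-in for a flattened spectromicroscopy matrix; the clustering step itself (kmeans on the columns of the reduced factor) is omitted, and we only check the shapes and the monotone improvement of the rank-$r$ truncation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a flattened spectromicroscopy matrix.
n_E, n_P, r = 50, 100, 3
A = rng.random((n_E, n_P))
U_svd, s, Vt = np.linalg.svd(A, full_matrices=False)
Uprime = U_svd[:, :r] * s[:r]     # abstract spectra (columns), Eq. (5)
Vprime = Vt[:r, :]                # abstract thickness maps (rows)

# Reduction of Thickness Effect (RTE): drop the first principal component
# (often dominated by overall thickness) before clustering the pixels,
# i.e. remove the first row of V' and cluster its remaining columns.
Vprime_rte = Vprime[1:, :]
print(Vprime.shape, Vprime_rte.shape)    # (3, 100) (2, 100)

def rel_err(k):
    """Relative error of the rank-k truncated SVD of A."""
    A_k = (U_svd[:, :k] * s[:k]) @ Vt[:k, :]
    return np.linalg.norm(A - A_k) / np.linalg.norm(A)

print(rel_err(3) > rel_err(10))          # truncation error shrinks with rank: True
```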
Figure 1 shows a schematic of the sparse spectromicroscopy process. In our proposed scheme, we measure only a small proportion of the data, and recover the missing entries later using low-rank matrix completion. The remaining steps are consistent with a standard x-ray spectromicroscopy scheme. Precise details on the sampling are discussed in Section 3, and the completion methods are described in Section 4. The image illustrates the data acquisition, the formatting and flattening of the data tensor, the reconstruction of the sparse entries, and the PCA and cluster analysis.

Fig. 1. Schematic of sparse spectromicroscopy using DS2. The original specimen contains a mixture of two materials: FeO in the blue region and Fe$_2$O$_3$ in the red region, over the background (green region). The experimental setup is as follows: (A) a third/fourth generation synchrotron light source produces a beam of energy $E_e$. (B) The intensity of the incident beam is measured. (C) The light is focused using an optic such as a zone plate or Kirkpatrick-Baez mirror to produce a micro- or nanoscale beam. (D) The specimen is mounted on the stage, (E) which is able to move the specimen in the XY plane to measure at different positions; the specimen can also be moved in the Z direction to bring the sample to the focus of the beam. Depending on the material and its thickness, we measure either the transmitted flux (F) or the fluoresced flux (G) by counting photons using different detectors (H). The specimen is moved in a raster pattern; for sparse experiments we sample only a subset of the $n_1$ rows for each energy. The processed results for each energy are stacked as seen in the diagram. The tensor is flattened by concatenating slices of the sparse data. The missing entries are recovered using LoopedASD, our low-rank matrix completion algorithm. PCA is applied to the completed data to produce the most significant components in the data: the eigenspectra and (reshaped) eigenimages seen in the schematic. The results of the cluster analysis with 5 clusters are similar to the original specimen, with the locations of the two materials identified and accurate absorption coefficients for each.

Raster-aware sampling patterns
To formulate the sampling and reconstruction of x-ray spectromicroscopy data, we set out the following notation. Let $\Omega \subset \{1, \dots, n_E\} \times \{1, \dots, n_P\}$ be the set of known measurements, called the sampling pattern. For a matrix $X \in \mathbb{R}^{n_E \times n_P}$, we define the sampling operator $\mathcal{P}_\Omega$ as

$$[\mathcal{P}_\Omega(X)]_{ij} = \begin{cases} X_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \qquad (6)$$

Alternatively, the sampling pattern can be thought of as a binary matrix $\Omega \in \{0, 1\}^{n_E \times n_P}$ with 1s in the locations of the known entries and 0s everywhere else. It is then easy to compute $\mathcal{P}_\Omega(X) = \Omega \circ X$, where $\circ$ is the Hadamard product. The key parameter for matrix completion is the undersampling ratio, $p$, the proportion of known entries:

$$p = \frac{|\Omega|}{n_E\, n_P}. \qquad (7)$$

The standard matrix completion problem involves computing a matrix of minimal rank such that the known entries, indexed by $\Omega$, are equal to the given values. However, this problem is notoriously difficult, and many algorithms instead seek to solve easier, but provably related, problems; see [30-33].
For this application, the presence of noise means the spectromicroscopy datasets are only approximately low rank, i.e. there is no exact low-rank matrix that would perfectly match the subset of known entries $\mathcal{P}_\Omega(A)$. A more useful and efficient approach to reconstruction is to fix the rank, $r$, and then solve an optimisation problem to find the best rank-$r$ approximation to the known entries. Thus, to reconstruct the sparse set of measurements, $\mathcal{P}_\Omega(A)$, we seek to solve

$$\min_{X :\, \mathrm{rank}(X) = r} \|\mathcal{P}_\Omega(X) - \mathcal{P}_\Omega(A)\|_F. \qquad (8)$$

Here we use the Frobenius norm, $\|X\|_F = \big(\sum_{i,j} X_{ij}^2\big)^{1/2}$. This approach is generally more effective (and far more computationally efficient) than attempting to solve the standard problem; we must, however, correctly input the completion rank $r$, since completion algorithms work best when $r$ is close to the approximate rank of the full dataset. More details on accurately setting $r$ can be found in the supplementary material.
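The sampling operator and undersampling ratio are straightforward to express with a binary mask and a Hadamard product. A minimal sketch, with assumed toy dimensions and a Bernoulli mask for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

n_E, n_P = 40, 60
A = rng.random((n_E, 2)) @ rng.random((2, n_P))    # exactly rank-2 test data

# Sampling pattern as a binary matrix: P_Omega(X) = Omega * X (Hadamard).
Omega = (rng.random((n_E, n_P)) < 0.3).astype(float)

def P(X):
    """Sampling operator: zero out the unknown entries of X."""
    return Omega * X

p = Omega.sum() / (n_E * n_P)     # undersampling ratio (proportion known)
print(0.2 < p < 0.4)              # concentrates near 0.3: True

def misfit(X):
    """Frobenius misfit against the known entries, as in Eq. (8)."""
    return np.linalg.norm(P(X) - P(A))

print(misfit(A) == 0.0)           # the true matrix fits exactly: True
```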

Setting the sampling pattern
In many applications of low-rank matrix completion, it is impossible to predict which entries will be known. To model this, the known entries of the sampling pattern are typically set at random. Usually, Bernoulli sampling is used, where each entry is sampled i.i.d. (independently and identically distributed) with probability $p$. Unlike other settings, in x-ray spectromicroscopy we have complete control over the data acquisition and can set the scanner to implement specific patterns. However, when formulating the sampling selection, we must consider the physical restrictions of the experiment. For x-ray spectromicroscopy, the specimen is on an XY stage and moves at constant velocity through the beam in a raster pattern: scanning across each row in turn before switching to the next energy level and repeating the spatial scan. To maximise the efficiency of the modelled sparse scans, we must ensure the known entries of the sampling pattern are collected together into spatial rows to be scanned. The scanner can then move between known entries, quickly passing over the empty rows.
To model the raster-aligned patterns, we introduce Raster sampling, where spatial rows of the specimen are sampled i.i.d. with probability $p$. Illustrations of both Bernoulli and Raster sampling patterns on flattened datasets can be found in Figure 2, in which the known entries are highlighted in yellow. Notice that the Raster sampling pattern groups the known entries into the spatial rows of the specimen, which appear as short segments after the data is flattened.
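The two patterns can be generated as masks over the flattened data. A sketch with assumed toy dimensions: a sampled spatial row becomes a contiguous segment of $n_2$ entries in the flattened matrix, which is the "short segments" structure visible in Figure 2.

```python
import numpy as np

rng = np.random.default_rng(3)
n_E, n1, n2 = 5, 8, 10        # energies, spatial rows, spatial columns
n_P = n1 * n2

def bernoulli_mask(p):
    """Each entry of the flattened n_E x n_P data sampled i.i.d. with prob. p."""
    return rng.random((n_E, n_P)) < p

def raster_mask(p):
    """Each spatial row sampled i.i.d. with probability p; a sampled row
    contributes a contiguous segment of n2 entries in the flattened data."""
    rows = rng.random((n_E, n1)) < p       # which spatial rows to scan
    return np.repeat(rows, n2, axis=1)     # expand each row to its n2 pixels

B = bernoulli_mask(0.3)
print(B.shape == (n_E, n_P))               # True

R = raster_mask(0.3)
# Every segment is all-or-nothing: reshaping back to (n_E, n1, n2), each
# spatial row is either fully sampled or fully skipped.
seg = R.reshape(n_E, n1, n2)
print(np.all(seg.all(axis=2) | (~seg).all(axis=2)))   # True
```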
Data that has been sampled with a Bernoulli pattern generally allows completion at lower undersampling ratios, because the known entries are spread more evenly. In practice, undersampling using Bernoulli sampling is possible, but it is only preferable under certain circumstances. Sampling patterns like these can be implemented for x-ray spectromicroscopy in two ways: either the specimen is moved through every position while rapidly blanking the beam, or the specimen is positioned only at the location of each known entry using stop-start motion. The former is difficult to implement and does not reduce the experiment time, although it may be useful for dose reduction. Conversely, despite allowing lower undersampling ratios, stop-start motion is generally less efficient than the continuous motion used for Raster sampling, resulting in longer experiments. Indeed, for typical dwell times of around 0.01 s, the time spent accelerating, decelerating and stabilising the scanner outweighs the time saved by scanning fewer entries. Despite this, Bernoulli sampling may be applicable for specimens that require longer dwell times (around 1 s), since reducing the number of known entries then has a greater impact on the experiment time. Developing methods for such specimens is left for future research.
One issue with Raster sampling is that the grouping of known entries increases the probability that certain rows are not scanned at all, especially for low undersampling ratios. Clearly, it is impossible to recover a row or column of the data matrix $A$ (distinct from the spatial rows and columns of the specimen) that contains no known entries, so we must ensure that every spatial row is measured at least once. By testing different sampling patterns, we have also seen that low-rank completion is more reliable when the known entries are spread more evenly across the data.
To promote the spread of data, and to ensure every row and column of $A$ contains known entries, we have developed Robust Raster sampling. This is a variation of Raster sampling that ensures every row of the specimen is sampled exactly once before any row can be sampled a second time. Further details on Robust Raster sampling can be found in the supplementary document.
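One way to realise this "once before twice" property is to draw the scan order from successive random permutations of the rows. This is our sketch of the idea, not the paper's exact construction (which is in its supplement); the function name and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def robust_raster_sequence(n_rows, n_samples, rng):
    """Sketch of a Robust Raster row order: rows are drawn from successive
    random permutations, so every spatial row is sampled exactly once
    before any row is sampled a second time."""
    draws = []
    while len(draws) < n_samples:
        draws.extend(rng.permutation(n_rows).tolist())   # one full pass
    return draws[:n_samples]

seq = robust_raster_sequence(n_rows=16, n_samples=40, rng=rng)

# Defining property: at every point in the scan, the per-row visit counts
# differ by at most one -- no row is revisited before all rows are visited.
counts = np.zeros(16, dtype=int)
ok = True
for row in seq:
    counts[row] += 1
    ok = ok and (counts.max() - counts.min() <= 1)
print(ok)   # True
```

Assigning the drawn rows to energy levels (so that each energy receives a share of the scanning budget) is an implementation detail omitted here.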

Completion algorithm
To reconstruct undersampled data, we have developed a low-rank matrix completion algorithm, LoopedASD. This is a rank-incremental algorithm based on Alternating Steepest Descent (ASD) [24]; it uses the output of one ASD completion as the input for the next higher-rank completion. After testing several different options [31-33], it was found that this algorithm provided more accurate and reliable results on both simulated data and raster-sampled spectromicroscopy data.
ASD is an iterative method that fixes the rank, $r$, of its iterates by imposing the decomposition

$$X = WH, \qquad W \in \mathbb{R}^{n_E \times r}, \quad H \in \mathbb{R}^{r \times n_P}. \qquad (9)$$

Thus, the optimisation problem in Eq. (8) now becomes

$$\min_{W, H} f(W, H), \qquad f(W, H) = \tfrac{1}{2}\, \|\mathcal{P}_\Omega(A) - \mathcal{P}_\Omega(WH)\|_F^2. \qquad (10)$$

We minimise the function $f$ by alternately optimising the factors $W$ and $H$ using steepest descent with exact step sizes. Indeed, by fixing one factor, the gradient of $f$ with respect to the other can be computed easily; it is then possible to compute analytically the exact step size required to minimise $f$ along the gradient direction. Once the factor has been updated, we alternate the fixed factor and repeat the process, iterating until suitable stopping conditions are satisfied. Let $f(W, H)$ be written as $f_H(W)$ for fixed $H$ and $f_W(H)$ for fixed $W$; the corresponding gradients are written $\nabla f_H(W)$ and $\nabla f_W(H)$, and the exact step sizes are written $t_W$, $t_H$. Beginning with random matrices $W_0 \in \mathbb{R}^{n_E \times r}$ and $H_0 \in \mathbb{R}^{r \times n_P}$, we implement ASD as follows: at iteration $i$, update $W_{i+1} = W_i - t_W \nabla f_{H_i}(W_i)$, then $H_{i+1} = H_i - t_H \nabla f_{W_{i+1}}(H_i)$. The gradients and step sizes are given below; full derivations can be found in the supplemental documents:

$$\nabla f_H(W) = -\big(\mathcal{P}_\Omega(A) - \mathcal{P}_\Omega(WH)\big)\, H^T, \qquad (11)$$

$$\nabla f_W(H) = -W^T \big(\mathcal{P}_\Omega(A) - \mathcal{P}_\Omega(WH)\big), \qquad (12)$$

$$t_W = \frac{\|\nabla f_H(W)\|_F^2}{\|\mathcal{P}_\Omega(\nabla f_H(W)\, H)\|_F^2}, \qquad (13)$$

$$t_H = \frac{\|\nabla f_W(H)\|_F^2}{\|\mathcal{P}_\Omega(W\, \nabla f_W(H))\|_F^2}. \qquad (14)$$

One advantage of ASD is that the residual $\mathcal{P}_\Omega(A) - \mathcal{P}_\Omega(WH)$ can easily be updated between iterations, removing the need to compute the matrix product $WH$ at each iteration; thus, the per-iteration cost of ASD has leading order $8|\Omega|r$ (see [24]). A maximum number of iterations was set as a stopping condition, as well as a tolerance on the relative norm of the residual at each iteration,

$$\frac{\|\mathcal{P}_\Omega(A) - \mathcal{P}_\Omega(W_i H_i)\|_F}{\|\mathcal{P}_\Omega(A)\|_F} < \mathrm{tol}. \qquad (15)$$

Once either stopping condition is satisfied, the two factors output by the algorithm are denoted $W^*$ and $H^*$.
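The alternating updates can be sketched compactly in numpy. This is a naive reading of ASD [24], not the paper's implementation: for clarity the residual is recomputed every sweep rather than updated incrementally, so it does not achieve the stated per-iteration cost; the toy problem and seeds are ours.

```python
import numpy as np

def asd(PA, Omega, r, iters=500, tol=1e-8, rng=None):
    """Alternating Steepest Descent (naive sketch of ASD [24]).
    PA: sampled data P_Omega(A), zeros at unknown entries;
    Omega: boolean mask of known entries; r: completion rank."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_E, n_P = PA.shape
    W = rng.standard_normal((n_E, r))      # W_0
    H = rng.standard_normal((r, n_P))      # H_0
    nPA = np.linalg.norm(PA)
    for _ in range(iters):
        R = PA - Omega * (W @ H)           # residual on known entries
        gW = -R @ H.T                      # gradient for fixed H
        tW = (np.linalg.norm(gW) / np.linalg.norm(Omega * (gW @ H))) ** 2
        W = W - tW * gW                    # exact steepest-descent step
        R = PA - Omega * (W @ H)
        gH = -W.T @ R                      # gradient for fixed W
        tH = (np.linalg.norm(gH) / np.linalg.norm(Omega * (W @ gH))) ** 2
        H = H - tH * gH
        if np.linalg.norm(PA - Omega * (W @ H)) / nPA < tol:   # Eq. (15)
            break
    return W, H

# Recover an exactly rank-2 matrix from roughly half of its entries.
rng = np.random.default_rng(5)
A = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 80))
Omega = rng.random((60, 80)) < 0.5
W, H = asd(Omega * A, Omega, r=2, iters=2000, rng=rng)
err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)   # completion error, Eq. (16)
print(err < 1e-2)
```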
To evaluate the success of the completion algorithms, we compute the completion error, denoted $\epsilon_C$. This is simply the relative norm of the difference between the true low-rank matrix $A$ and the output of the ASD algorithm, $A^* = W^* H^*$, defined as

$$\epsilon_C = \frac{\|A - W^* H^*\|_F}{\|A\|_F}. \qquad (16)$$

A more detailed description of the algorithm can be found in the supplemental material.

Improving results with LoopedASD
Following rigorous testing of ASD with Raster sampling, it was found that there exists an approximate optimal undersampling ratio, $p^*$, that depends linearly on, and is correlated with, the data's rank $r$. For $p > p^*$, the probability of a successful completion was almost certain, and the completion errors were consistently low. For $p < p^*$, the completion errors increased and the completion rate (the proportion of tests that successfully recover the data to a certain accuracy, usually $10^{-4}$) decreased rapidly to zero. Intuitively, the dependence of $p^*$ on $r$ makes sense: to reconstruct more complex datasets (with higher ranks), more known entries are required to successfully recover the missing ones.
LoopedASD was developed to take advantage of this heuristic: beginning with $r = 1$, we use the outputs of the rank-$r$ completion as initial guesses for the rank-$(r+1)$ completion, iterating up to the completion rank set by the user.
The aim of this variation is to allow the iterates to converge quickly to a rank-1 approximation of the sparse data, for which fewer known entries are required. Once a rank-$r$ approximation is known, the distance to the rank-$(r+1)$ solution should be relatively small, and again fewer known entries are required to converge quickly to the next approximation. Ensuring the iterates remain close to the minimum should also avoid convergence to spurious local minima of $f$ (which is non-convex).
The completion results of LoopedASD confirm the validity of this heuristic, as it yields good reconstructions more reliably at low undersampling ratios.
In addition to the usual stopping conditions, an early-stopping procedure was required due to slow convergence rates when the iterates approached optimal solutions. If the average change in the norm of the residual (Eq. (15)) over 50 iterations drops below a second tolerance (typically $10^{-5}$), then the early-stopping condition is satisfied and the algorithm halts.
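The rank-incremental loop can be sketched as follows. This is our simplified reading of LoopedASD: the inner solver is the naive ASD sketch, the factors are padded with a small random column/row between ranks (an assumption; the paper does not specify the padding here), and the stall check is a crude stand-in for the 50-iteration early-stopping rule described above.

```python
import numpy as np

def asd_inner(PA, Omega, W, H, iters=500, tol=1e-8):
    """Inner fixed-rank ASD run (naive sketch of [24]), warm-started from
    W, H. Stops early when the residual norm stalls."""
    nPA = np.linalg.norm(PA)
    prev = np.inf
    for _ in range(iters):
        R = PA - Omega * (W @ H)
        gW = -R @ H.T
        W = W - (np.linalg.norm(gW) / np.linalg.norm(Omega * (gW @ H))) ** 2 * gW
        R = PA - Omega * (W @ H)
        gH = -W.T @ R
        H = H - (np.linalg.norm(gH) / np.linalg.norm(Omega * (W @ gH))) ** 2 * gH
        res = np.linalg.norm(PA - Omega * (W @ H)) / nPA
        if res < tol or prev - res < 1e-12:   # crude early-stopping stand-in
            break
        prev = res
    return W, H

def looped_asd(PA, Omega, r_max, rng):
    """LoopedASD sketch: run ASD at ranks 1, 2, ..., r_max, so the rank-r
    result initialises the rank-(r+1) completion."""
    n_E, n_P = PA.shape
    W = 0.01 * rng.standard_normal((n_E, 1))
    H = 0.01 * rng.standard_normal((1, n_P))
    for r in range(1, r_max + 1):
        W, H = asd_inner(PA, Omega, W, H)
        if r < r_max:   # grow the factors by one rank (assumed padding scheme)
            W = np.hstack([W, 0.01 * rng.standard_normal((n_E, 1))])
            H = np.vstack([H, 0.01 * rng.standard_normal((1, n_P))])
    return W, H

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 70))
Omega = rng.random((50, 70)) < 0.5
W, H = looped_asd(Omega * A, Omega, r_max=3, rng=rng)
err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(err < 1e-2)
```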
It has been noted that the completion rank must be set before running ASD and LoopedASD. This, however, is not an obvious choice, since we cannot evaluate the accuracy of the different rank-$r$ completions without prior knowledge of the results. To overcome this, we have developed a method that runs short completions over several ranks and evaluates the completion errors using cross-validation, so that we can efficiently identify the optimal completion rank to use. A full description and justification of the method can be found in the supplemental material.

Validation of the algorithm with numerical undersampling
We test the method in two distinct ways. First, we numerically sample full datasets: to produce these datasets, 'unknown' entries are set to zero according to a randomly generated Robust Raster sampling pattern. Then, in Section 6, we implement sparse scans on the beamline to determine whether any additional issues arise, and to test in a new setting outside of our test set.
To properly evaluate the performance of the methods described in this paper, several measures of success must be considered. We can compare the original and reconstructed datasets directly by computing the average completion error. Perhaps more significantly, we can evaluate the differences between the cluster maps and between the absorption spectra themselves. Finally, we can plot and visually compare the clusters and absorption spectra. Due to specimen drifts (described in Section 6), the first two 'computational' approaches can only be applied to numerically sampled data; visual comparisons must be used when evaluating reconstructions from the sparse scan data. During these tests, we use the optimal completion rank $r^*$, determined by the rank selection algorithm described in the supplemental document, to produce the completed matrix $A^{(r^*)} = W^{(r^*)} H^{(r^*)}$. $A^{(r^*)}$ is then used instead of the full dataset $A$ in the analytic processes (PCA and clustering).

Completion errors of LoopedASD
We first examine the norm of the difference between datasets. When conducting experiments such as this, it is important to consider the constraints on the completion error, $\epsilon_C$ (Eq. (16)), for a rank-$r$ completion.
Consider a rank-$r$ approximation of a matrix $A$, denoted $A^{(r)}$. We can compute the minimum approximation error of $A^{(r)}$ in the Frobenius norm using the Eckart-Young-Mirsky (EYM) theorem [34]. We shall refer to this value as the minimal rank-$r$ approximation error; it is given by

$$\epsilon^{(r)}_{\min} = \frac{\min_{A^{(r)}} \|A - A^{(r)}\|_F}{\|A\|_F} = \frac{\big(\sum_{i > r} \sigma_i^2\big)^{1/2}}{\|A\|_F}, \qquad (17)$$

where $\sigma_1, \dots, \sigma_{\min(n_E, n_P)}$ are the singular values (SVs) of $A$. Since any completion result with a completion rank of $r$ is just an example of a rank-$r$ approximation, the corresponding minimum approximation error is a lower bound for the corresponding completion error. Thus the success of any completion should be evaluated by comparing back to the corresponding minimum error.
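The EYM lower bound is easy to verify numerically: the truncated SVD attains it exactly. A short check on random data (toy dimensions assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((40, 50))

def eym_min_error(A, r):
    """Minimal relative rank-r approximation error, Eq. (17): the root sum
    of squared trailing singular values over the Frobenius norm of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(s[r:] ** 2)) / np.linalg.norm(A)

# The rank-r truncated SVD attains the bound.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 5
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
print(abs(rel_err - eym_min_error(A, r)) < 1e-12)   # True
```

Any rank-$r$ completion error can therefore be judged against `eym_min_error(A, r)` as its floor.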
LoopedASD produces successful completions for all datasets, for undersampling ratios as low as $p = 0.15$. By examining the mean completion errors, $\epsilon_C$, one can identify the optimal undersampling ratio, $p^*$. As before, for $p < p^*$ the average completion errors rise quickly, while for $p > p^*$ the average completion errors decrease slowly, reducing the benefit of taking further measurements. It was found that LoopedASD produces low completion errors more reliably than ASD, in particular for undersampling ratios around $p^*$.
To help visualize the completion, we plot the flattened datasets themselves as colourmap images; purple points indicate a value of zero, while bright yellow points indicate higher measured values. In Figure 3 we provide a visual representation of the sampling and reconstruction process at the optimal rank, $r^*$, and with a consistent undersampling ratio of $p = 0.20$. Note that completions often appear smoother than the original data: since the completion is low rank, much of the noise has been filtered out.
One advantage of ASD and LoopedASD is that the impact of any artefacts that have been sampled remains contained in their rows/columns of the dataset. In Figure 4 we illustrate this by intentionally sampling a corrupted entry in DS2. We can see that the majority of the image remains accurate, and only the row and column that contained the original artefact are affected. Indeed, it is very easy to identify the corrupted regions, which can be cut from the dataset. The example in Figure 4 initially had a completion error of 0.052, compared to the mean 0.33 (the corresponding minimum approximation error is 0.026). Once the affected entries are removed, $\epsilon_C$ is computed as 0.047, providing a much stronger result. In practice this property is very useful, since it implies the process is robust against artefacts found in the data.

Impact of completion on cluster analysis
We now compare the results of the cluster analysis from the full data and from the sparse data. In some sense, this comparison is more relevant than simply comparing the completion errors, because the success of the new methods will be determined by the quality of the cluster maps produced from the reconstructions, not merely the abstract difference between the two datasets.
We can evaluate the similarity of clusters using the Adjusted Rand Index (ARI) [35]. This is a symmetric score with values between $-1$ and $1$: an ARI score of 1 indicates the clusters are a perfect match, while a score of 0 implies the classifications have effectively been assigned at random. Generally we use the inverted score $(1 - \mathrm{ARI})$ to evaluate cluster quality, so that 0 indicates a perfect match and smaller scores are better.
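ARI can be computed from the contingency table of the two labelings. A minimal numpy implementation (ours, for illustration; libraries such as scikit-learn provide an equivalent `adjusted_rand_score`):

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings (cf. [35]): the Rand
    index corrected for chance, 1 for identical partitions and about 0
    for independent random ones."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    n = len(a)
    # Contingency table of cluster overlaps.
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    C = np.zeros((ia.max() + 1, ib.max() + 1), dtype=int)
    np.add.at(C, (ia, ib), 1)
    sum_ij = sum(comb(int(x), 2) for x in C.ravel())
    sum_a = sum(comb(int(x), 2) for x in C.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in C.sum(axis=0))
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

x = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(x, x))                    # 1.0
print(adjusted_rand_index(x, [1, 1, 2, 2, 0, 0]))   # label-permutation invariant: 1.0
```

The invariance to relabelling is what makes the cluster-alignment step below necessary only for comparing spectra, not for the ARI score itself.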
On the other hand, we can compare individual absorption coefficients using the Euclidean 2-norm, but we must ensure that we are taking the difference between coefficients for equivalent materials/clusters. Suppose we have performed cluster analysis on both the full data set and the completed data with n_cluster cluster centres. Let μ_F ∈ R^(n_E × n_cluster) denote the absorption coefficients of the full data and μ_C ∈ R^(n_E × n_cluster) denote those from the completed data. In both cases, the k-th columns, (μ_F)_k and (μ_C)_k respectively, represent the absorption coefficient corresponding to the k-th cluster. Note that the k-th cluster from the full data may not correspond to the k-th cluster from the completed data. We therefore align the clusters by taking each full-data cluster and finding the completed-data cluster that best fits it (minimises 1 − ARI); using this permutation, we rearrange the columns of μ_C to obtain the aligned absorption coefficients μ̃_C ∈ R^(n_E × n_cluster). We can now be sure the absorption coefficients represent the same area and the same material. We then wish to combine the relative differences of each cluster such that each cluster is weighted equally (not all absorption spectra are of the same magnitude) and the result is normalised by the number of clusters used. Thus, we compute the spectral difference, ε_spec, as

ε_spec = (1/n_cluster) Σ_{k=1}^{n_cluster} ‖(μ_F)_k − (μ̃_C)_k‖₂ / ‖(μ_F)_k‖₂.

One complication is that cluster analysis is not deterministic, since there is some randomness in the initialisation of the cluster centres. Because of this, we often find several common cluster maps resulting from the same dataset. This is true for both the full and the reconstructed datasets, which can produce misleading ARI scores and spectral differences when mismatched outputs are compared. To overcome this, we apply a clustering process to the cluster maps themselves, grouping similar maps together into 'super clusters'. Taking the mode within each super cluster produces a series of representative cluster maps for the original data. We can now compare the reconstructed cluster map against each representative in turn, recording the best score as the most appropriate comparison.
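The alignment and spectral-difference computation just described can be sketched as follows. For simplicity, this sketch aligns clusters by maximum label overlap via a brute-force permutation search rather than the per-cluster (1 − ARI) criterion; the function name and signature are illustrative, not taken from the original code.

```python
import itertools
import numpy as np

def align_and_compare(mu_full, mu_comp, labels_full, labels_comp):
    """Align completed-data clusters to full-data clusters by maximum label
    overlap (a stand-in for the (1 - ARI) criterion in the text), then
    compute the spectral difference.  Brute force over permutations is
    fine for the ~5 clusters used here."""
    k = mu_full.shape[1]
    # Contingency matrix: pixel overlap between each pair of cluster labels.
    overlap = np.array([[np.sum((labels_full == i) & (labels_comp == j))
                         for j in range(k)] for i in range(k)])
    # Pick the column permutation of the completed clusters that best
    # matches the full-data clusters.
    best = max(itertools.permutations(range(k)),
               key=lambda p: sum(overlap[i, p[i]] for i in range(k)))
    mu_aligned = mu_comp[:, list(best)]
    # Per-cluster relative 2-norm error, averaged over clusters so every
    # cluster is weighted equally regardless of its spectrum's magnitude.
    rel = np.linalg.norm(mu_full - mu_aligned, axis=0) \
          / np.linalg.norm(mu_full, axis=0)
    return mu_aligned, rel.mean()
```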
The spectromicroscopy datasets were sampled numerically, their optimal ranks were computed, and the data reconstructed using LoopedASD with early stopping. The completion ranks were set to the optimal rank, r*, using the method described in the supplementary document. PCA is used to decompose the data into its most significant components (we set the number of components used r = r*, so the rank is consistent). Finally, we perform the cluster analysis using kmeans with 5 clusters and compute the associated absorption spectra. For DS1, DS2 and DS3 we used RTE (dropping the first principal component before applying kmeans) to provide worst-case results. Note that if RTE had not been used, both the ARI scores and the spectral difference would improve. For DS4 and DS5, we avoided using RTE due to the poorer quality of the clustering results for both the full and reconstructed data. When using RTE on DS4 and DS5, the clusters were not smooth, the images and spectra were noisy, and the outcomes were not representative of typical results.
Averaging over many iterations, we plot the clustering results for each dataset and for various undersampling ratios in Figure 5. In Figure 5a we plot log₁₀(1 − ARI) against the undersampling ratio, and in Figure 5b we plot the log of the spectral difference, log₁₀(ε_spec), against the undersampling ratio. The purpose of the plots is to illustrate the qualitative behaviour of the results, so we have translated the individual plots vertically to improve clarity. The quantitative results can be found in Table 2. Here we see that, as we increase the undersampling ratio, the (1 − ARI) scores and spectral differences behave similarly to the completion error, ε (Eq. (16)). Indeed, by plotting the logs of these scores we see similarly shaped curves: a faster decline for lower δ that flattens for higher δ. Once again, we identify the optimal undersampling ratio, δ*, at the elbow of the plots, the point of maximum curvature. It is clear from the plot where the majority of the optimal undersampling ratios lie; however, for DS2 and DS4 the elbows occur at different points across the two plots (δ = 0.16 for Figure 5a and δ = 0.18 for Figure 5b). In these cases, we take the higher value to ensure enough entries are known for better reliability. For DS4 and DS5, the pattern in Figure 5b is less clear and appears more as a constant value. This is simply because the variation in spectral difference over this range of undersampling ratios is much smaller than for the other datasets; by appropriately scaling these plots one can see the same curve as before.
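Locating the elbow of such a curve can be automated. The sketch below is a minimal finite-difference curvature estimate, not the KNEEDLE algorithm [25] used in the text; the function name is illustrative.

```python
import numpy as np

def elbow_index(x, y):
    """Locate the elbow (point of maximum curvature) of a discrete curve,
    as used to pick the optimal undersampling ratio.  A minimal
    finite-difference sketch, not the KNEEDLE algorithm itself."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    # Normalise both axes so the curvature estimate is scale-invariant.
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    dx, dy = np.gradient(xn), np.gradient(yn)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Standard curvature formula for a parametric curve (xn(t), yn(t)).
    curvature = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    return int(np.argmax(curvature))
```

On discrete data the maximum lands on or immediately next to the kink, so in practice the result is inspected visually as in Figure 5.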
In Table 2, we record the optimal completion rank, the corresponding optimal undersampling ratio (found at the elbow of the results in Figure 5), the completion errors ε, and the cluster results produced by these parameters. In Figures 6 & 7, we visually compare the cluster maps and absorption spectra of the full data and the reconstructed data for DS1, DS2 and DS3 (DS4 and DS5 are used in Section 6 to illustrate the sparse scans). The datasets are sampled numerically with δ = 0.20, the optimal completion rank was used, and each of these results uses RTE. These images were produced by Mantis X-ray [36], an open-source software package for analysing x-ray spectromicroscopy data.

Fig. 7. Cluster and spectral results from reconstructions of sampled data, with δ = 0.20, using RTE. Results computed using Mantis X-ray [36].

Please note that for DS1 and DS2 there is a sharp downward discontinuity at around 7270 in both the full and completed results. This feature is present in the original data, and can be seen in the absorption spectra of many of the pixels with stronger signals. It is likely due to an error in the calibration of the energy stepping, which requires coordinated motion of an insertion device and monochromator, resulting in a strong variation in the intensity of the beam that was not normalised correctly.
We can see that both the cluster maps and absorption spectra are very similar between the full and reconstructed data, with only some small (mostly insignificant) differences in each. The boundaries of the completion clusters lose some of their smoothness, but only a few isolated pixels have been misclassified, with minimal impact on the overall image. Similarly, the most important features of the absorption spectra have been preserved in the completions for DS1 and DS2, including the pre-edges, the peaks and, in particular, the shifts in the absorption edges for each material. There are a few examples of noise affecting the data: DS3 is particularly noisy, and for some clusters we see the sharp spikes indicative of noise. Despite this, the general outline is still consistent. Overall, the similarity of the clusters ensures the sparse data would be interpreted in the same way as the original data, and the absorption spectra are sufficiently clear to identify the materials within each cluster.

Practical sparse scanning experiments
We now bring all of the above material together to implement sparse scanning in practice. Instead of numerically sampling a full dataset, we physically implemented a sparse robust raster sampling pattern. This was done on the same specimen, with the same pixel sizes and dwell times, for δ ∈ {0.15, 0.25, 0.35, 1}. Unfortunately, it is not possible to numerically compare the results due to experimental variations between the known entries in each dataset. Small changes in the ambient temperature can cause the specimen to drift very small distances between spatial scans during the experiment. Since measurements are taken on the micro/nanoscopic scale, this effect can create pixel-level changes from start to end. This is usually compensated for during the stacking process, but sparse scans have a much shorter experimental time, and registration to correct drifts cannot be performed in the same way as for full datasets. The intensity of the incident beam will also vary with time and, although this can be normalised, some variation can still occur. Because of these potential spatial discrepancies and changes in intensity and background noise, the same known entries across different scans will show different values. Thus, it is impossible to know whether the difference between the full and reconstructed data is due to differences in the measurements or to the limitations of the completion algorithm.
Despite this, we can still visually compare the results. For each of our sparse scans (DS4 and DS5) and at each undersampling ratio, we use LoopedASD to reconstruct the data. We equip the algorithm with the optimal completion rank for the dataset, described in the supplementary documents. We use Mantis X-ray to perform the cluster analysis, setting r = r* and using 5 clusters. Recall that RTE was not used for these results. In Figures 8 and 9, we plot these results.
Once again, despite a slight loss of smoothness around cluster boundaries and some individual misclassified pixels, the overall structure of the cluster maps is near identical and clearly shows the same variations across the scans. Similarly, there are a few sharp discontinuities in the reconstructed absorption spectra, especially for lower undersampling ratios, but each material's XANES is still clearly identifiable. After extracting the timestamps from the files' metadata, we can compare the run time for each sparse experiment, summarised in Table 3.
The true time efficiency will never exactly match the undersampling ratio used, due to 'dead time' as the scanner resets and for critical machine processes. Despite this, we see huge gains for the lowest undersampling ratios and a significant increase in experimental efficiency. In particular, we see greater improvements for the larger region of DS5, due to the higher measurement-to-dead-time ratio. Another benefit of scanning larger regions is that the completion algorithms become more efficient. Because the data is approximately low rank, the total number of entries grows much faster than the number of degrees of freedom as the scan size increases, i.e. larger datasets are recoverable at lower undersampling ratios. Thus, it is in the interest of researchers using this approach to scan larger areas, improving both the scanning efficiency and the quality of the reconstruction.
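This scaling argument can be made concrete: a rank-r, m × n matrix has r(m + n − r) degrees of freedom, so the fraction of entries needed to determine it shrinks as the scan grows. The dimensions below are illustrative, not measured values from the paper's datasets.

```python
def dof_fraction(m, n, r):
    """Degrees of freedom of a rank-r m x n matrix, r*(m + n - r),
    as a fraction of the m*n total entries."""
    return r * (m + n - r) / (m * n)

# Illustrative numbers: ~100 energy levels, rank 5, growing pixel count.
small = dof_fraction(100, 10_000, 5)    # smaller scan area
large = dof_fraction(100, 100_000, 5)   # 10x larger scan area
```

Since the fraction only ever decreases as the pixel count grows, larger scans leave a larger margin between the information-theoretic minimum and the chosen undersampling ratio.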

Conclusion
We have demonstrated a new undersampling approach for spectromicroscopy data acquisition. By taking advantage of the inherent low-rank structure of the datasets, we have produced algorithms that can accurately recover unknown entries from as little as 15% of the measurements when raster sampling. We have illustrated the robustness of these algorithms to machine artefacts, and how to derive parameters such as the completion rank from the sparse data itself.
Finally, we showed the minimal impact reconstructing sparse data has on the cluster analysis that is currently used to interpret spectromicroscopy data. By implementing these methods, we can conduct experiments at a much faster rate, over larger areas, and with lower dose on the sample: the method consistently produces near-identical results using 20% of the measurements, with potential improvements to 15% for larger samples. Alternative sampling schemes, compatible with continuous-motion scans, may be able to produce further reductions. These improvements will help in the development of in-situ spectromicroscopy measurements, and future developments will reduce times in areas such as XANES nano-tomography and nano-EXAFS, which are currently limited in their application by long acquisition times.

Backmatter
Funding. This research is partially supported by a University of Bath scholarship via the EPSRC Centre for Doctoral Training in Statistical Applied Mathematics at Bath (SAMBa) under project EP/S022945/1.
This research is partially supported by Diamond Light Source Plc.

PCA and RTE
We begin by describing the Principal Component Analysis (PCA) and cluster analysis of spectromicroscopy data in more detail. Additionally, we explain how Reduced Thickness Effects (RTE) are implemented. This is based on the methods outlined in [27].
Given our flattened x-ray spectromicroscopy dataset A ∈ R^(n_E × n_P), we wish to separate the data into its principal components by imposing the decomposition A = CR (Eq. (19)), with C ∈ R^(n_E × n_E) and R ∈ R^(n_E × n_P). This can be done either by computing the covariance matrix of A, or using the SVD (singular value decomposition). Both methods are equivalent; we shall describe the covariance approach here. First, we compute the spectral covariance matrix of A, Z ∈ R^(n_E × n_E), then perform an eigen-decomposition on Z to get Z = CΛC^T, where Λ ∈ R^(n_E × n_E) is a diagonal matrix of the eigenvalues of Z, and the columns of C ∈ R^(n_E × n_E) are the corresponding eigenvectors. Since the columns of C are orthogonal, the left inverse of C is equal to C^T. Note that we compute the spectral covariance since typically n_E << n_P.
To complete the C-R decomposition of A in Eq. (19), we compute R = C^T A. The columns of C and rows of R are the principal components of A, and the corresponding eigenvalues in Λ describe their significance in the dataset. Additionally, the columns of C can be thought of as abstract absorption spectra and the rows of R as abstract spatial maps: linear combinations of true spectra and spatial maps, without any direct physical interpretation.
To compute the PCA's low-rank approximation, we simply take the first r columns of C and the first r rows of R, and discard the remaining data. Thus we get A ≈ A′ = C′R′, where A′ ∈ R^(n_E × n_P), C′ = C(:, 1:r) ∈ R^(n_E × r) and R′ = R(1:r, :) ∈ R^(r × n_P). Note, we have used MATLAB-like notation to partition the rows and columns of the matrices. To maximise the variation captured by the approximation while minimising the rank, r is chosen to be at the elbow (point of maximum curvature) of the eigenvalues, Λ. Again, this is equivalent to setting r to be the elbow of the singular values of A.
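A minimal sketch of this covariance-based PCA, assuming the flattened stack is a NumPy array with spectra along the first axis (function name illustrative; centring and scaling of the covariance are omitted for brevity):

```python
import numpy as np

def pca_truncate(A, r):
    """Covariance-based PCA of a flattened stack A (n_E x n_P):
    A ~= C' R', with C' the first r abstract spectra and R' the first r
    abstract maps.  Minimal sketch of the decomposition described above."""
    Z = A @ A.T                      # spectral covariance (n_E x n_E), uncentred
    lam, C = np.linalg.eigh(Z)       # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]    # reorder to descending significance
    lam, C = lam[order], C[:, order]
    R = C.T @ A                      # since C is orthogonal, its inverse is C^T
    Cp, Rp = C[:, :r], R[:r, :]      # keep the r most significant components
    return Cp, Rp, lam
```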
In Figure 10 we illustrate the eigenvalues of DS2, alongside the first 9 eigenspectra and eigenimages (columns of C and rows of R respectively). KNEEDLE [25] was used and identified the elbow as r = 4. In Figures 10b & 10c, we see that r = 4 correctly determines the number of components that show significant variation and little noise: the first four spectra are much smoother than the rest, and all eigenimages after the 4th show the 'salt and pepper' pattern that is indicative of noise. In practice we use r = 5 to ensure we capture all the variation in the data.
To identify the different materials in the specimen, we now use cluster analysis on the columns of R′ to collect together pixels with similar combinations of abstract absorption spectra. The number of cluster centres must be set, and should be greater than or equal to the number of materials within the specimen. Several different clustering algorithms may be used, but we generally use a standard implementation of kmeans. Averaging the x-ray absorption spectrum over all the pixels in each cluster provides the corresponding absorption spectrum of that material, with a much higher signal-to-noise ratio. As was explained in the main text, the first component is simply an average over all pixels and is associated more with the overall thickness of the specimen [27]. Occasionally, we wish to avoid the thickness of the specimen skewing the clustering results by reducing the effect of the first principal component. To implement the reduced thickness effect (RTE), we simply ignore the most significant component and set C′ = C(:, 2:r) and R′ = R(2:r, :) before applying the cluster analysis as usual. Note that other approaches are available, such as scaling the principal components.
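The RTE slicing and the per-cluster spectrum averaging can be sketched as follows, given cluster labels from any kmeans implementation (helper names are illustrative):

```python
import numpy as np

def rte_components(C, R, r):
    """Reduced Thickness Effect (RTE): drop the first (thickness-dominated)
    principal component before clustering, keeping components 2..r.
    Mirrors the slicing C(:, 2:r), R(2:r, :) described above."""
    return C[:, 1:r], R[1:r, :]

def cluster_spectra(A, labels, n_cluster):
    """Average the raw absorption spectra over the pixels of each cluster,
    giving one representative spectrum per material with improved SNR."""
    return np.stack([A[:, labels == k].mean(axis=1)
                     for k in range(n_cluster)], axis=1)
```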

Robust raster sampling
Given a dense dataset A, we model the sparse scan P_Ω(A) by computing the projection P_Ω(A) = Ω • A, where • is the Hadamard product and Ω ∈ {0, 1}^(n_E × n_P) is the sampling pattern indicating the locations of the known entries. We now discuss different methods of producing the randomised sampling patterns to be used in sparse scans.
The most common sampling model is Bernoulli sampling, where each data point is sampled i.i.d. (independently and identically distributed) with probability δ. To improve the efficiency of sparse spectromicroscopy, we instead use raster sampling, which scans each physical row of the specimen i.i.d. with probability δ. This way, rows that are not sampled can be passed over quickly.
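A sketch of generating a plain raster sampling pattern for the flattened data, assuming row-major flattening of the n_1 × n_2 grid (function name illustrative); applying the projection P_Ω(A) is then just the Hadamard product Ω • A:

```python
import numpy as np

def raster_mask(n1, n2, n_E, delta, rng):
    """Plain raster sampling: each physical row of the n1 x n2 specimen is
    kept i.i.d. with probability delta, independently at each energy level.
    Returns the flattened mask Omega in {0, 1}^(n_E x n1*n2)."""
    keep_rows = rng.random((n_E, n1)) < delta        # which rows are scanned
    # Expand each kept row to its n2 pixels (row-major flattening).
    return np.repeat(keep_rows, n2, axis=1).astype(float)
```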
One disadvantage is that, for low undersampling ratios, the probability that a row is not sampled at all increases sharply.
It is clear that it is impossible to recover pixels that are not sampled at any energy level using low-rank matrix completion. In fact, it seems that the more frequently a point is sampled, the easier it is to recover the surrounding missing entries (e.g. one pixel at different energy levels, or different pixels at one energy level). To ensure all rows of the specimen are scanned, and to promote a more even sampling pattern, we use a robust variation of raster sampling called Robust Raster Sampling.
Robust Raster Sampling can be implemented either by varying the probability of each row, or by discretely restricting which rows can be scanned. Here we describe the discrete method.
Begin with a set S = [n_1] = {1, ..., n_1} that indexes the rows that have yet to be scanned (recall that n_1 is the number of rows on the specimen). Next, compute the expected number of rows to be sampled at each energy, δn_1. For each energy level, sample δn_1 rows uniformly at random from S, removing the sampled rows from S. If δn_1 is not an integer, simply take the ceiling or floor at random, using the fractional part as the probability. Repeat this process for each energy level E_i, i = 1, ..., n_E. For energy levels where δn_1 > |S| (i.e. there are fewer than δn_1 rows remaining in S), we must first record the deficit d = δn_1 − |S| before sampling the last rows in S. We then repopulate S, withholding only the rows that have already been sampled at this energy level (to avoid sampling them twice). Finally, we sample the remaining d rows from the repopulated set S before moving to the next energy level.
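The discrete scheme above can be sketched as follows. This is a simulation of the sampling pattern, not beamline control code, and the names are illustrative.

```python
import numpy as np

def robust_raster_rows(n1, n_E, delta, rng):
    """Robust Raster Sampling: every specimen row is sampled once before
    any row is sampled twice.  Returns the list of sampled rows for each
    energy level, following the discrete scheme described above."""
    pool = list(range(n1))            # rows not yet sampled in this pass
    pattern = []
    for _ in range(n_E):
        # Randomised rounding of the expected row count delta * n1.
        m = int(delta * n1) + (rng.random() < (delta * n1) % 1)
        rows = []
        if m > len(pool):             # pool nearly exhausted: take the rest,
            deficit = m - len(pool)   # then refill and draw the deficit,
            rows += pool              # withholding this energy's rows
            pool = [r for r in range(n1) if r not in rows]
            m = deficit
        chosen = list(rng.choice(pool, size=m, replace=False))
        rows += chosen
        pool = [r for r in pool if r not in chosen]
        pattern.append(sorted(rows))
    return pattern
```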
In short, Robust Raster Sampling ensures that every row is sampled once before any row can be sampled twice. Additionally, by ensuring δn_1 rows are sampled at each energy level, the known entries are more evenly distributed across P_Ω(A).

Details on ASD
To recover the missing entries of P_Ω(A), ASD [24] imposes the decomposition A = XY, with X ∈ R^(n_E × r) and Y ∈ R^(r × n_P), and seeks to minimise the objective function

f(X, Y) = (1/2) ‖P_Ω(A) − P_Ω(XY)‖_F².

This is done by alternately fixing one factor and minimising over the other using gradient descent with exact step sizes. We write f(X, Y) as f_Y(X) when Y is fixed, and f_X(Y) when X is fixed. Thus, we have the gradients

∇f_Y(X) = −(P_Ω(A) − P_Ω(XY)) Y^T,   ∇f_X(Y) = −X^T (P_Ω(A) − P_Ω(XY)).

For this simple procedure, the descent directions d_X = −∇f_Y(X), d_Y = −∇f_X(Y) and step sizes t_X, t_Y are computed to give the next iterates

X_{i+1} = X_i + t_X d_X,   Y_{i+1} = Y_i + t_Y d_Y.

Finally, we can compute the exact step sizes required for steepest descent. This is done by minimising f_Y(t) = f_Y(X + t d_X) and f_X(t) = f_X(Y + t d_Y) over t, giving

t_X = ‖d_X‖_F² / ‖P_Ω(d_X Y)‖_F²,   t_Y = ‖d_Y‖_F² / ‖P_Ω(X d_Y)‖_F².
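A compact sketch of this iteration under the definitions above. This is plain ASD, not the LoopedASD variant with early stopping used in the paper, and the names are illustrative.

```python
import numpy as np

def asd(A_obs, Omega, r, iters=2000, rng=None):
    """Alternating Steepest Descent for matrix completion: alternately take
    one exact-step gradient descent step in X (Y fixed), then in Y (X fixed),
    on f(X, Y) = 0.5 * ||P_Omega(A) - P_Omega(X @ Y)||_F^2."""
    rng = rng or np.random.default_rng(0)
    m, n = A_obs.shape
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((r, n))
    for _ in range(iters):
        resid = Omega * (A_obs - X @ Y)
        grad_X = -resid @ Y.T                      # gradient of f_Y at X
        t = np.linalg.norm(grad_X)**2 / np.linalg.norm(Omega * (grad_X @ Y))**2
        X = X - t * grad_X                         # exact steepest-descent step
        resid = Omega * (A_obs - X @ Y)
        grad_Y = -X.T @ resid                      # gradient of f_X at Y
        t = np.linalg.norm(grad_Y)**2 / np.linalg.norm(Omega * (X @ grad_Y))**2
        Y = Y - t * grad_Y
    return X @ Y
```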

Fig. 3. Reconstructions of the three independent datasets, using LoopedASD as the completion algorithm. Each figure is a scaled colour image of the flattened dataset, with energy levels along the vertical axis and all pixels along the horizontal axis. The left column shows the original datasets. The middle column shows the sampled data at δ = 0.20. The right column shows the reconstructed data. Below each reconstruction is the completion rank r, the completion error ε (see Eq. (16)), and the minimum approximation error (see Eq. (17)) for that completion rank.

Fig. 4. Completion result following intentional sampling of the artefact in DS2 (dark spot towards the right). Note that the corrupted points are confined to the column of the initial artefact.

Fig. 5. Comparing log₁₀ of the cluster results against the undersampling ratio, using the optimal completion rank. In each figure, lower scores are better. Note that the individual plots have been translated vertically to improve the clarity of the image.

Fig. 10. PCA results for DS2. KNEEDLE identified the elbow of the eigenvalues at r = 4. Notice that the first 4 eigenspectra and eigenimages contain useful information, after which the components are corrupted by excessive noise.

Table 2. Optimal clustering results for the numerically sampled spectromicroscopy datasets. For the metrics used (ε, 1 − ARI, ε_spec), lower scores are better.

Table 3. Time and efficiency (ratio of sparse to full scan times) of implementing sparse scans during experiments.