Topological data analysis and machine learning

Topological data analysis refers to approaches for systematically and reliably computing abstract "shapes" of complex data sets. There are various applications of topological data analysis in life and data sciences, with growing interest among physicists. We present a concise yet (we hope) comprehensive review of applications of topological data analysis to physics and machine learning problems in physics including the detection of phase transitions. We finish with a preview of anticipated directions for future research.


I. INTRODUCTION
Topological quantities are invariant under continuous deformations; an often-cited example is that a doughnut can be continuously transformed into a coffee mug, since both are topologically equivalent to a torus. The robustness of topological quantities to perturbations is inspiring physicists in many fields, including condensed matter, photonics, acoustics, and mechanical systems [1][2][3][4]. In all these areas topology has enabled the prediction and explanation of surprisingly robust physical effects.
Most famously, the extremely precise quantisation of the Hall conductivity observed in two-dimensional electronic systems since the 1980s was explained as a novel topological phase of matter, the quantum Hall phase [5]. In this and many other examples from physics, we deal with smooth deformations in some parameter space, such as the energy bands of solid state electronic systems.
Physics is, however, an outlier among fields of science in that idealised continuous models and functions can explain a wide variety of observed phenomena. Other fields do not have the luxury of continuity and have to make do with sparse data and limited observations in high-dimensional parameter spaces. Despite this very different setting, topological approaches remain powerful.
A suite of computational topological techniques known as topological data analysis (TDA) has been developed over the past twenty years to systematically define and study the "shape" of complex discrete data in high dimensional spaces. TDA is attracting growing interest among physicists, particularly those working on topological materials or the application of machine learning techniques to physics [6][7][8][9][10].
At this time we are aware of two existing reviews on TDA aimed at the physics audience. The first by Carlsson, one of the founders of the field, gave a broad survey of different techniques of TDA and their applications in various areas of science [11]. The second review, by Murugan and Robertson, provided a detailed pedagogical and physicist-friendly introduction to two important techniques, persistent homology and the Mapper algorithm, applying them to the example of an astronomical dataset [12].
Since publication of these two reviews there has been growing interest in applying TDA methods to physics, including the incorporation of TDA into physics-targeted machine learning, with applications including the unsupervised detection of phase transitions. Moreover, the field of TDA has continued to evolve with new generalisations and techniques being actively studied.
The aim of this article is to review cutting edge applications of TDA to physics. We will provide a gentle introduction to the basic techniques, survey how TDA shows promise for the detection of novel phases of matter, and speculate on what we believe to be important directions for future research, including opportunities offered by newer TDA methods such as zigzag persistence.
The structure of this article is as follows: Sec. II provides a brief introduction to TDA guided by the simple example of two-dimensional point clouds. Sec. III discusses how TDA has been applied to identify order parameters and phase transitions in various physical systems. Sec. IV covers recent studies that employ TDA to compute features of physical systems that are then incorporated into a larger machine learning pipeline. Sec. V speculates on anticipated future directions and applications of TDA to physics, and vice versa. We conclude with Sec. VI.

II. TOPOLOGICAL DATA ANALYSIS
Admittedly TDA has a rather steep learning curve, since its foundation differs from the topology of continuous spaces to which physicists are more accustomed. However, after battling through the unfamiliar jargon and notation one can develop a powerful intuition for the subject. Our aim here is to give an equation-free sketch of the general approaches and terminology, while referring the motivated reader to more comprehensive and mathematically-rigorous reviews [11][12][13][14][15].
Fig. 1. Examples of noisy point clouds. Point clouds sampled from objects with differing shapes and even differing dimensionality may be difficult to distinguish using standard summary statistics such as the centre of mass and variance. In "Circle" and "Figure 8" the noise randomly perturbs the points in the ambient two-dimensional space. In "Swiss Roll" points are sampled from a one-dimensional interval before being embedded into the two-dimensional space (x, y).
A. From point clouds to persistence diagrams
As an instructive example let us consider the two-dimensional point clouds shown in Fig. 1. Each point may correspond to a distinct measurement of some object, e.g. the locations of photons arriving at a camera, or the positions of particles in a system. With our eyes we can clearly see that each cloud has a different shape: The points in the "Circle" and "Figure 8" clouds are distributed around one and two loops, respectively. On the other hand, the "Swiss Roll" corresponds to a noisy one-dimensional point cloud embedded into a higher (two-) dimensional space.
We would like to formalise these qualitative observations in a more systematic way, such that we are not reliant on directly plotting the data, which is an approach limited to two- or three-dimensional datasets. How can we quantify the obviously different shapes of these point clouds? Standard summary statistics such as the centre of mass or variance are clearly inadequate, since they are not invariant under shape-preserving translations or rescaling of the data.
Fortunately, graph theory provides rigorous ways of quantifying intuitive shapes of discrete datasets including point clouds. The idea is to construct a graph by connecting pairs of points (vertices) that are sufficiently close together by edges, and then quantify shape by computing topological invariants of the graph, its Betti numbers B_k. The kth Betti number is the number of k-dimensional holes, e.g. the number of independent connected components (clusters, B_0) or non-contractible loops (cycles, B_1). In practice, evaluating graph invariants amounts to computing the ranks and null spaces of linear operators (matrices) acting on the graph's vertices and edges. In a nutshell, the computation of the shape of point cloud data can be reduced to simple linear algebra.
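As a minimal sketch of this idea, the snippet below computes B_0 and B_1 of the graph obtained at a single cutoff distance. Instead of the matrix-rank formulation mentioned above, it uses two equivalent shortcuts valid for graphs: a union-find structure counts the clusters, and the relation B_1 = E − V + B_0 (E edges, V vertices) counts the independent cycles. The function name `graph_betti_numbers` and the test cloud `circle` are our own illustrative choices.

```python
import math
from itertools import combinations

def graph_betti_numbers(points, cutoff):
    """Return (B_0, B_1) of the graph linking all pairs of points
    separated by at most `cutoff`."""
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n_edges = 0
    n_clusters = n
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            n_edges += 1
            ri, rj = find(i), find(j)
            if ri != rj:  # this edge merges two clusters
                parent[ri] = rj
                n_clusters -= 1
    # For a graph, the number of independent cycles is E - V + B_0.
    return n_clusters, n_edges - n + n_clusters

# Illustrative cloud: eight points evenly spaced on the unit circle.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]

for cutoff in (0.5, 0.8, 1.5):
    print(cutoff, graph_betti_numbers(circle, cutoff))
```

For this cloud the nearest-neighbour spacing is about 0.77, so the three cutoffs give eight isolated points, a single loop, and a single cluster with nine independent graph cycles, respectively. The last case shows how strongly the invariants of a bare graph depend on the chosen scale.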
Higher-dimensional topological features can be similarly obtained by constructing generalisations of graphs known as simplicial complexes, which capture higher dimensional objects (faces, volumes, etc.) by triangulation. A k-simplex is a combination of (k + 1) vertices; edges are 1-simplices, triangular faces are 2-simplices, tetrahedral volumes are 3-simplices, and so on. A k-simplicial complex is a collection of simplices with dimension of at most k.
Increasing k complicates matters. First, since k-simplices are combinatorial objects the number of possible simplices grows rapidly with k, limiting practical calculations to low dimensional topological features. Second, there is no unique way to construct a simplicial complex given only pairwise distances between points and a cutoff scale; different methods may differ in their computational costs, stability properties, and ability to faithfully reproduce shapes of the underlying space from which the points are sampled [16].
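To make the combinatorial growth concrete, the following sketch enumerates the simplices of one common construction, the Vietoris-Rips complex, in which (k + 1) vertices form a k-simplex whenever all of their pairwise distances are at most the cutoff. The brute-force enumeration over all vertex subsets is chosen for readability only and is an assumption of this sketch; practical libraries use far more efficient algorithms. The names `vietoris_rips` and `square` are illustrative.

```python
import math
from itertools import combinations

def vietoris_rips(points, cutoff, max_dim=2):
    """Return {k: list of k-simplices} for the Vietoris-Rips complex.

    Each simplex is a tuple of vertex indices. A tuple of k + 1 vertices
    is included when every pair of its vertices is within `cutoff`.
    """
    def close(i, j):
        return math.dist(points[i], points[j]) <= cutoff

    n = len(points)
    simplices = {}
    for k in range(max_dim + 1):
        simplices[k] = [s for s in combinations(range(n), k + 1)
                        if all(close(i, j) for i, j in combinations(s, 2))]
    return simplices

# Illustrative cloud: corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

for cutoff in (1.0, 1.5):
    complex_ = vietoris_rips(square, cutoff, max_dim=3)
    print(cutoff, {k: len(s) for k, s in complex_.items()})
```

At cutoff 1.0 only the four sides of the square are present, whereas at cutoff 1.5 the diagonals appear and with them four triangles and one tetrahedron. The number of candidate k-simplices on n points is the binomial coefficient C(n, k + 1), which is why the maximum dimension is usually kept small.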
There is one big elephant in the room we must address: what do we mean by "sufficiently close" when connecting vertices to form the graph or simplicial complex? How do we determine which pairs of vertices to link by an edge and which pairs to leave disconnected? The number of cycles and clusters will be sensitive to the choice of cutoff distance and even possibly the addition or removal of a single edge, as illustrated in Fig. 2. This seems like a big problem making the approach lack robustness to noise and other perturbations.
The neat solution to the scale-dependence of graph invariants obtained from point clouds is to compute the shape of the graph over an entire range of scales known as a filtration, i.e. study its topology as a function of the cutoff length scale [17]. Topological features (e.g. clusters, cycles) persisting over a wide range of scales are more robust and should provide a meaningful characterisation of the overall shape of the data. On the other hand, features sensitive to small changes in scale or the addition or removal of a few edges can be attributed to noise and discarded if necessary. By studying the persistence of topological features we will be able to distinguish robust features from noise.
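For clusters the filtration can be sketched in a few lines. In the snippet below every point is born as its own cluster at scale zero, edges are added in order of increasing length, and a cluster "dies" at the edge length that merges it into another cluster. Identifying the scale with the edge length is one common convention and is an assumption here, as are the function name `cluster_persistence` and the toy cloud `points`.

```python
import math
from itertools import combinations

def cluster_persistence(points):
    """Return the death scales of clusters (0-dimensional features).

    All clusters are born at scale 0. The list has len(points) - 1
    entries; the one remaining cluster never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Filtration: process edges from shortest to longest.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two clusters merge, so one of them dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths

# Illustrative cloud: two tight pairs of points, far apart from each other.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(cluster_persistence(points))
```

Two clusters die at scale 1 and one survives until scale 9. The large gap between these death scales is the signature of two robust clusters, independent of any single choice of cutoff.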
Persistence diagrams are one stable way to represent scale-dependent topological features of a dataset [18]. Fig. 3 shows persistence diagrams computed for each of the point clouds in Fig. 1. The most persistent topological features not only allow us to infer the overall shape of the data, but also give information about the geometry of the point cloud. For example, the birth scales of the long-lived cycles in the "Circle" and "Figure 8" clouds are related to the maximum separation between neighbouring points comprising the cycle, while the death scale will be related to the cycle's diameter.
Fig. 3. Persistence diagrams of the point clouds in Fig. 1, computed using the Vietoris-Rips complex [16]. Each point represents a distinct topological feature. Horizontal and vertical axes denote the length scales at which each feature is created (b; birth) and destroyed (d; death) respectively. Points that are further from the diagonal dashed line therefore persist over a larger range of scales and are said to have a longer "lifetime" l = d − b. Since features must be created before they are destroyed, no points lie below the diagonal. At sufficiently large spatial scales all points become connected to form a single connected graph, corresponding to a single cluster with an infinite lifetime. Typically the infinite lifetime cluster is either discarded or plotted at a finite d and distinguished using a horizontal dashed line.
The attentive reader will notice that the persistence diagrams for the "Circle" and "Swiss Roll" clouds share the same long-lived features, despite their obviously-differing shapes. Closer inspection will, however, reveal noticeable differences in their short-lived features. For example, the cycles appearing in the "Swiss Roll" dataset all have similar birth scales, corresponding to the distance between the inner and outer part of the spiral and hinting at a one-dimensional embedding. This suggests that the differing shapes of these two point clouds may indeed be captured by inspecting their short-lived features; thus, persistent homology can also capture the local features (geometry) of the data.

B. Comparing and computing persistence diagrams
While persistence diagrams provide a compact visual summary of the scale-dependent topological features of a single dataset, it is not immediately clear how we should go about comparing persistence diagrams computed for different datasets; they will generally differ in their number of features and level of noise, making it difficult to establish a common threshold between genuine features and noise-induced features.
These issues motivated the development of stable distance and similarity measures for persistence diagrams. Here stability means that a small change to a dataset results in, at most, a similarly small change in the distance between its persistence diagram and any other fixed persistence diagram.
One example of a stable distance measure is the Wasserstein distance, which is the smallest distance the points in a pair of persistence diagrams must be moved in order to transform one diagram into the other. Unpaired features (i.e. if one diagram has more features) are moved to the diagonal. For example Fig. 4 shows the matching between the one-dimensional cycles of the Circle, Figure 8, and Swiss Roll point clouds. Since all features contribute to the Wasserstein distance, even the noise-induced ones close to the diagonal, it can be less sensitive to changes in the most persistent features. Another popular choice of distance measure is the bottleneck distance, which is the largest deformation of a pair of features required to convert one diagram to another (i.e. the Wasserstein distance under the p = ∞ norm). The bottleneck distance is thus independent of the short-lived features near the diagonal.
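The difference between the two distances can be illustrated with a brute-force sketch for very small diagrams. We assume the common convention in which the cost of matching two features is the larger of their birth and death differences, unmatched features are sent to the nearest point on the diagonal at cost (d − b)/2, and the Wasserstein distance is taken with p = 1. The search over all permutations is only feasible for a handful of features; the function name `diagram_distances` and the two toy diagrams are illustrative.

```python
from itertools import permutations

def diagram_distances(diag_a, diag_b):
    """Return (1-Wasserstein, bottleneck) distances between two small
    persistence diagrams given as lists of (birth, death) pairs."""
    def to_diagonal(p):
        return (p[1] - p[0]) / 2.0

    def cost(p, q):
        # None stands for a slot on the diagonal.
        if p is None and q is None:
            return 0.0
        if p is None:
            return to_diagonal(q)
        if q is None:
            return to_diagonal(p)
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    # Pad each diagram with diagonal slots so every feature can be unmatched.
    a = list(diag_a) + [None] * len(diag_b)
    b = list(diag_b) + [None] * len(diag_a)
    best_sum = best_max = float("inf")
    for perm in permutations(range(len(b))):
        costs = [cost(a[i], b[j]) for i, j in enumerate(perm)]
        best_sum = min(best_sum, sum(costs))
        best_max = min(best_max, max(costs, default=0.0))
    return best_sum, best_max

# One long-lived feature plus one short-lived "noise" feature, compared
# with a diagram containing a single long-lived feature.
diagram_a = [(0.0, 4.0), (1.0, 1.5)]
diagram_b = [(0.0, 3.0)]
wasserstein, bottleneck = diagram_distances(diagram_a, diagram_b)
print(wasserstein, bottleneck)
```

In the optimal matching the two long-lived features are paired at cost 1 and the short-lived feature is moved to the diagonal at cost 0.25. The Wasserstein distance (1.25) therefore includes the noise feature, while the bottleneck distance (1.0) is set by the long-lived pair alone.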
Alternative approaches for characterising and comparing the information contained in the persistence diagrams employ vectorisation: the variable-length information encoded in the (b, d) pairs of the persistence diagram is mapped to a vector or vectors in a fixed-dimensional space; different persistence diagrams can then be studied using more familiar tools such as vector inner products. For example, one might compute a set of summary statistics such as the entropy or moments of the feature lifetimes l = |d − b| [19][20][21], assuming they are relevant to the task at hand.
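A minimal sketch of such summary statistics is given below. The entropy is taken to be the Shannon entropy of the lifetimes normalised to sum to one (often called the persistent entropy), and the moments are the mean and variance of the lifetimes. The function name `lifetime_statistics` is illustrative, and the sketch assumes a diagram with only finite deaths and at least one feature of non-zero lifetime.

```python
import math

def lifetime_statistics(diagram):
    """Return (entropy, mean, variance) of the feature lifetimes
    l = |d - b| of a persistence diagram given as (birth, death) pairs."""
    lifetimes = [abs(d - b) for b, d in diagram]
    total = sum(lifetimes)
    # Normalised lifetimes act as a probability distribution over features.
    probs = [l / total for l in lifetimes if l > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    mean = total / len(lifetimes)
    variance = sum((l - mean) ** 2 for l in lifetimes) / len(lifetimes)
    return entropy, mean, variance

# Two equally persistent features versus one dominant feature with noise.
print(lifetime_statistics([(0.0, 2.0), (1.0, 3.0)]))
print(lifetime_statistics([(0.0, 4.0), (1.0, 1.1), (2.0, 2.1)]))
```

The entropy is largest when all features have equal lifetimes and decreases when a few long-lived features dominate, so it gives a single number that is sensitive to how the total persistence is distributed.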
Often the relevant features are unknown a priori and it is preferable to compute a high-dimensional vectorisation to minimise the loss of relevant information. For example, the persistence landscape provides a stable and invertible (i.e. information-preserving) vectorisation of persistence diagrams [22,23].
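The persistence landscape can be sketched as follows. Each feature (b, d) contributes a "tent" function max(0, min(t − b, d − t)) of the scale t, and the kth landscape layer at scale t is the kth largest tent value. Sampling the layers on a fixed grid of scales yields a fixed-length vector. The function name `persistence_landscape`, the grid `ts`, and the number of layers are illustrative choices.

```python
def persistence_landscape(diagram, ts, n_layers=2):
    """Sample the first `n_layers` persistence landscape layers of a
    diagram (list of (birth, death) pairs) at the scales in `ts`."""
    layers = [[0.0] * len(ts) for _ in range(n_layers)]
    for idx, t in enumerate(ts):
        # Tent function of every feature at scale t, largest first.
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram),
                       reverse=True)
        for k in range(min(n_layers, len(tents))):
            layers[k][idx] = tents[k]
    return layers

diagram = [(0.0, 4.0), (1.0, 3.0)]
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
for layer in persistence_landscape(diagram, ts):
    print(layer)
```

For this nested pair of features the first layer follows the tent of the longer-lived feature, while the second layer is non-zero only at t = 2, where both features are alive. Landscapes from different diagrams can be averaged or compared point by point because they live on a common grid.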
Using a distance measure or vectorisation allows one to combine persistent homology with powerful machine learning techniques such as artificial neural networks or clustering algorithms to compare topological features of different datasets and perform tasks including shape-based identification and classification of different point clouds, which will be explained further in Sec. IV. However, one important consideration in applying vectorisation or distance measures is that they can introduce additional hyper-parameters that may affect the sensitivity to different topological features of the data.
There are a variety of software libraries for computing persistence diagrams, their vectorisation, and distance measures [24][25][26][27], surveyed in Ref. [28]. Crucial for applications, persistence diagrams can be efficiently computed given a filtration by building up the simplicial complex one element at a time, detecting any changes to the topological features at each step. This yields not only the feature birth and death scales, but also their representations, e.g. edges comprising a cycle. Nevertheless, due to the combinatorial nature of simplicial complexes the computational requirements grow rapidly with the feature dimension k, with most practical applications limited to k ≤ 2.
To compute a persistence diagram the end-user must provide at a minimum either the data points or a distance matrix encoding pairwise distances between points. One can also consider custom filtrations. For example, when dealing with image data one can use the greyscale pixel values as a filtration parameter, constructing a simplicial complex out of pixels less than (or exceeding) a given threshold [29,30]. The resulting sublevel (superlevel) set filtration summarises the critical points of an image, i.e. its local minima, maxima, and saddle points, as well as their higher-dimensional generalisations.
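The sublevel set filtration of an image can be sketched for clusters (zero-dimensional features) as follows. Pixels are added in order of increasing greyscale value; a new cluster is born at each local minimum, and when two clusters meet the younger one dies (the "elder rule"). As a simplifying assumption the sketch links pixels to their four nearest neighbours on the grid rather than building a full simplicial complex, and it tracks clusters only. The function name `sublevel_persistence` and the test image are illustrative.

```python
import math

def sublevel_persistence(image):
    """0-dimensional sublevel-set persistence of a 2D greyscale image
    given as a list of rows. Returns sorted (birth, death) pairs."""
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c)
                   for r in range(rows) for c in range(cols))
    parent = {}  # union-find over pixels added so far
    birth = {}   # birth value of the cluster rooted at each pixel

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p], birth[p] = p, value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # neighbour lies outside the grid or is not added yet
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            # Elder rule: the cluster with the larger birth value dies.
            old, young = (rp, rq) if birth[rp] <= birth[rq] else (rq, rp)
            if birth[young] < value:  # skip zero-lifetime features
                pairs.append((birth[young], value))
            parent[young] = old
    # The cluster born at the global minimum never dies.
    pairs.append((order[0][0], math.inf))
    return sorted(pairs)

# Four corner minima separated by ridges of different heights.
image = [[0, 4, 1],
         [6, 6, 6],
         [2, 5, 3]]
print(sublevel_persistence(image))
```

Each finite pair records a local minimum (birth) and the saddle value at which its basin merges into a deeper one (death), which is the sense in which the filtration summarises the critical points of the image. Applying the same function to the negated image gives the superlevel set version.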

C. Other approaches and recent developments
The above discussion of persistent homology has been limited to the simplest case of simplicial complexes constructed from two-dimensional point clouds. There are a variety of related techniques for studying complex datasets by reducing them to families of graphs or simplicial complexes which we only mention briefly here due to space constraints.
The Mapper algorithm reduces point clouds to simpler low-dimensional graphs by performing clustering on overlapping subsets of the data [31,32]. Local anomalies such as intersections and cusps can be similarly detected by comparing the persistent homology of different subsets of the data [33].
Standard persistent homology constructs filtrations as a sequence of nested simplicial complexes; as the filtration parameter (e.g. cutoff distance) is increased edges and higher-dimensional simplices are added to the complex and never removed. In certain situations, e.g. when studying temporal network dynamics, simplices can be both added and removed as a control parameter is varied. Zigzag persistence is a technique that enables the identification of significant topological features in this case [34].
Another important problem is to compute persistent topological features as multiple control parameters are varied, which is termed multidimensional persistence [35]. This problem is a lot more complicated than the single parameter case, due to the absence of simple persistence diagram representations.
We considered examples where point clouds are used to construct undirected graphs and simplicial complexes, encoded by matrices with binary elements {0, 1}, denoting whether a simplex is present or absent. Persistent homology can also be calculated with respect to other fields such as integers modulo 3, describing e.g. directed graphs or simplices, which can be useful for analysing data with twists including points sampled from the surface of Möbius strips [25].

A. Early examples
Early applications of TDA appearing in physics journals in the 2000s considered examples where the underlying data already has a well-defined shape or graph structure, making the construction of graphs more straightforward. Examples include dynamical systems [36], random clouds of spheres [37], random networks [38], and binary image data [39,40].
In the case of over-sampled time-series measurements of dynamical systems, the sampled points will form a single continuous curve in the absence of noise. This fact can be used for topological filtering of certain types of noise, e.g. when a small fraction of the measured points are perturbed, as shown in Fig. 5(a). By computing the scale-dependent distribution of zeroth Betti numbers B_0 one can separate points belonging to the dynamical trajectory (forming a single big cluster) from noise-perturbed points (each forming a separate cluster), filtering out the latter in Fig. 5(b) and improving the accuracy of estimated Lyapunov exponents [36].
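The filtering step can be sketched in simplified form as follows: at a fixed cutoff slightly larger than the sampling spacing, the points in the largest connected cluster are kept and all others are discarded. Using a single hand-picked cutoff instead of the full scale-dependent analysis of Ref. [36] is an assumption of this sketch, and the function name `largest_cluster` and the synthetic trajectory are illustrative.

```python
import math
from collections import deque

def largest_cluster(points, cutoff):
    """Return the sorted indices of the largest cluster of the graph
    linking points separated by at most `cutoff`."""
    unvisited = set(range(len(points)))
    best = []
    while unvisited:
        start = unvisited.pop()
        cluster, queue = [start], deque([start])
        while queue:  # breadth-first search over the cutoff graph
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                queue.append(j)
        if len(cluster) > len(best):
            best = cluster
    return sorted(best)

# Synthetic example: a densely sampled straight trajectory plus two
# strongly perturbed points.
trajectory = [(0.1 * k, 0.0) for k in range(20)]
outliers = [(0.5, 3.0), (1.2, -2.5)]
points = trajectory + outliers
kept = largest_cluster(points, cutoff=0.15)
print(len(kept), "of", len(points), "points kept")
```

The twenty trajectory points are retained because neighbouring samples are closer than the cutoff, while each perturbed point forms its own cluster and is removed.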
A second early application was the analysis of convection in two-dimensional fluids under heating [39,40]. There, the fluid separates into distinct hot and cold regions, illustrated in Fig. 5(c). In this case the zeroth and first Betti numbers B_0 and B_1 were used to characterise the shape of the hot and cold regions. The scaling of the number of distinct microstates (shapes) with the area of the fluid yields an effective dimension of the dynamics that could be computed more efficiently than the conventional approach based on the singular value decomposition of the images' two-point correlation functions. One application of the effective dimension is the detection of boundary effects, shown in Fig. 5(d). The strong contrast between hot and cold regions of the images in this case meant that persistent homology was not required; analysis of the graph formed at a single cutoff scale was sufficient.

B. Persistent homology of point clouds and images
In many applications the construction of a well-defined shape from the data is less straightforward, or one may be interested in identifying structures present at different spatial scales. For example, in the case of point cloud data it may be difficult to assign a size or radius to the individual points. In other cases one may want to apply intuition obtained from simple analytically-solvable limits to more realistic systems [41,42]. In situations such as these persistent homology becomes a powerful tool for extracting meaningful shape information from the raw data.
For example, suppose we wish to study the microscopic structure of materials. The raw data naturally consists of the positions of the constituent atoms and their sizes. Persistent homology enables studying the multi-scale structure of materials using just the positions of atoms in three-dimensional space (obtained from imaging or simulations) together with the standard Euclidean distance. Refs. [43,44] used persistence diagrams computed from molecular dynamics simulations of various materials exhibiting glassy phases to characterise their structure. Figure 6 shows examples of persistence diagrams obtained for liquid, glass, and crystalline phases of silica. In the crystalline phase the clustering of feature births and deaths reveals scales corresponding to the bond lengths of the material, i.e. separations between the constituent atoms. Moreover, inspection of the cycles corresponding to persistent features also reveals the nature of the short-range order appearing in the glass phase.
Subsequent works applied similar techniques to amorphous ices [45], granular media [46,47], spin configurations in lattice spin models and gauge theories [48,49], and two-dimensional materials, where the measures obtained using persistent homology can be directly compared with more standard metrics [50].
Another application of the point cloud formalism concerns the analysis of time series signals including detection of chaotic dynamics. Already in the 1990s there was interest in applying computational topology to study the shape of the dynamics in phase space, including quantifying the shape of chaotic attractors [51]. Here the key ingredient is Takens' embedding theorem, which states that a sequence of observations φ_t taken at regular time intervals τ can be used to reconstruct the shape of the dynamics by constructing a point cloud of n-dimensional vectors v_t = (φ_t, φ_{t−τ}, φ_{t−2τ}, ...), provided the embedding dimension n is sufficiently large. Persistent homology enables the systematic study of dynamics via the shape of the point clouds in the high-dimensional embedding space [52][53][54]. For example, period-doubling transitions can be detected via the emergence of new persistent clusters. Successive period-doubling transitions as a system approaches the chaotic regime result in the creation of many clusters, which merge into a single line or volume.
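The delay embedding itself is straightforward to sketch. The function below builds the vectors v_t = (φ_t, φ_{t−τ}, ..., φ_{t−(n−1)τ}) from a sampled signal, with the delay τ expressed as an integer number of samples. The function name `delay_embedding` and the toy period-four signal are illustrative.

```python
def delay_embedding(signal, dim, tau):
    """Return the delay vectors (phi_t, phi_{t-tau}, ..., phi_{t-(dim-1)tau})
    for every t at which all required samples exist."""
    start = (dim - 1) * tau
    return [tuple(signal[t - k * tau] for k in range(dim))
            for t in range(start, len(signal))]

# A period-four signal embedded in two dimensions with a delay of one sample.
signal = [0, 1, 0, -1] * 5
cloud = delay_embedding(signal, dim=2, tau=1)
print(sorted(set(cloud)))
```

The periodic signal is mapped to just four distinct points, i.e. four persistent clusters in the embedding space. A signal with a longer period visits more distinct points, which is how period doubling shows up as the emergence of new clusters.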
Applying persistent homology to image data enables the study of shapes of images in which there may not be a clear distinction between "bright" and "dark" regions, or in images where structural information at multiple intensity scales is important. Large point cloud datasets for which a direct persistent homology calculation may be quite time-consuming can alternatively be studied using image filtrations by converting the cloud to a density image [55].
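A minimal sketch of the conversion from a point cloud to a density image is a two-dimensional histogram that counts the points falling into each pixel. The function name `density_image`, the square grid, and the convention that rows index y and columns index x are assumptions of this sketch.

```python
def density_image(points, bins, bounds):
    """Count the points of a 2D cloud in each cell of a bins x bins grid.

    `bounds` is ((xmin, xmax), (ymin, ymax)); points outside are ignored.
    Rows of the returned image index y, columns index x.
    """
    (xmin, xmax), (ymin, ymax) = bounds
    image = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        # Points on the upper boundary are assigned to the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * bins), bins - 1)
        row = min(int((y - ymin) / (ymax - ymin) * bins), bins - 1)
        image[row][col] += 1
    return image

points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (1.7, 0.3)]
print(density_image(points, bins=2, bounds=((0.0, 1.0), (0.0, 1.0))))
```

The cost of the subsequent image filtration depends on the number of pixels rather than the number of points, at the price of introducing the pixel size as an additional resolution parameter.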
Early works on persistent homology of images used Betti numbers to characterise solar magnetic field distributions [56] and force networks in different kinds of compressed granular media, studying the number and connectivity of regions at different scales [57,58]. More recently, persistence diagrams obtained from images have been used to study non-Gaussian temperature fluctuations in the cosmic microwave background [59], the shape of iso-frequency contours in photonic crystals [60], many-body dynamics and solitons in Bose-Einstein condensates [61,62], phase transitions in spin models [63,64], and order-disorder transitions in nematic liquid crystals [65] and optical waveguide lattices [66]. Recent work aims to better understand how to relate the shape information captured by TDA to physical properties including the permeability of fractured materials [67].

C. Finding meaning using abstract distance measures
So far we have considered examples where we have some intuitive notion of the shape of the underlying data, and the role of TDA has been to study these shapes more systematically. One exciting emerging application of TDA is in studying and discovering structure in complex systems for which simple visualisations (such as images or phase space trajectories) do not exist, including families of high-energy physics models [68][69][70]. This typically requires the identification of a suitable distance measure for the data.
For example, in the case of quantum many-body systems measures of entanglement between pairs of subsystems such as the concurrence or entanglement entropy can be used to study the abstract shapes of quantum states and group them into different classes [71][72][73]. Understanding this entanglement structure may be helpful for judging when approximation techniques such as tensor networks may be used to efficiently simulate the system of interest.
Another important application of abstract distance measures is in the study of condensed matter systems at finite temperatures, where one would like to quantify the "shape" of an ensemble of system configurations sampled at a given temperature to detect phase transitions and critical points [48,63,64,74]. There are various notions of distance that can be applied in this context including the geodesic distance between different spin configurations [75] and the quantum distance based on the overlap between eigenfunctions [60,76]. Tests of the Anderson, Hubbard, and Potts models suggest TDA may be useful for precisely detecting critical points without requiring computationally-expensive finite size scaling analysis [77,78].

IV. MACHINE LEARNING FOR PHYSICS USING TOPOLOGICAL DATA ANALYSIS
A. Applications of machine learning to physics
Machine learning offers powerful data-driven approaches for modelling, characterising, and designing complex physical systems [6][7][8]10], including topological materials [79][80][81][82][83][84][85]. Two classes of machine learning approaches attracting interest among physicists are supervised and unsupervised learning algorithms. Supervised learning aims to correctly classify new observations after being trained on a set of labelled examples. Unsupervised learning aims to detect novel features in unlabelled datasets, e.g. by grouping similar observations into clusters or identifying outliers.
Dealing with the deluge of data generated by high energy physics experiments was an early application of large scale machine learning techniques to physics [6,86]. Anomaly-detection techniques are used to identify the small fraction of interesting events to be recorded and processed further. Supervised learning techniques based on human-labelled or computer-generated examples can be used to convert the high-dimensional raw detector data (e.g. particle trajectories and deposited energies) into a signal of interest (e.g. the type of particles generated). Similar techniques are now being adopted in other fields involving high repetition rate experiments, including reconstructing ultrashort optical pulses [87], identifying solitons in Bose-Einstein condensates [88], and optimizing the fidelity of quantum gates [89]. In all these examples, machine learning can be used to perform tasks faster and at a larger scale than conventional approaches.
The performance of machine learning algorithms is closely tied to the quality and quantity of the input data; the machine learning model needs a sufficiently large set of relevant observations to make accurate predictions. On the one hand, the computational costs of machine learning algorithms can be enormous when they are applied to real-world problems involving large-scale datasets. On the other hand, in many physics problems the amount of available data may be highly constrained (e.g. due to high costs of fabrication, characterization, or computational resources), making approaches compatible with sparse datasets essential.
B. Combining TDA with machine learning
TDA methods are promising as a means of enhancing the performance of machine learning methods [15]. Instead of feeding all observables of the system of interest (e.g. entire images or full many-body quantum wavefunctions) into the machine learning algorithm, TDA can identify a smaller set of relevant topological features which can be used as input into a simpler and faster machine learning model. In particular, TDA methods seem naturally suited to studying phenomena such as topological phase transitions or other global structural changes which may be difficult to capture using conventional techniques.
As noted in Sec. II B, a key challenge in combining TDA with machine learning is the question of how best to convert the information encoded in persistence diagrams into a format usable by machine learning algorithms. The two main approaches are based on distance measures and on vectorisation.
Refs. [48,76,77,78,90] have used distance measures of persistence diagrams to compute distance matrices used as inputs for kernel-based machine learning algorithms for supervised and unsupervised detection of phase transitions in several lattice models including the Ising, XY, and Heisenberg spin models [64,77] and classification of biological time series [54,91,92]. There are many possible metrics to use. In practice the Wasserstein and bottleneck distances [76,77] can be time-consuming to compute. There are faster alternatives including the sliced Wasserstein distance [48] and Fisher kernel [90]; however, they introduce additional hyper-parameters which need to be optimised.
Vectorisation-based approaches can reduce the persistence diagrams into a simpler format that is more easily interpretable, at the expense of losing some of the contained information. For example, Refs. [62,66,74] employed simple summary statistics of the feature lifetimes including their Shannon entropy and norms to verify that persistent homology does indeed detect relevant features that can be used to train machine learning models. Alternatively, Refs. [63,64,93] employed persistence images [94], which form a discretised representation of persistence diagrams. One word of caution in the use of the persistence images is that their construction involves hyperparameters that should be optimised to obtain good performance [15].
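The role of these hyper-parameters is visible in a minimal sketch of a persistence image. Each feature is mapped to (birth, lifetime) coordinates and replaced by a Gaussian of width `sigma` weighted by its lifetime; the sum of Gaussians is then sampled on a `resolution` × `resolution` grid. Sampling at pixel centres instead of integrating over each pixel, the linear weighting, and the function name `persistence_image` are simplifying assumptions of this sketch rather than the precise construction of Ref. [94].

```python
import math

def persistence_image(diagram, resolution, max_scale, sigma):
    """Sample a lifetime-weighted sum of Gaussians centred on the
    (birth, lifetime) coordinates of each feature.

    Returns a resolution x resolution list of rows covering
    [0, max_scale] in both birth (columns) and lifetime (rows).
    """
    step = max_scale / resolution
    centres = [(k + 0.5) * step for k in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for b, d in diagram:
        life = d - b
        for r, l in enumerate(centres):      # rows: lifetime
            for c, x in enumerate(centres):  # columns: birth
                dist2 = (x - b) ** 2 + (l - life) ** 2
                # Weighting by lifetime suppresses features near the diagonal.
                image[r][c] += life * math.exp(-dist2 / (2 * sigma ** 2))
    return image

image = persistence_image([(0.5, 2.0)], resolution=2, max_scale=2.0, sigma=0.5)
for row in image:
    print([round(v, 3) for v in row])
```

Flattening the rows gives a fixed-length feature vector. The grid resolution, the Gaussian width, and the weighting function all influence which features of the diagram remain distinguishable, and they are the hyper-parameters that need to be tuned.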
Once the persistence diagrams have been converted into a format usable by machine learning algorithms, the final step is to choose the specific machine learning model. Several studies have considered supervised classification using logistic regression and support vector machines, which find an optimal separating hyperplane between different data classes [62][63][64]. More recent studies comparing various machine learning models suggest that classification and clustering based on topological features can still be a highly nonlinear problem, making nonlinear machine learning models such as multidimensional scaling, k-nearest neighbours, or artificial neural networks a better choice [48,64,74,93]. Even when neural network methods are required to obtain an accurate model, the use of TDA-based input features can offer significant reductions in the required width and depth of the networks, making them easier to train [93].

C. Learning phase transitions using TDA
The prospect of discovering novel phases of matter motivates studies of machine learning-based approaches for detecting phase transitions. Supervised learning methods can make use of labelled data drawn from known phases or exactly-solvable limits to draw inferences about the location of phase boundaries [95,96]. On the other hand, unsupervised methods such as manifold learning use an appropriately chosen similarity measure to compare different samples and group them into different classes, without requiring precise knowledge of the number of distinct phases [79][80][81][82][83][84][85].
The reliable detection of phase transitions using machine learning requires use of an appropriate cost function or similarity measure that is sensitive to the transition of interest. For example, topological phase transitions require the bulk band gap of the system to close and re-open, motivating the use of non-local similarity measures invariant under gap-preserving deformations [81] or sensitive to points at which the gap closes [82,85]. These measures typically involve hyper-parameters such as the kernel resolution which must be chosen carefully to ensure good accuracy.
By directly capturing shape information of the system of interest, persistent homology-based methods are able to capture phase transitions using simpler models with fewer hyper-parameters. Figure 7 shows an example of a persistent homology-based machine learning pipeline for studying phase transitions in the two-dimensional XY model [64]. Persistence diagrams are computed for a given spin configuration based on the relative angle between neighbouring spins. The persistence diagram is then vectorised into a persistence landscape encoding the probability of obtaining features with a given birth scale and lifetime, which can be averaged over spin configurations sampled at a given temperature. The averaged persistence landscapes obtained for various temperatures are then used to train a machine learning model, such as logistic regression, which estimates the location of the phase transition using the training data.
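The final stage of such a pipeline can be sketched as follows. This is a minimal illustration only, not the implementation of Ref. [64]: the feature vectors are synthetic stand-ins for temperature-averaged vectorised persistence diagrams, and the names T_STAR, pattern, fit_logistic and crossing_temperature are hypothetical. A logistic regression is trained on samples labelled from deep within each phase, and the transition is estimated as the temperature at which the predicted probability crosses 1/2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=5000):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def crossing_temperature(temps, probs, level=0.5):
    """Linearly interpolate the first temperature at which probs crosses level."""
    for i in range(len(temps) - 1):
        p0, p1 = probs[i], probs[i + 1]
        if p0 != p1 and (p0 - level) * (p1 - level) <= 0:
            frac = (level - p0) / (p1 - p0)
            return float(temps[i] + frac * (temps[i + 1] - temps[i]))
    return None

# ASSUMPTION: synthetic 8-component feature vectors, one per temperature,
# standing in for averaged vectorised persistence diagrams. The crossover
# at T_STAR is built into the toy data by hand.
rng = np.random.default_rng(0)
temps = np.linspace(0.2, 1.8, 33)
T_STAR = 1.0
pattern = np.linspace(0.5, 1.5, 8)
order = np.tanh((temps - T_STAR) / 0.2)
features = order[:, None] * pattern[None, :] + 0.05 * rng.normal(size=(33, 8))

# Label only the samples far from the crossover, as one would with known limits.
low, high = temps < 0.6, temps > 1.4
X_train = np.vstack([features[low], features[high]])
y_train = np.concatenate([np.zeros(low.sum()), np.ones(high.sum())])

w, b = fit_logistic(X_train, y_train)
probs = sigmoid(features @ w + b)
t_est = crossing_temperature(temps, probs)
print("estimated transition temperature:", t_est)
```

With real data the feature matrix would be replaced by the averaged vectorised diagrams, and the choice of labelled temperature windows becomes a hyper-parameter whose influence on the estimate should be checked.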

A. New techniques for topological data analysis
An area of active research among physicists is the application of TDA tools to analyse the structure of more complex systems including flow networks involving directed links [97] and time-evolving networks [92]. One approach used in recent studies that is compatible with standard persistent homology tools is to convert the directed network into a regular point cloud using a diffusion map, which constructs edges between a pair of vertices (i, j) by computing the probability of diffusion between i and j. It will be interesting to explore alternate approaches that can work directly with unidirectional or time-evolving systems without requiring diffusion maps, such as zigzag persistence [98].
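The idea of turning a directed network into a point cloud via diffusion can be sketched as follows. This is one simple variant chosen for illustration and is not necessarily the construction used in the cited studies: each vertex i is mapped to its row of t-step random-walk transition probabilities, and the Euclidean distances between these rows give a symmetric distance matrix that standard persistent homology tools accept. The function names and the six-vertex example network are hypothetical.

```python
import numpy as np

def diffusion_point_cloud(adjacency, steps=3):
    """Map each vertex to its vector of `steps`-step diffusion probabilities."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_weight = A.sum(axis=1, keepdims=True)
    safe = np.where(out_weight > 0, out_weight, 1.0)
    # Row-normalise; vertices without outgoing links jump uniformly (assumption).
    P = np.where(out_weight > 0, A / safe, 1.0 / n)
    return np.linalg.matrix_power(P, steps)

def pairwise_distances(points):
    """Euclidean distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical example: two directed triangles joined by a one-way link 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = 1.0

cloud = diffusion_point_cloud(A, steps=3)
D = pairwise_distances(cloud)
print(np.round(D, 3))
```

Although the network is directed, the resulting distance matrix is symmetric by construction, which is what makes it compatible with standard Vietoris-Rips filtrations; the price is that the number of diffusion steps becomes an additional hyper-parameter.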
The metrics used for quantifying differences between persistence diagrams have applications beyond persistent homology. For example, Ref. [99] used the Wasserstein distance to compare different local neighbourhood structures of disordered media, based on the intuition that it encodes the energy cost required to transform one configuration to another. The advantage of such a topological metric compared to more conventional measures including the Kullback-Leibler divergence is that the former is better at distinguishing weakly-overlapping distributions. Are there other examples where such metrics can be linked to physical observables?
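The difference in behaviour between the two measures can be illustrated with a toy calculation. The sketch below compares two identical box-shaped histograms on a shared one-dimensional grid as one is displaced relative to the other; it concerns ordinary probability distributions rather than persistence diagrams, and all function names and parameters are hypothetical choices for illustration.

```python
import numpy as np

def box_histogram(n_bins, start, width):
    """Normalised histogram that is uniform on bins [start, start + width)."""
    h = np.zeros(n_bins)
    h[start:start + width] = 1.0 / width
    return h

def wasserstein_1d(p, q, dx):
    """1-Wasserstein distance on a uniform grid via the difference of CDFs."""
    return float(np.abs(np.cumsum(p - q)).sum() * dx)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q); infinite if q vanishes where p > 0."""
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

dx = 0.1  # grid spacing (assumed units)
p = box_histogram(200, start=40, width=10)
for shift_bins in (20, 40, 80):
    q = box_histogram(200, start=40 + shift_bins, width=10)
    print(shift_bins * dx, wasserstein_1d(p, q, dx), kl_divergence(p, q))
```

Once the supports no longer overlap, the Kullback-Leibler divergence is infinite regardless of how far apart the two distributions are, whereas the Wasserstein distance continues to grow in proportion to the displacement and so still ranks the configurations by how much they differ.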

B. Quantum topological data analysis
All of the examples considered in the physics literature relate to the study of low-dimensional topological features using TDA, largely because higher dimensional features are both harder to interpret and become extremely time-consuming to compute for large datasets, due to exponential scaling. The advent of more efficient quantum algorithms for TDA including computation of Betti numbers and persistence diagrams is anticipated to enable the study of higher-dimensional topological features of complex datasets.
The first quantum algorithm for TDA was proposed by Lloyd et al. in 2016 [100]. Their algorithm exhibited an exponential speedup for calculating Betti numbers by using quantum phase estimation to efficiently construct combinatorial Laplacians of simplicial complexes and identify cycles by computing their zero modes. This proposal was followed in 2018 by a small-scale few-qubit proof-of-concept quantum optics experiment [101].
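The linear algebra underlying this approach can be checked classically on a small example. The sketch below is not the quantum algorithm of Ref. [100]; it only illustrates, under that caveat, that the k-th Betti number equals the number of zero modes of the k-th combinatorial Laplacian, which is built from the boundary operators. The function names and the input convention (a list of simplices per dimension, each a sorted tuple of vertex labels) are assumptions made for this illustration.

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary operator mapping k-simplices to (k-1)-simplices."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    B = np.zeros((len(km1_simplices), len(k_simplices)))
    for col, simplex in enumerate(k_simplices):
        for drop in range(len(simplex)):
            face = simplex[:drop] + simplex[drop + 1:]
            B[index[face], col] = (-1) ** drop  # alternating orientation sign
    return B

def betti_numbers(simplices_by_dim, tol=1e-9):
    """Count zero eigenvalues of each combinatorial Laplacian L_k."""
    top = len(simplices_by_dim) - 1
    bettis = []
    for k in range(top + 1):
        n_k = len(simplices_by_dim[k])
        L = np.zeros((n_k, n_k))
        if k > 0:
            down = boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
            L += down.T @ down
        if k < top:
            up = boundary_matrix(simplices_by_dim[k + 1], simplices_by_dim[k])
            L += up @ up.T
        eigenvalues = np.linalg.eigvalsh(L)
        bettis.append(int(np.sum(np.abs(eigenvalues) < tol)))
    return bettis

vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow_triangle = [vertices, edges]
filled_triangle = [vertices, edges, [(0, 1, 2)]]

print(betti_numbers(hollow_triangle))  # one component, one loop
print(betti_numbers(filled_triangle))  # the loop is filled in by the face
```

The classical cost of this procedure is dominated by the number of simplices, which grows rapidly with the dimension k and the number of data points; this is the bottleneck that the quantum proposals aim to circumvent.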
Subsequent studies have started to address limitations of the first quantum TDA algorithm [102] by proposing more efficient variants [103][104][105] as well as quantum algorithms for computing persistent Betti numbers [106,107] and the Wasserstein distance [108]. While most of these algorithms are designed for future fault-tolerant quantum computers, there is also the potential for near-term quantum speedups using shallow quantum circuits with depth linear in the number of input data points [109,110], exploiting the efficient implementation of the boundary operator using entangled quantum states [111][112][113]. While the prospect of an exponential speedup compared to the best classical TDA algorithms is entrancing, whether and when such a large speedup will be achieved for practical problems is under debate [103,104,110,114,115], especially with new and improved classical algorithms still being developed [116].

C. Learning physics versus machine learning physics
One challenge encountered by the existing literature applying TDA to physics problems is that the techniques are unfamiliar to the physics audience. Many of the original TDA articles in which techniques were first introduced are highly theoretical and mathematically rigorous, so articles in physics journals require long introductions explaining the approaches used to this non-specialist audience. This can lead to a focus on the technical details of the calculation while perhaps obscuring the bigger picture.
For instance, many articles include examples of persistence diagrams computed from the system of interest in order to illustrate qualitative differences between different phases or states. However, the persistence diagram is itself not easily interpretable, requiring knowledge of the form of the input data and filtration used. For this reason, many studies then apply machine learning techniques to extract quantitative predictions from the information contained in the persistence diagrams.
On the other hand, as physicists we would prefer to make sense of the system ourselves, rather than delegate understanding to a machine learning algorithm. What interests us is which topological features are meaningful, and what they look like. The approach used in Ref. [44], where representative cycles of the persistent features are included as insets in the persistence diagrams [reproduced here in Fig. 6(b)], is one way their meaning can be made more explicit. Still, selecting appropriate features can be challenging when there is no clear boundary between the signal and the noise. It will be interesting to explore other TDA-based techniques for dimensionality reduction and for compactly conveying the significant features of high-dimensional physical systems to non-specialists in TDA.

VI. CONCLUSION
In summary, we have given an overview of emerging physics applications of topological data analysis, focusing on persistent homology. The take-home message is that TDA can be used to compress complex datasets into their essential (topological) features, which can then be used as input to machine learning models that are simpler than the widely used and computationally expensive artificial neural networks. Nevertheless, as topological data analysis is relatively new, it is still largely employed on an ad hoc basis and further work is needed to establish a standard set of methods that non-specialists can trust [15].
Topological data analysis has already been fruitfully applied to other areas of research including image analysis and medical science, enabling the extraction of useful insights from complicated hard-to-visualise datasets. For example, TDA-based methods have outperformed other more popular machine learning approaches for complicated problems such as predicting biomolecule binding efficiencies [117]. We hope that the techniques discussed here and in other recent reviews aimed at the physics audience [11,12] will not merely provide a transient fashionable alternative to more standard methods of data analysis used by physicists, but will form a new set of long-lasting tools enabling a better understanding of complex physical systems from classical to quantum.

DISCLOSURE STATEMENT
We declare no potential conflicts of interest.