Quantum Machine Learning in High Energy Physics

Machine learning has been used in high energy physics for a long time, primarily at the analysis level with supervised classification. Quantum computing was postulated in the early 1980s as a way to perform computations that would not be tractable with a classical computer. With the advent of noisy intermediate-scale quantum computing devices, more quantum algorithms are being developed with the aim of exploiting the capacity of the hardware for machine learning applications. An interesting question is whether there are ways to combine quantum machine learning with high energy physics. This paper reviews the first generation of ideas that apply quantum machine learning to problems in high energy physics and provides an outlook on future applications.


Introduction
Particle physics is a branch of science aiming at understanding the fundamental laws of nature by studying the most elementary components of matter and forces. This can be done in controlled environments with particle accelerators such as the Large Hadron Collider (LHC), or in uncontrolled environments such as cataclysmic events in the cosmos. The Standard Model of particle physics is the accomplishment of decades of theoretical work and experimentation. While it is an extremely successful effective theory, it does not allow the integration of gravity, and is known to have limitations. Experimentation in particle physics requires large and complex datasets, which poses specific challenges in data processing and analysis.
Recently, machine learning has been playing a significant role in physical sciences. In particular, we are observing an increasing number of applications of deep learning to various problems in particle physics and astrophysics. Beyond typical classical approaches [1] (boosted decision tree, support vector machine, etc.), state-of-the-art deep learning techniques (convolutional neural networks, recurrent models, geometric deep learning, etc.) are being explored for prototype applications and successfully deployed in several tasks [2,3].
The ambitious high luminosity LHC (HL-LHC) program in the next two decades and beyond will require enormous computing resources. New technologies such as quantum machine learning could possibly help overcome this computational challenge. The recent development of quantum computing platforms and simulators available for public experimentation has led to a general acceleration of research on quantum algorithms and applications. In particular, quantum algorithms have recently been proposed to tackle the computational challenges faced in particle physics data processing and analysis. Beyond explicitly writing quantum algorithms for specific tasks [4][5][6][7][8], quantum machine learning is a way to learn quantum algorithms that achieve a specific task, similarly to classical machine learning.
This review of how quantum machine learning is used in high energy physics (HEP) is organized as follows. An overview of the fields of quantum computing and quantum machine learning is first provided in Sections 2 and 3. We review the applications of quantum machine learning algorithms to particle physics using quantum annealing in Section 4 and quantum circuits in Section 5. We provide a view of ongoing unpublished work and upcoming results in Section 6. We conclude with discussions on the future of quantum machine learning applications in HEP in Section 7.

Quantum Computing
After more than three decades since Richard Feynman's proposal of performing simulations using quantum phenomena [9], the first practical quantum computers are finally being built. The scope of calculations has significantly expanded beyond simulations, with a range of promising applications emerging, including optimization [10][11][12], chemistry [13,14] and machine learning [15][16][17].

Quantum circuit model
Quantum computers were formally defined for the first time by David Deutsch in his 1985 seminal paper [18], where he introduced the notion of a quantum Turing machine, a universal quantum computer based on qubits and quantum circuits. In this paradigm, a typical algorithm consists of applying a finite number of quantum gates (unitary operations) to an initial quantum state, and measuring the expectation value of the final state in a given basis at the end. Deutsch found a simple task that a quantum computer could solve in fewer steps than any classical algorithm, thereby showing that quantum Turing machines are fundamentally different from, and can be more powerful than, classical Turing machines. Since then, many quantum algorithms with a lower computational complexity than all known classical algorithms have been discovered, the most well-known example being Shor's algorithm, which factors integers exponentially faster than our best classical algorithm [19]. Other important algorithms include Grover's algorithm, invented in 1996 to search for an element in an unstructured database with a quadratic speed-up [20], and the Harrow-Hassidim-Lloyd (HHL) algorithm, invented in 2008 to solve linear systems of equations [21].
However, all those algorithms require large-scale fault-tolerant quantum computers to be useful, while current and near-term quantum devices are characterized by at least three major drawbacks: (i) Noise: the coherence time (lifetime) of the qubits and the fidelity of each gate (accuracy of the computation) are currently limited in all devices, due to the interaction of each qubit with its surrounding environment, which restricts the depth of practical quantum circuits that can be run on current machines.
(ii) Small number of qubits: most near-term quantum computers have between 50 and 100 qubits, which is not enough for traditional algorithms such as Shor's or Grover's to achieve a quantum advantage over classical algorithms.
(iii) Low connectivity: most current quantum devices have their qubits organized in a certain lattice, where only nearest neighbors can interact. While it is theoretically possible to run any algorithm on a device with limited connectivity (by "swapping" quantum states from qubit to qubit), the quantum advantage of some algorithms can be lost in the process [22].
Therefore, a new class of algorithms, the so-called Noisy Intermediate-Scale Quantum (NISQ) algorithms [23], has started to emerge, with the goal of achieving a quantum advantage with these small noisy devices. One of the main classes of NISQ algorithms is based on the concept of variational circuits: fixed-size circuits with variable parameters that can be optimized to solve a given task. They have shown promising results in quantum chemistry [13] and machine learning [24] and will be discussed in more detail in Section 3.1.

Quantum annealing
Another paradigm of quantum computing, called adiabatic quantum computing (or quantum annealing, QA), was introduced several years after the gate model described above [25,26] and has been implemented, for instance, by the company D-Wave. In theory, this paradigm is computationally equivalent to the circuit model, and Grover's algorithm can for instance be ported to quantum annealing [27]. It is based on the continuous evolution of quantum states to approximate the solution of Quadratic Unconstrained Binary Optimization (QUBO) problems, of the form

min_x Σ_{i<j} J_{ij} x_i x_j + Σ_i h_i x_i,    (1)

where x_i ∈ {0, 1} and the J_{ij} and h_i are real numbers defining the problem. This general problem belongs to the complexity class NP-Hard, meaning that it can probably not be solved exactly in polynomial time even by a quantum computer ‡. Quantum annealing is a heuristic proposed to approximate the solution of a QUBO problem, or even to solve it exactly when the input parameters J_{ij} and h_i have some particular structure [27].

‡ While a proof is still to be found, complexity theorists believe that quantum computers will not lead to exponential speed-ups for NP-Complete or NP-Hard problems.
More precisely, solving a QUBO instance is equivalent to finding the ground-state of the problem Hamiltonian

H_P = Σ_{i<j} J_{ij} σ^z_i σ^z_j + Σ_i h_i σ^z_i,    (2)

where σ^z_i is the Z-Pauli matrix applied to the i-th qubit. Quantum annealing consists of initializing the system in the ground-state of a simpler Hamiltonian, such as

H_I = −Σ_i σ^x_i,

and slowly evolving the system from H_I to H_P during a total time T, for instance by changing the Hamiltonian along the trajectory

H(t) = (1 − t/T) H_I + (t/T) H_P.

The quantum adiabatic theorem tells us that if the transition between the two Hamiltonians is "slow enough", the system stays in the ground-state along the whole trajectory, including at the end for the problem Hamiltonian. Measuring the final state therefore gives the solution to the QUBO problem. The main caveat of this approach is that the maximum admissible speed of the evolution can become increasingly low with the system size (sometimes exponentially low), removing any potential advantage compared to classical algorithms. Knowing whether a given problem (or class of problems) can take advantage of quantum annealing is an open research question, which is why research on quantum annealing applications has been driven largely by empirical studies. Many optimization problems, including in machine learning, can be mapped to a QUBO instance, making quantum annealing an attractive platform for quantum machine learning, as developed in Section 3.2.
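The QUBO form above can be made concrete with a brute-force check on a toy instance; the couplings J and fields h below are made-up values for illustration only:

```python
import itertools

# Toy QUBO instance: minimize sum_{i<j} J_ij x_i x_j + sum_i h_i x_i
# over x_i in {0, 1}. Values are illustrative, not from any physics problem.
J = {(0, 1): 2.0, (0, 2): -1.5, (1, 2): 1.0}
h = [-1.0, 0.5, -0.5]
n = len(h)

def qubo_energy(x):
    """Energy of a bit string under the QUBO defined by J and h."""
    e = sum(h[i] * x[i] for i in range(n))
    e += sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    return e

# Exhaustive search over all 2^n bit strings -- feasible only for tiny n,
# which is exactly why heuristics such as quantum annealing are of interest.
best = min(itertools.product([0, 1], repeat=n), key=qubo_energy)
print(best, qubo_energy(best))  # → (1, 0, 1) -3.0
```

An annealer replaces this exponential enumeration with a physical relaxation toward the ground-state of the corresponding Hamiltonian.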

Quantum Machine Learning
Quantum machine learning has evolved in recent years as a subdiscipline of quantum computing that investigates how quantum computers can be used for machine learning tasks, in other words, how quantum computers can learn from data [17,28]. One can approach this question in three different ways, which reflect similar angles established in quantum computing:
• the foundational approach, which reformulates learning theory in a quantum setting [29,30],
• efforts to find quantum algorithms that speed up machine learning with regard to computational complexity measures [31][32][33][34],
• a near-term perspective that develops new machine learning applications tailor-made for NISQ devices [35].
Currently, classical machine learning is a distinctively empirical discipline, pioneered by research conducted in industry. It is therefore perhaps not surprising that quantum machine learning research is also dominated by the near-term perspective, a fact reflected in the selection of papers discussed in this review.
The near-term perspective of quantum machine learning starts from the quantum devices available today and asks how they can be used to solve a machine learning problem. Predominantly, circuit-based quantum computers have been proposed to compute the prediction of a quantum machine learning model that can be trained classically [24,36], while quantum annealers have been proposed to optimize classical models [37,38].

Quantum circuits as trainable models
A machine learning model can often be written as a function f (x, θ) that depends on an input data point x -for example describing the pixels of an image or a vectorized text document -as well as trainable parameters θ. The result of the model, f , is interpreted as a prediction, for example revealing the label of x in a classification task. For simplicity, we will here assume a scalar output.
We know from the basics of quantum mechanics that the result of a quantum circuit is a measurement with a probabilistic outcome. The expectation value of a quantum observable, however, is a deterministic quantity, such as the average state of a qubit. To estimate the expectation value in practice, one has to repeat the measurement multiple times (so-called shots). The quantum circuit thereby serves as a quantum model.
Associating the expectation value with a machine learning model, making it depend on inputs x and trainable parameters θ, can be done by associating physical control parameters with the input features and individual parameters. For instance, in most circuit-based quantum computers we have control over the rotation angle of qubits. Assuming for now that x is a single scalar, we can therefore rotate one qubit by an angle of exactly x to encode the input §. Using the same strategy for a parameter θ, considered to be a scalar as well for now, we can rotate another (or the same) qubit by an angle θ. Physically, there is no difference in how the inputs and free parameters are treated, but there are profound conceptual differences; see for example [39]. These rotations can be performed as part of a larger quantum algorithm that consists of other gates, and which is described by an overall unitary U(x, θ) that depends on the input and parameter (see Figure 1). The crux is that the expectation value of the circuit with respect to an observable M is now formally given by

f_q(x, θ) = ⟨0| U(x, θ)† M U(x, θ) |0⟩,

and can be interpreted as the prediction for x. In short, the quantum circuit is used as a machine learning model.

§ Note that x has to be rescaled to lie in the interval [0, 2π] for the encoding to be unique.

Figure 1. Example of a variational quantum circuit encoding a scalar input x as well as a free parameter θ into the rotation angles of two different qubits. The three qubits are represented in the standard circuit notation as wires, and gates are represented by symbols acting on the wires. The unitaries V_1 and V_2 summarise other general, fixed quantum operations applied to the qubits. The first qubit is measured at the end, and an expectation value is computed by averaging over measurement results. This expectation value is interpreted as the prediction of a quantum model.
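As a minimal sketch of this idea, the following simulates a one-qubit circuit in which both the input x and the parameter θ enter as Y-rotation angles, and the prediction is the Z expectation value (a toy stand-in for the multi-qubit circuit of Figure 1, not any specific model from the literature):

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def f_q(x, theta):
    """Expectation value <0| U(x,theta)^dagger Z U(x,theta) |0> for the
    toy circuit U = Ry(theta) Ry(x) acting on a single qubit."""
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])
    Z = np.diag([1.0, -1.0])
    return float(state @ Z @ state)

# For this particular circuit the expectation reduces to cos(x + theta):
print(f_q(0.3, 0.5), np.cos(0.8))
```

On hardware the same quantity would be estimated by averaging repeated shot measurements instead of multiplying matrices.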
Of course, the heart of machine learning is to adapt a model to data. The circuit can be trained by adjusting the parameters θ via a classical optimization routine that minimizes a standard cost function comparing predictions with the correct target outputs, such as the mean square loss. Trainable circuits are also known as variational or parametrized circuits (or sometimes, a bit misleadingly, as quantum neural networks), and were initially proposed in quantum chemistry [40]. The optimization can be performed by using the quantum computer to evaluate f_q(x, θ) at different values of θ, and using a classical co-processor to find better candidates for the parameters with respect to the cost function, using either gradient-free or finite-difference based optimization methods.
Inspired by quantum control, quantum machine learning has recently developed an elaborate framework of gradient-based optimization [41,42] that has already been implemented in powerful software frameworks [43,44], and which may prove superior to gradient-free methods as quantum computers grow larger [45]. An essential result was the observation that, in many practically relevant cases, one can compute the analytic or exact gradient from f_q(x, θ + s) and f_q(x, θ − s) for a fixed and known constant s. While this is reminiscent of a finite-difference rule, the important fact is that s is a macroscopic quantity such as π/2, which makes estimating the two values by repeated measurements on a noisy device possible. Furthermore, the resulting gradient is not an approximation, but the true analytic gradient. The ability to compute gradients of variational circuits has potential consequences that reach far beyond quantum machine learning, since it makes quantum computing amenable to the paradigm of differentiable programming.
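The parameter-shift idea can be verified numerically on a single-qubit expectation value, where the macroscopic shift s = π/2 reproduces the exact derivative rather than a finite-difference approximation (a minimal sketch, not tied to any particular framework):

```python
import numpy as np

def expectation(theta):
    """<0| Ry(theta)^dagger Z Ry(theta) |0> = cos(theta) for one qubit."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return float(state @ np.diag([1.0, -1.0]) @ state)

def parameter_shift_grad(f, theta, s=np.pi / 2):
    """Gradient via the parameter-shift rule: unlike finite differences,
    the shift s is a large, fixed angle, so both evaluations are easy to
    estimate on a noisy device, yet the result is the exact derivative."""
    return (f(theta + s) - f(theta - s)) / 2.0

theta = 0.7
# The analytic derivative of cos(theta) is -sin(theta); the rule matches it exactly.
print(parameter_shift_grad(expectation, theta), -np.sin(theta))
```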
Finally, it should be mentioned that there are many other ways that variational circuits are employed in quantum machine learning. For example, the genuinely probabilistic nature of quantum measurements suggests that variational circuits can be used as an ansatz for generative models. In the generative mode, the result of a quantum measurement is interpreted as a sample of a probabilistic machine learning model that defines a probability distribution over the data that may depend on parameters [46,47]. This has amongst other proposals led to quantum generative adversarial networks [48,49].

Quantum Annealers as Optimizers
Quantum annealers represent a different approach to quantum machine learning. As natural optimizers, they outsource the training part of machine learning to quantum computers, rather than the prediction part. Since quantum annealers solve very specific optimization problems, more precisely QUBO problems (see Eq. 1), the central challenge is to rephrase the loss function of a (quantum) machine learning problem in this format.
For example, an interesting and very early proposal [38] recognized that the mean square loss of an ensemble of perceptrons, the simple building blocks of neural nets, can be written as a QUBO problem. A prerequisite is that the weights of the model have to be binary values, a condition that may even offer advantages for machine learning. The approach has been termed QBoost and was tested in one of the first commercial quantum annealers, the D-Wave machine, as early as 2009 [50]. The QUBO structure of quantum annealers has also been exploited for machine learning in anomaly detection, in particular software verification and validation [51].
Another, slightly different idea uses quantum annealers as samplers to support classical training of classical models [37]. In the training of so-called Restricted Boltzmann Machines (RBMs), samples from a Gibbs distribution are required to find better candidates for the parameters in every step. The intimate connections between RBMs and Ising-type models in many-body physics (see also [52] which reveals this connection through the language of tensor networks) suggest that quantum annealers, which are based on interacting spins, can produce samples from such Gibbs distribution. The details, especially when it comes to real hardware, are non-trivial, but successful quantum-assisted training has been demonstrated for small applications [37]. An important question raised as a result of this strategy was how samples from true quantum distributions, such as the Ising model with a transverse field, can be used to train quantum RBMs [53].

Quantum Annealing Applications
For quantum annealers, the two most common approaches to machine learning involve mapping the problem into an optimization problem over the full dataset, and using the quantum device as a sampling engine to solve a difficult gradient calculation problem. The following papers provide examples of these paradigms.

Di-photon Event Classification
The classification of collision events as coming from an expected signal or known background is one of the main particle physics tasks to which machine learning is applied. The Higgs boson, until its discovery in 2012 [54,55], was the missing piece of the Standard Model. The authors of [56] propose the use of quantum annealing to classify events between a Higgs boson decaying to a pair of photons and irreducible background events where two uncorrelated photons are produced. To this end, eight high-level features are measured from the di-photon system. With a view to using the method proposed in [51], so-called quantum adiabatic machine learning (QAML), a list of weak classifiers is computed from those eight features. Using the features and their products as input, 36 weak classifiers c_i(x_τ) are computed, with the output clipped to the range [−1, 1], the signal being represented by positive values. A strong classifier is then constructed from a binary linear combination of the weak classifiers (with parameter w_i ∈ {0, 1} for each weak classifier i).
The parameters w_i are then determined by minimizing a carefully crafted QUBO,

Σ_{i,j} C_{ij} w_i w_j + Σ_i (λ − 2 C_i) w_i,

where C_{ij} = Σ_τ c_i(x_τ) c_j(x_τ) and C_i = Σ_τ c_i(x_τ) y_τ are computed from the values of the weak classifiers on the training set and the event labels y_τ, and λ is a parameter penalizing the participation of too many weak classifiers. As described in Section 2.2, the QUBO is transformed into a problem Hamiltonian H_P with the change of variables σ^z_i ← 2w_i − 1, and further embedded in a machine Hamiltonian to be solved on the device. The optimization is run both on the D-Wave 2X quantum annealer (QA) and with simulated annealing [57,58] (SA), using variable fractions of the training dataset. While SA accurately finds the same ground state as QA, it is unable to reproduce the excited states measured with QA. Each of the lowest-energy states found with QA corresponds to a strong classifier, and the best performing one is selected. The authors note that an importance ranking of the weak classifiers can be obtained by varying the parameter λ. The final performance of the strong classifier is compared with two classical machine learning methods: a boosted decision tree (BDT) and a deep neural network (DNN). The inclusion of the excited states in the construction of the strong classifier with QA brings a slight, though not conclusive, difference in performance compared to the one derived with SA. SA and QA are typically on par, and provide no obvious classification advantage over the BDT and DNN (see Figure 2), although a slight advantage with a small training dataset is noted.
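The structure of such a QUBO can be sketched on synthetic data; the weak-classifier outputs, the λ value and the exact normalisation below are illustrative choices, not those of [56]:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: outputs of N weak classifiers on T events, clipped to
# [-1, 1], plus true labels y in {-1, +1}. All values here are synthetic.
N, T = 4, 50
y = rng.choice([-1, 1], size=T)
# Weak classifiers built as noisy copies of the label, so some are useful.
c = np.clip(y + rng.normal(0.0, 1.0, size=(N, T)), -1, 1)

C_ij = c @ c.T   # correlations between weak classifiers
C_i = c @ y      # correlation of each weak classifier with the labels
lam = 0.5        # penalty on the number of active weak classifiers

def qubo_energy(w):
    """QUBO cost for a binary weight vector w, following the structure of
    the QAML objective (pairwise correlations, label correlations, sparsity
    penalty); the exact normalisation in [56] may differ."""
    w = np.array(w)
    pair = sum(C_ij[i, j] * w[i] * w[j] for i in range(N) for j in range(i + 1, N))
    return pair - float(C_i @ w) + lam * w.sum()

# Brute force stands in for the annealer on this tiny instance.
best = min(itertools.product([0, 1], repeat=N), key=qubo_energy)
print(best, qubo_energy(best))
```

On a real annealer, the low-energy excited states returned alongside the ground state can also be kept and combined, as the authors do.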
The binary linear combination is extended to a continuous linear combination in [59] by running the optimization in an iterative manner. In order to take advantage of the continuous weights, additional weak classifiers, up to N_w in total, are derived from the original ones, and at each iteration t the continuous weights are updated using s_i(t), the result of the optimization of the same Hamiltonian evaluated under a change of variables that shifts and narrows the search region around the current weights. A bit-flip heuristic with decreasing probability is introduced between iterations, as a measure to regularize the procedure and prevent it from landing in a local minimum. The authors note that other such heuristics might provide a better final accuracy. The size of the problem Hamiltonian compared to the connectivity of the hardware is such that the authors prune cross terms with low values and use a procedure provided by D-Wave to partially solve the optimization. The proposed hybrid algorithm (so-called QAML-Z) outperforms QAML while remaining without an accuracy advantage over classical approaches (see Figure 3). Here again, results obtained with simulated annealing and quantum annealing are on par. The scheme under which a discrete optimization is used iteratively to approximate a continuous optimization on quantum annealers opens new directions for future algorithms.

Classification in Astrophysics with Quantum Restricted Boltzmann Machine
Quantum annealers do not provide identical answers every time they go through an annealing cycle. For some applications it would be ideal if, for example, they always returned the lowest energy configuration, but instead they produce a distribution of states. In principle, these states are Boltzmann-distributed with a characteristic temperature related to the physical device temperature. In practice, the actual distribution of states deviates from a Boltzmann distribution (on the D-Wave 2000Q, for example, it is colder and tends to produce too many low-energy states). However, with some post-processing the sample distribution may be converted into a Boltzmann distribution. It has also been anecdotally observed that while the sampled distribution is not Boltzmann-distributed, simply applying the parameter update equations derived under the assumption of sampling from a Boltzmann distribution will generally allow the model to converge anyway. Taken together, these observations mean that quantum annealers may also be used as sampling engines to fuel certain classes of machine learning algorithms. Restricted Boltzmann Machines (RBMs) map well to modern quantum annealers for this purpose. They feature a bipartite connectivity graph, which embedding algorithms handle much better than a fully connected graph. The tunable couplings between qubits function as graph connection weights, and the annealing process naturally samples from the graph configurations, with clamped or unclamped values for the visible nodes in the graph as needed by the application.
RBMs are fundamentally generative models that approximate a target distribution over an array of visible binary variables (v) as the marginal distribution of a bipartite graph that connects them to a different set of hidden binary variables (h). The joint distribution is described by

p(v, h) = exp(b·v + c·h + v^T W h) / Z,

for some bias vectors b and c, a connection weight matrix W, and a normalizing partition function Z. RBMs are trained by maximizing the log-likelihood of a data distribution by updating the bias and weight parameters. With the loss L defined as the negative log-likelihood, the derivatives with respect to the model parameters are

−∂L/∂b_i = ⟨v_i⟩_data − ⟨v_i⟩_model,
−∂L/∂c_j = ⟨h_j⟩_data − ⟨h_j⟩_model,
−∂L/∂W_ij = ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model.

These derivatives form a gradient for use in gradient descent to adjust b, c, and W. The expectations are computed over the data (the training set) with clamped visible values and over the model with unclamped values. These steps are also referred to as the positive and negative phases. See [60] for a particularly clear explanation.
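These gradients can be computed exactly for a tiny RBM by enumerating all states, which makes the positive and negative phases explicit (the parameter values are random, for illustration only):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Tiny RBM: n_v visible and n_h hidden binary units, with unnormalized
# probability exp(b.v + c.h + v^T W h).
n_v, n_h = 3, 2
b = rng.normal(size=n_v)
c = rng.normal(size=n_h)
W = rng.normal(size=(n_v, n_h))

def unnorm(v, h):
    return np.exp(b @ v + c @ h + v @ W @ h)

states_v = [np.array(s) for s in itertools.product([0, 1], repeat=n_v)]
states_h = [np.array(s) for s in itertools.product([0, 1], repeat=n_h)]
Z = sum(unnorm(v, h) for v in states_v for h in states_h)

def model_expectation_vh():
    """<v_i h_j> under the model (negative phase), by exact enumeration.
    This brute force scales exponentially -- it is the step a sampler,
    e.g. a quantum annealer, is meant to replace."""
    return sum(unnorm(v, h) * np.outer(v, h)
               for v in states_v for h in states_h) / Z

def data_expectation_vh(data):
    """<v_i h_j> with v clamped to the data (positive phase); the hidden
    units are summed out exactly via p(h_j = 1 | v)."""
    total = np.zeros((n_v, n_h))
    for v in data:
        p_h = 1.0 / (1.0 + np.exp(-(c + v @ W)))
        total += np.outer(v, p_h)
    return total / len(data)

data = [np.array([1, 0, 1]), np.array([0, 1, 1])]
grad_W = data_expectation_vh(data) - model_expectation_vh()  # -dL/dW
print(grad_W.shape)
```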
While computing the expectations over the data is easy, computing the expectations over the model is costly, as it scales like 2^{min(n_v, n_h)}, with n_v and n_h equal to the number of visible and hidden units, respectively. There are a number of mitigation strategies to avoid this difficult computation, all discussed in [61]. Of particular relevance here, the expectations for a given set of model parameters can be computed using unclamped variables on a D-Wave, where each returned configuration is a sample from the machine's output distribution. For small graphs this approach is impractical, but it may eventually offer some computational advantage for very large graphs.
In practice, the distribution of states returned by the D-Wave 2000Q is not Boltzmann distributed, and significant post-processing is required to achieve a Boltzmann distribution. As observed in [61], the D-Wave offers essentially no sampling advantage over random string initial states if using only Boltzmann distributions for the optimization. However, it has been observed that RBMs may be optimized with imperfect gradients [62]. Therefore, it is possible to greatly reduce the amount of required post-processing and still train effective models.
For the task of galaxy morphology classification, in [61] it was observed that RBMs, regardless of the training methods, were less effective than gradient boosted trees (likely the best classical algorithm for structured data like the compressed galaxy images). Additionally, the best classical methods for discriminative training outperformed the quantum, generative training. However, regardless of training strategy RBMs offered a performance advantage for very small datasets that gradient boosted trees and logistic regression tended to badly overfit. Furthermore, early in the small dataset training runs, the quantum generative training outperformed the classical discriminative training.

Quantum Circuit Applications
As introduced in Section 3.1, circuits with varying parameters can be optimized to perform a specific task, for example classification. The parameters of these circuits can be determined with gradient-based optimization methods. The following papers follow this approach for HEP-specific classification tasks.

Quantum Graph Neural Networks for particle track reconstruction
Charged particle tracking ("tracking" in short) is the task of associating sparse detector measurements (a.k.a. "hits") to the particle trajectory they belong to. Tracking is a cornerstone of event reconstruction in particle physics. Because quantum computers promise to evaluate a very large number of states simultaneously, they may play an important role in the future of track reconstruction in particle physics experiments. Reconstructing particle trajectories with high accuracy will be one of the major challenges of the HL-LHC experiments [63]. The increase in the expected number of simultaneous collisions and the resulting high detector occupancy will make tracking extremely demanding in terms of computing resources. State-of-the-art algorithms rely today on a Kalman filter-based approach: they are robust and provide good physics performance, but they are expected to scale worse than quadratically with the increasing number of simultaneous collisions [63]. The high energy physics community is investigating several possibilities to speed up this process [65][66][67], including deep learning-based techniques. For instance, introducing an image-based interpretation of the detector data and using convolutional neural networks can lead to high-accuracy results [68]. At the same time, a representation based on space-points arranged in connected graphs could have an advantage given the high dimensionality and sparsity of the tracking data. The HEP.TrkX project [68] followed this approach and successfully developed a set of Graph Neural Networks (GNN) to perform hit and segment classification. In this approach, graphs of connected hits are built, features of the graph nodes and edges are computed and, finally, relevant hit connections are predicted. The dataset, designed for the TrackML challenge [69], contains the precise locations of hits and the corresponding particles.
The classical GNN architecture consists of three networks organised in cascade: an input network that encodes the hit information as node features, an edge network that outputs edge features using the start and end nodes, and a node network that calculates hidden node features taking into account all connected nodes in the previous and next layers. The edge and node networks are applied iteratively after the input network (see [70] for more details). The work in [64] represents an exploratory look at this GNN architecture from a quantum computing perspective: it re-implements the input, edge and node networks as quantum circuits. In particular, the edge and node networks are implemented as tree tensor networks (TTN), hierarchical quantum classifiers originally designed to represent quantum many-body states described as high-order tensors [71]. The data points are encoded (see Figure 5) as parameters of R_y rotation gates,

R_y(θ) |0⟩ = cos(θ/2) |0⟩ + sin(θ/2) |1⟩.    (12)

The TTN network consists of R_y rotations and CNOT gates (see Figure 5), and its output is the measurement of a single qubit. The TTN has 11 parameters, which are the angles of rotations about the Y axis of the Bloch sphere. These parameters are optimized with the ADAM optimiser and a binary cross-entropy loss function, using PennyLane [43] and TensorFlow [72]. The model is trained on 1450 subgraphs extracted from the TrackML dataset. Although preliminary, the obtained performance (see Figure 4) is promising: the validation loss decreases smoothly and the accuracy increases with the number of iterations. At convergence, the accuracy is still lower than in the classical case. This is however expected, as the number of hidden features and iterations is reduced compared to the GNN because of computational constraints.
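A two-qubit toy version of this circuit family, with R_y data encoding, a CNOT and a single trainable rotation, can be simulated directly with state vectors; it illustrates the ingredients of Eq. (12), not the actual 11-parameter network of [64]:

```python
import numpy as np

def ry(theta):
    """R_y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cnot():
    """CNOT with qubit 0 as control and qubit 1 as target."""
    return np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=float)

def ttn_output(x, params):
    """Toy TTN-style classifier: encode two features with Ry rotations,
    entangle with a CNOT, apply one trainable Ry on qubit 0 and read out
    <Z> on that qubit as the edge score."""
    state = np.kron(ry(x[0]) @ np.array([1.0, 0.0]),
                    ry(x[1]) @ np.array([1.0, 0.0]))
    state = cnot() @ state
    state = np.kron(ry(params[0]), np.eye(2)) @ state
    Z0 = np.kron(np.diag([1.0, -1.0]), np.eye(2))
    return float(state @ Z0 @ state)

out = ttn_output([0.4, 1.1], [0.2])
print(out)  # a value in [-1, 1], usable as a classification score
```

In the actual work, the parameters of such circuits are trained with ADAM against a binary cross-entropy loss, with gradients supplied by PennyLane.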

Classification Using Variational Quantum Circuits
The method used in [73] and [74] is based on variational quantum algorithms for machine learning (called VQML hereafter). The VQML approach exploits the mapping of input data to an exponentially large quantum state space to enhance the ability to find an optimal solution. The data encoding circuit U_Φ(x) maps the data x ∈ Ω to the quantum state |Φ(x)⟩ = U_Φ(x) |0⟩. The quantum state with the encoded input data is processed by applying quantum gates to create an ansatz state, which is then measured to produce the output. To this end, a variational quantum circuit W(θ), parameterized by θ, is applied [75], and the probability of outcome y is obtained as

p_y(x) = ⟨Φ(x)| W(θ)† M_y W(θ) |Φ(x)⟩,

where {M_y} is the binary measurement. The optimization process consists of learning θ to minimize a loss quantified by the difference between the predicted p_y(x) and the known classification label y. Different optimizers, such as COBYLA [76] and SPSA [77,78], can be applied.

Figure 5. Tree Tensor Network representation of the Quantum Edge Network [64]
In [73], the authors made promising progress by obtaining preliminary results in the application of IBM quantum simulators and the IBM Q quantum computer to ttH (Higgs coupling to top quark pairs) data analysis. The authors measured the AUC (area under the ROC curve) for different numbers of events in the training dataset. With 5 qubits and 800 events, the VQML obtained performance very close to that of the classical machine learning method BDT (see Figure 6). A preliminary test ran the VQML on the IBM Q quantum computer with 5 qubits, 100 training events and 100 test events. Within the limited testing iterations, the performance of the IBM Q quantum computer is compatible with that of the quantum simulator, which reaches performance similar to the BDT method with enough iterations (see Figure 7).
In [74], the authors attempted to use the VQML algorithm for the classification of a new physics signal predicted by a theory called Supersymmetry. Two implementations of the VQML algorithm are tested: the first, called Quantum Circuit Learning (QCL) [79], is used with the Qulacs simulator [80]; the second, called Variational Quantum Classification (VQC) [75], is used with the QASM simulator and real quantum computing devices. The QCL (VQC) uses a combination of R_Y and R_Z (Hadamard and R_Z) gates to encode the input data. To create the ansatz state, both implementations combine an entangling gate with single-qubit rotation gates. The QCL uses the time-evolution gate e^{−iHt}, with the Hamiltonian H of an Ising model with random coefficients, as the entangling gate, while the VQC uses Hadamard and CNOT gates. The rotation angles used to create the ansatz are the parameters to be tuned.

The experimental test of the quantum algorithm is performed in [74] with the SUSY data set from the UC Irvine Machine Learning Repository [81], using cloud Linux servers for the QCL and a local machine and the IBM Q quantum computer for the VQC. The performance of the quantum algorithm is compared with a BDT and a DNN optimized to avoid over-training for each training set. The QCL performance is relatively flat in the training size (see Figure 8), while the performance of the BDT and DNN improves with the size. The computational resources needed to simulate the QCL with 10,000 events or more are beyond the capacity used in [74]. According to these simulation studies, the three algorithms appear to have comparable discriminating power when restricting the training set to fewer than ∼10,000 events, with an indication that the quantum algorithm might have an advantage with a small sample of O(100) events.
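The QCL entangling gate e^{−iHt} can be built explicitly for a small system. The sketch below constructs a random-coefficient Ising Hamiltonian (transverse fields plus ZZ couplings; the exact operator content and coefficient ranges used in [74,79] are assumptions here) and exponentiates it by diagonalisation.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])

def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def ising_evolution(n, t, rng):
    """Entangler e^{-iHt} for an Ising Hamiltonian with random coefficients:
    H = sum_j a_j X_j + sum_{j<k} J_jk Z_j Z_k (coefficient ranges are assumptions)."""
    a = rng.uniform(-1, 1, n)
    J = rng.uniform(-1, 1, (n, n))
    H = np.zeros((2 ** n, 2 ** n))
    for j in range(n):
        H += a[j] * kron_all([X if k == j else I2 for k in range(n)])
        for k in range(j + 1, n):
            H += J[j, k] * kron_all([Z if m in (j, k) else I2 for m in range(n)])
    w, V = np.linalg.eigh(H)               # H = V diag(w) V^T (H is real symmetric)
    return (V * np.exp(-1j * w * t)) @ V.conj().T
```

Because the generic e^{−iHt} is not a product of single-qubit gates, it entangles the register; this dense-matrix construction also makes it clear why classically simulating the QCL ansatz becomes expensive as the qubit count grows.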
Figure 8 shows the ROC curves obtained using the 3-variable VQC algorithm on the QASM simulator with different numbers of events in the training set. Over-training is clearly visible when the training set contains only 40 events, while it is largely gone when the training set is increased to 1,000. The small sample of 40 events is also used to train the VQC model on IBM Q quantum computers. The AUC values from the QASM simulator and the quantum computers are given in Table 1. The results from the quantum computers appear slightly worse than those from the simulator, though they are consistent within the uncertainties (defined as the standard deviations of five measurements). The authors of [74] conclude that the variational quantum circuit can learn the properties of the input data with a real quantum device, acquiring classification power for physics events of interest.

Applications Coming Soon
An interesting line of research concerns generative models, such as Boltzmann machines, variational auto-encoders and generative adversarial networks, and their quantum counterparts. Classical generative models are being investigated by the HEP community as solutions to speed up Monte Carlo simulation, because of their ability to model complex probability distributions and their relatively low computational cost during the prediction phase. Training these models is, however, a difficult and compute-intensive task. Coverage is one of the major issues when training or validating the performance of generative models; it is related to their representational power and to how well it maps to the original probability distribution. From this point of view, quantum generative models might show an advantage while relieving the computational cost [82].
The quantum SVM (Support Vector Machine) is an attractive approach not yet fully exploited in HEP. The SVM [83] is a supervised machine learning method which outputs an optimal hyperplane to separate data points between two classes. A quantum-enhanced kernel for the SVM [75] can map the input vectors to an exponentially large Hilbert space, which could make it easier to construct an optimal hyperplane and increase the classification performance. However, calculating the quantum-enhanced kernel requires a number of circuits that grows quadratically with the number of input vectors, which may not scale well when classifying a huge number of events. Multiple groups are actively exploring quantum kernel methods with gate-based quantum computers for event classification. Currently, these methods are limited by the compression required to make data compatible with modern hardware. Studying these algorithms nevertheless provides new and different insights into the performance of modern computing platforms. For example, they compute data-element overlaps in Hilbert space, and the outcome state distributions are sensitive to device noise in different ways than variational algorithms like VQE or QAOA. New schemes for constructing quantum feature maps in particular [39] are interesting directions.
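A minimal sketch of such a kernel, under the simplifying assumption of a product-state RY feature map (which, unlike the entangling feature maps of [75], remains classically simulable): each entry K_ij = |⟨Φ(x_i)|Φ(x_j)⟩|² would cost one circuit on hardware, so the Gram matrix needs O(N²) evaluations for N input vectors, the scaling issue noted above.

```python
import numpy as np

def feature_state(x):
    """Product-state feature map: each feature angle-encoded as RY(x_k)|0>."""
    state = np.array([1.0])
    for xk in x:
        state = np.kron(state, [np.cos(xk / 2), np.sin(xk / 2)])
    return state

def quantum_kernel(X):
    """Gram matrix K_ij = |<Phi(x_i)|Phi(x_j)>|^2 : O(N^2) circuit evaluations."""
    states = [feature_state(x) for x in X]
    n = len(states)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = abs(np.dot(states[i], states[j])) ** 2
    return K
```

The resulting K can be passed to any classical SVM solver as a precomputed kernel; a potential quantum advantage requires a feature map whose overlaps are hard to compute classically, which this deliberately simple product map is not.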

Discussion and Outlook
When considering applications of quantum machine learning to another field such as high energy physics (HEP), the immediate question is whether we have reason to believe that quantum machine learning, whether for near-term or universal quantum computers, is particularly suited to this type of application. The truth is that it is simply too early to tell, and further investigation of the methods will provide the answers.
One feature of HEP data sets is that they are notoriously large. In principle, this makes quantum speed-ups attractive, as they could be crucial for analysing such large amounts of data. But significant (that is, exponential) speed-ups in quantum machine learning are still controversial as to their scope [84] and, in some cases, their true quantum nature [85]. They often rely on special properties of the data such as sparsity [86], or on a special oracle or device that can load the data in superposition [33]. The appeal of near-term approaches to quantum machine learning is without doubt that ideas can easily be tested on a small scale, using the rich landscape of quantum programming languages, cloud-based quantum computers, and quantum machine learning software packages. Even so, encoding large data sets into a quantum system to sufficient precision, and measuring the outputs for every event in the dataset, is a physical challenge significantly beyond the scope of near-term quantum computing. Of course, in the age of Big Data, the large size of data sets is not unique to HEP, and it remains to be established whether the intersection discussed in this review poses any particular challenge to machine learning that would motivate the use of quantum computers.

Experimenting with Quantum Annealers
Despite continuous improvement, quantum annealers remain noisy, with a limited number of qubits and limited connectivity.
Solver Heuristic. A major challenge is to map the reformulated problem onto an actual device with limited connectivity [87], and it is often necessary to include connectivity constraints in the loss itself. One alternative available in the D-Wave software stack is qbsolv [88], a heuristic that splits large problems into several smaller ones that can in turn be solved on the available hardware. This allows one to experiment with much larger QUBOs than those directly solvable on existing hardware, but in return requires additional computing resources. It also prevents one from directly probing the capability of the device alone.
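For reference, the problem form that both the annealer and the qbsolv heuristic minimise is the QUBO energy E(x) = xᵀQx over binary vectors x. The sketch below solves it by exhaustive enumeration, which is only feasible for tiny instances but makes the objective explicit; it is an illustration of the problem formulation, not of the qbsolv splitting algorithm itself.

```python
import numpy as np
from itertools import product

def solve_qubo_bruteforce(Q):
    """Exhaustively minimise E(x) = x^T Q x over binary vectors x.
    Stands in for the annealer / qbsolv on instances small enough to enumerate."""
    n = Q.shape[0]
    best_x, best_e = None, np.inf
    for bits in product([0, 1], repeat=n):
        x = np.array(bits)
        e = x @ Q @ x                 # diagonal of Q: linear terms; off-diagonal: couplings
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e
```

Since the search space is 2^n, exact enumeration stops being viable beyond a few dozen variables, which is exactly the regime where hardware annealers or heuristic decomposition take over.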
Digital Device. Digital annealers [89] offer the potential to prototype algorithms with large numbers of digital qubits. Using custom ASICs, digital annealers can simulate fully-connected quantum annealers with 4,096 qubits (at 64-bit precision) or as many as 8,192 qubits (at 16-bit precision). In principle, a digital annealer cluster could offer up to 1,000,000 qubits using multi-chip support. While in the very long run fully quantum annealers should be able to overtake digital simulators, in the near term these machines are exciting application test-beds and may even be able to deliver competitive results.

Experimenting with Quantum Circuits
Applying quantum algorithms on quantum hardware is the core aim of any research on quantum computing. But the scale of even state-of-the-art studies quickly reveals the limitations of current-day hardware. Typical implementations use only a few qubits and datasets of 4 features (for example, [47,90,91]). The limited number of qubits, the limited connectivity and the short decoherence time of current quantum hardware make it difficult to experiment with large and long variational circuits.
Circuit Architecture. In the papers reviewed above, the quantum circuit architecture is fixed (type of gates) and only the parameters of the gates are optimized. In combination with this approach, a search for the optimal gate assembly is also possible. In [92], reinforcement learning is used to derive circuits that solve combinatorial problems. This technique might provide a further handle for developing well-performing quantum machine learning models.
Error Mitigation. In practice, circuit-based qubit devices allow only a few gates to be applied before the signal is drowned in noise. The fidelity of measurements on quantum devices can be improved via error mitigation strategies [93]. Various techniques allow one to experiment with an increased number of gates or better qubit connectivity. In addition to techniques making explicit assumptions about the form and origin of the noise, machine learning approaches can be used to learn the device-dependent noise directly. The integration of such noise-modelling and noise-cancelling techniques into a circuit compiler would help with experimenting on quantum devices, at the cost of increased resources.
Circuit Simulation. Prototyping quantum algorithms with classical simulators is an important step in the development and testing of quantum algorithms. The classical simulator used for the VQML study in [74] enabled the authors to test the QCL algorithm with up to seven variables or ∼10,000 events for the training set size. The simulation time and memory usage increase exponentially with the number of input variables in the creation of variational quantum states with W(θ). Despite continuous improvements of simulators [80], experimentation with circuits with a large number of qubits is still hampered by this computational requirement. On the other hand, simulating a quantum system with a classical algorithm is expected to be hard, which is precisely why the same computation can, by definition, run exponentially faster on a quantum device.
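The exponential memory cost is easy to quantify for a dense state-vector simulator (the common case; tensor-network simulators can do better for shallow or weakly entangled circuits): n qubits require 2^n complex amplitudes, i.e. 16 bytes each at double precision.

```python
def statevector_bytes(n_qubits):
    """Memory for a dense complex128 state vector: 2**n amplitudes x 16 bytes each."""
    return (2 ** n_qubits) * 16

# memory doubles with every added qubit:
# 30 qubits -> 16 GiB, 40 qubits -> 16 TiB, well beyond a single machine
```

This doubling per qubit is the hard wall behind the seven-variable limit quoted above, independent of how fast the simulator's gate kernels are.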
Optimization in Quantum Machine Learning. There are two types of optimizers: gradient-based and derivative-free. A derivative-free optimizer may require many iterations to achieve good training performance as the number of variational parameters increases. A gradient-based optimizer may require fewer iterations, but calculating the gradient poses its own difficulties [94], and numerical differentiation requires the circuit to be run additional times as the number of variational parameters increases. Changing a circuit parameter for the evaluation of gradients through a cloud-based service can take on the order of several seconds, which quickly turns the optimization of even a small system into a matter of hours or days.
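The cost of gradient-based optimization can be made concrete with the parameter-shift rule, which for standard rotation gates gives the exact gradient from two extra circuit evaluations per parameter. The sketch below uses a toy analytic expectation (⟨Z⊗Z⟩ after independent RY rotations, which equals ∏ cos θ_i) as a stand-in for a device call; the function name and setup are illustrative assumptions.

```python
import numpy as np

def expval(theta):
    """Toy stand-in for a circuit run: <Z x Z> after RY(theta_i) per qubit = prod cos(theta_i)."""
    return np.prod(np.cos(theta))

def parameter_shift(f, theta, i):
    """Exact gradient w.r.t. parameter i from two extra circuit evaluations."""
    shift = np.zeros_like(theta)
    shift[i] = np.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2

theta = np.array([0.3, 1.1])
grad = np.array([parameter_shift(expval, theta, i) for i in range(len(theta))])
# each gradient step costs 2 * len(theta) circuit evaluations on hardware
```

With P parameters, every optimizer step therefore needs 2P circuit executions; combined with multi-second round trips to a cloud device, this is the arithmetic behind the "hours and days" estimate above.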
Data Ingestion. The time required to encode an attribute into a quantum system, and the output time to measure a quantum state, are still much larger than the actual time spent computing on the quantum device. For HEP classification, which requires a lot of encoding and measuring, further technological improvements, such as quantum memory, might be necessary to cope with big datasets.

Quantum data
Data generated by quantum systems is known as quantum data in the quantum machine learning literature [17,95]. There are two types of quantum data: data stored in a classical memory but describing the result of a quantum experiment, and data stored in a quantum system as a coherent quantum state. Quantum data, for example from axion dark matter experiments [96], may offer the most promising long-term application of quantum machine learning. The question of how to exploit the quantum nature of the systems generating HEP data has not been prominent in the literature.
Quantum Measurement Data. Possible applications would have to establish a way in which measurement results, which are of course represented as classical information in a classical computer memory, can be fed back into quantum systems to be analysed by quantum machine learning. Whether the intermediate classical stage of the data still allows for a quantum advantage is an unresolved problem in quantum machine learning.
Coherent Quantum Data. There are two interesting outlooks. The first is to do quantum machine learning directly on the quantum objects measured in HEP. As an example, instead of processing the classical signal formed in photonic sensors, one could direct the photons into a photonic quantum computer and apply a variational circuit before conducting the final measurement. The circuit could be trained to extract important information from the quantum state, or to classify the state.
The second path follows the idea of quantum simulations [97], an important use of quantum computers in simulating complex quantum systems to determine their properties. If a HEP experiment could be simulated on a quantum computer [98], the simulation could be followed by a quantum machine learning routine executed on the very same device, and analysing the quantum states produced by the simulation.
Instead of costly state tomography to characterise the results, the wave function is directly accessed and important information extracted.
In both cases, an important insight from quantum machine learning, possibly the one with the highest future impact on other quantum disciplines, is the ability to differentiate through quantum computations. This includes a wealth of knowledge and practical methods for obtaining partial derivatives of a measurement result with respect to (classical) physical parameters of the experiment, such as a magnetic field strength or a pulse length. Quantum differentiation opens a door to designing experiments by adaptively optimizing a cost function, which is crucial for quantum data analysis.

Concluding Remarks
Overall, we are just at the beginning of exploring the intersection between quantum machine learning and high energy physics. The papers presented in this review therefore have to be understood as exploratory studies that propose angles to approach the problem of how to use the variety of quantum machine learning algorithms to understand fundamental particles.
We presented papers on performing classification using quantum machine learning with quantum annealing, restricted Boltzmann machines, quantum graph networks and variational quantum circuits. The capacity of quantum annealers to perform classification is limited by the restrictive formulation of the problem. Quantum-circuit-based machine learning is so far of limited performance because the problems must be downscaled to fit on the quantum device, or to be amenable to simulation.
As an outlook, we discussed practical considerations of experimenting with quantum machine learning and the prospect of analysing quantum data. These challenges put quantum machine learning into a particularly difficult spot. The quality of a machine learning algorithm is usually estimated through empirical benchmarks on pseudorealistic datasets. Evidence from deep learning suggests that machine learning on big data behaves very differently from the small-data regime. And while consistently improving, the theory of machine learning is currently unable to explain the performance of algorithms such as neural networks. The challenges for practical experiments as well as fundamental limits of classical simulations restrict quantum machine learning benchmarks to small proof-of-principle investigations that may only say very little about their performance in realistic settings.
While technology is developing, more theory is therefore needed to understand the power of near-term quantum machine learning. While the current performance of quantum machine learning on high energy physics data is limited, there is hope that future advances on both quantum devices and quantum algorithms will help with the computation challenges of particle physics.