Predicting quantum advantage by quantum walk with convolutional neural networks

Quantum walks are at the heart of modern quantum technologies. They allow one to deal with quantum transport phenomena and are an advanced tool for constructing novel quantum algorithms. Quantum walks on graphs are fundamentally different from their classical random walk analogs; in particular, they walk faster than classical ones on certain graphs, enabling in these cases quantum algorithmic applications and quantum-enhanced energy transfer. However, little is known about the possible advantages on arbitrary graphs that lack explicit symmetries. For these graphs one would need to simulate classical and quantum walk dynamics to check whether the speedup occurs, which can take a long computational time. Here we present a new approach to the quantum speedup problem, based on a machine learning algorithm that predicts the quantum advantage by just "looking" at a graph. The convolutional neural network, which we designed specifically to learn from graphs, observes simulated examples and learns complex features of graphs that lead to a quantum advantage, allowing one to identify graphs that exhibit quantum advantage without performing any quantum walk or random walk simulations. The performance of our approach is evaluated for line and random graphs, where classification was always better than a random guess, even for the most challenging cases. Our findings pave the way to an automated design of novel large-scale quantum circuits utilizing quantum-walk-based algorithms, and to simulating high-efficiency energy transfer in biophotonics and materials science.

Computational speedup is one of the keystone problems in both classical and quantum computer science [1,2]. Although quantum parallelism is, in general, a necessary ingredient for accelerating computational algorithms on quantum "hardware", a sufficient criterion is still unknown in many cases. Strictly speaking, the speedup problem can only be posed for certain computational tasks for which definite classical and/or quantum algorithms are used, cf. [3,4]. In this paper we attack the speedup problem for random and quantum walks in a quite general form by using advanced machine learning approaches.
It is, however, not clear on which graphs and for which target vertices quantum advantage will be present. Given a graph, a standard approach would be to simulate quantum and classical dynamics and observe in which case a particle reaches a target vertex faster. However, this approach has several limitations. First, although the propagation time scales polynomially in the size of the graph [34,35], the simulations can become computationally demanding for large graphs. Second, one is usually interested in a set of graphs rather than in a single graph, so the simulation cost grows with the size of the set. Third, the results of the simulations do not reveal any pattern, or general laws, of quantum advantage in stochastic propagation: given the results on one set of graphs, it is not clear how particles propagate on another set of graphs. To overcome these limitations, one can instead investigate the walk dynamics theoretically and obtain analytical results. This was done, e.g., for line [36], cycle [37][38][39], hypercube [40,41], complete [42], and glued trees graphs [43], for certain target vertices. Knowing that one of these graph types is embedded in a larger graph structure can also give information about possible quantum speedups [44,45]. This analytical approach is, however, limited to known cases, which form only a tiny subset of all labeled connected graphs [46].
Combining the two described approaches to predict quantum advantage is potentially possible. An expert looking at a graph might recognize a known structure (e.g., a hypercube) and, with the help of a limited number of simulations, draw a conclusion about the possibility of a quantum speedup. But an expert cannot possibly analyze a large number of graphs. Could a machine do that and predict quantum speedups? We answer this question in the affirmative and extend the list of new machine learning techniques successfully applied in physics [47][48][49][50][51][52][53][54][55][56][57][58].
arXiv:1901.10632v2 [quant-ph] 13 Dec 2019
In this paper we take a supervised learning approach to predict a quantum speedup: a convolutional neural network (CNN) [59,60] learns from examples to recognize quantum speedup. CNNs are widely used for image classification [61], visual document analysis [62], face recognition [63], and video classification [64]. Here we use a CNN for graph feature extraction and for learning the most relevant features, which we apply to a classification problem defined within the quantum walk framework. CNNs with graph adjacency matrix input were recently used for predicting clinical neurodevelopmental outcomes from brain networks [65] and for classifying and predicting the presence of super-diffusion in multiplex networks [66].
Our results obtained with a CNN of special architecture, which we call the classical-quantum convolutional neural network (CQCNN), demonstrate that the network can represent the quantum speedup by quantum walk. CQCNN is able to generalize and correctly predict quantum speedup for unseen line graphs and random graphs with up to 25 vertices. The quantitative classification results differ depending on the type of graph, on the type and quantity of training examples, and on the number of training epochs. Independent of the scenario and its difficulty, however, we observe that CQCNN performs better than a random guess. Importantly, we show that it is possible to extract the logic behind the classifier function constructed by the neural network, which lets us understand and verify how the classification works on small graphs.
We believe that the proposed learning model will be of particular significance for physical implementations of quantum-enhanced transport systems. A physical implementation of quantum walks is not unique: it depends on the properties of the measurement procedure, as well as on particular properties of the physical system, including experimental imperfections¹. In such a quantum experiment only a limited number of data points can realistically be obtained, which will make the proposed autonomous learning algorithm essential for a successful implementation of quantum-enhanced transport systems.

RESULTS
Quantum and classical random walk processes have different dynamics, which leads to a difference in how fast particles traverse graphs from an initial vertex to a target vertex. This difference depends not only on the nature of the particles, but also on the graph on which the particles walk. Importantly, the graph is specified not only by the way vertices are connected, but also by the positions of the initial and the target vertices. It is known, e.g., that quantum particles on line graphs reach target vertices at distance d quadratically faster in d [36]. But if the initial and target vertices are close to each other, it is not easy to determine which particle is faster. To give an instructive example, let us consider line graphs, as random walks on lines are among the simplest and most extensively studied stochastic processes [5]. In the case of three vertices, there are three inequivalent graphs G, shown in the first row of Fig. 1(a). Complementary to the graphs G, two additional rows of graphs are depicted: G_q and G_c. These graphs are modifications of G and correspond to the physical implementation of G for quantum (G_q) and classical (G_c) walks. In the classical case, the target vertex is connected to the neighboring vertices by directed edges. In the quantum case, a sink vertex 4 connected to the target vertex is used to measure the quantum particle; the rest of the graph is unchanged. The measurement process hence changes the dynamics of the quantum system. Figure 1(b) shows the results of quantum (solid lines) and classical (dashed lines) walk simulations for all three graphs (blue, green, and gray).
We can see that in two cases the classical walker is faster than the quantum one (green and gray), and the quantum particle is faster in one case (blue). This toy example shows that the quantum transport speedup is present only when the initial and the target vertices are at opposite ends of the graph, whereas the classical particle is faster when these two vertices are directly connected.
Fig. 1. Vertex "1" is the initial vertex, whereas "2" is the target vertex. (a) Inequivalent line graphs with three vertices are depicted in three different colors (blue, green, and gray). The graphs G_q and G_c are modifications of the graphs G that take into account different aspects of the physical implementation of quantum and classical walks, respectively. (b) The quantum (solid) and the classical (dashed) walk dynamics on three different line graphs. The black line at the value 1/log 3 ≈ 0.91 is the probability threshold at which the particle is considered to be detected.
Fig. 2. The neural network takes a labeled graph in the form of an adjacency matrix as an input. This input is then processed by convolutional layers with graph-specific "edge-to-edge" and "edge-to-vertex" filters (see Methods). The convolutional layers are connected to fully connected layers that finally classify the input graph. The number of layers is the same for all graph sizes. Data and error propagation are shown with arrows.
¹ Current photonic technologies [67] represent a versatile platform for experimental studies of bosonic quantum walks and speedup prediction on graphs with desired sizes and topologies [68,69]. Moreover, the design of CMOS-compatible large-scale quantum photonic devices gives hope for a realization of quantum-walk-based algorithms in the near future [70].
We next describe how the neural network, CQCNN, can learn this for larger graphs, and show the results of the learning processes. The learning setup that we use in the paper is depicted in Fig. 2. Fig. 2(a) shows schematically how CQCNN is trained using examples of graphs. At each step CQCNN takes a graph as an input in the form of an adjacency matrix and outputs a prediction of the class this graph belongs to (quantum or classical). Given the correct label, the loss value is computed. Fig. 2(b) depicts the testing procedure, which differs from training in that CQCNN does not receive any feedback on its predictions and the network is not modified. The neural network architecture is shown in Fig. 2(c). CQCNN has a layout with convolutional and fully connected layers, and two output neurons that specify the two possible output classes. The convolutional layers are used to extract features from graphs and to decrease the dimensionality of the input. By trying different approaches, we observed that the relevant features are not in small local blocks of the adjacency matrices, but in the rows and columns of these matrices. We therefore constructed filters in the form of "crosses", shown in Fig. 2(c), that capture a weighted sum of column and row elements. These filters act as functions of a weighted total number of neighboring vertices of each vertex. As we show next, with the cross-shaped "edge-to-edge" and "edge-to-vertex" filters the convolutional network can predict the quantum advantage by quantum walk.

Predicting quantum advantage for line graphs
We apply the described machine learning methodology to different sets of graphs. To understand systematically how our approach works, we first analyze the neural network performance on line graphs. We take the simplest design of CQCNN shown in Fig. 2. Fig. 3(a) demonstrates the results of training the neural network on line graphs; each color corresponds to a specific graph size with n = 4, 5, 6, 7 vertices. For the simulations we used datasets containing all possible labelings of the line graphs: 90% of each dataset is used to train CQCNN (dashed lines), and 10% is used to test its generalization capabilities (solid lines). The performance of CQCNN on the training graphs is measured by the cross-entropy loss function. The loss on an example i is defined relative to the correct class class_i (classical or quantum, 0 or 1) of this example:
$$\mathrm{loss}(x, \mathrm{class}_i) = -\kappa(\mathrm{class}_i)\,\log\frac{e^{x(\mathrm{class}_i)}}{e^{x(0)} + e^{x(1)}}, \qquad (1)$$
where κ(class_i) is the total fraction of examples from this class in the dataset, and x(0) and x(1) are the values of the output neurons. In Fig. 3(a) one can see that CQCNN generalizes from seen graphs to unseen graphs, as the classification accuracy² goes up (solid curves).
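For concreteness, the class-weighted cross-entropy loss(x, class_i) = −κ(class_i) log[e^{x(class_i)} / (e^{x(0)} + e^{x(1)})] can be evaluated as in the following NumPy sketch. It assumes, as stated above, that κ(class) is the fraction of examples of that class in the dataset; the function names `weighted_cross_entropy` and `class_fractions` are hypothetical and do not come from the CQCNN code.

```python
import numpy as np

def weighted_cross_entropy(x, label, kappa):
    """Class-weighted cross-entropy for one example.

    x     : the two output-neuron values (x(0), x(1))
    label : correct class, 0 (classical) or 1 (quantum)
    kappa : per-class weights kappa(0), kappa(1)
    """
    x = np.asarray(x, dtype=float)
    # Numerically stable log-softmax: x(label) - log(exp(x(0)) + exp(x(1)))
    log_norm = np.logaddexp.reduce(x)
    return -kappa[label] * (x[label] - log_norm)

def class_fractions(labels):
    """Fraction of examples in each class, used here as kappa(class)."""
    labels = np.asarray(labels)
    return np.array([np.mean(labels == c) for c in (0, 1)])
```

For example, labels [0, 0, 0, 1] give κ = (0.75, 0.25), and equal outputs x(0) = x(1) = 0 then cost 0.75 log 2 for a classical example.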
Our results in Fig. 3(a) demonstrate that it is possible for CQCNN to learn a function that maps graphs to their quantum walk properties. To understand the predictive capacity of CQCNN, we analyze the weights of the fully connected layer of the simple CQCNN employed for this classification problem. These weights are visualized in Fig. 3(b) as bars for different numbers n = 4, 5, 6, and 7 of vertices. In total, one can see 29 bars corresponding to the 29 weights in the last layer of the neural network. These weights form a feature vector, which we divide into 7 parts, each corresponding to a specific vertex of the graph and labeled "vertex label" in Fig. 3(b).
² Classification accuracy is the fraction of correct predictions.
Each vertex has 4 features, labeled 1-4; the zeroth component of the feature vector is the bias. All these features are outputs of the convolutional layers, and, up to some learned coefficients, their values have the following meaning. The first feature of each vertex corresponds to the number of edges of this vertex, and the second feature to the total number of edges neighboring the edges that lead to the vertex. The third feature equals one if the vertex is connected to the initial vertex by an edge, and zero otherwise. The fourth feature does the same relative to the target vertex. Note that some weights are learned to be zero for vertices that are not present in smaller graphs.
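The four per-vertex features can be illustrated with fixed, unit coefficients. The sketch below is our own simplification, not the network's learned computation: the learned coefficients are not given in the text, and the second feature is computed here, as an assumption, as the sum over the vertex's edges of the number of other edges sharing an endpoint with each of them. Vertices are 0-indexed, so vertex "1" is index 0 and vertex "2" is index 1. Zero-padding to 7 vertices reproduces the 1 + 7 × 4 = 29 entries of the feature vector in Fig. 3(b); the helper names are hypothetical.

```python
import numpy as np

def vertex_features(A, initial=0, target=1):
    """Unit-coefficient versions of the four per-vertex features (one row per vertex).

    f1: degree of the vertex
    f2: sum over incident edges (v, u) of (deg v - 1) + (deg u - 1), i.e. the
        number of other edges touching each incident edge (assumed definition)
    f3: 1 if the vertex is adjacent to the initial vertex, else 0
    f4: 1 if the vertex is adjacent to the target vertex, else 0
    """
    A = np.asarray(A)
    deg = A.sum(axis=1)
    f1 = deg
    # sum_u A[v, u] * (deg[v] - 1 + deg[u] - 1), written in vectorized form
    f2 = A @ (deg - 1) + deg * (deg - 1)
    f3 = A[:, initial]
    f4 = A[:, target]
    return np.stack([f1, f2, f3, f4], axis=1)

def padded_feature_vector(A, n_max=7):
    """Bias entry followed by 4 features per vertex, zero-padded to n_max vertices."""
    F = vertex_features(A)
    padded = np.zeros((n_max, 4))
    padded[: F.shape[0]] = F
    return np.concatenate([[1.0], padded.ravel()])  # length 1 + 4 * n_max
```

For the line graph 1-3-4-2, vertex 3 gets features (2, 3, 1, 0): two edges, three edges touching them, adjacency to the initial vertex but not to the target.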
By looking at the weights of CQCNN, we observe that the designed neural network learned several properties of the quantum advantage on these line graphs. First, the contributions to the quantum (blue) and classical (red) classes are symmetric: whatever is a positive indication of the quantum class is a negative indication of the classical class. Second, the weights differ between vertices, and this difference explains the classification outcome, as we describe next. The graph shows no quantum advantage if the initial vertex is connected to the target vertex (feature 4 of vertex 1 and feature 3 of vertex 2). Quantum advantage is also suppressed if the target vertex is well connected to the rest of the graph (features 1 and 2 of vertex 2). And although the weights of the other features are less pronounced, the better connected the remaining vertices are, the better for the quantum speedup.
Fig. 4(b)-(c). An example of two graphs from the test set that were correctly classified by the neural network. On graph (b) the classical particle is faster, whereas on graph (c) the quantum particle is faster. The initial and the target vertices are marked in yellow and red, respectively.
The landscape of weights changes as the graph size grows (growing n in Fig. 3(b)), but not drastically. The described correlations hold for all studied graph sizes. In addition to this consistency, the deviation of the weights from their average is small: all 100 CQCNNs learned very similar weights. Looking at vertices 3, 4, 5, 6, and 7, we observe that their weights are almost identical: all these vertices contribute identically to the classification. Indeed, the dynamics of the particles is invariant under relabeling of all vertices other than the initial and the target vertices. Hence CQCNN autonomously learned that many graph examples are isomorphic.
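The invariance under relabeling can be made quantitative for line graphs with a short counting sketch, which is an illustration we add here (the helper names `line_graph_counts` and `train_set_size` are hypothetical). Since relabeling vertices 3, ..., n leaves the dynamics unchanged, a labeled line graph is characterized by the positions of vertices 1 and 2 along the path, up to reversal of the path.

```python
from itertools import permutations
from math import factorial

def line_graph_counts(n):
    """Return (number of labeled line graphs on n vertices,
    number of classes under relabeling of every vertex except 1 and 2)."""
    graphs = set()
    classes = set()
    for path in permutations(range(1, n + 1)):
        # A path and its reverse give the same edge set, i.e. the same labeled graph.
        edges = frozenset(frozenset(e) for e in zip(path, path[1:]))
        graphs.add(edges)
        i, t = path.index(1), path.index(2)
        # Class key: positions of the initial and target vertices, up to reversal.
        classes.add(min((i, t), (n - 1 - i, n - 1 - t)))
    return len(graphs), len(classes)

def train_set_size(n):
    """Size of a 90% training split of all n!/2 labeled line graphs."""
    return 9 * (factorial(n) // 2) // 10
```

For n = 7 this gives 7!/2 = 2520 labeled line graphs but only 21 classes, each containing (n − 2)! = 120 labeled graphs, and a 90% training split of 2268 graphs.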
Learning all these graph properties helps the network to correctly classify previously unseen graphs of the same size. CQCNN can go a step further and apply the learned data representation to graphs of larger sizes. This can be seen in Fig. 3(c), where the training is done on line graphs with n = 4 and 5, but the testing on graphs with n = 6-10 vertices. The classification accuracy on the larger graph sizes is between 60% and 85%, which is significantly better than a random guess. The generalization accuracy does not reach 100% because, as we observed, each graph size contains new cases that cannot be derived from smaller graphs. The generalization accuracy also decreases relatively fast with n, suggesting that more training examples will be needed for larger graphs. Importantly, as the results in Fig. 3(c) show, the CQCNN approach adapts to a change of graph size without being trained on all sizes.
Predicting quantum advantage for random graphs
CQCNN was shown to be able to classify line graphs. Next, we estimate how well the presented methodology works on other graphs. In general, the more symmetries a graph has, the better we expect CQCNN to perform, as there are more ways to learn graph properties from examples. For this reason, random graphs should be one of the most challenging sets for our method. For random graphs in particular, we do not expect training examples to generalize well to test examples, as the two sets may share little structure. Even given enough training examples, we expect that there will always be graphs that do not share common properties with any other graph.
We simulated CQCNN's learning process for random graphs, each sampled uniformly from the set of all possible graphs with n vertices and m edges. The learning performance results are shown in Fig. 4 for n = 15, 20, and 25, where m is chosen uniformly from n − 1 to (n² − n)/2. In our simulations we observe that the loss after training is close to zero (below 3 × 10⁻³) for all these random graphs. In Fig. 4(a) we see that both recall and precision³ are 90% for the "classical" part of the set, and are in the range of 25-35% for the "quantum" part of the set. Overall, our method classifies random graphs much better than a random guess⁴ without performing any quantum walk dynamics simulations. Examples of correctly classified graphs are shown in Fig. 4(b)-(c).
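A sampler matching this description might look as follows. This is a sketch under assumptions: we draw m uniformly from n − 1 to (n² − n)/2 and then enforce connectivity (the Methods consider connected graphs) by rejection sampling, which keeps the distribution uniform over connected graphs with m edges. The names `sample_connected_graph` and `is_connected` are hypothetical, not the authors' code.

```python
import numpy as np

def is_connected(A):
    """Depth-first search from vertex 0; True if every vertex is reached."""
    n = A.shape[0]
    seen = {0}
    stack = [0]
    while stack:
        v = stack.pop()
        for u in np.flatnonzero(A[v]):
            u = int(u)
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == n

def sample_connected_graph(n, rng):
    """Draw m uniformly in [n - 1, (n^2 - n) / 2], then m distinct edges uniformly,
    rejecting disconnected graphs (m stays fixed during rejection)."""
    max_edges = (n * n - n) // 2
    m = int(rng.integers(n - 1, max_edges + 1))  # upper bound is exclusive
    iu, ju = np.triu_indices(n, k=1)  # all vertex pairs i < j
    while True:
        chosen = rng.choice(max_edges, size=m, replace=False)
        A = np.zeros((n, n), dtype=int)
        A[iu[chosen], ju[chosen]] = 1
        A = A + A.T  # undirected graph: symmetric adjacency matrix
        if is_connected(A):
            return A
```

Rejection is cheap for dense graphs but slows down near m = n − 1, where only a small fraction of random edge sets form a tree.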

DISCUSSION
Recently, the speedup problem has been extensively discussed in the framework of quantum computation aimed at accelerating the solution of familiar optimization problems with quantum hardware [71,72]. However, predicting a quantum speedup on such hardware is a complex problem that depends on many physical parameters, including the size and topology of the system [73][74][75][76]. In this paper we proposed a new machine learning method to predict a speedup of quantum transport. This method is based on training a discriminative classifier, namely a specially designed convolutional neural network (CQCNN). We generated the training examples, each consisting of an adjacency matrix and a corresponding label ("classical" or "quantum"), by simulating the walk dynamics of classical and quantum particles. The generated examples were used to train CQCNN with a stochastic gradient descent algorithm.
By training CQCNNs we demonstrated in Fig. 3 that the neural network is able to learn to classify the quantum speedup and to match the results obtained by our simulations. First, CQCNN learns to approximate the given examples very well by representing the quantum and classical properties of graphs in its weights: CQCNN compresses up to 2268 adjacency matrices with 49 entries each⁵ into just 29 real parameters. Second, CQCNN automatically learns which graph features are important for quantum speedup. We identified that for line graphs these correlations correspond to well-explainable graph properties. Additionally, the neural network learns that many graphs are isomorphic, with no indication of overfitting to adjacency matrix features. Third, we demonstrated a good generalization capacity of the constructed CNN. The neural network correctly classified not only previously unseen graphs of the same size, but also graphs of sizes that were never used to train the network. For line graphs of the same size the average accuracy was above 90%, and 60-85% for larger graph sizes. We believe that this performance is strong, as test examples do not necessarily share any structural similarities with training examples.
Finally, the presented approach was applied to random graphs with up to 25 vertices. Although the space of possible labeled graphs contains more than 2^200 graphs (see Ref. [46] for 25 vertices), with only 1000 randomly generated training examples we showed that it is possible to significantly improve over a random guess. We believe, however, that this classification performance can be further improved by using more training examples and by optimizing CQCNN's hyperparameters. The presented machine learning methodology can be used to find novel large-scale graphs and circuits that exhibit maximal quantum speedup. At the same time, our results might be of particular importance in materials science and biophotonics for a deeper understanding and design of novel materials with unique quantum transport properties.
⁵ This is the case for line graphs with 7 vertices, as the training set consisted of 90% of the 7!/2 = 2520 line graphs in total, see Fig. 3(b) for n = 7.

METHODS
In this section we give additional details on the quantum walk simulations and on the machine learning methodology.

Quantum walks on graphs
In the following, we describe the quantum walk dynamics on graphs, and give more details on simulations that were performed in this paper.
We consider n × n adjacency matrices A ∈ 𝒜 that describe undirected connected graphs G ∈ 𝒢 with n vertices, on which classical and quantum walks are simulated. A graph G is specified by the set of vertices V = {1, …, n} and the set of edges E. Each edge (v, u) ∈ E is described by a pair of vertices v, u ∈ V. As the graphs G that we consider are undirected, (u, v) = (v, u) and all matrices A are symmetric: A_ij = A_ji. Without loss of generality, we label the vertices v_i = 1 and v_t = 2 as the "initial" and the "target" vertices. Given an adjacency matrix A, we simulate classical and quantum continuous-time walks up to a time t_max, which depends on the probability of detecting a particle. The results of the simulations are the classical and quantum probabilities of detecting a particle in v_t at times t ≤ t_max. From these two dynamics we obtain the time at which the particle is detected in v_t with the threshold probability p_th = 1/log n. By comparing the two time values we can predict whether using a quantum particle gives an advantage in reaching v_t on a given graph.
The classical continuous-time random walk (CTRW) is simulated by solving the differential equation
$$\frac{d\mathbf{p}(t)}{dt} = (T - I)\,\mathbf{p}(t), \qquad (2)$$
where p(t) is the vector of probabilities p_v(t) of detecting a classical particle in the vertices v ∈ V of the graph, and I is the n × n identity matrix. The transition matrix T consists of the probabilities T_vu for a particle to jump from u to v. As we would like to "catch" the particle in v_t, the edges (v, v_t) that lead to v_t are made directed. This modification is implemented by introducing a new adjacency matrix A^c, which is equal to A apart from the column v_t: A^c_{v v_t} = 0 for all v ∈ V \ {v_t}, and A^c_{v_t v_t} = 1. The transition matrix is obtained from A^c by dividing all entries in the v-th column of A^c by the sum of that column, i.e., by the out-degree of vertex v in G_c, for all v ∈ V. This modification of A effectively makes the underlying graph G_c directed, such that a classical particle cannot escape v_t once it is there.
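The classical part can be sketched in NumPy as follows. We assume 0-indexed vertices with the initial vertex at index 0 and the target at index 1, and, instead of evaluating the matrix exponential e^{(T−I)t} directly, we integrate dp/dt = (T − I)p with fixed-step fourth-order Runge-Kutta; the step size, `t_max`, and the function names are illustrative, hypothetical choices.

```python
import numpy as np

def classical_transition_matrix(A, target=1):
    """Column-stochastic T built from the modified adjacency matrix A_c.

    Column `target` is replaced by a unit vector, so the particle stays at the
    target once it arrives, while edges leading into the target are kept. Each
    column is then divided by its sum (the out-degree of that vertex in G_c).
    T[v, u] is the probability to jump from u to v.
    """
    Ac = np.array(A, dtype=float)
    Ac[:, target] = 0.0
    Ac[target, target] = 1.0
    return Ac / Ac.sum(axis=0, keepdims=True)

def ctrw_hitting_time(A, p_th, initial=0, target=1, dt=0.01, t_max=100.0):
    """Integrate dp/dt = (T - I) p with RK4 steps; return the first time at which
    p_target exceeds p_th, or None if this does not happen before t_max."""
    T = classical_transition_matrix(A, target)
    G = T - np.eye(T.shape[0])
    p = np.zeros(T.shape[0])
    p[initial] = 1.0  # the particle starts at the initial vertex
    t = 0.0
    while t < t_max:
        k1 = G @ p
        k2 = G @ (p + 0.5 * dt * k1)
        k3 = G @ (p + 0.5 * dt * k2)
        k4 = G @ (p + dt * k3)
        p = p + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
        if p[target] > p_th:
            return t
    return None
```

On the three-vertex line graph in which vertex 1 is attached only to the target, p_{v_t}(t) = 1 − e^{−t}, so the classical hitting time is ln[1/(1 − 1/log 3)] ≈ 2.41, which the sketch reproduces to within one time step.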
The solution of the differential equation in Eq. (2) is
$$\mathbf{p}(t) = e^{(T - I)t}\,\mathbf{p}(0), \qquad (3)$$
where p(0) = (1, 0, …, 0)^T is the probability vector of a classical particle initially located in v = 1. The dynamics in Eq. (3) is known as node-centric CTRW [77,78]. In node-centric CTRWs the rate at which a particle leaves a vertex is the same for all vertices u ∈ V. In the considered case the trajectories are statistically the same as those of the discrete-time random walk (DTRW); hence, the dynamics of p(t) in Eq. (2) can be viewed as a "continuization" of the DTRW dynamics.
The continuous-time quantum walk (CTQW) dynamics is simulated by solving the Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) equation
$$\frac{d\rho(t)}{dt} = -i\,[H, \rho(t)] + \gamma\left(L\rho(t)L^{\dagger} - \tfrac{1}{2}\left\{L^{\dagger}L, \rho(t)\right\}\right), \qquad (4)$$
with the Hamiltonian H = A_q. Here A_q is an adjacency matrix of size (n + 1) × (n + 1) that is equal to A apart from an added (n + 1)-th row and (n + 1)-th column of zeros:
$$A_q = \begin{pmatrix} A & \mathbf{0} \\ \mathbf{0}^{T} & 0 \end{pmatrix}. \qquad (5)$$
The new matrix A_q corresponds to a graph G_q with an additional "sink" vertex v_sink = n + 1. This sink vertex serves as an auxiliary vertex where a quantum particle is kept captured once it ends up there. The only way the particle can end up there is by decaying from v_t; this process is described by the operator L = |n + 1⟩⟨v_t|. Physically, L introduces incoherence into the unitary CTQW dynamics described by H by moving the quantum particle from v_t to v_sink with rate γ. In general, the rate γ dramatically influences the CTQW dynamics: if γ = 0, the dynamics is coherent and we will never observe the particle in v_sink; if γ is large (e.g., 10^5), we might also never observe the particle in v_sink⁶. Because there is no universally best value of γ for all graphs G_q, we use γ = 1 throughout the paper.
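The GKSL equation dρ/dt = −i[H, ρ] + γ(LρL† − ½{L†L, ρ}) can be integrated in the same spirit. This sketch assumes ℏ = 1, γ = 1, 0-indexed vertices (initial 0, target 1, sink n), and a fixed-step RK4 integrator chosen for brevity rather than a dedicated master-equation solver; `sink_population_curve` is a hypothetical name.

```python
import numpy as np

def sink_population_curve(A, gamma=1.0, initial=0, target=1, dt=0.01, t_max=50.0):
    """Integrate the GKSL equation with RK4; return times and rho_sink(t).

    H = A_q is A padded with a zero row and column for the sink (index n);
    L = |sink><target| moves population from the target to the sink at rate gamma.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    H = np.zeros((n + 1, n + 1), dtype=complex)
    H[:n, :n] = A
    L = np.zeros((n + 1, n + 1), dtype=complex)
    L[n, target] = 1.0
    Ld = L.conj().T
    LdL = Ld @ L

    def rhs(rho):
        # Coherent part plus the Lindblad dissipator for the single jump operator L
        return (-1j * (H @ rho - rho @ H)
                + gamma * (L @ rho @ Ld - 0.5 * (LdL @ rho + rho @ LdL)))

    rho = np.zeros((n + 1, n + 1), dtype=complex)
    rho[initial, initial] = 1.0  # rho(0) = |initial><initial|
    steps = int(round(t_max / dt))
    times = dt * np.arange(steps + 1)
    sink = np.zeros(steps + 1)
    for s in range(1, steps + 1):
        k1 = rhs(rho)
        k2 = rhs(rho + 0.5 * dt * k1)
        k3 = rhs(rho + 0.5 * dt * k2)
        k4 = rhs(rho + dt * k3)
        rho = rho + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        sink[s] = rho[n, n].real
    return times, sink
```

On the three-vertex line graph with the target in the middle, the state (|1⟩ − |3⟩)/√2 is an eigenstate of H with zero amplitude on the target, so half of the initial population never reaches the sink and ρ_44(t) saturates at 1/2, below the threshold 1/log 3 ≈ 0.91: the quantum particle is never detected on that graph.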
We solve the GKSL equation numerically with the initial condition ρ(0) = |1⟩⟨1| and observe the dynamics of ρ_{(n+1)(n+1)}(t), which equals the population of v_sink at time t. The function ρ_{(n+1)(n+1)}(t) is non-negative and non-decreasing in time. Note that, unlike in the CTRW, in the CTQW the probability of detecting the particle does not necessarily approach one with time: if the initial state overlaps with an eigenstate of H that has zero amplitude at v_t, that part of the population never reaches the sink.
We next compare p_q(t) ≡ ρ_{(n+1)(n+1)}(t) and p_c(t) ≡ p_{v_t}(t) against p_th. The first time at which p_q(t) > p_th or p_c(t) > p_th is called the hitting time of the quantum or the classical particle, respectively.
⁶ This effect is known as the quantum Zeno effect: the vertex is measured so frequently that the particle never appears there.
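Given the two detection curves on a common time grid, the hitting times and the resulting label can be computed as below. The labeling rule (quantum if the quantum hitting time is strictly smaller) and the treatment of curves that never reach p_th are our assumptions, since the text does not specify ties or non-hitting walks; the helper names are hypothetical.

```python
import math
import numpy as np

def threshold(n):
    """p_th = 1 / log n (natural logarithm); it is below 1 only for n >= 3."""
    return 1.0 / math.log(n)

def hitting_time(times, prob, p_th):
    """First grid time at which prob(t) > p_th, or None if it is never exceeded."""
    above = np.flatnonzero(np.asarray(prob) > p_th)
    return float(times[above[0]]) if above.size else None

def speedup_label(t_quantum, t_classical):
    """1 ("quantum") if the quantum particle is detected first, else 0 ("classical").

    Assumption: a quantum walk that never reaches p_th is labeled classical, and
    a classical walk that never reaches it loses to any quantum hitting time.
    """
    if t_quantum is None:
        return 0
    if t_classical is None:
        return 1
    return int(t_quantum < t_classical)
```

For instance, with p_th = 1/log 3 on the grid t = 0, 0.01, …, 10, the curves 1 − e^{−t} and 1 − e^{−2t} are first above threshold at t = 2.42 and t = 1.21, which yields the label 1.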

Convolutional neural network architecture
In this section we describe in detail how the convolutional neural network, which is used in this paper, is constructed.
We use a specifically designed convolutional neural network, CQCNN, to learn from different graphs. The architecture of this neural network is shown in Fig. 2(c). CQCNN consists of a two-dimensional input layer that takes one graph represented by an adjacency matrix A. This layer is connected to several convolutional layers, the number of which depends on the number of vertices n of the input graph. The first convolutional layer consists of six filters (or feature detectors) that define three different ways of processing the input graph. These three ways are marked by different colors (green, red, blue) in Fig. 2(c). The weights and types of the filters determine which specific features are detected. The first type of filters detects how well the vertex v_i is connected to the rest of the graph by extracting features from the matrices T^k, where k are integers. The second type of filters detects the same for the vertex v_t. The third filter type looks at connectivities within the graph and detects how well each vertex is connected to other vertices. These three filter types are applied in several layers together with identity filters that propagate the extracted features further. These layers are followed by a filter that deletes the symmetric parts of all the matrices. This eliminates redundant information, as all the matrices are still symmetric after being processed by these fixed filters. At the next layer we apply n filters of fixed 3 × 3 size with trainable parameters in order to find relations between different edges. The last layer of filters summarizes the information about the edges in a description of the vertices, thereby reducing the number of neuron values polynomially, from one value per matrix entry to one value per vertex. The extracted features are then flattened and connected to two fully connected layers of neurons.
Neurons in the first fully connected layer use a rectified linear unit (ReLU) activation function, which makes the learned function nonlinear, and the last layer maps the learned features to the 0 or 1 label (two output neurons in Fig. 2(c)).
CQCNN chooses between the classical and quantum classes based on the values of the two output neurons. The predicted class is the index of the neuron with the largest output value: class = argmax_m y(m).
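A minimal sketch of this classification head follows, with hypothetical weight arrays `W1`, `b1`, `W2`, `b2`; the layer widths of CQCNN are not given in the text, so the shapes are placeholders chosen by the caller.

```python
import numpy as np

def classify(features, W1, b1, W2, b2):
    """Two fully connected layers: a ReLU hidden layer, then two output neurons.

    Returns (class, y), where y = (y(0), y(1)) and class = argmax_m y(m).
    """
    h = np.maximum(0.0, W1 @ features + b1)  # ReLU activation
    y = W2 @ h + b2                          # two output neurons
    return int(np.argmax(y)), y
```

With identity weights and zero biases, the input (1, −2) yields y = (1, 0) and class 0, while (−1, 3) yields y = (0, 3) and class 1.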
The network learns with a stochastic gradient descent algorithm that minimizes the cross-entropy loss function in Eq. (1). The filters that we constructed in the described neural network architecture are essential to the success of learning. First, the edge-to-edge (ETE) filter allows the network to see how many neighboring edges each edge has. The process of obtaining a feature map from an input "image" using the edge-to-edge filter is shown in Fig. 5(a). Given an input matrix A, the ETE filter outputs the matrix with components
$$(A_{\mathrm{ETE}})_{ij} = \sum_{k=1}^{n} \left( r_k A_{ik} + c_k A_{kj} \right),$$
where r_k and c_k are the filter weights. The second important filter is the edge-to-vertex (ETV) filter. This filter summarizes the information about the edges in the vertices. The filtering procedure takes an input matrix A and outputs a vector with components
$$(a_{\mathrm{ETV}})_{i} = \sum_{k=1}^{n} \left( r_k A_{ik} + c_k A_{ki} \right).$$
The working principle of this filter is visualized in Fig. 5(b).
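The two cross-shaped filters can be written in a few lines of NumPy. This sketch assumes the parameterization out[i, j] = Σ_k r_k A_ik + Σ_k c_k A_kj for ETE and out[i] = Σ_k r_k A_ik + Σ_k c_k A_ki for ETV, i.e., a weighted sum of row and column entries as described in the Results; the weight vectors `r` and `c` are hypothetical names, and the channel dimension and biases of a full convolutional layer are omitted.

```python
import numpy as np

def edge_to_edge(A, r, c):
    """Cross-shaped ETE filter: out[i, j] = sum_k r[k] A[i, k] + sum_k c[k] A[k, j]."""
    A = np.asarray(A, dtype=float)
    row_part = A @ r  # entry i: weighted sum over row i
    col_part = c @ A  # entry j: weighted sum over column j
    return row_part[:, None] + col_part[None, :]

def edge_to_vertex(A, r, c):
    """ETV filter: out[i] = sum_k r[k] A[i, k] + sum_k c[k] A[k, i]."""
    A = np.asarray(A, dtype=float)
    return A @ r + c @ A
```

With all weights equal to one, the ETE output at (i, j) is deg(i) + deg(j), i.e., the number of edges touching edge (i, j) with the edge itself counted twice, and the ETV output at vertex i is 2 deg(i).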

ACKNOWLEDGMENT
This work was financially supported by the Government of the Russian Federation, Grant 08-08, and by RFBR grants No. 19-52-52012 MHT-a and No. 17-07-00994-a.

DATA AVAILABILITY
The developed algorithms and the generated datasets are available from the corresponding author upon reasonable request.

ADDITIONAL INFORMATION
Competing Interests: The authors declare no competing financial or non-financial interests.