Automated machine learning for secure key rate in discrete-modulated continuous-variable quantum key distribution

Continuous-variable quantum key distribution (CV QKD) with discrete modulation has attracted increasing attention due to its experimental simplicity, lower-cost implementation and compatibility with classical optical communication. Correspondingly, novel numerical methods have been proposed to analyze the security of these protocols against collective attacks, which pushes key rates beyond one hundred kilometers of fiber distance. However, numerical methods are limited by their calculation time and resource consumption, which prevents them from being deployed on mobile platforms in quantum networks. To address this issue, a neural network model predicting key rates in nearly real time was proposed previously. Here, we go further and present a neural network model combined with Bayesian optimization, which automatically designs the best neural network architecture for computing key rates in real time. We demonstrate our model on two variants of CV QKD protocols with quaternary modulation. The results show high reliability, with secure probability as high as $99.15\%-99.59\%$, considerable tightness, and high efficiency, with a speedup of approximately $10^7$ in both cases. This model enables the key rates of unstructured quantum key distribution protocols to be computed in real time, automatically and efficiently, meeting the growing need to implement QKD protocols on moving platforms.


I. INTRODUCTION
In recent decades, machine learning (ML) has achieved impressive breakthroughs that deeply impact both industry and academia, including autonomous driving [1,2], natural language processing [3,4], protein structure prediction [5] and even proving mathematical conjectures [6]. ML aims to recognize patterns in data, especially multidimensional data, and generalize them to new instances, which contributes to automating tasks and reveals hidden patterns beyond human intuition. This modern information-processing technology also benefits intractable quantum tasks, since such tasks are usually counterintuitive and involve high dimensions. Several significant advances have been made by applying ML to quantum physics, from classifying quantum states [7][8][9] and quantum control [10][11][12] to quantum metrology [13].
Quantum key distribution (QKD) enables unconditional security between two legitimate users (Alice and Bob) against any eavesdropper, called Eve [14,15], which is guaranteed by the laws of quantum mechanics [16]. According to the detection method, QKD is currently divided into two categories: discrete-variable (DV) QKD [17,18] and continuous-variable (CV) QKD [19][20][21][22]. Of the two, CV QKD has distinct advantages: a higher secret key rate and excellent compatibility with standard communication components [23][24][25], which makes CV QKD competitive at metropolitan distances [26]. To enhance the practicality of CV QKD, several works introduce machine learning-based methodologies to the CV QKD area, such as developing a novel CV QKD scheme [27,28], parameter prediction [29] and detecting quantum attacks [30].
CV QKD protocols with discrete modulation have attracted increasing attention for decades. Their appealing advantages include easier experimental implementation and higher error-correction efficiency, which promote CV QKD over longer distances [31][32][33][34]. These properties bring potential advantages for large-scale deployment in quantum-secured networks [35]. However, the security analysis of discrete-modulated CV QKD protocols is more complicated owing to the lack of symmetry [36]. Recently, novel numerical approaches [37,38] have been proposed to analyze the security of discrete-modulation protocols against collective attacks, where the key rate calculation involves minimizing a convex function over all eavesdropping attacks that are consistent with the experimental data. These numerical approaches achieve much higher key rates over significantly longer distances than previous security analyses. Based on these numerical approaches, a neural network model was presented to quickly predict the secure key rate of discrete-modulated CV QKD with high reliability (secure probability as high as 99.2%). This neural network model learns the mapping between input parameters and key rates from datasets generated by numerical methods, which supports the computation of secure key rates in real time [39]. However, the complexity of the mapping between input parameters and key rates depends on the complexity of solving the discrete-modulated protocols' key rates through numerical approaches [40]. Selecting architectures and hyperparameters plays a critical role in the performance of a neural network. Therefore, to learn different mappings from different protocols, the architectures of neural networks and the corresponding hyperparameters must be adjusted carefully by humans, which comes at a great price [41].
Here, we propose a more flexible and automatic neural network model combined with Bayesian optimization [42], which maintains extremely high reliability and efficiency while reducing complicated manual adjustment. Our method is universal for a variety of unstructured QKD protocols that lack analytical tools and rely on numerical methods. We apply our model to two variants of discrete-modulated CV QKD protocols and acquire high secure key rates with considerable tightness in both cases. We then compare the time consumption of our model with the numerical method proposed in Ref. [37], which shows a great speedup of approximately $10^7$.
This paper is organized as follows. In section II, we introduce the numerical method for CV QKD with discrete modulation proposed in Ref. [37], and we rely on it to collect a dataset to train and test the model. In section III, we introduce more details about the Bayesian optimization used in this paper. In section IV, we demonstrate all the main results of this paper. Section V provides a discussion and concludes this paper.

II. NUMERICAL METHOD FOR CV QKD WITH DISCRETE MODULATION
In this work, we apply the model to two discrete-modulated CV QKD protocols with different detection techniques to demonstrate its generalizability. One is the quadrature phase-shift-keying (QPSK) heterodyne detection protocol [37], and the other is an improved QPSK homodyne detection protocol [43]. To collect a dataset for training neural networks, we generate secure key rates of both protocols by applying the same numerical method [37,40]. In the following, we briefly introduce how computing key rates can be transformed into a convex optimization over a relevant objective function. A more detailed description can be found in Ref. [37].
Here, we consider a CV QKD protocol with quaternary modulation that involves two parties: a sender Alice and a receiver Bob. In each of N rounds, Alice randomly prepares one of the four coherent states $|\alpha_k\rangle = \left| |\alpha| e^{i(2k\pi/4+\pi/4)} \right\rangle$, where $k \in \{0, 1, 2, 3\}$, and sends it to Bob via an untrusted quantum channel. Then, Bob uses either homodyne or heterodyne detection to estimate k. According to the Devetak-Winter formula [44], the secret key rate under collective attacks in the asymptotic limit is

$$R^{\infty} = \min_{\rho_{AB} \in \mathbf{S}} H(Z \mid E) - p_{\text{pass}} \delta_{EC}, \qquad (1)$$

where $H(Z \mid E)$ is the conditional von Neumann entropy, which describes the uncertainty of the string Z from Eve's view. Eve's maximal knowledge of Bob's string Z corresponds to the minimum uncertainty of Z under a certain density matrix ρ. Therefore, we need to find the optimal $\rho^*$ in the feasible domain $\mathbf{S}$ that minimizes $H(Z \mid E)$; $p_{\text{pass}}$ is the sifting probability, and $\delta_{EC}$ is the actual amount of information leakage per signal in the error-correction step.
To turn this problem into a convex optimization problem, expression (1) can be reformulated as

$$R^{\infty} = \min_{\rho_{AB} \in \mathbf{S}} D\big(\mathcal{G}(\rho_{AB}) \,\|\, \mathcal{Z}[\mathcal{G}(\rho_{AB})]\big) - p_{\text{pass}} \delta_{EC}, \qquad (2)$$

in which $D(\rho \,\|\, \sigma) = \mathrm{Tr}(\rho \log_2 \rho) - \mathrm{Tr}(\rho \log_2 \sigma)$ is the quantum relative entropy. As shown in Ref. [38], $\mathcal{G}$ is a completely positive and trace-nonincreasing map that describes the postprocessing of different quadratures, and $\mathcal{Z}$ is a pinching quantum channel that reads out the key information.
Since the term $p_{\text{pass}} \delta_{EC}$ in formula (2) is easy to compute, we only need to consider the relevant optimization problem

$$\begin{aligned} \min_{\rho_{AB}} \quad & D\big(\mathcal{G}(\rho_{AB}) \,\|\, \mathcal{Z}[\mathcal{G}(\rho_{AB})]\big) \\ \text{s.t.} \quad & \mathrm{Tr}\big[\rho_{AB}\,(|k\rangle\langle k|_A \otimes \hat{q})\big] = q_k, \\ & \mathrm{Tr}\big[\rho_{AB}\,(|k\rangle\langle k|_A \otimes \hat{p})\big] = p_k, \\ & \mathrm{Tr}\big[\rho_{AB}\,(|k\rangle\langle k|_A \otimes \hat{n})\big] = n_k, \\ & \mathrm{Tr}\big[\rho_{AB}\,(|k\rangle\langle k|_A \otimes \hat{d})\big] = d_k, \\ & \mathrm{Tr}[\rho_{AB}] = 1, \qquad \rho_{AB} \geq 0, \\ & \mathrm{Tr}_B[\rho_{AB}] = \tfrac{1}{4} \sum_{i,j=0}^{3} \langle \alpha_j | \alpha_i \rangle\, |i\rangle\langle j|_A, \end{aligned} \qquad (3)$$

where $k \in \{0, 1, 2, 3\}$, and $q_k$, $p_k$, $n_k$ and $d_k$ denote the expectation values of the corresponding operators when Bob measures states labeled by k. These expectation values can be obtained through homodyne or heterodyne measurements. The first four constraints come from experimental outcomes. The next two are natural requirements, since $\rho_{AB}$ is a density matrix. The last constraint, on the partial trace over system B, comes from the fact that the quantum channel cannot influence Alice's system A. After imposing the photon-number cutoff assumption [37,45], the density matrix and operators can be handled in a finite dimension $N_c$, and the problem can be solved numerically. Eventually, we solve this minimization problem by the numerical method proposed in Ref. [38]; the specific implementation of this method in our work can be found in Ref. [43]. The method involves two steps:
1. Find a solution that is close to optimal, which gives an upper bound on the key rate.
2. Convert this upper bound to a lower bound on the key rate by considering its dual problem.
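The last constraint fixes Alice's reduced state, whose matrix elements are built from coherent-state overlaps $\langle \beta | \alpha \rangle = \exp(-|\alpha|^2/2 - |\beta|^2/2 + \beta^* \alpha)$. A minimal sketch of this computation, assuming equal preparation probabilities $p_k = 1/4$ (the general $\sqrt{p_i p_j}$ weights are kept as a parameter):

```python
import cmath
import math

def coherent_overlap(beta, alpha):
    """<beta|alpha> = exp(-|alpha|^2/2 - |beta|^2/2 + conj(beta)*alpha)."""
    return cmath.exp(-abs(alpha) ** 2 / 2 - abs(beta) ** 2 / 2
                     + beta.conjugate() * alpha)

def alice_reduced_state(amplitude, probs=(0.25, 0.25, 0.25, 0.25)):
    """4x4 matrix rho_A[i][j] = sqrt(p_i p_j) <alpha_j|alpha_i> for the QPSK
    states alpha_k = |alpha| * exp(i(2k*pi/4 + pi/4))."""
    alphas = [amplitude * cmath.exp(1j * (2 * k * math.pi / 4 + math.pi / 4))
              for k in range(4)]
    return [[math.sqrt(probs[i] * probs[j]) * coherent_overlap(alphas[j], alphas[i])
             for j in range(4)] for i in range(4)]

rho_a = alice_reduced_state(0.66)
trace = sum(rho_a[k][k].real for k in range(4))  # unit trace for a valid state
```

The resulting matrix is Hermitian with unit trace; its 12 off-diagonal entries are exactly the quantities that later enter the neural network's input vector.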

III. BAYESIAN OPTIMIZATION
In this section, we present a brief introduction to Bayesian optimization. Bayesian optimization is a powerful strategy for the global optimization of objective functions that are expensive to evaluate [42,46], and it is gaining great popularity in hyperparameter optimization. In particular, hyperparameter optimization in machine learning can be represented as

$$\mathbf{x}^* = \underset{\mathbf{x} \in \mathcal{X}}{\arg\min}\; f(\mathbf{x}),$$

where $f : \mathcal{X} \to \mathbb{R}$ is the objective function to minimize, $\mathbf{x}^*$ is the hyperparameter vector yielding the lowest value of f, and the dimension of the domain $\mathcal{X}$ depends on the number of hyperparameter types considered. In practice, evaluating the objective function is extremely costly, so selecting proper hyperparameters by hand becomes intractable. Beyond manual tuning, grid search and random search [47] are two common methods that perform slightly better. However, these methods still waste a large amount of time evaluating poor hyperparameters across the entire search space, which is relatively inefficient. In contrast, Bayesian optimization estimates the true objective function with a probability model, utilizes Bayes' theorem to update this model based on previous results, and then chooses the next promising hyperparameters. In practice, this method finds better hyperparameters in less time. Figure 1 illustrates the Bayesian optimization procedure. Sequential model-based optimization (SMBO) algorithms are formalizations of Bayesian optimization [46]. These algorithms have two key ingredients:
1. A probabilistic surrogate model S. SMBO approximates the objective function f with a probabilistic model called a surrogate, which is cheaper to evaluate. This surrogate encodes a prior distribution capturing beliefs about the behavior of the objective function and is updated sequentially after each new trial.
2. An acquisition function α : X → R. The acquisition function is the criterion by which the next vector of hyperparameters is chosen from the surrogate function.
For an SMBO algorithm at iteration n, the next location $\mathbf{x}_{n+1}$ is selected by optimizing $\alpha_n$, and the true f is then evaluated to obtain a result $y_{n+1} = f(\mathbf{x}_{n+1})$. The new tuple $(\mathbf{x}_{n+1}, y_{n+1})$ is appended to the historical set H. Then, the surrogate model S is updated to incorporate the new result, which means that the prior is updated to produce a more informative posterior distribution over the space of objective functions. The pseudocode of this framework is summarized in Algorithm 1.
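The iterate-evaluate-update loop can be written down directly. Below is a minimal sketch of the SMBO skeleton on a one-dimensional toy problem; the "surrogate" step is a deliberately crude placeholder (Gaussian perturbation of the incumbent) standing in for a real probabilistic model plus acquisition function:

```python
import random

def smbo(f, x0, n_iter=30, seed=0):
    """Sequential model-based optimization skeleton.
    The history H stores (x, y) tuples; on each iteration a next point
    x_{n+1} is proposed, the true f is evaluated once, and H is updated."""
    rng = random.Random(seed)
    history = [(x0, f(x0))]
    for _ in range(n_iter):
        best_x, _ = min(history, key=lambda t: t[1])
        # Placeholder proposal: perturb the incumbent. A real SMBO
        # implementation would instead fit a surrogate model to `history`
        # and maximize an acquisition function (e.g. expected improvement).
        x_next = best_x + rng.gauss(0.0, 0.5)
        history.append((x_next, f(x_next)))
    return min(history, key=lambda t: t[1])

best_x, best_y = smbo(lambda x: (x - 2.0) ** 2, x0=0.0)
```

The point of the skeleton is the bookkeeping: exactly one expensive evaluation of f per iteration, with all cheap reasoning delegated to the surrogate.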
The most common choice of acquisition function is expected improvement (EI):

$$\mathrm{EI}_{y^*}(\mathbf{x}) = \int_{-\infty}^{y^*} (y^* - y)\, p_S(y \mid \mathbf{x})\, dy.$$

Here $y^*$ is a threshold value of the objective function f, and $p_S(y \mid \mathbf{x})$ represents the surrogate probability model. If this expectation is positive, the hyperparameter vector $\mathbf{x}$ is expected to produce a better result than $y^*$. There are several strategies for constructing the surrogate model: Gaussian processes [48], random forests [49] and the tree-structured Parzen estimator (TPE) [46]. In this work, the TPE approach is adopted, which supports continuous, categorical and conditional parameters, as well as priors over which values of each hyperparameter are expected to perform best [50]. In contrast, Gaussian processes and random forests each support only one or two of these parameter types, which makes them unsuitable for our task involving continuous, categorical and conditional parameters. Instead of directly modeling $p(y \mid \mathbf{x})$, TPE models $p(\mathbf{x} \mid y)$ using two densities over the configuration space $\mathcal{X}$:

$$p(\mathbf{x} \mid y) = \begin{cases} \ell(\mathbf{x}) & \text{if } y < y^*, \\ g(\mathbf{x}) & \text{if } y \geq y^*, \end{cases}$$

where $\ell(\mathbf{x})$ is formed from the observed configurations whose objective values fell below the threshold and $g(\mathbf{x})$ from the remaining ones. The algorithm chooses $y^*$ to be some quantile γ of the observed y values, so that $p(y < y^*) = \gamma$. The tree-structured form of ℓ and g makes it easy to draw many candidates according to ℓ and evaluate them according to $g(\mathbf{x})/\ell(\mathbf{x})$; on each iteration, the algorithm returns the candidate $\mathbf{x}$ with the greatest EI. We implement this algorithm for the hyperparameter optimization of the neural networks predicting CV QKD key rates using the Python library Hyperopt [51].

FIG. 2: Schematic diagram of our neural network model combined with Bayesian optimization. The dataset used to train the key-rate-predicting neural network is generated by a numerical approach. The hyperparameters related to the neural network's architecture are not determined by humans but updated by Bayesian optimization, which first establishes a probability model over the distribution of neural network hyperparameters and performance. Then, according to the evaluation metric produced in each trial, such as the validation loss, Bayesian optimization updates the probabilistic surrogate model and suggests the next choice of hyperparameters. After several trials, we automatically obtain the best-performing neural network.

TABLE 1: Search space of hyperparameters for the QPSK heterodyne detection protocol [37]. The neural network model we use here is a fully connected feedforward network. Fixing the number of neurons in the input and output layers, we search the architecture of the hidden layers. For each of the three hidden layers, the number of neurons, the activation function and the ratio of the following dropout layer are to be determined, where the dropout technique [52] is used to prevent overfitting. The batch size of the training process and the two essential hyperparameters γ and ε are searched as well. Braces {} denote a finite set containing all possible discrete values; parentheses () denote a continuous range.
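The ℓ/g machinery of TPE can be illustrated in a few lines. The one-dimensional sketch below (with hand-rolled Gaussian kernel density estimates standing in for Hyperopt's tree-structured densities) splits past observations at the quantile γ, fits ℓ on the good points and g on the bad ones, and returns the candidate with the smallest $g(x)/\ell(x)$, i.e. the greatest expected improvement:

```python
import math
import random

def kde(points, bandwidth=0.3):
    """Gaussian kernel density estimate over 1-D points."""
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) \
            / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return density

def tpe_suggest(history, gamma=0.25, n_candidates=64, rng=random):
    """history: list of (x, y) trials. Split at the quantile gamma of y,
    model l(x) on the good side and g(x) on the bad side, then return the
    candidate drawn from l that minimizes g(x)/l(x)."""
    ordered = sorted(history, key=lambda t: t[1])
    n_good = max(1, int(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]
    bad = [x for x, _ in ordered[n_good:]] or good
    l, g = kde(good), kde(bad)
    # Sample the l-KDE by perturbing good points with the kernel bandwidth.
    candidates = [rng.choice(good) + rng.gauss(0.0, 0.3)
                  for _ in range(n_candidates)]
    return min(candidates, key=lambda x: g(x) / l(x))

history = [(x, (x - 1.0) ** 2) for x in [-2.0, -1.0, 0.0, 0.5, 1.5, 2.0, 3.0]]
x_next = tpe_suggest(history, rng=random.Random(1))
```

Candidates near previously good configurations and far from bad ones score the lowest $g/\ell$ ratio, which is exactly why TPE concentrates its trials in promising regions of the search space.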

IV. METHOD
Artificial neural networks can approximate arbitrary bounded continuous mappings on a given domain, according to the universal approximation theorem [53]. Therefore, we expect that a neural network can learn the mapping between the input variables defined in the constraints of Eq. (3) and the output key rates, which avoids solving the time-consuming optimization problem and computes key rates with low latency. We demonstrated the feasibility of using a neural network to predict the key rates of discrete-modulated CV QKD in previous work [39], where we built a four-layer fully connected feedforward neural network with a specifically designed loss function to predict the key rates of discrete-modulated CV QKD with homodyne detection. This loss function is the key ingredient that keeps the output key rates reliable and tight; we retain it in this work but utilize the TPE algorithm to search the other parts of the neural network to improve overall performance. For training inputs $\{\mathbf{x}_i\}$ and corresponding labels $\{y_i\}$ of size n, the loss is built on the residual error $e^*_i = y^*_{p,i} - y^*_i$ between the preprocessed label $y^*_i = -\log_{10}(y_i)$ and the corresponding network output $y^*_{p,i}$; its specific formula is given in Ref. [39]. The loss function contains two significant hyperparameters γ and ε, whose choices are crucial to the model's performance, as we presented in Ref. [39]: γ forces the predicted key rate to be as information-theoretically secure as possible, and ε forces it to be as close to the numerical results as possible. Here, apart from the input and output layers, we do not fix the structure of the neural network but utilize the TPE algorithm to search it efficiently within a set configuration space.

TABLE 2: Search space of hyperparameters for the QPSK homodyne detection protocol [43]. Different from the QPSK heterodyne detection protocol [37], here we search over 3 or 4 hidden layers.
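To make the roles of γ and ε concrete, here is a schematic asymmetric loss on the preprocessed rates $y^* = -\log_{10}(y)$. This is an illustration only: the function name, the quadratic form and the way γ and ε enter are our assumptions, not the published loss, whose exact formula appears in Ref. [39]:

```python
def schematic_loss(preds, labels, gamma=10.0, eps=0.01):
    """Illustrative asymmetric loss (NOT the loss of Ref. [39]).
    With y* = -log10(y), a residual e* = y*_pred - y* >= 0 means the
    predicted key rate is at or below the numerical value, i.e. secure.
    An insecure residual (e* < 0) is penalized gamma times harder, and
    eps sets the tolerated margin before secure-side deviations cost."""
    total = 0.0
    for y_pred, y_true in zip(preds, labels):
        e = y_pred - y_true
        if e < 0:                         # insecure: key rate overestimated
            total += gamma * e * e
        else:                             # secure: penalize beyond margin eps
            total += max(0.0, e - eps) ** 2
    return total / len(preds)

# An insecure residual is punished far harder than an equal secure one:
insecure = schematic_loss([1.9], [2.0])   # e* = -0.1
secure = schematic_loss([2.1], [2.0])     # e* = +0.1
```

Any loss with this asymmetry biases the trained network toward underestimating key rates, which is what makes the predictions information-theoretically conservative.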
An illustration of our model is shown in Fig. 2.

V. RESULT
After training under TPE search is complete, we obtain the resulting structures of the neural networks in both cases, shown in Tables 3-4. We then use the selected and trained networks to predict key rates on the test set of each protocol. The fraction of predicted key rates that are secure reaches 99.15% for the QPSK heterodyne detection protocol and 99.59% for the QPSK homodyne detection protocol, which suggests that our method combining a neural network with Bayesian optimization is highly reliable. For the securely predicted key rates, namely, those whose predicted values are lower than the true values, we plot the relative-deviation distributions for both protocols in Fig. 3, which suggests that our method has good tightness.
Before training the neural networks under the TPE method, we generate datasets for the two protocols by the aforementioned numerical approach. To obtain diverse datasets for the QPSK heterodyne detection protocol, we generate 36 sets of data with excess noise ξ from 0.0045 to 0.0405. Each set contains 80 random samplings of ξ from an interval of length 0.001, for example [0.0045, 0.0055]. Under each random sampling, we generate data every 5 km with the transmission distance L up to 200 km. At each distance, we generate data for amplitudes α from 0.62 to 0.72 in steps of 0.01. The total dataset contains 809,600 input instances $\{\mathbf{x}_i\}$ and corresponding labels $\{y_i\}$. For the QPSK homodyne detection protocol, the excess noise is sampled randomly from ξ = 0.014 to 0.042, where the length of each sampling interval is 0.002, for example [0.014, 0.016], and the amplitude α is sampled from [0.60, 1.05]. The size of this total dataset is 368,116. For both protocols, each $\mathbf{x}_i$ is a vector of 29 variables: 16 variables are the right-hand sides of the first four constraints of Eq. (3), 12 variables are the off-diagonal elements of the matrix on the right-hand side of the last constraint of Eq. (3), and the remaining variable is the excess noise ξ. The label $y_i$ is the corresponding key rate. Therefore, we fix the network's input layer at 29 neurons and its output layer at 1 neuron; the search space of the other hyperparameters can be found in Tables 1-2. Before feeding data into the neural networks, we split it into a training set and a test set and implement the same data preprocessing as in Ref. [39]. For the QPSK heterodyne detection protocol, the training set contains 769,120 instances and the test set 40,480. For the QPSK homodyne detection protocol, the training set contains 327,636 instances and the test set 17,244. In both cases, 10% of the training data is split off as the validation set.
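The 29-dimensional input described above can be assembled straightforwardly. A sketch (variable names and ordering are ours) concatenating the 16 first- and second-moment expectations, the 12 off-diagonal elements of the matrix in the last constraint, and the excess noise, together with the label preprocessing $y^* = -\log_{10}(y)$:

```python
import math

def build_input(moments, rho_a, excess_noise):
    """moments: dict k -> (q_k, p_k, n_k, d_k) for k = 0..3 (16 values);
    rho_a: the 4x4 matrix from the last constraint of Eq. (3), whose 12
    off-diagonal entries are used; excess_noise: xi.
    Returns the 29-variable input vector (ordering is illustrative)."""
    x = []
    for k in range(4):
        x.extend(moments[k])                  # 16 expectation values
    for i in range(4):
        for j in range(4):
            if i != j:
                x.append(rho_a[i][j])         # 12 off-diagonal elements
    x.append(excess_noise)                    # 1 excess-noise variable
    return x

def preprocess_label(key_rate):
    """y* = -log10(y), compressing key rates spanning many decades."""
    return -math.log10(key_rate)

moments = {k: (0.1 * k, 0.2, 0.3, 0.4) for k in range(4)}
rho_a = [[1.0 if i == j else 0.05 for j in range(4)] for i in range(4)]
vec = build_input(moments, rho_a, excess_noise=0.02)   # 29 entries
```

The logarithmic label transform matters because key rates range over many orders of magnitude; without it, small long-distance rates would contribute almost nothing to the training loss.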
We generate the dataset on the blade cluster system of the High Performance Computing Center of Nanjing University, consuming over 250,000 core hours on nodes containing 4 Intel Xeon Gold 6248 CPUs each, which represents immense computational power. Under the TPE algorithm with a maximum of 10 iterations, the Adam algorithm [54] is used to train the neural networks for 200 epochs with an initial learning rate of 0.001. This takes roughly 53 hours for the QPSK heterodyne detection protocol and 23 hours for the QPSK homodyne detection protocol on an Nvidia A100 GPU. We also compare the predicted results with the numerical results for key rate versus transmission distance for the two protocols, as shown in Fig. 4. For this plot, we implement the same numerical approach to compute the best key rates of the two protocols at different excess noises by optimizing the amplitude α of the signal states in the ranges [0.62, 0.72] and [0.62, 1.03] with a step of 0.01. The choice of excess-noise range is consistent with the sampling intervals of the training data. The photon-number cutoff is $N_c = 12$, and the maximal iteration number of the first step of the numerical approach is $N_i = 300$. We record the corresponding 29 variables producing the best key rates as the neural networks' inputs to predict key rates. As shown in Figs. 4(c) and (d), the predicted results are all secure and remain tight, with relative deviations between 10% and 20% when the transmission distance is below 150 km for both protocols.
To show the efficiency of our method, we compare the running time of the neural network method and the numerical method on a high-performance personal computer with a 3.3 GHz AMD Ryzen 9 4900H CPU and 16 GB of RAM, as shown in Fig. 5. The results suggest that the neural network method is generally 6-8 orders of magnitude faster than the numerical method. For example, when ξ = 0.025, the numerical method takes approximately 850 seconds to calculate the key rate at 50 km for the QPSK heterodyne detection protocol; when ξ = 0.035, it takes approximately 1260 seconds at 25 km for the QPSK homodyne detection protocol. In contrast, the trained neural network obtains results in approximately 0.0001 seconds, which is almost real time.

VI. DISCUSSION AND CONCLUSION
To summarize, we develop a neural network model combined with Bayesian optimization to directly extract key rates with high reliability, considerable tightness and great efficiency. Instead of designing the neural network architecture by hand and troublesome manual tuning of hyperparameters, we utilize a particular Bayesian optimization method, the TPE algorithm, to automatically search for the structure and hyperparameters that best fit a given dataset. We exemplify our method on two promising discrete-modulated CV QKD protocols with different detection techniques across a large range of excess noises and transmission distances. For both protocols, the neural networks selected by the TPE algorithm predict information-theoretically secure key rates with high probability (up to 99.15% for the QPSK heterodyne detection protocol and 99.59% for the QPSK homodyne detection protocol), and the results show considerable tightness. Our method is approximately $10^7$ times faster than the numerical method, which fully satisfies the requirements of practical QKD systems. In contrast, the numerical method takes several minutes to calculate a single key rate, which is intolerable since many free-space sessions, such as satellite-ground or handheld QKD, may have a window of only minutes. While collecting enough data to train the model with the numerical method consumes a large amount of computing power, these large computations can be performed offline. Once the neural network is trained, it can be deployed on a device to infer key rates online in milliseconds given new experimental inputs. Ref. [55] demonstrated that a neural network method for parameter optimization of QKD can be deployed on various mobile low-power systems, which brings advantages of greater power efficiency and low latency.
We can also foresee that our neural network method combined with Bayesian optimization will play an essential role in free-space QKD scenarios such as handheld [56], drone-based [57] or satellite-ground QKD [58]. Several works have focused on machine learning for optimal parameters in QKD [29,55,59,60]; our work, however, predicts secure key rates directly with automatically designed neural networks, going further than our previous work [39]. Based on our model, several directions are worth investigating in future work. So far, we have only covered computing asymptotic key rates, but finite-size effects are practical issues considered in discrete-modulated CV QKD [61]. A recent work has analyzed the security and performance of discrete-modulated CV QKD in a finite-size scenario [62], which inspires us to improve our model. To address these issues, we also consider applying our model to other protocols in future work. Moreover, post-processing (notably error correction) still limits the overall time acceleration of a discrete-modulated CV QKD system. Note that error correction involving binary or quaternary error-correcting codes is less complex than in the Gaussian-modulation case. Therefore, we also consider developing an effective machine-learning-based error-correction protocol for CV QKD with discrete modulation in the future.