Coherent Optical Communications Enhanced by Machine Intelligence

Discriminating between different received coherent signals is integral to the operation of many free-space optical communications protocols, and is often difficult when the receiver measures a weak signal. Here we design an optical communications scheme that combines balanced homodyne detection with an unsupervised generative machine learning network and a convolutional neural network (CNN), and demonstrate its efficacy in a realistic simulated coherent quadrature phase shift keyed (QPSK) communications system. Additionally, we program the neural network system at the transmitter such that it autonomously learns to correct for the noise associated with a weak QPSK signal; the trained network state is shared with the receiver prior to the implementation of the communications. We find that the scheme significantly reduces the overall error probability of the communications system, achieving the classical optimal limit. This communications design is straightforward to build, implement, and scale. We anticipate that these results will allow for a significant enhancement of current classical and quantum coherent optical communications technologies.


Introduction
Nonorthogonality makes coherent states ideal candidates for various quantum and classical optical technology systems, such as quantum networks [1] and quantum metrology [2, 3], as well as classical optical communications schemes [4-6]. Additionally, coherent states exhibit high spectral efficiency [7, 8], particularly in quadrature phase shift keying (QPSK), and are more tolerant to losses and nonlinearities in the communications channel [9]. However, the accurate discrimination of coherent states is always limited by a fundamental uncertainty [10]. This uncertainty is the basis of quantum key distribution (QKD) [4] and multiple other communication security designs [11-13]. While this is a benefit for quantum communications, the discrimination uncertainty is always a barrier for efficient classical communications, in particular where the receiver site detects weak, low signal-to-noise ratio (SNR) coherent states, such as in deep space communications [14]. As such, an essential characteristic of these coherent communications schemes, in this case QPSK, is the capacity to efficiently discriminate the signals that are sent and detected at the receiver. Realistic coherent optical communications systems involve the propagation of such signals through various obstacles that may decrease the SNR [15] and thus increase the bit error rate. As a result, the noisy received QPSK signals can significantly limit the ability to establish such communications in a real-world environment. Here we make use of generative machine learning techniques [16] in combination with a convolutional neural network [17] to design a communications system that is robust to signal fading (or weak coherent signals), and demonstrate its ability to significantly reduce the error probability in discriminating between received coherent QPSK signals in a realistic simulated communications setting.
Generative machine intelligence systems have recently been applied to a variety of research areas [18-22]. Here we implement an unsupervised denoising autoencoder [16] as the generative neural network (GNN), a concept which has been demonstrated to be useful in multiple applications [23-25]. Recently, several research groups have shown the power of machine intelligence in the context of coherent optical communications [26-31]. Here we use a generative neural network, in combination with a convolutional neural network (CNN), to make clean reconstructions and accurate classifications of received QPSK signals even for extremely low SNR scenarios, which greatly reduces the overall error probability of the communications design, achieving the classical optimal limit (i.e., the homodyne limit [32]), or standard quantum limit (SQL). In previous optical communications techniques using machine learning, unknown received optical signals (including images) have been classified with high accuracy using only a CNN as the classifier. Schemes with only a CNN as the classifier, however, need a training set with a large number of known homodyne measurements, and the range of noise that such a set would have to cover is unbounded. This limits the discrimination efficiency of the communications design with respect to random signal (amplitude) fading of coherent states. Additionally, such setups with only a CNN as a classifier are based on supervised learning, which requires a label for each training homodyne measurement. In practice, it is hard to accurately label the homodyne outputs of a received QPSK signal, particularly for very low SNRs. The present unsupervised learning strategy does not require any labelling for the weak (low SNR) coherent states, which are autonomously fed to the networks and reconstructed at the GNN outputs.
Afterwards, the generated keys are classified by a CNN trained purely on the homodyne data associated with high-SNR (desired) QPSK signals, which are easily distinguished and labelled, prior to the start of communications (i.e., pre-trained before messages are sent and received). These novel aspects of the current design pave the way towards the robust and realistic implementation of homodyne receivers that efficiently demodulate signals in the coherent free-space optical communications regime.

Figure 1. Schematic of the robust coherent optical communications design with the machine-intelligence-aided homodyne receiver. In the transmitter, QPSK signals (Q1, Q2, Q3, Q4) are simulated to propagate through paths I, III, and IV. Prior to communication, the homodyne measurements of coherent states |α'⟩ along path I are used as the target sets and training sets for the generative neural network (GNN) and classifying convolutional neural network (CNN), respectively. Similarly, the coherent states |α⟩ along path III are used as the training sets for the generative network, whereas the coherent states |α_m⟩ along path IV are the actual unknown message signals sent to and received at the receiver. Prior to communication, the neural network setup at the receiver shares the neural network state of the transmitter. Then, weak unknown signals |α_m⟩ are sent along path IV and are efficiently demodulated at the homodyne receiver with the GNN and CNN.

The overall communications system design is shown in Fig. 1. A laser beam is split into two paths, path I and path II, with a beam splitter. Path II is further split into paths III and IV. Path IV is the communications channel, and paths I-III are used in training the networks prior to communication.
Note that the QPSK signals |α'⟩ and |α⟩ on paths I and III, respectively, share the same QPSK state (key value), so that the homodyne detectors on both paths always read the same key value regardless of the noise strength (SNR) in either branch. The QPSK signal |α_m⟩ ("m" for message) on the communications channel, path IV, is assumed to be an unknown message key that has been sent and received at the receiver end, as shown in Fig. 1. At the transmitter, the homodyne outputs of the QPSK signals |α⟩ on path III are fed to the input of the GNN, which reconstructs a new signal measurement at its output; this output is then compared with the homodyne output of the signals |α'⟩ from path I. Next we optimize the reconstruction loss and update the GNN parameter space (see "Methods"). At the same time we use the homodyne measurements of the QPSK signals |α'⟩ to train the classifying CNN (see "Methods"). Note that the networks are pre-trained at the transmitter and are assumed to be continuously shared with the networks at the receiver, as indicated by dotted arrows in Fig. 1. This pre-training may take place locally, then be distributed prior to implementation of the communications (which is the weak signal |α_m⟩ sent along path IV). Finally, the machine-intelligence-aided homodyne receiver classifies unknown QPSK message signals |α_m⟩ at the receiver and the corresponding error probability is evaluated. Examples of the homodyne measurement of a received QPSK signal and the corresponding clean reconstruction are shown in Fig. 2. Furthermore, we evaluate the error probability for various combinations of |α⟩ (or |α_m⟩) and |α'⟩, various scanning phase ranges of the local oscillator, and various SNR values of the transmitted message signal |α_m⟩. Finally, we show a significant improvement in the overall error probability when the neural network system is used.

Results
Here we simulate QPSK coherent signals |α'_k⟩, |α_k⟩, and |α_k^m⟩ for k ∈ {1, 2, 3, 4}, where each coherent state has some mean amplitude (e.g. |α|) and phase (φ). In order to simulate a balanced homodyne measurement (one output from a 50:50 beam splitter is subtracted from the other, as shown in Fig. 1), we use a local oscillator, β = |β|e^{iγ}, with amplitude |β| >> |α|. Note that we set |β| = 100 for all of the simulations presented in this paper. As a result, the mean signal, ⟨n⟩, and variance, (Δn)^2, at the detector are given by [equations missing], where a† and a are the raising and lowering operators, respectively. Since the minimum error probability (P_err) of discriminating between the QPSK signals (keys) is bounded below classically by the homodyne limit [32], and quantum mechanically by the Helstrom limit [33], we design the present communication setup and demonstrate that it achieves the classical minimum error limit even for extremely low SNR scenarios. In order to calculate the SQL (P_err^HD) and the Helstrom limit (P_err^Hel), we use the following relations, which are discussed in detail in [32, 34, 35]: [equations missing].

Next, we implement a GNN consisting of three sections: an encoder, a latent space, and a generator. The encoder learns the important features of the signals and stores them in a smaller-dimensional representation, the latent space. The latent space is then decoded through the generator, which finally reconstructs the desired clean QPSK homodyne outputs. The implemented GNN uses convolutional units as the encoder and transpose-convolutional units as the decoder. The encoder contains two convolutional layers, a single max-pool layer, and a single fully connected layer (FCL) to the latent space, whereas the generator starts with an FCL followed by a convolutional layer, a transpose-convolutional layer, and again a convolutional layer, as shown in Fig. 2. Similarly, in order to classify the reconstructed, as well as uncorrected (noisy), QPSK homodyne outputs, we use a CNN at the end.
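As an illustration of the balanced homodyne simulation described at the start of this section, the following minimal sketch generates one noisy homodyne trace. It is not the authors' code. It assumes the standard strong-local-oscillator result for a coherent state (mean 2|β||α|cos(φ − γ), variance |β|^2), approximates the detector noise as Gaussian, and assumes a QPSK constellation with phases π/4 + kπ/2, none of which is stated explicitly in the text. The function names are hypothetical.

```python
import math
import random

BETA = 100.0  # local-oscillator amplitude |beta|, the value quoted in the text


def qpsk_phases():
    """Assumed constellation: four phases spaced by pi/2, offset by pi/4."""
    return [math.pi / 4 + k * math.pi / 2 for k in range(4)]


def homodyne_trace(alpha_amp, phi, n_points=900, beta=BETA, rng=None):
    """Simulate one balanced homodyne trace of a coherent state.

    The local-oscillator phase gamma is scanned from 0 to 2*pi over
    n_points grid points. Each sample is drawn from a Gaussian with
    mean 2*|beta|*|alpha|*cos(phi - gamma) and standard deviation |beta|
    (strong-local-oscillator approximation, an assumption of this sketch).
    """
    rng = rng or random.Random()
    trace = []
    for i in range(n_points):
        gamma = 2.0 * math.pi * i / n_points
        mean = 2.0 * beta * alpha_amp * math.cos(phi - gamma)
        trace.append(rng.gauss(mean, beta))
    return trace


if __name__ == "__main__":
    rng = random.Random(1)
    for k, phi in enumerate(qpsk_phases(), start=1):
        trace = homodyne_trace(alpha_amp=1.0, phi=phi, rng=rng)
        print(f"Q{k}: {len(trace)} samples, first sample {trace[0]:.1f}")
```

Under these assumptions the signal-dependent part of each sample scales with |α| while the noise stays fixed at |β|, which is why weak signals are hard to tell apart from a single trace.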
The CNN consists of a single convolutional layer followed by a max-pooling layer and two FCLs. Finally, the last FCL is connected to an output layer where classification decisions are made, as shown in Fig. 2. The architectures of the GNN and CNN are described in detail in the "Methods" section. The error probability introduced by the network is P_err^network, which is equal to the ratio of the total number of incorrect classifications to the total number of QPSK keys received, resulting in the overall error probability (P_err) in discriminating the received noisy QPSK signals at the receiver given by [equation missing]. We then evaluate and plot the relative error probability with respect to the corresponding Helstrom limit, such that the relative Helstrom limit is scaled to 0 for all scenarios. The relative quantities are given by

P_err^relative = P_err − P_err^Hel, and P_err^relative-HD = P_err^HD − P_err^Hel.
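The two quantities defined in words above, the network error ratio and the offset relative to the Helstrom limit, can be sketched as follows. The function names are hypothetical. How the network error combines with the homodyne error into the overall P_err is not recoverable from the text, so it is not modelled here.

```python
def network_error_probability(predicted, actual):
    """Ratio of incorrect classifications to the number of keys received."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def relative_error(p_err, p_helstrom):
    """Error probability measured from the Helstrom limit (which maps to 0)."""
    return p_err - p_helstrom


if __name__ == "__main__":
    # Four received keys, one of them misclassified.
    print(network_error_probability([0, 1, 2, 3], [0, 1, 2, 0]))
```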
Throughout the rest of the manuscript, for convenience we refer to both P_err^relative and P_err^relative-HD as the relative P_err, and express a signal strength |α| in decibels, |α|_dB = 10 log_10(|α|). First we evaluate the relative P_err, as discussed in equation (5), with respect to the amplitude |α'| along path I, given different SNRs in path III. Here we scan the local oscillator phase from 0 to 2π with a grid consisting of 900 points. We randomly simulate 200 homodyne outputs for each QPSK signal for 17 different values of |α'| from -15 dB to 9.08 dB, and set them as the target patterns at the output of the GNN. Similarly, for path III we randomly simulate 200 homodyne outputs for each QPSK signal for |α| = -12.0 dB, -10.5 dB, and -9.3 dB, and feed them to the encoder of the GNN. Note that before feeding the signals to the GNN, we convert the homodyne outputs (from paths I and III) into corresponding 30 × 30 pixel images, as shown in Fig. 2. In order to optimize |α'| at various values of |α|, we keep |α| fixed at -12.0 dB, -10.5 dB, and -9.3 dB, and vary |α'| for each of them separately, such that 17 different GNN configurations are trained for up to 150 epochs for each |α|, for a total of 51 pre-trained GNNs. Also, we use the same |α'| data set, separately, to pre-train the classifying CNNs at the end of the receiver. Next, we randomly simulate 360 homodyne outputs (90 per QPSK key) for each of |α_m| = -12.0 dB, -10.5 dB, and -9.3 dB as the test sets. Note that the training and test data are simulated independently, at different times, so the two sets share no samples. Finally, at the receiver, the homodyne measurements of |α_m| are fed to the shared pre-trained GNN, which generates the desired homodyne measurement as its output.
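The decibel convention and the conversion of a 900-point phase scan into a 30 × 30 image can be sketched as below. The helper names are hypothetical, and the row-major ordering of the image is an assumption, since the text does not say how the points are arranged into pixels.

```python
import math


def amplitude_to_db(amplitude):
    """|alpha|_dB = 10 * log10(|alpha|), the convention stated in the text."""
    return 10.0 * math.log10(amplitude)


def amplitude_from_db(db):
    """Inverse of amplitude_to_db."""
    return 10.0 ** (db / 10.0)


def to_image(trace, side=30):
    """Reshape a flat trace of side*side samples into a side x side grid.

    Row-major ordering is assumed for illustration.
    """
    if len(trace) != side * side:
        raise ValueError("trace length must equal side * side")
    return [trace[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    for db in (-12.0, -10.5, -9.3, 9.0):
        print(f"{db:6.1f} dB -> |alpha| = {amplitude_from_db(db):.4f}")
    image = to_image(list(range(900)))
    print(len(image), "rows of", len(image[0]), "pixels")
```

With this convention the weak signals at -12.0 dB to -9.3 dB correspond to amplitudes well below 1, while the 9 dB target signal has an amplitude of roughly 8.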
Next, in order to calculate the relative P_err, the reconstructed images (generated homodyne measurement outputs) from the GNN are forwarded to a pre-trained CNN and the corresponding relative P_err is measured; the results are shown in Fig. 3 (a). The relative P_err for |α_m| = -12.0 dB, -10.5 dB, and -9.3 dB at various |α'| is shown by the solid red, green, and blue curves, respectively, with dotted lines of the same color representing the corresponding relative homodyne limit (P_err^relative-HD), and the black solid line representing the relative Helstrom limit. We find that an improvement in the relative P_err begins when |α'| ≥ -4.5 dB for all of |α_m| = -12.0 dB, -10.5 dB, and -9.3 dB. As expected, we obtain better reconstructions and a lower relative P_err as we increase |α'| up to -1.5 dB, 3.1 dB, and 4.6 dB for |α_m| = -12.0 dB, -10.5 dB, and -9.3 dB, respectively, after which they begin to saturate. Additionally, we show that the system's relative P_err achieves the corresponding homodyne limit for |α_m| = -9.3 dB at |α'| = -1.5 dB. Similarly, we find a difference of 3.9 × 10^-3 and 1.1 × 10^-2, respectively, between the relative P_err and the corresponding homodyne limit for |α_m| = -10.5 dB and -12.5 dB when |α'| = 0.05 dB and 6.1 dB. Moreover, the corresponding relative P_err for |α'| from -3 dB to 3 dB at the given |α_m| is zoomed in and shown in the upper right inset of Fig. 3 (a). After taking into account the various SNR scenarios, we set |α'| relatively high, at 9 dB, as the optimized value for all simulations and results discussed in the following paragraphs. Note that this signal is used in pre-training the neural networks prior to communication, and is not the signal that would be sent through a realistic communications channel (which is |α_m|, and is simulated to be much weaker, as discussed in the following).

Next, we investigate the relative P_err as the phase scanning range of the detection system is varied. We use the same training and test data and network settings as discussed above. In order to simulate training and test sets at various phase scanning ranges, we slice the grid of the homodyne outputs into a number of segments. For example, as discussed in the previous paragraph, a grid of 30 × 30 (i.e. 0-900 points) represents a phase scan from 0 to 2π. Here, we slice it into 28 × 28 (0-784 points), 26 × 26 (0-676 points), 24 × 24 (0-576 points), and so on down to 4 × 4 (0-16 points), for a total of 14 scanning ranges. These correspond to phase scanning ranges of 0-1.74π, 0-1.5π, 0-1.28π, and so on down to 0-0.04π, respectively.
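The correspondence between grid size and phase scanning range quoted above follows from simple proportionality, which the sketch below makes explicit. It assumes that an n × n slice keeps the first n^2 of the 900 grid points, and that the count of 14 ranges includes the full 30 × 30 grid. Both are inferences from the quoted numbers rather than statements in the text, and the function name is hypothetical.

```python
def scan_range_in_pi(side, full_points=900):
    """Phase range (in units of pi) covered by the first side*side grid points,
    given that full_points samples span 0 to 2*pi."""
    return 2.0 * side * side / full_points


# Grid sides 30, 28, ..., 4: fourteen scanning ranges in total.
SIDES = list(range(30, 2, -2))

if __name__ == "__main__":
    for side in SIDES:
        print(f"{side:2d} x {side:<2d} ({side * side:3d} points): "
              f"0 to {scan_range_in_pi(side):.2f} pi")
```

For the grid sizes named in the text this gives 1.74π, 1.50π, 1.28π, and 0.04π after rounding to two decimals.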
Note that we separately train the GNN and CNN for each scanning range. Also, we set the latent space dimension of the GNN to half of the input dimension for a given range. For example, inputs of 28 × 28 and 12 × 12 points have latent space sizes of 14 × 14 and 6 × 6 points, respectively. With the GNN pre-trained for |α| = -10.5 dB and -9.3 dB, and the CNN pre-trained with |α'| = 9 dB, we calculate the relative P_err for |α_m| = -10.5 dB and -9.3 dB at the given scanning range. Here we reconstruct the 360 homodyne measurements (90 per QPSK key) for the given test set |α_m| and predict the corresponding QPSK value using the CNN, which is finally used to evaluate the relative P_err. The relative P_err resulting from the networks with and without the GNN versus scanning range is shown in Fig. 3 (b). The improvement in relative P_err for |α_m| = -10.5 dB is shown by the red curves, where the dotted red curve, solid red curve, and horizontal red line represent the relative P_err without the GNN, with the GNN, and the relative homodyne limit, respectively. Similarly, the dotted green curve, solid green curve, and horizontal green line represent the relative P_err without the GNN, with the GNN, and the relative homodyne limit for |α_m| = -9.3 dB, respectively. As discussed above, the black horizontal line is the relative Helstrom limit. We find a gradual improvement in the relative P_err as we increase the scanning phase range of the local oscillator. Furthermore, with the aid of the GNN in the network and a scanning range of 0-0.87π, we show a significant improvement in relative P_err from 9.5 × 10^-2 to 4.1 × 10^-2, and from 6.5 × 10^-2 to 1.9 × 10^-2, for |α_m| = -10.5 dB and -9.3 dB, respectively. Note that the relative homodyne limits for |α_m| = -9.3 dB and -10.5 dB are 1.5 × 10^-2 and 1.1 × 10^-2, respectively. Additionally, we find that the relative P_err with the aid of the GNN is minimized and achieves the corresponding relative homodyne limit when the scanning range is greater than or equal to 0-1.14π for |α_m| = -9.3 dB. A minimum difference of 4.3 × 10^-3 between the GNN-aided relative P_err and the relative homodyne limit is achieved for |α_m| = -10.5 dB at a scanning range of 0-2π. The relative P_err values corresponding to scanning ranges of 0-1.28π, 0-1.74π, and 0-2π are zoomed in and shown in the upper right inset of Fig. 3 (b).

[Figure caption, fragment] The GNN is pre-trained with the targets of strength |α'| = 9 dB. Similarly, the same data set for |α'| = 9 dB is used to pre-train the classifying CNNs. The abbreviations HD-CNN and HD-GNN-CNN denote the results without and with the GNN, respectively, in the communications setup at the receiver.
We now turn to investigating the improvement of the relative P_err at various SNR levels of the transmitted message signals |α_m|. In order to generate training sets for the GNN, we randomly simulate 200 homodyne measurements for each QPSK key for 15 values of |α| from -15.0 dB to -9.12 dB. Similarly, we separately generate 90 random homodyne measurements for each QPSK key for 15 values of |α_m| from -15.0 dB to -9.12 dB as the test set. As a result, the test sets and training sets are again independent. As mentioned earlier, we set |α'| to 9 dB, and the corresponding 200 homodyne outputs for each QPSK signal discussed above are used as the target for the GNN. The same |α'| is used to train the CNN. Note that we train the GNN separately for each value of |α|, giving a total of 15 GNNs, while the classifying network (CNN) remains the same (as all GNNs share the same target). With the pre-trained GNN and CNN, we evaluate the relative P_err with and without the GNN in the networks; the results are shown in Fig. 4. The dotted red and solid blue curves represent the relative P_err without the GNN and with the GNN in the networks, respectively. Similarly, the solid green line and the horizontal black line represent the relative homodyne and Helstrom limits for various values of |α_m|. With the aid of the GNN in the networks, we achieve a remarkable improvement in relative P_err from 6.1 × 10^-2 to 1.1 × 10^-2, which nearly reaches the relative optimal homodyne limit of 1.2 × 10^-2 for |α_m| = -10.23 dB. Similarly, for |α_m| = -9.27 dB, the relative P_err improves from 5.8 × 10^-2 to the optimal homodyne limit, as shown in the upper right inset of Fig. 4. Furthermore, even for a very low SNR of |α_m| = -13.26 dB, we significantly reduce the difference between the relative P_err and the relative homodyne limit from 12.8 × 10^-2 to 4.7 × 10^-2.

Discussion
In conclusion, we have designed a novel coherent optical communications setup that efficiently demodulates weak coherent QPSK states with a robust machine intelligence aided balanced homodyne receiver. The developed state-of-the-art GNN and CNN system, in combination, corrects for coherent QPSK signals associated with a wide range of SNRs, resulting in significantly reduced overall error probability of the communications system, either achieving or approaching the classical optimal limit. Additionally, by using the same network system, we also show an improvement in discrimination error probability for various scanning ranges of the local oscillator at different SNRs. Furthermore, with the aid of the GNN, which reconstructs a clean and desired quadrature measurement at its output, we have minimized the relative error probability with a CNN that is exclusively trained with homodyne measurements associated with high SNR (desired) coherent QPSK signals. This allows bypassing the need for extremely large CNN training sets that would require various noises which are not only difficult to label, but are also unbounded. The present advances in QPSK demodulation and classification are essential to the robust performance of realistic coherent optical communications systems, and we anticipate that the presented techniques may directly be applied to an enhancement of current classical and quantum communications protocols.

Architecture of the GNN
The GNN consists of three sections: an encoder, a latent space, and a decoder (generator). The encoder begins with a two-dimensional convolutional layer with a kernel of size [5,5], stride length of 1, batch size of 10, feature mappings of 20, same padding, and a ReLU activation function. After this we apply a dropout with a rate of 20%, followed by a max-pool layer with a kernel of size [2,2] and a stride length of 2. Then, again, we connect a two-dimensional convolutional layer to the max-pool with the same parameters and dropout as discussed above. Next we attach a fully connected layer with a number of neurons that is always equal to the size of the latent space. Note that, for consistency, we always choose the size of the latent space as half of the input dimension. For example, input dimensions of 30 × 30 and 16 × 16 have 15 × 15 and 8 × 8 sized latent spaces, respectively. Overall, the encoder maps an input of dimension [10 (batch size), width, height, 1 (input channel)] to [10, latent space dimension]. At the beginning of the decoder, the latent space is connected to a fully connected layer with width/2 × height/2 × 20 (feature mappings) neurons, which is followed by a dropout with a rate of 20%. After this we apply a convolutional layer with a stride length of 1 and a transpose-convolutional (or deconvolutional) layer with a stride length of 2, each with a kernel of size [5,5], feature mappings of 20, same padding, and a ReLU activation function. Note that each layer is followed by a dropout with a rate of 20%. Finally, a convolutional unit with a single feature mapping generates a clean, or less noisy, homodyne measurement as the output. Here, the generator decodes the latent space of size [10 (batch size), latent space dimension] into outputs of size [10 (batch size), width, height].
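To check that the layer sizes described above fit together, the following sketch traces the tensor shapes through the encoder and decoder. It is bookkeeping only, not a trainable model. It assumes that a "15 × 15 latent space" means 225 flat units and that the decoder's fully connected output is reshaped to (width/2, height/2, 20) before the convolutions. Dropout layers are omitted because they do not change shapes, and the function name is hypothetical.

```python
def gnn_shapes(width, height, batch=10, features=20):
    """Return (layer description, output shape) pairs for the GNN."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for the 2x2 max-pool")
    half_w, half_h = width // 2, height // 2
    latent = half_w * half_h  # e.g. 15 x 15 = 225 units for a 30 x 30 input
    return [
        ("input", (batch, width, height, 1)),
        ("encoder conv 5x5, stride 1, same", (batch, width, height, features)),
        ("encoder max-pool 2x2, stride 2", (batch, half_w, half_h, features)),
        ("encoder conv 5x5, stride 1, same", (batch, half_w, half_h, features)),
        ("encoder fully connected (latent)", (batch, latent)),
        ("decoder fully connected", (batch, half_w * half_h * features)),
        ("decoder reshape (assumed)", (batch, half_w, half_h, features)),
        ("decoder conv 5x5, stride 1, same", (batch, half_w, half_h, features)),
        ("decoder transpose-conv 5x5, stride 2", (batch, width, height, features)),
        ("decoder conv, 1 feature map", (batch, width, height, 1)),
    ]


if __name__ == "__main__":
    for name, shape in gnn_shapes(30, 30):
        print(f"{name:<40} {shape}")
```

The stride-2 transpose convolution is what restores the resolution halved by the max-pool, so the output image has the same width and height as the input.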
In order to optimize the clean reconstructions, a squared reconstruction loss (error) L[GNN(α), GNN(α')] is evaluated, where GNN(α) is the output of the GNN and GNN(α') is the target. Finally, we minimize the average reconstruction loss, given by equation (6), over the encoder and decoder parameter spaces, θ and θ', respectively, using the Adam optimizer (AdamOptimizer) of TensorFlow [36] with a learning rate of 0.008. The schematic of the GNN is shown in Fig. 2.
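A squared reconstruction loss of the kind described here can be sketched as below. The exact normalization in equation (6) is not available, so this version averages the squared pixel differences over every pixel in the batch; this is a common choice and should be read as an assumption. Images are represented as flat lists of pixel values, and the function name is hypothetical.

```python
def reconstruction_loss(outputs, targets):
    """Mean squared difference between GNN outputs and target images.

    outputs, targets: lists of images, each image a flat list of floats.
    Averaging over all pixels and all images is an assumption of this sketch.
    """
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and equal length")
    total = 0.0
    count = 0
    for out_img, tgt_img in zip(outputs, targets):
        if len(out_img) != len(tgt_img):
            raise ValueError("each output must match its target in size")
        for o, t in zip(out_img, tgt_img):
            total += (o - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    noisy_reconstruction = [[0.9, 0.1, 0.0, 1.2]]
    clean_target = [[1.0, 0.0, 0.0, 1.0]]
    print(reconstruction_loss(noisy_reconstruction, clean_target))
```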

Architecture of the classifying CNN
The classifying CNN begins with a convolutional unit with a kernel of size [2,2], batch size of 1, feature mappings of 10, same padding, stride length of 1, and a ReLU activation function. This is followed by a max-pool layer with a kernel of size [2,2] and a stride length of 2. Then we connect fully connected layers (FCLs) with 400 neurons and 50 neurons, which are followed by dropout layers with rates of 80% and 40%, respectively. Note that we use ReLU activations for both FCLs. Next we attach a final FCL with 4 neurons and a linear activation function, followed by a softmax operation.
In order to train a CNN to classify the generated clean homodyne outputs, we always use the target set of the GNN, i.e. the homodyne measurements at the signal strength |α'|, as shown in path I of Fig. 1. We randomly simulate 200 homodyne outputs per QPSK key for the given |α'|, as discussed in the Results section, resulting in a total of 800 homodyne outputs. The data set is split into a training set with 170 measurements and a test set with 30 measurements, again per QPSK key. The target of each QPSK key is set as a one-hot vector; for example, the first target key is [1,0,0,0], the second key is [0,1,0,0], and so on. After this, the parameter space of the CNN is optimized by minimizing a softmax cross-entropy loss function using the Adam optimizer (AdamOptimizer) with a learning rate of 0.001 for up to 10 epochs. The neural networks' hyper-parameters are manually optimized as discussed in [17]. Note that the pre-trained CNN has unity accuracy with respect to the test homodyne measurements. The schematic of the classifying CNN is shown in Fig. 2.
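The one-hot targets and softmax cross-entropy loss mentioned above can be illustrated with a few lines of plain Python. This is a sketch for a single sample and a stand-in for the library loss used in training, not the training code itself. The helper names are hypothetical.

```python
import math


def one_hot(key_index, n_keys=4):
    """One-hot target for a QPSK key, e.g. index 0 -> [1, 0, 0, 0]."""
    return [1.0 if i == key_index else 0.0 for i in range(n_keys)]


def softmax(logits):
    """Numerically stable softmax over the outputs of the final linear layer."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Softmax cross-entropy between linear outputs and a one-hot target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


if __name__ == "__main__":
    logits = [2.0, 0.1, -1.0, 0.3]  # illustrative CNN outputs for one image
    print("probabilities:", [round(p, 3) for p in softmax(logits)])
    print("loss for key 1:", round(cross_entropy(logits, one_hot(0)), 4))
```

The predicted key is the index of the largest probability, and the loss is smallest when that index matches the one-hot target.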