Adaptive Secure MIMO Transmission Mechanism against Smart Attacker



Introduction
When a malicious attacker can cleverly switch its attack mode among eavesdropping [1], jamming [2], and spoofing [3] to obstruct the secure communication between a transmitter and its receiver, it imposes critical challenges on the design of a secure transmission strategy. Game theory provides a useful framework for deriving the optimal transmission strategy in the presence of uncertain attack modes [4][5][6]. A MIMO wiretap zero-sum game was formulated in [4], which takes the secrecy rate as the utility function and analyzes the conditions of the equilibrium outcomes under various strategies. An exemplary multichannel spectrum access game (SAG) with unknown environment dynamics and limited information about the other players was considered in [5] to find the best communication strategy through a joint reinforcement learning and type identification algorithm. Motivated by the aforementioned progress, some research efforts have strived to combine game theory with reinforcement learning to devise secure transmission strategies. In [10], a Q-learning-based strategy was developed to cope with the smart attacker. The reinforcement learning-based spoofing detection and the deep reinforcement learning-based authentication scheme were proposed in [11] to further enhance the authentication. Q-learning was utilized. The reinforcement learning-based power control scheme was developed in [12] to resist jamming attacks on the communication between the in-body sensors and the WBAN coordinator. In [13], the interaction between the user and a smart interferer in an ambient backscatter communication network was formulated as a game, and the closed-form equilibrium of the Stackelberg game was obtained. Because information about the system SNR and the transmission strategy of the interferer was lacking, a Q-learning algorithm was proposed to derive the optimal strategy in a dynamic, iterative manner.
A zero-sum game between a base station equipped with multiple antennas and a smart jammer in a nonorthogonal multiple access (NOMA) system was formulated in [14]. The Stackelberg equilibrium of the antijamming NOMA transmission game was derived, and a reinforcement learning-based power control scheme was proposed for the downlink NOMA transmission without knowledge of the jamming and radio channel parameters.
As a reinforcement learning (RL) algorithm that maximizes the long-term expected reward in multistate environments, Q-learning provides an effective technical solution for multiplayer general-sum games. The conditions under which Q-learning converges with probability 1 to the optimal values, the dynamic analysis framework, and the asymptotic convergence classes were addressed in [15][16][17], respectively. The MIMO transmission against a more powerful smart attacker, which can apply programmable radio devices (for instance, software-defined radios) to perform multiple types of attacks such as eavesdropping, jamming, and spoofing, was investigated in [18]. It was shown that the problem can be formulated as a noncooperative game, in which a power control strategy obtained via reinforcement learning can be utilized to suppress the attack motivation of smart attackers in a dynamic MIMO transmission game without knowledge of the attack and the radio channel model. Nonetheless, most of the existing research efforts focus on how to derive the optimal power control strategy for suppressing the attack motivation of the malicious attacker by employing reinforcement learning. Unfortunately, the power control strategy alone is no longer effective in suppressing the attack motivation of the malicious attacker in an adverse environment, as we will illustrate in this paper. In this case, it becomes highly desirable to develop a new mechanism that makes the MIMO transmitter better resist the malicious attacker. This is the most important research motivation of our work in this paper.
Nash Equilibrium (NE) provides the basis in the noncooperative game framework for determining the optimal solution, in which each player lacks any incentive to change his/her initial strategy, because a player does not gain anything by deviating from the initially chosen strategy while the other players keep their strategies unchanged. The NE analysis in [18] unveiled that, when the transmit power of the MIMO transmitter is selected to be large enough, the game between the MIMO transmitter and the smart malicious attacker will incline to the MIMO transmitter; namely, the attacker tends to be idle. Although the power control strategy via reinforcement learning can be utilized to cope with the smart attacker, one may readily observe the limitation of this design. In some adverse channel conditions, a very large transmit power might be infeasible, especially when the transmit power is limited. In this case, the challenges of secure MIMO transmission arise. Exploring the critical factors that affect the strategies of both parties and developing an effective scheme that lets the MIMO transmitter realize secure transmission against the attacker under an adverse channel condition are the other two motivations of our work in this paper. To this end, an adaptive secure MIMO transmission scheme is proposed in this paper, in which not only the transmit power of the MIMO transmitter but also the transmission probability is adjusted. The proposed scheme can be interpreted as a generalized adaptive transmission strategy. When the adaptive transmit power policy is enough to suppress the attack motivation, the proposed scheme reduces to the adaptive power control scheme; otherwise, both the adaptive transmit power and the adaptive probabilistic transmission are employed to suppress the attack motivation of the smart attacker.
The contributions of this paper can be briefly summarized as follows:
(i) A comprehensive Nash Equilibrium (NE) analysis of the noncollaborative game framework in [18] is presented to explore the critical factors that dominate the game decisions. In this way, we show that the power control strategy alone is not enough to suppress the attack motivation of the malicious attacker in adverse channel conditions.
(ii) A new probabilistic transmission scheme is proposed to realize a novel adaptive secure MIMO transmission scheme against the smart attacker. It is shown that, with both the probabilistic transmission control and the power control policy, we can improve the capability of the MIMO transmitter to resist the smart attacker, especially when compared with the original scheme in [18].
The remainder of this paper is organized as follows: The system model and the game problem formulation between the MIMO transmitter and the malicious attacker are reviewed in Section 2. The comprehensive NE analysis is presented in Section 3 to show that the game decision will not always be friendly to the MIMO transmitter. The proposed adaptive secure MIMO transmission mechanism is addressed in Section 4. Numerical analysis is presented in Section 5 to show the applicability of the proposed scheme. Finally, we conclude our work in Section 6.
(1) If Eve chooses to overhear the transmission by Alice (namely, the attack mode indicator $q = 1$), the received signal at Eve will be where $n_e$ represents the $N$-dimensional additive zero-mean white Gaussian noise vector, i.e., $n_e \sim \mathcal{CN}(0, I_N)$.
In this case, the achievable MIMO secrecy rate $R_E$ will be [2]
If Eve decides to block Alice's transmission with the jamming signal $x_J$ (namely, the attack mode indicator $q = 2$), and we assume $E[x_J^H x_J] = P_J$, where $P_J$ is the transmit power of Eve in the jamming mode, Bob will receive the following signal: where $n_b$ represents the $N_r$-dimensional white Gaussian noise at Bob, i.e., $n_b \sim \mathcal{CN}(0, I_{N_r})$. Then, the achievable secrecy rate $R_J$ can be given by [2]
If Eve chooses to spoof Bob by sending a fake signal $x_s$ with restricted transmit power $E[x_s^H x_s] = P_s/N$ when Alice is silent (namely, the attack mode indicator $q = 3$), where $P_s$ is the transmit power of Eve in the spoofing mode, Bob will receive the following signal: The spoofing attack mode aims at transmitting spoofing information to Bob only. The achievable secrecy rate of Alice when Eve spoofs Bob can be given by [18] where $c$ represents the spoofing message utility coefficient.

MIMO Transmission Game Problem Formulation.
In order to cope with the smart attacker, the MIMO transmission scheme can be derived by employing a noncollaborative game framework, in which the following two utility functions of Alice and Eve are assumed [18]: where $C_a$ represents the unit power consumption at Alice,

Wireless Communications and Mobile Computing
Considering the secure MIMO transmission game denoted by $G = \langle (\text{Alice}, \text{Eve}), (P, q), (\mu_a, \mu_e) \rangle$ with the game participants (Alice, Eve), game strategy $(P, q)$, and utility functions $(\mu_a, \mu_e)$, the Nash Equilibrium (NE) strategy $(P^*, q^*)$ should satisfy the following conditions:
The NE condition that is friendly to Alice and the issue of how to completely suppress the attack motivation of Eve are highlighted in [18], in which a reinforcement learning-based power control strategy was also proposed to realize attack-free game results (namely, $q = 0$). Nonetheless, there exist multiple NE conditions, and the game results will not always incline to Alice. In this paper, we focus on the case in which the game results incline to the malicious attacker instead of the MIMO transmitter. It is highly desirable to develop an effective mechanism to cope with the smart attacker in this case for better secure communication. This is exactly the problem that we highlight in this paper.
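As a hedged illustration of the NE requirement described above (neither player can gain by deviating unilaterally), the following Python sketch enumerates pure-strategy equilibria over a discretized strategy grid. The utility tables `U_ALICE` and `U_EVE` are hypothetical toy numbers chosen for illustration only; they are not computed from the utility functions of this paper or of [18].

```python
from itertools import product


def pure_nash_equilibria(u_alice, u_eve):
    """Return all index pairs (i, j) at which neither player gains by deviating alone.

    u_alice[i][j] and u_eve[i][j] are the utilities when Alice plays her i-th
    power level and Eve plays her j-th attack mode.
    """
    n_p = len(u_alice)       # number of power levels available to Alice
    n_q = len(u_alice[0])    # number of attack modes available to Eve
    equilibria = []
    for i, j in product(range(n_p), range(n_q)):
        # Alice cannot improve by changing her power while Eve keeps mode j.
        alice_ok = all(u_alice[i][j] >= u_alice[k][j] for k in range(n_p))
        # Eve cannot improve by changing her mode while Alice keeps power i.
        eve_ok = all(u_eve[i][j] >= u_eve[i][l] for l in range(n_q))
        if alice_ok and eve_ok:
            equilibria.append((i, j))
    return equilibria


# Hypothetical toy utilities: rows = 2 power levels, columns = q in {0, 1, 2, 3}.
U_ALICE = [[1.0, 0.2, 0.1, 0.3],
           [1.5, 0.6, 0.4, 0.5]]
U_EVE = [[0.0, 0.4, 0.3, 0.2],
         [0.0, -0.1, -0.2, -0.3]]

print(pure_nash_equilibria(U_ALICE, U_EVE))
```

In this toy table the only equilibrium pairs the higher power level with the idle mode $q = 0$, which mirrors the attack-free outcome discussed in the text; with other tables the equilibrium may instead fall on an attack mode.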

Nash Equilibrium in support of Alice.
Considering the utility functions in (8) and (9), as well as the Nash Equilibrium conditions in (10) and (11), one may readily derive that the NE at $(P, q) = (p^*, 0)$ in support of Alice for the game $G = \langle (\text{Alice}, \text{Eve}), (P, q), (\mu_a, \mu_e) \rangle$ can be achieved when the following conditions are satisfied: In fact, the three threshold values $\theta_E$, $\theta_J$, and $\theta_S$ can be interpreted as the minimum rewards expected by Eve if he decides to overhear Alice, to jam Bob, or to spoof Bob, respectively. The NE conditions in (12) imply that, when the attack reward realized by Eve is less than the predefined minimum expected reward, Eve will give up attacking; namely, the game between Alice and Eve inclines to Alice. One may readily observe that, for the given minimum expected rewards $\theta_E$, $\theta_J$, and $\theta_S$, the achievability of the NE at $(P, q) = (p^*, 0)$ depends on the underlying channel gains $\lambda_i^{ba}$, $\lambda_i^{ea}$, and $\lambda_i^{be}$. Consequently, the above NE conditions cannot always be satisfied, especially when the channel gains $\lambda_i^{ba}$, $\lambda_i^{ea}$, and $\lambda_i^{be}$ change.

Nash Equilibrium in support of Eve.
In the same way, on the basis of (10) and (11), we may derive the Nash Equilibrium conditions in support of Eve for $q = 1, 2, 3$ in Table 1.
(1) Nash Equilibrium in Eavesdrop Mode. By comparing the NE conditions with those in (12), one may readily observe that the game decision will incline to the eavesdrop attack mode when the predefined eavesdropping reward $\theta_E$ can be fulfilled and the relative eavesdropping gain is the largest among the three attack modes. A further examination of the conditions unveils that a better channel $\lambda_i^{ea}$ between Alice and Eve gives the game a larger opportunity to incline to the eavesdrop attack mode. This complies with intuition, because Eve is then closer to Alice and in a better condition to overhear the transmission by Alice.
(2) Nash Equilibrium in Jam Mode. In the same way, by comparing the NE conditions with those in (12), one may readily observe that the game decision will incline to the jam attack mode if the predefined jamming reward $\theta_J$ can be fulfilled and the relative jamming gain, i.e., the jamming reward minus $\theta_J$, is the best among the three attack modes. By carefully examining the conditions, we may observe that a reasonably good channel $\lambda_i^{ba}$ between Alice and Bob and a reasonably good channel $\lambda_i^{be}$ between Eve and Bob give the game a larger opportunity to incline to the jam attack mode, as one may see from the term $\frac{p^* p_J}{\frac{MN}{\lambda_i^{ba} \lambda_i^{be}} + \frac{M p_J}{\lambda_i^{ba}} + \frac{p^* N}{\lambda_i^{be}}}$. This complies with intuition, because both Alice and Eve are then in good conditions to transmit to Bob.
(3) Nash Equilibrium in Spoof Mode. Similarly, by comparing the NE conditions with those in (12), one may readily conclude that the game decision will incline to the spoof attack mode if the predefined spoofing reward $\theta_S$ can be fulfilled and the relative spoofing gain is the best among the three attack modes. A further examination of the conditions unveils that a better channel $\lambda_i^{be}$ between Eve and Bob gives the game a larger opportunity to incline to the spoof attack mode. This complies with intuition, because Eve is then closer to Bob and in a good condition to spoof the reception at Bob.
To explain the above NE conditions more clearly, let us briefly summarize their dependency on the underlying channel gains $\lambda_i^{ba}$, $\lambda_i^{ea}$, and $\lambda_i^{be}$. From the perspective of the attacker, a large $\lambda_i^{be}$ (the Eve-Bob link is good) is a beneficial situation for Eve to choose either the jam or the spoof mode. If $\lambda_i^{ba}$ also becomes large (the Alice-Bob link is good), Eve would incline to jam. The NE in support of Eve may also happen when both the Alice-Bob link $\lambda_i^{ba}$ and the Eve-Bob link $\lambda_i^{be}$ are not in good conditions, while the Alice-Eve link $\lambda_i^{ea}$ is reasonably good. In this case, the game decision may incline to make Eve overhear the transmission by Alice.
The above discussion clearly shows that the game between Alice and Eve will not always incline to Alice. When the wireless environment becomes adverse, such that Eve inclines to attack, the power control strategy alone is no longer effective in suppressing Eve's attack motivation. As we will address in Section 4, the adaptive probabilistic transmission design can then be utilized to discourage the attack motivation of Eve.
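The dependence of the jam-mode term on the channel gains, as discussed in this section, can be checked numerically. The Python sketch below evaluates the quoted term $p^* p_J / (MN/(\lambda_i^{ba}\lambda_i^{be}) + M p_J/\lambda_i^{ba} + p^* N/\lambda_i^{be})$ for a single eigen-channel. The values $M = 5$, $N = 3$, and a jamming power of 3 are taken from the simulation settings of Section 5, with the assumption that $p_J$ in the term equals $P_J$; the transmit power `P_STAR = 5.0` and the doubled channel gains are hypothetical values chosen only for illustration.

```python
def jamming_term(p_star, p_j, m, n, lam_ba, lam_be):
    """Evaluate p* p_J / (MN/(lam_ba lam_be) + M p_J/lam_ba + p* N/lam_be)."""
    denominator = (m * n) / (lam_ba * lam_be) + (m * p_j) / lam_ba + (p_star * n) / lam_be
    return (p_star * p_j) / denominator


M, N, P_J = 5, 3, 3.0   # simulation settings quoted in Section 5 (p_J = P_J assumed)
P_STAR = 5.0            # hypothetical transmit power of Alice

baseline = jamming_term(P_STAR, P_J, M, N, lam_ba=2.77, lam_be=2.00)
better_ba = jamming_term(P_STAR, P_J, M, N, lam_ba=5.54, lam_be=2.00)  # stronger Alice-Bob link
better_be = jamming_term(P_STAR, P_J, M, N, lam_ba=2.77, lam_be=4.00)  # stronger Eve-Bob link

print(baseline, better_ba, better_be)
```

Because every term in the denominator shrinks as either channel gain grows, the term increases with both $\lambda_i^{ba}$ and $\lambda_i^{be}$, which is consistent with the observation that good Alice-Bob and Eve-Bob links favor the jam mode.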

Probabilistic Transmission Policy.
In Section 3, we have shown that the NE at $(p^*, 0)$ can be achieved when all the possible attack rewards of Eve are less than the predefined minimum expected rewards $\theta_E$, $\theta_J$, and $\theta_S$. In this case, a reinforcement learning-based power control strategy can be employed to approach the NE at $(p^*, 0)$ [18]. The cost may be some increase in the required transmit power at Alice. However, when some attack reward of Eve is larger than the predefined minimum expected reward, the transmit power control strategy may no longer be effective in suppressing the attack. In order to better fulfill the secure transmission requirements and to realize reasonable secure transmission in adverse conditions, we propose to use a probabilistic transmission strategy. Unlike the original transmission scheme [18], in which Alice always transmits irrespective of the current channel conditions, in the proposed adaptive probabilistic transmission scheme Alice may decide to transmit in a probabilistic manner, especially when the game decision inclines to Eve. To this end, the utility function of Alice can be modified as follows: where $P_T \in (0, 1]$ represents the transmission probability of Alice. One may readily observe that the above utility function reduces to the traditional utility function assumed in [18] when $P_T = 1$. We will show later that the proposed adaptive probabilistic transmission scheme subsumes the original adaptive transmission scheme in [18] as a special case by letting $P_T = 1$. Because Eve will try to obstruct the communication between Alice and Bob at a certain cost, we may assume the worst case, in which Eve knows the probabilistic transmission control mechanism at Alice; the utility function of Eve can thus be revised as follows: On the basis of the two updated utility functions in (13) and (14), we may derive the NEs in Table 2.
By comparing Tables 1 and 2, we may readily observe from the modified NE conditions how the proposed probabilistic transmission scheme is able to further suppress the attack motivation of Eve. By introducing the transmission probability control mechanism, the three possible attack rewards and the associated relative attack rewards are reduced by a discount coefficient that is proportional to the transmission probability $P_T < 1$. This explains why the proposed probabilistic transmission leads to the suppression of the attack motivation of Eve.
Of course, the proposed probabilistic transmission scheme will incur some degradation in the achieved secrecy capacity, in that Alice will not always attempt to transmit, regardless of how Eve reacts and of the underlying channel conditions. Nonetheless, as we will illustrate in Section 5, in adverse channel conditions where the traditional game strategy (the power control strategy) fails to suppress the attack motivation of Eve, the proposed adaptive secure transmission scheme can still help to improve the secure transmission between Alice and Bob by discouraging the attack motivation of Eve.
Thus, some loss in the transmission opportunity is still worthwhile.
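The discounting effect described in this section can be sketched with a few lines of Python. The exact NE conditions of Table 2 are not reproduced here, so the sketch makes a simplifying assumption: each attack reward of Eve is scaled by $P_T$, and Eve stays idle when every scaled reward is below its threshold. The function name, the reward values, and the 0.01 grid are hypothetical; the thresholds $[2.2, 3, 3.2]$ and the lower bound 0.3 on $P_T$ are the values quoted in Section 5.

```python
def largest_suppressing_probability(rewards, thresholds, p_t_min=0.3):
    """Largest P_T on a 0.01 grid in [p_t_min, 1] with P_T * reward < threshold
    for every attack mode; returns None when no grid point suppresses the attack.

    Assumption: the attack rewards are discounted linearly by P_T.
    """
    lowest = int(round(p_t_min * 100))
    for percent in range(100, lowest - 1, -1):   # search from P_T = 1 downwards
        p_t = percent / 100
        if all(p_t * r < th for r, th in zip(rewards, thresholds)):
            return p_t
    return None


THETA = [2.2, 3.0, 3.2]             # thresholds for eavesdropping, jamming, spoofing
strong_eavesdrop = [4.1, 2.5, 3.0]  # hypothetical attack rewards at P_T = 1

print(largest_suppressing_probability(strong_eavesdrop, THETA))
print(largest_suppressing_probability([2.0, 2.5, 3.0], THETA))
```

Under this assumption, the second call returns 1.0, i.e., the scheme falls back to always transmitting when power control alone already keeps every reward below its threshold, while the first call has to lower $P_T$ to discourage eavesdropping.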

Reinforcement Learning-Based Adaptive Secure Transmission Scheme.
Table 1: NE conditions in support of Eve.
Attack mode; Conditions
Eavesdropping mode, $q = 1$

The Q-learning-based algorithm in [18] can be modified to derive both the optimal transmit power strategy and the optimal probabilistic transmission policy to realize the adaptive secure MIMO transmission scheme. As summarized in Algorithm 1, the Q-learning-based algorithm can be utilized to derive both the optimal power control $p^*$ and the optimal probabilistic transmission $P_T^*$ for Alice. Specifically, $Q(P, P_T, s)$ stands for the Q function of Alice, in which $s$ represents the system state and $P$ and $P_T$ denote the two actions of Alice. $V(s)$ indicates the maximum of $Q(P, P_T, s)$ over all possible actions, given the state $s$. The learning rate $\alpha \in [0, 1]$ represents the weight of the current quality during the learning process, while $\delta \in [0, 1]$ is the discount factor that denotes the uncertainty of Alice about the future gains. Alice observes the strategy of Eve in the $(n-1)$-th slot, $q^{n-1}$, and takes it as the current state $s^* = q^{n-1}$. As time goes by, Alice is able to choose the optimal power control strategy $p^* \in (0, P_{\max}]$ and $P_T^* \in (0, 1]$. In practical applications, one may set a minimum transmission probability $P_{T,\min}$ according to the minimum transmission rate requirement of the Alice-Bob link.
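The learning procedure described above can be sketched in Python as follows. This is a minimal tabular sketch under stated assumptions, not the authors' simulation platform: the action grids, the values of $\alpha$, $\delta$, and $\varepsilon$, and in particular `toy_environment` (a stand-in for the channel and for Eve's reaction) are hypothetical. Only $P_{\max} = 10$, $C_a = 0.1$, and the lower bound 0.3 on $P_T$ are taken from Section 5.

```python
import math
import random

P_MAX, P_T_MIN, C_A = 10.0, 0.3, 0.1
POWERS = [2.0, 4.0, 6.0, 8.0, P_MAX]     # discretised transmit power P in (0, P_max]
PROBS = [P_T_MIN, 0.5, 0.7, 0.9, 1.0]    # discretised P_T in [P_T,min, 1]
STATES = [0, 1, 2, 3]                    # previous attack mode q of Eve
ACTIONS = [(p, pt) for p in POWERS for pt in PROBS]


def toy_environment(power, p_t, rng):
    """Hypothetical stand-in for the channel and Eve; returns (q, mu_a)."""
    attack_prob = min(0.9, 3.2 * p_t / power)      # assumption: Eve attacks less when P_T/P is low
    q = rng.choice([1, 2, 3]) if rng.random() < attack_prob else 0
    rate = math.log2(1.0 + power)                  # placeholder rate, not the MIMO secrecy rate
    secrecy = rate if q == 0 else 0.2 * rate       # assumption: an attack cuts the rate
    return q, p_t * secrecy - C_A * power


def q_learning(n_slots=2000, alpha=0.5, delta=0.7, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    V = {s: 0.0 for s in STATES}
    state = 0                                      # q^0 = 0
    for _ in range(n_slots):
        # epsilon-greedy choice of (P^n, P_T^n)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        power, p_t = action
        q_obs, mu_a = toy_environment(power, p_t, rng)
        next_state = q_obs                         # s^{n+1} = q^n
        # Q <- (1 - alpha) Q + alpha (mu_a + delta V(s^{n+1}))
        Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * (mu_a + delta * V[next_state])
        V[state] = max(Q[(state, a)] for a in ACTIONS)
        state = next_state
    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
    return Q, V, policy


if __name__ == "__main__":
    _, _, learned_policy = q_learning()
    print(learned_policy)
```

The state is the attack mode observed in the previous slot, so the table has only four rows; the joint action $(P, P_T)$ is what distinguishes this sketch from a power-only learner, which corresponds to restricting `PROBS` to `[1.0]`.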

Numerical Analysis
In order to show the applicability of the proposed adaptive probabilistic transmission scheme, we focus on the conditions in which the traditional game results incline to Eve. By doing so, we show clearly that, when the conditions tend to be in support of Eve, the traditional game fails to guarantee secure transmission from Alice to Bob. On this basis, we then highlight how the proposed probabilistic transmission policy can be utilized along with the power control strategy to improve the secure transmission from Alice to Bob by suppressing the attack motivation of Eve. In all simulations, unless otherwise stated, we assume $M = 5$, $N = 3$, $N_r = 3$, $\Theta = [2.2, 3, 3.2]$, $P_J = 3$, $P_s = 3.2$, and $P_{\max} = 10$. We also assume $C_a = 0.1$ and $c = 0.5$. $P_T \in [0.3, 1]$ is assumed to make sure that a very low transmission rate can be avoided.
... is not the best among the three links, the transmit power control strategy at Alice can successfully suppress the attack motivation of Eve, as shown in Figure 2(b). As illustrated in Figure 2(c), at the cost of an increased transmit power at Alice, secure transmission from Alice to Bob can be realized (see Figure 2(a)). For illustration purposes, the realized secrecy capacity, the attack mode probabilities of Eve, and the required transmit power at Alice when the probabilistic transmission control is applied to the same benchmark system are illustrated in Figure 3. One may readily observe that, by introducing the probabilistic transmission control strategy, we can also guarantee the suppression of the attack motivation of Eve for a secure transmission from Alice to Bob. Meanwhile, less transmit power is required due to the probabilistic transmission strategy (Figure 3(c)). The cost is some loss in the achieved secrecy rate from Alice to Bob.
Then, in the following numerical analysis, we will show that the proposed probabilistic transmission strategy can be utilized with the transmit power control to formulate a more robust transmission scheme in the presence of malicious attacks.

Ability to Suppress the Eavesdropping.
Let us consider the following channel conditions as a typical eavesdropping setup: $\{\lambda_1^{ba} = 2.77, \lambda_2^{ba} = 1.81, \lambda_3^{ba} = 1.07\}$, $\{\lambda_1^{ea} = 5.77, \lambda_2^{ea} = 3.78, \lambda_3^{ea} = 2.24\}$, and $\{\lambda_1^{be} = 2.00, \lambda_2^{be} = 1.14, \lambda_3^{be} = 0.46\}$. Compared with the benchmark system in Section 5.1, the Alice-Eve link is now the best among the three involved links, the Eve-Bob link is the worst, and the Alice-Bob channel remains unchanged. According to our analysis in Section 3.2, the traditional game decision will now incline to eavesdropping by Eve. As shown in Figure 4(a), the power control strategy alone fails to suppress Eve's motivation to overhear the transmission by Alice. In fact, the eavesdropping probability of Eve is now about 80%, as illustrated in Figure 4(b). We do not include the secrecy rate performance, since there is in fact no secure rate at all.
Obviously, it is highly desirable for Alice to figure out an effective mechanism to resist the attack by Eve and fulfill the secure transmission requirement. In the same eavesdropping setting, if the proposed adaptive probabilistic transmission scheme is utilized along with the power control strategy, the realized secrecy rate, the attack mode probabilities of Eve, the required transmit power, and the probabilistic transmission control at Alice are illustrated in Figure 5. We can see that the game results now incline to Alice again, as illustrated in Figure 5(b). One may also note that, by lowering the transmission probability of Alice, we can effectively suppress the attack motivation of Eve. As a result, secure transmission from Alice to Bob can be realized, as illustrated in Figure 5(a). Since the Alice-Eve link is noticeably better than the Alice-Bob link, there is some expected loss in the realized secrecy capacity when compared with the benchmark system, in which the same Alice-Bob link is assumed. From Figure 5(c), we may see that the loss in the realized secrecy capacity can be explained by the low transmit power at Alice and the low transmission probability, which result from the game decision when the proposed probabilistic transmission policy is utilized.

Table 2: NE conditions with the probabilistic transmission scheme.
Attack mode; Conditions
No-attack mode,

Algorithm 1: Adaptive secure MIMO transmission scheme via Q-learning.
(1) Initialize $q^0 = 0$, $Q(P, P_T, s) = 0$, $V(s) = 0$, $\forall s, P, P_T$.
(2) For $n = 1, 2, 3, \ldots$ do
(3) Update the state $s^n = q^{n-1}$
(4) Choose $P^n$ and $P_T^n$ with the $\varepsilon$-greedy policy
(5) Send signal with power $P^n$ and probabilistic parameter $P_T^n$ over $M$ antennas
(6) Observe the attack type $q^n$ and $\mu_a$
(7) Update the Q function and value function:
(8) $Q(P^n, P_T^n, s^n) = (1 - \alpha) Q(P^n, P_T^n, s^n) + \alpha (\mu_a(P^n, P_T^n, s^n) + \delta V(s^{n+1}))$
(9) $V(s^n) = \max_{P, P_T} Q(P, P_T, s^n)$, $(0 \le P \le P_{\max}, P_{T,\min} < P_T \le 1)$
(10) end for

Ability to Suppress the Spoofing.
Compared with the benchmark system in Section 5.1 and the eavesdropping system in Section 5.2, the Eve-Bob link is now the best among the three involved links, the Alice-Eve link is the worst, and the Alice-Bob channel remains unchanged. According to our analysis in Section 3.2, the traditional game decision will now incline to spoofing by Eve. As illustrated in Figure 6(b), the spoofing probability of Eve is now about 50%. In order to resist the spoofing attack, Alice needs a relatively high transmit power to achieve a relatively small secrecy capacity, as illustrated in Figures 6(a) and 6(c). Here, we can clearly see that the power control strategy alone cannot effectively suppress Eve's motivation to spoof the reception at Bob.
In the same spoofing setting, if the proposed adaptive probabilistic transmission scheme is utilized along with the power control strategy, the realized secrecy rate, the attack mode probabilities of Eve, the required transmit power, and the probabilistic transmission control at Alice are illustrated in Figure 7. We can see that the game results now incline to Alice again. As illustrated in Figure 7(b), the no-attack probability is now about 90%. One may also note that, by lowering the transmission probability of Alice, we can effectively suppress the attack motivation of Eve. As a result, improved secure transmission from Alice to Bob can be realized, as illustrated in Figure 7(a). Comparing the required transmit power in Figures 6(c) and 7(c), one may readily observe that, with the proposed probabilistic transmission control mechanism, less transmit power is needed to counteract the spoofing attack by Eve and realize a better secrecy capacity, which is attractive for practical applications.

Ability to Suppress the Jamming.
Compared with the benchmark system in Section 5.1, the eavesdropping system in Section 5.2, and the spoofing system in Section 5.3, we now have a much better Alice-Bob link, the Eve-Bob link is in good condition, and the Alice-Eve link is the worst. According to our analysis in Section 3.2, the traditional game decision will now incline to jamming by Eve. As illustrated in Figure 8(b), the jamming probability of Eve is now about 50%. Alice needs a relatively high transmit power to resist the jamming attack, as illustrated in Figure 8(c). Here, we can clearly see that the power control strategy alone cannot effectively suppress Eve's motivation to obstruct the reception at Bob.
In the same jamming setting, if the proposed adaptive probabilistic transmission scheme is utilized along with the power control strategy, the realized secrecy rate, the attack mode probabilities of Eve, the required transmit power, and the probabilistic transmission control at Alice are illustrated in Figure 9. We can see that the game results now incline to Alice again. As illustrated in Figure 9(b), the no-attack probability is now about 50%. One may also note that, by lowering the transmission probability of Alice, we can effectively suppress the attack motivation of Eve. As a result, improved secure transmission from Alice to Bob can be realized, as illustrated in Figure 9(a). Comparing the required transmit power in Figures 8(c) and 9(c), one may readily observe that, with the proposed probabilistic transmission control mechanism, less transmit power is needed to counteract the jamming attack by Eve and realize a better secrecy capacity, which is desirable in practical applications.
In summary, our analysis results confirm that the proposed adaptive probabilistic MIMO transmission scheme does provide an effective method to make the MIMO transmitter better resist malicious attackers in adverse channel conditions, in which not only the transmit power of the MIMO transmitter but also the transmission probability is adjusted to suppress the attack motivation. Meanwhile, we can also conclude from the four typical settings that (1) when the adaptive transmit power policy is enough to suppress the attack motivation, the use of both the adaptive transmit power and the probabilistic transmission control leads to a more energy-efficient MIMO transmission scheme, but with some loss in the realized secrecy capacity; (2) when the adaptive transmit power policy is no longer enough to suppress the attack motivation, the use of both the adaptive transmit power and the probabilistic transmission control is highly recommended, not only in terms of the attack motivation suppression but also from the perspective of the realized secrecy capacity and the improved energy efficiency. In terms of the realized secrecy capacity, one may also note that the proposed adaptive probabilistic transmission control policy appears to be particularly attractive in the spoofing and jamming scenarios.

Conclusion
In this paper, we focus on the noncollaborative game between a MIMO transmitter and one smart malicious attacker, both of which try to maximize their predefined utilities. By carefully analyzing the Nash Equilibrium (NE) and the critical conditions that affect the achieved NE, an adaptive probabilistic MIMO transmission scheme is proposed to make the MIMO transmitter better resist the malicious attacker in adverse channel conditions. Compared with the existing game-based strategy, not only the transmit power of the MIMO transmitter but also the transmission probability is tuned in the proposed adaptive probabilistic transmission scheme. Our analysis results unveil that the proposed adaptive probabilistic transmission can be regarded as a generalized version of the previous adaptive transmission scheme, which can significantly suppress the attack motivation of the smart attacker and improve the secrecy capacity, even if the adaptive power control strategy fails. Other sophisticated strategies can also be employed to further improve the adaptive secure MIMO transmission scheme. For instance, as in [20], when a full-duplex (FD) Bob is assumed, some subsets of the antennas at Bob can be utilized to send artificial noise to obstruct the eavesdropping by Eve. We leave this for future work.

Data Availability
In our work, all the data are generated by the simulation platform developed by ourselves, instead of any other data set. Of course, in order to make sure that our data set is properly generated, we have verified the results by ensuring all the results comply with the existing publications, for instance, [18] and [2].