Antenna selection for multiple-input multiple-output systems based on deep convolutional neural networks

Antenna selection in multiple-input multiple-output (MIMO) systems has attracted increasing attention due to the challenge of balancing communication performance against computational complexity. Recently, deep learning based methods have achieved promising performance in many application fields. This paper proposes a deep learning (DL) based antenna selection technique. First, we generate the labels of the training antenna systems by maximizing the channel capacity. Then, we apply a deep convolutional neural network (CNN) to the channel matrices to explicitly exploit the massive latent cues in the attenuation coefficients. Finally, we use the trained CNN to assign a class label and thereby select the optimal antenna subset. Experimental results demonstrate that our method achieves better performance than state-of-the-art baselines for data-driven antenna selection.


Introduction
Antenna systems have been widely used in many application fields, such as public transportation, shopping malls, smart buildings, automotive radar, satellite communications, airplane landing, and astronomy [1][2][3][4][5]. The multiple-input multiple-output (MIMO) system has also received increasing interest in the area of wireless communication over the past few decades. Due to the rapid increase of cellular mobile device usage and the limitation of computing power, antenna selection has attracted more and more attention recently. Antenna selection can keep a balance between communication performance and computational complexity. It can reduce the hardware cost and computational complexity while keeping a sufficient gain rate or signal-to-noise ratio (SNR). Usually, obtaining the optimal antenna subset requires comparing all possible combinations by exhaustive searching, which takes a great amount of calculation and is very time-consuming. Because exhaustive searching methods are impractical, many suboptimal models have been proposed. In general, existing methods can be categorized into two types: optimization-driven methods and data-driven methods.
(Funding disclosure: Guangdong Grandmark Automotive Systems CO Ltd provided support in the form of salaries for author RXZ, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of this author are articulated in the 'author contributions' section.)
The contributions of our work are listed as follows.
1. We introduced DL techniques into the field of wireless antenna selection. An antenna selection model based on a deep CNN and the channel capacity criterion was proposed.
2. We adopted a LeNet CNN and a ResNet CNN on the training channel matrices, respectively. The deep CNNs were trained on massive sets of training antenna systems, and their outputs were converted into antenna selection solutions for the test channel matrices.
3. The proposed approaches achieve better classification performance and communication performance than state-of-the-art methods, and can be implemented in real-life applications in the future.
The rest of this paper is organized as follows. In Section 2, we discuss some related work. In Section 3, we describe the system model. In Section 4, we describe the labeling system and the network architecture. Section 5 gives the details of experimental setup and the analysis. Finally, in Section 6 our final remarks are presented.

Antenna selection
Optimization-driven methods. Optimization-driven methods employ suboptimal search algorithms to find the best subset. Padmanabhan et al. [6] considered the problem of receive antenna selection using the known temporal correlation of the channel symbols embedded in the data packets; the model was stated as a problem of minimizing the average packet error rate and converted into a partially observable Markov decision process framework, which was solved by heuristic searching schemes. Gulati and Dandekar [7] stated the antenna state selection problem as a multi-armed bandit problem and solved it using the criterion of optimizing arbitrary link quality metrics. Zhou et al. [8] introduced a simple near-optimal Min-Max criterion and selected antenna combinations based on the maximum sum-rate (Max-SR) and minimum symbol-error-rate (Min-SER) criteria. Yan et al. [9] modeled the trade-off between feedback overhead and secrecy performance by maximizing the SNR of the transmitter-receiver channel.
Data-driven methods. The other family is the data-driven methods, which employ supervised machine learning algorithms. Joung [15] used KNN and SVM for antenna selection in wireless communication. KNN and SVM were compared to two conventional optimization-driven methods, Max-min eigenvalue and Max-min channel norm. The experiments showed that the communication performance of KNN and SVM exceeds that of Max-min eigenvalue and Max-min channel norm. Additionally, the computational complexity of KNN and SVM is much lower than that of Max-min eigenvalue and Max-min channel norm.

Deep learning
Recently, deep neural networks (DNNs) have been employed in many application fields. It has been shown that deep network architectures have impressive processing power, sometimes even superior to humans. DNNs [16], with multiple hidden layers, have achieved great success on a wide variety of multimedia classification tasks such as image recognition, video classification, speech recognition, and natural language processing. The CNN [17], an alternative type of DNN, is a more effective network: it can capture the spatial and temporal correlation of data with few parameters. The Recurrent Neural Network (RNN) [18] and its effective variant, the Long Short-Term Memory network (LSTM) [19], are also popular in language modeling.

System model
Consider a MIMO system with N_t transmit antennas and N_r receive antennas. The channel matrix is denoted as H = [h_ij] ∈ C^(N_r×N_t), where h_ij is the attenuation coefficient between the j-th transmit antenna and the i-th receive antenna. Let r(k) = [r_1(k), r_2(k), ..., r_{N_r}(k)]^T denote the received signal, t(k) = [t_1(k), t_2(k), ..., t_{N_t}(k)]^T denote the transmitted signal, and w(k) = [w_1(k), w_2(k), ..., w_{N_r}(k)]^T denote the white Gaussian noise. Here k denotes the time index of the discrete-time signal. Denote the SNR as ρ; the MIMO system model can then be presented as follows.
r(k) = √(ρ/N_t) H t(k) + w(k)

Assume the channel matrix is known at the receiving terminal and unknown at the transmitting terminal. The aim is to select N_s receiving antennas from the N_r receiving antennas so that the channel capacity is maximal. Let B ∈ C^(N_s×N_t) denote the partial channel matrix whose rows are selected from the original channel matrix H. The objective function can be written as follows.

B̂ = argmax_B C(B),  with C(B) = log2 det(I_{N_t} + (ρ/N_t) B^H B)
where C(B) denotes the channel capacity of B, I_{N_t} is the identity matrix of size N_t, and B^H denotes the Hermitian conjugate of B.
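As a concrete illustration, the capacity criterion and the exhaustive search over all C(N_r, N_s) row subsets used to find the optimum can be sketched as follows (an assumed NumPy implementation, not the authors' code; the linear SNR ρ = 10 corresponds to the 10 dB setting used in the experiments):

```python
import itertools
import numpy as np

def capacity(B, snr, n_t):
    """C(B) = log2 det(I_{N_t} + (snr / N_t) * B^H B)."""
    I = np.eye(n_t)
    return np.log2(np.linalg.det(I + (snr / n_t) * B.conj().T @ B).real)

def best_subset(H, n_s, snr):
    """Exhaustively search all row subsets of H; return the capacity-maximizing one."""
    n_r, n_t = H.shape
    best, best_c = None, -np.inf
    for rows in itertools.combinations(range(n_r), n_s):
        c = capacity(H[list(rows), :], snr, n_t)
        if c > best_c:
            best, best_c = rows, c
    return best, best_c

# Toy 8x8 Rayleigh channel, select 2 of 8 receive antennas at rho = 10 (10 dB).
rng = np.random.default_rng(0)
H = (rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))) / np.sqrt(2)
rows, c = best_subset(H, 2, snr=10.0)
```

For N_r = 8 and N_s = 2 this search visits only 28 subsets, which is why exhaustive labeling is feasible at this scale while it becomes impractical for large arrays.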

Learning deep CNN for antenna selection
Label generation

Data normalization. A channel matrix is treated as a data sample. First, the complex channel matrix H is converted to a real-valued matrix by substituting the (i, j)-th element h_ij with its amplitude |h_ij|.
Because each row of H corresponds to a receiving antenna, we perform normalization for each row vector. The channel matrix is thus normalized to obtain a scale-invariant feature as follows.

h_i ← h_i / ||h_i||
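This preprocessing (amplitude conversion followed by per-row normalization of each receive antenna's coefficients) can be sketched as follows (an assumed NumPy implementation):

```python
import numpy as np

def normalize_channel(H):
    """Convert a complex channel matrix to a scale-invariant real feature."""
    A = np.abs(H)                                     # amplitudes |h_ij|
    norms = np.linalg.norm(A, axis=1, keepdims=True)  # one norm per receive antenna (row)
    return A / norms

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
X = normalize_channel(H)  # each row of X has unit L2 norm
```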
Here h_i denotes the i-th row vector of H.

Channel matrix labeling for training samples. Suppose there are W ways to select N_s rows from the N_r rows. Every combination of N_s receiving antennas selected from the N_r receive antennas is regarded as a pattern class, and each combination is mapped to one class label. The total number of class labels is therefore W = C(N_r, N_s). Some examples of the one-to-one match between selected antenna indices and class labels are shown in Table 1.

[Table 1. Examples of selected antenna indices and their corresponding class labels.]

LeNet architecture. LeNet contains seven layers: two convolutional layers, two pooling layers, a fully-connected layer, a dropout layer, and a soft-max layer. The input of the convolutional neural network is an 8 × 8 channel matrix H. The first convolutional layer filters the 8 × 8 input channel matrix with 32 kernels of size 3 × 3. The first pooling layer then takes the response of the first convolutional layer as input; it normalizes and pools the input into a 3 × 3 × 32 output response, using max-pooling kernels of size 2 × 2 with a stride of 2. The second convolutional layer filters the input response with 64 kernels of size 3 × 3, and the second pooling layer converts the input response into a 2 × 2 × 64 output response, again with max-pooling kernels of size 2 × 2 and a stride of 2. The fifth layer is a fully-connected (dense) layer with 1024 units, which accelerates convergence. A dropout layer, which randomly resets the output of each hidden neuron to zero with probability 0.5, is added behind the fully-connected layer; it prevents hidden units from relying on specific inputs and mitigates over-fitting through an ensemble-learning effect. The response of the dropout layer is fed to a soft-max layer, which produces a class label. Since there is a one-to-one correspondence between a class label and an antenna selection result, the output class label directly yields the antenna selection solution. The Rectified Linear Unit (ReLU), whose form is f(x) = max(0, x), was employed as the nonlinear activation function of all convolutional layers and the fully-connected layer. The cross entropy was employed as the loss function, and the gradient descent optimizer was used. The batch size was set as 10, the learning rate as 0.001, and the training loop number as 1000000.
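The LeNet-style network described above can be sketched in Keras as follows (an illustrative reconstruction under the stated layer sizes, not the authors' exact code; the 28 output classes correspond to C(8, 2) antenna subsets):

```python
import tensorflow as tf

def build_lenet(num_classes=28):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu"),                  # 8x8x1 -> 6x6x32
        tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),              # -> 3x3x32
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),  # -> 3x3x64
        tf.keras.layers.MaxPooling2D(pool_size=2, strides=2,
                                     padding="same"),                      # -> 2x2x64
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),                    # dense layer
        tf.keras.layers.Dropout(0.5),                                      # dropout p=0.5
        tf.keras.layers.Dense(num_classes, activation="softmax"),          # class label
    ])

model = build_lenet()
probs = model(tf.zeros((1, 8, 8, 1)))  # forward pass builds the network
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

The predicted class index from the soft-max output is mapped back to an antenna subset through the Table 1 correspondence.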
ResNet architecture. We also employed a ResNet architecture for antenna selection. The input is an 8 × 8 channel matrix, which is fed to a convolutional layer with 64 kernels of size 3 × 3. Five groups of bottleneck convolutional blocks follow:
1. 3 blocks, with 16 filters in the bottleneck layer and 64 filters in the surrounding layers;
2. 1 block, with 32 filters in the bottleneck layer and 128 filters in the surrounding layers;
3. 2 blocks, with 32 filters in the bottleneck layer and 128 filters in the surrounding layers;
4. 1 block, with 64 filters in the bottleneck layer and 256 filters in the surrounding layers;
5. 2 blocks, with 64 filters in the bottleneck layer and 256 filters in the surrounding layers.
The response is then passed through a global average pooling layer, followed by a fully connected layer.
The ReLU was employed as the activation function of all convolutional layers, and the soft-max function as the activation function of the fully connected layer. The cross entropy was employed as the loss function, and the momentum optimizer was used to train the network. The learning rate was set as 0.1, the batch size as 1280, and the training loop number as 20.
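One bottleneck block of the kind used above can be sketched as follows (an assumed layout, since the text does not give the exact block internals: a 1 × 1 reduction to the bottleneck width, a 3 × 3 bottleneck convolution, a 1 × 1 expansion to the surrounding width, and a skip connection):

```python
import tensorflow as tf

def bottleneck_block(x, bottleneck_filters, outer_filters):
    """Residual bottleneck: reduce -> 3x3 conv -> expand, plus a shortcut."""
    shortcut = x
    y = tf.keras.layers.Conv2D(bottleneck_filters, 1, padding="same",
                               activation="relu")(x)
    y = tf.keras.layers.Conv2D(bottleneck_filters, 3, padding="same",
                               activation="relu")(y)
    y = tf.keras.layers.Conv2D(outer_filters, 1, padding="same")(y)
    if shortcut.shape[-1] != outer_filters:  # match channel count for the addition
        shortcut = tf.keras.layers.Conv2D(outer_filters, 1, padding="same")(shortcut)
    return tf.keras.layers.ReLU()(tf.keras.layers.Add()([y, shortcut]))

# First stage of the described network: 3x3 conv with 64 kernels, then a
# bottleneck block with 16 bottleneck filters and 64 surrounding filters.
inp = tf.keras.Input(shape=(8, 8, 1))
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
x = bottleneck_block(x, bottleneck_filters=16, outer_filters=64)
model = tf.keras.Model(inp, x)
```

The residual addition is what lets such a deep stack train effectively, which the results section credits for ResNet's accuracy.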

Implementation
The proposed method was coded in Python on a Windows 7 SP1 OS. CNNs were implemented using the TensorFlow framework. The experiments were performed on a computer with CPU Intel Xeon E5-2660 @ 2.2 GHz, GPU NVIDIA GTX1080Ti, and 64 GB of RAM.

Experimental results and analysis
Experimental setup

Simulated data. To evaluate the performance of the proposed method, we randomly generated 500000 channel matrix samples by i.i.d. sampling from a complex Gaussian distribution with mean 0 and variance 1. The number of transmit antennas N_t was set as 8, the number of receive antennas N_r as 8, and the number of selected receive antennas N_s as 2. The SNR was set as 10 dB. The heuristic searching method was used to generate the true class labels of all channel matrix samples. The data are public and can be downloaded at https://pan.baidu.com/s/1O8G29t6IcvwfChr-DAQ1Cw.
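The simulated data generation can be sketched as follows (an assumed NumPy implementation; the unit variance of each complex entry is split evenly between the real and imaginary parts):

```python
import numpy as np

def generate_channels(num_samples, n_r=8, n_t=8, seed=0):
    """i.i.d. complex Gaussian channel matrices with mean 0 and variance 1."""
    rng = np.random.default_rng(seed)
    real = rng.normal(0.0, np.sqrt(0.5), size=(num_samples, n_r, n_t))
    imag = rng.normal(0.0, np.sqrt(0.5), size=(num_samples, n_r, n_t))
    return real + 1j * imag  # per-entry variance 0.5 + 0.5 = 1

# A small batch for illustration; the paper uses 500000 samples.
H = generate_channels(1000)
```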
Compared methods. KNN and SVM were employed as compared methods. In the KNN experiments, the Euclidean distance was used as the distance metric. The KNN classifier labels a query sample by assigning to it the most common class among its K nearest neighbors; the parameter K was tuned by exhaustive searching. The classification results on the test set were recorded to obtain the antenna selection solutions. For SVM, the radial basis function (RBF) was employed as the kernel function; the parameter gamma of the RBF kernel and the cost parameter were tuned by grid searching; the one-vs-rest strategy was employed to extend the binary classification problem to the multi-class case; the classification result of SVM was recorded to obtain the optimal antenna subset.
RNN and LSTM were also employed as the compared methods. For RNN, the learning rate was set as 0.001, the training loop number was set as 500000, the batch size was set as 100, the time-step was set as 8, and the number of hidden states was set as 100. For LSTM, the learning rate was set as 0.01, the training loop number was set as 500, the batch size was set as 100, the time-step was set as 8, and the number of hidden states was set as 100.
Other CNN architectures, AlexNet and VGG-16, were also employed as the compared methods. For AlexNet, the learning rate was set as 0.001, the training loop number was set as 20, the batch size was set as 1280, and the "momentum" optimizer is employed. For VGG-16, the learning rate was set as 0.001, the training loop number was set as 20, the batch size was set as 1280, and the "rmsprop" optimizer is employed.
Evaluation. For LeNet, KNN and SVM, the five-fold cross-validation strategy was employed to tune the parameters and compute the evaluation results. The original data were randomly divided into five equal-sized groups. A single group was chosen as the test set, and the remaining four groups were employed as the training set. The cross-validation process was repeated five times so that each of the five subsample groups was employed exactly once as the test set. The classification accuracy was computed from the folds. Then the accuracies of all test folds were averaged to produce an overall accuracy, which provided an evaluation measure of different classifiers.
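The five-fold protocol can be sketched as follows (an assumed scikit-learn implementation; the toy data sizes and the KNN stand-in classifier are illustrative, not the paper's actual 500000-sample dataset):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 64))  # flattened 8x8 channel features (toy data)
y = rng.integers(0, 28, size=100)   # one class label per antenna subset

accs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(X[train_idx], y[train_idx])
    accs.append(clf.score(X[test_idx], y[test_idx]))  # fold accuracy

# Each sample serves as test data exactly once; the fold accuracies are averaged.
overall_acc = float(np.mean(accs))
```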
For VGG-16, AlexNet and ResNet, 1800000 samples were assigned to the test set, and the remaining samples were assigned to the training set. For LSTM and RNN, 2000000 samples were assigned to the test set, and the remaining samples were assigned to the training set. We computed the accuracy on the test set to evaluate the classifiers.
Moreover, we computed the loss of channel capacity after antenna selection; the channel capacity loss can be employed to evaluate the communication performance. For a test channel matrix H, the partial channel matrix B was generated according to the classification result. The channel capacity loss of H can then be computed as follows.

L(H) = |C(H) − C(B)|      (6)
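Eq (6) can be sketched as follows (an assumed implementation; `selected_rows` stands for the antenna subset decoded from the classifier's predicted class label):

```python
import numpy as np

def capacity(M, snr, n_t):
    """C(M) = log2 det(I_{N_t} + (snr / N_t) * M^H M)."""
    return np.log2(np.linalg.det(np.eye(n_t) + (snr / n_t) * M.conj().T @ M).real)

def capacity_loss(H, selected_rows, snr=10.0):
    """L(H) = |C(H) - C(B)| for the selected partial channel matrix B."""
    n_t = H.shape[1]
    B = H[list(selected_rows), :]
    return abs(capacity(H, snr, n_t) - capacity(B, snr, n_t))

rng = np.random.default_rng(3)
H = (rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))) / np.sqrt(2)
loss = capacity_loss(H, [0, 1])  # loss when antennas 0 and 1 are selected
```

Averaging this loss over the test set gives the communication-performance criterion reported in the tables.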
We computed the channel capacity loss of all test channel matrix samples and then calculated the average channel capacity loss of the test set. The average channel capacity loss was employed as the criterion to evaluate the communication performance. Because the computation of the capacity loss costs much more time than that of the accuracy, we used the accuracy to tune the parameters of the CNN models, such as the training loop number and the batch size. Table 2 shows the classification accuracies of the CNNs and the other compared methods. As seen in Table 2, ResNet provides an accuracy of 79.16%, and LeNet provides an accuracy of 49.21%. ResNet has an appealing accuracy and outperforms the other listed methods: ResNet can effectively train a very deep network by residual learning, and the increased depth brings a promising accuracy. The accuracies of AlexNet and VGG-16 are very low; these architectures work well for image classification, but they do not seem to suit antenna selection data. The accuracies of RNN and LSTM are 24.00% and 60.00%, respectively. The results imply that the antenna selection data can be treated as sequence data and are suitable for sequence models. The accuracy of RNN is low, but LSTM greatly increases the performance, because LSTM can remember long-term information and alleviates the vanishing-gradient problem. The accuracies of KNN and SVM are 8.29% and 22.12%, respectively. Because the dataset is very large and the problem is highly non-linear, it is very hard for simple classifiers such as KNN and SVM to achieve a good accuracy.

Results and analysis
Capacity performance is more important than accuracy. We compared the channel capacity loss of the CNN methods to that of the compared methods. The comparison results are shown in Table 3. As seen in Table 3, the average channel capacity loss of ResNet is 6.24 with variance 0.13; that of LeNet is 3.63 with variance 0.49; that of AlexNet is 6.76 with variance 0.18; that of VGG-16 is 6.78 with variance 0.16; that of RNN is 6.72 with variance 0.27; that of LSTM is 6.78 with variance 0.27; that of KNN is 7.08 with variance 0.47; and that of SVM is 6.95 with variance 0.34. LeNet does not have the best accuracy, but it has the minimal channel capacity loss. Although ResNet has the best classification performance, it does not have much advantage in communication performance. Notably, AlexNet and VGG-16 have very low accuracies but fairly good communication performance, which shows that they find sub-optimal labels for test samples in most cases; these sub-optimal labels correspond to a sufficiently low channel capacity loss. Overall, the comparison results confirm that LeNet is better than the others for the antenna selection task, and that using LeNet for antenna selection is a competitive and acceptable choice.
For machine learning based antenna selection, the main difficulty is that the map between the channel matrix and the best antenna index is highly non-linear. Another difficulty is that the similarities between different samples are very small. It is therefore very hard for pattern classifiers to separate samples of different classes and obtain high accuracy. CNNs provide an efficient way to mine the deep latent meanings of channel matrices: the strength of the association between the channel matrix and the best antenna index can be enhanced in deep network architectures, so the deep representation helps the classifiers approximate the complex map from channel matrix to best antenna index. The accuracy of LeNet is 49.21%, which seems low. However, the real purpose of antenna selection is to achieve the best communication performance rather than the best classification performance. For a test channel matrix, the true label means the optimal antenna selection scheme; for wireless communication, however, a suboptimal antenna selection scheme is still acceptable if the channel capacity loss is low. ResNet has the fewest misclassified samples, but it has a higher channel capacity loss on the misclassified samples than LeNet. LeNet is a less accurate approach, but it enables the samples that are difficult to classify to obtain a good-enough suboptimal label. Compared to the other listed methods, LeNet has better communication performance and is more robust. In this sense, LeNet is better than ResNet and the other compared methods for the antenna selection task.
We have also analyzed the trade-off between accuracy and speed. First, we analyzed the relation between the classification accuracy and the training loop number of the CNN (LeNet). The results are listed in Table 4. As shown in Table 4, the accuracy of the CNN reaches 42.05% when the training loop number is set as 100000. If the training loop number is set as 500000, the accuracy rises to 48.68%, a great improvement. However, the accuracy only slightly rises to 49.21% when the training loop number increases to 1000000. Increasing the number of training loops costs more computation in the training stage; however, the computation speed of a real antenna selection system mainly depends on the test stage, and increasing the number of training loops does not increase the computational cost of the test stage. The experimental results indicate that setting the training loop number as 500000 is an acceptable option: it produces sufficient antenna selection performance without unnecessary time cost.
Second, we analyzed the relation between accuracy and the number of samples. For one dataset, 50000 samples were randomly sampled from a complex Gaussian distribution with mean 0 and variance 1, and the five-fold cross-validation strategy was employed, setting the test sample size as 20% of the whole dataset. For another dataset, 2000000 samples were randomly sampled from the same distribution, and the test sample size was set as 5% of the whole dataset. The comparison results are shown in Table 5. The experimental results show that a large training sample size leads to better antenna selection performance; however, a dataset of 500000 samples is sufficient for building a CNN based antenna selection system.
We have also analyzed the training time and test time of the CNN, RNN, and LSTM. The training times on the whole training set are listed in Table 6. We also examined the relation between the channel capacity loss and the SNR; Fig 6 shows the relation between the variance of the channel capacity loss and the SNR. If the SNR rises from 10 dB to 50 dB, the accuracy and the channel capacity loss descend, and the variance of the channel capacity loss increases. However, the channel capacity loss of the CNN is always less than that of SVM and KNN under the same conditions, and the channel capacity loss variance of the CNN is less than that of KNN and SVM in most cases. The experimental results demonstrate that the CNN outperforms KNN and SVM for antenna selection. This work is a preliminary study validating the idea of selecting antennas in MIMO systems by DL. The scale of the test antenna system in our experiments is not large. If the number of transmit and receive antennas increases, the dimension of the channel matrix grows fast, which means a larger dataset would be needed for training the CNN; this in turn requires a more expensive GPU with larger memory or even a GPU cluster. Our laboratory cannot afford stronger computing hardware, so we set the number of receive antennas to 8 and the number of selected receive antennas to 2. Nevertheless, CNN based large scale antenna selection remains promising.
Our work focuses on data-driven antenna selection, so we only compared our method to state-of-the-art data-driven methods; we have not compared the proposed methods to optimization-driven methods. A detailed comparison between data-driven methods and optimization-driven methods has been presented in [15]. Joung [15] proved that KNN and SVM can achieve reasonable performance for antenna selection in wireless communication, and our experiments have shown that the capacity performance of our method exceeds that of KNN and SVM. Obviously, the computational complexity of the CNN is much larger than that of KNN and SVM. Although we have not performed experiments on a large scale antenna system, experience from digital image processing suggests that the test time of the CNN would be acceptable and much lower than that of optimization-driven methods on a large scale antenna system.
Another limitation of this work is that we used the simulated antenna data instead of real antenna data to test the deep learning based selection algorithms. However, real antenna data are expensive to obtain, and a rich supply of real antenna data is inaccessible for researchers. This is a common problem in the field of antenna selection. Most works in the field of antenna selection used the simulated data to test the effectiveness of antenna selection algorithms.

Conclusion
This work introduced a receiving antenna selection framework based on deep CNNs and the channel capacity criterion. The proposed methods used convolutional structure to extract rich features from the channel matrices. CNNs were used to train powerful classifiers for selecting antennas. The proposed approach was validated on simulated antenna system data. The proposed method outperformed the state-of-the-art baselines. Our future work will include the improvement of deep networks and the evaluation on real-life antenna systems.