A Real-Time C-V2X Beamforming Selector Based on Effective Sequence to Sequence Prediction Model Using Transitional Matrix Hard Attention

For C-V2X systems, real-time selection of the best beam is an increasingly critical yet still open problem. Most existing approaches adopt either conventional ARIMA or ANN models. Recently, sequence-to-sequence (Seq2Seq) predictors with attention have been studied to extract time series features and emphasize critical information for data prediction. In this paper, a Seq2Seq predictor integrated with a Transitional Matrix based hard attention is presented and validated on an artificial test dataset with predefined transitional states. First, the transition probability matrix is generated from previous time series data and fed into the "hard" attention module of the Seq2Seq predictor to determine the weights during the training phase. Second, the presented Seq2Seq predictor is implemented and adopted to predict the best beams of a C-V2X beamforming selector built by the authors. Experiments were conducted, and the captured data were used to validate the performance of the predictor. Compared with baseline models, the presented predictor improves prediction accuracy by 10-12%.


I. INTRODUCTION
Time series prediction using neural networks has long been studied in many fields [1], such as stock forecasting [2], weather forecasting [3], traffic flow forecasting [4], global positioning prediction [5] and wireless channel prediction [6], and most time series prediction follows existing methodologies such as ARIMA [7], SVR [8], traditional ANN [9] and hybrid neural networks [10], [11]. Traditionally, statistical methods such as ARIMA and exponential smoothing were often used for time series forecasting. Armstrong et al. [23] proposed 28 golden rules for time series prediction, in which ARMA and ARIMA are judged to be the best time series prediction methods. With the growth of deep neural networks, only a few time series classification algorithms have been proposed [24]. Wang et al. [36] propose a Markov-LSTM combination in which a multi-step Markov transition matrix is defined and an LSTM is then introduced to combine multiple first-order Markov chains.
The associate editor coordinating the review of this manuscript and approving it for publication was Sangsoon Lim.
Recently, Neural Machine Translation (NMT) has achieved state-of-the-art performance using methods such as the Encoder Decoder [12], the Encoder Decoder with Attention [13], [14] and the Transformer [15]. These methods have been used by various researchers for language translation, and they have also been studied for time series prediction [16]. In the Encoder Decoder with Attention, the encoder and decoder are built from neural networks such as RNNs, LSTMs or GRUs. Attention is the key mechanism that improves on the plain Encoder Decoder model: it indicates which parts of the input sequence are relevant to each word in the output. Attention was proposed as a method to both align and translate.
Xu et al. [28] proposed hard attention, which attends to exactly one input state for each output, while [29], [30] show sequence-to-sequence prediction with a hybrid of hard and soft attention. Reference [31] provides a modified hard attention for vision, called Saccader, that requires only class labels for the initial attention, whereas [32] provides a multiscale hard-attention architecture for image classification. Reference [33] presents "soft" and "hard" attention for Q-learning based on features extracted by a CNN from different image regions; [34] presents variational attention as an alternative to both "soft" and "hard" attention, in which the attention has tighter approximation bounds based on amortized variational inference; [35] uses hard attention while exploring various image attention mechanisms to locate regions relevant to the question; and [36] presents "hard" attention for image classification based on Bayesian optimal experimental design, which speeds up training. These methods focus on vision, image and text-based classification and prediction, and they propose either a hybrid of "soft" and "hard" attention or a "hard" attention focused on a single feature. Little attention has been paid to time series prediction and to the relationship between time variables.
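To make the distinction concrete, the following NumPy sketch contrasts soft attention, which returns a softmax-weighted average of all input states, with hard attention, which returns exactly one state. The function names are illustrative. The stochastic branch samples an index from the softmax distribution in the spirit of Xu et al. [28], while the deterministic branch simply takes the argmax; neither is the implementation of any cited work.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()


def soft_attention(states, scores):
    """Weighted average of all input states (states: (S, H), scores: (S,))."""
    return softmax(scores) @ states


def hard_attention(states, scores, rng=None):
    """Attend to exactly one input state.

    rng is None -> deterministic: pick the highest-scoring state.
    rng given   -> stochastic: sample an index from softmax(scores).
    """
    if rng is None:
        idx = int(np.argmax(scores))
    else:
        idx = int(rng.choice(len(scores), p=softmax(scores)))
    return states[idx]


states = np.eye(3)                       # three toy 3-D input states
scores = np.array([0.1, 2.0, 0.3])
print(soft_attention(states, scores))    # a blend of all three states
print(hard_attention(states, scores))    # exactly the second state
```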
In time series prediction tasks such as beamforming selection in a C-V2X system, where the beam must be chosen based on previous inputs, the above-mentioned methods do not attend to the transitional values. In response, the contributions of this paper are as follows:
• We show a "hard" but deterministic attention mechanism that is trainable using pre-determined transitional states.
• We show how insight can be gained into "which" input the attention is focused on.
• Finally, we demonstrate the effectiveness of our model by testing it on actual data measured in the field; the experimental results show that our model achieves the best prediction performance compared with the baseline methods.
Wireless channel status and characteristics are similar within the same environment category, such as urban, rural, residential and hilly areas. The main features of the channels in each category can be learned in real time, modelled by a transitional matrix, and used as prior knowledge for the neural network.
The rest of the paper is organized as follows: Section II presents the problem statement and motivation, Section III describes the system model and the experimental measurement data, Section IV analyzes the overall architecture of the encoder decoder with attention based on the transition states framework and describes the relevant theory and process details, Section V presents the theoretical analysis, Section VI presents the experimental analysis, and Section VII concludes the paper.

II. PROBLEM STATEMENT AND MOTIVATION
With the arrival of the big data era, every industry wants to exploit neural networks for better prediction. The automotive industry has focused on advanced neural networks for purposes such as path prediction [17], language recognition [18] and many other automated-driving tasks. Cellular Vehicle to Everything (C-V2X) is an emerging technology in the automotive world that encompasses Vehicle to Vehicle (V2V), Vehicle to Infrastructure (V2I), Vehicle to Pedestrian (V2P) and Vehicle to Network (V2N) connectivity. C-V2X communication is envisioned to enhance the safety of drivers, passengers and pedestrians. The C-V2X system is governed by the National Highway Traffic Safety Administration (NHTSA) and the Department of Transportation (DOT). In 2017, the NHTSA and DOT issued a Notice of Proposed Rulemaking (NPRM) [25] for V2V communication; at that time, V2V communication was expected to be based on Dedicated Short Range Communications (DSRC) as defined in SAE J2735 [26]. V2V communication is expected to provide 360-degree "awareness" with a range of 300 meters, for which omnidirectional antennas are adopted.
An omnidirectional antenna gives complete 360-degree coverage out to 300 meters but increases channel congestion, which is regulated in SAE J2945/1 [27]. In a highly congested vehicular location, the network experiences high data loads, which requires reduced radiated power; reducing power, however, reduces coverage. An effective way to communicate over a longer range without increasing congestion is to use directional beams, i.e., beamforming.
Beamforming is a technique in which the beam of an antenna array can be steered in a desired direction. The input RF signal is fed to the antenna elements in parallel, and the signals add constructively or destructively depending on their phases, so that the energy is concentrated into a narrow beam. In both the Wi-Fi and 5G standards, during the antenna training phase of each beacon interval (BI), all beams are scanned and the optimum one is chosen and used for the whole BI. If the same method were applied in a C-V2X system, it would lead to moderately or significantly suboptimal beam selection because of the rapid variation of the direction of arrival (DoA) of multipath signals.
There has been considerable research on machine learning in vehicular networks [19], mostly focused on channel estimation [20], distance estimation [21] and vehicle trajectory [22]; there is very little on beam prediction [39], which uses only traditional methods, and nothing on beam prediction using deep neural networks. Our research focuses on a real-time beam prediction model.

III. SYSTEM MODEL
Beamforming antenna arrays have attracted increasing attention and found applications ranging from Wi-Fi and 5G to the Internet of Things (IoT). In this project, a 4-element uniform linear array (ULA) receiver antenna built upon a 4 × 4 Butler matrix is used to collect the data for the machine learning algorithm, so that real-time beamforming selection can be achieved for the C-V2X system. Figure 1 shows the system design of the 4 × 4 beamformer for the C-V2X system: four 5.9 GHz whip antennas, separated by a quarter wavelength (λ/4), are connected to a 4 × 4 Butler matrix to form a ULA. A switch box containing an SPDT (ZFSWA2R-63DR+) and an SP4T (ZSWA4-63DR) switch is used to select one of the outputs of the Butler matrix. The signals at two adjacent antennas in the array have a phase difference of φ = kd cos θ, where d is the element spacing, the wave number k and the Array Factor (AF) are given by (1) and (2), respectively, N is the total number of antennas and α_n is the additional phase shift. For a broadside antenna array, the AF can be further written as (3). Finally, the beamforming radiation pattern is given by (4). Figure 2 shows the radiation pattern of the ULA for all 4 ports. The receiver antenna is connected to the C-V2X onboard unit, and the receiver module also contains a Raspberry Pi that is used to command the radio, as shown in Figure 3. Both the C-V2X onboard unit and the Raspberry Pi are powered by a portable battery (XTPower MP-10000). The receiver unit is placed on top of the car, and the entire unit is shown below. The transmitter, placed at a fixed location, has a single omnidirectional antenna connected to a C-V2X onboard unit and powered by a portable battery, as shown in Figure 4.
VOLUME 11, 2023
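The array factor of the λ/4-spaced 4-element ULA can be explored numerically. The sketch below assumes isotropic elements, φ = kd cos θ with θ measured from the array axis, a progressive phase shift β between adjacent elements (α_n = nβ), and the textbook 4 × 4 Butler-matrix port phases of ±45° and ±135°. These phases and the function name are assumptions for illustration, not values taken from the hardware described above.

```python
import numpy as np


def array_factor(theta_deg, n_elements=4, spacing_wl=0.25, beta_deg=0.0):
    """Normalised |AF| of an N-element ULA with isotropic elements.

    theta is measured from the array axis, so phi = k*d*cos(theta);
    beta is an assumed progressive phase shift (alpha_n = n*beta).
    """
    theta = np.radians(np.atleast_1d(np.asarray(theta_deg, dtype=float)))
    kd = 2 * np.pi * spacing_wl                 # k = 2*pi/lambda, d in wavelengths
    psi = kd * np.cos(theta) + np.radians(beta_deg)
    n = np.arange(n_elements)[:, None]          # element index 0..N-1
    af = np.abs(np.exp(1j * n * psi).sum(axis=0))
    return af / n_elements                      # equals 1.0 at a full main-beam peak


theta = np.linspace(0, 180, 1801)
for beta in (-135, -45, 45, 135):               # assumed Butler-matrix port phases
    af = array_factor(theta, beta_deg=beta)
    print(f"beta={beta:+d} deg: peak at {theta[af.argmax()]:.1f} deg, |AF|={af.max():.2f}")
```

Under these idealized assumptions, the ±45° ports steer the main beam to about 60° and 120° from the array axis. Because |β| must not exceed kd = 90° for the main lobe to stay in the visible region at λ/4 spacing, the ±135° ports peak at endfire with reduced amplitude; measured patterns such as those in Figure 2 also include element and mutual-coupling effects.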
The test is performed on the university campus shown in Figure 5. The transmitter is placed on the second floor of one of the parking decks, as shown in the figure, and the vehicle carrying the receiver module is driven around the campus; the data collected throughout the campus are used for the machine learning validation.

IV. FRAMEWORK MODEL
In this section, we discuss the implementation details of the machine learning model and the training methodology. We split the dataset into two sets:
• Training set: 80% of the dataset
• Prediction set: 20% of the dataset
In our implementation, we use a sliding-window input over the dataset to achieve maximum overlap of sequences, and we train with a guided methodology. In guided training, the actual data value is fed as the next input, which aims to achieve faster convergence by guiding the model towards a local minimum. During prediction, in contrast, we use the unguided methodology, in which the predicted value is fed as the next input, because the actual data are not available at this stage.
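The sliding-window preparation and the 80/20 split can be sketched as follows. The window lengths, the stride of 1 (which gives the maximum overlap), and the choice to split chronologically before windowing (so that no prediction samples leak into training windows) are our assumptions; the placeholder beam labels are random, not measured data.

```python
import numpy as np


def sliding_windows(series, in_len, out_len, stride=1):
    """Build overlapping (input, target) windows; stride=1 gives maximum overlap."""
    series = np.asarray(series)
    starts = range(0, len(series) - in_len - out_len + 1, stride)
    X = np.stack([series[s:s + in_len] for s in starts])
    Y = np.stack([series[s + in_len:s + in_len + out_len] for s in starts])
    return X, Y


def chronological_split(series, train_frac=0.8):
    """Keep time order: first 80% for training, last 20% for prediction."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


beams = np.random.default_rng(0).integers(1, 5, size=1500)   # placeholder labels 1..4
train, test = chronological_split(beams)
X_train, Y_train = sliding_windows(train, in_len=4, out_len=1)
print(X_train.shape, Y_train.shape)   # (1196, 4) (1196, 1)
```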
Before describing our framework model in detail, we first briefly explain the limitations of the traditional beam selection technique.

A. WHY MACHINE LEARNING MODEL?
The straightforward way to choose the beam is adaptive antenna selection, i.e., scanning all beams for the strongest signal and keeping the beam with the strongest signal until the next beam interval. Adaptive antenna selection is implemented in Wi-Fi routers to extend signal range and improve coverage. In the C-V2X system, the adaptive antenna selection implementation selects a beam every 100 ms. At a vehicle speed of 60 mph (about 26.8 m/s), the vehicle moves approximately 3 meters every 100 ms. In the simulation, 3 meters corresponds to 3 data points, so the beam chosen by a scan is used for the next 3 consecutive data points. For example, if beam 1 is selected during the initial scan, the next three packets are received using beam 1. The simulation data show that adaptive antenna selection suffers evident data loss, yielding only 29.41% accuracy. This motivates using machine learning to predict the beam, with the aim of improving the efficiency of data reception.
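The scan-and-hold baseline and the distance arithmetic can be sketched as below. We assume a per-sample label giving the best beam at each measurement point; the helper names are hypothetical, and excluding the scan point itself from the accuracy is our assumption. The toy sequence is illustrative and does not reproduce the 29.41% figure, which depends on the measured data.

```python
def distance_per_interval(speed_mph, interval_s=0.1):
    """Distance in meters travelled during one beam-selection interval."""
    return speed_mph * 1609.344 / 3600 * interval_s


def scan_and_hold_accuracy(best_beams, hold=3):
    """Scan at one point, keep that beam for the next `hold` points, then rescan.

    Accuracy is the fraction of held points whose best beam equals the held beam.
    """
    hits = total = 0
    i = 0
    while i < len(best_beams):
        chosen = best_beams[i]                          # beam found by the scan
        for j in range(i + 1, min(i + 1 + hold, len(best_beams))):
            hits += int(best_beams[j] == chosen)
            total += 1
        i += hold + 1                                   # next scan after the hold
    return hits / total if total else float("nan")


print(round(distance_per_interval(60), 2))              # 2.68 m, i.e. about 3 m
print(scan_and_hold_accuracy([1, 1, 1, 1, 2, 3, 3, 3])) # 0.5
```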

B. ENCODER DECODER WITH ATTENTION
The Encoder Decoder was developed to address sequence-to-sequence machine translation, mapping an input sequence to an output sequence. Attention is a mechanism developed to improve the performance of the Encoder Decoder RNN on machine translation. At a high level, the Encoder Decoder model comprises two sub-models:
• Encoder: the encoder steps through the input series and encodes the entire sequence into a fixed-length vector called the context vector.
• Decoder: the decoder steps through the output series while reading from the context vector.
This approach has difficulty decoding longer sequences, and hence Attention was introduced.
• Attention: instead of encoding the input sequence into a single fixed context vector, the attention model develops a context vector that is filtered specifically for each output time step.
With the introduction of Attention, as shown in Figure 6, the decoder output is more specifically focused, which provides better prediction. The attention model calculates a score that relates all of the encoder's hidden states to the previous decoder output. Two important scores were proposed by Bahdanau (6) and Luong (5).
where h_t denotes the encoder hidden states and h_s is the decoder output. The parameters of the score function are learned by backpropagation during training. The scores are normalized into attention weights, and the context vector is then calculated by (7).
After the context vector is calculated, it is concatenated with the previous decoder hidden state to form the input for the next decoder step.
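The score, normalization and concatenation steps can be illustrated with NumPy. The sketch assumes the common forms of the scores: dot product, Luong's "general" form with a learned matrix W, and Bahdanau's additive form with learned W1, W2 and v. Random matrices stand in for trained parameters, the names are illustrative, and the exact variants behind (5) and (6) may differ.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def dot_score(enc, dec):
    """enc: (S, H) encoder hidden states; dec: (H,) decoder output."""
    return enc @ dec


def general_score(enc, dec, W):
    """Luong-style 'general' score: enc_s^T W dec for each encoder step s."""
    return enc @ W @ dec


def additive_score(enc, dec, W1, W2, v):
    """Bahdanau-style additive score: v^T tanh(W1 enc_s + W2 dec)."""
    return np.tanh(enc @ W1.T + dec @ W2.T) @ v


def attend(enc, scores):
    weights = softmax(scores)      # normalised attention weights, sum to 1
    context = weights @ enc        # weighted sum of encoder states, shape (H,)
    return weights, context


rng = np.random.default_rng(0)
S, H, A = 4, 8, 6                  # encoder steps, hidden size, attention size
enc, dec = rng.normal(size=(S, H)), rng.normal(size=H)
W, W1, W2, v = (rng.normal(size=(H, H)), rng.normal(size=(A, H)),
                rng.normal(size=(A, H)), rng.normal(size=A))
weights, context = attend(enc, general_score(enc, dec, W))
next_input = np.concatenate([context, dec])   # fed to the next decoder step
print(weights.round(3), next_input.shape)     # next_input has shape (16,)
```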
Note that during the score calculation the weights are learned during training, i.e., they are initialized randomly and then trained by backpropagation. This method gives no insight into how the weights are calculated, and in time series prediction it introduces randomness into where the attention sub-model focuses.

C. ATTENTION WITH TRANSITION STATES
In our model, we introduce a transition matrix TM that tells the model where to focus its attention when generating the next time step of the sequence. The transition matrix contains the probabilities of transitioning from one state to another, generated from the given dataset, i.e., given a certain state, the probability of moving to each other state or staying in the same state. Table 1 shows a way of representing the transition states as a matrix. These transition probabilities are used in the scores of the attention sub-model, providing information about where the decoder should focus when predicting the t-th time step.
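A transition matrix of this kind can be estimated by counting consecutive state pairs and normalizing each row. The sketch assumes beams are relabelled as states 0..3 (Beam 1 becomes 0) and that rows index the current state; the layout of Table 1 may differ, and the function name is hypothetical.

```python
import numpy as np


def estimate_transition_matrix(states, n_states):
    """tm[i, j] = estimated P(next state = j | current state = i)."""
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(states[:-1], states[1:]):
        counts[current, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states never observed are left as zeros instead of dividing by 0.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


beams = [1, 2, 1, 2, 2, 4]                 # toy beam sequence (Beam 1..4)
tm = estimate_transition_matrix([b - 1 for b in beams], n_states=4)
print(tm)
```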
When the score (8) is calculated, the weights are determined by the transition matrix TM:

$$\mathrm{score}(h_t, h_s) = h_t^{\top} W h_s \qquad (8)$$

where W is the transition matrix. The weights are determined from the encoder input time series $(a_i, a_{i+1}, a_{i+2}, \ldots, a_{i+t})$ and the last predicted output, and the weight matrix is determined by (9). This indicates which decoder output is most probable given the previous output, which is known to the next decoder state. It also means that the conversion is not a one-to-one mapping as in traditional language translation; instead, the weight matrix drives the time series prediction.
Figure 7 shows an example of how the weights W are chosen in the score calculation of the attention sub-model. Consider an encoder input sequence of four values, $a_i, a_{i+2}, a_{i+1}, a_i$, and let the output of the first decoder loop be $b_{t-1}$. Since the decoder output takes values from the same set of states as the input, suppose $b_{t-1} = a_{i+3}$. The score weights are then $P(a_i \mid a_{i+3}) = P4$, $P(a_{i+2} \mid a_{i+3}) = P12$, $P(a_{i+1} \mid a_{i+3}) = P8$ and $P(a_i \mid a_{i+3}) = P4$, i.e., the weight vector is [P4, P12, P8, P4].
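One possible reading of the Figure 7 example is sketched below: the weight for encoder step k is looked up as P(a_k | b_{t-1}) from the transition matrix, normalized, and used to form the context vector, and the highest-probability step is taken as the "hard" focus. This is an illustrative interpretation, not the authors' code; the transition matrix values and the mapping of $a_i, a_{i+1}, a_{i+2}, a_{i+3}$ to states 0, 1, 2, 3 are hypothetical, and (8) additionally involves the hidden states.

```python
import numpy as np

# HYPOTHETICAL transition matrix: TM[i, j] = P(next state j | current state i).
TM = np.array([[0.1, 0.2, 0.3, 0.4],
               [0.4, 0.3, 0.2, 0.1],
               [0.3, 0.4, 0.1, 0.2],
               [0.2, 0.1, 0.4, 0.3]])


def transition_attention(enc_states, enc_inputs, prev_output, tm):
    """Attention from transition probabilities instead of learned scores.

    enc_states : (S, H) encoder hidden states
    enc_inputs : (S,) state label of each encoder input a_k
    prev_output: state label of the last decoder output b_{t-1}
    """
    raw = tm[prev_output, np.asarray(enc_inputs)]   # P(a_k | b_{t-1}) per step
    weights = raw / raw.sum()                       # normalise to sum to 1
    context = weights @ enc_states                  # weighted sum of states
    focus = int(np.argmax(raw))                     # "hard" choice of one step
    return raw, weights, context, focus


# Figure 7 example: inputs a_i, a_{i+2}, a_{i+1}, a_i and b_{t-1} = a_{i+3}.
enc_states = np.random.default_rng(0).normal(size=(4, 8))
raw, weights, context, focus = transition_attention(enc_states, [0, 2, 1, 0], 3, TM)
print(raw, focus)   # [0.2 0.4 0.1 0.2] 1
```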
In our model, the encoder acts like a traditional encoder: it receives the input data, processes it, and passes its last hidden state and last cell state to the decoder as input. It also stores the hidden state of every encoder block for use in the context vector. The decoder's initial input is provided by the encoder, and the decoder runs in a loop. At each time step, the decoder consumes its inputs and states and outputs its last hidden state and last cell state. The decoder passes its last hidden state to the attention sub-model, whose output becomes the input to the next decoder time step. The last hidden state is also used for the prediction at the current time step.
In the attention sub-model, the score is computed from the encoder hidden states, the weights from the transition matrix TM, and the decoder output. The context vector is calculated from the score, concatenated with the decoder output, and provided as input to the next decoder state.
The transition matrix representation is similar to a state space model, as both describe time-varying systems. However, a state space model can change the number of states, observations and disturbances, i.e., it is dimension-varying, and it can also handle systems with nonzero initial conditions. By contrast, the transition matrix proposed in this paper is not dimension-varying and cannot handle nonzero initial conditions, because the matrix would be skewed.
In principle, adopting the transitional matrix adds statistical information from long-term data to the attention, thus changing the attention from blind unsupervised learning to supervised or semi-supervised learning. The transitional matrix and the attention are combined with tunable, time-varying weights during training to achieve better performance.

D. WHY ATTENTION WITH TRANSITION STATES
The attention mechanism was developed to improve performance on long input sequences, especially for image recognition and natural language processing. The idea behind attention is the ability to access the encoder selectively during decoding, which is achieved through the context vector. The context vector defined by (7) is calculated from the score given by (8) using the probability distribution shown in (10).
In image classification and natural language processing, the weights in (8) are learned through backpropagation during training. In a time-variant system, backpropagation suffers from the vanishing gradient problem. The LSTM is trained with Backpropagation Through Time (BPTT) and uses its gating structure to mitigate the vanishing gradient problem, but the context and attention block is not part of the LSTM structure and still suffers from it. To this end, the transition matrix is formulated to provide statistical information from long-term data for the score and, thereafter, for the context vector calculation.

V. THEORETICAL ANALYSIS
To validate the proposed model, we generated a theoretical dataset of Antenna Beams 1 to 4 with a total length of 1500 samples under the probability conditions in Table 2, which shows how the dataset was generated to validate this model. For example, if Beam 1 is the present beam, the probabilities of the next sample being Beam 1 to Beam 4 are 0.1, 0.2, 0.3 and 0.4, respectively.
The generated dataset is uniformly distributed, i.e., if a random beam is chosen as the prediction, there is a 0.25 probability that it is correct, an accuracy of 25%. If the transitional matrix is known and still applies to future data, the maximum likelihood estimate can be adopted to achieve the best estimate. For the generated dataset, the theoretical maximum likelihood accuracy is 0.4, i.e., 40%. This estimate assumes that the previous beam (Beam_i) is correct, i.e., the actual data Beam_i is provided for every Beam_{i+1} prediction. In the prediction method, however, we always feed the predicted value back to predict the next beam, i.e., Beam_i is predicted and the predicted Beam_i is used as the input to predict Beam_{i+1}.
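The guided and unguided maximum likelihood predictors can be simulated with a Markov chain. Only the Beam 1 row comes from Table 2; the other three rows are hypothetical, chosen so that each row's largest probability is 0.4, so that Beam 3 leads to Beam 2 and Beam 2 to Beam 1 as in the worked example that follows, and so that the beams are uniformly distributed in the long run. Beams are relabelled 0..3, and the function names are illustrative.

```python
import numpy as np

# Row 0 (Beam 1) is Table 2; rows 1-3 are HYPOTHETICAL (doubly stochastic, max 0.4).
TM = np.array([[0.1, 0.2, 0.3, 0.4],
               [0.4, 0.3, 0.2, 0.1],
               [0.3, 0.4, 0.1, 0.2],
               [0.2, 0.1, 0.4, 0.3]])


def generate_beams(tm, length, start=0, seed=0):
    """Sample a beam sequence (states 0..3) from the Markov chain tm."""
    rng = np.random.default_rng(seed)
    seq = [start]
    for _ in range(length - 1):
        seq.append(int(rng.choice(len(tm), p=tm[seq[-1]])))
    return np.array(seq)


def ml_predict(tm, data, last_known, guided):
    """Predict each beam as argmax_j tm[previous, j].

    guided=True : 'previous' is the true beam (needs the real data).
    guided=False: 'previous' is the model's own last prediction.
    """
    preds, prev = [], last_known
    for true_beam in data:
        pred = int(np.argmax(tm[prev]))
        preds.append(pred)
        prev = true_beam if guided else pred
    return np.array(preds)


beams = generate_beams(TM, 1500)
history, test = beams[:1300], beams[1300:]
for guided in (True, False):
    acc = np.mean(ml_predict(TM, test, history[-1], guided) == test)
    print("guided" if guided else "unguided", f"{acc:.1%}")
```

Over long runs of this synthetic chain, the guided predictor approaches the 40% maximum likelihood accuracy, while the unguided one drifts towards the 25% of random selection, because its own predictions soon stop tracking the true sequence.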
A simulation is performed to evaluate maximum likelihood prediction when the input Beam_i is itself a predicted value that is treated as known when predicting Beam_{i+1}, i.e., the unguided methodology. The total dataset has 1500 samples, and the last 200 are used as test data. The last known value, i.e., sample 1300, is Beam 3, which is used as Beam_i to predict Beam_{i+1}. According to Table 2, Beam_{i+1} would be Beam 2, with probability 0.4. For the next prediction, Beam 2 is used as the input, and Beam 1 is predicted, again with probability 0.4. The simulated accuracy is 26.5%. Figure 8 shows an example of the difference between the guided and unguided methodologies based on the Table 2 probabilities: in the guided methodology, the probability of the next beam is always based on the true data (Example Data), whereas in the unguided methodology it is based on the previous estimate.
On the generated dataset, we also analyze the most "naïve" forecast, the persistence algorithm, evaluated with walk-forward validation. The persistence algorithm uses the value at the current time step t as the forecast for the next time step t+1. We also analyze our proposed Attention with Transition model and compare it with the Encoder Decoder with Attention model, using both the Dot product and Luong's implementations. During prediction of the test data, the input provided to the attention sub-model is the actual predicted value, i.e., the unguided methodology. The percentage accuracy is calculated to show the improvement in results. Note that the 40% accuracy of maximum likelihood is a theoretical figure: other factors affect this method, and it requires the true input to be known.
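With walk-forward evaluation, the persistence baseline reduces to predicting each test value as the true value that immediately precedes it. A minimal sketch, with a hypothetical function name and toy labels:

```python
import numpy as np


def persistence_walk_forward(history, test):
    """Forecast test[k] as the last observed value, then reveal test[k]."""
    history = list(history)
    preds = []
    for actual in test:
        preds.append(history[-1])   # persistence: value at t forecasts t+1
        history.append(actual)      # walk forward: the true value becomes known
    return np.array(preds)


test = np.array([2, 2, 3, 1])
preds = persistence_walk_forward([4, 1], test)
print(preds, np.mean(preds == test))   # [1 2 2 3] 0.25
```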
Comparing the actual prediction models, the analysis shows a significant improvement in prediction accuracy, close to 12% (28.35 / 23.65), over the Encoder Decoder with Attention method.
Along with percentage accuracy, we also evaluate the Mean Squared Error (MSE) (11), Mean Absolute Error (MAE) (12) and Mean Absolute Percentage Error (MAPE) (13) to assess the performance of the proposed model. MSE captures the squared difference between the original and predicted values, MAE captures the absolute error of the prediction, and MAPE captures the percentage error. Table 3 shows that MSE, MAE and MAPE are lowest for our proposed method. The improved performance arises because the weights are determined by the transitional state matrix: in the attention part, the transitional state values indicate where the decoder should focus. In the traditional encoder decoder with attention, training alone determines which encoder part the decoder should focus on, and the decoder decodes the data based on the resulting attention values. In our method, the transitional state provides input to the attention, telling the decoder which encoder state to focus on so that the predicted value is close to the actual value. Providing the attention weights in this way gives much better prediction results than the traditional method.
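The metrics can be computed as below, assuming the standard definitions of MSE, MAE and MAPE applied directly to the numeric beam labels 1..4. MAPE is undefined if any true value is 0, which is one reason to keep the labels 1-based. The toy labels are illustrative.

```python
import numpy as np


def mse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))


def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))


def mape(y, y_hat):
    """Percentage error; undefined when any true value is 0."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)


def accuracy(y, y_hat):
    return float(np.mean(np.asarray(y) == np.asarray(y_hat)) * 100)


y_true, y_pred = [1, 2, 3, 4], [1, 2, 4, 2]
print(mse(y_true, y_pred), mae(y_true, y_pred),
      round(mape(y_true, y_pred), 2), accuracy(y_true, y_pred))
# 1.25 0.75 20.83 50.0
```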
The main motivation for attention is that, at different steps, the decoder needs to focus on different sources that are relevant at that step. The attention score is the "relevance" of an encoder state to the decoder state. The attention scores are converted into attention weights, and the attention output is the sum of the encoder states weighted by these attention weights, so variability in the attention score carries through to the attention output. Lower variability gives a clearer indication of which encoder state to focus on. A closer look at the attention scores in Figure 9 shows that the variability of the attention score is much smaller in our Attention with Transition method than in Luong's method. For Luong's method, the variability of the attention score is 237.4, with a lowest value of −217.01 and a highest value of −20.42, whereas for Attention with Transition the variability is 30.9, with a lowest value of −20.01 and a highest value of 10.89. The variability arises because the weights are initialized randomly in Luong's method, whereas in our Attention with Transition method they are determined from the known transition data, which gives better relevance of the encoder to the decoder state. The attention score therefore lets the decoder focus on the right source, leading to better predictability. The analysis is also performed on the actual measured data described in Section III.

VI. EXPERIMENTAL ANALYSIS

A. QUALITATIVE ANALYSIS
The experiment is performed on the collected data samples described in Section III. The algorithm is compared with the Encoder Decoder with Attention model, using both Luong's and the Dot product methods, to show the improvement of our system over these implementations. The analysis is performed in the same way as the theoretical analysis and shows consistent behavior, i.e., improved results for the Attention with Transition model on both theoretical and measured data. Figure 10 shows the prediction results of the various models: our proposed method predicts the beam significantly better than the traditional Dot product method and Luong's method, with an improvement of 11.5% over the Dot product and 10.5% over Luong's method. The loss curve in Figure 11 indicates that training with our proposed model is better and stabilizes more quickly. The attention vector holds the score of each value in the source sequence, which tells the decoder what to focus on at each time step. Large variability in the attention score lowers the decoder's confidence and results in focusing the prediction on the wrong encoder state. In our test data analysis, the variability of the attention score is considerably lower than for Luong's method, as shown in Figure 12: it is 128.5 for Luong's prediction but 35.8 for the Attention with Transition prediction, which gives better confidence that the prediction focuses on the right encoder state.
The accuracy plots in Figure 13 show accuracies of 35.5%, 40.3% and 42.1% for the Dot product, Luong's method and Attention with Transition, respectively. These values are from the training phase over 50 epochs, where the losses have reached their lowest levels and the accuracies are at their peaks. Along with percentage accuracy, we also evaluate the MSE, MAE and MAPE metrics; Table 4 shows that MSE, MAE and MAPE are lowest for our proposed method. The prediction results show that Attention with Transition achieves better prediction accuracy than the other traditional prediction methods.
If a dataset is uniformly distributed over the four beams, random selection yields 25% accuracy, i.e., a randomly chosen beam is correct with probability 25%. In our dataset, however, the beams are not uniformly distributed, so this does not hold. As shown in Figure 14, Beam 1 makes up 17% of the dataset, Beam 2 37%, Beam 3 21% and Beam 4 25%. Hence, a random selection of Beam 1 is correct with probability 17%, a random selection of Beam 2 with probability 37%, and likewise 21% for Beam 3 and 25% for Beam 4. Our prediction method should outperform these accuracies; otherwise random selection would be a better method than machine learning prediction. In our predicted data, 80% of Beam 1 predictions are correct, compared with only 17% for random selection. Similarly, 55% of Beam 2 predictions are correct versus 37% for random selection, 29% of Beam 3 predictions versus 21%, and 68% of Beam 4 predictions versus 25%. Table 5 compares the prediction probabilities of random selection and our prediction method, showing that our method outperforms random selection for every individual beam. During test data prediction, if the actual data, instead of the predicted values, are fed to the next decoder loop, i.e., the guided methodology, the accuracy improves to 46.9%.
However, this guided prediction is only possible if the output values are known during the testing stage.
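The per-beam comparison with random selection can be computed as follows: for each beam, the share of its predictions that are correct (its precision) is compared with its share of the data, which is the hit rate of a random guess of that beam. The function name and toy labels are illustrative.

```python
import numpy as np


def per_beam_vs_random(y_true, y_pred, beams=(1, 2, 3, 4)):
    """Return {beam: (precision of predictions of this beam, share of beam in data)}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    result = {}
    for b in beams:
        predicted_b = y_pred == b
        if predicted_b.any():
            precision = float(np.mean(y_true[predicted_b] == b))
        else:
            precision = float("nan")            # beam never predicted
        prior = float(np.mean(y_true == b))     # hit rate of always guessing b
        result[b] = (precision, prior)
    return result


print(per_beam_vs_random([1, 1, 2, 2, 4], [1, 2, 2, 2, 4]))
```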

B. QUANTITATIVE ANALYSIS
To validate the model across various datasets, we also collected data from different drive zones around the campus, as shown in Figure 15, and analyzed each dataset with the different encoder decoder models. Table 6 compares performance across the zones, showing that the Attention with Transition (proposed) model performs better than the traditional Encoder Decoder models. Table 7 lists the performance improvement of Attention with Transition over the Dot product and over Luong's method, calculated from the accuracy percentages as in (14). The improvement varies with the dataset; across our datasets it ranges from 10% to 12%.

$$\text{Performance Improvement} = \frac{\text{Accuracy of Attention with Transition}}{\text{Accuracy of Dot product or Luong's Method}} \qquad (14)$$

The prediction accuracy depends on the dataset and decreases as the entropy of the dataset increases. Entropy quantifies the randomness of a dataset, and our model's prediction results follow the entropy of the dataset. The entropy is calculated as shown in (15).
The entropy is calculated for the theoretical data and for all three measured zones, and the corresponding accuracies are plotted in Figure 16. As the entropy increases, the prediction accuracy decreases, which is consistent with Shannon's information theory.
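Equation (14) and an entropy measure can be sketched as below. We assume (15) is the first-order Shannon entropy of the beam labels in bits; the exact form used in the paper may differ. With the Figure 14 proportions (17%, 37%, 21%, 25%), this gives about 1.94 bits, slightly below the 2-bit maximum of a uniformly distributed four-beam dataset. The function names are illustrative.

```python
import numpy as np


def shannon_entropy(labels, base=2.0):
    """First-order entropy of a label sequence (bits by default)."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def performance_improvement(acc_transition, acc_baseline):
    """Eq. (14): ratio of accuracies; (ratio - 1) * 100 is the percentage gain."""
    return acc_transition / acc_baseline


measured_like = np.repeat([1, 2, 3, 4], [17, 37, 21, 25])   # Figure 14 proportions
print(round(shannon_entropy(measured_like), 3))             # 1.938
print(round(shannon_entropy([1, 2, 3, 4]), 3))              # 2.0 (uniform)
```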

C. REPOSITORY ANALYSIS
The model was validated on the dataset measured around the campus. To validate it on a third-party dataset, we use the Occupancy Detection Data Set [37] from the UCI Machine Learning Repository. The dataset records the occupancy status of a room: 1 if the room is occupied and 0 if it is not. We trained on the dataset using the Dot product, Luong's method and the Attention with Transition method and compared their prediction accuracy. The prediction results in Table 8 show that our method gives a slight improvement over the other methods. The dataset has little variability, i.e., its values are either 0 or 1 and most of them are 0, and hence the improvement in accuracy over the other prediction methods is small, as shown in Figure 17.

VII. CONCLUSION
In this paper, an Encoder Decoder with a modified hard attention is presented, which achieves better performance than conventional models including the Encoder Decoder with Attention (Dot product and Luong's method). The effectiveness of the model is verified using actual test data collected on the university campus with an antenna array designed for this application. We hope that the results of this paper will encourage future work on modified hard attention. We also expect the modularity of the encoder-decoder approach combined with modified attention to find useful applications in other domains. Future work will focus on multivariate attributes to improve the accuracy of the prediction system.