An astute LVQ approach using neural network for the prediction of conditional branches in pipeline processor

Modern microprocessors use deep pipelines to execute multiple instructions per cycle. The frequency and behavior of conditional instructions strongly affect the performance of instruction-level parallelism, yet recent processors still struggle to predict conditional branches correctly. Firstly, the perceptron neural network and the global-based perceptron prediction are exploited and implemented. Further, a new approach, the linear vector quantization (LVQ) neural network, is explored and implemented to assess its potential as a branch predictor in terms of accuracy rate. Simulation is performed by varying the hardware budget and the length of the history register using different trace files to identify the best branch prediction technique. Comparing the simulation results, the proposed LVQ branch predictor achieves an 85.56% accuracy rate when varying the hardware budget and an 86.36% accuracy rate when varying the history length.


Introduction
In microprocessor design, the pipeline is a prime high-performance technology, as it enables high clock rates and instruction-level parallelism (ILP). Recent generations of processors such as the Pentium have moved towards deeper pipelines to allow increased clock speeds. As the pipeline becomes deeper, control hazards due to conditional branches disrupt the instruction execution flow: the pipeline would have to wait for the branch outcome before fetching the next instructions. Precise branch prediction with high accuracy is required to resolve this problem. The conditional branch must be identified in the pipeline during the real-time prediction process; during the fetch step, both the branch outcome and the target address must be predicted when a branch is found. Consequently, correct prediction of conditional instructions leads to better processor performance.
The branch prediction methods fall into two categories.
• Static Branch Prediction This is the simplest of all methods, as the branch action is predetermined for the entire execution: the prediction is fixed at compile time.
• Dynamic Branch Prediction Here, the processor uses hardware to store information about recently executed branches and their outcomes. Most dynamic branch prediction techniques are based on pattern history tables (PHT) of saturating counters. Saturating-counter prediction is limited by the branch history register and the local information of recently executed branches.
Many algorithms, such as bimodal prediction, index-sharing prediction, global- and local-based prediction, and hybrid prediction, have been extensively implemented to predict conditional branches. These algorithms are straightforward but achieve only a moderate range of prediction accuracy.
Recently, it has become possible to use machine learning techniques to improve processor performance by replacing the saturating counters with perceptrons, since neural networks are known to provide better prediction accuracy using artificial neurons. The use of artificial neurons in a perceptron is attractive because the training process is fast.
The rest of the paper is organized as follows: Section 2 gives a literature survey of the different techniques used to predict branches. The machine learning-based branch predictors are presented in section 3. Section 4 describes the simulation framework for the branch predictors. The simulation results of each branch predictor are presented in section 5. Finally, section 6 concludes the paper with future scope.

Contributions
In this paper, the concept of machine learning is explored and tested to predict conditional branches.
• Firstly, the perceptron based neural network and the global-based perceptron prediction are exploited and implemented.
• Secondly, a novel approach, the linear vector quantization (LVQ) neural network, is proposed and implemented to assess its potential as a branch predictor in terms of accuracy rate.
• Simulation is performed by varying the hardware budget and the length of the history register using different trace files to identify the best branch prediction technique.

Related Work
Many researchers have tried to compare various branch predictors to highlight the efficiency of their approaches. So, this section presents the state-of-the-art of different branch prediction strategies for the prediction of branches.
Calder [1] provided the prediction of conditional branches using static, compile-time branch prediction, based on preliminary information about the program that the compiler can readily determine. The significant drawback of Calder's prediction is its inability to adapt to the dynamic behavior of branches. However, his static compiler optimization scheme provides extra information to dynamic branch predictors.
Franklin et al. [2] used the local and global history for the identification of branches. This concept tracks the run time behavior of an instruction in the front-end of the pipeline. This mechanism has been established for each dynamic branch predictor that regulates the computation that affects the expected branch outcome.
Vinten and Florea [3] implemented conditional branch prediction using a back-propagation algorithm in a multilayer perceptron. Based on the same prediction information, this approach also predicts the target address for indirect jumps. The authors suggest this approach is more efficient for branch prediction, but it increases hardware cost and complexity.
Tarjan and Skadron [4] introduced the hashed perceptron concept, combining the gshare and perceptron branch predictors. The proposed predictor reduces aliasing, has a low hardware budget, and increases the accuracy of correlating predictors.
Peram and Sudhakar [5] presented a piecewise neural branch predictor for improving the perceptron branch predictor's accuracy. In this predictor scheme, a hyperplane is utilized to choose the conditional branch prediction. The main feature of this predictor scheme is to remove the complexity and give rise to more accurate results for improving the processors' performance.
Smith [6] proposed a feed-forward network based on machine learning. A combined predictor using a saturating counter is also analyzed to compare accuracy and misprediction rate. This scheme enhances accuracy, but a trade-off arises in choosing the number of hidden units.
Mao et al. [7] proposed a deep learning-based algorithm for branch prediction. The author considers branch prediction in this paper as a classification problem and contrasts deep learning efficiency with current branch predictors.
Su et al. [8] proposed a correlation-based hybrid branch prediction for the conditional branch. This approach combines the concept of static as well as dynamic branch prediction. To significantly boost the branch prediction accuracy, the dynamic branch predictor uses the branch correlation data. At the same time, the static profile based correlation is used to identify the branches.
Shah and Prabhu [9] implemented a hybrid branch predictor with higher predictive capability than global branch predictors; basing the prediction on neighboring and local branches enhances accuracy. Jimenez [10] described two versions of perceptron predictors using long history lengths. To explore their feasibility, a circuit-level design of the perceptron predictor was produced; it shows that a complex perceptron predictor can be used in modern CPUs while being simpler, quicker, and more feasible than a hybrid predictor.
In the literature, branch predictors lack performance because of smaller history lengths, limited hardware budgets, and counter-based schemes, which slow the fetching and execution of instructions. The proposed method is intelligent, contributing beneficial and accurate results through mathematical calculation and a training module. The surveyed predictors can be summarized as follows:
• [11], [12]: Improves performance by a small increment, but does not use stored history tables of instructions.
• Two-level predictor [13], [14]: Uses two separate levels of branch history tables; however, a trade-off between the sizes of the two tables occurs.
• Index sharing predictor [15], [16]: The history table is large compared to the two-level predictor's, but hashing the branch history register together with the PC leads to better accuracy.
• The agree predictor [17], [18]: Reduces destructive aliasing interference by reinterpreting the pattern history table counters.
• Hybrid branch predictor [31], [21]: Combines two or more predictors to make one final prediction, but sometimes partially misjudges the hybrid path at prediction time.
• The piecewise linear neural branch predictor: Provides much greater precision but dramatically increases the overhead of checkpointing and recovery and the number of adders.

Machine Learning Branch Predictors
Recently, machine learning has become a new research focus and can significantly improve processor performance. This section describes the machine learning-based branch predictors, together with the algorithms by which they predict conditional branch instructions.

Perceptron based branch predictor
A perceptron is one of many processing elements within artificial neural networks. It is a learning device that takes input values and combines them with weights to produce an output. As shown in figure 1, the input x0 is always set to 1 to represent the bias input, which allows the perceptron to learn its activation threshold.
The output y is expressed mathematically as:

y = w0 + Σ (i = 1 to n) xi wi

The output of the perceptron is thus the dot product of the input data and the weights, plus the bias weight w0. If y ≥ 0, the branch is predicted taken; otherwise, the branch is predicted not-taken.

Figure 1. A simple perceptron structure
The basic block diagram in figure 2 represents the role of the perceptron in predicting branch instructions. For the prediction of conditional branch instructions, a perceptron predictor uses a table in which n perceptrons are stored. A training algorithm updates the module when the predicted outcome does not equal the actual result. Prediction with the perceptron branch predictor takes the following steps:
(i) The branch address is hashed to index the table of perceptrons.
(ii) Select the perceptron and compute its output y.
(iii) Predict taken if y ≥ 0; otherwise, predict not-taken.
(iv) Once the branch outcome is known, train the selected perceptron using the train function.
(v) Write the trained perceptron back to the table.

Algorithm: Perceptron Based Branch Predictor
Step 1: Use the program counter to select the input branch;
Step 2: Get the weight vector of the input branch;
Step 3: Compute the output using the weight vector and the input branch;
Step 4: Make a prediction based on the output;
Step 4.1: If (prediction = incorrect or below threshold), then adjust the weight vector using the train function;
Step 5: If (prediction = correct or above threshold), then { increment the weight and return 1 if taken, else decrement the weight and return 0 if not-taken }
When the actual outcome of the branch is known, a training algorithm is used to update the predictor. The training algorithm uses a threshold parameter to control the magnitude of the weight values. The optimal threshold value is θ = ⌊1.93h + 14⌋, where h represents the length of the history register. The following algorithm is used to train the perceptron.

Algorithm: Train Perceptron function
With this algorithm, the perceptron trains its weight table, achieving higher accuracy while predicting branches. One limitation of the perceptron is that it is only capable of learning linearly separable functions. The global perceptron branch predictor mitigates this limitation, allowing some linearly inseparable branches to be predicted.
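The prediction and training steps above can be sketched as a small simulation. This is a minimal illustration, not the paper's implementation: the table size, history length, and table hash are illustrative assumptions; the threshold follows θ = ⌊1.93h + 14⌋.

```python
class PerceptronPredictor:
    def __init__(self, num_perceptrons=128, history_len=16):
        self.theta = int(1.93 * history_len + 14)       # training threshold
        # One weight vector per table entry; index 0 is the bias weight w0.
        self.table = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        self.history = [1] * history_len                # bipolar: +1 taken, -1 not-taken

    def _index(self, pc):
        # Hash the branch address into the perceptron table (assumed modulo hash).
        return pc % len(self.table)

    def predict(self, pc):
        w = self.table[self._index(pc)]
        # y = w0 + sum(x_i * w_i): dot product of history and weights.
        y = w[0] + sum(x * wi for x, wi in zip(self.history, w[1:]))
        return y, y >= 0                                # taken if y >= 0

    def update(self, pc, y, taken):
        w = self.table[self._index(pc)]
        t = 1 if taken else -1
        # Train on a misprediction or while |y| has not exceeded the threshold.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w[0] += t
            for i, x in enumerate(self.history):
                w[i + 1] += t * x
        # Shift the actual outcome into the history register.
        self.history = [t] + self.history[:-1]
```

For each (branch address, outcome) pair from a trace, one would call `predict` and then `update` with the actual outcome.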

Global perceptron branch predictor
The global branch predictor is one of the best prediction schemes among the correlating branch predictors. Its prediction is based on a history table of recently executed branches. To index the table, the least significant bits of the currently executing branch address are XORed with the history of recently completed branch instructions. In this predictor, the perceptron table is indexed by the XOR of the branch address and the speculative global history register. The branch address holds the address of the currently executing conditional instruction, whereas the global history register holds the prior outcomes of instructions. The perceptron is trained according to its cumulative prediction. The great advantage of this predictor is that each weight is fetched with a different mapping, giving a more accurate prediction.

Algorithm: Global Perceptron Branch Predictor
Step 1: Use the program counter to select the input branch;
Step 2: Get the weight vector of the input branch;
Step 3: Fetch the entry in the perceptron table using the XOR function;
Step 4: Make a prediction based on the output;
Step 4.1: If (prediction = incorrect or below threshold), then adjust the weight vector using the train function;
Step 5: If (prediction = correct or above threshold), then { increment the weight and return 1 if taken, else decrement the weight and return 0 if not-taken }
Table 2 depicts an example of the XOR indexing used in the global perceptron predictor. With this indexing, some linearly inseparable branches can be predicted, overcoming the perceptron branch predictor's limitation. Table utilization is significantly improved, which lets the global predictor achieve higher prediction accuracy with the same hardware storage as the perceptron branch predictor.
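The XOR indexing in Step 3 can be sketched as follows. The table size and the packing of the history register into an integer are illustrative assumptions, not the paper's exact parameters.

```python
TABLE_SIZE = 1024  # assumed number of perceptron entries (power of two)

def history_to_bits(history):
    """Pack a list of outcomes (+1 taken / -1 not-taken) into an integer."""
    bits = 0
    for outcome in history:
        bits = (bits << 1) | (1 if outcome > 0 else 0)
    return bits

def perceptron_index(pc, history):
    """XOR the branch address with the global history to index the table."""
    return (pc ^ history_to_bits(history)) % TABLE_SIZE
```

Two branches with the same address but different global histories map to different entries, which is why the table is used more effectively than with address-only indexing.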

Proposed Linear Vector Quantization Neural Predictor
The proposed linear vector quantization (LVQ) predictor is based on a supervised competitive artificial neural network; the technique belongs to the neural network class of learning algorithms. LVQ consists of codebook vectors of different classes, which refine the statistical analysis of a complex problem. For conditional branch instructions, the first codebook vector vt represents the taken outcomes, and the second codebook vector vnt represents the not-taken outcomes. When a prediction is correct, the corresponding vector value is increased; otherwise it is decreased. The output is based on the Hamming distance between the input vector and the codebook vectors associated with that input. The distance is calculated as:

d(x, v) = Σi |xi − vi|

Here, xi is the input vector, vi is the codebook vector, and the prediction outcome y corresponds to the class (vt or vnt) whose codebook vector lies closest to the input.

To train the codebook vectors, the winning vector v is adjusted as follows:
• If the target value is equal to the prediction, update the codebook vector by v ← v + α(x − v).
• If the target value is not equal to the prediction, update the codebook vector by v ← v − α(x − v).
Here, α is the learning rate. The LVQ neural model continuously trains its weights and provides faster processing than other neural models. Its main advantage is reducing massive data sets to a smaller number of codebook vectors for easy classification.
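The LVQ scheme above can be sketched as a small class. This is an illustrative sketch under assumptions: bipolar history inputs (+1 taken, −1 not-taken), the standard LVQ1 update rule, and an arbitrary learning rate α; the paper's actual parameters may differ.

```python
class LVQPredictor:
    def __init__(self, history_len=16, alpha=0.1):
        self.alpha = alpha
        self.vt = [1.0] * history_len     # codebook vector for "taken"
        self.vnt = [-1.0] * history_len   # codebook vector for "not-taken"
        self.history = [1] * history_len  # bipolar global history

    @staticmethod
    def _distance(x, v):
        # Distance between the input vector and a codebook vector.
        return sum(abs(xi - vi) for xi, vi in zip(x, v))

    def predict(self):
        # Predict the class whose codebook vector is closest to the input.
        dt = self._distance(self.history, self.vt)
        dnt = self._distance(self.history, self.vnt)
        return dt <= dnt                   # True -> taken

    def update(self, taken):
        pred = self.predict()
        winner = self.vt if pred else self.vnt
        correct = pred == taken
        # LVQ1 rule: move the winning vector toward the input when the
        # prediction is correct, away from it otherwise.
        for i, xi in enumerate(self.history):
            step = self.alpha * (xi - winner[i])
            winner[i] += step if correct else -step
        t = 1 if taken else -1
        self.history = [t] + self.history[:-1]
```

Only two codebook vectors are kept regardless of trace size, which reflects the data-reduction advantage noted above.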

Simulation Framework
In this section, we describe the details of the simulator, trace-files, and parameters used to predict branches.

Simulator and trace file
Each branch predictor scheme is implemented in Python with NumPy, using the PyCharm IDE, for computing and visualizing the input trace files. A trace file is a text file with space-separated branch addresses and their actual outcomes. The trace files are trace1k, trace2k, trace5k, trace10k, trace20k, and trace40k, each containing a different number of instructions.
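A loader for the trace format described above might look like the following sketch. The outcome encoding (1 = taken, 0 = not-taken) and address format are assumptions for illustration.

```python
def load_trace(path):
    """Return a list of (address, taken) pairs from a space-separated trace file."""
    branches = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue                      # skip blank or malformed lines
            addr, outcome = parts[0], parts[1]
            # base 0 lets int() accept both decimal and 0x-prefixed hex addresses
            branches.append((int(addr, 0), outcome == "1"))
    return branches
```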

Influences of parameters
To evaluate each predictor scheme's performance, a range of hardware budgets and history lengths is considered. The hardware budget corresponds to the memory size of the predictor in kilobytes; its range is 4kb, 8kb, 16kb, 32kb, 64kb, and 80kb. The history length corresponds to the number of bits of stored branch information; its range is 4bit, 8bit, 16bit, 24bit, 28bit, and 32bit. Trace files of different sizes are used as input, containing the branch addresses and the outcomes of the instructions. To identify the accuracy rate, each branch predictor is analyzed by varying the hardware budget and history length parameters on each trace file.

Experimental Results
This section quantifies the performance of the branch predictors and compares their results in terms of accuracy rate. For performance assessment, a set of trace files was used: we evaluated the impact of changing the hardware budget and changing the history length with different trace files. The accuracy rate is defined as:

Accuracy Rate = (Total number of address hits / Total number of instructions executed) × 100

Here, the total number of address hits is the number of conditional branch instructions whose predicted path equals their actual path, and the total number of instructions executed is the number of conditional instructions taken as input.
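The accuracy-rate formula above amounts to a one-line helper; the function name and the guard against an empty trace are illustrative.

```python
def accuracy_rate(address_hits, instructions_executed):
    """Percentage of conditional branches whose predicted path matched the actual path."""
    if instructions_executed == 0:
        return 0.0
    return address_hits / instructions_executed * 100.0
```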

Prediction based on hardware budget in term of accuracy rate
To evaluate the performance of the branch predictors, different trace files are tested with varying hardware budgets. The hardware budget corresponds to the memory size of the predictor in kilobytes; the range is 4kb, 8kb, 16kb, 32kb, 64kb, and 80kb. Each predictor's accuracy is based on how often the predicted outcome matches the actual branch outcome. The results of each branch predictor in terms of accuracy rate are presented in tables 4 to 6. The comparison shows that the proposed LVQ branch predictor is more accurate and provides a higher accuracy rate as the hardware budget varies. Figure 4 shows how the accuracy rate changes as the hardware budget grows. Prediction accuracy is the number of branches correctly predicted over the total number of branches. As the hardware budget increases, so does the number of pattern history table entries in which information is stored.
The prediction accuracy varies between 79.63% and 81.93% for the perceptron based branch predictor, between 81.47% and 85.94% for the global perceptron branch predictor, and between 85.12% and 86.57% for the proposed LVQ branch predictor, which therefore provides a better accuracy rate than the other two schemes.

Prediction based on history length in term of accuracy rate
The impact of history length on prediction accuracy has been studied for a while. To evaluate the branch predictors' performance, different trace files are tested with varying history lengths. The range of history lengths is 4bits, 8bits, 16bits, 24bits, 28bits, and 32bits. Each predictor's accuracy is based on how often the predicted outcome matches the actual branch outcome. The results of each branch predictor in terms of accuracy rate are presented in tables 8 to 10. Figure 5 shows how the accuracy rate changes as the branch predictors' history length increases. The overall prediction accuracy improves as the history length grows.
The prediction accuracy varies between 80.83% and 82.92% for the perceptron based branch predictor, between 83.63% and 85.97% for the global perceptron branch predictor, and between 83.90% and 87.04% for the proposed LVQ branch predictor, which provides a better accuracy rate than the other two schemes. The proposed LVQ branch predictor thus improves the accuracy rate when varying both parameters (hardware budget and history length) over the perceptron based and global perceptron branch predictors.

Prediction Result in term of Confusion matrix and F-Score
To obtain the precision results, all the input trace files are profiled to determine each predictor scheme's branch decisions. A confusion matrix for each branch predictor scheme on each input trace file gives some interesting insights into its actual behaviour. The confusion matrix for each branch predictor is laid out as follows:

                                Actual Value
                           Taken (T)   Not-Taken (NT)
Prediction  Taken (T)         TT           TNT
Value       Not-Taken (NT)    NTT          NTNT

Here, the actual value means the actual outcome of the branch instruction.
The prediction value means the prediction made at run time for the branch instruction; Taken means the branch was predicted taken, and Not-Taken means the branch was predicted not-taken. Further, the F-score, the harmonic mean of precision and recall, is also calculated:

F-score = 2 × (Precision × Recall) / (Precision + Recall)

The confusion matrix and the F-score for input trace files 1k and 2k are presented in figures 10 and 11, respectively. The results change with the branch predictor and the input trace file, and similar results were obtained with the other trace files for the global perceptron branch predictor. The proposed linear vector quantization branch predictor gives better results than the perceptron based branch predictor and the global perceptron branch predictor: it improves the F-score by 0.04 and 0.10 on trace files 1k and 2k, respectively, over the perceptron based branch predictor, and by 0.03 and 0.07 on trace files 1k and 2k, respectively, over the global perceptron branch predictor. Table 12 shows the comparative analysis of the different methodologies used for conditional branch prediction. It clearly shows that:
• The accuracy rate of the proposed LVQ branch predictor is 4.32% higher than the perceptron based branch predictor and 1.98% higher than the global perceptron branch predictor when the hardware budget is varied.
• Further, the results for the varying history length also improve the accuracy rate of the proposed LVQ branch predictor: 3.28% higher than the perceptron based branch predictor and 1.24% higher than the global perceptron branch predictor.
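Deriving the F-score from the confusion-matrix cells defined above (TT, TNT, NTT, NTNT) can be sketched as follows, treating "taken" as the positive class; the function name is an illustrative choice.

```python
def f_score(tt, tnt, ntt, ntnt):
    """F-score from confusion-matrix cells: TT = predicted taken & actually taken,
    TNT = predicted taken & actually not-taken, NTT = predicted not-taken &
    actually taken, NTNT = predicted not-taken & actually not-taken."""
    precision = tt / (tt + tnt) if (tt + tnt) else 0.0
    recall = tt / (tt + ntt) if (tt + ntt) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```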

Conclusion and Future Trends
In this paper, the concept of artificial intelligence based neural networks is explored and tested to predict conditional branches. Firstly, the perceptron based neural network and the global based perceptron prediction were exploited and implemented. To add more precision, a novel LVQ neural network approach is proposed and implemented to assess its potential as a branch predictor in terms of accuracy rate. These neural branch predictors replace the saturating counters with a training function. Furthermore, the proposed LVQ approach achieves better accuracy than traditional branch predictors. Simulation is performed by varying the hardware budget and the history length using different trace files to identify the best branch predictor.
The obtained results suggest that the proposed LVQ branch predictor provides an increased accuracy rate of 85.56% when varying the hardware budget and 86.36% when varying the history length. Its accuracy rate is 4.32% higher than the perceptron based branch predictor and 1.98% higher than the global perceptron branch predictor as the hardware budget varies. Further, with varying history length, the proposed LVQ branch predictor's accuracy rate is 3.28% higher than the perceptron based branch predictor and 1.24% higher than the global perceptron branch predictor. These improvements make this predictor a promising choice for future processors.
This research suggests that neural predictors could be a useful approach for understanding and improving branch prediction. Further, other methods such as backpropagation and support vector machine algorithms could be applied for additional improvement in the accuracy rate.