Region-Aware Trace Signal Selection Using Machine Learning Technique for Silicon Validation and Debug

: In today’s modern design technology, post-silicon validation is an expensive and composite task. The major challenge involved in this method is that it has limited observability and controllability of internal signals. There will be an issue during execution how to address the useful set of signals and store it in the on-chip trace buffer. The existing approaches are restricted to particular debug set-up where all the components have equivalent prominence at all the time. Practically, the verification engineers will emphasis only on useful functional regions or components. Due to some constraints like clock gating, some of the regions can be ignored during execution. Likewise, some of these regions can be verified deeply and have minimum errors compared to other control regions. The proposed system focusses on random signals that identify more errors which are prone to signal selection technique with low area overhead. To enhance the observability, a machine learning technique is developed. Based on the training samples of smaller designs, a model is developed to find out the contiguous neighbours of each flip-flop. This can eliminate the obstacles of unknown signals. This system demonstrates using Opencores and ISCAS’89 benchmark circuits that result in easy and fast error detection compared to the state-of-the-art of other methods. This is also verified using gate-level error models by cross-validation of each debug run.


Introduction
As mentioned earlier, the main challenge of verification or validation is to observe and control the internal signal states. To improve those signal states, only a small set of signals [Ko and Nicolici (2009) ;Liu and Xu (2012)] are traced and stored it in the online trace buffer because of design overhead constraint. These stored data can be helpful for debugging by off-line debug. The problem of limited observability is more challenging so that the restoration ratio technique has evolved ; Liu and Vemuri restoration, not on error detection. The timing errors have been detected recently and these approaches are based on the same set of signals (i.e., static signals) for complete execution of the system. Without considering static signal, an alternate signal method was developed by Mitra et al. [Mitra and Park (2008)] which is very specific to temporal distribution of errors. After that, a multiplexed based signal selection method was developed. However, their approach doesn't consider the heuristic approach of both temporal and spatial distribution of errors. The dynamic method describes how to generate the input sequence and forgets to focus on trace signal selection that results in an error. Later, this dynamic method [Yigit, Zhang, Li et al. (2017)] considers only on state restoration rather than error findings. Finally, it concludes that these approaches are based on test vectors, not on active regions. Later the dynamic method focused on active region but it forgets to concentrate on dynamic errors. Hence, the proposed method focused on the random signal selection that considers a different set of active regions which helps to detect the dynamic errors easily. Instruction Footprint Recording and Analysis (IFRA) technique was proposed [Rahmani and Mishra (2017)] to detect the errors using on-chip observability mechanism with minimum cost. This mechanism contains thousands of logic gates that results difficulty in debugging at the gate-level. For bug localisation, the Machine Learning Method (MLM) have been used recently  ;Jindal, Kumar, Jindal et al. (2018)] in both the stages of verification and validation. During testing, a large amount of information can be collected using some learning techniques like regression, clustering, etc. Based on the failure states the clustering technique is applied. Hence, this method helps to extract the bugs from the obtained results. This paper focused on observability expansion of debug data through gate-level error localization technique. These data can be accessed via onchip trace buffers. The contiguous neighbor helps to make a decision on unknown signal states by increasing the restored signal states and with the traced data. This methodology gives a reasonable solution compared to other methods. The first approach helps to find the neighbors using distance calculation. The next approach helps to find out the feature sets based on the physical structure. These approaches can be extended to localize the errors in electrical nature. Finding the neighbors through the distance calculation and through the machine learning model helps to achieve accuracy. It will be a promising way for further investigation.

Signal restoration vs. error detection
Signal restoration plays a vital role in signal selection for the past few years. Let's contemplate the two-input OR gate, having the inputs of x, y and its output is z. If the output is 1, then any one of the inputs should be 1. Fig. 1 shows an example circuit that comprises 12 number of flip-flops, the signals are traced between input and output. Tracing of flip-flops between A and C in the first cycle, it aids to reinstate the D flip-flop value in the subsequent cycle since the OR gate input is detached. Likewise, I and K are connected through a NOT gate and it can be easily traced in t-1 cycles. From this, we can understand that the tracing can be done in both the forward and backward direction. It is evident that backward restoration is meaningful for error detection. For e.g., When the flip-flop D is having an error, then it will propagate through forward direction from F to G flip-flops of fan-out cone. To identify these bugs, those flip-flops should be traced in a backward direction. Tracing the fan-in cone of D is not useful to find out those errors. The errors can be detected with the help of restoration property. Concentrating only on restoration performance does not help to improve the internal signal observability and controllability.

Figure 1: An example circuit
As discussed earlier in this section, observability and controllability are the two main obstacles for error localisation. This is the most stimulating part at the granularity of FF or at the gate-level visibility improvement that delivers very few FFs. There is a large mismatch between the simulation phase and validation phase that results the debug process more difficult. To manage the disparity of signal value, a matrix completion is used based on the machine learning problem. This supports the obtained signal states with the known flip-flop values. If the contiguous neighbour of a particular flip-flop is identified then its corresponding state also can be identified easily. The state may be either 0 or 1 that aids to find all other neighbour flip-flop values. This can find all the signal values in each clock cycles of signal restoration that outcomes full internal signal visibility. The flip-flop states may not be identified in all the states. Those states are mentioned as X. Because of excessive storage requirements, it is common to assume in the debug stage that not to trace the inputs. The contiguous algorithm decides the contiguous flip-flops based on some features that are related to the particular circuit design. It may be associated with the design structure or simulated values. The netlist is used to perform the simulation of 5 clock cycles to identify its behaviour and state restoration compared to original values.  6 whereas Tab. 2 shows the proposed state restoration that considers gate value gives 3.8 as the restoration ratio. One can think using of flip-flops at the input and output side can be an area overhead. In Fig. 2 and Tab. 2, it is added for better understanding that considers G3=0.  [Jindal, Kumar, Jindal et al. (2018)]

Problem Formulation
The main aim of this paper is to progress the signal selection process in a random manner to maximize the dynamic error detection in each active region. The error detection can be done by where i is the number of signals that can be detected using Di with the trace buffer width 'w'. According to several industrial studies, errors are not equally disseminated in the circuit, whereas they are grouped in several regions. Assume that these regions in the post-silicon validation are similar to the pre-silicon stage. The circuit is divided into several regions (called functional regions) either based on the highest node value or by its components and each region has one or more error zones (Ez). Assume that pre-silicon engineer gives the information. For example, in a multi-core System-on-Chip (SoC) each core can customize the area. If one region has more error zones, then it can be divided into several regions provided some functional boundary and these zones are present inside the circuit. If the fetch and the decode units are in one region then remaining things are in another region. There is a tradeoff between error-prone zones and number of regions. If the circuit is divided like one error zone for a single region then there will be too many regions and thus creates a computational complexity with area overhead. Conversely, a larger region having many error zones can reduce the complexity of random signal selection. Fig. 3 shows the graphical representation of Fig. 2 and it is divided into q regions and each is having one or more error zones and here we consider single error per single region. Contemplate 'n' signal states can be stored in the trace buffer for the width 'w'. Some of the functional regions are being active for any specific cycle due to clockgating or due to some other cause. If the signal transition does not happen in some regions, then that region is considered as an inactive region. These are the two extremes. If all the regions are active then the useful signals are traced with the help of tracing algorithm and it gives proportional importance to all the regions whereas if one region is active then only that region will be focused to trace the signals. This methodology follows that select the best of 's' signals from q regions from the total set of s×q signals. It should be noted that the error detection in Ezi from any of 'q' regions. These signal selection techniques were implemented in efficient hardware for without area overhead. Fig. 3 shows the region separation of Fig. 2 based on highest node value. After separating the regions, we can get the restoration ratio in 3 clock cycles because of parallel computation. Each region may depend on the other region or an independent region like Fig. 4. Each region may have the same or different error zones. These are interconnected to any other regions.

Figure 4: Illustrative of regions with error zones
The contiguous algorithm determines the metric between the points (node). One such metric is Euclidean Distance (ED) that measures the dissimilarity between the two points in Jindal et al. [Jindal, Kumar, Jindal et al. (2018)]. The distance between the two points a and b having the line segment of . The coordinates of a and b are (a1,a2,….an) and (b1,b2….bn) respectively. The distance between these two points are in Pythagorean format is given by, Since ED is a straight line distance calculating method, it is not suitable for finding the length between the two points of the circuit. Because the circuit on the chip may not be connected in a straight line. For finding the adjacent neighbours, the existing method used K-D tree, Ball tree, etc. But these methods perform poorly for the high dimensional matrix (>60) than a brute force linear method. So, the proposed system uses locality sensitive hashing that effectively used for the high dimensional matrix with the Manhattan Distance (MD) for accurate length/distance calculation. This metric gives accurate measures which are having less complexity. For this, we need to pre-process the points in a graph, and the new point is assigned to k-nearest neighbours. To find the contiguous neighbour, start from a random point and move to the neighbour nodes closest to your target. When its distance cannot be improved then that node is your nearest target. Manhattan that finds absolute differences between the coordinates. It is also recognized as taxicab, city block distance and rectilinear distance. It defines the sum of the length of the projections between the coordinates of the line segment. The calculation is given by, Figure 5: Difference between Manhattan distance & Euclidean distance The Manhattan distance between the gates and the flip-flops are identified using a contiguous neighbour algorithm. Fig. 5 shows the basic difference between the two metrics. The distance between the flip-flops and the gates has identified in the forward direction for G3 and it is shown in Fig. 6 using the Manhattan metric. Based on this calculation the neighbours are chosen. This distance helps to determine the unknown values of X. For e.g., if 3-input AND gate is having the inputs of 1,X,X respectively the output cannot be determined. To trace the unknown value of this gate can be identified using the algorithm. If the second input distance is longer than the third input, then the third input value can be identified and taken first. If the value of third input is 0, then the output can be traced as 0. For cases like this, the proposed technique will be useful. If the third input is 1, then the second input should also be identified. In most of the cases, the minimum distance will find the value of unknown, so that more number of signals can be traced within the limited bandwidth. Fig. 6 has 4 neighbours and the signal states are depicted by alpha. But the derived states vary with the neighbours and it is depicted by beta. Tab. 3 shows the expanded internal signal state values that illustrations number of disparate states values are fewer for less neighbours.
The signal states of the neighbours of gates and flip-flops will be updated dynamically. However, the distance will justify the choice of the neighbours. It is difficult to vitrine all the merits using this simple example here since most of the signal states are restored through state restoration. We found around 60% of internal signals are unknown in larger circuits. This indicates the necessity of improving the observability even after state restoration.

Proposed methodology
The first step is to build a graphical representation of a circuit (Fig. 2) to find the error propagation probability from each node to the other node in the same region or from a different region. The profitable set of signals are selected from each q region using Algorithm 1. The edge between the nodes is regardless of the type of node. G3 is the node that probable sources of error propagation in the forward direction. Any error that propagates through this node in region 1 which are regardless of fan-out cones. Consequently, error from gate 3 propagates through all the gates and the flip-flops towards the output. The probability-based node value calculation can be done based on dependent and independent paths of each node. However, based on the number of input and output connections the highest node value has found and it also gives the same restoration value like Pedregosa et al. [Pedregosa, Weiss and Brucher (2011)]. The regions are separated with the help of highest node value. The only difference is that the computation time is less when we select the highest node value selection than the probability approach. Hence, the proposed method achieves less time compared to the existing method in this step. The signal selection can be done using the following algorithm. Algorithm 1: Region aware signal selection In: Circuit, TBW, E z; Out: list of s selected signals (s1…si) 1. Generate a graphical illustration of a circuit, then split it into q regions having an error zone Ez 2. Initialize Si=null 3. Calculate the probability of an error for each node s that propagates to Qi 4. While the region Qi is empty and signal Si is not having i number of signals do 5. Select the highest node value j and add this to the list Si=Si U j 6. Remove the node j from the list after computation to prevent overlap from Qi 7. for each node in Qi, calculate the error propagation probability of Ezi and then sum it 8. end 9. return the list with selected signals

Error propagation computation
It defines the probability of an error that propagated from input to output of a circuit/ gate. For multiple gates, it is essential to describe the dependent and independent paths. This paths are well-defined by the sequence of logic gates from the source to destination. These probability of an error occurs in the single gate of an AND and an OR gates are 1 1 = 2≤ ≤ 1 and 1 0 = 2≤ ≤ 0 respectively. In detail, for any error (0/1 or 1/0) that propagates towards an output, then all the remaining inputs of AND gate must be 1. For example: if the output of AND gate is 1, then all the inputs of AND gate must be 1, if the output is 0 then any one of the input must be 0 and it is irrespective of i1. So the error may be undetected and we can assume that all the remaining inputs of AND gate are independent. The probability of all the remaining inputs is represented in the above equation that the probability of an error i1 will propagate to the output o1. Similarly, the calculation of NAND gate can be done. In place of an OR gate, the tactics is same but the state may be held at 0 and the probability of getting an error i1 with respect to 0 is mentioned in the above equation. Similar computations for NOR gate. These computations are only for an individual gate. For multiple gates, we need to consider both dependent and independent paths.  In Fig. 7, from a source to destination, the internal signal probability is 50% since the state is 0 or 1. This is calculated for Fig. 1 with region Q1 for better understanding. For example, there is a single path from A to F and the probability of getting an error through D is 0.25 since there is an OR gate between A and D, AND gate between D and F. The probability of getting an error between these paths are 0.5 and 0.5 respectively. By multiplying these paths, the overall probability of getting an error is 0.25. So, the overall error propagation

Multiple gates by independent path
Like the above-said statement, there are two paths between A and G. From those, one such path is through A,E,G and A,D,F,G. The computation is similar to the previous one. The overall probability of getting an error in the path A,E,G is 0.25 and for A,D,F,G is 0.125. The maximum probability is always dominated the detection of an error. So it can be calculated as ( , ) = ( ( , , ) , ( , , , ) ). In general, the error propagation probability is demarcated as ( , ) = ( ( 1, 2…… ) ) . From this, the proposed system finalizes some of the features that can calculate the error propagation score of each gate. Because the training model must be generic and it is applied to Design Under Test (DUT) to get the best results. For modeling technique ML, the following features are considered.
1. 1(fan-in): number of gates that connected to its input. 2. 2(fan-out): number of gates that connected to its output. 3. 3(connectivity): number of flip-flops and combinational gates connectivity in both forward and backward direction. 4. 4(input distance): minimum distance between the primary input and flip-flop/gate. 5. 5(output distance): minimum distance between the flip-flop (gate) and primary output. 6. 6(ed score): ∑ ( 7. 7(signal restoration): flip-flop or gates is defined as the number of restored states over the process of t cycles of trace signals. It should be noted that the features mentioned above are fully based on the physical structure of the circuit and how fast it can calculate. Using this structural relationship, the contiguous neighbours are obtained. Observability expansion can be made either by structural analysis or by error signatures obtained from the netlist. Based on these two the neighbour states of all the flip-flops and gates are identified. The main criteria is to find the optimum neighbour (ON) to figure out the internal signal states for better visibility. To iterate this, the minimum and the maximum number of neighbours have taken as 5 and 500 respectively. To compute these, we have implemented K-D tree, Ball tree using locality sensitive hashing to find the neighbours in the fastest manner. Note that flip-flops are derived from error blocks obtained from the debug procedure. We define a localization function to quantify the efficacy of gatelevel error localization with the proposed debug methodology. If the error injection and subsequent localization experiment is carried out K times, the function has the maximum value of K. We report for the two training methods (training_1 and training_2) and various error injection scenarios. By normalising the parameter values, the variants are denoted by Vi. By training the smallest circuits that help to identify the combination of neighbours (фj). The depth of the trace buffer is denoted by Cy cycles, the number of iteration is denoted by k and the proposed learning model is ML. As mentioned earlier, the training can be performed by two ways. The algorithm is given by, To find out the nearest neighbour is totally depends on the error signature (CGe) and its learning model (Vi). After finding the neighbour of a particular node, the traced and restored values are identified for observability expansion. The minimum difference between the expanded signature and CGe gives the best companion. The Neighbor score gives the summation of actual score and observed score of фj and Vi. The second model is fully based on the netlist and independent error signatures. Based on the features as mentioned above, the estimation of error transmission via each input gate can be specified as (1/Ninputs). Once the gate value score is calculated, the error probability score can find out easily by calculating all the input paths to each flip-flop. Here, the number of gates are G and the number of paths are P to estimate the training model. The second training model is developed based on the number of features listed without restoring the error signatures from the simulation. The training model is given by, Algorithm 3: 1. Proceduretraining_2 (input: netlist, n, k; output: ML) 2. ML∅ 3.
Ttraced and restored stated for each cycle 5.
Error signature of each gate (CGe) N-states of CGe for Cy cycles 6.
for each CG 1 to xdo 8. compute all the features 9.
feature_set feature of each CG 10. end for
for y=1  Hence, the resultant model will be M (training_1, training_2) to find the neighbours that reduce the requirement and memory space. The missing values are identified with the help of neighbours. This helps to increase the observability that debugs efficiently. This is verified by a higher level of abstraction that localises the error exactly. The error localisation is very difficult at the gate level, so the proposed method tries to achieve at the granularity of each gates/flip-flop. The basic principle to debug is that the difference between the golden reference and obtained signatures. So the problem of unknown signal values has been resolved using the random signal selection algorithm. From this, we can understand that for localising the smaller circuit, the error signatures are divided into a certain number of blocks. Based on the dissimilarity, the error signatures are localized. It can be done by ranking of all the flip-flops or gates out of which the top most thing should be chosen.

Random signal selection
Algorithm 4: In: Circuit, tbw, Ar, related region ar, selected signal list; Out: List of n traced signals 1. Signal to be traced (ST)∅ 2. Let ar = ∑ = =1 ri 3. Find the influence from Ar, Cr = 4. Select Cr from selected list 5. Place the selected list in ST and then repeat Step 4 and 5 6. Return selected ST During the active region, the duration depends on (1≤k≤m) for m x n1 signals. The total number of possible states is 2 m-1 .For example, the active region of size m=2 and n1=2, the two selected signals from each region (RA, RB) is represented as A0, A1 and B0, B1 respectively. If the current state is (0,1) then the selected signals will be (B0, B1) of Fig. 8 that uses 'x' multiplexers in Random Signal Tracing (RST). Likewise, if it is (1,0) and (1,1) then the signal states are (A0, A1) and (A0, B0) respectively. The output is given to the trace buffer and the controller that controls the signals towards multiplexer that operates under the same clock. An external knob is given to the active error zone information of a circuit for validation. Note that, modeling 1 is not been probable for larger extent. So the design bugs are mapped into netlist level [Choi, Jung, Oh et al. (2018)] such as unintended wire exchange, gate level random gate replacement, etc. If the two wires are exchanged due to the design error and it slips to fabricated silicon, then that error will propagate to the logical cone of a connected path. If there is an unintended inverter is present in the design then this also can propagate towards the output cone.

Experimental setup
We have verified the region-aware trace signal selection and random signal tracing algorithms with the help of ISCAS'89 benchmark and Opencores circuits. Consider each region has one error-prone and we have introduced 50 random errors to the active regions and the error density is directly proportional to region size. Here, we executed two kinds of simulation one with an ideal case and the other with the erroneous signals. We have pragmatic our algorithm to various regions and active regions. We have obtained the results for two regions to different scenarios. In global signal selection, they assumed that the errors are uniformly distributed without considering active regions and error zones [Jindal, Kumar, Jindal et al. (2018); Kumar, Jindal, Fujita et al. (2017)] whereas the proposed technique considers error detection and restoration. After this, error-one technique has evolved and this approach was focused only on static region that considers all the regions are active. But the proposed technique focused on the active region, error zones, different or random region that can reduce the time and memory space. The comparison for error detection ratio for a single region is active is shown in Fig. 9.

GSS-Gate-level Signal Selection E-GSS-Error zone based Gate-level Signal Selection RASS+ML+RST-Region-aware Machine Learning Random Signal Tracing
A Verilog module is developed that parameterised m regions and n1 traced signals. Then the proposed module is synthesized using Synopys design complier using lsi_10k technology. A typical size of trace buffer size is 32×1024 bits. Usually, the debug technology occupies more space than a controller, whereas the proposed technique uses only 5360 µ while using the same library. Therefore, this approach detects two times more errors compared to existing methods. With the help of Scikit-learn [Pedregosa, Weiss and Brucher (2011)] library, the above-mentioned the proposed learning model implements benchmark circuits. The training_1 model has used single design error (e.g., exchange of wire). For the same configuration of trace buffer size (32×1024 bits), the neighbours (фj=10) are identified using the learning model (Vi). Using the training model 1, the observability expansion is achieved. When the number of neighbours are less or if it is larger, the accuracy falls. So the neighbours should not be less than (like 5 or 10) or greater than 500, this condition occurs. Then the accuracy range lies between 8% and 15% in Fig.  10. Using the training_2 algorithm, the accuracy increases by more than 25% for 50 neighbours when choosing the combinational gate rather than the flip-flop with the help of training_2 model the accuracy increases. This simply justifies the simulation based observability technique works better than the usual reconstruction method. However, the rate of accuracy (increases/decreases) is lower compared to training_1. After analyzing the model, the proposed method hooks the least but it catches the most important aspirants. There is a smaller variation in obtaining the error localisation between the two training models. Correspondingly, there is a smaller variation when the number of neighbours and the learning model that is used for contiguous neighbour mode M. The methodology is verified with the help of cross-validation of the debug data on designs injected with gate-level error models. To avoid over-fitting in our model, we have used 5fold cross-validation technique (20%-test/validation vectors; 80%-training vectors). The problem of over-fitting occurs when the model is too specific to the training data that results in low accuracy of the new data and high accuracy in training data. Here we have used the smallest set of ISCAS'89 benchmark circuits to train our signal selection model and for running the modeling techniques, 5-fold cross validation have been used. This system is validated by comparing the obtained results with the Icarus Verilog simulator output.

Conclusion
Limited observability and controllability is the basic problem of post-silicon validation and debug. The rudimentary hypothesis of existing methods was developed using uniform errors that distributed over the circuit. The proposed technique selects the most beneficial KD_error Ball_error set of signals in each divided region based on error zones. The regions which are active are traced dynamically at a certain time with the help of contiguous neighbours. This methodology uses Manhattan distance for the distance calculation from simulated values for neighbour findings. Finding of neighbours using machine learning model helps to detect two times more errors than the existing methods.