Process Variation-Resistant Golden-Free Hardware Trojan Detection through a Power Side Channel

School of Microelectronics, Tianjin University, Tianjin 300072, China
Beijing Engineering Research Center of High-Reliability IC with Power Industrial Grade, Beijing Smart-Chip Microelectronics Technology Co., Ltd., Beijing 100192, China
School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China


Introduction
With the globalization of outsourced manufacturing services, a new security problem has emerged in Integrated Circuit (IC) manufacturing: chips may be modified during an uncontrolled manufacturing process [1]. Such modifications, applied to the circuit maliciously and intentionally, are called Hardware Trojans (HTs) [2]. According to their trigger mode, hardware Trojans can be divided into Always-On Hardware Trojans (AHT) and Triggered Hardware Trojans (THT). An Always-On Trojan can cause harm as soon as the circuit is powered on. A Triggered Hardware Trojan contains two parts, the trigger circuit and the payload circuit, as shown in Figure 1. The trigger circuit starts running once power is applied, but it does not show malicious behaviour: it only monitors certain signals or a sequence of events in the circuit, and its output is connected to the payload circuit of the Trojan. The payload circuit usually stays silent and shows malicious behaviour only once triggered. Triggered Trojans are more dangerous than Always-On Trojans because the adversary can control when the Trojan is detonated. The process of chip design and manufacture can be divided into specification, design, fabrication, testing, and assembly, and hardware Trojans can be injected during design, fabrication, and assembly [3].
For these different Trojan injection stages, a variety of hardware Trojan detection methods have been proposed. Li et al. [4] divided detection methods into two categories.
Detecting HTs inserted by EDA tools or brought in with IP cores is called pre-silicon detection, and finding HTs inserted during the manufacturing and assembly stages is called post-silicon detection. The objects of pre-silicon detection are RTL code, netlists, domain, and so on. Pre-silicon detection can be further divided into (a) detection based on formal verification and functional simulation [5][6][7], (b) detection based on information flow [8], and (c) detection based on the analysis of Trojan characteristics [9,10]. The objects of post-silicon detection are actual ICs manufactured by untrusted foundries. Widely used post-silicon detection techniques include (a) logic testing [11], (b) side-channel analysis [12][13][14][15][16][17], (c) combined logic testing and side-channel detection [18,19], and (d) chip reverse engineering [20].
Existing detection techniques have significant limitations. Pre-silicon detection is usually limited to Trojans with a specific structure or function because most pre-silicon methods require prior knowledge of the Trojan. For example, detection based on Trojan-feature analysis requires a model built in advance, and formal verification and functional simulation are effective only for Trojans with specific behaviour. Among post-silicon methods, logic testing and other techniques that must activate the Trojan are severely limited: because Trojan trigger events are rare, it is difficult to activate a Trojan during physical testing. Trojan detection based on side-channel analysis (SCA) is a widely used post-silicon method [21]. SCA can detect a Trojan even when it is not triggered because the trigger circuit of a THT keeps running as soon as power is applied. However, most previous SCA approaches rely on a golden chip, which is usually hard to obtain: procuring one may require destructive reverse engineering through decapsulation, delayering, and imaging of the chip [9]. Related work on golden chip-free detection falls into two categories. In the first, a golden template is simulated from the netlist or layout file. He et al. proposed HT detection using electromagnetic side-channel-based spectrum modelling and analysis [22]; they use design data from the early stage of the IC lifecycle, and the generated spectrum serves as the golden reference. Rad et al. proposed a method that does not need a golden chip but requires a Trojan-free layout to serve as the trusted model [23]. The second category is self-reference detection based on the spatial or temporal similarity of circuit parameters. Du et al. proposed a self-reference method that compares the transient-current characteristics of two circuit blocks [24].
However, this method requires a set of golden chips to effectively eliminate process variation. Hoque et al. proposed TeSR, a temporal self-reference method [25] that compares the current signature of a chip in two different time windows to isolate the Trojan effect. Zheng et al. proposed SeMIA, an IC integrity analysis based on spatial self-similarity [26], which compares the side-channel signature of one block with that of another self-similar block on the same chip. The key idea is that different self-similar blocks (e.g., parts of an adder, comparator, memory, or logical datapath) either experience different stresses because of widely varying activity levels or exhibit asymmetric side-channel signatures because of HT attacks. This paper proposes a golden chip-free hardware Trojan detection scheme based on the power side channel. Because naturally self-similar structures can be bypassed by an adversary and do not cover the whole circuit, our scheme modifies the original design. We exploit the fact that physical power consumption is proportional to the number of logic gate toggles: under chosen inputs, we construct two circuits with the same number of toggles for self-reference detection. We show theoretically that an adversary needs $O(2^{n/2}\log_2(n/2))$ time to bypass our detection scheme. Through simulation experiments, we demonstrate that the scheme reduces the effect of process variation. The rest of this paper is organized as follows. Section 2 introduces the toggle-count power model, discusses our detection principle, and analyses the method for reducing process variation. Section 3 describes the detection scheme in detail. Section 4 presents both simulated and physical experiments. Section 5 concludes this paper.

TC-Based HT Detection
The detection technique studied in this paper targets THTs injected during fabrication. The trigger part of a THT is always active, whether or not the Trojan has been triggered. It therefore consumes power in addition to the original circuit, and this extra consumption is reflected in the power side channel. We can determine whether a hardware Trojan has been injected into the circuit by detecting this extra power consumption. To evaluate power consumption effectively, we first introduce the toggle-count power model.

Toggle Count in the Digital Circuit.
A digital circuit consumes power whenever it performs computations. The total power consumption of a CMOS circuit is the sum of its static and dynamic power consumption. The dynamic power consumption of a logic gate is given by equation (1) [27], where α is the number of logic gate toggles per unit time. α depends on the data and operations of the circuit, whereas the other parameters are independent of them and are determined only by the electrical characteristics. Equation (1) shows that, once the electrical parameters are fixed, the power leakage is directly related to the number of gate toggles. Mangard et al. mapped toggles to simulated power consumption and successfully recovered the key from a masked AES implementation [28]. This power model is called the toggle count (TC) model. The value of TC is calculated as

$$TC(t, tb) = \sum_{i=1}^{m} g_i, \qquad tb = (X_1, X_2) \qquad (2)$$

where t is the time point, m is the total number of logic gates in the circuit, $g_i$ is the TC of the ith gate during $[t, t + t']$, $t'$ is the delay of the last toggle in the combinational circuit [29], and $(X_1, X_2)$ is the input vector pair, which we denote as tb.
According to equation (2), TC changes with tb. The TC model counts only toggles and ignores the function of the circuit. Therefore, different circuits may have specific tbs under which their TCs are equal, and the same circuit may have the same TC under different tbs.
Definition 1 (pair circuits). For input vector pairs $tb_i$ and $tb_j$, if $TC_{O_1}(tb_i) = TC_{O_2}(tb_j)$, then circuit $O_1$ under $tb_i$ and circuit $O_2$ under $tb_j$ are pair circuits ($O_1$ and $O_2$ can be the same circuit).
According to equation (1), when the electrical parameters are fixed, two circuits with the same TC have equal dynamic power consumption. Hence, ignoring process variation and measurement noise, pair circuits consume the same power in a physical chip.
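To make the TC model and Definition 1 concrete, the following Python sketch counts toggles for one vector pair tb = (X1, X2) on a toy netlist. It is an illustration under simplifying assumptions, not the authors' tool: gates are evaluated with zero delay in topological order, so glitches inside the window [t, t + t′] are ignored and each $g_i$ is 0 or 1. The netlist format and the names `evaluate`, `toggle_count`, and `half_adder` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical netlist format: (output_name, gate_type, input_names), in topological order.
Netlist = List[Tuple[str, str, Tuple[str, ...]]]

GATES: Dict[str, Callable[[Sequence[int]], int]] = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "XOR": lambda v: sum(v) % 2,
    "NOT": lambda v: 1 - v[0],
}

def evaluate(netlist: Netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Settled value of every net for one input vector (zero-delay model)."""
    values = dict(inputs)
    for out, kind, ins in netlist:
        values[out] = GATES[kind]([values[i] for i in ins])
    return values

def toggle_count(netlist: Netlist, x1: Dict[str, int], x2: Dict[str, int]) -> int:
    """TC for tb = (X1, X2): number of gate outputs that change when X1 is replaced by X2."""
    before, after = evaluate(netlist, x1), evaluate(netlist, x2)
    return sum(before[out] != after[out] for out, _, _ in netlist)

half_adder: Netlist = [("s", "XOR", ("a", "b")), ("c", "AND", ("a", "b"))]

tb_i = ({"a": 0, "b": 0}, {"a": 1, "b": 0})
tb_j = ({"a": 0, "b": 0}, {"a": 0, "b": 1})
print(toggle_count(half_adder, *tb_i), toggle_count(half_adder, *tb_j))  # prints: 1 1
```

Because both tbs give TC = 1, the half adder under $tb_i$ and the same half adder under $tb_j$ are pair circuits in the sense of Definition 1, even though different inputs change.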

Process Variation in TC-Based HT Detection
2.2.1. Detecting HT with TC Directly.
In Section 2.1, we explained that pair circuits share the same power consumption. Therefore, given two tbs under which two circuits can be configured as pair circuits, we can achieve self-reference detection through them.
When pair circuits are activated by tbs that satisfy Definition 1, they generate equal TC. If a Trojan is injected into one of the pair, the TCs of the two circuits will differ because of the TC of the Trojan trigger circuit, unless the adversary can guess the tbs used and design the Trojan to generate no additional toggles under them. To find these tbs, the adversary needs to exhaust the space of tb, which has size $2^n \times 2^n$, where n is the length of the input vectors. Based on this idea, the basic detection algorithm is given in Algorithm 1.

RandomSelect. Choose one tb for each of the two circuits.
CircuitExpand. At the design stage, add redundant circuitry to whichever of the two circuits has the smaller TC so that $O_1$ and $O_2$ become pair circuits. After CircuitExpand, $O_1$ and $O_2$ generate the same power consumption. Therefore, by comparing the measured powers $P_{actu\text{-}O_1}(tb_i)$ and $P_{actu\text{-}O_2}(tb_j)$, we can identify whether a hardware Trojan has been injected into the pair circuits.
TapeOut. Chip manufacturing. The auxiliary detection circuitry is added before TapeOut, and the circuit after TapeOut is the object of physical detection.
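The three steps of Algorithm 1 can be sketched in Python as follows, assuming the TC values of both circuits are already available from gate-level simulation. The function names are hypothetical, and so is the `rel_tol` decision margin: the text compares the two measured powers directly, whereas a real measurement needs some tolerance for noise.

```python
import random
from typing import Tuple

def random_select(n: int, rng: random.Random) -> Tuple[int, int]:
    """RandomSelect: draw one tb = (X1, X2) of n-bit input vectors, encoded as integers."""
    return rng.getrandbits(n), rng.getrandbits(n)

def circuit_expand(tc_o1: int, tc_o2: int) -> Tuple[int, int]:
    """CircuitExpand: redundant toggles to add to (O1, O2) so that both reach the same TC."""
    diff = tc_o1 - tc_o2
    return max(-diff, 0), max(diff, 0)

def trojan_suspected(p_actu_o1: float, p_actu_o2: float, rel_tol: float = 0.01) -> bool:
    """After TapeOut: flag a Trojan when the measured powers of the pair disagree."""
    return abs(p_actu_o1 - p_actu_o2) > rel_tol * max(p_actu_o1, p_actu_o2)

rng = random.Random(0)
tb_i, tb_j = random_select(8, rng), random_select(8, rng)
print(circuit_expand(10, 7))           # (0, 3): O2 needs 3 extra toggles
print(trojan_suspected(100.0, 100.5))  # False: within the 1% margin
print(trojan_suspected(100.0, 103.0))  # True
```

A single pair of tbs leaves this comparison fully exposed to process variation, which is the weakness discussed next.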

Problems Caused by Process Variation.
In a physical environment, side-channel information is affected by process variation and by measurement noise. As a result, the physical power consumption does not exactly equal the power predicted by the power model, and this noise poses a great challenge to Algorithm 1. Measurement noise can generally be reduced by statistical methods when enough data are collected. Process variation, however, is an offset introduced into the circuit during production: it stays fixed across repeated measurements and therefore cannot be eliminated statistically.
Process variation refers to deviations in the threshold voltage and gate capacitance of transistors [30] caused by differences in gate length, oxide thickness, and channel doping during transistor production [31]. Process variation is ultimately reflected in side-channel information such as power consumption and timing delay. Because Trojans are lightweight, the Trojan circuit is a very small fraction of the original circuit, so if Algorithm 1 is applied directly, the differences caused by the Trojan are overwhelmed by noise and the detection capability cannot be guaranteed. Let the power consumption deviation caused by logic gate $g_i$ be $P_{pv}(g_i)$.

Reduce Process Variation.
To treat process variation like measurement noise and reduce its influence statistically, we extend the concept of pair circuits to multiple groups of tb.
The new idea is as follows. We construct two sets of vectors, $TB_1 = (tb^1_1, tb^1_2, \ldots, tb^1_n)$ and $TB_2 = (tb^2_1, tb^2_2, \ldots, tb^2_n)$, such that every logic gate that toggles under $TB_1$ also toggles under $TB_2$, with the same TC. We sum the physical power consumption of the circuit over $TB_1$ and over $TB_2$ separately and compare the two totals.
First, because the total TC is the same, the total power consumption is theoretically the same as well. The process variation of each logic gate may follow some distribution, but once a gate is manufactured, its variation is a fixed value: for the same gate, $P_{pv}(g_i)$ is the same no matter how many times the circuit runs. Writing $TC_{TB}(g_i)$ for the number of toggles of gate $g_i$ accumulated over all vectors in a set TB, we obtain

$$\sum_{i=1}^{m} TC_{TB_1}(g_i)\, P_{pv}(g_i) = \sum_{i=1}^{m} TC_{TB_2}(g_i)\, P_{pv}(g_i).$$

That is, the total power consumption deviation due to process variation is the same for both sets. Under this detection scheme, the accumulated power consumptions remain equal even when process variation is considered. If a hardware Trojan is injected into the circuit, the TC generated by the Trojan trigger circuit makes the two accumulated power consumptions unequal.
Security and Communication Networks
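The cancellation argument can be checked numerically. In this Python sketch, all numbers are hypothetical: each gate's deviation $P_{pv}(g_i)$ is drawn once and then held fixed, $TB_1$ and $TB_2$ toggle each gate the same total number of times but through different individual vectors, and an optional Trojan contributes extra toggles under $TB_1$ only. The unit energy `NOMINAL` and the Gaussian spread of 0.05 are assumptions chosen for illustration.

```python
import random
from typing import List

NOMINAL = 1.0  # hypothetical nominal energy per toggle (arbitrary units)

def run_power(toggles: List[int], pv: List[float]) -> float:
    """Power for one tb: every toggle of gate i costs NOMINAL + pv[i]."""
    return sum(t * (NOMINAL + d) for t, d in zip(toggles, pv))

def accumulated(tb_set: List[List[int]], pv: List[float], trojan_toggles: int = 0) -> float:
    """Sum of per-tb powers over a set TB; a Trojan adds its own toggles on top."""
    return sum(run_power(v, pv) for v in tb_set) + trojan_toggles * NOMINAL

rng = random.Random(1)
pv = [rng.gauss(0.0, 0.05) for _ in range(8)]  # fixed P_pv(g_i), drawn once per chip

# Per-vector toggles of 8 gates; both sets toggle gates 0..5 exactly once in total.
TB1 = [[1, 1, 0, 0, 1, 0, 0, 0], [0, 0, 1, 1, 0, 1, 0, 0]]
TB2 = [[1, 0, 1, 0, 0, 1, 0, 0], [0, 1, 0, 1, 1, 0, 0, 0]]

single = run_power(TB1[0], pv) - run_power(TB2[0], pv)  # same TC, different gates
clean = accumulated(TB1, pv) - accumulated(TB2, pv)      # same per-gate TC
infected = accumulated(TB1, pv, trojan_toggles=2) - accumulated(TB2, pv)
print(f"single pair: {single:+.4f}  accumulated: {clean:+.1e}  with Trojan: {infected:+.4f}")
```

The single-vector comparison keeps a process-variation residue even though both vectors have TC = 3, while the accumulated comparison cancels it up to rounding and leaves only the Trojan's contribution.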

Feasibility Improvement.
When the number of logic gates grows to millions or billions, ensuring that every gate has the same TC is hard: each additional gate adds one dimension to the computation. To reduce the analysis complexity, we divide the circuit into small regions, each a square of side length r, as shown in Figure 2. Each square is called a grid and denoted $N_i(r)$, and we make the logic gates toggle the same number of times within each grid. The partition proposed here is physical rather than logical and is independent of the circuit's function. Once r is determined, the layout can be partitioned with O(1) time complexity. The purpose of the partition is to normalize the process variation of all gates in the same grid, so that the requirement of equal TC per gate becomes a requirement of equal TC per grid, which reduces the complexity of the problem. We discuss the rationality of this partition below. Note that a logical segmentation usually places the combinational circuit between two registers in one part, whereas in our method a combinational circuit may be divided into two or more grids (when r is small enough).
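A sketch of the physical partition in Python, assuming each gate (or LUT) has a placed (x, y) coordinate taken from the layout; the placement dictionary, the coordinates, and the function names are hypothetical. Mapping one gate to its grid is a constant-time floor division, and the example shows how a larger r merges toggles from more distant gates into the same grid.

```python
from collections import defaultdict
from typing import Dict, Tuple

Grid = Tuple[int, int]

def grid_of(x: float, y: float, r: float) -> Grid:
    """Index of the r-by-r square N_i(r) that contains the point (x, y)."""
    return int(x // r), int(y // r)

def tc_per_grid(placement: Dict[str, Tuple[float, float]],
                toggles: Dict[str, int], r: float) -> Dict[Grid, int]:
    """Aggregate per-gate TC into per-grid TC for grid side length r."""
    per_grid: Dict[Grid, int] = defaultdict(int)
    for gate, tc in toggles.items():
        x, y = placement[gate]
        per_grid[grid_of(x, y, r)] += tc
    return dict(per_grid)

placement = {"g1": (0.5, 0.5), "g2": (1.5, 0.2), "g3": (3.1, 2.9)}  # hypothetical layout
toggles = {"g1": 1, "g2": 2, "g3": 1}
for r in (1.0, 2.0, 4.0):
    print(r, tc_per_grid(placement, toggles, r))
```

With r = 4 the whole toy layout collapses into one grid, which is the coarse, easy-to-balance but less accurate end of the trade-off.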
A wafer is the basic material used to manufacture silicon integrated circuits. The wafer is oxidized and etched to produce various circuit element structures; after etching and other steps, it is diced into individual dies, each of which becomes an integrated circuit product with specific electrical functions. Intradie variations in these circuits exhibit spatial correlation [32]. We therefore adopt Hypothesis 1: devices in the same grid are perfectly correlated, devices in nearby grids are highly correlated, and devices in far-away grids have low or zero correlation.
Under Hypothesis 1, if two TBs produce the same TC in a grid, the total process variation they generate in that grid is the same. For given $TB_1$ and $TB_2$, if the TC of every grid is the same, the process variation contained in the two power consumptions is equal.
Trojan detection under Hypothesis 1 effectively reduces the complexity of the problem and makes the detection scheme feasible, because the dimension is reduced from the number of gates to the number of grids. However, this feasibility improvement also reduces detection accuracy. When r reaches its minimum value (as shown in Figure 2(b)), each grid contains only one logic gate; if Hypothesis 1 holds, the highest detection accuracy is then obtained and the effect of process variation is avoided completely. When each grid contains multiple logic gates, the gates within a grid differ from one another but are highly correlated. Therefore, power analysis under Hypothesis 1 effectively reduces, though does not eliminate, the impact of process variation. The larger r is, the greater the influence of process variation; at the same time, a larger r makes it faster and easier to equalize the TC of every grid. The designer or detector can adjust this trade-off between detection accuracy and time overhead.

Let $TB = (tb_1, tb_2, \ldots, tb_n)$ be the test vectors, and let $TC_i = (TC_{i,1}, TC_{i,2}, \ldots, TC_{i,\tau})$ denote the TC generated by the circuit under $tb_i$, where $TC_{i,t}$ is the TC at clock t and τ is the maximum number of clock cycles of the circuit. Then the TC of the whole circuit under TB can be recorded as an $n \times \tau$ matrix T, and the physical power leakage corresponding to T is a matrix L:

$$T = \begin{pmatrix} TC_{1,1} & \cdots & TC_{1,\tau} \\ \vdots & \ddots & \vdots \\ TC_{n,1} & \cdots & TC_{n,\tau} \end{pmatrix}, \qquad L = \begin{pmatrix} L_{1,1} & \cdots & L_{1,\tau} \\ \vdots & \ddots & \vdots \\ L_{n,1} & \cdots & L_{n,\tau} \end{pmatrix}.$$

Each element of T is a vector over the grids, $TC_{i,j} = (tc^{(1)}_{i,j}, tc^{(2)}_{i,j}, \ldots, tc^{(n_g)}_{i,j})$, where $tc^{(k)}_{i,j}$ is the TC of the kth grid at the jth clock when the input is $tb_i$, and $n_g$ is the number of grids. If the circuit runs for fewer than τ clock cycles under a tb, the remaining elements of T are padded with zeros.
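Definition 2 (pair TC sets). For a circuit divided into $n_g$ grids, let $P, Q \subset T$ with $|P| = |Q|$. If

$$\sum_{TC_{i,j}\in P} tc^{(k)}_{i,j} = \sum_{TC_{i,j}\in Q} tc^{(k)}_{i,j}, \qquad k = 1, 2, \ldots, n_g,$$

then P and Q are pair TC sets for the circuit.
Summing over all grids gives $\sum_{TC_{i,j}\in P}\sum_{k=1}^{n_g} tc^{(k)}_{i,j} = \sum_{TC_{i,j}\in Q}\sum_{k=1}^{n_g} tc^{(k)}_{i,j}$, so the total TC under P and Q is the same. According to equation (1), activating the circuit with P and Q separately generates the same dynamic power consumption.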
The static power consumption of the circuit is independent of the data processed at runtime. Moreover, we include as many elements as possible in P and Q (with $|P| = |Q|$) to guarantee the same static power. From Sections 2.2.3 and 2.2.4, equal TC in every grid balances the process variation in the total power consumption of the two sets. Therefore, if P and Q are pair TC sets, the sums of the physical power consumption corresponding to them are equal.
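Definition 2 can be checked mechanically. The Python sketch below stores T as a nested list with `T[i][j][k]` equal to $tc^{(k)}_{i,j}$ (an illustrative layout, not necessarily the authors' data structure) and represents P and Q as lists of (i, j) positions. It also requires P and Q to be disjoint, an assumption matching the construction in which Q is drawn from the elements left after removing P. The toy T shows that equal total TC is not sufficient: the per-grid sums must match.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (i, j): input vector tb_i, clock cycle j

def grid_sums(T: Sequence[Sequence[Sequence[int]]], cells: List[Cell]) -> List[int]:
    """Per-grid TC summed over the selected elements of T."""
    n_g = len(T[0][0])
    sums = [0] * n_g
    for i, j in cells:
        for k in range(n_g):
            sums[k] += T[i][j][k]
    return sums

def are_pair_tc_sets(T: Sequence[Sequence[Sequence[int]]], P: List[Cell], Q: List[Cell]) -> bool:
    """Definition 2: |P| = |Q|, P and Q disjoint, and equal TC in every grid k."""
    if len(P) != len(Q) or set(P) & set(Q):
        return False
    return grid_sums(T, P) == grid_sums(T, Q)

# Toy T with n = 2 input vectors, tau = 2 clock cycles, n_g = 3 grids.
T = [
    [[1, 0, 2], [0, 1, 1]],
    [[1, 1, 1], [0, 0, 2]],
]
print(are_pair_tc_sets(T, [(0, 0), (0, 1)], [(1, 0), (1, 1)]))  # True: sums [1, 1, 3] on both sides
print(are_pair_tc_sets(T, [(0, 0)], [(1, 0)]))                  # False: totals match, grids do not
```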

Build Pair TC Sets
Definition 3 (multidimensional 0/1 knapsack problem, MKP). Given an $n \times m$ matrix A and an m-dimensional column vector b, determine whether there is an n-dimensional binary vector $X = (x_1, x_2, \ldots, x_n)$ such that

$$\sum_{i=1}^{n} x_i\, a_{i,j} = b_j, \qquad j = 1, 2, \ldots, m.$$

Proof.
The construction of pair TC sets has two stages: (1) selecting elements from the matrix T to form the set P and (2) for a given P, finding the paired set Q. In the second stage, for a given P, the per-grid sums $\sum_{TC_{i,j}\in P} tc^{(k)}_{i,j}$, $k = 1, 2, \ldots, n_g$, form an $n_g$-dimensional column vector, which corresponds to the column vector b in the MKP. The elements of T that remain after removing P form a new matrix T′, which corresponds to the matrix A in the MKP. The goal of building pair TC sets is to find a binary selection $x_{i,j}$ over the elements of T′ that satisfies

$$\sum_{TC_{i,j}\in T'} x_{i,j}\, tc^{(k)}_{i,j} = \sum_{TC_{i,j}\in P} tc^{(k)}_{i,j}, \qquad k = 1, 2, \ldots, n_g,$$

which corresponds to the binary vector solved for in the MKP. The MKP places no restriction on the number of 1s in the binary vector, so any number of $a_{i,j}$ may sum to $b_j$. When building pair TC sets, however, the two sets must contain the same number of elements. We therefore add a constraint to the MKP: the number of 1s in the binary vector equals the number of elements in P. This constraint can be expressed as an additional dimension of the T′ matrix whose entries are all 1, $(1, 1, \ldots, 1)$, with the corresponding entry of b equal to $|P|$.
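The reduction can be illustrated with a small exact solver. The Python sketch below appends the all-ones dimension to every element of T′ and then searches for a subset whose vector sum equals b, using a meet-in-the-middle split in the spirit of the Horowitz-Sahni two-table method (here with hash tables instead of sorted lists). It is an illustrative assumption, not the paper's implementation, and it still needs on the order of $2^{N/2}$ table entries for N elements of T′, which is why solving the MKP directly is impractical for large designs.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[int, ...]

def _subset_sums(items: Sequence[Vec], dim: int, offset: int) -> Dict[Vec, Tuple[int, ...]]:
    """Map every reachable subset-sum vector of `items` to one witness (global indices)."""
    sums: Dict[Vec, Tuple[int, ...]] = {tuple([0] * dim): ()}
    for idx, vec in enumerate(items):
        new: Dict[Vec, Tuple[int, ...]] = {}
        for s, witness in sums.items():
            t = tuple(a + b for a, b in zip(s, vec))
            if t not in sums and t not in new:
                new[t] = witness + (idx + offset,)
        sums.update(new)
    return sums

def solve_mkp(items: List[Vec], b: Vec) -> Optional[Tuple[int, ...]]:
    """Indices of a subset of `items` whose vector sum equals b, or None (meet in the middle)."""
    half = len(items) // 2
    left = _subset_sums(items[:half], len(b), 0)
    right = _subset_sums(items[half:], len(b), half)
    for s, witness in left.items():
        need = tuple(x - y for x, y in zip(b, s))
        if need in right:
            return witness + right[need]
    return None

def find_q(t_prime: List[Vec], p_grid_sums: Vec, p_size: int) -> Optional[Tuple[int, ...]]:
    """Pair-TC-set search as an MKP: the extra all-ones dimension forces |Q| = |P|."""
    items = [tuple(v) + (1,) for v in t_prime]
    b = tuple(p_grid_sums) + (p_size,)
    return solve_mkp(items, b)

# Hypothetical elements of T' (per-grid TC vectors) and per-grid sums of a P with |P| = 2.
t_prime = [(1, 1, 1), (0, 0, 2), (2, 0, 1)]
print(find_q(t_prime, (1, 1, 3), 2))  # (0, 1): elements 0 and 1 of T' form Q
```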

Design of the Scheme.
The detection scheme proposed in this paper involves the detector in circuit design and assists detection by inserting circuitry into the original design. The detection target is a hardware Trojan injected during fabrication that is not always on but must be triggered.

Flow of the Scheme.
The entire detection process is shown in Figure 3. Our scheme takes the RTL or netlist design of the circuit and finally determines whether the physical chip contains a hardware Trojan. The scheme has three phases. The first is the preprocessing phase, which generates the data structures required by the subsequent steps. Algorithm 2 first selects a set of test input vectors X from the overall input space; the selected vectors must cover the normal functions of the circuit according to its original structure. A TB vector is then generated from X, and TB is used to simulate TC and obtain the TC matrix T. TC can be simulated at many levels [33]; we use the lowest-level simulation to match the actual running state. The second phase is circuit design modification, whose target is the circuit after place and route (PR). The layout is divided into $n_g$ grids, each with side length r. From the matrix T generated in the first phase, n elements are randomly selected for P and n for Q. For each grid, Algorithm 3 calculates the sum of the TC. If $\sum_{TC_{i,j}\in P} tc^{(k)}_{i,j} = \sum_{TC_{i,j}\in Q} tc^{(k)}_{i,j}$ for $k = 1, 2, \ldots, n_g$, Algorithm 3 outputs the design layout, P, Q, and TB; otherwise, Algorithm 3 modifies the circuit design. Finally, the modified PR-level design is output. The circuit modification strategy is described in detail in Section 3.1.2. The third phase is Trojan detection. At the end of the second phase, we consider the circuit design reliable and Trojan-free. The chip manufacturer, as a potential attacker, can inject a hardware Trojan during tape-out. The target of the detection phase is the finished chip: we run it according to TB and collect the power leakage matrix L.
The elements of L correspond one-to-one to the elements of T. We accumulate the elements of L corresponding to the elements of P and of Q and compare $\sum_{(i,j)\,|\,TC_{i,j}\in P} L_{i,j}$ with $\sum_{(i,j)\,|\,TC_{i,j}\in Q} L_{i,j}$. If the two sums differ, a Trojan was injected during manufacturing; otherwise, the circuit is considered clean and secure.
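The final comparison of the detection phase reduces to two sums over L. In this Python sketch, L is a nested list of measured values indexed like T, and the measurements are hypothetical. The `margin` argument is also an assumption: the text compares the two sums directly, but real measurements are never exactly equal, so some tolerance (for example, an estimate of the residual process-variation difference) is needed in practice.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (i, j) position shared by T and L

def accumulate(L: Sequence[Sequence[float]], cells: List[Cell]) -> float:
    """Sum of the measured leakage L[i][j] over the positions of P or Q."""
    return sum(L[i][j] for i, j in cells)

def trojan_detected(L: Sequence[Sequence[float]], P: List[Cell], Q: List[Cell],
                    margin: float) -> bool:
    """Flag a Trojan when the accumulated powers of P and Q differ by more than `margin`."""
    return abs(accumulate(L, P) - accumulate(L, Q)) > margin

P, Q = [(0, 0), (0, 1)], [(1, 0), (1, 1)]
clean_chip = [[10.0, 5.2], [9.9, 5.3]]      # hypothetical measurements
infected_chip = [[10.0, 5.2], [10.3, 5.3]]  # extra trigger-circuit power under Q
print(trojan_detected(clean_chip, P, Q, margin=0.05))     # False
print(trojan_detected(infected_chip, P, Q, margin=0.05))  # True
```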

Circuit Expansion.
Section 2.3.2 proves that the core of building pair TC sets is solving an MKP. The knapsack problem is NP-complete. Horowitz and Sahni proposed a two-table algorithm based on divide and conquer with time complexity $O(2^{n/2}\log_2(n/2))$ [34]. Many optimized knapsack solvers based on artificial intelligence algorithms have since been proposed, but their time complexity is still essentially exponential.
To give the detection scheme a practical time complexity, this paper combines detection with chip design and adjusts the original design to suit the scheme. As mentioned in Section 2.2.1, we insert redundant circuitry into the original circuit. Once P is selected, Q is not solved for directly; instead, elements are randomly selected to form Q. For each grid, we calculate the TC difference between P and Q and insert redundant toggles into the original design until P and Q have the same TC in every grid. The complexity of solving the knapsack problem is thus converted into the complexity of constructing redundant circuits.
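The balancing step can be sketched as per-grid bookkeeping. This Python sketch only computes how many redundant toggles each side needs in each grid and, assuming each inserted redundant structure contributes two toggles on average, roughly how many insertions that implies; it does not design the redundant logic itself. The per-grid sums and function names are hypothetical.

```python
import math
from typing import List, Tuple

def pad_grids(p_sums: List[int], q_sums: List[int]) -> Tuple[List[int], List[int], int]:
    """Redundant toggles to add to P and to Q in each grid so the per-grid TC match; also ΔTC."""
    pad_p = [max(q - p, 0) for p, q in zip(p_sums, q_sums)]
    pad_q = [max(p - q, 0) for p, q in zip(p_sums, q_sums)]
    delta_tc = sum(pad_p) + sum(pad_q)
    return pad_p, pad_q, delta_tc

def insertions_needed(delta_tc: int, toggles_per_insertion: int = 2) -> int:
    """Number of redundant structures, assuming each adds `toggles_per_insertion` toggles."""
    return math.ceil(delta_tc / toggles_per_insertion)

pad_p, pad_q, delta = pad_grids([5, 3, 7], [4, 6, 7])
print(pad_p, pad_q, delta, insertions_needed(delta))  # [0, 3, 0] [1, 0, 0] 4 2
```

The padding always goes to the side with fewer toggles in that grid, so both P and Q may receive redundancy, in different grids.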
In our scheme, the main purpose of circuit expansion is to reduce the designer's time overhead when building pair TC sets. The AND gate is one of the most basic logic elements and is present in virtually every CMOS circuit, so we build redundant circuits around an AND gate and its two fan-in gates, as shown in Figure 4. We designed five extension methods, depending on the types of gate 1 and gate 2 (AND or NOT), that generate additional TC without changing the original logic function of the circuit. The extended circuit logic is shown in Table 1.
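Any candidate expansion must preserve the logic function while adding toggles. The following Python sketch uses a hypothetical expansion, not one taken from Table 1: two inverters are inserted in series on input a of an AND gate, which leaves y = a AND b unchanged but adds two toggles whenever a changes. It checks functional equivalence exhaustively and counts the extra toggles for a given input transition; the same check could be applied to other candidate expansions.

```python
from itertools import product
from typing import Tuple

def original(a: int, b: int) -> int:
    """Original logic: y = a AND b."""
    return a & b

def expanded(a: int, b: int) -> Tuple[int, Tuple[int, int]]:
    """Hypothetical expansion: y = NOT(NOT(a)) AND b; also returns the two inverter outputs."""
    n1 = 1 - a
    n2 = 1 - n1
    return n2 & b, (n1, n2)

def is_equivalent() -> bool:
    """Exhaustive truth-table comparison over all input combinations."""
    return all(original(a, b) == expanded(a, b)[0] for a, b in product((0, 1), repeat=2))

def extra_toggles(x1: Tuple[int, int], x2: Tuple[int, int]) -> int:
    """Toggles of the two redundant inverters when inputs change from x1 to x2."""
    _, r1 = expanded(*x1)
    _, r2 = expanded(*x2)
    return sum(u != v for u, v in zip(r1, r2))

print(is_equivalent())                # True
print(extra_toggles((0, 1), (1, 1)))  # 2: a changes, both inverters toggle
print(extra_toggles((0, 0), (0, 1)))  # 0: only b changes
```

Because the extra toggles depend on a, this kind of expansion adds TC only under tbs that change a, which is the input dependence needed to pad one set without padding the other.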

Overhead.
Each of the five methods in Section 3.1.2 adds two redundant logic gates, which generate two additional toggles on average. Hence, if ΔTC denotes the total TC difference between P and Q summed over all grids, the number of redundant logic gates generated by this solution is approximately $2 \times (\Delta TC/2) = \Delta TC$. As for the time complexity of Algorithm 3, as mentioned in Section 2.2.4, partitioning is an O(1) operation, so the time complexity of Algorithm 3 depends on the number of executions of Expansion. On average, one redundant structure must be inserted for every two units of TC difference, so Expansion executes $\Delta TC/2$ times. Therefore, the time complexity of Algorithm 3 is $O(\Delta TC)$.

Security of KP-Based HT Detection.
The attacker model is as follows. The adversary can obtain the layout design of the circuit and the format of its inputs and outputs. To bypass the detection scheme, the adversary can either (1) determine which grids are used in detection by solving for the P and Q sets or (2) construct a special Trojan that presents the same TC for every test vector.
For the first method, the adversary cannot obtain T, because T is constructed from TB, which is a subset of the entire input space. Taking AES-128 as an example, the input space has size $2^{128}$, so it is difficult for the adversary to guess T. Even in the worst case, where the adversary can determine T, Section 2.3 shows that solving for P and Q from T has exponential complexity $O(2^{n/2}\log_2(n/2))$. Therefore, the adversary cannot bypass our detection with the first method. With the second method, the attacker does not need to solve for P and Q. However, a Trojan trigger circuit cannot present the same TC for all test vectors. Because a Trojan must be triggered with small probability, its trigger circuit must have a large fan-in cone [35], and when part of the state meets the trigger condition, the fan-in cone produces different responses. If the trigger circuit produced the same TC for every input, it would lose its function. Therefore, the adversary cannot bypass our detection with the second method either.

Detection Capability of the Scheme.
Because of process variation, the measured power consumptions accumulated over P and Q are not exactly equal. Let the difference between the physical power consumptions without a Trojan be $P_{pv\text{-}total}$, and let the power consumption of the Trojan trigger circuit be $P_{trojan}$. The scheme can detect the injected Trojan if

$$P_{trojan} > P_{pv\text{-}total}. \qquad (5)$$

Because the trigger circuit generates different TC under different inputs, we use the average TC to relate the size of the Trojan to that of the original circuit. We denote the average TC ratio between the Trojan trigger circuit and the original circuit as $\rho = AVG(TC_{trojan})/AVG(TC_{orig})$. Because TC is proportional to the nominal power consumption of the circuit, $\rho = P_{trojan}/P_{orig}$. Substituting $P_{trojan} = \rho P_{orig}$ into equation (5) gives

$$\rho > \frac{P_{pv\text{-}total}}{P_{orig}}.$$

That is, a 100% detection rate can be achieved for any hardware Trojan whose toggle scale exceeds $P_{pv\text{-}total}/P_{orig}$. The proposed scheme handles process variation effectively: in physical detection, $P_{pv\text{-}total}$ is smaller than its theoretical value, so even smaller Trojans can be detected.

Experiment
We conducted the experiments on a Xilinx Virtex-5 XC5VLX30 FPGA. On an FPGA, the finest device granularity is the standard FPGA unit, the LUT. Therefore, in the experiments, we convert all gate-level signals into LUT output signals.

Circuit under Test.
We evaluated the proposed detection scheme with both simulated and physical data. The simulation experiment is based on an FPGA implementation of a 3-share masked AES S-box [36], which we call TIS16. The physical experiment is based on AES-ECB encryption [37]. TIS16 uses a new addition chain to accelerate and lighten the S-box, as shown in Figure 5, where S stands for a square operation and M for a multiplication. The hardware implementation of TIS16 is shown in Figure 6. The shamul module performs the inversion defined by the addition chain and iterates the square operation according to the affine transformation in $GF(2^8)$. The shamac module performs shared multiplication by constants followed by shared addition.

ALGORITHM 3: Circuit modification.
Table 1: Five methods to expand AND logic (columns: Number, Origin logic, Extended logic).

… clock, and each encryption takes a total of 11 clocks. This circuit uses a total of 881 registers and 2296 LUTs.

Reducing Process Variation
There is a proportional relationship between TC and the physical power consumption of the circuit. Based on TC, we simulated the effect of process variation on each signal and obtained simulated power consumption affected by process variation. By measuring the deviation of the power consumption under different grid sizes, we verified the ability of our scheme to resist process variation. The circuit was simulated with Xilinx ISim. The experiment proceeds as follows:
(1) Circuit Simulation. Post-place-and-route simulation of the test circuit generates a VCD file, from which we obtain the TC of the test circuit under given inputs. In this experiment, we focused on the part that actually implements encryption: only the masking part was simulated, and the operation of the control circuit was ignored.
(2) Partition. Because this experiment uses a Xilinx FPGA, the circuit can naturally be partitioned by SLICE: logic elements belonging to the same SLICE are assigned to the same grid.
(3) VCD File Analysis. Calculate the TC of each SLICE (grid) at each clock, and generate matrix T with Algorithm 2.
(4) Build Pair TC Sets P and Q. We use the Euclidean distance to measure the difference between the per-grid TC of the two sets, and we build P and Q with minimal Euclidean distance so that the TC difference in each grid, and hence the number of redundant gates, is minimized.
(5) Insert Redundancy. Unlike ASICs, FPGAs provide the designer with a fixed total of hardware resources. We chose the speed-first strategy during place and route, leaving some resources in each SLICE for redundancy. On the FPGA, we convert the expansion scheme proposed in Section 3.1.2 into the corresponding LUT truth tables. For grids with different TC, we construct a LUT whose output depends on the input; for grids with the same TC, we construct a redundant LUT that is independent of the input.
(6) Power Simulation. The simulated power consumption includes the nominal power caused by toggles and the noise power caused by process variation. Chang and Sapatnekar [32] show that, under process variation, the leakage power of logic gates approximately follows a log-normal distribution. We simulate the power of each toggle as $e^{Y_i}$, where $Y_i \sim N(\mu_{y_i}, \sigma^2_{y_i})$ is a Gaussian random variable. Following the spatial correlation of process variation parameters, TC in the same grid is simulated with the same expectation and variance $(\mu_{g_i}, \sigma^2_{g_i})$. The average power consumption of the grids is denoted as a random variable $M = (\mu_{g_1}, \mu_{g_2}, \ldots, \mu_{g_n})$, and we assume $M \sim N(0, \sigma^2_{total})$. The value of $\sigma^2_{total}$ determines the proportion of process variation in the total power consumption; in our experiment, we set it to 5%. $\sigma^2_{g_i}$ determines the correlation of TC within the same grid and depends on r.
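Steps (1) to (3) can be approximated with a small VCD reader. The Python sketch below is a simplified assumption, not the ISim flow used in the paper: it handles only scalar (1-bit) signals, treats each signal's first value as an initialization rather than a toggle, assigns each toggle to clock window floor(time / period), and maps signals to SLICEs through a hypothetical `slice_of` dictionary that would have to be derived from the placed design.

```python
from collections import defaultdict
from typing import Dict, Tuple

def vcd_toggles(vcd_text: str, period: int, slice_of: Dict[str, int]) -> Dict[Tuple[int, int], int]:
    """Count 0/1 toggles per (clock index j, grid k) from a VCD dump of scalar signals."""
    id_to_name: Dict[str, str] = {}
    last: Dict[str, str] = {}
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    time = 0
    for line in vcd_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "$var" and len(tok) >= 5:
            id_to_name[tok[3]] = tok[4]          # $var <type> <width> <id> <name> $end
        elif tok[0].startswith("#"):
            time = int(tok[0][1:])               # timestamp line
        elif len(tok) == 1 and tok[0][0] in "01xXzZ" and tok[0][1:] in id_to_name:
            value, code = tok[0][0], tok[0][1:]  # scalar value change, e.g. "1!"
            prev = last.get(code)
            last[code] = value
            name = id_to_name[code]
            if name in slice_of and prev in ("0", "1") and value in ("0", "1") and prev != value:
                counts[(time // period, slice_of[name])] += 1
    return dict(counts)

SAMPLE_VCD = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! a $end
$var wire 1 % y $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0%
$end
#5
1!
#7
1%
#12
0!
#14
0%
"""
print(vcd_toggles(SAMPLE_VCD, period=10, slice_of={"a": 0, "y": 1}))
```

Each key (j, k) of the result is one entry $tc^{(k)}_{i,j}$ of T for the input vector that produced the dump.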
We set $\sigma^2_{g_i}$ to the same value for every region and tested the ability of the scheme to resist process variation under different values of $\sigma^2_{g_i}$. When divided by SLICE, the circuit is partitioned into 206 grids. After step 4, the circuit is activated with the input vectors of P and Q, respectively, yielding TCs of 88291 and 83939. After redundancy is inserted, both TCs are 88310, and the final simulated power values are 70131.27 and 71335.53, a deviation of about 1.72%. Without partitioning, that is, treating the entire circuit as one grid and directly choosing inputs that give P and Q the same total TC, the simulated powers are 75488.57 and 78639.83, a deviation of about 4.17%. The values of $\sigma^2_{g_i}$ in the two experiments are set to 0.01 and 0.001. The experiments show that the proposed detection scheme has a certain resistance to process variation, which improves Trojan detection accuracy under the same experimental environment.
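Step (6) can be sketched in outline. The Python sketch draws each grid's log-mean from N(0, σ_total²) once per chip and each toggle's power from a log-normal distribution with that grid's parameters, using `random.lognormvariate` from the standard library. The grid count, TC values, and σ values are hypothetical. The deviation is measured relative to the smaller of the two values, which reproduces the reported 1.72% and 4.17% from the power values given above.

```python
import random
from typing import List

def simulate_power(tc_per_grid: List[int], mu_g: List[float], sigma_g: float,
                   rng: random.Random) -> float:
    """Each toggle in grid k consumes exp(Y), Y ~ N(mu_g[k], sigma_g**2)."""
    return sum(rng.lognormvariate(mu_g[k], sigma_g)
               for k, tc in enumerate(tc_per_grid) for _ in range(tc))

def relative_deviation(a: float, b: float) -> float:
    """|a - b| relative to the smaller value."""
    return abs(a - b) / min(a, b)

rng = random.Random(42)
sigma_total, sigma_g = 0.05, 0.1                         # hypothetical spreads
mu_g = [rng.gauss(0.0, sigma_total) for _ in range(4)]  # fixed per chip

p_tc = [30, 20, 10, 40]
q_balanced = [30, 20, 10, 40]    # same TC in every grid
q_total_only = [40, 10, 20, 30]  # same total TC, different grid distribution

p = simulate_power(p_tc, mu_g, sigma_g, rng)
print(relative_deviation(p, simulate_power(q_balanced, mu_g, sigma_g, rng)))
print(relative_deviation(p, simulate_power(q_total_only, mu_g, sigma_g, rng)))
print(f"{relative_deviation(70131.27, 71335.53):.2%}, {relative_deviation(75488.57, 78639.83):.2%}")
```

A single random draw need not favor the balanced case, because per-toggle noise is still present; setting `sigma_g` to zero isolates the grid-level effect, in which case the balanced sets match exactly and the total-only sets do not.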

Physical Experiment.
The first five steps of the physical experiment are the same as in the simulation experiment. In Step 6, we use measured physical power data instead. In the physical experiment, we test the effect of different grid sizes. The smallest partition unit we use is a SLICE (r = 1), and each SLICE contains 4 LUTs. We also test larger partitions, covering r = 1, 2, 4, 8, and 16 in total.
We describe the circuit in Xilinx's XDL format and instantiate the GII AES design twice. The two instances are placed and routed identically, differing only by a positional offset. As mentioned in Section 2.2.4, the distance between SLICEs implementing the same function is r. Here, r is the difference between the SLICEs' ordinates (Y coordinates) in the XDLRC file. As shown in Figure 7, r between SLICE1 and SLICE2 is 1, and r between SLICE1 and SLICE4 is 3. In this way, measuring the power of the two instances is equivalent to constructing a P set and a Q set under a partition with distance r.
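Assuming Xilinx-style site names of the form `SLICE_X<col>Y<row>`, the distance r and the shifted second instance can be sketched as follows. The helpers `slice_xy`, `slice_distance`, and `offset_placement` are hypothetical, and the cell and site names in the example are made up.

```python
import re

_SLICE_SITE = re.compile(r"^SLICE_X(\d+)Y(\d+)$")


def slice_xy(site):
    """Parse a site name such as 'SLICE_X12Y40' into integer (x, y)."""
    m = _SLICE_SITE.match(site)
    if m is None:
        raise ValueError(f"not a SLICE site name: {site!r}")
    return int(m.group(1)), int(m.group(2))


def slice_distance(site_a, site_b):
    """r between two SLICEs: the difference of their ordinates (Y values)."""
    return abs(slice_xy(site_a)[1] - slice_xy(site_b)[1])


def offset_placement(placement, r):
    """Second instance: the same {cell: site} placement shifted by r rows."""
    shifted = {}
    for cell, site in placement.items():
        x, y = slice_xy(site)
        shifted[cell] = f"SLICE_X{x}Y{y + r}"
    return shifted


if __name__ == "__main__":
    first = {"sbox0": "SLICE_X3Y10", "sbox1": "SLICE_X4Y11"}
    second = offset_placement(first, 4)
    print(second, [slice_distance(first[c], second[c]) for c in first])
```

Shifting only the ordinate keeps the column, so every pair of functionally identical SLICEs ends up exactly r rows apart.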
We choose one AES instance as the reference circuit and collect 100,000 power traces. The traces are statically aligned based on correlation and denoised with Gaussian filtering. The preprocessed power traces of the ten AES encryption rounds are shown in Figure 8. For each r, we perform the same power collection and preprocessing on the other circuits. We then calculate the power difference between each of them (the Q set) and the reference circuit (the P set). Figure 9 shows the difference in power consumption between the P set and the Q set under different r.
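A minimal pure-Python sketch of the two preprocessing steps follows. Static (whole-trace) alignment shifts each trace by the single lag that maximizes its Pearson correlation with the reference. Gaussian smoothing then follows. The lag window, the kernel radius of about $3\sigma$, and the edge clamping are our assumptions; the text does not specify these parameters.

```python
import math


def _pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(trace, ref, max_lag):
    """Lag in [-max_lag, max_lag] maximizing corr(trace[i + lag], ref[i])."""
    best, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        idx = [i for i in range(len(ref)) if 0 <= i + lag < len(trace)]
        if len(idx) < 2:
            continue
        r = _pearson([ref[i] for i in idx], [trace[i + lag] for i in idx])
        if r > best_r:
            best, best_r = lag, r
    return best


def align(trace, ref, max_lag):
    """Static alignment: shift the whole trace by one lag, clamping at the edges."""
    lag = best_lag(trace, ref, max_lag)
    n = len(trace)
    return [trace[min(max(i + lag, 0), n - 1)] for i in range(len(ref))]


def gaussian_filter(trace, sigma):
    """Smooth with a normalized Gaussian kernel of radius ~3*sigma (edges clamped)."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    n = len(trace)
    return [
        sum(kernel[j + radius] * trace[min(max(i + j, 0), n - 1)]
            for j in range(-radius, radius + 1))
        for i in range(n)
    ]
```

In a full pipeline, each of the collected traces would be passed through `align` against the reference and then through `gaussian_filter`.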
Figure 9(f) shows that the larger r is, the more the power difference between the two sets fluctuates. We use the Euclidean distance as the evaluation metric and calculate the distance between the power trace of the reference circuit and the power traces of the circuits under different r. The results show that when the circuit is partitioned into grids and power consumption is compared grid by grid, smaller grids contain less process variation in their power consumption. The comparison is therefore more accurate. The proposed scheme can thus effectively reduce the influence of process variation on Trojan detection accuracy.
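The Euclidean-distance comparison can be sketched as follows. The sketch assumes that each circuit's preprocessed traces are first averaged into a single mean trace of equal length. `mean_trace` and `distance_by_r` are hypothetical names, and the data in the example are toy values, not measurements.

```python
import math


def mean_trace(traces):
    """Average several equal-length traces sample by sample."""
    n = len(traces)
    return [sum(samples) / n for samples in zip(*traces)]


def euclidean(a, b):
    """Euclidean distance between two equal-length traces."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_by_r(reference_traces, traces_by_r):
    """Map each partition length r to the distance between mean traces."""
    ref = mean_trace(reference_traces)
    return {r: euclidean(ref, mean_trace(ts))
            for r, ts in sorted(traces_by_r.items())}


if __name__ == "__main__":
    ref = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    print(distance_by_r(ref, {16: [[1.0, 2.0, 7.0]], 1: [[1.0, 2.0, 3.5]]}))
```

Sorting by r makes it easy to check whether the distance grows with the partition length, which is the trend described above.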

Simulation Experiment.
To demonstrate the detection capability of KP-based HT detection, we injected a hardware Trojan into TIS16 and conducted a simulation experiment. The HT sample comes from Trust-Hub [38], and the experiment is based on AES-T800. After place and route, TIS16 uses a total of 305 registers and 745 LUTs, and the Trojan circuit occupies an additional 4 registers and 8 LUTs. Our experiment considers only the Trojan trigger circuit and ignores the payload, because an untriggered payload produces no toggles that affect the detection method. The trigger circuit of AES-T800 is a finite-state machine: the Trojan is triggered when four consecutive input vectors match four given values. We inject this trigger circuit into the circuit used in Section 4.2. To ensure that the original design is not modified, we inject the Trojan directly into the XDL file, which contains the placement and routing information.
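The trigger behaviour described for AES-T800 can be modelled as a small FSM whose state is the number of sequence elements matched so far. This is a behavioural sketch, not the Trust-Hub RTL. The class name and trigger values are placeholders. On a mismatch, the model falls back to the longest prefix that is still matched, so overlapping patterns are not missed. It also counts state changes as a rough proxy for the switching activity that the detection scheme measures.

```python
class SequenceTrigger:
    """Behavioural model of a trigger FSM that fires when four consecutive
    inputs match a fixed sequence (placeholder values, not AES-T800's)."""

    def __init__(self, sequence):
        self.sequence = list(sequence)
        self.state = 0           # number of sequence elements matched so far
        self.triggered = False
        self.state_changes = 0   # rough proxy for the trigger's switching activity

    def step(self, value):
        """Feed one input vector; return True once the trigger has fired."""
        if not self.triggered:
            seen = self.sequence[:self.state] + [value]
            # Fall back to the longest suffix of `seen` that is still a
            # prefix of the sequence, so overlapping patterns are not missed.
            k = len(seen)
            while k > 0 and seen[-k:] != self.sequence[:k]:
                k -= 1
            if k != self.state:
                self.state_changes += 1
            self.state = k
            self.triggered = k == len(self.sequence)
        return self.triggered


if __name__ == "__main__":
    trig = SequenceTrigger([0x11, 0x22, 0x33, 0x44])
    print([trig.step(v) for v in [0x11, 0x22, 0x00, 0x11, 0x22, 0x33, 0x44]])
```

Because the trigger's state register changes on most inputs even when it never fires, it adds toggles that the P/Q comparison can pick up without the payload ever activating.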
For the circuit with the HT, we apply the same inputs as in Section 4.2. The simulated power difference between the P and Q sets is 4,297.01, which is 6.02% of the original circuit's power consumption. The portion caused by the Trojan trigger circuit is 4.33%. This exceeds the 1.72% deviation caused by process variation in Section 4.2, showing that our scheme successfully detects the hardware Trojan.
Note that the number of active gates in the Trojan trigger circuit and in the original circuit varies across test benches, so different test benches may yield different detection results. Across the test benches we tried, the minimum power difference caused by the Trojan is only 565.56 (0.79%), while the maximum reaches 3,297.69 (4.62%). With the worst input vectors, the Trojan's power consumption may be masked by process variation. This underlines the importance of the other contribution of this paper: reducing process variation. Without the proposed algorithm, process variation causes a 4.17% power deviation in our experiment, which makes it impossible to detect the AES-T800 Trojan under most test benches. When the circuit is partitioned, the deviation drops to 1.72%, which raises the detection success rate.
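The comparison in this paragraph comes down to simple arithmetic. The sketch below computes the relative P/Q deviation with P as the reference set, which reproduces the 1.72% and 4.17% figures quoted above. It flags a Trojan only when the Trojan's share of the power exceeds the process-variation deviation. This is a deliberately simple threshold rule for illustration, and the function names are hypothetical.

```python
def relative_deviation(p_power, q_power):
    """|Q - P| / P in percent, taking P as the reference set."""
    return abs(q_power - p_power) / p_power * 100.0


def trojan_detectable(trojan_share_pct, variation_pct):
    """Illustrative rule: flag a Trojan only if its share of the power
    exceeds the deviation attributable to process variation."""
    return trojan_share_pct > variation_pct


if __name__ == "__main__":
    partitioned = relative_deviation(70131.27, 71335.53)
    unpartitioned = relative_deviation(75488.57, 78639.83)
    for share in (0.79, 4.62):   # min / max Trojan share across test benches
        print(share, trojan_detectable(share, unpartitioned),
              trojan_detectable(share, partitioned))
```

Under this rule, the best-case test bench (4.62%) is flagged with or without partitioning, and the worst case (0.79%) stays below both thresholds. This matches the observation that the choice of test bench matters.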

Physical Experiment.
We extracted the Trojan trigger circuits from Trust-Hub [38]. The Trust-Hub Trojan library contains only three trigger modes for AES. Besides the finite-state-machine trigger of AES-T800 used in the simulation experiment, the library includes counter triggers and specific-plaintext triggers. However, our scheme considers only the toggle power consumption of the trigger circuit, so the specific trigger structure does not matter. We also assume that the trigger structure does not change the original function. We therefore use redundant circuits with different toggle counts as equivalent stand-ins for Trojan trigger circuits. In the physical experiment, we insert the "Trojan" circuit shown in Figure 10. In each of the ten rounds of AES encryption, the encryption core computes a 128-bit $D_{\mathrm{next}}$ signal at the end of the round and applies it to the next round. We tap two bits of $D_{\mathrm{next}}$, AND them, and send the result to a redundant register. In this way, we inject an extra toggle into each round of encryption in the original circuit. The specific implementation is shown in Figure 11. The injected circuit totals 20 registers and 40 LUTs, which are 2.27% and 1.74% of the original design, respectively.

We calculate the absolute difference between the power consumption of the Trojan-inserted circuit and that of the reference circuit, and compare it with the differences of the circuits under different r. As shown in Figures 12(a) and 12(b), when the Trojan trigger circuit contains registers, its extra power consumption is relatively large. Even with a partition length of r = 16, the Trojan's power is still far greater than the power deviation caused by process variation, so a Trojan trigger circuit of this size can be detected effectively.

We then reduced the size of the Trojan trigger circuit by removing the registers in Figure 10, turning it into a purely combinational circuit. As shown in Figures 12(c) and 12(d), at a Trojan scale of 40 LUTs and r = 8, the Trojan's power consumption is indistinguishable from process variation, so the Trojan cannot be detected. When the partition size is reduced to r = 1, however, the Trojan can still be clearly distinguished. Figure 12(e) compares the power consumption of the Trojan trigger circuit with and without registers under the same toggles.
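The register-based stand-in for a Trojan trigger can be modelled at the behavioural level. Each round, two bits of $D_{\mathrm{next}}$ are ANDed and latched into a redundant register, and every change of that register counts as one extra toggle. The tapped bit positions, the reset value of 0, and the function name are assumptions. The real injected circuit is larger (20 registers and 40 LUTs in total).

```python
def injected_toggles(round_states, bit_a, bit_b, reset_value=0):
    """Toggles of a redundant register latching D_next[bit_a] AND D_next[bit_b].

    round_states: the 128-bit D_next value (as an int) at the end of each round.
    Returns the register value after each round and the total toggle count.
    """
    reg = reset_value
    values, toggles = [], 0
    for d_next in round_states:
        new = (d_next >> bit_a) & (d_next >> bit_b) & 1   # AND of the two tapped bits
        if new != reg:
            toggles += 1
        reg = new
        values.append(reg)
    return values, toggles


if __name__ == "__main__":
    print(injected_toggles([0b11, 0b01, 0b11, 0b11], 0, 1))
```

In this idealized model, whether a round contributes a toggle depends on whether the AND result changes. The number of extra toggles therefore depends on the data being encrypted.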
The experiments show that the proposed scheme can effectively detect a combinational Trojan trigger circuit that accounts for about 1% of the original circuit's scale.

Scheme Efficiency and Overhead.
Regarding time and space overhead, we find that the analysis time grows linearly with the amount of data, so it can be estimated directly from small data sets. At present, with a 100 ns simulation clock, post-route simulation in ISim takes 16 minutes for 1,000 encryptions (26 clocks each, including key expansion). The generated VCD file reaches 4,018,405 kB (about 3.8 GB). For this simulation data ($26 \times 10^8$ time steps), analysing the VCD file takes about 6 minutes.
In this experiment, we use a dynamic-programming knapsack algorithm to solve the multidimensional knapsack problem. When the knapsack dimension reaches 6 (that is, the total number of partitions $n_g$ is 6), the dynamic-programming state matrix for 1,000 traces (10 clocks per trace) exceeds 16 GB. Without recording the states, the solution time exceeds 24 hours once the dimension reaches 10.
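To make the state explosion concrete, the sketch below searches for two disjoint, non-empty sets of clock moments whose per-grid TC vectors sum to the same value. It keeps one dictionary entry per reachable difference vector. This is an illustrative exact-equality variant written for this discussion, not the paper's solver, which minimizes a Euclidean distance and then inserts redundancy. The number of reachable difference vectors is bounded by a product over the $n_g$ dimensions, so it can grow exponentially with $n_g$.

```python
def find_balanced_pair(T):
    """Find disjoint, non-empty index sets P and Q with equal per-grid TC sums.

    T[t] is the per-grid toggle-count vector of clock moment t (moments whose
    vector is all zero are assumed to have been filtered out beforehand).
    Each reachable difference vector sum(P) - sum(Q) is stored once, so the
    dictionary size is what explodes as the number of grids n_g grows.
    Returns (P, Q) as tuples of moment indices, or None if no pair exists.
    """
    ng = len(T[0]) if T else 0
    zero = (0,) * ng
    states = {zero: ((), ())}           # difference vector -> (P, Q)
    for t, row in enumerate(T):
        updates = {}
        for diff, (p, q) in states.items():
            for sign in (1, -1):        # put moment t into P (+) or Q (-)
                new = tuple(d + sign * x for d, x in zip(diff, row))
                cand = (p + (t,), q) if sign == 1 else (p, q + (t,))
                if new == zero:
                    if cand[0] and cand[1]:
                        return cand
                    continue
                if new not in states and new not in updates:
                    updates[new] = cand
        states.update(updates)          # moment t is used at most once per path
    return None


if __name__ == "__main__":
    print(find_balanced_pair([[2, 1], [1, 0], [1, 1]]))
```

Each moment can go into P, into Q, or into neither. The search is therefore a multidimensional partition problem, which is what makes it expensive for an adversary without the redundancy information.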
Among previous work on hardware Trojan detection, TeSR [25] and SeMIA [26] are the most similar to this paper. TeSR leverages the uncorrelated temporal variations in the transient current signature of sequential hardware Trojans to isolate their effect from process and measurement noise. Because it compares the current signature of a chip for the same input pattern in different time windows, TeSR can detect only sequential hardware Trojans. By contrast, the proposed solution can detect both combinational and sequential Trojans. The Trojan example used in TeSR contains 8 registers, whereas our scheme can detect Trojans without registers. SeMIA uses the inherent structural self-similarity of a design to detect hardware Trojans, and the experiments in SeMIA show that it can detect Trojans that account for 2.3% of the original circuit. The solution proposed in this paper can still effectively distinguish Trojans from process variation when the Trojan accounts for 1.74% of the area.

Conclusions
This paper proposes a detection scheme for post-silicon hardware Trojans that combines design and detection. We insert several redundant circuits during design so that the power consumption at specific selected moments can be summed into two self-referenced power consumption sets. By modifying the design, we remove the requirement of existing self-referencing detection schemes for a specific circuit structure. For the modified circuit, we can generate two sets of running moments whose total toggle counts are equal, so the physical power consumption of the two sets should also be equal. The two sets can therefore serve as each other's golden template. The adversary has no knowledge of the redundancy insertion process, so to find sets with equal toggle counts, they would have to solve a knapsack problem. Since the knapsack problem is NP-complete, even an adversary who obtains the original design cannot efficiently determine which power samples belong to the self-reference sets.
This is the basis of the security of the proposed detection method. Based on the spatial correlation of process variation, this paper partitions the circuit and extends the knapsack into a multidimensional knapsack. Balancing the toggles within each grid minimizes the deviation that process variation causes in the overall power consumption. Dividing the circuit into small grids thus provides resistance to process variation, which we verified experimentally.

Future Work.
In the future, we will study better test-bench generation techniques to improve the success rate of our detection scheme. As the circuit scale grows by orders of magnitude, our method becomes more complex. We will therefore look for algorithms that keep the difference between the P and Q sets as small as possible when selecting their elements. We will also explore locating hardware Trojans through grid division and quantify the relationship between the average TC ratio and process variation.
Data Availability
The simulation power data and physical traces used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.