1 Introduction

In theory, authentication schemes and protocols rest on the assumption that the key stored in non-volatile memory (NVM) is secure [1]. In practice, this is difficult to achieve. Physical attacks, e.g. side-channel attacks and reverse engineering, can expose the key and break security. Software attacks, such as malware and viruses, can also steal the private key. In industry, the usual approach to protecting the private key against invasive physical attacks is to store the key information in a tamper-sensing area [2]. However, such methods are complex and do not solve the essential problem: the private key is still permanently stored in non-volatile memory.

Physically Unclonable Functions (PUFs) have been introduced as a more cost-effective alternative for key protection. Based on the idea that “There are no two identical leaves in the world”, PUFs extract an entity’s “hardware fingerprint” and reflect it in a unique challenge-response behavior. When a PUF instance is invoked with different challenges, it returns different instance-specific response sequences that can be used for cryptographic purposes. This challenge-response behavior is determined directly by the PUF’s physical structure. During manufacturing, a variety of random and uncontrollable factors leave subtle differences on every entity, so no two products are perfectly identical. For example, even when produced on the same production line with the most advanced process control, every SRAM cell has its own preferred power-on state (logic ‘1’ or ‘0’); the actual frequencies of nominally identical ring oscillators differ from each other; and the arrival order of signals that travel through similar propagation paths varies from chip to chip. These effects are the underlying mechanisms of the SRAM PUF [3, 4], the Ring Oscillator PUF (RO PUF) [6,7,8] and the Arbiter PUF [5, 9, 10], respectively.

As the physical features extracted by PUFs are inherent and permanent, keys derived from PUFs can be generated only when they are strictly needed. Such keys exist in the security system only for a short period and disappear at power-off. In addition, many PUFs are tamper sensitive: invasive attacks may change a PUF’s behavior notably and irreversibly, which means that any attempt to probe the PUF instance invasively risks destroying the key material it contains.

However, PUFs only address the storage problem; keys derived from PUFs still face disclosure threats. The prime threat that has long impaired PUF-based key protection is leakage caused by malware. Although the PUF-based key disappears at power-off, malicious code can still steal it while the system is running. Another threat comes from the leakage of a PUF’s Challenge Response Pairs (CRPs). Because keys are derived directly from the PUF’s response sequences, it is crucial to prevent the PUF’s CRP set from being observed completely. However, because production is divided among several parties, PUFs are always at risk of being observed by cooperating companies or malicious employees during the manufacturing process.

In this paper, we implement a cost-effective key protection scheme, which secures the system through all stages of development.

  1. We utilize a PUF to bind the chip’s hardware and firmware to its private key, so as to authenticate both the legality of the device and the integrity of the running operating system, and thereby secure the operating environment of the generated key;

  2. We take advantage of the concept of the Controlled PUF (CPUF) and use a hash module to enhance the security of the PUF’s CRP set;

  3. We adopt a module-reuse technique to make our scheme cost-effective and demonstrate the scheme’s feasibility by implementing it on Xilinx KC705 evaluation boards.

The rest of the paper is organized as follows. We first introduce CPUFs and PUF-based key generation in Sect. 2. We then describe our scheme and discuss its security in Sect. 3. In Sect. 4 we present the detailed design of the scheme as implemented on Xilinx KC705 evaluation boards. Finally, we conclude in Sect. 5.

2 Related Work

2.1 Controlled PUFs

A PUF’s CRP set contains all the secrets with which the PUF can serve as a physical root of trust. Unfortunately, almost all popular electric PUFs used in practice are so-called “Weak PUFs”. These PUFs have a limited number of CRPs, which can be completely observed at low cost. In addition, path-delay-based PUFs, e.g. the Arbiter PUF and the Ring Oscillator PUF, have been shown to be vulnerable to modeling and machine learning attacks [11,12,13]. Once a PUF’s behavior has been modeled, the key derived from it is also exposed.

To overcome this inherent defect of “Weak PUFs”, Gassend et al. proposed the Controlled PUF (CPUF) [14], which enhances a PUF’s resistance to modeling and broadens the application range of “Weak PUFs”. A CPUF is a combination of a PUF and an inseparable circuit, which usually implements an encryption or hash algorithm. This circuit governs the PUF’s input and output, which is the so-called “control”. The input control restricts the selection of challenges, which is very effective in protecting the PUF from modeling attacks that adaptively choose challenges. The output control prevents the adversary from probing the PUF, because it hides the physical output of the PUF, so that the adversary can only obtain indirect sequences derived from the PUF’s responses [15] (Fig. 1).

Fig. 1. Using control to improve a “Weak PUF”
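The input and output control described above can be illustrated with a minimal Python sketch. The function `simulated_puf` is a purely hypothetical stand-in for the physical PUF (a keyed hash over a per-device secret), and the choice of SHA-256 for both controls is an assumption made for illustration only; it is not the circuit of [14].

```python
import hashlib
import hmac


def simulated_puf(device_secret: bytes, challenge: bytes) -> bytes:
    """Hypothetical stand-in for a physical PUF: a keyed function of the challenge."""
    return hmac.new(device_secret, challenge, hashlib.sha256).digest()


def controlled_puf(device_secret: bytes, external_input: bytes) -> bytes:
    """Wrap the raw PUF with input and output control."""
    # Input control: the caller cannot pick the raw challenge directly,
    # only an input whose hash becomes the challenge.
    challenge = hashlib.sha256(external_input).digest()
    raw_response = simulated_puf(device_secret, challenge)
    # Output control: only a hash of the raw response leaves the module.
    return hashlib.sha256(raw_response).digest()
```

With this wrapper, an observer sees only pairs of external input and hashed output, never a raw challenge-response pair of the underlying PUF.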

2.2 PUF-Based Key Generation

A PUF-based key generator usually consists of two parts: a secure sketch and an entropy accumulator [16,17,18,19].

The secure sketch guarantees the exact reproduction of the key derived from the PUF’s response. It is usually implemented with an Error Correcting Code (ECC) algorithm, as Fig. 2 demonstrates. To ensure that the key is recovered correctly, the correcting capability of the ECC should be carefully chosen according to the PUF’s reliability. During the sketch process, some redundant information ω is produced. This redundant information ω is called “helper data” and helps recover the original response from a noisy one in the recovery process. In general, the helper data is stored in an NVM without any protection. In the worst case, where the helper data is revealed, the remaining entropy is estimated as H(y) − (#c − #r), where H(y) is the min-entropy of the enrolled response sequence y, #c is the code length of the ECC and #r is the number of bits of the encoded random number r.

Fig. 2. Secure sketch of code-offset construction
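A minimal sketch of the code-offset construction is given below, assuming a simple repetition code in place of a production ECC. The names `REP`, `encode`, `decode`, `sketch` and `recover` are illustrative and do not come from the references; the helper data is computed as the XOR of the enrolled response with a random codeword.

```python
import secrets

REP = 5  # hypothetical repetition factor: corrects up to 2 flipped bits per block


def encode(bits):
    """Repetition-code encoder: repeat every bit REP times."""
    return [b for b in bits for _ in range(REP)]


def decode(code):
    """Majority-vote decoder for the repetition code."""
    return [int(sum(code[i:i + REP]) > REP // 2) for i in range(0, len(code), REP)]


def sketch(y, r):
    """Enrollment: helper data w = y XOR Enc(r)."""
    c = encode(r)
    if len(c) != len(y):
        raise ValueError("response length must equal codeword length")
    return [yi ^ ci for yi, ci in zip(y, c)]


def recover(y_noisy, w):
    """Reproduction: decode y' XOR w back to r, then rebuild y = w XOR Enc(r)."""
    r = decode([yi ^ wi for yi, wi in zip(y_noisy, w)])
    return [wi ^ ci for wi, ci in zip(w, encode(r))]


# Usage: enroll a 20-bit response, then recover it from a noisy re-measurement.
y = [secrets.randbelow(2) for _ in range(20)]
r = [secrets.randbelow(2) for _ in range(4)]
w = sketch(y, r)
y_noisy = list(y)
y_noisy[3] ^= 1   # one error in block 0
y_noisy[11] ^= 1  # one error in block 2
recovered = recover(y_noisy, w)
```

Here each 5-bit block tolerates up to two bit errors; a stronger code trades a longer response for a higher tolerated error rate.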

Although a PUF’s response sequences are supposed to be random and unpredictable, they are in fact not the nearly uniform bit strings required of a secret key. Therefore, an entropy accumulator is needed to extract high-quality random keys from response sequences that possess only limited entropy per bit. A secure hash algorithm is often used as the entropy accumulator, and the construction of the PUF-based key generator is shown in Fig. 3.

Fig. 3. Construction of PUF-based key generator

3 PUF and Software Collaborative Key Protection Scheme

In this section, we describe our full key protection scheme, which enhances the PUF’s security and at the same time authenticates the integrity and legality of the firmware.

3.1 Attack Model

The attack scenario we consider is shown in Fig. 4. We assume that the NVM that stores the firmware is unprotected, while all other peripherals are authenticated, so that none of them leaks the PUF’s key material. We assume that the stored legal firmware is well behaved, i.e. it does not output the private key or key material, and that at run time it carefully checks received commands and requests to avoid buffer overflow attacks. Under these assumptions, we mainly consider adversaries in the following three phases, who either have access to the PUF instance or have the chance to tamper with the firmware.

Fig. 4. Attack model

  • When chips are just produced: The chip manufacturer has physical contact with the chip, which means that the chip’s physical features can easily be investigated by the chip manufacturer.

  • During the software development phase: At this stage, the bootloader and software have not been locked, so cooperating companies or malicious employees may have the chance to run malicious code that reads the PUF’s CRPs, or even the generated keys, and sends them out of the chip.

  • When chips have hit the market: Once the firmware and bootloader have been locked, the most likely way for adversaries to inject malicious code is to attack the unprotected NVM and break the secure boot system. We do not consider dynamic code injection such as buffer overflow attacks, since such problems are a matter of secure software development. We assume that the only way for adversaries to inject malicious code is through the NVM that stores the firmware.

3.2 Our Scheme

The architecture of the proposed scheme is shown in Fig. 5. The enhanced PUF module consists of a hash module, a secure sketch module and a conventional electric PUF instance.

Fig. 5. Schematic diagram of the proposed scheme

First, to keep the generated key safe from injected malicious code, we bind the private key to the legal firmware. Any subtle variation in the firmware code then causes key recovery to fail, so the legal key only appears when the firmware has been authenticated successfully. To achieve this goal, we condense the firmware code with the hash module. The hashed firmware sequence is then used as the challenges that invoke the PUF instance, and the corresponding response sequences serve as key generation material. To guarantee that the firmware code that is hashed is the same as the code that runs, the PUF module has direct access to the NVM that stores the firmware code, i.e. the firmware code is read directly by hardware logic without modification.

This hash module also serves as an entropy accumulator, forming a PUF-based key generator together with the secure sketch module and the PUF instance. The secure sketch guarantees the reproducibility of the generated key. Because the PUF’s responses fluctuate randomly at every measurement, the PUF itself can be regarded as a physical random source. Therefore, besides offering instance-specific material for generating the private key, the PUF also forms a random number generator (RNG) with the hash module, which serves the secure sketch in the key generation phase.

In addition, the PUF’s input challenges and output responses are all processed by the hash module, i.e. our design naturally has a CPUF structure, which strengthens the PUF’s resistance to adversaries, such as the chip manufacturer, who have the chance to read the PUF’s CRPs directly.

The working flow of our key protection scheme is described as follows:

Key Generation Phase:

  1. Directly read the firmware code (e.g. the bootloader and the software code) from the NVM;

  2. Calculate the hash value of the firmware code;

  3. Use the obtained hash value as the PUF’s challenge to invoke the PUF instance and acquire a response sequence y;

  4. Invoke the PUF instance multiple times to get a long response sequence \( y_{r} \), and hash this sequence to create a random number sequence r;

  5. Sketch the response sequence y with the obtained random number sequence r and save the generated helper data w in an NVM;

  6. Hash the response sequence y to get the final private key pk and output it.

Key Recovery Phase:

  1. Directly read the firmware code from the NVM;

  2. Calculate the hash value of the firmware code;

  3. Use the obtained hash value as the PUF’s challenge to invoke the PUF instance and acquire a noisy response sequence y′;

  4. Load the helper data w;

  5. Recover the noisy response y′ with the helper data w and acquire the recovered response sequence y″;

  6. Hash the recovered y″ to get the recovered private key pk′ and output it.
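The two phases above can be sketched end to end in Python under strong simplifying assumptions: the PUF is simulated by a keyed hash over a hypothetical per-device secret with optional random bit flips, the ECC is a 7-bit repetition code rather than the Reed-Muller code used in Sect. 4, and the random sequence r is drawn from the operating system instead of from repeated PUF measurements. All function names are illustrative.

```python
import hashlib
import hmac
import random
import secrets

REP = 7        # hypothetical repetition-code length (corrects 3 errors per block)
N_BITS = 252   # response length used in this sketch: 36 blocks of 7 bits


def encode(r):
    return [b for b in r for _ in range(REP)]


def decode(c):
    return [int(sum(c[i:i + REP]) > REP // 2) for i in range(0, len(c), REP)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def measure(device_secret, firmware, noise=0.0, rng=None):
    """Steps 1-3: hash the firmware, use the hash as the challenge,
    and read a (simulated, possibly noisy) PUF response."""
    challenge = hashlib.sha256(firmware).digest()
    stream = hmac.new(device_secret, challenge, hashlib.sha256).digest()
    bits = [(stream[i // 8] >> (i % 8)) & 1 for i in range(N_BITS)]
    rng = rng or random.Random()
    return [b ^ int(rng.random() < noise) for b in bits]


def enroll(y):
    """Generation steps 4-6 (simplified): draw r, sketch y, hash y into the key."""
    r = [secrets.randbelow(2) for _ in range(N_BITS // REP)]
    w = xor(y, encode(r))                    # helper data, stored in NVM
    key = hashlib.sha256(bytes(y)).digest()  # private key pk
    return key, w


def reproduce(y_noisy, w):
    """Recovery steps 4-6: correct y' with the helper data, then hash."""
    r = decode(xor(y_noisy, w))
    y_rec = xor(w, encode(r))
    return hashlib.sha256(bytes(y_rec)).digest()
```

In this sketch, calling `reproduce(measure(secret, firmware, noise=0.02), w)` returns the enrolled key as long as no block collects more than three bit errors, whereas a measurement taken with modified firmware or on a different simulated device yields an unrelated response and therefore a different key.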

3.3 Discussion and Analysis

Our scheme improves on conventional PUF-based key protection schemes in two main respects. On the one hand, we bind the legal firmware strictly to the generated private key in order to protect the key throughout the chip’s lifetime. During the manufacturing and software development stages, before the valid firmware is completed, no legal private key can be observed. After the software development stage, the system is sensitive to any change in the firmware code: by the properties of the hash function, even a one-bit change in the firmware code leads to a completely different hash result, i.e. a completely different challenge and finally a completely different key. Therefore, the successful recovery of the private key in turn verifies the integrity and legality of the operating system that will use it.

On the other hand, the hash function and the PUF naturally form a CPUF, which prevents adversaries from probing the PUF instance. To investigate the enhanced PUF, the adversary must either construct an input to the hash algorithm that yields a specific challenge, or derive the PUF’s response from the hashed result. Therefore, an adversary who can successfully investigate the enhanced PUF must be able to break the hash algorithm.

Furthermore, the PUF’s inherent instance-specific behavior ensures the key’s uniqueness, and the reproduction of the key in turn proves the identity of the hardware.

To ensure that the generated key pk can be successfully reproduced and possesses sufficient entropy, two requirements must be satisfied: a correctness requirement and a security requirement. Before the analysis, we build a model of the PUF’s response.

PUF’s Response Model:

Assume that every bit of the response sequence is independent. The measured response can then be modeled as

$$ Y_{meas} = Y_{ihrt} \left( {INS,CHA} \right) \oplus E $$
(1)

where INS is the PUF instance set and CHA is the challenge set. \( Y_{ihrt} \) represents the PUF instance’s inherent physical features. For every \( y_{ihrt} \left( {ins,cha} \right) \in Y_{ihrt} \), \( y_{ihrt} \left( {ins,cha} \right) \) is determined by the instance’s random characteristics and the input challenge, and it is invariant across measurements. E is the sum of the random variations (e.g. voltage and temperature fluctuations, thermal noise, etc.) during the measurement; it changes at every measurement, i.e. \( e_{i} \ne e_{j} \) for all \( i \ne j \). We assume that the PUF’s bit error rate is \( p_{e} \).

Correctness:

To ensure the key’s correct recovery, the error correcting capability of ECC algorithm in the secure sketch must be sufficiently strong. The lower bound of required error correcting capability is determined by the noise rate between the enrolled response y and reproduced response y′.

Assume that at the i-th measurement the obtained response sequence is \( y_{i} = y_{ihrt} \left( {ins,cha} \right) \oplus e_{i} \). Then, for two arbitrary measurements \( y_{i1} = y_{ihrt} \left( {ins,cha} \right) \oplus e_{i1} \) and \( y_{i2} = y_{ihrt} \left( {ins,cha} \right) \oplus e_{i2} \) with \( i1 \ne i2 \), the difference between \( y_{i1} \) and \( y_{i2} \) is \( e_{i1} \oplus e_{i2} \), each bit of which follows a Bernoulli distribution with probability \( 2p_{e} - 2p_{e}^{2} \). Therefore, the number of differing bits between \( y_{i1} \) and \( y_{i2} \) follows a binomial distribution with probability \( 2p_{e} - 2p_{e}^{2} \). Assume the ECC’s parameters are \( \left[ {n, k, t} \right] \), i.e. it contains \( 2^{k} \) different codewords of length n bits, any two of which are at least 2t + 1 bits apart, so that up to t bit errors per codeword can be corrected. Correctness then requires that:

$$ \mathop \sum \limits_{i = 0}^{t} f_{bino} \left( {i,n,2p_{e} - 2p_{e}^{2} } \right) \ge 1-p_{fail} . $$
(2)

where \( f_{bino} \left( {i,n,p} \right) = \left( {\begin{array}{*{20}c} n \\ i \\ \end{array} } \right)p^{i} \left( {1 - p} \right)^{n - i} \) and \( p_{fail} \) is the permitted failure probability for the key’s recovery, usually \( p_{fail} = 10^{-6} \) in industry.
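The per-bit difference probability used in the correctness requirement follows from the independence assumption. Writing \( e_{i1}^{(j)} \) and \( e_{i2}^{(j)} \) for the j-th error bits of the two measurements (the superscript notation is introduced here only for illustration), the two bits differ exactly when one of them is flipped and the other is not:

$$ P\left( {e_{i1}^{(j)} \oplus e_{i2}^{(j)} = 1} \right) = p_{e} \left( {1 - p_{e} } \right) + \left( {1 - p_{e} } \right)p_{e} = 2p_{e} - 2p_{e}^{2} . $$

As a numerical illustration, the bit error rate \( p_{e} = 0.0235 \) reported in Sect. 4.2 gives \( 2 \cdot 0.0235 - 2 \cdot 0.0235^{2} \approx 0.0459 \), i.e. slightly less than twice the single-measurement error rate.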

Security:

As we have assumed that every bit in a binary response sequence \( r \in \left\{ {0,1} \right\}^{n} \) is independent, the min-entropy calculated by formula (3) gives a worst-case lower bound on the randomness of the responses.

$$ H_{\infty } \left( r \right) = \mathop \sum \limits_{i = 1}^{n} - \log_{2} \left( {{ \hbox{max} }\left\{ {P\left( {r^{i} = 1} \right),P\left( {r^{i} = 0} \right)} \right\}} \right) . $$
(3)

According to Sect. 2.2, when the helper data w is disclosed, the min-entropy remaining in the recovered response sequence y″ is \( H_{\infty } \left( y \right) + H_{\infty } \left( r \right) - \# w \). Let \( l_{key} \) denote the length of the generated key. For the key to possess sufficient randomness, \( H_{\infty } \left( {y^{\prime \prime } } \right) \) must be at least \( l_{key} \), i.e.

$$ H_{\infty } \left( y \right) + H_{\infty } \left( r \right) - \# w \ge l_{key} . $$
(4)

Following the description in Sect. 3.2, the bit probabilities are estimated as:

$$ {\text{p}}\left( {y^{i} = 1} \right) = \frac{{\mathop \sum \nolimits_{k = 1}^{{N_{puf} }} \mathop \sum \nolimits_{j = 1}^{{N_{chan} }} \left( {y^{i} \left( {ins_{k} ,cha_{j} } \right) = = 1} \right)}}{{N_{puf} N_{chan} }} . $$
(5)
$$ {\text{p}}\left( {r^{i} = 1} \right) = \frac{{\mathop \sum \nolimits_{k = 1}^{{N_{meas} }} (r_{k}^{i} \left( {ins,cha} \right) = = 1)}}{{N_{meas} }} . $$
(6)

\( P\left( {r^{i} = 1} \right) \) and \( P\left( {r^{i} = 0} \right) \) are the probabilities that the i-th bit of the response equals 1 and 0, respectively. Substituting them into formula (3), we can calculate \( H_{\infty } \left( y \right) \) and \( H_{\infty } \left( r \right) \). From formulas (5) and (6), we can see that the randomness of the response sequence y, which is used to generate the private key, comes from the PUF itself, i.e. \( H_{\infty } \left( y \right) = H_{\infty } \left( {y_{ihrt} \left( {ins,cha} \right)} \right) \). The randomness of the response r, which is used to generate random numbers, comes from the random factors across repeated measurements, i.e. \( H_{\infty } \left( r \right) = H_{\infty } \left( e \right) \), and the average bit error rate is \( p_{e} = \sum\nolimits_{i = 1}^{n} {{ \hbox{min} }\left\{ {P(r^{i} = 1), P\left( {r^{i} = 0} \right)} \right\}/n} \).

Define entropy density:

$$ \rho \left( r \right) = \frac{{H_{\infty } \left( r \right)}}{\# r} . $$
(7)
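Formulas (3) and (5)–(7) can be evaluated with a few lines of Python. The sketch below assumes that the measured responses are available as equal-length lists of 0/1 integers; the function names are illustrative.

```python
import math


def bit_probabilities(samples):
    """Estimate P(bit i = 1) as the fraction of ones at position i."""
    n = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(n)]


def min_entropy(p_one):
    """Formula (3): sum over bits of -log2(max{P(1), P(0)})."""
    return sum(-math.log2(max(p, 1 - p)) for p in p_one)


def entropy_density(p_one):
    """Formula (7): min-entropy per bit."""
    return min_entropy(p_one) / len(p_one)
```

For formula (5) the samples are taken across PUF instances and challenges, while for formula (6) they are repeated measurements of one instance under one challenge. A perfectly balanced bit contributes one bit of min-entropy, and a constant bit contributes none.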

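Let \( l_{y} \) and \( l_{r} \) represent the lengths of \( y \) and \( y_{r} \) respectively. Then \( H_{\infty } \left( y \right) = l_{y} \rho \left( y \right) \) and \( H_{\infty } \left( {y_{r} } \right) = l_{r} \rho \left( {y_{r} } \right) \), and \( l_{y} \) and \( l_{r} \) should satisfy the inequalities: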

$$ \frac{{l_{y} }}{n}\left[ {n\rho \left( y \right) + k - n} \right] \ge l_{key} . $$
(8)

and

$$ l_{r} \rho \left( {y_{r} } \right) \ge \frac{{l_{y} }}{n}k. $$
(9)
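Inequality (8) can be read as formula (4) specialized to a block-wise code-offset sketch. The following step-by-step reading is our interpretation under two assumptions: the response y is split into \( l_{y} /n \) blocks whose helper data has the same length as y, so that \( \# w = l_{y} \), and the hashed random sequence r contributes k full-entropy bits per block, so that \( H_{\infty } \left( r \right) = \frac{{l_{y} }}{n}k \). Substituting into (4):

$$ l_{y} \rho \left( y \right) + \frac{{l_{y} }}{n}k - l_{y} \ge l_{key} \;\; \Longleftrightarrow \;\; \frac{{l_{y} }}{n}\left[ {n\rho \left( y \right) + k - n} \right] \ge l_{key} . $$

Under the same reading, inequality (9) states that the raw sequence \( y_{r} \) must carry at least as much min-entropy, \( l_{r} \rho \left( {y_{r} } \right) \), as the \( \frac{{l_{y} }}{n}k \) random bits that the sketch consumes.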

4 Implementation

To verify our scheme’s feasibility, we implement it on Xilinx KC705 FPGA boards.

4.1 Experiment Architecture

The KC705 board carries a Quad SPI flash memory that provides 128 Mb of non-volatile storage. This flash is directly connected to the board’s FPGA. When the Quad SPI flash is used for configuring the FPGA, the flash start-up configuration file (mcs file), which contains both the hardware configuration file (bit file) and the software executable file (elf file), is read from the flash to configure the hardware, and the contained elf file then runs on the MicroBlaze. Therefore, the mcs file can be regarded as the system’s firmware.

The architecture of our experimental system is shown in Fig. 6. It consists of three parts: the enhanced PUF module, which includes an RO PUF instance, a SHA2-256 hash module and an ECC module that adopts a Reed-Muller code; the MicroBlaze, a soft microprocessor core designed for Xilinx FPGAs; and a Quad SPI flash with an SPI flash controller. Detailed workflows are elaborated in Sect. 4.2, after all the parameters have been determined according to the PUF’s actual properties.

Fig. 6. The architecture and workflows of our key protection scheme

4.2 PUF Design and Parameter Determination

We choose an RO PUF to implement our scheme. In particular, we adopt the FROPUF proposed in [20]. This PUF exploits the configurable propagation delay of Look-Up Tables to improve the RO PUF’s hardware efficiency. We implement 1024 such ROs on each KC705 board, and each RO pair can generate a 16-bit response.

To determine \( l_{y} \), \( l_{r} \) and the ECC’s correction capability, we randomly choose 5000 challenges to investigate the PUF’s properties. From the obtained data, we find that the PUF’s bit error rate is \( p_{e} = 0.0235 \), and that the entropy densities of the PUF response sequences used for key generation and for random number generation are \( \rho \left( y \right) = 0.9839 \) and \( \rho \left( {y_{r} } \right) = 0.0376 \) respectively, according to formulas (5)–(7).

To generate 256-bit keys, i.e. \( l_{key} = 256 \), we choose SHA2-256 as the hash module. The Reed-Muller code RM(1, m) is a binary \( \left[ {2^{m} , m + 1, 2^{m - 1} } \right] \) linear block code. Substituting \( p_{e} \) and the above parameters into formula (2), we get m ≥ 6. Therefore, we adopt RM(1, 6), whose code length is \( n = 64 \) bits and which encodes a 7-bit random sequence in every block, i.e. k = 7. Substituting n, k, \( l_{key} \) and ρ(y) into Eq. (8), we get \( l_{y} \ge 2744.57 \). Since \( l_{y} \) must be a multiple of n, we set \( l_{y} = 2816 \), so that \( l_{y} /n = 44 \). Substituting these and the other related parameters into inequality (9), we find that the length of the random number sequence is \( l_{y} \cdot k/n = 308 \) bits and that \( l_{r} \ge 8191.49 \); we set \( l_{r} = 8192 \).
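The parameter choices above can be checked numerically. The sketch below evaluates formula (2) for RM(1, m), assuming that a code with minimum distance \( 2^{m-1} \) corrects up to \( \lfloor (2^{m-1} - 1)/2 \rfloor \) errors per block (an assumption about the decoder, which is not specified here), and then evaluates the bounds from (8) and (9). Function names are illustrative.

```python
import math


def block_success_prob(n, t, p_e):
    """Left-hand side of formula (2): P(at most t differing bits in n)."""
    p = 2 * p_e - 2 * p_e ** 2
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(t + 1))


def smallest_rm_order(p_e, p_fail=1e-6, m_max=10):
    """Smallest m for which RM(1, m) meets formula (2), or None."""
    for m in range(1, m_max + 1):
        n = 2 ** m
        t = (2 ** (m - 1) - 1) // 2  # assumed bounded-distance decoding radius
        if block_success_prob(n, t, p_e) >= 1 - p_fail:
            return m
    return None


def min_response_length(l_key, n, k, rho_y):
    """Lower bound on l_y from inequality (8)."""
    return l_key * n / (n * rho_y + k - n)


def min_raw_random_length(l_y, n, k, rho_yr):
    """Lower bound on l_r from inequality (9)."""
    return (l_y // n) * k / rho_yr


m = smallest_rm_order(0.0235)
l_y_bound = min_response_length(256, 64, 7, 0.9839)
l_r_bound = min_raw_random_length(2816, 64, 7, 0.0376)
```

With the measured values \( p_{e} = 0.0235 \), ρ(y) = 0.9839 and ρ(y r) = 0.0376, this reproduces m = 6 and the bounds of roughly 2744.57 and 8191.49 bits quoted in the text.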

After implementing all the hardware designs, we find that the size of the generated mcs file is about 30.74 Mb. As we need a 2816-bit response to generate the private key and every 16-bit response corresponds to a 20-bit challenge, we need a challenge of 2816 × 20/16 = 3520 bits in total. Taking the challenge length \( l_{chal} \) to be the smallest multiple of 256 that is not less than 3520, we get \( l_{chal} = 3584 \) and \( l_{chal} /256 = 14 \).

The working process of the enhanced PUF module is as follows:

Key Generation Phase:

  1. Receive the key generation command from the CPU, then read 35 Mb of data from the SPI flash;

  2. Divide the read data into 14 parts, hash each 2.5 Mb part in sequence to get a 3584-bit sequence in total, and cut out 3520 bits as the challenge sequence;

  3. Invoke the PUF instance with each 20-bit challenge in succession to obtain a 2816-bit response sequence y;

  4. Divide the 1024 ROs into 512 pairs, read the responses of all the RO pairs to form an 8192-bit response sequence \( y_{r} \), then divide \( y_{r} \) into four parts, hash every part to totally get 512 bits random number, and cut out 308 bits as the random number sequence r;

  5. Use the ECC module to encode the random sequence r 7 bits at a time, then sketch the response sequence y with the encoded r and output the helper data;

  6. Hash the response sequence y to acquire a 256-bit key and output it.

Key Recovery Phase:

  1. Receive the recovery command and the helper data from the CPU, then read 35 Mb of data from the SPI flash;

  2. Divide the read data into 14 parts, hash each 2.5 Mb part in sequence to get a 3584-bit sequence in total, and cut out 3520 bits as the challenge sequence;

  3. Invoke the PUF instance with each 20-bit challenge in succession to obtain a 2816-bit response sequence y′;

  4. Send the helper data to the ECC module;

  5. Use the ECC module to recover the noisy response sequence y′ with the helper data and get the recovered response sequence y″;

  6. Hash the recovered response sequence y″ to acquire a 256-bit key and output it.
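Steps 1 to 3 of both phases turn the firmware image into a sequence of 20-bit challenges. A software sketch of this derivation is shown below, assuming that the image is split into 14 equal-sized parts (the hardware uses fixed 2.5 Mb parts) and that `hashlib.sha256` stands in for the SHA2-256 module. The function name and default arguments are illustrative.

```python
import hashlib


def derive_challenges(firmware: bytes, parts: int = 14,
                      total_bits: int = 3520, chal_bits: int = 20):
    """Hash the firmware part by part and cut the digests into challenges."""
    part_len = -(-len(firmware) // parts)  # ceiling division
    digest = b"".join(
        hashlib.sha256(firmware[i * part_len:(i + 1) * part_len]).digest()
        for i in range(parts)
    )  # 14 x 256 = 3584 bits
    bits = bin(int.from_bytes(digest, "big"))[2:].zfill(len(digest) * 8)
    bits = bits[:total_bits]  # keep 3520 bits
    return [int(bits[i:i + chal_bits], 2) for i in range(0, total_bits, chal_bits)]
```

With the default arguments the function returns 3520/20 = 176 challenges, and each challenge yields a 16-bit response, giving the 2816-bit sequence y. Because every part is hashed separately, a change in one part of the image alters only the challenges cut from that part’s digest, which is still enough to change y and hence the key.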

4.3 Experiment and Result

We generate 128 different elf files with the Xilinx SDK. These 128 elf files have roughly the same function but differ in detail. We then pack each elf file with the bit file to generate 128 mcs files. For comparison, we load these mcs files separately into the onboard SPI flash of two KC705 evaluation boards. Finally, we send commands from the host computer to make the onboard system generate a key and recover it 100 times, and we record the generation and recovery results.

In our experiments, all 256 generated keys (128 on each board) are recovered successfully in every trial. The distributions of the Hamming distances between generated keys are shown in Fig. 7. We mainly compare keys generated from the same mcs file on different boards, and keys generated on the same board from different mcs files. From the figure, we can conclude that a change in either the hardware or the firmware leads to a dramatic change in the generated key.

Fig. 7. Distributions of generated keys’ Hamming distance
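The Hamming-distance comparison can be reproduced offline from recorded keys with a short Python helper. The sketch assumes that the keys are available as 32-byte strings; the function names are illustrative.

```python
def hamming_distance(key_a: bytes, key_b: bytes) -> int:
    """Number of bit positions in which two equal-length keys differ."""
    if len(key_a) != len(key_b):
        raise ValueError("keys must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(key_a, key_b))


def pairwise_distances(keys):
    """Hamming distances of all unordered key pairs."""
    return [
        hamming_distance(keys[i], keys[j])
        for i in range(len(keys))
        for j in range(i + 1, len(keys))
    ]
```

For two independent, uniformly random 256-bit keys the expected Hamming distance is 128 bits, so distances clustered around 128 indicate that the compared keys are effectively unrelated.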

5 Conclusion

To protect the PUF-generated key throughout the chip’s lifetime, we propose a novel key protection scheme in which the chip’s firmware and the embedded PUF collaboratively generate the chip’s exclusive key. Before the valid firmware is completed, no legal key can be observed, so our scheme protects the system during the manufacturing and software development stages. After the software development stage, the system is sensitive to any change in the firmware code. The successful recovery of the legal key in turn verifies the legality of the device and the firmware, and the hash module naturally forms a CPUF with the PUF instance, which further boosts the PUF’s resistance to attacks.