Deep PUF: A Highly Reliable DRAM PUF-Based Authentication for IoT Networks Using Deep Convolutional Neural Networks

Traditional authentication techniques, such as cryptographic solutions, are vulnerable to various attacks on session keys and data. Physical unclonable functions (PUFs), such as dynamic random access memory (DRAM)-based PUFs, have been introduced as promising security blocks to enable cryptography and authentication services. However, PUFs are often sensitive to internal and external noise, which causes reliability issues. The requirement for additional robustness and reliability leads to the involvement of error-reduction methods, such as error correction codes (ECCs) and pre-selection schemes, that incur considerable extra overheads. In this paper, we propose deep PUF: a deep convolutional neural network (CNN)-based scheme using latency-based DRAM PUFs without the need for any additional error correction technique. The proposed framework provides a higher number of challenge-response pairs (CRPs) by eliminating the pre-selection and filtering mechanisms. The entire complexity of device identification is moved to the server side, which enables the authentication of resource-constrained nodes. The experimental results from a 1 Gb DDR3 module show that the responses under varying conditions can be classified with at least a 94.9% accuracy rate using a CNN. After applying the proposed authentication steps to the classification results, we show that the probability of identification error can be drastically reduced, leading to a highly reliable authentication.


Introduction
A large number of modern cryptographic protocols are based on physical unclonable function (PUF) implementations, which are used for key agreement and device authentication [1][2][3][4][5]. Memory-based PUFs are popular among implemented PUFs because memory is a major component in many electronic devices and requires minimal (or no) additional circuitry for PUF operation [6,7]. Dynamic random access memory (DRAM)-based PUFs provide a large address space and utilize several controllable properties to generate unique identifiers for identification and authentication purposes [8][9][10][11]. Recently, researchers have proposed DRAM latency-based PUFs, which can provide random device signatures by exploiting the timing parameters (e.g., activation time (t RCD) and precharge time (t RP) [12,13]). Reliability and robustness are two fundamental properties of a desirable PUF, reflecting the independence of the output responses from internal/external noise and ambient conditions. Most existing PUFs use post-processing techniques, which require helper data algorithms and complex error correction codes (ECCs), to extract reliable responses and conduct a proper authentication procedure [8,14]. However, these methods cause significant hardware/computational overheads, require additional non-volatile memory (NVM) to store helper data, and have their own security defects [15][16][17]. The main contributions of this work are as follows:

•
We propose deep PUF as a two-stage mechanism, including multi-label classification and challenge verification, to provide a robust and lightweight device authentication without error-correcting codes and other pre-filtering methods.

•
We implement two types of latency-based proposals (t RCD and t RP PUFs) as fast runtime-accessible DRAM PUFs and analyze their characteristics to train the CNN.

•

Finally, we develop a CNN model using experimental data and analyze the robustness and security of the proposed deep PUF.
The remainder of this paper is organized as follows: Section 2 explains the background and motivations. Section 3 presents the proposed deep PUF. Sections 4 and 5 demonstrate DRAM experiments and CNN development results, respectively. Section 6 discusses deep PUF performance and security, and Section 7 concludes the paper and mentions future directions.

DRAM Operation and Timing Parameters
The hierarchy of a DRAM device organization is presented in Figure 1. As shown in Figure 1a, a DRAM cell is the lowest level of the DRAM structure, which stores one bit of data based on the charge of its capacitor. A cell encodes the value "1" when the capacitor is fully charged and the value "0" when it is fully discharged. The cells are written to or read from using the access transistors, which are enabled by the wordline, while the bitline connects the cells of each column. Figure 1b shows how these components form a two-dimensional subarray of a DRAM module. The combination of numerous subarrays forms a bank, and multiple banks are combined to organize a DRAM chip, as shown in Figure 1c. To activate a row, the row decoder enables the corresponding wordline, and then the stored information is transferred to the sense amplifiers. When the row is accessed, the data can be read/written using rd/wr commands [23]. Each read operation consists of multiple states, and the memory controller produces different commands and manages the states (see Figure 2). In the precharge state, all bitlines are precharged, and other open wordlines are deactivated after using the precharge (PRE) command. Before any rd/wr command can be generated, the corresponding wordline must be opened via the activation (ACT) command. Next, rd/wr commands can be sent to the opened row, subject to a minimum required time (activation time, t RCD). For a subsequent read/write operation, it is necessary to issue a PRE command to deactivate the opened row. The next row will be accessible after a specified time named the precharge time (t RP). There are also other timing parameters, such as t RAS and t CL, that are used by the memory controller to manage DRAM operations [24][25][26][27].
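The command sequence described above (PRE, then ACT, then RD, gated by t RP and t RCD) can be sketched as a toy timing model. The command names follow the text; the nanosecond values and the `read_row` helper are illustrative assumptions, not taken from any specific datasheet:

```python
# Minimal sketch of the DRAM read-command sequence described above.
# Timing values are hypothetical; real modules specify them in datasheets.
DEFAULT_TIMINGS = {"tRCD": 13.75, "tRP": 13.75}  # nanoseconds (assumed)

def read_row(row, timings=DEFAULT_TIMINGS):
    """Return the ordered (command, wait_ns) steps to read one row."""
    return [
        ("PRE", timings["tRP"]),   # precharge bitlines, close any open row
        ("ACT", timings["tRCD"]),  # open the wordline; wait activation time
        ("RD", 0.0),               # read is legal only after tRCD elapses
    ]

steps = read_row(row=0)
assert [cmd for cmd, _ in steps] == ["PRE", "ACT", "RD"]
```

Latency-based PUFs, discussed later, deliberately shorten these wait values so that some cells fail to respond in time.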

DRAM PUF Technologies
In this section, we explain the existing DRAM-based PUFs and their strategies to enhance robustness and reliability.
A variety of works have discussed the concept of using DRAM, as the major hardware component of most modern systems, to organize an intrinsic PUF. DRAM cells do not hold zero values at start-up; the capacitor in each cell is initialized to a random value determined by manufacturing variations, which can be utilized to provide device signatures and configure a DRAM PUF [22]. The requirement of power cycles and of a waiting period before response generation to extract unbiased signatures are the most important challenges of this technique. The retention-based PUF is another well-studied mechanism, which generates random and unique DRAM PUF patterns by suspending the refresh operation for a period of time (waiting time). Retention-based PUFs require a long period of time to extract a sufficient number of failures. This method exploits the pre-selection of blocks and helper data algorithms to extract robust responses for key generation and authentication purposes. Therefore, retention-based PUFs incur significant time, hardware, and storage overheads.

Latency-Based DRAM PUFs
As mentioned in the DRAM organization, there are specified timing constraints to schedule DRAM operations correctly. Altering these parameters can affect the reliability of DRAM information and result in data leakage. Latency-based structures exploit this feature to construct a PUF. The t RCD -based PUF is formed by reducing the minimum time period required to activate the rows to be accessed [12]. This structure applies a filtering mechanism across different iterations to eliminate unstable bits and enhance the PUF's robustness and repeatability. A separate DRAM rank is needed to count and store the latency failures of each iteration. The evaluation time of the PUF responses is noticeably increased due to the filtering phase. Even so, this mechanism alone is not adequate, and ECC approaches are still required to realize a reliable PUF.
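The filtering idea above can be sketched as follows: only bit positions that fail in every iteration are kept as the stable response. This is a simplified sketch of the mechanism in [12]; the function name and the bit-string encoding ('1' = latency-induced failure) are our own illustrative choices, and the real scheme additionally counts per-iteration failures in a separate DRAM rank:

```python
def stable_failure_mask(measurements):
    """Keep only the bit positions that fail in every iteration.

    `measurements` is a list of equal-length bit strings, where '1'
    marks a latency-induced failure at that position.  Sketch of the
    pre-selection/filtering idea only.
    """
    n = len(measurements[0])
    return "".join(
        "1" if all(m[i] == "1" for m in measurements) else "0"
        for i in range(n)
    )

# Bits 0 and 3 fail in all three reads; bits 1 and 2 are unstable
# and are masked out, shrinking the usable CRP space.
mask = stable_failure_mask(["1101", "1001", "1011"])
assert mask == "1001"
```

The example also shows why such filtering contracts the CRP space: every unstable position is permanently disqualified.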
Another technique, proposed in [13], is based on t RP reduction and disrupts the precharge procedure to obtain erroneous data. The t RP -based technique categorizes the cells on the basis of their dependency on input patterns and measurements; only the independent cells are qualified for use. A specific selection algorithm is then designed to choose the acceptable cells and improve the robustness of the PUF. In such a scenario, the CRP space is noticeably contracted, and the effects of environmental variations are not considered.

Post-Processing and Pre-Selection Algorithms
Most existing PUF technologies, including DRAM PUFs, utilize helper data algorithms and ECCs to improve reliability and robustness [14,19,20,[28][29][30]. However, using helper data may leak some information about the secret keys, and ECC circuits cause significant hardware and software overheads. Temporal majority voting (TMV) is one of the simplest ECCs: a repetition code based on sampling PUF cells multiple times and selecting the majority sample. Bose-Chaudhuri-Hocquenghem (BCH) codes are another popular class of ECC, usually utilized as a final error-correction technique. It is inefficient to use BCH codes alone when the bit-error rate (BER) of the native responses is high. Thus, due to the varying characteristics and unpredictable behavior of DRAM cells across measurements under normal/unstable ambient conditions, some additional stages must be applied to the raw PUF responses before using ECC mechanisms.
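The TMV repetition code described above can be sketched in a few lines (the function name and bit-string encoding are our own; real implementations operate on hardware registers rather than strings):

```python
from collections import Counter

def temporal_majority_vote(samples):
    """Temporal majority voting over repeated PUF readouts.

    Each sample is a bit string read from the same cells; the output
    keeps, per position, the value seen in the majority of samples.
    Sketch of the repetition-code style ECC mentioned above.
    """
    n = len(samples[0])
    out = []
    for i in range(n):
        votes = Counter(s[i] for s in samples)
        out.append(votes.most_common(1)[0][0])
    return "".join(out)

# A transient flip in the second readout is outvoted by the other two.
assert temporal_majority_vote(["0110", "0100", "0110"]) == "0110"
```

An odd number of readouts avoids ties; the cost grows linearly with the number of repeated measurements, which is why TMV is usually combined with a stronger final ECC such as BCH.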
As mentioned above, current DRAM PUFs, including latency-based ones, mask unstable and unsuitable cells using filtering processes or selection algorithms [12,13]. However, these solutions limit the CRP space by disqualifying the unacceptable cells and cause extra time and implementation overheads. Additionally, they need specific algorithms to determine the accurate location (address information) of selected cells to indicate them in the challenges [31,32].

Motivation
As stated in Section 2.2, the current error-reduction solutions for DRAM PUFs incur additional costs and cannot be efficiently implemented, particularly when the PUF is embedded in a resource-restricted IoT device. Adding considerable hardware overheads, increasing the implementation complexity on the device side, and limiting the CRP space of the DRAM PUF are among the most important disadvantages of existing error-reduction methods and the key motivations of this work. Last but not least, it has been shown that most error-reduction schemes, such as fuzzy extractors, have their own security defects [14]. The deep PUF mechanism eliminates the need for error-reduction strategies and their overheads by using a two-stage deep CNN-based mechanism for lightweight authentication.

Proposed Deep PUF
Our proposed method substantially focuses on an authentication technique that is suitable for device identification with no extra overheads for resource-constrained nodes in an IoT network. In this work, we first generate DRAM PUF responses for different challenges over multiple measurements under various temperature conditions. We create a CRP database including each challenge with its corresponding responses (Section 4). Next, we take advantage of CNNs to extract the shared features of the generated responses as well as their failure patterns. In this way, the developed CNN learns the shared features of the several responses generated for each challenge and is then able to recognize the corresponding responses produced under all operating conditions. This method helps us address the error-reduction issues and meet the reliability requirements of an authentication technique. Deep PUF can be used as a standalone security mechanism or as part of a multi-factor authentication (MFA) scheme. The proposed authentication process is generally divided into two major stages: (i) an enrollment phase and (ii) an authentication phase.

Enrollment Phase
In the first stage, the characterization of DRAM responses based on a particular DRAM PUF technology (e.g., t RCD -based, t RP -based, etc.) over multiple iterations and under various ambient conditions is analyzed. Then, considering the features necessary to develop a successful classifier, the challenges are selected; they contain the address of memory blocks and the input data patterns. The output responses as well as the failure bits for each challenge are categorized without any modification (see Figure 3a). The number of measurements needed to obtain the comprehensive features of all possible responses for each challenge can be effectively set based on an intrinsic robustness evaluation.
To organize the training dataset for the CNN, we transform the binary responses into two-dimensional arrays of unsigned integers and finally gray-scale images via the visualization phase (see Figure 3b). Therefore, the required dataset is constructed during the first and second steps. After creating the dataset of the PUF device, it is necessary to extract the chief and common features of the samples (responses) of each class (challenge) by training a CNN. The CNN development procedure, including the hyper-parameter settings, should be optimized in consideration of the most important and influential PUF properties:
• Robustness: determines the effects of different operating conditions on output responses. This property affects the similarity of samples in a single class and the accuracy of the classification results. The robustness of a DRAM PUF can be calculated using intra-Hamming distance (HD) or intra-Jaccard index values.
• Uniqueness: a sufficient difference between two responses from two distinct DRAM blocks results in uniqueness. This factor shows the difference of samples belonging to separate classes and can be determined by computing inter-class HD.
Figure 3c depicts the developed deep CNN, which is trained on the generated dataset and learns the failure behavior under various measurements.
We recognize two major variables which significantly affect the classification accuracy and, finally, the authentication performance:
• Stability of operating conditions: locating the PUF device in a stable ambiance, in which the variation of conditions (e.g., temperature, voltage) is not appreciable, causes more consistency inside each class and results in better accuracy. Due to the PUF's sensitivity to environmental conditions, in an environment with varying temperatures, the number of bit failures in each measurement and the way the failures are distributed may cause samples to be far different than usual. In this case, deep PUF requires involving the responses of all possible temperatures to extract the entire set of failure features, thereby leading to an accurate classification.
• Variety of blocks and input patterns: one scenario is to organize the classes using only a single memory block and writing different patterns into it as the challenges; the other is to exploit various blocks. If only one memory block is utilized to perform the PUF, it is necessary to provide the challenges based on different input data patterns. However, in the case of using multiple blocks, the challenges can be configured with the same data for all blocks.

Authentication Phase
Figure 4 shows the authentication procedure as well as the major parts of the server and the device. In each authentication request from the PUF device, the server sends one of the challenges (prearranged class labels) to the device to configure a particular DRAM block as a PUF. Next, the device generates the corresponding response and sends it to the trusted server. Then, the server authenticates the device in two steps:
1. The received raw bits are classified using the CNN structured during the enrollment phase.
2. The detected label is compared with the original challenge.
The device will be authenticated if the class label in which the response was categorized matches the original challenge. Otherwise, if the class of the received response and the sent challenge are different, the authentication will be discarded, and the server will reject the device's request to exchange data. In this structure, the burden of device authentication is completely moved to the server, which has almost no resource limitations. Therefore, deep PUF enables an authentication process for resource-constrained nodes without any extra implementation overheads.
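The server-side accept/reject logic above can be sketched as follows. The `classify` callable stands in for the CNN trained during enrollment; the parity-based toy classifier exists only to make the sketch runnable and is in no way the paper's model:

```python
def authenticate(challenge_label, response_bits, classify):
    """Server-side verification sketch of the two-step check above.

    The device passes only when the label predicted for its raw
    response bits equals the challenge label that was sent.
    """
    predicted = classify(response_bits)
    return predicted == challenge_label

# Toy stand-in for the enrolled CNN: label = parity of failure bits.
toy_cnn = lambda bits: bits.count("1") % 2

assert authenticate(1, "0111", toy_cnn) is True   # 3 failures -> label 1
assert authenticate(0, "0111", toy_cnn) is False  # label mismatch -> reject
```

Note that the device side only produces raw bits; all classification cost sits in `classify` on the server, mirroring the paper's placement of complexity.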

DRAM Experiments and Observations
In this section, we present DRAM PUF implementation results and examine the behavior of responses generated using latency-based technologies (i.e., t RCD and t RP PUFs). The experimental evaluations are conducted using a DDR3 DRAM module. Figure 5 shows our experimental setup. We examine the characteristics of both latency PUFs to make a better decision considering the CNN requirements. We read DRAM values under different conditions to evaluate the robustness and uniqueness of DRAM blocks.

Table 1 shows the parameter values of our experiments, which are the same for both evaluated structures. To measure the robustness of each PUF, we extract multiple responses over several iterations at varying temperatures (25-55 °C). Figure 6a shows the intra-Jaccard index of PUF responses using both the t RCD and t RP reduction-based methods. The intra-Jaccard index determines the similarity of two PUF responses for the same challenge. It is calculated as |R1 ∩ R2| / |R1 ∪ R2| for two sets of responses, where |R1 ∩ R2| indicates the number of shared failures and |R1 ∪ R2| is the total number of failures in R1 and R2. A Jaccard index close to 1 indicates more similarity between R1 and R2. In this work, this metric is used to check the repeatability and robustness of the DRAM PUF responses. These results are based on average values gathered by checking multiple samples at each temperature. We have also tested the sensitivity of the PUF responses to temperature variations via intra-HD calculations; the results are shown in Figure 6b, indicating the reliability of the DRAM PUF responses and also the similarity of the samples creating a class.

Another principal factor affecting the performance of deep PUF is uniqueness, which measures the difference between the failure distributions in two different memory blocks. We have analyzed this factor by comparing multiple samples belonging to various blocks of the DRAM module using inter-HD. Table 2 presents the average uniqueness for the t RCD and t RP -based methods considering the average number of bit failures in each block.

After analyzing the robustness and uniqueness of both the t RCD and t RP PUFs, we find that they have desirable characteristics for developing a classifier and organizing deep PUF. These characteristics include the similarity among the samples within each class and the variety among the samples from different classes. Table 3 summarizes the generic HD values for stable and unstable conditions, which are the two possible scenarios during a deep PUF configuration. We focus on the t RCD -based PUF, which comparatively has more intra-class consistency. Table 3. Inter-class HD and intra-class HD, considering environmental conditions.
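The two similarity metrics used in this section can be written out directly (the bit strings below are illustrative, not measured data; responses are encoded with '1' marking a failure position):

```python
def intra_jaccard(r1, r2):
    """Jaccard index |R1 ∩ R2| / |R1 ∪ R2| over failure-bit positions."""
    f1 = {i for i, b in enumerate(r1) if b == "1"}
    f2 = {i for i, b in enumerate(r2) if b == "1"}
    union = f1 | f2
    return len(f1 & f2) / len(union) if union else 1.0

def hamming_distance(r1, r2):
    """Fractional Hamming distance between two equal-length responses."""
    return sum(a != b for a, b in zip(r1, r2)) / len(r1)

assert intra_jaccard("1100", "1100") == 1.0   # identical failure sets
assert intra_jaccard("1100", "1010") == 1/3   # one shared failure of three
assert hamming_distance("1100", "1010") == 0.5
```

High intra-Jaccard (or low intra-HD) across repeated reads indicates robustness; high inter-HD between blocks indicates uniqueness.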

Dataset Creation
Each DRAM PUF is configured by sending N challenges to the device and generating M responses for each of them. The CNN is trained on the resulting N × M responses and organized to classify the challenges. Each challenge is defined as a class label, and the corresponding responses are the class samples. The value of M can be adjusted according to the total number of classes and the environmental conditions, which significantly determine the consistency of the samples in each class. In order to examine the effect of the important variables, including the stability of ambient conditions and the variety of input patterns (see Section 3), we generate four datasets considering the following scenarios:

1.
The same input pattern (all "1"s) is used to characterize all blocks and the operating conditions are stable (room temperature and nominal voltage).

2.
Different input patterns (0x00, 0x01 . . . 0xFF) are used for different blocks and the conditions are stable.

3.
The same input pattern is used to characterize all blocks and the operating conditions are unstable.

4.
Different input patterns are used for different blocks and the conditions are unstable.
The inputs of the network are visualized DRAM data, converted to gray-scale images as demonstrated in Section 3. The samples of the different classes are randomly shuffled, and each dataset is divided into 80% training and 20% testing data. Table 4 lists the main features of the generated datasets.
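The visualization and split steps above can be sketched as follows. The 0/255 gray-scale mapping, the image width, and the fixed shuffle seed are illustrative assumptions; the paper does not specify these details:

```python
import random

def bits_to_image(bits, width):
    """Pack a binary response into a 2-D gray-scale array (0 or 255)."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return [[255 if b == "1" else 0 for b in row] for row in rows]

def split_dataset(samples, train_frac=0.8, seed=0):
    """Shuffle labelled samples and split them 80%/20% as described above."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

img = bits_to_image("10110100", width=4)
assert img == [[255, 0, 255, 255], [0, 255, 0, 0]]

train, test = split_dataset([("resp%d" % i, i % 4) for i in range(10)])
assert len(train) == 8 and len(test) == 2
```

In practice the arrays would be unsigned 8-bit tensors fed to the CNN; the list-of-lists form above keeps the sketch dependency-free.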

Training the Classifier
The proposed deep PUF consists of convolution, max-pooling, and fully connected layers. A summary of our network model is presented in Table 5. The activation function of all layers except the output layer is the rectified linear unit (ReLU). Classification is performed by determining the probability of the different classes using Softmax. In this classifier, we utilize categorical cross-entropy as the loss function and the Adam algorithm as the optimizer. Algorithm 1 shows the proposed scheme in the form of pseudo-code containing dataset generation, the training process, and the testing process:

    for i = 1 to N × M do
        Gain the features → f_i   // after applying the defined layers
        Assign the features to the label (f_i, y_i)
    end for
    for j = 1 to N do
        Build the collection of features for each class → (F_j, Y_j)
    end for
    Build F = (F_1, F_2, ..., F_N)
    Output: the final collection of features F
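The output-layer computations named above (Softmax class probabilities and the categorical cross-entropy loss) can be written out as a minimal dependency-free sketch; in practice they are supplied by the deep learning framework:

```python
import math

def softmax(logits):
    """Softmax over raw class scores, as used at the output layer."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # shift for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(probs, one_hot):
    """Loss for one sample given predicted probabilities and a one-hot label."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y)

p = softmax([2.0, 1.0, 0.1])
assert abs(sum(p) - 1.0) < 1e-9     # probabilities sum to one
assert p[0] > p[1] > p[2]           # ordering of scores is preserved
# A confident correct prediction yields a lower loss than a wrong one.
assert categorical_cross_entropy(p, [1, 0, 0]) < categorical_cross_entropy(p, [0, 0, 1])
```

Minimizing this loss with Adam drives the per-class probabilities toward the one-hot challenge labels.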

Performance Metrics
The proposed network is simulated using the Keras library in Python with a TensorFlow backend. In this work, we analyze the CNN performance considering the two major variables described in Section 3, using datasets based on the four presented scenarios (see Section 5.1). Table 6 shows the accuracy results of training the network. The influence of using the data augmentation technique is also examined and presented in Table 6. This technique improves the accuracy of classification by expanding the training dataset. In the worst scenario (unstable environments and the same inputs), the accuracy of classification is 92.29%, which reaches 97.79% in the case of applying different input patterns in a stable condition.
Additionally, for a classification problem with N challenges, the number of samples for each label (M) is an influential parameter for achieving better accuracy. In this work, we have accomplished the classification process using 90 samples for each class, which is a reasonable choice for a PUF-based mechanism, leading to a cost-effective enrollment procedure. However, it is practical to add more samples to each class in order to generate a more comprehensive dataset and achieve satisfactory accuracy depending on the application. Figure 7 illustrates the average accuracy of classification after 60-epoch training as a function of the number of samples in each class, considering different numbers of classes. The experiment is performed by writing the same input pattern into various memory blocks at room temperature. The results indicate that it is feasible to achieve an error of less than 10^-1, and even near 10^-2, by adjusting the number of measurements during the enrollment phase.

Security and Robustness
The security and robustness of authentication mechanisms are generally measured using two popular metrics: the false acceptance rate (FAR) and the false rejection rate (FRR). Generally, these two undesirable errors are defined considering the major PUF properties (intra-HD and inter-HD), and there is a tradeoff between them which can be controlled by a threshold chosen to obtain a suitable FAR and FRR [33]. The threshold is determined by effective parameters depending on the application. In this paper, FAR refers to the probability that a wrong response is verified as the true response from the target device, and FRR is the probability of wrongly rejecting the target entity's response. Based on the proposed authentication mechanism, the classification of the received response is the major stage in accepting or rejecting it. When the server sends a challenge to the target device, the probability that the corresponding response is rejected directly depends on the result of classifying the response, which is compared to the original class in the next stage. Thus, the accuracy of classification significantly affects the FRR: the probability of misclassification determines the FRR. However, the FAR value is not directly influenced by the rate of classification error and instead depends on the CNN features. Assuming that a wrong response from an invalid device is received, it can be classified into each class with the same probability due to the uniqueness of PUF responses generated by different DRAM blocks. Therefore, the FAR is about 1/N, where N indicates the number of classes of the trained CNN. In Figure 8, the values of the FAR and FRR for deep PUF are shown and compared to some other PUFs. The threshold used for controlling the tradeoff between FAR and FRR can be determined by the number of classes.
With a smaller number of classes (N = 60), the FAR and FRR achieve an equal value (0.016), but as N increases, the FAR decreases and the accuracy of classification drops, leading to a higher FRR. However, the dependence of accuracy on other features (e.g., the number of samples in each class and the CNN structure) should also be taken into account.

Security and Robustness
The security and robustness of authentication mechanisms are generally measured using two popular metrics: the false acceptance rate (FAR) and the false rejection rate (FRR). These two undesirable errors are defined in terms of the major PUF properties (intra-HD and inter-HD), and there is a tradeoff between them that can be controlled by a threshold chosen to obtain a suitable FAR and FRR [33]. The threshold is determined by application-dependent parameters. In this paper, FAR refers to the probability that a wrong response is verified as the true response from the target device, and FRR is the probability of wrongly rejecting the target entity's response. In the proposed authentication mechanism, the classification of the received response is the major stage in accepting or rejecting it. When the server sends a challenge to the target device, the probability that the corresponding response is rejected depends directly on the result of classifying the response, which is compared to the original class in the next stage. Thus, the classification accuracy directly determines the FRR: the probability of misclassification is the FRR. The FAR, however, is not directly influenced by the classification error rate and depends on the CNN features. Assuming that a wrong response from an invalid device is received, it is classified into each class with equal probability, owing to the uniqueness of PUF responses generated by different DRAM blocks. Therefore, the FAR is about 1/N, where N is the number of classes of the trained CNN. Figure 8 shows the FAR and FRR values for deep PUF compared with some other PUFs. The threshold that controls the tradeoff between FAR and FRR can be set through the number of classes.
With a smaller number of classes (N = 60), the FAR and FRR reach an equal value (0.016), but as N increases, both the FAR and the classification accuracy decrease, leading to a higher FRR. However, the dependence of accuracy on other features (e.g., the number of samples in each class and the CNN model) can address this issue and enable a desirable FRR. In such a scenario, it becomes possible to minimize both the FAR and FRR by considering the major features, such as increasing the accuracy and the number of CNN classes simultaneously.
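The tradeoff above can be sketched numerically: FAR falls as 1/N while FRR tracks the classification error, which tends to grow with N. The accuracy values below are hypothetical placeholders chosen to illustrate the shape of the tradeoff; only the FAR = 1/N relation and the N = 60 equal-error point come from the text.

```python
# FAR/FRR tradeoff controlled by the number of classes N.
# FAR ~ 1/N (a foreign response lands in any class with equal probability);
# FRR ~ 1 - classification accuracy (a genuine response is misclassified).
# The accuracy values below are illustrative placeholders, not measured data.
accuracy_by_N = {60: 0.984, 120: 0.975, 180: 0.962, 240: 0.949}

for N, acc in accuracy_by_N.items():
    far = 1.0 / N
    frr = 1.0 - acc
    print(f"N={N:3d}  FAR={far:.4f}  FRR={frr:.4f}")

# At N = 60, FAR = 1/60 ~ 0.0167 and FRR ~ 0.016 roughly coincide,
# matching the equal-error point discussed in the text.
```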
More generally, when a device is being authenticated, the probability of error during response verification is influenced by both the inter-device HD and the inter-class HD. In deep PUF, generating enough samples of each class under different ambient conditions captures more of the response variation and minimizes the FRR. However, in other mechanisms, the response space is not large enough, and it is difficult to control both of these errors.
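The server-side verification described above reduces to two steps: classify the received response, then compare the predicted class with the label recorded for the issued challenge. The sketch below illustrates this flow; the classifier is a toy stand-in we define for illustration, since the paper's actual model is a trained CNN.

```python
def authenticate(response, expected_label, classify):
    """Two-step verification on the server side.

    1. Classify the received response into one of the enrolled classes.
    2. Accept only if the predicted class matches the label recorded
       for the challenge that was issued.

    `classify` stands in for the trained CNN; here it may be any
    callable mapping a response to a class label.
    """
    predicted = classify(response)
    return predicted == expected_label

# Toy stand-in classifier: label = parity of the response bits.
toy_classifier = lambda bits: sum(bits) % 2

assert authenticate([1, 0, 1], expected_label=0, classify=toy_classifier)
assert not authenticate([1, 0, 0], expected_label=0, classify=toy_classifier)
```

Because acceptance requires an exact label match, a misclassified genuine response is rejected (contributing to the FRR), while a foreign response is accepted only if it happens to land in the expected class (FAR of about 1/N).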

Performance Comparisons
Deep PUF employs the tRCD-based PUF mechanism proposed in [12] to generate raw DRAM data. That method uses a filtering procedure to extract the reliable cells and form the output response, which significantly increases the evaluation period. By removing the filtering mechanism, deep PUF achieves a lower evaluation time than the tRCD-based PUF. The evaluation period of deep PUF can be measured similarly to that of tRCD-based PUFs, expressed by Equation (1).
We also experimentally measure the evaluation time of deep PUF to confirm Equation (1). The average over multiple evaluations is 0.95 ms, which is almost equal to the value calculated by Equation (1). This period is much lower than the tRCD-based PUF's evaluation time of 88.2 ms. Note that the evaluation time is measured for the PUF operation on the device and does not include the authentication process on the server side. Furthermore, the tRCD-based PUF needs at least two DRAM ranks: one for PUF operation and one for counting the latency failures. The proposed deep PUF operates with only one rank and is therefore appropriate for low-cost systems. Additionally, both tRCD-based [12] and tRP-based PUFs [13] require post-processing error correction algorithms that cause significant time and hardware overheads. On the other hand, retention-based PUFs [12,28] require a long period of time (on the order of minutes) to extract sufficient failure bits and generate reliable signatures, which makes the DRAM rank unavailable for a long time.


Security Discussion and Countermeasures against Possible Attacks
In this authentication structure, the server stores only the list of original challenges and the trained CNN parameters; the responses themselves are never stored on the server. This property improves security against insider attacks executed by a malicious entity with authorized access to server data.
Snooping-based and modeling attacks occur when multiple CRPs related to the same memory block are accessed and learned by an adversary [34]. These attacks can be prevented by exploiting various, separate blocks of the DRAM during the deep PUF configuration. Since, as the implementation results indicate, different memory blocks exhibit distinct characteristics, it is difficult to model all blocks of a DRAM chip using a limited set of leaked CRPs.
PUF re-use attacks are theoretically possible in the deep PUF-based authentication mechanism, as the server may send an identical challenge to a PUF device; if the adversary can intercept the exchanged messages, the authentication system can suffer from re-use attacks. One way to avoid such attacks is to utilize a simple encryption algorithm to enhance the security of the transmitted data [18]. A one-time-use protocol is another option, which can be employed to prevent re-use attacks depending on the application requirements [35]. A further solution is to include erasability and certifiability as two additional features of the PUF application [36]. Certifiability provides an offline certification to check and verify the expected features of the PUF responses. Erasability can be realized through reconfiguration methods, which are compatible with the DRAM PUF organization. We leave the implementation and validation of these techniques to future work.
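A minimal sketch of the one-time-use idea, assuming the server simply tracks spent challenges; the data structure and method names below are ours for illustration and are not taken from [35].

```python
class ChallengeIssuer:
    """Issue each enrolled challenge at most once to resist re-use attacks."""

    def __init__(self, challenges):
        self._fresh = set(challenges)   # challenges never sent yet
        self._spent = set()             # challenges already used once

    def issue(self):
        """Return an unused challenge, or None when the pool is exhausted."""
        if not self._fresh:
            return None
        challenge = self._fresh.pop()
        self._spent.add(challenge)
        return challenge

    def is_replay(self, challenge):
        """A challenge presented again after being spent indicates a replay."""
        return challenge in self._spent

issuer = ChallengeIssuer({"c1", "c2", "c3"})
first = issuer.issue()
assert issuer.is_replay(first)   # a second presentation would be detectable
assert issuer.issue() != first   # each issued challenge is fresh
```

With the large CRP space of deep PUF, discarding each challenge after one use remains practical, whereas filtering-based schemes with few reliable CRPs exhaust their pool quickly.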

Conclusion and Future Work
In this paper, we present a new DRAM PUF-based authentication method that exploits a deep CNN to configure a strong PUF with an expanded CRP space and a light-weight implementation. This method eliminates additional error-correction mechanisms and their overheads. We elaborate an experimental analysis of DRAM latency-based PUFs, their characterization under various conditions, and the feasibility of developing a precise CNN. We organize a CNN-based classifier using real DRAM data and examine the effects of the major parameters by generating four datasets based on four key scenarios and applying them to the CNN. Based on the simulation results, we show that when various input patterns are used for different blocks under varying ambient conditions, the proposed classifier achieves 94.9% accuracy. We also propose a two-step authentication technique, comprising response classification and label verification, which noticeably minimizes identification errors and yields reliability higher than the classification accuracy alone. The proposed scheme can be employed as a stand-alone mechanism or as part of a multi-factor authentication. Additionally, our proposed deep PUF moves all implementation overheads to the server and is appropriate for low-cost and resource-constrained devices. Finally, we demonstrate that deep PUF significantly reduces the evaluation time and hardware overheads compared with existing DRAM PUFs. Future work includes extending the proposed approach to other strong PUFs and analyzing existing techniques to protect deep PUF against possible attacks, toward a secure and light-weight authentication protocol.