Performance Analysis of Secure Elements for IoT

Abstract: New protocol stacks provide wireless IPv6 connectivity down to low-power embedded IoT devices. From a security point of view, this leads to high exposure of such IoT devices. Consequently, even though they are highly resource-constrained, these IoT devices need to fulfil similar security requirements as conventional computers. The challenge is to leverage well-known cybersecurity techniques for such devices without dramatically increasing power consumption (and therefore reducing battery lifetime) or the cost regarding memory sizes and required processor performance. Various semiconductor vendors have introduced dedicated hardware devices, so-called secure elements, that address these cryptographic challenges. Secure elements provide tamper-resistant memory and hardware-accelerated cryptographic computation support. Moreover, they can be used for mutual authentication with peers, ensuring data integrity and confidentiality, and various other security-related use cases. Nevertheless, publicly available performance figures on energy consumption and execution times are scarce. This paper introduces the concept of secure elements and provides a measurement setup for selected individual cryptographic primitives and a Datagram Transport Layer Security (DTLS) handshake over secure Constrained Application Protocol (CoAPs) in a realistic use case. Consequently, the paper presents quantitative results for the performance of five secure elements. Based on these results, we discuss the characteristics of the individual secure elements and supply developers with the information needed to select a suitable secure element for a specific application.


Introduction
In a typical, straightforward IoT application, a sensor node is sending data to an application server. Nowadays, in many cases, such a node connects to a proprietary, local network with a gateway providing connectivity to the internet by protocol translation. Due to this non-transparent access from the internet, there is little need to implement performance-hungry security measures on the resource-constrained sensor node itself. In contrast, the gateway with its lower constraints on resources can, to a certain extent, isolate the sensor node through security measures, such as firewalls and standard cryptographic algorithms.
Interestingly, IoT networks like Thread [1] introduce native Internet Protocol version 6 (IPv6) connectivity down to the low-power sensor nodes. While this transparency offers many new opportunities to access these resource-constrained devices, in turn, it now requires them to provide a similar level of security as conventional computers. However, cryptographic algorithms demand complex calculations with long execution times when performed on a microcontroller unit (MCU) with little processing power. These algorithms require a high amount of energy and thus reduce battery lifetime. Moreover, conventional security algorithms (e.g., Rivest-Shamir-Adleman, RSA) exacerbate this effect as they were not designed to run on resource-constrained devices. Thus, various semiconductor vendors have developed dedicated hardware for MCUs to support the execution of cryptographic algorithms. These so-called secure elements improve the overall security of an IoT device and reduce the energy consumed by cryptographic computations. This paper introduces the concept of secure elements and shows their opportunities as well as their challenges. Specifically, we describe five secure elements in detail with appropriate energy and execution time measurements. The provided data supports the selection of a suitable secure element for individual IoT applications.
This paper is structured as follows: Section 2 highlights the motivation behind this project. Section 3 contains the technical background (secure elements, Thread protocol, CoAP) needed to understand the rest of the paper. Section 4 lists the five evaluated secure elements and all the required software and hardware used for the evaluation. The measurements and subsequent discussion are located in Sections 5 and 6, respectively. Section 7 describes the key findings gathered during this project, Section 8 draws attention to related work and similar studies, and Section 9 draws appropriate conclusions.

Motivation
Although secure elements have been available for a few years, there are still few to no quantitative comparisons between chips from different manufacturers. Arguably, this is primarily because manufacturers are very reluctant to provide information about the hardware and software of their secure elements. Until recently, one usually had to sign a non-disclosure agreement (NDA) before receiving the documentation and vendor-specific software development kit (SDK) required to use the secure element. Although more and more manufacturers are now moving away from these prohibitive practices, there is still a lack of quantitative comparisons that help embedded security engineers decide which secure element best fits the task at hand. Embedded security is already lagging behind the current emergence of IoT, which can be seen, among other things, in the lack of large and active developer communities. Understandably, a lack of information does not contribute positively to this situation. Therefore, our primary motivation for this project is to make such information publicly available and contribute to the security of IoT systems.

Secure Elements
A secure element is an integrated circuit (IC) for executing cryptographic algorithms (mostly) in hardware. Such algorithms typically include the advanced encryption standard (AES), elliptic curve cryptography (ECC), the elliptic curve digital signature algorithm (ECDSA), secure hashing algorithms (SHA), message authentication codes (MAC), and more depending on the feature set of the secure element. They are designed to execute these computationally intensive algorithms fast and with minimal energy. Furthermore, they need to be connected to an MCU over a serial interface and function as a cryptographic coprocessor.
In contrast, more and more MCUs with built-in security peripherals also implement such algorithms in hardware. They have an advantage concerning execution time and energy consumption compared to secure elements as the security peripherals are connected to the internal data bus and not to a comparatively slow external interface. For such MCUs, it is common practice to encrypt sensitive key material with a master key before storing it in the flash memory. However, even though the sensitive data is encrypted, such MCUs usually provide little to no tamper detection as well as no tamper-resistant memory.
Conversely, most secure elements provide tamper-resistant memory, multiple tamper detection mechanisms like active shielded circuits, voltage and temperature monitoring, and inputs for user-defined tamper sensors. These features are a clear added benefit over using MCUs with built-in security peripherals. Additionally, they provide sophisticated countermeasures against side-channel attacks and have some level of Common Criteria certification.
A (D)TLS handshake is a likely use case for secure elements, which is usually implemented in two distinct ways. On the one hand, some secure elements support an MCU in the execution of cryptographically complex computations. However, the complete (D)TLS stack must still be present in the MCU to handle the various messages correctly. Conversely, other secure elements can handle the entire (D)TLS session independently. In this case, the MCU does not have to process the (D)TLS messages but only forwards them in both directions. The execution of all security-related functions within the secure element naturally offers higher security. Nevertheless, these secure elements can also support a (D)TLS stack running on the MCU, making them very versatile. Further theoretical foundations concerning secure elements can be found in our previously released papers: chapter II "Secure Elements" in "Security on IoT Devices with Secure Elements" [2] and "Securing the IoT: Introducing an Evaluation Platform for Secure Elements" [3].

Constrained Application Protocol
The Constrained Application Protocol (CoAP) is specified by RFC7252 [4] and is a document transfer protocol like Hypertext Transfer Protocol (HTTP). However, as the name implies, it is specifically designed for constrained devices. Bit fields and mappings are used to keep the packets as small as possible, and User Datagram Protocol (UDP) is used instead of Transmission Control Protocol (TCP). CoAP is based on a simple client/server model and follows the RESTful paradigm. The secure Constrained Application Protocol (CoAPs) utilizes DTLS and guarantees confidentiality, integrity and authenticity of the CoAP packets.
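To illustrate how bit fields keep CoAP packets small, the following sketch packs and unpacks the fixed 4-byte CoAP header defined in RFC 7252 (version, type, token length, code, message ID). The function and constant names are our own choices for illustration and do not come from any particular CoAP library; options and payload are deliberately left out.

```python
import struct

COAP_VERSION = 1          # RFC 7252 defines version 1
TYPE_CONFIRMABLE = 0      # message type CON
CODE_GET = 0x01           # request code 0.01 (GET)


def encode_coap_header(msg_type, code, message_id, token=b""):
    """Pack the fixed 4-byte CoAP header followed by the token."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes long")
    # First byte: 2 bits version, 2 bits type, 4 bits token length.
    first = (COAP_VERSION << 6) | ((msg_type & 0x3) << 4) | len(token)
    # "!BBH": network byte order, two single bytes, one 16-bit message ID.
    return struct.pack("!BBH", first, code, message_id) + token


def decode_coap_header(data):
    """Unpack the fixed header fields from the first four bytes."""
    first, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first >> 6,
        "type": (first >> 4) & 0x3,
        "token_length": first & 0xF,
        # The code byte is split into a 3-bit class and a 5-bit detail.
        "code": f"{code >> 5}.{code & 0x1F:02d}",
        "message_id": message_id,
    }


if __name__ == "__main__":
    header = encode_coap_header(TYPE_CONFIRMABLE, CODE_GET, 0x1234, b"\xab")
    print(header.hex())
    print(decode_coap_header(header))
```

A confirmable GET request with a one-byte token therefore needs only five bytes before any options, which is considerably more compact than a textual HTTP request line.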

Thread Protocol
Thread is a wireless network protocol for embedded IoT devices building on the Institute of Electrical and Electronics Engineers (IEEE) 802.15.4 [5] standard. It provides a mesh network for resource-constrained devices where each device has IPv6 connectivity. The devices within the Thread network have access to the internet via border routers. This new connectivity provides new opportunities such as service discovery for each device or the possibility to establish a real end-to-end encrypted channel between an end device and an application server. Figure 1 shows a setup of such a Thread network and a corresponding application server residing outside of that network.

Test Cases
We define two specific test cases for our performance evaluation in terms of execution time and energy consumption. While the first test case focuses on cryptographic primitives, the second one features a practical application. Specifically, a node and a server establish a secure session by executing a DTLS handshake. To establish a baseline reference without MCU-external hardware acceleration, we measure both test cases with the cryptographic functions running in software on the MCU using MbedTLS [6]. In contrast, subsequent measurements make use of the evaluated secure elements. The results allow a direct comparison to the reference as well as between devices.

Cryptographic Primitives
This test case assesses the performance of the individual secure elements concerning the following cryptographic primitives:
• Generate a random number (32 bytes)
• Generate an ECC key pair (secp256r1 [7])
• Calculate the SHA-256 hash of the random number
• Sign the hash with ECDSA (using key pair from before)
• Verify the signature (using key pair and hash from before)
secp256r1 is one of the most common elliptic curves and thus has been selected for this performance evaluation. Furthermore, curves with larger bit-fields were not evaluated as they are rarely seen in IoT.
Moreover, the generation of the random number with the reference implementation requires the following remarks: MbedTLS contains a block-cipher counter-mode based deterministic random bit generator (CTR_DRBG) specified in NIST SP800-90A [8]. This cryptographically-secure pseudorandom number generator (CSPRNG) requires a strong external entropy source for periodical reseeding. Thus, the reference implementation does not purely rely on software, as in this case, the true random number generator (TRNG) in the MCU itself is used as an entropy source. We will discuss this further in Section 6.1.
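The sequence of primitives in this test case can be sketched in a few lines of pure Python. This is an educational illustration only and is not the code used for the measurements: it uses textbook affine-coordinate arithmetic on secp256r1, is not constant-time, and must not be used to protect real keys. All function names are our own; MbedTLS and the secure elements implement these steps very differently.

```python
import hashlib
import secrets

# secp256r1 (NIST P-256) domain parameters.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)


def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (not constant-time)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def generate_keypair():
    d = secrets.randbelow(N - 1) + 1          # private key in [1, N-1]
    return d, scalar_mult(d, G)               # public key Q = d*G


def sign(d, e):
    """ECDSA signature over the integer hash value e."""
    while True:
        k = secrets.randbelow(N - 1) + 1      # fresh per-signature nonce
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (e + r * d) % N
        if r and s:
            return r, s


def verify(q, e, r, s):
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    pt = point_add(scalar_mult(e * w % N, G), scalar_mult(r * w % N, q))
    return pt is not None and pt[0] % N == r


if __name__ == "__main__":
    random_number = secrets.token_bytes(32)                 # 1. random number
    private_key, public_key = generate_keypair()            # 2. ECC key pair
    digest = hashlib.sha256(random_number).digest()         # 3. SHA-256 hash
    e = int.from_bytes(digest, "big")
    r, s = sign(private_key, e)                             # 4. ECDSA sign
    print("signature valid:", verify(public_key, e, r, s))  # 5. ECDSA verify
```

The sketch makes visible why key generation, signing, and verification dominate the cost: each involves one or two scalar multiplications on the curve, whereas hashing 32 bytes is a single SHA-256 compression call.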

DTLS Handshake
In this test case, a DTLS handshake with a server demonstrates the impact of the secure elements as external crypto accelerators on execution time and energy consumption. We have chosen TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 as the cipher suite for the DTLS handshake as this suite is available in all the evaluated secure elements. Therefore, this choice facilitates a fair comparison even though there are arguably better cipher suites from a security standpoint. Lastly, CoAPs is used as the application protocol for getting some dummy data from the server.

Test Setup
In order to keep the complexity of the network as low as possible, the Thread network only consists of a border router and a secure node. Furthermore, the CoAPs server is running directly on the border router itself to exclude as many unknowns as possible that could affect the handshake. Figure 3 illustrates this test setup. Figure 4 depicts the secure node, which consists of a Nordic nRF52840 development kit (DK) [9] and the secure element shield, which is a self-designed PCB with the form factor of an Arduino shield on which all evaluated secure elements are present. Additionally, a Raspberry Pi 3 Model B+ [10] and an nRF52840 dongle [11] (used as a radio co-processor for IEEE 802.15.4) serve as the required border router and CoAPs server.

Evaluated Secure Elements
This paper evaluates the following secure elements: the Microchip ATECC608B, the Infineon OPTIGA™ Trust M, the Infineon OPTIGA™ Trust X, the Maxim MAXQ1061, and the NXP SE050 [19,20]. These five secure elements have been selected as they list similar use cases and target applications in their data sheets. Furthermore, they all support the specified algorithms and the corresponding elliptic curve required for performing the cryptographic primitives, which allows the direct comparison of the results.

Supported Cryptographic Primitives
Although all these secure elements provide cryptographic primitives like symmetric encryption with AES, various ECC algorithms (ECDH, ECDHE, ECDSA, . . . ), secure hashing algorithms (e.g., SHA-256) and message authentication (e.g., HMAC-SHA-256), essential differences exist. These include AES modes, hash and block cipher sizes, and supported elliptic curves. Thus, it is essential to check if the selected secure element provides the required functionality prior to including it in a new design.
Furthermore, the OPTIGA™ Trust X contains a complete (D)TLS stack and can handle the entire handshake on its own by creating and parsing the messages without the help of the MCU. Thus, the MCU does not have to process the (D)TLS messages and only transmits them between the secure element and the (D)TLS peer. Conversely, the other secure elements support the MCU by accelerating the computationally expensive cryptographic algorithms while still requiring a (D)TLS stack to be present in the MCU for creating and parsing the actual (D)TLS packets (i.e., ClientHello, ServerHello, . . . ).

Memory Structure
The secure elements differ not only in the amount of memory available but also in their structure. While the ATECC608B, OPTIGA™ Trust M, and OPTIGA™ Trust X feature pre-partitioned slots with explicit restrictions on what type of data can be stored within them, the MAXQ1061 and the SE050 allow the users to define the type and size of the storage objects individually.

Available Interfaces
All evaluated secure elements feature an I2C interface. The OPTIGA™ Trust M, OPTIGA™ Trust X, and the ATECC608B allow an I2C clock speed of up to 1 MHz. Moreover, the latter also allows the use of a significantly slower Single-Wire Interface (SWI). The SE050 does not contain any other serial interface than I2C, but it features the fastest I2C clock speed at 3.4 MHz. In contrast, the MAXQ1061 has the slowest maximum I2C clock speed at 400 kHz. However, it is the only secure element containing an SPI peripheral, and with a maximum clock speed of 4 MHz, it provides the fastest serial interface of them all.
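To give a feeling for what these clock speeds mean in practice, the following sketch estimates the pure bus time for moving a payload over I2C and SPI. It is a rough lower bound under simplifying assumptions that are ours and not taken from the data sheets: one address byte per I2C transfer, nine clock cycles per I2C byte (eight data bits plus the acknowledge bit), two extra cycles for the start and stop conditions, eight cycles per SPI byte, and no clock stretching, command framing, or polling overhead.

```python
def i2c_transfer_time_us(payload_bytes, clock_hz):
    """Lower-bound bus time for one I2C transfer in microseconds."""
    # Address byte plus payload, 9 clocks each, plus start and stop.
    clocks = (payload_bytes + 1) * 9 + 2
    return clocks / clock_hz * 1e6


def spi_transfer_time_us(payload_bytes, clock_hz):
    """Lower-bound bus time for one SPI transfer in microseconds."""
    return payload_bytes * 8 / clock_hz * 1e6


if __name__ == "__main__":
    payload = 64  # e.g., the size of a raw ECDSA signature on secp256r1
    for label, clock in [("I2C 400 kHz", 400e3), ("I2C 1 MHz", 1e6),
                         ("I2C 3.4 MHz", 3.4e6)]:
        print(f"{label}: {i2c_transfer_time_us(payload, clock):.1f} us")
    print(f"SPI 4 MHz: {spi_transfer_time_us(payload, 4e6):.1f} us")
```

Under these assumptions, the bus time scales inversely with the clock speed, so the choice of interface matters most for primitives whose computation time is short compared to the time spent transferring data.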
We strongly recommend encrypting the communication between the host MCU and the secure element. Even though the symmetric keys used for such secure channels reside somewhere in the memory of the MCU and are at risk of being extracted by an attacker, encryption significantly increases the level of security because the attacker requires drastically more time and effort, as merely probing the serial interface is no longer enough. Therefore, it largely mitigates not only the risk of easy eavesdropping but also more dangerous attacks like tampering with the random numbers, which would severely impede the reliability of certain cryptographic algorithms.

Energy Consumption and Sleep Modes
Another critical aspect of secure elements is their energy consumption during computation, idling, and sleeping. Their active supply current, sleep current, wake-up delay, as well as other characteristics, may differ widely (up to orders of magnitude).

Provisioning
All manufacturers provide a provisioning service that can inject unique credentials into the secure elements at secure manufacturing facilities. However, the ATECC608B is the only secure element that must be provisioned and locked before any of the functionalities (e.g., getting random numbers from the TRNG) can be used. On the one hand, this slightly complicates the development stage, but it also increases the overall security as the chip cannot be used without being locked first.

Compliance
Compliance with known certifications is another important aspect of secure elements, as this might be a hard requirement depending on the target application. The Common Criteria (CC) [21] provide a well-known standard within the field of Information Security. The hardware and firmware of the SE050 and the hardware of the OPTIGA™ Trust M and OPTIGA™ Trust X are CC EAL6+ certified. Although compliant development and design practices for CC EAL4+ have been used on the MAXQ1061, which would suggest compliance, Maxim has not performed any actual evaluation and thus cannot prove the compliance by document (W. Pflästerer, personal communication, 6 June 2021). The ATECC608B is the only evaluated secure element that does not state any compliance to CC certifications. Moreover, as true randomness is fundamental in cryptography, the compliance of the TRNGs with further certification is another key characteristic. While the MAXQ1061 does not state a compliance with any certification, the TRNGs of the other secure elements are either compliant with NIST SP 800-90A/B/C [8,22,23] or AIS-31 [24].

Resources for Development
Lastly, the secure elements also differ in how many development resources are available. The OPTIGA™ Trust M and OPTIGA™ Trust X provide a lot of publicly available documentation, examples, and ready-to-use driver implementations (e.g., Nordic nRF5 SDK, Zephyr, . . . ). In contrast, the full data sheet and SDK of the MAXQ1061 are still under NDA at the time of writing this paper, with the remaining secure elements landing somewhere between these two.

Selected Communication and Sleep Modes
In order to make the measurements uniform, we chose an I2C clock speed of 400 kHz, as this is the fastest possible communication that all secure elements support. The evaluated secure elements require the MCU to poll their busy state over I2C. Moreover, the MCU is not put into sleep mode during the secure elements' execution of the cryptographic primitives.

Secure Node Firmware
Zephyr [25] is an open-source real-time operating system for resource-constrained devices (mainly microcontrollers) with a focus on connectivity. It integrates various protocol stacks, and supports a wide range of architectures and processor families, allowing you to compile the same code for a wide range of MCUs. This abstraction from the hardware and communication interfaces makes this RTOS an excellent choice for IoT projects. The Zephyr project is part of the Linux Foundation, and this is visible in its structure. Among other things, it has a device driver model with clear parallels to conventional Linux. This model allows us to create a so-called external module for every secure element, which contributes to an excellent encapsulation of SDKs provided by the chip vendors. Furthermore, it includes an open-source implementation of the Thread protocol called OpenThread [26], which uses MbedTLS as a cryptographic software library. Although the OPTIGA™ Trust X can handle the complete DTLS handshake on its own, in our setup, the DTLS handshake is handled by MbedTLS. Thus, all secure elements are used as external cryptoprocessors for all security and performance-critical computations, which results in a better comparison between secure elements.
MbedTLS (version 2.7.0 or later) allows alternative cryptographic implementations [27], so-called hooks, by packing the definitions of various functions into include guards as displayed by Listing 1. The following configuration options have been defined to hook the desired computations with the evaluated secure elements:

CoAPs Server
The CoAPs server running on the OpenThread border router is based on Eclipse Californium™ [28]. This CoAP framework is written in Java and compliant with RFC7252 [4]. By using the build automation tool Maven [29], Californium™ can be easily obtained and used in a project. The Californium (Cf) core implements the basic CoAP functionality, and the additional Scandium (Sc) submodule complements this with the necessary security capabilities (like DTLS 1.2 and key storage) to set up the desired CoAPs server. The server is provided with a PKCS#12 [30] archive file in which all required certificates and the server's private key are stored.

Measurement Setup
Figure 6 shows how a Keysight N6705B power analyzer is used to determine execution times and power consumption. It supplies the nRF52840 MCU with 3.3 V, which in turn supplies the connected secure element. It is used to measure the supply current of the MCU and the secure element. Furthermore, it measures the voltage of the supply and an auxiliary GPIO of the MCU (left side of the MCU directly connected to a voltage measurement channel of the power analyzer), which signals when the individual cryptographic primitives and the DTLS handshake are currently being executed. This setup does not just allow the acquisition of the execution time, but also the energy consumption of the MCU, secure element and both of them combined.
Figure 6. Conceptual measurement setup of the node for acquiring the execution time and energy consumption during the stated test cases.
In order to allow the separation of the MCU supply from the rest of the development kit, it needs to be set up as listed in Appendix A.1. Additionally, Appendix A.2 provides a more detailed explanation of the measurement setup.
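The principle behind this setup can be sketched as follows: while the auxiliary GPIO is high, the elapsed time is accumulated and the product of supply voltage and supply current is integrated to obtain the energy. The function below is a simplified illustration and not the actual analysis code; the sample layout, the fixed sample period, and the 1.65 V threshold (half of the 3.3 V supply) are assumptions made for this example.

```python
def time_and_energy(samples, sample_period_s, gpio_threshold_v=1.65):
    """Return (execution time in s, energy in J) while the GPIO is high.

    samples: iterable of (supply_voltage_v, supply_current_a, gpio_voltage_v)
    tuples taken at a fixed sample period.
    """
    duration_s = 0.0
    energy_j = 0.0
    for supply_v, supply_a, gpio_v in samples:
        if gpio_v > gpio_threshold_v:
            # Rectangle rule: power is treated as constant during one period.
            duration_s += sample_period_s
            energy_j += supply_v * supply_a * sample_period_s
    return duration_s, energy_j


if __name__ == "__main__":
    # Synthetic trace: idle, five active samples at 10 mA, idle again.
    idle = (3.3, 0.001, 0.0)
    active = (3.3, 0.010, 3.3)
    trace = [idle] * 2 + [active] * 5 + [idle] * 3
    duration, energy = time_and_energy(trace, sample_period_s=1e-3)
    print(f"time: {duration * 1e3:.1f} ms, energy: {energy * 1e6:.1f} uJ")
```

Applying the same integration separately to the current channels of the MCU and the secure element yields the individual and combined energy figures.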
To assess the variance, we acquire a large number of measurements by remotely controlling the power analyzer through a Python package. The same package processes the individual measurements and performs statistical analysis.
Figure 11. Time and energy measured for verifying an ECDSA signature (secp256r1) relative to the reference implementation (median of values, rounded to the nearest percent, n = 100).
The relative standard deviation (RSD), also known as the coefficient of variation (CV), describes the amount of dispersion with regard to the mean. In contrast to the standard deviation, this allows us to compare data sets with widely different means, as is the case with the measurements of the cryptographic primitives. Equation (1) expresses how the RSD is calculated using the standard deviation σ and the mean µ.
$$\mathrm{RSD} = \frac{\sigma}{\mu} \cdot 100\% \quad (1)$$
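As a minimal sketch of the statistical summary described above, the following function computes the median, mean, standard deviation, and RSD of a series of measurements using only the Python standard library. Whether the sample or the population standard deviation was used for the published values is not stated here, so the use of the sample standard deviation is an assumption.

```python
import statistics


def summarize(values):
    """Return median, mean, standard deviation, and RSD in percent."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (assumption)
    return {
        "median": statistics.median(values),
        "mean": mean,
        "stdev": stdev,
        "rsd_percent": stdev / mean * 100,
    }


if __name__ == "__main__":
    # Two synthetic series with very different means but equal dispersion.
    print(summarize([10.0, 12.0, 14.0]))
    print(summarize([1000.0, 1200.0, 1400.0]))
```

Both synthetic series yield the same RSD although their standard deviations differ by a factor of 100, which is exactly why the RSD is suited for comparing primitives with very different execution times.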

Cryptographic Primitives
The secure elements exhibited very consistent performance as 10.6% (energy consumption of the MAXQ1061 while generating random numbers) was the highest RSD of all the time and energy measurements within the cryptographic primitives. All the remaining measurements even had an RSD < 5%. Even though the values are tightly grouped, they are not normally-distributed throughout. This is mainly because the secure elements are polled at defined intervals via I2C, and thus the end of a primitive is recorded on the next poll instead of the exact time when the primitive finished. Primitives with a very short execution time would require an extremely small polling interval in order to test for normality. For this reason, the figures and tables in this chapter are based on the median instead of the arithmetic mean. However, the remaining statistical values (min, max, mean, standard deviation, and relative standard deviation) are available in the supplementary data (refer to the supplementary materials statement after Section 9).
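The effect of polling on the recorded execution time can be illustrated with a small model. It assumes, purely for illustration, that the first poll happens one polling interval after the primitive has been started and that polls repeat at a fixed interval; the interval values are hypothetical.

```python
import math


def observed_duration_ms(true_duration_ms, poll_interval_ms):
    """Duration recorded by the host when completion is detected by polling."""
    # Completion is only noticed at the first poll at or after the true end.
    return math.ceil(true_duration_ms / poll_interval_ms) * poll_interval_ms


if __name__ == "__main__":
    for true_ms in (4.2, 4.9, 5.1):
        print(true_ms, "->", observed_duration_ms(true_ms, poll_interval_ms=1.0))
```

In this model, different true durations collapse onto the same recorded value, so the recorded values form discrete steps rather than a continuous, bell-shaped distribution.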

DTLS Handshake
We noticed during the measurement process that handshake packets were sometimes lost and had to be retransmitted either by the client or the server, thus dramatically increasing the execution time. As an example, Figure 12 shows the DTLS handshake using the software-based MbedTLS reference. Since the focus is not on the reliability of OpenThread, we have removed all outliers that differ from the median by 20 percent or more. Figure 13 shows the corresponding adjusted measurements. We applied the same removal for all handshake measurements.
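The outlier removal described above can be sketched as a simple filter. The sketch assumes that the median is computed once over all measurements and is not recalculated after the removal; the example values are synthetic.

```python
import statistics


def remove_outliers(values, max_relative_deviation=0.20):
    """Keep only values that differ from the median by less than 20 percent."""
    median = statistics.median(values)
    return [v for v in values
            if abs(v - median) < max_relative_deviation * median]


if __name__ == "__main__":
    # Synthetic handshake durations in seconds; 2.50 models a retransmission.
    durations = [1.00, 1.02, 0.98, 1.01, 2.50, 0.99]
    print(remove_outliers(durations))
```

Using the median as the reference makes the filter robust: a handful of retransmissions hardly shifts the median, whereas they would noticeably shift the mean.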
Identical to the benchmark for the cryptographic primitives, the DTLS handshake benchmark was repeated 100 times. Table 6 lists the remaining number of valid measurements of the DTLS handshake after the removal of outliers caused by OpenThread. Figure 14 and Table 7 show the time and energy consumption of the node (secure element and MCU) for the DTLS handshake in percentages relative to the reference implementation using only MbedTLS on the MCU. For example, the ATECC608B takes 39% of the energy consumed by the reference implementation when performing a DTLS handshake. In other words, substituting the software-based cryptographic operations in MbedTLS with the hardware-based alternatives in the ATECC608B decreases the energy consumption of the DTLS handshake by 61%. As with the cryptographic primitives, we calculated the RSD for all adjusted handshake measurements. The largest RSD resulted from the execution time of the OPTIGA™ Trust M at 3.4%. Again, this is an indication of how tightly grouped the measurements are, and thus only the median is listed in this section, and all the remaining statistical values are available in the supplementary data (refer to the supplementary materials statement after Section 9).
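The relative figures can be reproduced from raw measurements as sketched below: the median of a secure element's measurements is divided by the median of the reference measurements and rounded to the nearest percent. The measurement values in the example are synthetic and chosen only so that the result matches the 39% mentioned above.

```python
import statistics


def relative_median_percent(measurements, reference_measurements):
    """Median of the measurements as a percentage of the reference median."""
    ratio = (statistics.median(measurements)
             / statistics.median(reference_measurements))
    return round(ratio * 100)


def savings_percent(measurements, reference_measurements):
    """Reduction compared to the reference in percent."""
    return 100 - relative_median_percent(measurements, reference_measurements)


if __name__ == "__main__":
    secure_element_mj = [3.8, 3.9, 4.0]   # synthetic energy values
    reference_mj = [9.9, 10.0, 10.1]      # synthetic reference values
    print(relative_median_percent(secure_element_mj, reference_mj), "%")
    print(savings_percent(secure_element_mj, reference_mj), "% saved")
```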

Discussion
Importantly, even though this paper focuses on the performance of secure elements, their use is not limited to improving execution time and energy consumption. When used correctly, they also significantly improve the IoT device's overall security. Furthermore, a potential increase in execution time or energy consumption compared to the pure MbedTLS reference is often justified by the gained security.

Cryptographic Primitives
The results of the cryptographic primitives show that, as expected, no blanket statements can be made about the use of secure elements in regard to execution time and energy savings. However, looking at the primitives where the secure elements generally exhibited a higher performance than the software-based reference (i.e., generating an ECC key pair and ECDSA sign/verify), the performance increase was particularly evident in the execution time. In order to be able to make further statements about the results, the primitives need to be examined individually.
As mentioned in Section 4.1.1, the CTR_DRBG needs to be seeded with a strong entropy source. This source would be either the TRNG within the MCU in case of the reference implementation or the externally connected secure element. By default, MbedTLS reseeds the CTR_DRBG after every 10,000 calls to mbedtls_ctr_drbg_random, which is used to retrieve random data. Furthermore, if prediction resistance is enabled, the CTR_DRBG is also reseeded before generating random data. Thus, with the default configuration, the internal TRNG or the externally connected secure element will not be polled for random data with every call to mbedtls_ctr_drbg_random. Requesting 32 bytes of random data from the CTR_DRBG without reseeding only requires 0.246 ms and 2.76 µJ, which significantly reduces the average time and energy required for getting random data. Of course, one has to weigh the reduced energy consumption against reseeding the CTR_DRBG frequently enough to keep the prediction resistance high. It is important to note that in contrast to most TRNGs in the secure elements, the TRNG in the nRF52840 is not compliant with any certification. Thus, the higher time and energy consumption is clearly justified as strong sources of randomness are at the foundation of information security.
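The trade-off between reseed frequency and average cost can be made concrete with a simple amortization model: every call pays the cost of generating data from the CTR_DRBG, and the cost of one reseed is spread over all calls within a reseed interval. The 0.246 ms per call is taken from the text above, whereas the reseed cost of 50 ms is a purely hypothetical placeholder and not a measured value.

```python
def amortized_cost(cost_without_reseed, reseed_cost, reseed_interval=10_000):
    """Average cost per call when one reseed is spread over the interval."""
    if reseed_interval < 1:
        raise ValueError("reseed_interval must be at least 1")
    return cost_without_reseed + reseed_cost / reseed_interval


if __name__ == "__main__":
    per_call_ms = 0.246        # from the measurement without reseeding
    reseed_ms = 50.0           # hypothetical cost of fetching fresh entropy
    for interval in (1, 100, 10_000):
        average = amortized_cost(per_call_ms, reseed_ms, interval)
        print(f"reseed every {interval} calls: {average:.3f} ms per call")
```

With the default interval of 10,000 calls, even an expensive entropy source adds only a small fraction to the average cost per call, while reseeding on every call makes the entropy source the dominant cost.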
Using a secure element for calculating the SHA-256 over 32 bytes requires vastly more time and energy than the reference implementation. Arguably, calculating the hash on a secure element does not improve the overall security of an IoT device as hashing does not involve any cryptographic secret or key. Thus, the data might as well just be hashed internally.
Looking at the three primitives containing asymmetric cryptography, we see that secure elements shorten the execution time considerably in 13 out of 15 measurements while decreasing the energy consumption in 12 out of 15. Depending on your specific application, it is possible to either significantly improve or degrade the performance in terms of execution time and energy consumption. Therefore, a suitable secure element must be carefully evaluated and selected. This necessity reinforces the need for quantitative comparisons, as presented in this paper, so that developers can improve their application's overall security without sacrificing performance.

DTLS Handshake
All secure elements noticeably save time when compared to the reference implementation. This saving is due to the MbedTLS hooks introduced in Section 4.5.1, which allow the ECDSA signature creation and verification to be offloaded to the secure elements. As observed with the cryptographic primitives, ECDSA computations benefit significantly from the hardware acceleration supplied by the secure elements. Furthermore, storing the sensitive key pairs and certificates in the tamper-resistant memory of a secure element, rather than in the flash of an MCU, can significantly improve protection. Although generating random numbers with the MCU's internal TRNG was faster and more energy-efficient, we still added this as a hook for the DTLS handshake to make use of the secure elements' certified TRNGs. Since this hook is only needed for reseeding the CTR_DRBG, the negative impact on time and energy is negligible and therefore clearly justifiable. Moreover, we did not hook the computation of hashes such as SHA-256 since this improves neither performance nor security. Finally, cipher suites using the elliptic curve Diffie-Hellman ephemeral (ECDHE) key exchange require an ECC key pair to be generated and subsequently used in a key exchange for every handshake. MbedTLS again provides hooks to offload these two computationally expensive operations to the connected secure element. Unfortunately, it turned out that we could not simply implement these hooks for one of the secure elements and then just duplicate the code with some minor modifications, mainly because their API structures are vastly different. While one requires a session (between the secure element and the host MCU) to be opened, another does not allow the import of arbitrary public keys without sticking to a strict key importation process. The ECDHE key exchange could not be implemented for the MAXQ1061 and the SE050 without changes to MbedTLS itself. Such changes should not be a problem in and of themselves but were considered out of scope for this project. Therefore, further significant time and energy savings can still be expected for these two secure elements.

Performance Comparison
The Microchip ATECC608B was the most energy-efficient secure element in 3 out of 5 tested primitives when compared to the other secure elements. Especially with resourceconstrained devices, this can be a huge selling point. Although it is important to note that it was never the fastest one, and a complete DTLS stack needs to be present in the MCU for performing handshakes as it cannot handle the handshake on its own. This requires a substantial amount of memory and might end in selecting an MCU with more memory than otherwise needed.
Interestingly, even though the Infineon OPTIGA™ Trust M is the successor of the OPTIGA™ Trust X, it didn't outperform its predecessor as their results were either tied or only marginally different. Therefore, the selection between these two secure elements should be purely based on their respective feature set.
The Maxim MAXQ1061 requires the most energy out of all secure elements while performing asymmetric cryptography. On the other hand, it is more than twice as fast as the second-ranked secure element when generating random data. Additionally, the MAXQ1061 can be connected using SPI with a clock speed of up to 4 MHz (20 MHz when using the AES stream encryption engine), which is the fastest connectivity out of all the secure elements. Since data transmission accounts for a significant portion of the execution time, the faster interface also reduces the energy consumption. Furthermore, the newly released MAXQ1065 has an extended feature set that adds a physically unclonable function (PUF), extended DTLS support, etc. More importantly, it should be able to perform the tested primitives with significantly less energy given that the complete communication can run over SPI with a maximum clock speed of 20 MHz and an active supply current of 5 mA compared to the 17 mA of the MAXQ1061.
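The effect of the lower active supply current can be illustrated with a back-of-the-envelope estimate. The following sketch assumes a constant current draw, the 3.3 V supply used in the measurement setup, and, purely for illustration, an identical execution time for both devices. The function name and the 100 ms duration are hypothetical and are not measured values.

```python
def active_energy_mj(current_ma: float, duration_s: float, voltage_v: float = 3.3) -> float:
    """Energy in millijoules at constant current: E = V * I * t (V * mA * s = mJ)."""
    return voltage_v * current_ma * duration_s


# Hypothetical duration of one operation; NOT a measured value.
duration_s = 0.1

energy_maxq1061 = active_energy_mj(17.0, duration_s)  # 17 mA active supply current
energy_maxq1065 = active_energy_mj(5.0, duration_s)   # 5 mA active supply current

# With equal durations, the ratio depends only on the supply currents (5/17).
ratio = energy_maxq1065 / energy_maxq1061
print(f"MAXQ1065 / MAXQ1061 energy ratio: {ratio:.2f}")
```

Under these assumptions, the energy scales with the current ratio alone. A shorter transfer time over a faster SPI clock would reduce the energy further, which this sketch does not model.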
Lastly, the NXP SE050 was clearly the fastest secure element for ECDSA signing and verifying. Additionally, it was never the best but also never the worst regarding energy consumption, which earns it the all-rounder title.

Feature Set and Development
From a developer's point of view, the OPTIGA™ Trust M and OPTIGA™ Trust X allowed for the most straightforward integration. This is mainly because they come with many publicly available resources such as extensive documentation, examples, and ready-to-use driver implementations (e.g., for the Nordic nRF5 SDK and Zephyr). Furthermore, the same software framework can be used for the OPTIGA™ Trust M as well as the OPTIGA™ Trust X without any changes, which makes switching between them extremely easy. Currently, version 1 (V1) and version 3 (V3) of the OPTIGA™ Trust M are available, with the V3 having additional features (e.g., AES, HMAC, and HKDF). Unfortunately, the OPTIGA™ Trust M V1 and OPTIGA™ Trust X used for this project do not provide a feature for loading externally generated private keys into the secure element, so private keys have to be generated internally. In contrast, the OPTIGA™ Trust M V3 is capable of such a protected key update. Not being able to import private keys is arguably a desirable restriction from a security standpoint, as it ensures that the private keys never exist outside of the secure element in the first place. However, during development or in the case of a project like this one, this is rather cumbersome as we cannot just create a key pair and a corresponding client certificate once and load them into the secure elements.
Even though the ATECC608B performed the best regarding energy consumption, it has even stricter setup requirements as it has to be configured, provisioned and then locked before even the least security-critical functions like getting random data can be executed. This constraint results in a relatively rigid development process. However, it increases security by preventing human errors (e.g., locking cannot be forgotten if the device is subjected to a functional test as it simply would not work without it being locked). In contrast, other secure elements can be deployed without locking them.
Even though the MAXQ1061 required significantly more energy for the asymmetric cryptography compared to the ATECC608B, it provides considerably more flexibility during development because the provisioned data does not have to be locked before anything can be executed. It features a life-cycle model with four states (Delivery, Initialized, Operational, Terminated) and four security conditions (Anyone, Admin, Host, Secure boot), otherwise known as roles. Finally, for any given life-cycle state, an object in the internal memory can be assigned any combination of the five object access rules (Read/Export, Write/Import, Execute, Generate, Delete) and the previously mentioned security conditions. This culminates in the most extensive role-based access model of all the evaluated secure elements. Furthermore, 32 kB of user-programmable EEPROM storage is significantly more flexible than the memory of other secure elements, which is already partitioned into fixed slots.
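The structure of such a role-based access model can be sketched as a small data structure. The following Python model is purely illustrative and is not the MAXQ1061 API: the class names, the policy table, and the evaluation rule (one matching security condition suffices, and missing entries deny access) are assumptions made for this example.

```python
from enum import Enum, Flag, auto


class LifeCycle(Enum):
    DELIVERY = auto()
    INITIALIZED = auto()
    OPERATIONAL = auto()
    TERMINATED = auto()


class Condition(Flag):
    ANYONE = auto()
    ADMIN = auto()
    HOST = auto()
    SECURE_BOOT = auto()


class Access(Enum):
    READ_EXPORT = auto()
    WRITE_IMPORT = auto()
    EXECUTE = auto()
    GENERATE = auto()
    DELETE = auto()


def is_allowed(policy, state, access, caller):
    """Assumed rule: one matching condition suffices; missing entries deny."""
    permitted = policy.get((state, access), Condition(0))
    # Every caller implicitly satisfies the ANYONE condition.
    return bool(permitted & (caller | Condition.ANYONE))


# Hypothetical policy for a private-key object that is never exportable.
private_key_policy = {
    (LifeCycle.INITIALIZED, Access.GENERATE): Condition.ADMIN,
    (LifeCycle.OPERATIONAL, Access.EXECUTE): Condition.HOST | Condition.ADMIN,
    (LifeCycle.OPERATIONAL, Access.DELETE): Condition.ADMIN,
}

print(is_allowed(private_key_policy, LifeCycle.OPERATIONAL, Access.EXECUTE, Condition.HOST))
```

Keying the table on the pair of life-cycle state and access rule mirrors the description above, in which each object carries its own combination of rules and conditions per state.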
The SE050's access model might not be as extensive as the MAXQ1061's. However, it features up to 50 kB of user-programmable storage, which should satisfy the secure storage-related needs of most projects. Moreover, the SE050 has the largest feature set of all evaluated secure elements regarding algorithms, supported elliptic curves, and bit lengths. It also features a second I2C interface on which it acts as a master in order to read and write sensors securely. NXP provides extensive API documentation and corresponding samples, which speed up the development immensely.

Which Secure Element Is the Right One for the Job?
The measurements show that the fastest secure element for any given task is not necessarily the most energetically favorable. The same observation also holds in the other direction, where the most energetically favorable secure element is not necessarily the fastest. Furthermore, each secure element achieves the first or second rank in execution time or energy consumption for at least one cryptographic primitive. A look at the general description of the secure elements also shows that their feature set does not allow any conclusions to be drawn about their performance or certification.
These are essential findings and underline our motivation for this project. This is why it is so important for developers to have easy access to information and comparisons, because depending on the project, a different secure element may be the best choice to achieve the optimum in terms of performance and energy consumption.
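The observation that speed and energy efficiency do not coincide can be expressed as a Pareto selection: a secure element remains a candidate as long as no other one is at least as good in both metrics and strictly better in one. The sketch below uses hypothetical device names and placeholder numbers, not the measurement results of this paper.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated in (time, energy); lower is better."""
    front = []
    for name, (time_s, energy_mj) in candidates.items():
        dominated = any(
            (t2 <= time_s and e2 <= energy_mj) and (t2 < time_s or e2 < energy_mj)
            for other, (t2, e2) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder values (execution time in s, energy in mJ); NOT measured data.
candidates = {
    "SE-A": (0.10, 2.0),  # most energy-efficient
    "SE-B": (0.05, 3.5),  # fastest
    "SE-C": (0.12, 3.0),  # slower and less efficient than SE-A
}

print(pareto_front(candidates))
```

With these placeholder values, both the fastest and the most energy-efficient device remain on the front, and the choice between them depends on which metric the application weighs more heavily.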

Key Findings
Working with secure elements has shown their great potential for, and the need for them in, the IoT industry. Secure elements support resource-constrained devices by performing cryptographic algorithms in hardware and providing tamper-resistant storage. Especially in asymmetric cryptography, we were able to improve time and energy efficiency compared to the software-based reference implementation. Conversely, the generation of random numbers resulted in a lower performance regardless of the secure element used. However, it is imperative to note that some of the secure element TRNGs are certified, which is not the case for the internal TRNG of the nRF52840. Thus, their use is justified despite the lower performance.
Furthermore, we strongly advise against hashing with secure elements because the time and energy consumption exceeds the reference by several orders of magnitude while the security level does not increase. In contrast, the secure storage of sensitive material, such as private keys, long-lived session keys, and root-of-trust certificates, provides an opportunity to increase the overall security of embedded devices significantly.
The results of the measurements clearly show that no general statement can be made as to which secure element performs best or worst. Other metrics like feature set, development experience, and cost must also be considered for this decision because they might justify a slightly lower performance in some cases. The specific use case does not just dictate which metric is more important but also what is "better" or "worse". For example, one would intuitively define a larger feature set as the better one. However, a project might require some lesser-known curve or algorithm that just happens to be implemented by a secure element with an otherwise significantly smaller feature set. Fortunately, the measurement data in this paper, together with the respective data sheets, should allow developers to make an informed decision.
It all comes down to the availability of extensive and, most importantly, publicly accessible information that allows developers to find a suitable secure element for their specific applications. Until recently, this was made much more difficult because NDAs had to be signed for accessing the documentation and SDK of each secure element. Fortunately, more and more manufacturers are moving away from such prohibitive practices, and at the time of publishing this paper, an NDA had to be signed for only one out of the five chips. We strongly believe that detailed and publicly available information significantly lowers the threshold for using secure elements.

Related Work
As outlined in Section 2, we could not find any quantitative comparisons of secure elements even with extensive searches. Works that provide performance figures mostly compare a single secure element against other types of hardware acceleration. For example, P. Kietzmann et al. [31] compared the performance of the ARM CryptoCell 310 (contained in the nRF52840) to the ATECC608A with various symmetric and asymmetric algorithms. For ECC key generation and ECDSA signing and verifying (secp256r1), they obtained time and energy consumption results comparable to our measurements with the ATECC608B. In another publication, P. Kietzmann et al. [32] provide a guideline on PRNGs in the IoT by comparing various aspects (like statistical properties, execution time, energy consumption, and memory overhead) of software- and hardware-based solutions. Unfortunately, their results using the ATECC508A (predecessor of the ATECC608A) cannot be directly compared to ours as these two secure elements do not contain the same RNG. According to Microchip's application note AN2589 [33], "the ATECC608A includes an enhanced high-quality cryptographic random number generator [...]" when compared to its predecessor.
B. Pearson et al. [34] achieved faster ECDSA signature generation (−4.2%) and ECDSA signature verification (−19%) using the ATECC608B and secp256r1. However, it is unclear whether they started the time measurement before calling the first ATECC608B API function and stopped it after the last API call returned, or whether they measured the time between the first and last I2C packet. Such a difference in measurement boundaries would, of course, explain the observed deviations. Moreover, the listed MCUs (Texas Instruments CC3220SF, Espressif ESP8266, Espressif ESP32) have maximum clock speeds of 80, 160, and 160 MHz, respectively. These are consistently higher than that of the nRF52840, which could lead to faster execution of the API calls and thus result in shorter execution times.
R. A. Nofal et al. [35] conducted extensive measurements of the TLS handshake and record layer. Amongst other things, they measured the execution time and energy consumption of ECDSA signing and verifying (secp256r1) on a BCM4343W, which is also an ARM Cortex-M4 MCU. As cryptographic libraries, they used MbedTLS as well as the micro-ECC (µECC) library [36]. Interestingly, our implementation using MbedTLS on a slower MCU resulted in a 56% faster ECDSA sign and a 26% faster ECDSA verify operation. Conversely, their µECC implementation is even faster (signing 56% and verifying 85% faster). E. Nascimento [37] and T. Silde [38] concluded independently that µECC's side-channel protections are not properly documented. Additionally, T. Silde states that further security measures against differential power analysis are required to be on par with MbedTLS.

Conclusions & Future Work
Secure elements can support resource-constrained devices in terms of security by providing tamper-resistant memory, a variety of cryptographic algorithms, and the on-chip generation of key pairs with the private key never leaving the secure element. Thus, they allow increasing the overall security of an IoT device. Unfortunately, there are few to no quantitative comparisons between secure elements from different manufacturers, as manufacturers have been reluctant to publicly release information about their devices' hardware and software. To close this information gap, this paper provides detailed measurement results of cryptographic primitives and a real-world application use case executed on five different secure elements. The measurement results indicate that secure elements can improve the battery life of embedded devices depending on the respective execution time and energy consumption of the cryptographic algorithm. Furthermore, the paper presents detailed information about the evaluated secure elements to help developers select a suitable secure element for a given application.
The benchmark used in this paper tests only a few primitives and a single real-world use case. Since these were chosen strategically, they already provide much information about the performance of the secure elements. Nevertheless, future work should significantly expand the benchmark in order to supply the developers with a more extensive overview of the available secure elements. For example, the already used primitives should be executed for various data lengths where applicable (e.g., random generation and SHA-256), and other curves could be used for the asymmetric primitives (e.g., secp256k1). Moreover, primitives that were not tested in the current benchmark, such as message authentication codes (MACs) or symmetric ciphers, should also be included. Finally, secure elements from other manufacturers should also be added in order to be able to create an even more comprehensive benchmark.

Data Availability Statement:
The processed data presented in this study is available in the supplementary material attached to this article. To request access to the raw measurement data, contact the authors directly.

Abbreviations
The following abbreviations are used in this manuscript:
VEXT -> nRF P0.03 100 kΩ pull-down to GND
The host MCU resets the OPTIGA™ Trust M, OPTIGA™ Trust X, and MAXQ1061 using their dedicated RESET pin. On the secure element shield, the RESET pins of the secure elements are connected to the RESET pin of the Arduino pin header. This pin, in turn, connects to P0.18 of the nRF52840, which is also the RESET pin of the MCU. In order for the MCU to be able to independently control this pin, the reset button (SW5) needs to be disconnected with the modifications listed in Table A3.
Figure A1 illustrates how the Keysight N6705B power analyzer is connected to the secure node. It supplies the nRF52840 MCU as a constant voltage source with 3.3 V over channel 1 while simultaneously monitoring the voltage and current of the MCU with the same channel. Channel 2 is configured as a voltmeter and is connected to the additional GPIO (P0.03), which signals when the individual cryptographic primitives and the DTLS handshake are currently being executed. The rest of the DK (on-board JLink probe, etc.) is supplied over USB (J2) as this would not be present on a custom board. Thus, it is supplied separately in order to prevent any erroneous energy measurements of the MCU. The DK contains a regulator that provides a 3.3 V rail, which is passed through channel 3 (configured as an ammeter) and finally used to power the secure element.
Table A4 contains the versions of all the libraries, frameworks, and SDKs used in this project. They were further modified to solve incompatibilities and add missing functionality.
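The role of the marker GPIO on channel 2 can be illustrated with a short post-processing sketch: only samples recorded while the marker is high contribute to the energy of an operation. The function name, the fixed sample period, the threshold of half the supply voltage, and the rectangle-rule integration are assumptions made for this example. The sample values are placeholders, not recorded data.

```python
def gated_energy(voltage, current, marker, dt, threshold=1.65):
    """Integrate P = V * I over all samples whose marker voltage is above threshold.

    Returns (energy in joules, duration in seconds) for a fixed sample period dt.
    """
    energy = 0.0
    duration = 0.0
    for v, i, m in zip(voltage, current, marker):
        if m > threshold:  # the operation under test is running
            energy += v * i * dt
            duration += dt
    return energy, duration


# Placeholder trace: 4 samples at 1 ms, marker high for the two middle samples.
voltage = [3.3, 3.3, 3.3, 3.3]          # V
current = [0.001, 0.005, 0.005, 0.001]  # A
marker = [0.0, 3.3, 3.3, 0.0]           # V on the marker GPIO

energy_j, duration_s = gated_energy(voltage, current, marker, dt=0.001)
print(f"{energy_j * 1e6:.1f} uJ in {duration_s * 1e3:.1f} ms")
```

The same function could be applied separately to the MCU channel and to the secure element channel, with the two results summed to obtain the energy of the whole node.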

Appendix C.2. DTLS Handshake
Tables A8 and A9 list the time and energy required for executing the DTLS handshake. They are constructed in the same way as Tables A5 and A6, with the percentages again being relative to the reference (column "MbedTLS").
Table A8. Absolute execution time (abs.) of the node (MCU) during the DTLS handshake using MbedTLS (reference) and absolute execution time of the node (MCU and secure element) during the handshake using the secure elements with percentages relative (rel.) to the reference (median of values, rounded to 3 significant digits, n = (see Table 6)).
Time [s]
MbedTLS
Table A9. Absolute energy consumption (abs.) of the node (MCU) during the DTLS handshake using MbedTLS (reference) and absolute energy consumption of the node (MCU and secure element) during the handshake using the secure elements with percentages relative (rel.) to the reference (median of values, rounded to 3 significant digits, n = (see Table 6)).
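The table entries described in the captions above (median of the samples, rounded to 3 significant digits, with a percentage relative to the reference) can be derived as in the following sketch. The helper names are hypothetical, and the relative value is assumed here to be the ratio to the reference median expressed in percent. The sample values are placeholders, not measured data.

```python
from math import floor, log10
from statistics import median


def round_sig(value: float, digits: int = 3) -> float:
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    return round(value, digits - 1 - floor(log10(abs(value))))


def table_entry(samples, reference_samples):
    """Return (abs., rel. in percent) for one table column."""
    abs_value = median(samples)
    ref_value = median(reference_samples)
    rel_percent = 100.0 * abs_value / ref_value
    return round_sig(abs_value), round_sig(rel_percent)


# Placeholder samples in seconds; NOT measured data.
abs_s, rel_pct = table_entry([1.0, 2.0, 4.0], [3.0, 4.0, 5.0])
print(abs_s, rel_pct)
```

Using the median rather than the mean keeps single outliers, for example a retransmission during one handshake, from distorting the reported value.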