SoftME: A Software-Based Memory Protection Approach for TEE System to Resist Physical Attacks



Introduction
The development of the Internet of Things (IoT) [1,2] is hailed as the third wave of world information development after computers and the Internet. Embedded systems play a major role in the information processing of IoT. They emerged in the 1970s, have been applied to various fields, and have high requirements for security and real-time performance [3]. People are exposed to more and more smart devices in daily life, most of which are embedded systems. These devices store valuable personal information, which makes them targets for attackers and leads to recurring software attacks and memory data leaks. To enhance the security of embedded systems, ARM proposed the TrustZone [4] security extension to its CPU architecture, which provides resource access control and memory isolation to protect sensitive data. ARM TrustZone technology plays an important role in information protection and is widely applied on mobile embedded devices [5][6][7]. It divides hardware resources into a secure world and a normal world and builds an isolated trusted execution environment (TEE) that protects trusted applications from compromised operating systems and applications [8].
As the value of the data stored in embedded devices increases, effective physical attack methods have emerged. From a cost perspective, inexpensive physical attacks are easier to launch, which makes them a more serious practical threat to embedded systems. Some embedded devices, such as mobile phones and personal computers, are easily lost, and others work in unsupervised environments, which makes them more vulnerable to physical attacks. Such attacks mainly include cold boot attacks, bus monitoring attacks, and DMA (direct memory access) attacks. A cold boot attack obtains data by attacking physical memory. A bus attack obtains data transmitted on the bus, and a DMA attack uses the DMA interface of the device to obtain data. Attackers extract and analyze the data in memory for illegal profit, which makes research on the security of embedded systems critical. Unfortunately, TrustZone does not enforce memory encryption, so it cannot resist the above physical attacks [9]. Therefore, even if sensitive information is stored in physical memory protected by TrustZone, an attacker can gain valuable information through inexpensive physical attacks. A solution that protects the TEE against physical attacks is required.
In response to these issues, we propose SoftME, an approach to protect the confidentiality and integrity of sensitive applications. In our approach, we use TrustZone technology to allocate the on-chip memory space to the secure world and execute the TEE OS on the on-chip memory to protect against cold boot attacks. On-chip memory communicates with the core via the on-chip bus, so it can resist bus attacks. We use data encryption to protect data that is transmitted and stored off-chip. More specifically, data is encrypted on the on-chip memory before being written back to the off-chip memory, which ensures that data is always in ciphertext form when it leaves the chip. We also design task scheduling for encryption and task execution, which lets multiple tasks share a single processor fairly. Compared to existing solutions, our approach has the following advantages. On the one hand, our design is based on software and needs no additional hardware support. On the other hand, our approach does not require applications to be modified, so they can be executed directly. We implemented our prototype on a development board supporting TrustZone and evaluated its performance through experiments. The experimental results show that our approach improves the security of embedded systems without a significant increase in performance overhead.
In summary, we make the following contributions:
(i) We propose SoftME, an approach that uses the on-chip memory to defend against physical attacks. This approach allocates the on-chip memory space to the secure world of TrustZone, with no additional hardware support and no need to modify applications.
(ii) We protect the confidentiality and integrity of off-chip data with an authenticated encryption algorithm and guarantee the fairness of the encryption process and the task execution process through task scheduling.
(iii) We implement our prototype system on a physical development board supporting TrustZone, and the experimental results show that our approach improves system security and does not significantly increase system overhead.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 introduces background knowledge. Section 4 discusses our threat models and assumptions. Section 5 describes our design in detail and the security analysis of our approach. The implementation is described in Section 6, followed by the performance evaluation in Section 7. Finally, Section 8 concludes this paper.

Related Work
An attacker can obtain sensitive information in an embedded device through physical attacks. To resist physical attacks, researchers have proposed various security solutions. This section briefly introduces these solutions, which can be divided into hardware assistance and software enhancements.
Hardware Assistance.
As early as 2000, the Stanford University Computer Systems Laboratory implemented an execution-only memory in hardware that supported internal compartments, and the compartments could not access each other [10]. In addition, in [11], the authors proposed a hardware-based memory protection scheme which protected the integrity of a single-core processor and the confidentiality of multiprocessor shared memory.
In [12], the authors described the hardware mechanism of the SecBus project. The project used a separate hardware module to implement encryption protection, designed to protect the memory bus against bus attacks. To achieve on-chip memory encryption, the Dartmouth College research team designed the Bear microkernel operating system [13,14]. The Bear operating system provided an encryption mechanism that used a security-enhanced processor to process data that appeared outside of the processor. In [15], the authors used a dedicated hardware detector to detect and prevent malicious attacks.
Hardware-assisted protection schemes have an advantage in performance; however, the integrated hardware components occupy space in embedded devices and increase energy consumption. Our approach relies on the on-chip memory, which is available on many embedded development boards, and defends against physical attacks purely in software.

Software Enhancements.
For embedded systems, sensitive data can be protected by encrypting memory. Hong et al. proposed three different encryption methods for data of different sensitivity. DynapoMP [16] was the first method to consider dividing the on-chip memory into different regions. To address excessive memory encryption overhead, Papadopoulos et al. implemented a secure memory allocator (s_malloc) [17] that dynamically allocates memory of any size from the heap, and any data written to this part of the memory is encrypted. In addition, in [18,19], the authors enhanced data security by redesigning the operating system. In [20], the authors proposed processor-memory bus encryption using cache locking. Similarly, in [9], Zhang et al. created a cache-based independent execution environment by locking the cache. In [21], the researchers proposed SecureME to hide data from a compromised operating system and build a secure computing environment for applications. In [22], the authors proposed vTZ, a solution for virtualizing TrustZone that provides a secure and isolated execution environment between guest TEEs. For physical attacks, Guan et al. proposed Copker [23], an encryption engine implemented inside the CPU, and showed that it provides a secure service against cold boot attacks. In [24], the authors proposed Loop-Amnesia, a disk encryption technology designed to defeat cold boot attacks. The mechanism is a kernel-based software design that protects encryption keys in RAM and the CPU. Most of the above designs require modifications to the operating system and cannot be easily ported to other hardware platforms. One advantage of our approach is that it is implemented in software without any modification to operating systems and protects data with an acceptable system overhead.

Background
In this section, we introduce the background knowledge related to our work. We first introduce the TEE architecture and ARM TrustZone, and then discuss the on-chip memory architecture of ARM processors.

TEE and ARM TrustZone.
The trusted execution environment (TEE) was proposed by GlobalPlatform. It is a trusted computing environment that provides security services to applications [25,26]. The TEE architecture is shown in Figure 1. The rich execution environment (REE) provides the execution environment for most functional applications and forms a dual operating system architecture with the TEE [27]. In this architecture, the general-purpose operating system (GPOS) and the TEE OS are executed on the same hardware platform, and hardware-based isolation forms a secure execution environment for the TEE OS. The GPOS provides general functions to meet the functional requirements of the system, while the TEE OS protects trusted applications.
TrustZone is a set of secure hardware extension mechanisms for ARM processors that builds an isolated execution environment for trusted applications [28]. ARM TrustZone technology reconfigures the processor, using the TrustZone Address Space Controller (TZASC) and the TrustZone Protection Controller (TZPC) to divide the hardware resources into two separate parts, called the secure world and the normal world. To switch between the two worlds, the ARM processor adds a new mode, monitor mode. The software running in this mode is called the monitor, which is responsible for saving the context of the world being left and restoring the context of the world being entered. In the CP15 coprocessor of the ARM processor, there is a Secure Configuration Register (SCR) with an NS bit that indicates the current state of the processor. The secure world and the normal world can be switched by setting the NS bit in monitor mode. If the NS bit is 1, the processor is in the normal world; if it is 0, the processor is in the secure world. The processor can enter monitor mode from the normal world by executing the SMC (Secure Monitor Call) instruction or through hardware interrupts.
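The world switch described above can be modeled in a few lines. The following is only an illustrative Python sketch, not monitor code: the register is modeled as a plain integer, the function names `current_world` and `monitor_switch` are hypothetical, and the saved context is reduced to a dictionary of register values. The only architectural fact used is that NS is bit 0 of the SCR.

```python
SCR_NS_BIT = 0x1  # NS is bit 0 of the Secure Configuration Register


def current_world(scr):
    """Return which world the modeled processor is in, based on SCR.NS."""
    return "normal" if scr & SCR_NS_BIT else "secure"


def monitor_switch(scr, saved_contexts, regs):
    """Model one world switch as performed in monitor mode.

    Saves the outgoing world's registers, flips the NS bit, and returns
    the new SCR value together with the incoming world's saved registers.
    """
    saved_contexts[current_world(scr)] = dict(regs)  # save outgoing context
    scr ^= SCR_NS_BIT                                # toggle NS
    restored = dict(saved_contexts.get(current_world(scr), {}))
    return scr, restored
```

Starting from `scr = 0` (secure world), one call to `monitor_switch` yields an SCR value with NS set, and a second call returns to the secure world with the registers that were saved on the way out.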

On-Chip Memory Architecture.
The cache greatly improves processor performance, but its data access time is unpredictable. The emergence and widespread use of on-chip memory solves this problem [29]. On-chip memory (OCM) is a general term for static random-access memory (SRAM) that is integrated into the chip for non-cache use. Compared with cache, OCM has the advantages of low power consumption, high performance, and small footprint. Banakar et al. show that OCM reduces power consumption by 40% and on-chip area by about 34% compared with a cache of the same capacity [30]. A typical embedded system with OCM is shown in Figure 2. The OCM and the cache communicate with off-chip memory through an off-chip data interface. Unlike off-chip memory, the on-chip memory and the core communicate through the on-chip bus, so the on-chip memory is more secure than the off-chip memory.
In addition, OCM has a separate address range, which is often mapped into the memory address space. In this case, the entire address space is divided into two parts: OCM occupies only a small part, and the rest is still allocated to the off-chip memory. The processor can therefore access OCM directly, in the same way as off-chip memory. Because these accesses do not go through the caching process, they greatly improve the speed of data reads and writes. As a result, more and more embedded processors integrate OCM on-chip to improve system performance.

Threat Model and Assumptions
To break into embedded systems and obtain valuable information, attackers have designed a variety of attack methods at the hardware level. In this section, we first describe the threat model related to our design and then introduce the assumptions of our design.

Threat Model.
In our work, we focus on physical memory leaking attacks, such as cold boot attacks, bus monitoring attacks, and DMA attacks. Physical side-channel attacks require specialized protection techniques, and we do not consider them in this paper. We also do not consider physical attacks against the SoC, such as invasive attacks. Such attacks target information inside chips and require specialized and expensive equipment, such as a laser cutting system, microprobing station, oscilloscope, and focused ion beam workstation. Only knowledgeable attackers can perform them, and the value of the equipment may far exceed the value of the target. The attack process is also complicated, and even an experienced attacker may need several months, which further raises the attack cost [31]. Therefore, physical side-channel attacks, code injection attacks, and other complex physical attacks are out of the scope of our research.

Cold Boot Attacks.
Cold boot attacks are a new type of physical attack and have become part of many popular security threat models. In a cold boot attack, the attacker exploits the data remanence effect of memory to obtain keys and valuable information stored in the off-chip memory [32]. Experiments have shown that after the device is powered off, the data in DRAM does not disappear immediately but remains for a while [33]. Measurements of DDR1 and DDR2 demonstrate a correlation between temperature and RAM remanence, indicating that even if the surface temperature of the RAM module is lowered by only 10 °C, the remanence effect is significantly extended [34]. One way to launch a cold boot attack is to use this period of time to physically remove the DRAM from the target board and place it in a device that the attacker has prepared in advance to read its content. Another method is to restart the target board into an operating system controlled by the attacker and dump the memory content. Some research groups have successfully performed cold boot attacks on Android smartphones and retrieved private information such as encryption keys, address books, and photos from RAM [35].

Bus Monitoring Attacks.
Buses are the crucial information transmission channels between functional components, which makes them primary targets for attackers. Typical examples of such attacks are bus snooping attacks and bus tampering attacks. For example, an attacker can attach an FPGA board to the bus of an embedded system. By configuring and controlling the FPGA, the attacker can steal or even modify the data transmitted on the bus, thereby disrupting system execution.
Another typical example of a bus monitoring attack is given in [36]. In the original Xbox game system, keys were stored in plaintext and transmitted over the South Bridge bus. The attackers used bus snooping and injection to capture or insert information on the bus between system components, obtained the keys, and decrypted the secure bootloader, thereby destroying the system trust chain. Subsequently, the attackers developed a low-cost chip that could be soldered to the game system bus, allowing users to bypass the security monitoring mechanism and play pirated games.

DMA Attacks.
By configuring a device that can use the DMA port, an attacker can bypass software security mechanisms and directly read the physical memory. One solution to such attacks is to use the IOMMU [37]. The operating system can program the IOMMU to limit the range of memory that the DMA device can access, or even deny the DMA device access to memory. On devices equipped with an IOMMU, malicious devices cannot access memory through DMA attacks. However, an IOMMU is not available on every device. Fortunately, most ARM platforms support TrustZone, and DMA requests from a compromised OS can be rejected to protect secure memory.

Assumptions.
We assume that the ARM platform supports TrustZone technology and is equipped with on-chip memory. On-chip memory is trusted to protect against physical attacks such as cold boot attacks, while off-chip memory and peripherals are untrusted. We also assume that a trusted device key exists in the SoC [38], such as KNOX's Device-Unique Hardware Key (DUHK).

System Architecture and Design
In this section, we introduce the design of SoftME (a Software-based Memory Encryption protection approach) in detail, based on the previously discussed techniques and assumptions. Finally, we give a security analysis of our approach.

System Architecture.
Our approach uses the system architecture shown in Figure 3. TrustZone divides hardware resources into two worlds, a secure world and a normal world. The memory of the secure world consists of two parts: the on-chip memory space and a small portion of the DRAM. The TEE OS and the monitor run on the on-chip memory, and the secure DRAM is used to store encrypted trusted tasks. Most of the space in DRAM is allocated to the normal world for the GPOS. The monitor runs in monitor mode and is responsible for handling hardware interrupts and switching between the two worlds. Due to the resource isolation mechanism of TrustZone, the GPOS cannot access devices and resources of the secure world, such as IO peripherals and memory, so the TEE OS is not affected by compromised operating systems and applications. For communication between the two worlds, we allocate a small piece of the off-chip memory as shared memory. We design a task scheduler and a memory protection engine (MPE) in the TEE OS. The task scheduler is responsible for scheduling multiple tasks to ensure fair execution. The memory protection engine decrypts tasks when they are loaded into the on-chip memory and encrypts them when they are written to the off-chip memory, which ensures data confidentiality and integrity.

Memory Protection Engine Workflow.
To prevent sensitive information from being stolen or tampered with, we must ensure the confidentiality and integrity of the data stored in DRAM. Therefore, we design a memory protection engine in the TEE OS: when a task needs to be switched from OCM to DRAM, the memory protection engine encrypts the data and protects its integrity; when a task needs to be loaded from DRAM to OCM, the memory protection engine decrypts it and performs an integrity check. The specific process is shown in Figure 4. The execution process of a task in SoftME includes three phases: the loading phase, the activation and execution phase, and the switching phase. Next, we describe the design of these three phases in detail.
(a) The Loading Phase. The loading phase corresponds to step 1 of Figure 4. In this phase, the task is loaded from off-chip memory to the on-chip memory and decrypted. Before the loading phase, we assume that the task in the off-chip memory has been encrypted. We use an authenticated encryption (AE) algorithm to protect the confidentiality and integrity of the task. The AE algorithm combines a message authentication code with an encryption algorithm: encrypting the plaintext ensures the confidentiality of the data, and integrity authentication verifies whether the data has been tampered with [39]. The generation of the unique task key K_t for each task and the encryption process of the task are described in the third phase. In this phase, we assume that the key K_t, the ciphertext C, the initialization vector IV, and the tag value T are already known. The tag value is an output parameter of the encryption process and is used to verify the integrity of the data.
The decryption process is described in Table 1. The decryption algorithm is denoted by AE', and the encryption algorithm is denoted by AE. The ciphertext is the encrypted task read from the off-chip memory. The input parameters of the process are the ciphertext C, the IV, the key K_t, and the tag T. If authentication of the ciphertext succeeds, the plaintext is output; otherwise the symbol FAIL is returned. A failed authentication indicates that the ciphertext has been tampered with by an attacker, and the task cannot be recovered. After obtaining the plaintext of the task, we put the task into the ready queue.
(b) The Activation and Execution Phase. The activation and execution phase corresponds to steps 2-4 of Figure 4. In this phase, a task is activated by the operating system and begins execution, just as in the normal task execution process. The task runs on the on-chip memory. On-chip memory is non-cached SRAM, so it can be read directly by the processor.
(c) The Switching Phase. The switching phase corresponds to step 5 of Figure 4. This phase encrypts the data after a task has completed and stores the ciphertext in DRAM.
Each task has a unique key K_t. As mentioned earlier, we assume that the device key K_d exists and is trusted. We use K_d to derive K_t via the HMAC-based Key Derivation Function (HKDF) [40]. The HKDF algorithm is a key generation function based on the Hash-based Message Authentication Code (HMAC). HKDF takes two input parameters, a device key and a message. We use the taskID created in the second phase as the message. Since each task has a unique taskID, the key K_t for each task is unique. The device key is highly confidential, and only the kernel has permission to read and operate on it. Even if one task obtains the taskID of another task, it cannot operate on the device key, so it cannot obtain the other task's key. The encryption process in this phase is shown in Table 2. The input parameters of the process are the key K_t, the plaintext P, and the initialization vector IV. The initialization vector is a number with a fixed length, which is generally 16 bytes. The GCM specification states that the IV does not need to be random but that the same IV must never be used twice with the same key [41]. We therefore increment the IV by 1 before each encryption to obtain distinct IV values. The plaintext is the task that the memory protection engine reads from the on-chip memory after the task execution is completed. The output values are the ciphertext C and the authentication tag T. After all these steps are completed, the ciphertext is swapped to DRAM for storage. The tag value, key, and IV of a task are stored on the on-chip memory.
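The three phases above can be sketched end to end in Python using only the standard library. This is a minimal illustration under stated assumptions, not the prototype's code. The key derivation follows HKDF as specified in RFC 5869, with SHA-256 and a 4-byte big-endian taskID encoding chosen here as assumptions. Because the Python standard library has no AES-GCM, the functions `ae_encrypt` and `ae_decrypt` are a stand-in built from HMAC-SHA256 (a keystream plus an encrypt-then-MAC tag); they mimic only the interface of AE and AE' (ciphertext and tag out, plaintext or FAIL back) and must not be read as GCM or used for real protection. All function names are hypothetical.

```python
import hashlib
import hmac

FAIL = None  # stand-in for the FAIL symbol returned when authentication fails


def hkdf_sha256(ikm, info, length=16, salt=b""):
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    hash_len = hashlib.sha256().digest_size
    prk = hmac.new(salt or bytes(hash_len), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_task_key(device_key, task_id):
    """Derive a per-task key from the device key, using taskID as the message."""
    return hkdf_sha256(device_key, task_id.to_bytes(4, "big"))


def next_iv(iv):
    """Increment a fixed-length IV by 1 (wrapping) so no IV repeats per key."""
    n = (int.from_bytes(iv, "big") + 1) % (1 << (8 * len(iv)))
    return n.to_bytes(len(iv), "big")


def _keystream(key, iv, n):
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hmac.new(key, b"enc" + iv + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:n])


def _tag(key, iv, ciphertext):
    return hmac.new(key, b"mac" + iv + ciphertext, hashlib.sha256).digest()[:16]


def ae_encrypt(key, iv, plaintext):
    """Switching phase: return (ciphertext, tag). Toy stand-in, NOT AES-GCM."""
    ks = _keystream(key, iv, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    return ciphertext, _tag(key, iv, ciphertext)


def ae_decrypt(key, iv, ciphertext, tag):
    """Loading phase: verify the tag first, then decrypt; FAIL on mismatch."""
    if not hmac.compare_digest(tag, _tag(key, iv, ciphertext)):
        return FAIL
    ks = _keystream(key, iv, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

In this sketch, a task with a given taskID is sealed by calling `derive_task_key`, advancing the stored IV with `next_iv`, and passing both to `ae_encrypt`; only the ciphertext would leave the chip, while the key, IV, and tag stay in on-chip memory. Flipping a single ciphertext bit before `ae_decrypt` yields FAIL, which corresponds to the tampering case described for Table 1.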

Scheduler Workflow.
To achieve fair scheduling among multiple tasks, we design the task scheduler using time-slice polling. When a task is created, it is given a priority, such as high, medium, or low. Tasks in the ready state with the same priority form a ready queue. Take the high-priority ready queue as an example. The workflow of the scheduler is shown in Figure 5. Assume that there are four tasks in the high-priority queue. Currently, task1 is in the running state, and the other three tasks are in the ready state. Suppose the time slice is set to 1 ms. During the period 0-1 ms, task1 is running. At 1 ms (assuming that the state switching time is small enough to be ignored), the scheduler changes task1 from the running state to the ready state and places it at the end of the ready queue. Task2 then gets the CPU and starts running, and so on. The task scheduler ensures that each task receives the same execution time. Therefore, even on a single-threaded CPU, the system appears at a high level to execute multiple tasks at once.
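The time-slice polling described above can be simulated with a queue. The following Python sketch is an illustration only, assuming a single ready queue of equal-priority tasks and a negligible switching time, as in the text; the function name `round_robin` and the task durations in the usage note are hypothetical.

```python
from collections import deque


def round_robin(tasks, slice_ms=1):
    """Simulate time-slice polling over one ready queue.

    tasks: dict mapping task name -> required execution time in ms,
           in ready-queue order.
    Returns a timeline of (start_ms, task_name, ran_ms) tuples.
    """
    ready = deque(tasks.items())
    now, timeline = 0, []
    while ready:
        name, remaining = ready.popleft()       # head of queue gets the CPU
        ran = min(slice_ms, remaining)
        timeline.append((now, name, ran))
        now += ran
        if remaining > ran:                     # unfinished: back to the tail
            ready.append((name, remaining - ran))
    return timeline
```

For example, with four tasks needing 2, 1, 2, and 1 ms respectively and a 1 ms slice, the tasks run in the order task1, task2, task3, task4, task1, task3: every task gets one slice before any task gets a second one.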

Security Analysis.
Sensitive data is valuable to attackers, and it can be attacked while a task is running and while data is being transferred and stored. In this section, we show that SoftME is able to resist the physical attacks listed in the threat model.
Cold boot attacks. Operating systems and applications are vulnerable to cold boot attacks at runtime. In our approach, we run the TEE OS and the monitor on the on-chip memory. Compared to the off-chip memory, the on-chip memory has two features that prevent cold boot attacks. First, a cold boot attack needs to restart the device for the subsequent steps. Whenever the device is powered off or restarted, the firmware on the board initializes the on-chip memory and clears all its contents immediately [42], so the attacker cannot obtain the confidential contents of the on-chip memory. In contrast, at room temperature, the off-chip memory can retain a portion of its content (0.1%) even two seconds after the device is powered off. Second, the cold boot attack is launched when the bootloader of the GPOS is started. At that time, the processor is already in the normal world, so the malicious code of the cold boot attack runs in the normal world and cannot tamper with the on-chip memory, which has been partitioned to the secure world. To sum up, cold boot attacks cannot compromise the on-chip memory.
Bus monitoring attacks. The on-chip memory is also secure against bus monitoring attacks, because sensitive data held there never leaves the chip and is not transmitted over any exposed off-chip bus. However, the on-chip memory space is limited and cannot hold all sensitive data, so part of the sensitive data has to be stored in off-chip memory. Off-chip sensitive tasks are protected by encryption. SoftME is designed to ensure that data does not appear in plaintext in the off-chip memory or on the bus. The authenticated encryption algorithm also generates a tag value together with the ciphertext. This value is stored on the chip and used to check the integrity of the data during decryption.
DMA attacks. On ARM platforms that support TrustZone, DMA attacks are also ineffective. DMA reads data directly from DRAM, but the data we store in DRAM is encrypted, so a DMA read yields only ciphertext. In addition, the on-chip memory is allocated to the secure world, and TrustZone prevents illegal devices from accessing secure world memory through the DMA interface.

Implementation
In this section, we will detail the experimental environment and implementation of SoftME.
We implement our design on the Freescale i.MX6q SABRE Lite Board. It features a Cortex-A9 processor at 1 GHz per core, 1 GB of 64-bit wide DDR3, and 256K of on-chip memory. The bootloader for hardware initialization and system boot is provided by onboard flash. The trusted operating system used for the prototype system is version 1.4.0 of TOPPERS/FMP, the general-purpose operating system kernel is Linux 3.10.53, and the monitor for world switching is SafeG 1.2.4. The serial port UART3 is selected as the information transmission port.

Memory Isolation.
We build the SoftME architecture described in Figure 3 on the development board. Figure 6 shows the memory address arrangement of the prototype system. The memory system of the platform consists of two parts, on-chip memory and off-chip memory (DRAM). The on-chip memory is assigned entirely to the secure world and runs the lightweight trusted embedded operating system TOPPERS/FMP. DRAM is divided into two parts, one for the secure world to store trusted applications and the other for the normal world to run Linux. SafeG [43] is used as the monitor for world switching and runs on the on-chip memory. Since the storage spaces of FMP, SafeG, and Linux must not overlap, we place them in separate memory areas: 0x00900000-0x0091FFFF is used by FMP, 0x00920000-0x0092FFFF is used by SafeG, and 0x12000000-0x4EFFFFFF of the off-chip memory space is used by Linux. The memory protection engine and the task scheduler run in FMP. The remaining on-chip memory is the total size of the on-chip memory minus the size occupied by the TEE OS and the monitor. On our experimental platform, the total size of the on-chip memory is 256K, the size allocated for the FMP kernel (including the memory protection engine and task scheduler) is 128K, and the size allocated for the monitor SafeG is 64K, so the remaining free space is 64K. Therefore, the maximum size of a trusted task is about 64K.
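The memory budget above can be checked directly from the address ranges. The short Python sketch below uses only the ranges quoted in the text (with the on-chip range 0x00900000-0x0093FFFF taken from the reference manual citation in the next subsection); the helper names `size` and `overlaps` are hypothetical.

```python
KIB = 1024

# Inclusive address ranges quoted in the text.
OCM = (0x00900000, 0x0093FFFF)     # whole on-chip memory
FMP = (0x00900000, 0x0091FFFF)     # TEE OS (TOPPERS/FMP)
SAFEG = (0x00920000, 0x0092FFFF)   # monitor (SafeG)
LINUX = (0x12000000, 0x4EFFFFFF)   # normal-world DRAM used by Linux


def size(region):
    """Size in bytes of an inclusive (start, end) address range."""
    return region[1] - region[0] + 1


def overlaps(a, b):
    """True if two inclusive address ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


# Free on-chip memory left for trusted tasks.
free_ocm = size(OCM) - size(FMP) - size(SAFEG)
```

Evaluating the sizes gives 256K for the on-chip memory, 128K for FMP, and 64K for SafeG, which leaves 64K free, matching the figure stated above; the three software regions do not overlap.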

Port TEE OS to the On-Chip Memory.
According to the above analysis, the off-chip memory is insecure and vulnerable to physical attacks. An important step in implementing SoftME is therefore to execute the TEE OS on the on-chip memory. According to the i.MX 6Quad processors reference manual, the physical address range of the on-chip memory is 0x00900000-0x0093FFFF (Figure 6), and the entire on-chip memory region can be used freely after booting. We changed the FMP target-dependent configurable parameter TARGET_T_OS_START_ADDRESS to 0x00907000, which lies in the on-chip memory address range. In addition, the text base address and data base address of FMP are moved into the physical address range of the on-chip memory. Therefore, after the system boots, FMP is executed on the on-chip memory.

The Implementation of the Task Scheduler and the Memory Protection Engine.
TOPPERS/FMP is a new-generation trusted operating system kernel developed by the Japanese TOPPERS project team that follows the ITRON 4.0 specification. ITRON is a real-time multitasking system specification that has become the Japanese industry standard. To create and activate tasks, we use APIs that comply with the ITRON standard. The code sample for creating and activating the task scheduler is given in Table 3. We use the static system function cre_cyc() to create a task scheduler in the configuration file. An ID is assigned to the task when it is created. sta_cyc() is used to activate a task scheduler, and the task state management function irot_rdq() is used to rotate task precedence. The task scheduler is activated in the TEE OS application so that it is available.
The role of the memory protection engine is to process the sensitive data in DRAM. After a task executes on the on-chip memory, its data is encrypted and then written back to DRAM. The encryption key is derived from the device key. The code sample for creating and activating the memory protection engine is given in Table 4. In terms of implementation, we again use ITRON functions to create and activate tasks: cre_tsk() statically creates tasks in the configuration file, and act_tsk() activates them. The encryption algorithm we use is AES-GCM-128. We build a cryptography library using Galois/Counter Mode (GCM), an authenticated encryption algorithm that combines counter mode with a message authentication code. Its performance evaluation has shown excellent performance on many platforms [44]. The value of the parameter mode is either encrypt or decrypt: encrypt indicates an encryption process, and decrypt indicates a decryption process. Because the cryptography library we built is very small, we can easily integrate the memory protection engine into FMP and port both to the on-chip memory.
However, ITRON defines a standard real-time kernel specification for small embedded system designs, including FMP, which caused some problems when we imported cryptographic algorithms in our experiment. More specifically, FMP developers use a custom function library specified by the standard; if we import the cryptographic algorithm directly into an FMP application, a library conflict occurs. To address this problem, we replace the library file imported by the memory protection engine with the FMP library file.

Experimental Results and Analysis
In this section, we evaluate the prototype system and analyze the experimental results. In the experiment, we measured the overhead of the memory protection engine by encrypting plaintexts of different sizes in the prototype system. We then designed several trusted tasks for evaluation and analyzed the impact of memory protection on the overhead of these tasks.

Code Modification.
To coordinate the execution of the system, we need to patch the Linux kernel to support the execution of the monitor. The specific modifications are summarized as follows. In the bootloader, we modified a total of 26 lines of code in 5 files, mainly macro definitions related to switching between the two worlds and the names of some startup files. In Linux, we modified a total of 2696 lines of code in 7 files, mainly conditional statements related to switching between the worlds and the SMC calls to SafeG. Finally, we modified a total of 263 lines of code in two communication-related configuration files in Linux so that it can communicate with SafeG.

Overhead of Memory Protection Engine.
First, we measured the overhead of the memory protection engine in the prototype system when encrypting data whose length ranges from 1K to 8K. As shown in Figure 7, the overhead of the memory protection engine increases almost linearly with the plaintext size.
The standard deviation is the most commonly used measure of the dispersion of a set of data. Figure 8 shows the standard deviation of the data we collected. It can be seen that the overhead varies between runs and is not very stable.
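As a reminder of how such a dispersion figure is obtained, the sketch below computes the mean and the standard deviation of a set of timing samples. The sample values are invented for illustration and are not measurements from the prototype; the function name summarize is hypothetical, and the choice of the sample (n - 1) standard deviation is an assumption, since the text does not state which variant was used.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) of repeated timing measurements."""
    mean = statistics.mean(samples)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    stdev = statistics.stdev(samples)
    return mean, stdev


if __name__ == "__main__":
    # Hypothetical encryption times for one plaintext size, in microseconds.
    samples_us = [100.0, 102.0, 98.0, 100.0]
    mean, stdev = summarize(samples_us)
    print(f"mean = {mean:.2f} us, standard deviation = {stdev:.2f} us")
```

A larger standard deviation relative to the mean indicates less stable overhead across runs.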

Overhead on Multitasks.
In this section, we report the overhead of our approach in terms of the task execution time, the task switching time, and the task preemption time, and we analyze each result separately.
To observe the impact of encryption on the simultaneous execution of multiple tasks, we designed six identical tasks for evaluation. Since the time slice of FMP can be set to a minimum of 1 millisecond, we designed each task to execute for longer than 1 millisecond so that the experimental results are clearly visible.
For multiple tasks, we enable the tasks to be executed simultaneously by activating the task scheduler. To observe the impact of the memory protection engine on multiple tasks, we design two encryption strategies for the memory protection engine. In the first strategy, all tasks are encrypted by a single memory protection engine. In this case, the data of all tasks cannot be encrypted at the same time, which may introduce waiting time. In the second strategy, multiple tasks are managed by multiple memory protection engines; that is, when the system creates a task, it also creates a corresponding memory protection engine. Owing to the tiny size of the memory protection engine, this does not have a significant impact on system performance. In the experiment, each strategy is compared with the corresponding basic case, in which the same tasks are executed on the same experimental platform without memory protection.
For these three cases, we executed 1, 2, 4, and 6 tasks and measured the overhead of each. We ran each case 50 times; Figure 9 shows the experimental results and compares the three cases. The overhead of a task execution includes the task execution time, the task switching time, and the dispatch time of the task scheduler. It can be seen from the figure that the running time is almost linearly proportional to the number of tasks. Table 5 shows the overhead introduced by data encryption as the ratio of strategy1 and strategy2, respectively, to the basic case. For the first encryption strategy, the overhead of a task execution increases by about 50% compared with the basic case. When multiple tasks are executed, the advantage of the second strategy is clear, since the overhead increases by only about 20%. Figure 10 shows the standard deviation of the overhead. Overall, these three cases are more stable than the overhead of the memory protection engine shown in Figure 8. We believe this is because the execution time of the task hides the overhead of the memory protection engine. Among the three cases, the basic case is relatively stable, and although the second strategy has an advantage in terms of overhead, its stability is slightly worse than that of the first strategy.
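The proportional relationship reported in Table 5 amounts to dividing the extra time under a protection strategy by the time of the basic case. The sketch below shows that arithmetic; the function name relative_overhead and the timing values are hypothetical, chosen only so that the results match the approximate 50% and 20% figures quoted in the text, and they are not measurements from the paper.

```python
def relative_overhead(protected_time, baseline_time):
    """Fractional increase of a protected run over the unprotected basic case."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (protected_time - baseline_time) / baseline_time


if __name__ == "__main__":
    # Hypothetical running times in milliseconds (illustrative only).
    basic_ms = 10.0
    strategy1_ms = 15.0  # single shared memory protection engine
    strategy2_ms = 12.0  # one memory protection engine per task
    print(f"strategy1: {relative_overhead(strategy1_ms, basic_ms):.0%}")
    print(f"strategy2: {relative_overhead(strategy2_ms, basic_ms):.0%}")
```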
As the above experimental results show, in an embedded real-time operating system, ensuring the security of tasks inevitably affects their real-time behavior. Therefore, we keep a task that requires high real-time performance in the on-chip memory, so that it does not need to be decrypted when it runs again. We measured the task preemption time and the task switching time for multitasking in the on-chip memory. The task preemption time is the time required for a high-priority task to preempt a low-priority task. It includes the time to save the context of the preempted task and the time to restore the context of the high-priority task. We measured the task preemption time in the Linux kernel, the FMP kernel without memory protection, the FMP kernel running strategy1, and the FMP kernel running strategy2. The evaluation uses two tasks with different priorities, task1 and task2. The priority of task1 is set high, and the priority of task2 is set low. Task1 is activated in task2, so task1 can preempt task2. Using the system timer, we record the time just before task1 is activated and the time just before the first instruction of task1 runs. We ran the experiment 100 times, and the average times are shown in Table 6. The results show that the FMP kernel has a clear advantage over Linux in processing real-time tasks and that SoftME has no effect on the task preemption time.
The task switching time is the time spent switching from one task to another. We activated six tasks at the same time and recorded the time at which the first instruction of each task was invoked, denoted by t1 to t6. The time spent switching from task1 to task2 is therefore t2-t1, the time spent switching from task2 to task3 is t3-t2, and so on. The experimental result is shown in Figure 11. In Section 5.3, we configured the task switching time to be 1 ms. It can be seen from the figure that the actual time is more than 1 ms, because the context must be saved and restored when tasks are switched.
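The differences t2-t1, t3-t2, and so on can be computed directly from the recorded timestamps. The following sketch assumes the timestamps are available as a list ordered by task; the function name switching_times and the timestamp values are hypothetical and serve only to illustrate the calculation.

```python
def switching_times(first_instruction_times):
    """Differences between consecutive first-instruction timestamps (t2-t1, t3-t2, ...)."""
    return [
        later - earlier
        for earlier, later in zip(first_instruction_times, first_instruction_times[1:])
    ]


if __name__ == "__main__":
    # Hypothetical timestamps t1..t6 in microseconds (illustrative only).
    timestamps_us = [0, 1050, 2110, 3160, 4200, 5260]
    print(switching_times(timestamps_us))
```

Six timestamps yield five switching times; in this illustration each exceeds the configured 1000 microsecond slice, mirroring the observation that context save and restore add to the configured value.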

Conclusions and Future Work
IoT devices often operate unsupervised, which makes them vulnerable to physical attacks. We propose SoftME, an approach for protecting trusted tasks against physical attacks. We use the on-chip memory to protect the TEE OS and design a memory protection engine to protect the confidentiality and integrity of the tasks stored in the off-chip memory. We implemented SoftME on a physical development board. The experimental evaluation shows that the memory protection introduces an overhead of about 20%, which is within acceptable limits.
However, for a single-core embedded system, encryption has a negative impact on the execution of real-time tasks. Multicore architectures have replaced single-core systems in many areas and have become the mainstream in embedded systems. In a multicore architecture, multiple tasks can be executed in parallel, offering more parallel computing power, lower clock frequency, lower power consumption, and higher efficiency. Therefore, in future work, we can allocate a dedicated core to the memory protection engine while the other cores are responsible for executing tasks. This design will use the parallel computation of multicore systems to reduce the overhead caused by encryption and decryption, thereby improving performance.

Figure 2: Embedded system based on the on-chip memory.

Figure 5: Workflow of the task scheduler.

Figure 7: The overhead of the memory protection engine.

Figure 8: Standard deviation reflecting the degree of data dispersion.

Figure 9: The overhead of different protection strategies.

Figure 11: The task switching time.

Table 1: Decryption process in the loading phase.
Get the unique key   .
Data transfer from DRAM to OCM.
1. Read the  and  stored on the OCM.
2.  = AE ' (  , , , ).

Table 3: Code sample of the task scheduler.

Table 4: Code sample of the memory protection engine.

Table 5: The proportional relationship between the two strategies and the basic case.

Table 6: The task preemption time.