Exploiting Security Dependence for Conditional Speculation Against Spectre Attacks

Speculative execution side-channel vulnerabilities such as Spectre reveal that conventional architecture designs lack security consideration. This article proposes a software-transparent defense framework, named Conditional Speculation, against Spectre vulnerabilities found on traditional out-of-order microprocessors. It introduces the concept of security dependence to mark speculative memory instructions which could leak information with potential security risks. More specifically, security-dependent instructions are detected and marked with suspect speculation flags in the Issue Queue. All the instructions can be speculatively issued for execution in accordance with the classic out-of-order pipeline. Instructions with suspect speculation flags are considered safe if their speculative execution does not refill new cache lines with unauthorized privileged data. Otherwise, they are considered unsafe and thus not allowed to execute speculatively. To pursue a balance of performance and security, we investigate two filtering mechanisms, Cache-hit-based Hazard Filter and Trusted Page Buffer-based Hazard Filter, to filter out false security hazards. As for true security hazards, we have two approaches to prevent them from changing cache states. One is to block all unsafe accesses; the other is to fetch them from lower-level caches or memory to a speculative buffer temporarily, and refill them after confirming that they are on the correct execution path. Our design philosophy is to speculatively execute safe instructions to maintain the performance benefits of out-of-order execution while delaying the cache updates for speculative execution of unsafe instructions for security consideration. We evaluate Conditional Speculation in terms of performance, security, and area. The experimental results show that the hardware overhead is marginal and the performance overhead is minimal.


INTRODUCTION
Speculative and out-of-order execution are fundamental techniques to exploit instruction-level parallelism (ILP) in modern high-performance processors. In typical handling of mis-speculation, the pipeline states, such as integer and floating-point registers, are rolled back to the faulting instruction. However, some microarchitectural states, such as cache contents, are usually not reverted, since this omission does not violate the architectural semantics. Unfortunately, the recently exposed Spectre and Meltdown, which are speculative execution side-channel vulnerabilities, have revealed the security hazards of neglecting those unrecovered microarchitectural states [1], [2], [3], [4], [5]. Attacks exploiting speculative execution vulnerabilities usually induce a victim to speculatively perform operations that would not occur during correct program execution but which, when they occur, leak the victim's confidential information via a side channel to the adversary. Speculative execution vulnerabilities have become a serious threat to commodity systems since speculative execution is widely adopted in most modern microprocessors [6], [7], [8].
Industrial researchers have responded rapidly to mitigate these threats [9], [10], [11], [12]. Retpoline, proposed by Google, converts indirect jump instructions into a blocking loop that combines return instructions to avoid unsafe speculative execution [13]. Intel has provided multiple microcode updates for their products and software developers can invoke specific instructions to enable different granularities of defense mechanisms to avoid interferences with the branch predictor between applications running at different privilege levels [9]. Various isolation mechanisms, such as KAISER and Site Isolation, are developed to shut down the observable channel between security domains [14], [15]. Although they effectively ease the security tensions, most existing mitigation techniques are software-based and more or less sacrifice transparency and/or performance.
To strike a balance between security, performance, and transparency, it is essential to innovate the microarchitecture design to safeguard the speculative execution. In the fourth quarter of 2018, Intel released a series of Coffee Lake R processors with hardware-based defenses against Meltdown and Foreshadow [16]. However, other Spectre-type variants are still mitigated by microcode together with the operating system. To date, there has been no widely accepted hardware solution for defending Spectre variants. This paper focuses on the microarchitecture design innovations against the major variants of Spectre (Spectre-V1, V2, V4, and SpectrePrime).
More specifically, our work concentrates on vulnerabilities associated with branch speculation and memory access speculation. Overall, this paper makes the following contributions:
1) We first propose the concept of Security Dependence. Similar to data dependence and control dependence, this new kind of dependence is used to depict the speculative instructions which leak microarchitectural information with potential security risks.
2) An effective and software-transparent defense framework against Spectre, named Conditional Speculation, is proposed. This framework consists of three components that dynamically detect whether a speculative access is safe or not according to Security Dependence, then allow safe instructions to execute as usual, and prevent the execution of unsafe parts from changing cache states. (i) Security Hazard Detection is introduced in the Issue Queue to identify suspected unsafe instructions with security dependence. (ii) Targeting the different features of multiple side-channel attacks, Hazard Filter is deployed to filter out safe instructions which do not leave observable traces in channels and allow them to execute speculatively. (iii) Security Hazard Response is employed to prevent microarchitectural states from being changed by unsafe instructions. This paper describes how Conditional Speculation defends against Spectre attacks based on the cache side channel.
3) Two filtering mechanisms are investigated to filter out falsely identified security hazards, with the goal of pursuing a balance of performance, security, and transparency. The first proposed filter, Cache-hit based Hazard Filter, targets the speculative instructions which hit the cache. Since their speculative execution will not change cache content, they are safe. The other proposed filter, Trusted Page Buffer based Hazard Filter (TPBuf), identifies safe speculative instructions from another perspective. For realistic and dangerous Spectre variants that use the shared memory-based cache side channel (e.g., Flush+Reload), the speculative execution of malicious gadgets has a common feature named S-Pattern. TPBuf is designed to capture S-Pattern from all speculative execution. Any speculatively executed memory instruction is considered safe if it does not match S-Pattern.
4) Two Hazard Response mechanisms are explored. The straightforward one is to block all unsafe instructions. Furthermore, inspired by InvisiSpec [17], a hardware-based buffering mechanism, named SPBuf, is investigated for temporarily buffering speculative traces. It ensures security on the one hand, and on the other hand provides data for subsequent data-dependent operations, offering higher performance. We quantify the performance impact of the locations of the Speculative Buffer. Experiments show that deploying a Speculative Buffer in the last-level cache can achieve similar performance to deploying it in all caches, because Hazard Filters can take advantage of the locality of the private cache even without the Speculative Buffer. This observation motivates us to propose a lightweight Speculative Buffer design. In particular, we only deploy the Speculative Buffer with the last-level cache, thus avoiding complex cache coherence modifications in the private caches.
The next section contains a brief description of Spectre. Section 3 presents the threat model. The concept of security dependence is introduced in Section 4. Section 5 introduces the framework of Conditional Speculation. Sections 6, 7, and 8 describe the Hazard Detection, Hazard Filter, and Hazard Response, respectively. Section 9 evaluates Conditional Speculation. Section 10 is the discussion. Section 11 summarizes related work, and Section 12 concludes this paper.

UNDERSTANDING THE SPECTRE ATTACKS
Spectre attacks usually trick the processor into speculatively executing instruction sequences that should not have been executed under correct program execution. By influencing which instructions are speculatively executed, this kind of attack is able to use a side channel to leak the victim's information. A typical Spectre attack has the following three key steps.

Induce Victim to Incorrect Speculation
There are two major approaches in Spectre attacks to induce victim to incorrect speculation.
Branch Speculation. Through purposeful training of the branch predictors, an adversary can redirect the control flow to an incorrect speculative execution path to access unauthorized data [1], [4], [18], [19], [20], [21]. Some processors use a static branch predictor, which makes it much easier for an attacker to construct mis-speculative execution. What's more, complete process- or thread-level isolation is rare in the branch predictors of existing high-end processor cores, which makes cross-process or cross-thread attacks feasible.
Memory Speculation. Another possible approach to inducing speculative execution is load speculation [11]. The load instruction is usually allowed to be speculatively executed even if the address of its older store instruction is unknown. Attackers can exploit it to induce the load instruction to speculatively access confidential data illegally.

Construct a Long Timing Window for Incorrect Speculative Execution
To gather enough stable information from the incorrect speculation, a long timing window is essential for the adversary. There are several ways to achieve this, and we introduce two classic approaches.
Delinquent Memory Accesses. The attacker can use the cache line flush instruction or other ingenious methods to evict the source operands into off-chip memory [1], [3], [4], [11]. Such a delinquent memory access holds the predicated instruction in the Issue Queue for a long time because its source operand is not ready.
Long Dependence Chain. Constructing a long data dependence chain for computing source operands can also be used to provide a longer timing window for stable speculative executions [22].

Infer Secrets From Side-Channel Leakages
During the long timing window, subsequent speculative execution might leave traces in the microarchitecture which can be observed by side-channel methods. As a widely used method, the cache side-channel attack exploits the time difference of memory accesses to deduce whether a victim process has loaded a specific cache line or not, and then infers the offset address or execution path. There are many well-studied cache side-channel attacks, including Flush+Reload [23], Prime+Probe [24], Evict+Reload [25], Flush+Flush [26], and Evict+Time [27].

THREAT MODEL
We make the following assumptions about an attacker. She can execute her code on the same machine as the victim process without elevated privileges. It is also feasible for the attacker to induce the target branch to jump to malicious gadgets within a single address space or across address spaces.
This paper aims at a large class of representative Spectre-type attacks. They steal the victim's memory contents instead of the values of registers (note that stealing memory contents is perhaps more dangerous than stealing register values). We define the defense scope primarily for the following reasons. The community has not yet found a way to enumerate all the possible side channels. Many choices are possible for the side-channel component, such as the cache hierarchy, AVX units [28], and ports [29]. Thus it should be noted that Spectre-type attacks do not restrict themselves to the cache side channel only. However, identifying the existence of a side channel is only the first, small step towards mounting highly successful attacks. Compared with other side channels, the cache side channel seems much more mature and efficient and has been widely used in most of the documented Spectre variants. Note that while we limit our discussion to this threat model in this paper, the basic architecture can still work for an expanded threat model.

SECURITY DEPENDENCE
Security is a complex issue. In this paper, we focus on the type of problems caused by side-channel vulnerabilities exploited by Spectre. These are essentially micro-architecture information leakages due to mis-speculation. To help capture the problems caused by unsafe speculative execution, we introduce the concept of Security Dependence.
Instruction j is security-dependent on Instruction i with respect to leakage channel c if both of the following conditions hold:
1) i precedes j in program order.
2) If j is speculatively executed ahead of i, j will leak information into channel c.
Note that since leakage happens in a variety of channels, security dependence is defined with respect to a particular channel. Given the above definition, Table 1 summarizes the major security dependence in Spectre vulnerabilities. In this paper, we focus on how to defend against the first six variants, which are based on cache (content) side channels. In other words, if j does not change cache content, then it does not have a security dependence with respect to the cache content channel and we consider it to be safe. Security dependence arises in two situations: memory-memory speculation and branch-memory speculation.
Security Dependence Under Memory-Memory Speculation. One example is the Proof-of-Concept (PoC) x86 assembly code of Spectre V4 in Listing 1, which exploits speculative store bypass (also called load speculation) [11]. Instruction 1 (i1 hereafter) is a store operation, and i2 is a load operation. Assuming these two instructions access the same memory address, there is actually a RAW dependence between them. The attacker might construct an environment in which i1 is pending in the issue queue because its source register (rdi) is not ready, while i2 is speculatively launched and is forwarded stale sensitive data. Once the address of i1 is known, the load mis-speculation is detected and the incorrect execution results of i2 need to be discarded. However, the cache line related to the sensitive data has already been refilled into the L1 cache by i2 and i4, and such information leakage allows attackers to infer the sensitive data. Therefore, we say both i2 and i4 are security-dependent on i1.
Security Dependence Under Branch-Memory Speculation. Another example is Spectre V1 in Listing 2. In the attacker's well-designed environment, the branch i6 stays in the issue phase, and i7 is speculatively executed ahead of time to access unauthorized sensitive data. As with the previous scenario, the cache contents are changed by such mis-speculation and the sensitive data might be inferred via the cache side channel. According to our definition, i7 is security-dependent on i6.

TABLE 1
Major Security Dependence in Spectre Vulnerabilities

Vulnerability          Instruction i     Instruction j     Side channel c (1)
… [19]                 indirect br       mem               cache
Spectre V2 [1]         indirect br       mem               cache
Spectre V4 [11]        mem               mem               cache
SpectrePrime [4]       conditional br    mem               cache
SpectreRSB [20]        return            mem               cache
ret2spec [21]          return            mem               cache
NetSpectre [28]        conditional br    mem/AVX           cache/AVX
SMoTherSpectre [29]    indirect br       conditional br    port contention

(1) Side channel c listed in this table is the method used in the corresponding documented papers.

CONDITIONAL SPECULATION FRAMEWORK
Speculation and cache side-channel information leakage are critical factors for Spectre attacks. A straightforward defense policy is to prohibit any speculation of memory instructions, at the cost of severe performance slowdown. In fact, not all speculative memory access instructions pose a risk of leaking information. Therefore, we propose a defense framework, named Conditional Speculation, which dynamically detects whether a speculative access is safe or not according to security dependence, then allows safe instructions to execute as usual, and prevents the execution of unsafe parts from changing cache states. The framework of Conditional Speculation is summarized in Fig. 1 with three major components. Guided by this framework, we propose a generic design of the core microarchitecture, as shown in Fig. 2. Security Hazard Detection is integrated into the Issue Queue to identify the security-dependent memory access instructions and assigns each such instruction a suspect speculation flag to indicate that it possibly changes cache contents due to mis-speculation.
Security Hazard Filters are integrated into the load store queue (LSQ) and L1 DCache to identify whether instructions tagged with suspect speculation are safe or not. With the goal of pursuing a balance of performance and security, we propose two filters in LSQ and L1 DCache to remove false security hazards. The proposed Cache-hit based Hazard Filter targets the instructions which hit in L1 DCache without any cache (content) side-channel information leakage. From another perspective, Trusted Page Buffer based Hazard Filter (TPBuf Filter) identifies more safe speculative instructions.
Spectre variants that use the shared memory-based cache side channel usually place the secret data and the memory region used for constructing the side channel on different memory pages, so their speculative execution of malicious gadgets has a common feature (S-Pattern), described in Section 7.2. A suspicious access is considered safe if it does not match S-Pattern. The instructions that pass the filters are allowed to be speculated normally, thereby obtaining better performance while maintaining security.
Security Hazard Response handles the remaining unsafe instructions. We investigate two approaches to prevent them from changing cache states. One approach is blocking all unsafe accesses (Blocking), which sends a signal back from the L1 DCache to the Issue Queue so that the re-issue logic is applied to the memory instruction until its security dependence is resolved. The Blocking approach requires only minor modifications to the processor but might have negative impacts on performance. The other method, named SPBuf, makes unsafe accesses invisible to the cache hierarchy and allows them to supply data speculatively to subsequent data-dependent instructions. In Conditional Speculation, we find that deploying Speculative Buffers only in the last-level cache can achieve similar performance to deploying them in all cache levels, thanks to the Hazard Filter. Thus we deploy the buffer in the last-level cache only, which avoids modifying cache coherence logic in private caches for SPBuf.

SECURITY HAZARD DETECTION
We design a Hazard Detection logic based on bit matrices as shown in Fig. 3, which zooms in on the Issue stage in Fig. 2. A bit matrix is a popular structure used by some commodity processors to track data dependence and age information [30], [31]. With a security detection module, the security dependence becomes one of the factors that determine whether an instruction is issued, together with data dependence and age information.
Matrix Organization. Assuming that the Issue Queue (IQ) has N items, the security dependence matrix will contain a register array of NxN bits. The number of read ports of this matrix is equal to the dispatch width, and the number of write ports is equal to the issue width. This matrix is indexed by IQPos (Issue Queue Position) of each instruction. Given any Instruction X, IQPos_X denotes its location in the IQ. If the value of the Matrix[IQPos_X, IQPos_Y] is 1,  X has security dependence on Y. Otherwise, it means there is no security dependence between them.
Matrix Initialization. When a new instruction X is dispatched into the IQ, one row is allocated with the index IQPos_X. For each Instruction Y which is valid in the IQ at this moment, Matrix[IQPos_X, IQPos_Y] records the security dependence between the two instructions according to the following formula, which reflects three conditions: (i) Y is valid and precedes X, which means Y is dispatched into the IQ before X. (ii) According to our threat model, we check if a memory instruction is security-dependent on the previous branch or memory instructions. (iii) If the preceding branch or memory instructions are still waiting in the IQ when a memory instruction is issued, this memory instruction will be considered to have security dependence.

\[
\mathrm{Matrix}[X, Y] = (\mathrm{IssueQ}[X].\mathrm{opcode} == \mathrm{MEM}) \;\&\; (\mathrm{IssueQ}[Y].\mathrm{opcode} == \mathrm{MEM\ or\ BR}) \;\&\; \mathrm{IssueQ}[Y].\mathrm{valid} \;\&\; !\,\mathrm{IssueQ}[Y].\mathrm{issued}
\]

Hazard Detection. Fig. 3 illustrates the three-stage operation of the Issue Queue. At the 1st stage, the data dependence matrix generates a dependence vector. At the 2nd stage, this vector is then sent to the age matrix to select the oldest ready instruction to be issued. At the 3rd stage, for those instructions selected to be issued, the security dependence matrix is queried to get their security dependence, and then the states of the corresponding entries of the Issue Queue are updated. In particular, the bits in each row of the security dependence matrix are combined by an OR operation, and the result indicates whether there is a potential security hazard. When an instruction is selected to be issued and a security hazard is detected, it will be tagged with a suspect speculation flag.
Matrix Clearance. After an instruction X is issued, the corresponding bit in Update Vector Register will be set as 0. The column of security dependence matrix indexed by IQPos_X will be reset at the next cycle. Such operation means that the security dependence between corresponding instructions and X is cleared.
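As an illustration of the matrix operations described above (initialization, detection, and clearance), the following is a minimal software sketch, not the hardware design itself. The class and method names are hypothetical, the column is cleared immediately at issue rather than at the next cycle, and issued entries are assumed to stay in the queue until their slot is reused.

```python
from dataclasses import dataclass

MEM, BR, ALU = "MEM", "BR", "ALU"


@dataclass
class IQEntry:
    opcode: str
    valid: bool = True
    issued: bool = False


class SecurityDependenceMatrix:
    """Software model of an N x N security dependence bit matrix (illustrative)."""

    def __init__(self, n):
        self.n = n
        self.iq = [None] * n                      # Issue Queue entries by IQPos
        self.bits = [[0] * n for _ in range(n)]   # bits[x][y] == 1: X depends on Y

    def dispatch(self, pos, opcode):
        """Matrix Initialization: fill row `pos` for a newly dispatched instruction."""
        self.iq[pos] = IQEntry(opcode)
        for y in range(self.n):
            other = self.iq[y]
            # Every other entry present at dispatch time was dispatched earlier.
            present = other is not None and y != pos
            self.bits[pos][y] = int(
                present
                and opcode == MEM
                and other.opcode in (MEM, BR)
                and other.valid
                and not other.issued
            )

    def issue(self, pos):
        """Hazard Detection (row-wise OR) followed by Matrix Clearance (column reset)."""
        suspect = any(self.bits[pos])
        self.iq[pos].issued = True
        for x in range(self.n):
            self.bits[x][pos] = 0   # assumption: cleared immediately, not next cycle
        return suspect
```

In this sketch, a load dispatched after an unissued branch returns `True` from `issue`, which corresponds to tagging it with the suspect speculation flag; once the branch itself issues, younger loads no longer see a dependence on it.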

SECURITY HAZARD FILTER(S)
According to the security dependence under branch-memory and memory-memory speculation, a large number of memory access instructions are flagged as suspect by the Hazard Detection module. Executing these instructions conservatively causes significant performance slowdown. However, some speculative memory accesses are safe because they leave no observable traces. Several kinds of filters are investigated to identify these safe instructions and allow them to be executed speculatively. Focusing on Spectre variants based on cache side channels, we propose two filters: the Cache-hit based Hazard Filter and the Trusted Pages Buffer based Hazard Filter.

Cache-Hit Based Hazard Filter
When a memory instruction is speculatively issued to the memory access pipeline, it is tagged with its suspect speculation flag. If the suspect memory access hits in L1 DCache, it will continue as a normal memory instruction. However, if it encounters a miss in L1 DCache, the missing request will be marked as unsafe and the memory access will be sent to Hazard Response module. This design requires only minimal changes in the L1 DCache control logic.
Secure Update for Cache Replacement Logic. It should be noted that if speculative accesses that hit in the L1 DCache update the cache replacement metadata (e.g., LRU bits), secret information may still be observed [32]. For example, an attacker can train the LRU bits of given sets, then carefully induce the victim to change the LRU bits speculatively, then figure out which sets have been accessed, and finally infer sensitive data. To prevent such attacks, we propose two secure update policies and evaluate them in Section 9.3.

1) No update policy skips LRU updates for speculative accesses that hit the L1 DCache. For speculative accesses that eventually become non-speculative, not updating LRU bits can diminish the effectiveness of the L1 DCache replacement policy.
2) Delayed update policy sets a pending LRU update tag when a speculative access hits in the L1 DCache and performs the actual LRU update when the access reaches the head of the ROB (or becomes non-speculative) and the corresponding LRU array is not being used by accesses from the load/store pipelines.
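The difference between the two policies can be sketched on a single cache set. This is a simplified model under stated assumptions: the `LRUSet` class and its methods are hypothetical, pending updates are tracked per line rather than per access, the `squash` path is an assumed way of discarding pending tags, and the condition that the LRU array must be idle is not modeled.

```python
class LRUSet:
    """One cache set with an LRU order; index 0 is the least recently used line."""

    def __init__(self, lines, policy):
        assert policy in ("no_update", "delayed")
        self.order = list(lines)
        self.policy = policy
        self.pending = set()   # lines with a pending LRU update tag

    def _touch(self, line):
        # Move the line to the most-recently-used position.
        self.order.remove(line)
        self.order.append(line)

    def hit(self, line, speculative):
        if not speculative:
            self._touch(line)            # normal hit: update LRU immediately
        elif self.policy == "delayed":
            self.pending.add(line)       # remember the update, do not apply it yet
        # "no_update": a speculative hit leaves the LRU state untouched

    def commit(self, line):
        # The access became non-speculative: apply a pending update, if any.
        if line in self.pending:
            self.pending.discard(line)
            self._touch(line)

    def squash(self, line):
        # Assumption: a mis-speculated access simply drops its pending tag.
        self.pending.discard(line)
```

Under both policies a squashed speculative hit leaves the LRU order exactly as it was, so the replacement state does not reveal which sets were touched on the wrong path; only the delayed policy recovers the update for accesses that later commit.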

Trusted Pages Buffer Based Hazard Filter
Trusted Pages Buffer based Hazard Filter targets the Spectre variants based on cache side channels with shared pages, a setting which is realistic and widely used in existing Spectre variants. The specific analysis is described in Section 9.2. These variants share a common feature, named S-Pattern in this paper. By dynamically identifying more safe instructions on top of the Cache-hit Filter, this filter achieves better performance.

1) S-Pattern
As shown in Fig. 4, the speculative execution of malicious gadgets can be summarized by a common feature. In particular, it is observed that the malicious speculative execution flow always contains two special memory instructions (A and B). These two instructions have the following usages and behaviors.
1) A is used to speculatively access sensitive data, and B speculatively accesses the memory region shared with the attacker, which is used for building the cache side channel between the victim and the attacker. Since secret data and shared memory regions are usually located on different memory pages, these two instructions access different pages. For example, confidential pages and non-secure pages are tagged with different flags in SGX and TrustZone. Therefore, if an attacker needs a shared non-secure page to construct a side channel, it is always located on a different page from the secrets.
2) B is data-dependent on A. The result of A is used to calculate the index into the shared memory region. Such a well-designed dependence is another important point that lets the attacker infer the secret values.
3) In order to build a cache side channel, the attacker needs to first flush the specific shared memory data. Then, the induced speculative execution of B has a cache miss and thus reloads the cache line into the L1 DCache. This change in state information can be perceived by the attacker through the cache side channel. Thus the cache miss of B is essential to leak sensitive information over the cache side channel.
Motivated by the aforementioned observation, we call the above common characteristic behavior S-Pattern. Specifically, if the instruction sequence of speculative execution is observed to have the following characteristics, we consider this sequence of speculative instructions to have S-Pattern behavior.
1) There are at least two instructions (A and B) that access different memory pages.
2) Instruction B has a data dependence on instruction A.
3) Instruction B results in an L1 DCache miss.
Although the malicious gadgets of Spectre attacks are characterized by S-Pattern behavior, it should be noted that an instruction flow with S-Pattern is not necessarily a Spectre attack. For example, instruction A may read secret data from registers, and instruction B may transmit the secrets to attackers. While such leakage is important to address, it is clearly less dangerous than leaking from the cache hierarchy. Therefore, we consider such leakage out of scope.
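The three S-Pattern characteristics listed above can be expressed as a small predicate over a trace of speculative memory accesses. This is only a sketch: the `SpecAccess` record and the `matches_s_pattern` function are hypothetical names, and only direct (not transitive) data dependence is checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecAccess:
    name: str               # label of the speculative memory instruction
    page: int               # memory page it accesses
    l1_miss: bool           # whether it misses in the L1 DCache
    depends_on: tuple = ()  # names of accesses whose results feed its address


def matches_s_pattern(accesses):
    """Return True if some pair (A, B) satisfies all three S-Pattern conditions."""
    by_name = {a.name: a for a in accesses}
    for b in accesses:
        if not b.l1_miss:                 # condition 3: B must miss in L1 DCache
            continue
        for src in b.depends_on:          # condition 2: B is data-dependent on A
            a = by_name.get(src)
            if a is not None and a.page != b.page:   # condition 1: different pages
                return True
    return False
```

For example, a Spectre V1 style gadget in which A reads a secret on one page and B misses on a shared probe page indexed by A's result matches the pattern, whereas the same sequence with B hitting in the L1 DCache, or with both accesses on one page, does not.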
2) Microarchitecture implementation of TPBuf
TPBuf is designed to capture memory access behaviors with S-Pattern from all speculative executions. It records all the on-the-fly speculative memory access requests and tracks their execution status (e.g., whether the requested cache line is refilled or not). When a new memory request misses in the L1 DCache, TPBuf compares its page address with its history records and decides whether this new speculative instruction is safe based on the logic described in Table 2.

TABLE 2

Query Result                                                            Decision
There is at least one valid entry whose request accesses different
memory pages, and this request is in Writeback status.                  Unsafe
Others                                                                  Safe

The microarchitecture of TPBuf is shown in Fig. 5. One main design principle is to utilize the existing logic as much as possible to reduce the complexity of implementation. TPBuf is placed close to the Load Store Queue (LSQ) and its entries have a 1:1 mapping with the entries of the LSQ. The allocation, commit, and squash of TPBuf's entries are operated along with the movement of the LSQ's Head and Tail pointers. Besides, TPBuf covers all on-the-fly speculative memory instructions in the speculative execution window. In order to prevent the attacker from speculatively accessing unauthorized data directly and then spreading the data to his own memory space, the access address must be checked and its physical page number (PPN) obtained using the TLB first. TPBuf records and uses the PPN as the tag of each entry (the PPN is part of the physical address which is stored in PAddr). In addition, each TPBuf entry stores a mask and a number of status bits. TPBuf detects the S-Pattern and passes the results to the Cache-hit filter, which decides whether a suspect speculative miss request is safe. In this way, the original memory consistency model and cache coherence are unaffected.
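The decision rule of Table 2 can be sketched as a lookup over TPBuf entries. This is an approximation under stated assumptions: `TPBufEntry` and `is_safe` are hypothetical names, only the V and W status bits are modeled, and restricting the comparison to older entries through a per-request mask is an assumption about how the mask is applied.

```python
from dataclasses import dataclass


@dataclass
class TPBufEntry:
    ppn: int = 0      # physical page number used as the entry tag
    v: bool = False   # V bit: the PPN has been recorded
    w: bool = False   # W bit: data is available to other instructions (Writeback)


def is_safe(entries, older_mask, ppn):
    """Table 2 rule for a suspect request that missed in the L1 DCache.

    older_mask[i] is True when entry i is older than the incoming request
    (assumed role of the mask). The request is unsafe if any older, valid
    entry in Writeback status targets a different page.
    """
    for entry, older in zip(entries, older_mask):
        if older and entry.v and entry.w and entry.ppn != ppn:
            return False
    return True
```

A request to the same page as every completed older access, or one whose older accesses have not yet written back, is treated as safe in this sketch and may proceed speculatively.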
Allocation. When memory access instructions are allocated in the LSQ, they are also allocated in TPBuf and the A bit is set. The Mask is generated according to the A bits in TPBuf; it indicates which memory instructions in TPBuf are older than the new entry in program order.
Update. The S bit is updated with the suspect speculation flag attached to the memory instruction. When the PPN is recorded in TPBuf, the V bit is set. The W bit is set when data fetched by the memory instruction becomes available to other instructions.
Detection. When an incoming request enters TPBuf, TPBuf compares its PPN with the PPNs of existing entries and then generates an address-match vector (Match). These vectors, including Match, V, W, and S, are used as inputs of the logic of Equation 1 to determine whether the requests are safe. Specifically, '|' denotes reduction OR, which ORs all of the bits in a vector to generate a 1-bit output.
(1)

SECURITY HAZARD RESPONSE
We quantify the performance impacts of deploying Speculative Buffers in different cache levels. Experiments show that deploying a Speculative Buffer in the LLC can achieve similar performance to deploying it in all caches, because Hazard Filters can take advantage of the locality of the private caches even without a Speculative Buffer. Therefore, we deploy a Speculative Buffer in the LLC (LLC-SPBuf) only, which reduces the complexity of supporting cache coherence and maintaining memory consistency for private caches.
Similar to TPBuf, the LLC-SPBuf has the same size as the Load Queue. LLC-SPBuf is indexed with the index of the Load Queue, and each entry consists of a valid flag, a cache line, and the corresponding physical address. For a single-core processor, when a SpecLoad misses in the LLC, it first checks whether its cache line has been loaded into LLC-SPBuf by earlier SpecLoads. If it hits in LLC-SPBuf (same physical address), the cache line is copied to its own entry and returned to the private caches. Otherwise, the request is sent to the main memory and the returned cache line is copied to the LLC-SPBuf. When the unsafe access is eventually committed, an Update request is sent to invalidate all entries with the same physical address and copy the cache line from LLC-SPBuf to the LLC directly, eliminating the traffic overhead of reloading data from main memory. In this paper, we primarily focus on the effect of Conditional Speculation on performance in single-core processors. As for the implementation details for multi-core processors, such as how to support memory consistency models and cache coherence, we refer to InvisiSpec.
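The single-core LLC-SPBuf flow described above can be sketched as follows. This is a behavioral model only: the `LLCSPBuf` class, its dictionary-based LLC and memory, the `memory_reads` counter, and the `squash` method are illustrative assumptions, and an LLC hit is assumed to return data without any state change.

```python
class LLCSPBuf:
    """Behavioral model of a speculative buffer beside the last-level cache."""

    def __init__(self, num_entries, memory):
        self.memory = memory                  # physical address -> cache line
        self.llc = {}                         # lines visible in the LLC
        self.entries = [None] * num_entries   # per Load Queue index: (paddr, line)
        self.memory_reads = 0                 # counts requests sent to main memory

    def spec_load(self, lq_index, paddr):
        """Serve an unsafe load without changing LLC contents."""
        if paddr in self.llc:
            return self.llc[paddr]            # LLC hit: no refill is needed
        for entry in self.entries:
            if entry is not None and entry[0] == paddr:
                line = entry[1]               # hit in SPBuf: reuse the earlier line
                break
        else:
            line = self.memory[paddr]         # miss everywhere: fetch from memory
            self.memory_reads += 1
        self.entries[lq_index] = (paddr, line)
        return line

    def commit(self, lq_index):
        """Update request: move the line into the LLC, drop matching entries."""
        paddr, line = self.entries[lq_index]
        self.llc[paddr] = line
        self.entries = [
            e if e is None or e[0] != paddr else None for e in self.entries
        ]

    def squash(self, lq_index):
        """Assumption: a squashed load just frees its entry, leaving the LLC as is."""
        self.entries[lq_index] = None
```

In this model a squashed SpecLoad never reaches `self.llc`, so a later probe of the LLC cannot distinguish whether the wrong-path load happened, while a committed load fills the LLC from the buffer without a second memory read.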
2) Second lightweight feature: Transforming Validation into Exposure
SpecLoad is a type of memory request introduced to inform the cache hierarchy that this unsafe access only reads data back to the pipeline and does not change any cache state, such as coherence state or LRU bits. Therefore, the pipeline may fail to receive invalidations directed to the line loaded by a SpecLoad. To avoid this violation of memory consistency, Validation and Exposure are proposed in InvisiSpec. Compared to Validation, Exposure has low overhead because a load instruction can be committed as soon as the Exposure is issued, whereas the instruction cannot be committed until the completion of a Validation. In Conditional Speculation, there are few SpecLoads thanks to the Hazard Filter, so we can transform the Validation into Exposure by scheduling the issue of SpecLoads as follows. Consider TSO first. SpecLoads are initiated when all of the earlier loads accessing the same cache line in the LSQ have initiated their SpecLoads. As a result, TSO would not require squashing the load on the reception of an invalidation to the line it loaded. The SpecLoad just needs Exposure, not Validation. Now consider RC. Only speculative loads that read while there is at least one earlier fence in the LSQ will be squashed by an invalidation to the line. Hence, their SpecLoads are initiated when the earlier fence is committed.
When the instruction is committed normally, an Exposure is issued to refill the corresponding cache lines into the caches and update the cache states. When a SpecLoad request is initiated for the instruction and sent to the lower caches, a new entry is allocated in the miss handler (MSHRs). Traditionally, requests accessing the same cache line in the MSHRs can be merged into a single request that is sent to the lower cache hierarchy. An incoming SpecLoad request can be merged into an earlier SpecLoad request to the same cache line, but for security reasons it cannot be merged with normal requests. Similarly, an Exposure request can be merged into earlier Exposure requests.
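A minimal sketch of this merge rule is given below. The ReqType enumeration and the can_merge function are hypothetical names, and the treatment of combinations the text does not mention (for example an Exposure meeting a normal request) is our conservative assumption: they are not merged.

```python
from enum import Enum


class ReqType(Enum):
    NORMAL = "normal"
    SPECLOAD = "specload"
    EXPOSURE = "exposure"


def can_merge(incoming: ReqType, pending: ReqType,
              incoming_line: int, pending_line: int) -> bool:
    """Return True if an incoming request may merge into a pending MSHR entry."""
    if incoming_line != pending_line:
        return False  # merging only applies to requests for the same cache line
    # Only requests of the same type are merged: SpecLoad with SpecLoad,
    # Exposure with Exposure, normal with normal.
    return incoming == pending
```

Keeping SpecLoads apart from normal requests ensures that a speculative access never piggybacks on a request that refills the caches.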

Methodology
We evaluate Conditional Speculation on a cycle-accurate simulator (Gem5) and on real hardware in terms of security, performance, and area cost. We simulate an out-of-order processor with Conditional Speculation to evaluate the performance of its features on a complex out-of-order core. The simulated configuration is listed in Table 3. We compare the following configurations.
Baseline. Base out-of-order processor with the configuration listed in Table 3, without any defense mechanism.
Naive Policy. Simply considers all security-dependent memory accesses as unsafe and blocks their execution.
[F]-Blocking. Conditional Speculation with Blocking Hazard Response. 'F' means the mechanism of Hazard Filter, with 'CF' indicating Cache-hit Filter and 'CTF' indicating Cache-hit Filter and TPBuf Filter working together.
[F]-SPBuf-[C]. Conditional Speculation with SPBuf Hazard Response. 'F' means the Hazard Filter and 'C' means which level cache is deployed with Speculative Buffer ('ALC' for all level caches and 'LLC' for last level cache respectively). For example, 'CF-SPBuf-ALC' employs Cache-hit Filter and deploys Speculative Buffer in all level caches.
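The naming scheme can be made explicit with a small decoder. This is only an illustrative sketch: the function parse_config and its dictionary layout are our own, not part of the evaluation infrastructure.

```python
def parse_config(name: str) -> dict:
    """Decode a configuration name such as 'CF-Blocking' or 'CTF-SPBuf-LLC'."""
    filters = {
        "CF": ["Cache-hit Filter"],
        "CTF": ["Cache-hit Filter", "TPBuf Filter"],
    }
    levels = {"ALC": "all level caches", "LLC": "last level cache"}

    parts = name.split("-")
    if parts[0] not in filters:
        raise ValueError(f"unknown Hazard Filter in {name!r}")
    if parts[1:] == ["Blocking"]:
        return {"filters": filters[parts[0]], "response": "Blocking", "spbuf_level": None}
    if len(parts) == 3 and parts[1] == "SPBuf" and parts[2] in levels:
        return {"filters": filters[parts[0]], "response": "SPBuf",
                "spbuf_level": levels[parts[2]]}
    raise ValueError(f"unknown Hazard Response in {name!r}")
```

For example, parse_config("CF-SPBuf-ALC") reports the Cache-hit Filter with Speculative Buffers in all level caches.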

Security Analysis
Hazard Detection is the initial detection of security dependence and only tags each instruction with a suspect flag. The security of Conditional Speculation therefore depends on the combination of the Hazard Filter and Hazard Response mechanisms. Table 4 summarizes the security analysis. In this table, Spectre variants based on cache side channels are divided into six typical scenarios, classified by different combinations of cache side-channel attack and page sharing mode.
Hazard Filter. Cache-hit Filter (CF) does not load data for speculative memory accesses that miss in the L1 DCache, and such loading is the prerequisite for observing secrets through cache side channels. Cache-hit Filter and TPBuf Filter (CTF) identifies unsafe speculative memory accesses according to S-Pattern, which is the common feature of cache side-channel attacks based on shared data. Therefore, CF succeeds in defending against all six types of variants, and CTF can defend against the first four types in Table 4. It should be noted that channels based on shared pages are widely used in the PoC codes of existing Spectre variants, including the "Flush+Reload, shared data" based variants (Spectre V1, V1.1, V2, V4) and the "Prime+Probe, shared data" based SpectrePrime.
Hazard Response. Both Blocking and SPBuf prevent unsafe accesses from leaving sensitive traces in the caches, so they do not weaken the defense ability provided by the Hazard Filter. Furthermore, it should be noted that the Blocking mechanism can additionally mitigate side channels through the memory system (e.g., DRAM contention).
In summary, neither the Blocking nor the SPBuf based Hazard Response mechanism affects the defense ability against cache side channels. As for the Hazard Filter mechanisms, CF defends against all Spectre variants based on cache side channels, and the TPBuf Filter is able to prevent the widely used variants exploiting shared pages, but fails to block variants that do not need shared data.

Table 4. Security analysis by cache side channel and page sharing mode (four configurations per row; ✓ = defended, ✗ = not defended).
Flush+Reload, shared pages: ✓ ✓ ✓ ✓
Flush+Flush, shared pages: ✓ ✓ ✓ ✓
Evict+Reload, shared pages: ✓ ✓ ✓ ✓
Prime+Probe, shared pages: ✓ ✓ ✓ ✓
Prime+Probe, no shared pages: ✓ ✓ ✗ ✗
Evict+Time, no shared pages: ✓ ✓ ✗ ✗

Fig. 6 compares the performance impacts of three different mechanisms of Conditional Speculation. Not surprisingly, the Naive Policy causes the largest performance degradation (54.6 percent on average, with a worst case of 146.8 percent for hmmer). In contrast, CF-Blocking provides a certain degree of relaxation. By dynamically identifying false security hazards, it allows memory instructions that hit in the L1 DCache to be executed. Such a filter significantly improves performance (on average, it reduces the performance degradation from 54.6 to 13.2 percent). In particular, CF-Blocking recognizes 86.3 percent of speculative accesses as safe due to the high L1 DCache hit rate of the SPEC CPU benchmarks. CTF-Blocking achieves further performance improvements. S-Pattern depicts the memory access pattern with malicious behaviors in Spectre attacks. Any speculative instruction that misses in the L1 DCache but does not match S-Pattern is considered safe and can still be speculatively executed. It can be observed that CTF-Blocking further reduces the performance overhead to 7.0 percent on average.
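The filtering decisions compared in Fig. 6 can be summarized as a small decision function. This is a sketch under our own assumptions: the function name classify and its boolean inputs are hypothetical, and whether an access matches S-Pattern is taken as a given input rather than computed here.

```python
def classify(policy: str, suspect: bool, l1_hit: bool, matches_s_pattern: bool) -> str:
    """Classify a speculative memory access as 'safe' or 'unsafe'.

    policy is 'Naive', 'CF' (Cache-hit Filter) or 'CTF' (Cache-hit + TPBuf Filter).
    """
    if not suspect:
        return "safe"      # no security dependence was detected
    if policy == "Naive":
        return "unsafe"    # every security-dependent access is treated as unsafe
    if l1_hit:
        return "safe"      # an L1 DCache hit refills no new cache line
    if policy == "CF":
        return "unsafe"    # CF treats every suspect L1 miss as unsafe
    if policy == "CTF":
        # TPBuf Filter: a suspect miss is unsafe only if it matches S-Pattern.
        return "unsafe" if matches_s_pattern else "safe"
    raise ValueError(f"unknown policy: {policy}")
```

Unsafe accesses are then handed to the Hazard Response (Blocking or SPBuf), while safe accesses execute speculatively as usual.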

1) Performance overhead for Naive Policy
Security dependence comes from two major situations: branch-memory speculation and memory-memory speculation. To better understand the performance loss, we analyze both in depth below.
We first model a branch-memory dependence matrix, which recognizes speculative memory accesses dependent on branch instructions as unsafe. It introduces 23.1 percent performance degradation on average. As expected, the more branch instructions exist, the more speculative memory accesses are tagged as unsafe. In addition, a high misprediction rate might further reduce performance. As shown in Table 5, the worst case, astar (65.5 percent overhead), has a high branch misprediction rate (8.5 percent). Besides, 16.7 percent of instructions are unresolved branch instructions when they are allocated in the Issue Queue, and 27.2 percent of memory instructions are marked as unsafe.
After adding the memory-memory dependence, the proportion of unsafe speculative memory accesses increases. More seriously, other instructions that depend on these unsafe operations have to be blocked in the Issue Queue. Experiments show that some cases are particularly sensitive to memory-memory security dependence. For example, the performance overhead of lbm increases from 53.5 to 92.4 percent, and it takes more than 180 times longer to resolve a branch than in the Baseline case. As depicted in Table 6, the Naive Policy blocks 73.3 percent of speculative memory accesses on the correct execution path.
2) Performance overhead from secure update for Cache replacement logic
To understand the performance implication of the secure update policies, we evaluated them on top of CTF-Blocking using the same set of benchmarks as in Fig. 6. The results show that the no update policy introduces 0.71 percent performance degradation. Our experiments show that the delayed update policy improves on the no update policy by 0.26 percent. There is thus little difference in the performance impact of the two policies. Considering the complexity of the delayed update policy, we believe that the no update policy is the better option due to its simplicity, and it is used for the following experiments.
3) Performance gain from Filters
Cache-Hit Filter (CF). This filter exploits the locality of the memory access behavior of normal workloads. Compared to the Naive Policy, CF improves performance by 26.7 percent on average, as shown in Fig. 6. Take GemsFDTD in Table 6 as an example: its cache hit rate is more than 99.9 percent, so only 0.1 percent of speculative memory accesses are recognized as unsafe. In contrast, lbm, milc and zeusmp have higher L1 DCache miss rates, so the performance improvements that CF brings for them are low, as demonstrated in Fig. 6. Because most of the programs in SPEC CPU 2006 have high cache hit rates, this filter on average successfully recognizes 86.3 percent of speculative accesses as safe and blocks only 3.8 percent of speculative memory accesses on the correct execution path.
Cache-Hit Filter and TPBuf Filter (CTF). Besides exploiting the locality of memory accesses, the performance improvement also depends on the proportion of cache misses that do not match S-Pattern. For most cases with high cache hit rates (such as dealII, hmmer and namd), there is little room for optimization. For applications with low speculative cache hit rates, however, a high S-Pattern mismatch rate means that a large proportion of speculative cache misses are safe; these can be recognized by the TPBuf Filter and executed speculatively as normal. Therefore, significant performance improvements can be achieved by adding the TPBuf Filter to the Cache-hit Filter. For instance, lbm has a lower L1 DCache hit rate (61.8 percent), and 86.2 percent of its speculative accesses do not match S-Pattern. In this case, CTF captures those safe speculations and brings a 38.1 percent performance improvement in comparison with CF. Another interesting case is libquantum. Although it also has a lower cache hit rate (79.6 percent), more than 99.9 percent of its accesses match S-Pattern. These operations are considered unsafe and are blocked from speculation, so the performance benefit of CTF is limited for libquantum. On average, this mechanism improves performance by 5.4 percent compared with CF.
4) Performance gain from SPBuf
Fig. 7 shows that Conditional Speculation configured with SPBufs achieves a further performance improvement. With Speculative Buffers in the LSU and LLC, there are two major reasons for the improvement: one is spatial locality in L2, and the other is the shortened time to access main memory. As demonstrated in Table 6, the majority of cases, such as astar, gamess, and hmmer, have excellent locality (more than 99.9 percent) for SpecLoads in L2, which succeeds in providing data to the pipeline and accelerating subsequent execution. For cases with low spatial locality, the mechanism helps to shorten the waiting time to refill data from memory, which also improves performance effectively. For example, astar, libquantum and zeusmp have large numbers of SpecLoad requests to memory and show good performance improvements in Fig. 7. Although milc also has poor locality in the caches, it has so many consecutive SpecLoads accessing different cache lines that the miss handler is always blocked and cannot handle other SpecLoad requests. Therefore, there is less improvement for milc. As shown in Fig. 7, this mechanism significantly reduces the performance overhead to 1.7 percent on average.
In addition, we find only a small difference in performance impact between CF-SPBuf-LLC and CF-SPBuf-ALC, as shown in Fig. 7. This shows that deploying Speculative Buffers in private caches does not significantly improve performance: most of the performance improvement of a Speculative Buffer in a private cache comes from the locality of the private caches, and this locality has already been fully exploited by the Cache-hit Filter in Conditional Speculation. Considering the complexity of deploying SPBuf in private caches, we believe that deploying the Speculative Buffer in the LLC only is the better option due to its efficiency.

5) Breakdown of performance benefits from different components
We analyze in detail the share of the performance gain contributed by each mechanism, which is shown in Fig. 8. In conclusion, compared to the Naive Policy, which blocks all unsafe speculative loads, Conditional Speculation with the SPBuf mechanism improves performance by 34.1 percent in total. On average, the improvement rates of the Cache-hit Filter, the TPBuf Filter, and SPBuf are 26.7, 4.1, and 3.3 percent, respectively. Most cases have a high L1 hit rate, so the Cache-hit Filter contributes the most. Because the characteristics of dynamic execution vary from case to case, the effects of the other two mechanisms differ across benchmarks. In lbm, mcf, milc, and zeusmp, many unsafe speculative loads do not match S-Pattern, so the TPBuf Filter introduces a significant performance boost. For bwaves, libquantum, and zeusmp, SPBuf dramatically reduces the time that unsafe speculative loads wait for data, resulting in large performance improvements.
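As a rough consistency check, the breakdown can be approximately reproduced from the average overheads quoted earlier (54.6, 13.2, 7.0 and 1.7 percent). The sketch below assumes that each improvement is the reduction in normalized execution time divided by the Naive Policy's execution time; this normalization is our reading, not a formula stated in the text, and because the inputs are rounded averages the results match the reported figures only to within a few tenths of a percent.

```python
# Average performance overheads (percent over Baseline) quoted in the text.
overhead = {"Naive": 54.6, "CF-Blocking": 13.2, "CTF-Blocking": 7.0, "SPBuf": 1.7}

# Normalized execution time with Baseline = 1.0.
time = {name: 1.0 + pct / 100.0 for name, pct in overhead.items()}


def reduction(frm: str, to: str, base: str) -> float:
    """Execution-time reduction from `frm` to `to`, as a percentage of `base`."""
    return 100.0 * (time[frm] - time[to]) / time[base]


cf_gain = reduction("Naive", "CF-Blocking", "Naive")            # reported: 26.7
tpbuf_gain = reduction("CF-Blocking", "CTF-Blocking", "Naive")  # reported: 4.1
spbuf_gain = reduction("CTF-Blocking", "SPBuf", "Naive")        # reported: 3.3
total_gain = reduction("Naive", "SPBuf", "Naive")               # reported: 34.1
ctf_vs_cf = reduction("CF-Blocking", "CTF-Blocking", "CF-Blocking")  # reported: 5.4

for label, value in [("CF", cf_gain), ("TPBuf", tpbuf_gain), ("SPBuf", spbuf_gain),
                     ("total", total_gain), ("CTF vs CF", ctf_vs_cf)]:
    print(f"{label}: {value:.1f} percent")
```

The three component gains add up to the total by construction, which mirrors the reported 26.7 + 4.1 + 3.3 = 34.1 percent.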

6) Comparison with InvisiSpec
InvisiSpec [17] is prior work that employs a speculative buffer to block Spectre attacks. Based on the same simulation configuration, we compare the performance overhead of Conditional Speculation with that of InvisiSpec. Note that we used the open-source code released by InvisiSpec, which does not have an L3 cache. In our evaluation, InvisiSpec is configured to defeat existing Spectre attacks through cache side channels, a configuration also known as InvisiSpec-Spectre. As shown in Fig. 7, the average performance loss of InvisiSpec is 6.5 percent, which is close to that of CTF-Blocking. For bwaves, bzip2, leslie3d and sphinx3, which benefit less from the Hazard Filters, InvisiSpec performs better than CTF-Blocking. However, for the cases that have high cache hit rates, such as GemsFDTD, hmmer, and soplex, CTF-Blocking achieves lower performance overhead. Moreover, all configurations of Conditional Speculation with SPBuf Hazard Response (CF-SPBuf-LLC, CF-SPBuf-ALC, CTF-SPBuf-LLC, CTF-SPBuf-ALC) perform better than InvisiSpec. Especially for libquantum, which has neither good spatial locality nor a high S-Pattern mismatch rate, Conditional Speculation with SPBuf Hazard Response further improves performance.
All the benchmarks of SPEC CPU 2006 with the train input size run on the Xilinx VC709 FPGA platform at a frequency of 50 MHz. Thanks to the software transparency of Conditional Speculation, the benchmarks run successfully without any modification. As shown in Fig. 9, the performance overhead of CTF-Blocking on the FPGA platform is 2.8 percent on average. Whereas only a part of each program is simulated in Gem5, the entire benchmark programs are executed in the FPGA experiment, which explains the difference between the two sets of results. Nevertheless, both demonstrate that Conditional Speculation introduces only minimal performance overhead.
Finally, we port the previously disclosed Spectre V1 PoC [1] to the processor to evaluate the effectiveness of Conditional Speculation against a real-world attack. Experiments show that the mechanism succeeds in defending against this attack.
2) Hardware overhead evaluation
The security dependence matrix is implemented at Register-Transfer Level (RTL). Based on SMIC 40 nm technology, we use the Synopsys ASIC design flow and tools to assess the timing and area cost of the matrix and its related control logic. For an issue queue with 64 entries, the additional area occupied by this matrix is 0.05 mm² on average, which is only 3.5 percent of a 4-way 32 KB Cache. When synthesized at the TT corner using Design Compiler, the timing of the critical path increases by only 1.4 percent.
In the implementation of the Cache-hit Filter, the RTL modification is marginal, as it only needs to check the safe flag. TPBuf is placed close to the Load Store Queue (LSQ), and its entries have a 1:1 mapping with the entries of the LSQ. The additional area occupied by TPBuf is 0.00079 mm² on average, which is about 0.055 percent of a 4-way 32 KB Cache. Compared to the complexity of store-load forwarding and ordering-failure detection in the LSQ, TPBuf only involves PPN address comparison logic and does not introduce new critical paths.
We implement the Speculative Buffer with a 1 KB SRAM. It takes up 0.017 mm² on average, which is about 1.2 percent of a 4-way 32 KB Cache. In general, its critical path is shorter than that of the L1 DCache, so it has no impact on the critical paths.
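The three area figures are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below derives the implied area of the 4-way 32 KB Cache from the matrix figures and then recomputes the other two percentages; the variable names are ours, and the derived cache area is an inference from the reported numbers rather than a value stated in the text.

```python
# Reported figures (areas in mm^2).
matrix_area = 0.05       # security dependence matrix, 64-entry issue queue
matrix_share = 3.5       # percent of a 4-way 32 KB Cache
tpbuf_area = 0.00079     # TPBuf
spbuf_area = 0.017       # 1 KB Speculative Buffer

# Implied area of the reference cache, inferred from the matrix figures.
cache_area = matrix_area / (matrix_share / 100.0)

tpbuf_share = 100.0 * tpbuf_area / cache_area   # reported: about 0.055 percent
spbuf_share = 100.0 * spbuf_area / cache_area   # reported: about 1.2 percent

print(f"implied cache area: {cache_area:.2f} mm^2")
print(f"TPBuf share: {tpbuf_share:.3f} percent")
print(f"SPBuf share: {spbuf_share:.2f} percent")
```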

Extend Conditional Speculation Beyond DCache Side Channel
As explained in our threat model, this paper targets Spectre attacks based on cache side channels. However, microarchitectural state extends far beyond cache contents and also includes physical registers, various queues, the TLB, the ICache, etc. Any future unknown side channel might be fundamentally different from all known side channels. Hence, we do not claim that our defense method can effectively cope with all unknown covert channels. Addressing them is a good research problem: the idea of Conditional Speculation needs to be combined with a new, dedicated filter for each new side channel. One case study is to extend Conditional Speculation to defend against Spectre variants based on the port-contention side channel, such as SMoTherSpectre [29]. The key of this attack is that attackers lure a victim process into speculatively executing a specific gadget. The gadget occupies different execution ports according to the direction of a conditional branch instruction that is data-dependent on the secret. Therefore, the attackers can infer the outcome of the conditional branch through the port-contention side channel and finally steal the secret. To defend against this kind of attack, Conditional Speculation marks all such speculative conditional branch instructions as unsafe and blocks them in the issue stage until they become non-speculative. Since the speculative execution of conditional branch instructions is blocked, only the branch predictor, not the secret, can cause port contention, and the attacker cannot infer the secret by timing the port contention within the speculative window. In terms of performance, Conditional Speculation delays the execution of speculative conditional branches, but their subsequent instructions are still allowed to execute speculatively as normal, so the impact on performance is not significant.
Even so, a hazard filter can be introduced to further reduce the performance loss. One possible approach is to use the compiler to identify the gadgets that might be used to construct the port-contention side channel and to mark their dependent conditional branches as unsafe. The compiler inserts instructions or attaches an additional tag to initialize the specific filter. With this hazard filter, Conditional Speculation blocks only the marked conditional branches, and unmarked conditional branches are still allowed to execute normally. In summary, SMoTherSpectre can be defended against effectively by our extended Conditional Speculation.
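The issue decision for conditional branches in this extension can be sketched as follows. The function may_issue_branch and its parameters are hypothetical names introduced for illustration, and the compiler marking is modeled as a plain boolean; how the mark would be encoded in practice is not specified here.

```python
def may_issue_branch(is_conditional: bool, is_speculative: bool,
                     filter_enabled: bool, compiler_marked: bool) -> bool:
    """Return True if a branch may be issued for execution now."""
    if not is_conditional or not is_speculative:
        return True   # only speculative conditional branches are restricted
    if not filter_enabled:
        return False  # without the filter, block until the branch is non-speculative
    # With the compiler-assisted hazard filter, only marked branches are blocked.
    return not compiler_marked
```

Instructions following a blocked branch are not affected by this decision and continue to execute speculatively.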

Extend Conditional Speculation Beyond Spectre
Naturally, a security dependence exists between the instruction that triggers the transient execution and the instruction that may leak secrets through any side channel. In addition to the two scenarios mentioned, security dependence covers both Meltdown-type and MDS-type attacks, as listed in Table 7. In Meltdown [2] and L1TF [34] attacks, a memory instruction (instruction i) whose permission check has not yet been resolved speculatively accesses sensitive data, after which a memory instruction (instruction j) may disclose the sensitive information through side channels. Furthermore, aggressive forwarding of stale data is also a kind of speculation in MDS attacks, such as ZombieLoad [35], RIDL [36] and Fallout [37]. In this scenario, the instruction that receives the forwarded data acts as instruction i, and the subsequent instruction acts as instruction j. Conditional Speculation is a defense strategy built on security dependence, so it can be extended to these scenarios. Take ZombieLoad as an example: it exploits the aggressive forwarding mechanism of the line fill buffer in Intel processors, where the processor forwards data to subsequent loads even when the located page has been released. As a result, stale data may be forwarded to the subsequent loads. A security dependence can be established between the load that accepts forwarded data and subsequent memory accesses. A Security Hazard Detection can be deployed in the line fill buffer to mark the forwarded loads as suspect instructions. Because the forwarded data may convey secrets, the TPBuf Filter can be applied to detect S-Pattern between this operation and the first subsequent loads.

RELATED WORK
Many works are proposed to defend against Spectre attacks and they can be classified into software-based and hardware-based mitigation.
Software-Based Mitigations. Serializing instructions, such as LFENCE, can be inserted into critical sensitive gadgets manually or by the compiler to mitigate V1 [40], [41]. For V2, Intel has provided many microcode updates, such as IBRS and IBPB, to avoid malicious branch training across protection domains [9]. As for V4, SSBD stalls speculative loads until the addresses of older stores have been calculated. Retpoline, proposed by Google, transforms indirect branches and jumps into secure instruction sequences to stall aggressive memory accesses [13]. Some researchers propose that the software developer indicate the branches capable of leaking information so that the processor avoids predicting them [42]. Note that software-based mitigations need code modification and recompilation. Furthermore, they can be bypassed. For example, the LFENCE defense can be bypassed by Spectre V1.1 and V1.2 [3], [20].
Hardware-Based Mitigations. Existing defense mechanisms can be grouped into four categories:
1 Prevent malicious training. BRB [43] allocates separate branch predictor tables for different processes to defend against training across processes. SPECCFI [44] embeds Control Flow Integrity (CFI) principles into the branch prediction decisions to constrain dangerous speculation. These works primarily target specific Spectre variants. In comparison, Conditional Speculation is a more general defense framework.
Typical examples of the second category are InvisiSpec [17] and SafeSpec [45]. They employ dedicated buffers to hold speculatively refilled data temporarily and update the architectural caches once the speculation is confirmed to be correct. Instead of buffering, Cleanup-Spec [46] allows caches to refill speculative accesses and then undoes the changes to cache state if they were on an incorrect execution path. Besides, DAWG [47] provides a strict cache partition for different security domains to prevent speculative execution from leaving traces in caches across security domains. These works primarily aim at addressing Spectre attacks based on cache side channels.
3 Restrict the speculative execution of memory access instructions. The second category conservatively considers all speculative accesses as suspicious, but in fact only speculative instructions that access secret-dependent data and change the cache states are malicious. Instead of pessimistic blocking, SpectreGuard [48] identifies confidential instructions by software (OS/library API), and Context-Sensitive Fencing [49] exploits taint tracking to identify potentially unsafe execution patterns. As a software-transparent hardware solution, Conditional Speculation uses the Hazard Filter to identify safe accesses. Inspired by InvisiSpec, the remaining few unsafe accesses that would otherwise have been blocked are allowed to execute speculatively. Similar to the Cache-hit Filter, Selective Delay [50] also considers speculative accesses that hit in the L1 DCache as safe. The difference is that it employs a value prediction mechanism to reduce the performance overhead caused by blocking unsafe accesses.
4 Block the propagation of secrets. The works of categories two and three primarily mitigate Spectre attacks based on cache side channels. Researchers pointed out that transient execution attacks require transferring secrets from the speculative domain into the architectural state of the processor, no matter what kind of side channel is used to disclose the sensitive information. Thus NDA [51], SpecShield [52], and STT [53] are proposed to break this requirement. These mechanisms assume that secrets need to be propagated speculatively at least once before information is leaked to a side channel. They prevent downstream instructions from using potential secrets obtained by suspicious accesses and can thus mitigate information leakage through different side channels. Though Conditional Speculation is currently designed to defeat cache-based Spectre attacks, new filters can be deployed to defend against other side channels. Conditional Speculation can also introduce a secret data dependence matrix to track the propagation of secrets, further filtering out safe instructions that are independent of secrets and allowing them to execute normally.

CONCLUSION
An effective and software-transparent hardware-assisted framework, named Conditional Speculation, is proposed to mitigate Spectre variants based on cache side channels. This framework consists of three components: Security Hazard Detection, Security Hazard Filter and Security Hazard Response. For Security Hazard Detection, a bit-matrix is introduced in the Issue Queue to identify suspect memory instructions with security dependence. Then, Security Hazard Filters are deployed to pick out safe instructions, which do not leave observable traces in cache side channels, and allow them to execute normally. In this paper, we investigate two Security Hazard Filters. The Cache-hit Filter targets safe instructions that hit in the cache, because they do not change the cache content. The TPBuf Filter identifies safe instructions using a common feature of variants based on shared memory, named S-Pattern. The Cache-hit Filter succeeds in defending against all Spectre variants based on cache side channels. At some cost in security, the TPBuf Filter targets the dangerous and widely used attacks based on shared pages and achieves better performance. Security Hazard Response is employed to handle the unsafe instructions determined by the first two components. Blocking unsafe execution is an easy-to-implement method but sacrifices performance. The SPBuf mechanism supports retrieving data to the pipeline without changing cache contents and eliminates the pipeline stalls of subsequent execution. Specifically, we find that it is unnecessary to deploy SPBuf in private caches when the Hazard Filter identifies most of the safe instructions, and that deploying SPBuf in the last level cache is efficient and less complex.