To Cache or Not to Cache

Unlike conventional CPU caches, non-datapath caches, such as the host-side flash caches widely used as storage caches, have distinct requirements. While every cache miss results in a cache update in a conventional cache, non-datapath caches allow for the flexibility of selective caching, i.e., the option of not updating the cache on each miss. We propose a new, generalized, bimodal caching algorithm, Fear Of Missing Out (FOMO), for managing non-datapath caches. Being generalized has the benefit of allowing any datapath cache replacement policy, such as LRU, ARC, or LIRS, to be augmented by FOMO, making these datapath caching algorithms better suited for non-datapath caches. Operating in two states, FOMO is selective: it selectively disables cache insertion and replacement depending on the learned behavior of the workload. FOMO is lightweight and tracks inexpensive metrics in order to identify these workload behaviors effectively. FOMO is evaluated using three different cache replacement policies against the current state-of-the-art non-datapath caching algorithms, using five different storage system workload repositories (totaling 176 workloads) for six different cache size configurations, each sized as a percentage of each workload's footprint. Our extensive experimental analysis reveals that FOMO improves upon other non-datapath caching algorithms across a range of production storage workloads, while also reducing the write rate.


Introduction
Conventional caching algorithms (e.g., LRU [1], Clock [2], FIFO [1], ARC [3], MQ [4], LIRS [5], etc.) were designed for datapath caches. Datapath caches are defined by the requirement that every cache miss trigger a cache insertion and, should the cache be full, a cache eviction as well. While this is suitable for CPU and DRAM caches, where the write endurance cost of each insertion is relatively inexpensive, it is not true for the flash or persistent memory SSD caches that are typically used to cache storage system data.
When using host-side SSD caches [6][7][8], data can be served to the application directly from back-end storage without having to first retrieve it into the cache. This opens up a unique opportunity: selective updates, whereby cache updates are not always made upon a cache miss but are instead decided on every access. Caches that allow for this flexibility are non-datapath caches. Non-datapath caching algorithms with selective updates can improve the lifetime of cache devices that wear out on account of device-level writes [9,10]. These algorithms also have the ability to improve cache hit rate by avoiding cache replacement in cases wherein the evicted item is considered more valuable than the inserted item [11,12].
Prior research has demonstrated that state-of-the-art caching algorithms such as ARC can compromise both the cache hit rate and the cache write rate when applied to non-datapath caches [13,14]. Santana et al. [13] demonstrated that workloads are heterogeneous and manifest different characteristics in different phases of their execution. They proposed mARC [13], a non-datapath caching algorithm that responds to workload phases by turning on or off a filter that prevents items from entering an ARC-managed cache upon first access. However, it is not clear whether mARC would generalize to cache replacement policies other than ARC (e.g., LIRS [5]), some of which have been found valuable in addressing the wide array of storage workload types in production. Furthermore, mARC relies on a set of workload-sensitive constants (eight in all) that define cache behavior; we demonstrate that this compromises mARC's ability to adapt to the variations within and across workloads.
In this paper, we propose Fear of Missing Out (FOMO), a generalized non-datapath admission policy that can be used to augment any datapath cache replacement policy, making it better suited for non-datapath caches. FOMO exists in one of two possible states at any given time: Insert or Filter. When in the Insert state, FOMO forwards all cache requests to the underlying cache replacement policy while observing the behaviors of both the cache and the workload. When in the Filter state, FOMO selectively disables cache updates to improve both the cache update rate (cache writes) and the cache hit rate by preventing cache pollution. Cache pollution is the insertion of items into the cache whose ultimate value is less than that of the items they evict.
FOMO decides which state to be in by utilizing information about accesses to items that were in the cache or were recent cache misses. To determine the reuse of recent cache misses, FOMO maintains a Miss-History structure that keeps track of these items and their reuse. Using small periods of observation to reach these decisions, FOMO is capable of reacting rapidly to changes in workload behavior. This reaction speed is important: FOMO does not want to miss out on inserting items into the cache for future hits, nor does it want to miss out on preserving the items in the cache or preventing items lacking reuse from entering it. Just as someone wishing to keep up with the latest trends, FOMO does not want to miss out on reacting to the latest workload behavior.
FOMO's contribution is the development and evaluation of novel cache admission policies, not the latency and/or throughput optimization of the I/O access stream to non-datapath caches, as has been considered in complementary, recent work [15]. Consequently, we analyze and evaluate FOMO using three different eviction policies (LRU, ARC, and LIRS) against the current state-of-the-art non-datapath caching algorithms on a collection of 176 workloads sourced from five different storage system workload repositories. For each workload, we evaluate six different cache size configurations, each sized as a percentage of the workload's footprint, defined as the set of unique items accessed by the workload. FOMO improves upon state-of-the-art non-datapath caching algorithms, achieving improved hit rate consistency while also reducing cache writes significantly. The use of a Miss-History gives FOMO a more acute understanding of workload behavior than mARC, which depends exclusively on cache hit rate to make its decisions. When compared against state-of-the-art cache admission policies, FOMO sees average improvements of 2.64% over LARC, 1.16% over mARC, and 27.45% over TinyLFU, while still reducing the writes made to the cache to levels similar to those of LARC.

Background and Motivation
Popular storage caching algorithms such as LRU, LFU, LIRS, ARC, and S3FIFO [16] attempt to cache items considered important based on different metrics such as recency, frequency, and reuse distance. However, they are all datapath caching algorithms and therefore always make cache updates on a cache miss. When these datapath cache replacement algorithms are used on non-datapath caches, such as host-side (flash or persistent memory) SSD caches, the volume of cache updates incurred wears out the cache at an alarming rate. Selective cache updates, on the other hand, allow non-datapath caching algorithms to improve the lifetime of cache devices, and to improve cache hit rates by avoiding cache replacement in cases wherein the evicted item is considered more valuable than the inserted item.

Non-Datapath Caching Algorithms
Non-datapath caching algorithms have received attention in recent years, starting with the LARC [14] work. LARC is a static policy that admits items into the cache only upon their second miss. Since LARC filters first accesses unconditionally, it can prevent important items from entering the cache in a timely fashion. mARC, on the other hand, employs a dynamic policy and learns about workload states to decide whether to allow updates into the cache. It uses a three-state machine with Unstable, Stable, and Unique Access states, and seven distinct state transition conditions to learn the workload's state and adapt admission into the cache accordingly. mARC builds its mechanisms upon the classic ARC [3] datapath cache replacement algorithm. However, as Figure 1 illustrates, mARC's cache hit rate can at times be over 20% lower than that of its datapath counterpart when a sub-optimal state transition or a delayed reaction to a state change is encountered. Finally, TinyLFU [17] is a CDN cache admission policy that improves performance by keeping the items with the highest frequency within the cache.
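To make LARC's second-miss admission rule concrete, here is a minimal sketch in Python. The class and method names are ours, not from the LARC paper, and a real implementation would manage the ghost history alongside the cache proper.

```python
from collections import OrderedDict

class SecondMissFilter:
    """Admit a block into the cache only on its second miss (LARC-style).

    Recently missed block addresses are remembered in an LRU list; a
    miss on a remembered address is treated as proof of reuse.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.recent_misses = OrderedDict()  # address -> None, LRU order

    def should_admit(self, address):
        if address in self.recent_misses:
            # Second miss within the history window: admit and forget.
            del self.recent_misses[address]
            return True
        # First miss: remember it, evicting the oldest entry if full.
        self.recent_misses[address] = None
        if len(self.recent_misses) > self.capacity:
            self.recent_misses.popitem(last=False)
        return False
```

The static nature of the rule is visible here: an item whose second access arrives after its history entry has been evicted is filtered again, which is the timeliness problem noted above.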

The Fear of Missing Out
Classic caching solutions are reactive, and this impedes their ability to respond to workload behavior changes. Many incorporate the notion of eviction history to evaluate the importance of an item [3,5,18]. Their post-mortem evaluation of evictions, while valuable, is a belated reaction to the workload's effect on the cache, compared to a direct reaction to the workload itself. Of LRU, ARC, and LIRS, LIRS reacts the most directly to the workload itself through its focus on reuse distance.
Non-datapath caching algorithms, owing to the available flexibility of not having to perform cache updates, invariably embody a fear of missing out on responding in a timely fashion to changes in workload behavior. Unfortunately, reacting as a consequence of accesses to evicted items requires that items first enter the cache. Furthermore, if the number of cache hits is not increasing, having information only about evicted items can obscure the reason for low performance, offering a limited view of the workload. To respond to workload changes rapidly, observing accesses to newly requested items is crucial. In particular, keeping track of recent accesses that resulted in cache misses allows us to understand what the cache is "missing out" on. Here, timely knowledge of workload behavior helps improve the accuracy of the filtering mechanism. We assert that the rate of access to recently missed items provides crucial, complementary information about the workload that allows for a better understanding of short-term workload behaviors.
FOMO is a general non-datapath admission policy that is capable of improving hit rate and write rate consistency by utilizing a simple model that avoids the issues present in the mARC state model. FOMO's Miss-History captures the reuse of recently missed items and does not require the items to be inserted into the cache first. As we show in the next section, the design of FOMO focuses on comparing the cache hit rate and the reuse rate of recent cache misses to better understand the general workload behavior.

Design
Non-datapath algorithms such as LARC can be counter-productive to the hit rate as they do not cache an item until it is reused, incurring a compulsory additional cache miss per item. Additionally, when working sets change frequently, the requirement of proof of reuse can significantly impair hit rates. In the case of mARC, its high level of complexity comes not from using three states, but rather from its seven built-in state transition conditions. Even with these conditions, mARC still excludes the direct state transitions between the Unique Access and Stable states, thereby imposing unnecessary writes and potentially removing "soon-to-be-hit" items from the cache in exchange for "one-hit wonders".
This section explains the design of Fear of Missing Out (FOMO), which takes a new approach to storage caching. FOMO learns from our findings with both mARC and LARC, but is itself based on a novel approach involving a simple, two-state design: Insert and Filter. FOMO is designed to be a generic non-datapath admission policy that can augment any datapath cache replacement algorithm and still provide the write reductions and performance expected of non-datapath caches.

Miss-History
FOMO's Miss-History is an LRU structure that primarily holds recently missed items, and it is set to the size of the cache. Whenever FOMO encounters a cache miss, the Miss-History is updated. Should an item that causes a cache hit also exist in the Miss-History, that item is removed from the Miss-History. When FOMO is in the Insert state, as shown in Figure 2, a missed item is inserted into both the cache and the Miss-History. The Miss-History's behavior here is straightforward: any item incurring a cache miss, including those already within the Miss-History, is moved to the MRU end of the Miss-History, removing the LRU item as needed. However, when FOMO is in the Filter state, as shown in Figure 3, FOMO treats the Miss-History as a mechanism for tracking the filtered items. In this state, the Miss-History treats an item that incurs a cache miss but exists within the Miss-History the same as a cache hit, removing the item from the Miss-History. Despite the differences in function, FOMO in both states is utilizing the Miss-History to discover items with reuse, with the state as context. Since the Insert state has items entering the cache without knowing whether they have reuse, the Miss-History is used to verify the reuse of each item and that the cache is seeing this reuse. In the Filter state, only the Miss-History sees this reuse, after which the item is inserted into the cache and passed along to the underlying cache replacement algorithm in the belief that it will see further reuse. In the next section, we go into further detail about FOMO's states and how they are determined.
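The two modes of Miss-History operation described above can be sketched as follows. This is our illustrative reconstruction from the description, not the authors' code; the class and method names are hypothetical.

```python
from collections import OrderedDict

class MissHistory:
    """LRU list of recently missed addresses, sized to the cache.

    In the Insert state every miss is recorded at the MRU end; in the
    Filter state a miss on an address already present counts as observed
    reuse, and the entry is removed (the item is then admitted).
    """
    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()  # address -> None, LRU order

    def record_miss_insert_state(self, address):
        # Move (or add) the address to the MRU end, evicting the LRU
        # entry if the structure exceeds the cache size.
        self.entries.pop(address, None)
        self.entries[address] = None
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)

    def record_miss_filter_state(self, address):
        # Return True if reuse was observed (address was seen recently).
        if address in self.entries:
            del self.entries[address]
            return True
        self.record_miss_insert_state(address)
        return False

    def on_cache_hit(self, address):
        # A cache hit removes the item from the Miss-History if present.
        self.entries.pop(address, None)
```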

FOMO States
FOMO incorporates a simple, two-state design: Insert and Filter. This simplicity eliminates complex state transitions while still encompassing the actions necessary for identifying workload behaviors.
When FOMO is in the Insert state, all requests are passed to the underlying cache replacement algorithm, which may cache the item and choose what to evict, if needed. FOMO always begins in the Insert state to gather information while filling up the cache.
When FOMO is in the Filter state, FOMO decides whether or not a request will be passed to the underlying cache replacement algorithm. How FOMO decides this is similar to LARC: the request either hits in the cache, or reuse is observed for an item not in the cache. FOMO tracks this observable reuse with the Miss-History. Should FOMO encounter a cache miss for an item that exists within the Miss-History while in the Filter state, it observes that the item has reuse, passes it on to the cache for insertion, and removes it from the Miss-History. This removal while in the Filter state is desirable mainly to avoid the possibility of cache churning [13]. Cache churning occurs when the working set is a superset of the items in the cache, but the lowest-frequency items in the cache and some of those outside the cache have similar frequencies. The lowest-frequency items in the cache are then evicted to make way for similarly (or lower) frequency items outside of the cache, introducing several misses that could instead be hits if the items in the cache were protected from eviction. Figures 2 and 3 show examples of how FOMO operates within these states and how the Miss-History is affected.
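Putting the cache and the Miss-History together, the handling of a single request in either state might look as follows. This is a deliberately simplified sketch, not the authors' implementation: sets stand in for the real structures, and the hypothetical `admit` callback stands in for the underlying cache replacement policy's insertion/eviction logic.

```python
def handle_request(address, state, cache, miss_history, admit):
    """Dispatch one request under FOMO's Insert or Filter state.

    cache and miss_history are sets here for brevity; admit(address)
    represents handing the item to the underlying replacement policy.
    """
    if address in cache:
        miss_history.discard(address)       # a hit clears any history entry
        return "hit"
    if state == "Insert":
        admit(address)                      # every miss is inserted
        miss_history.add(address)
        return "miss-inserted"
    # Filter state: insert only if the Miss-History has observed reuse.
    if address in miss_history:
        miss_history.discard(address)
        admit(address)
        return "miss-inserted"
    miss_history.add(address)
    return "miss-filtered"
```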
With these two states (Insert and Filter), the conditions to transition between them must be defined. We define two conditions, one representing each state: HR_cache < HR_Miss-History and HR_cache ≥ HR_Miss-History, where HR represents the hit rate observed by either the cache or the Miss-History. These conditions are a consequence of empirical observations and analysis of algorithm behavior. The transition from Filter to Insert was determined to be advantageous when the reuse in the Miss-History was significant, i.e., greater than the hits to the cache. The transition from Insert to Filter was determined to be advantageous when the reuse in the Miss-History was not significant. To calculate both HR_cache and HR_Miss-History, the last period of requests is observed for hits/reuse, where the period's size is set to 1% of the cache size. This selection was motivated by having FOMO react swiftly to changes and protect the items in the cache.
We noticed that FOMO would at times encounter situations where both HR_cache and HR_Miss-History were low and the slightest changes pushed FOMO too readily from the Filter state to the Insert state. This led to an additional condition being added to the Filter-to-Insert transition: HR_Miss-History > 5%. This condition prevents the edge case in which the reuse of a few items in the Miss-History is improperly used as justification to change to the Insert state. The final state design can be seen in Figure 4, while the finalized algorithm can be seen in Algorithm 1.
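The resulting transition rules, including the 5% guard, can be summarized in a short sketch. The function and parameter names are ours; the hit counts are assumed to have been collected over the observation period of 1% of the cache size described above.

```python
def next_state(state, cache_hits, history_hits, period):
    """Decide FOMO's state at the end of one observation period.

    cache_hits: hits seen by the cache in the last period.
    history_hits: reuse observed by the Miss-History in the last period.
    period: number of requests observed (1% of the cache size).
    """
    hr_cache = cache_hits / period
    hr_history = history_hits / period
    if state == "Insert" and hr_cache >= hr_history:
        # Reuse among recent misses is not significant: start filtering.
        return "Filter"
    if state == "Filter" and hr_cache < hr_history and hr_history > 0.05:
        # Missed-item reuse dominates and clears the 5% floor: insert.
        return "Insert"
    return state
```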

Overheads
FOMO, augmenting the underlying caching algorithm, adds its own overheads on top of it, and is therefore designed to keep those overheads low while achieving its goals. All of FOMO's operations are achievable in O(1) time complexity. In terms of space overhead, FOMO's only large requirement is an array of approximately cache-size entries serving as a hash table. Each entry of the array consists of an integer to track the block address and two pointers (next and prev) for its place in the Miss-History. The rest are a few integers and pointers whose space requirements do not scale with the cache size: the period countdown, the hit counters for the cache and the Miss-History, a bit for FOMO's current state, and the list sentinel for the Miss-History, which is itself composed of two pointers (first and last).
Should FOMO become more integrated with an underlying algorithm, the hash tables may be merged, along with the entries, possibly reducing the space overhead significantly by eliminating redundancy. Furthermore, some space and time overhead may be removed by altering FOMO to use a CLOCK or FIFO structure for its Miss-History instead of an LRU list.

The state-of-the-art non-datapath caching algorithms LARC and mARC are compared against FOMO. Additionally, TinyLFU, like FOMO a state-of-the-art admission policy, is also compared against FOMO. Being admission policies, both FOMO and TinyLFU have an underlying cache replacement algorithm manage the cache and make eviction decisions. Both TinyLFU and FOMO use LRU, ARC, and LIRS as underlying cache replacement algorithms to better compare the performance of these admission policies; these algorithms are also included standalone to demonstrate FOMO's benefits. TinyLFU uses the majority of the default configurations found in the Caffeine [19] repository, with the exception that we explicitly enabled the DoorKeeper and used conservative increments to better align with the TinyLFU paper. TinyLFU's DoorKeeper is sized to 3N expected insertions and its reset period is set to 10N insertions.

Experimental Setup
We built cache simulators for every algorithm to process block I/O workloads and report the number of reads, writes, cache hits and misses, and total writes incurred. When possible, we used the original authors' implementation of an algorithm. To simulate sufficient cache space as well as I/O activity, the size of the cache was set to a fraction of the workload footprint, defined as the total size of all unique data accessed. This fractional cache size was varied from 1% to 20% of each workload's footprint in our simulations (specifically 1%, 2%, 5%, 10%, 15%, and 20%).
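As a sketch of this sizing procedure, the footprint and the six cache size configurations might be derived from a trace as follows. The function name is ours, and counting unique block addresses (rather than bytes) is a simplifying assumption.

```python
def cache_sizes_from_trace(addresses,
                           fractions=(0.01, 0.02, 0.05, 0.10, 0.15, 0.20)):
    """Derive simulated cache sizes from a block trace.

    The footprint is the number of unique addresses accessed; each cache
    size is a fixed fraction of it (the 1%-20% configurations).
    """
    footprint = len(set(addresses))
    return {f: max(1, int(footprint * f)) for f in fractions}
```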

Workloads
To have a large, diverse set of I/O workloads, we use the FIU, MSR Cambridge, CloudCache, CloudVPS, and CloudPhysics block I/O workloads, as detailed in Table 1 [20]. The FIU, MSR Cambridge, CloudCache, and CloudVPS workloads are all run for their full duration, with some requiring the merging of individual days into one large, continuous workload. Tests using CloudPhysics workloads are limited to the first day in order to reduce the length and workload footprint so that they are runnable on the available resources within a reasonable amount of time.

Metrics
As previously mentioned, the cache simulators collect information about reads, writes, cache hits and misses, and total writes incurred. FOMO aims not only to improve upon its underlying cache replacement algorithms, but also to be comparatively better than its non-datapath caching algorithm and admission policy peers. For the evaluation, metrics include the mean cache hit rate and the mean write rate; greater write rates indicate lower flash cache device lifetime. To evaluate cache performance, normalized results are compared across workloads and cache sizes. Normalized hit rates and write rates are computed for a given workload-cache size combination with respect to the best-performing algorithm. This normalization better shows the general performance of an algorithm among its peers and provides a fair method of comparison, as a comparison of total average hit rates alone would hide how well (or poorly) an algorithm performs relative to the others. When comparing write rates, the best performer has the lowest write rate, as this translates to the fewest writes/updates, one of the goals of non-datapath caching algorithms. To prevent small differences in metrics from leading to large normalized differences, results where the value used for normalizing was less than 10% were excluded. Caches for which the best-performing algorithm produces extremely small hit rates are unlikely to be of practical use in any case.
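The normalization and the 10% exclusion rule can be sketched as follows, assuming per-algorithm hit rates in percent for one workload-cache size combination; the function name is ours.

```python
def normalized_hit_rates(hit_rates, floor=10.0):
    """Normalize hit rates (in percent) against the best performer.

    Returns None when the best hit rate is below `floor` (10%), so that
    tiny absolute differences cannot produce large normalized ones.
    """
    best = max(hit_rates.values())
    if best < floor:
        return None  # combination excluded from the summary
    return {alg: hr / best for alg, hr in hit_rates.items()}
```

The same routine applies to write rates, except that the best performer there is the algorithm with the lowest rate.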
Each measurable aspect of FOMO is given its own section. First, we discuss how FOMO is able to limit writes to the cache in a manner similar to the state-of-the-art non-datapath caching policies. Second, we discuss FOMO's hit rate performance compared to other non-datapath caching algorithms. Third, we compare the FOMO and TinyLFU admission policies using different cache replacement algorithms. Fourth, we analyze how FOMO's Miss-History is able to detect patterns of reuse within the workload, using a particular workload as a case study. Finally, we discuss the adversarial workloads of CloudCache and why FOMO and the other non-datapath caching algorithms see performance degradation compared to datapath caching algorithms.

Non-Datapath Appropriate Write Rate
First, we establish that the write rate of FOMO is similar to that of leading non-datapath algorithms and admission policies. The lower this measure, the less often a write operation is performed on the cache and the longer the cache's lifespan.
When compared to its peers, FOMO's write rate is primarily aligned with LARC's, regardless of the internal cache replacement algorithm. As FOMO attempts to identify more opportunities to insert, this affects the tail of its results, as seen in Figure 5. Ultimately, even with these additional writes, FOMO often writes similarly to LARC and more than TinyLFU.

Consistent Hit Rate Performance
Next, we establish that, when augmenting a datapath caching algorithm, FOMO is able to either improve or preserve the hit rate performance of the internal cache replacement algorithm. In particular, we establish that, of the two admission policies (FOMO and TinyLFU), FOMO is better able to improve or preserve the hit rate of the internal cache replacement algorithm, by focusing more on the changing of working sets than on the preservation of valuable items in the cache.
When we compare FOMO's overall performance, as seen in Figure 6, to other non-datapath algorithms, we can see that its performance is similar to that of mARC and LARC, regardless of the internal cache replacement algorithm. In particular, FOMO(ARC) has the most consistent performance, with a smaller tail of performance degradation compared to other versions of FOMO. TinyLFU, another admission policy, has the most variance in its performance relative to whichever algorithm performed best.
To simplify our observation of FOMO's performance in comparison to mARC, LARC, and TinyLFU, we focus upon a single version of both FOMO and TinyLFU that utilizes ARC as the internal cache replacement algorithm. Figure 7 allows us to observe how well or poorly FOMO does in comparison to each algorithm on a more detailed basis. When compared to LARC, FOMO(ARC) performs well on the MSR, CloudCache, and CloudPhysics workloads; LARC shows minor improvement over FOMO(ARC) on the FIU workloads, while being mostly similar on the CloudVPS workloads. When compared to mARC, FOMO(ARC) performs well on the FIU, MSR, and CloudVPS workloads; mARC has both better and worse performance for the CloudCache workloads and similar performance for the CloudPhysics workloads. Lastly, when compared to TinyLFU(ARC), FOMO(ARC) has consistently better hit rate performance, as TinyLFU's frequency histogram protects the cache too strongly. Quantitatively, against LARC, FOMO(ARC) experiences on average an improvement of 2.64%, a maximum degradation of 20.88%, and a maximum improvement of 145.66%. Against mARC, FOMO(ARC) experiences on average an improvement of 1.16%, a maximum degradation of 65.8%, and a maximum improvement of 84.32%. Against TinyLFU(ARC), FOMO(ARC) experiences on average an improvement of 27.45%, a maximum degradation of 27.71%, and a maximum improvement of 366.10%. Overall, TinyLFU has the worst performance across traces and cache sizes, regardless of internal cache replacement policy. mARC, LARC, and FOMO have similar hit rate performance characteristics when viewed as a whole, but show their individual strengths and weaknesses in the breakdown of the results, as seen in Figure 7.

Figure 7. FOMO(ARC) hit-rate comparison. The X-axis varies the fractional cache size. For the FIU workloads, FOMO(ARC) generally performs similarly to LARC, with LARC's comparative performance improving at larger cache sizes. For the MSR workloads, FOMO(ARC) shows strong performance against LARC, mARC, and TinyLFU(ARC). For the CloudCache workloads, FOMO(ARC) performs better than LARC and TinyLFU(ARC), but similarly to mARC, with variance toward better and poorer performance across cache sizes. For the CloudVPS workloads, FOMO(ARC) performs similarly to LARC and better than both mARC and TinyLFU(ARC). For the CloudPhysics workloads, FOMO(ARC) performs similarly to mARC, with some cases wherein each performs significantly better or worse, and better than LARC and TinyLFU(ARC).

Admission Policy Hit Rate Performance
We have analyzed and compared the hit rate performance of the admission policies FOMO and TinyLFU, but primarily with each augmenting ARC. Here we observe FOMO's and TinyLFU's performance across workloads in comparison to the lone internal cache replacement algorithm (as seen in Figure 8). FOMO consistently improves the performance of LRU and ARC, except on CloudCache. For FOMO(LIRS), it is more appropriate to say that FOMO achieves performance similar to LIRS, if not slightly worse, in most cases, again with the exception of CloudCache. TinyLFU consistently introduces performance degradation compared to the lone cache replacement policy, with the exception of LRU on CloudVPS.
Both admission policies perform poorly on the CloudCache workloads. Of the two, FOMO limits the degree of performance degradation it introduces considerably better than TinyLFU does. This appears to be primarily due to the CloudCache workloads having frequently changing behaviors whose benefits are immediate to their appearance, leaving little benefit to filtering. The rate of change affects both FOMO and TinyLFU, but FOMO is better able to adapt rapidly and identify opportunities for hits that would otherwise incur an additional miss in the near future. Overall, FOMO either improves upon the internal cache replacement algorithm or, at worst, as seen on CloudCache, preserves the internal algorithm's performance as much as possible. TinyLFU, by comparison, primarily performs worse, often much worse, than its internal cache replacement algorithm's lone performance; one such example is the CloudCache workloads, but this can also be seen in LIRS and TinyLFU(LIRS)'s performance on FIU.

FOMO's Miss-History: A Case Study
FOMO's hit rate and write rate results show the benefit of FOMO's design. However, the structure central to FOMO's understanding of the workload, the Miss-History, has not yet been analyzed for effectiveness. We present a case where the Miss-History reveals a pattern that FOMO takes advantage of, but neither LARC nor mARC do: the first 60 million requests of the CloudPhysics workload w54_vscsi2.itrace with a cache size of 1% of the number of unique addresses requested. Some of the previously noted shortcomings of mARC are present within Figure 9: the lack of a direct transition between the Stable and Unique Access states, incurring extra writes to the cache, and mARC's analysis of the workload using cache hit rate, which prevents it from noticing opportunities for hits. mARC remains in the Unique Access state even though plenty of opportunities for cache hits exist, as can be seen in Figure 9b, which plots the cache hit rates of LRU and the reuse rates of FOMO's Miss-History. The caching algorithm that FOMO augments does not matter here, as the focus is on the Miss-History. To show both rates more clearly, the hit rate of the Miss-History has been mirrored over the horizontal axis (negated). These plots, combined with the hit rates of LARC and mARC, show both that LRU is capable of having hits and that FOMO's Miss-History is seeing the same hits. Taken together, the plots demonstrate that the workload is mostly composed of items limited to second accesses. As such, both LARC and mARC (when acting similarly to LARC) only cache items upon their second access, but receive no benefit in doing so.
What is interesting about this workload is that the majority of repeated accesses are limited to the second access of an item, with very few third or fourth accesses within a reasonable time frame. This can be seen in Figure 9, where the LRU hit rate mostly mirrors that of the Miss-History, which, in general, removes items from its structure on an item's second access.
FOMO is capable of finding patterns within the workload and makes reasonable decisions based on them, as can be seen in Figure 10. FOMO's decisions here capture many opportunities for hits that go unrecognized by both LARC and mARC, whose hit rates (and mARC's states) can be seen in Figure 9. This is unfortunate for mARC, which places emphasis on identifying workload states. Due to its focus on using cache hit rate to identify workload state instead of something akin to FOMO's Miss-History, mARC cannot see the pattern within this workload, leaving it stuck in the Unique Access state. FOMO(ARC), by contrast, recognizes the pattern of reuse and promptly responds, obtaining many cache hits that LARC and mARC instead miss.

Adversarial Workloads
FOMO improves upon the results of the underlying cache replacement algorithm for every workload and cache size, as seen in Figure 7, with the exception of CloudCache. In fact, none of the non-datapath caching algorithms performed well on the CloudCache workloads compared to their datapath counterparts. When investigating the reason for this behavior, several things of note were observed, which we highlight with a focused discussion of one of these CloudCache workloads: webserver-2012-11-22-1.blk.
Figure 11 shows a large time frame in webserver-2012-11-22-1.blk during which the non-datapath caching algorithms do not perform well compared to the datapath caching algorithms; in particular, the period between one million and four million requests exhibits this behavior. When observing the block address access pattern plots of this CloudCache workload (Figure 12), it is noticeable that the workload includes several concurrent working sets and patterns (combinations of scans, random accesses, looping, and repeated accesses). This mixing of patterns increases the likelihood of both FOMO and mARC observing reuse and discerning workload patterns based on it. This is why, as the cache size increases, the difference between the datapath and non-datapath caches begins to decrease.
Each of the non-datapath cache algorithms has its own reasons for why these access patterns were problematic. For LARC, which observes reuse on an individual basis, an item being reused often would not be reused again prior to being evicted, leading to many missed opportunities that the datapath cache algorithms can take advantage of. mARC, with its dependence on cache hit rate for decisions, its use of the Unstable state as an intermediary transition between the Stable and Unique Access states, and its long evaluation times, finds itself in the Unstable state more often and gains some advantage over both LARC and FOMO because of it. Lastly, FOMO, looking for patterns in the workload, would periodically find a pattern of reuse, change state to Insert, obtain some cache hits that eventually overtake the reuse found in the workload, change state to Filter, and so on repeatedly during such highly overlapping periods. Finally, we note that when particular patterns, or reuse in general, were more significant within these workloads, all of the non-datapath cache algorithms would identify them and react to achieve cache hits.

Related Work
There has been extensive work on non-datapath caching algorithms with the goal of reducing the number of updates (i.e., writes) to a limited-lifetime cache device. One line of work lies in deduplication-based filtering adaptations of conventional datapath caching algorithms. The techniques in this category address both read and write I/Os. Approaches to reduce read I/O traffic to the cache, and thereby improve cache performance, involve maintaining an in-memory content-addressed index. Prior work on eliminating I/O to content that is already available in memory falls into this category [22,26,27]; while these works targeted data in primary storage, the techniques translate fairly easily to data stored in a non-datapath cache. Approaches to reduce the number of cache writes work by filtering updates to content that may already exist in the cache device, regardless of its location. Examples of these solutions are D-LRU and D-ARC [28], where hits on metadata items with the same associated fingerprint as a cached item are considered cache hits. These deduplication-based filtering techniques virtually increase the size of the cache and/or eliminate I/Os to the cache entirely, thereby improving overall I/O performance.

Non-Datapath Caches
LARC [14] is a non-datapath caching algorithm that focuses on reducing forced updates to a non-datapath cache. At a high level, LARC avoids inserting missed items unless they have been accessed sufficiently recently. LARC consists of two LRU lists: one for cached items and a second, used as a filter, for tracking non-cached items that have been accessed recently. A cache update is only performed when an item that is not found in the cache is found in the filter list. While LARC's filtering approach is straightforward, its filter is always operational and, as a result, can prevent important items from entering the cache in a timely fashion. In particular, LARC populates the cache at least twice as slowly as most other algorithms when workload working sets change.
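LARC's admission rule can be summarized in a few lines. The sketch below is an illustration of the described behavior, not the authors' implementation; the list sizing and exact filter management are simplifying assumptions.

```python
from collections import OrderedDict

class LARCSketch:
    """Illustrative sketch of LARC's admission structure: an LRU cache
    plus an LRU filter list of recently missed, non-cached addresses.
    Only items found in the filter are admitted to the cache."""

    def __init__(self, cache_size, filter_size):
        self.cache = OrderedDict()
        self.filter = OrderedDict()
        self.cache_size = cache_size
        self.filter_size = filter_size

    def access(self, addr):
        """Returns (hit, wrote_to_cache)."""
        if addr in self.cache:
            self.cache.move_to_end(addr)        # LRU promotion
            return True, False
        if addr in self.filter:
            del self.filter[addr]               # second recent miss: admit
            self.cache[addr] = None
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict cache LRU
            return False, True
        self.filter[addr] = None                # first miss: filter only
        if len(self.filter) > self.filter_size:
            self.filter.popitem(last=False)
        return False, False
```

Note how every item needs two sufficiently close accesses before it is cached, which is the source of the slow cache population on working set changes noted above.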
mARC [13] is a selective caching algorithm that improves performance and endurance by using the cache hit rate as a metric to selectively turn cache insertions on and off. To make this possible, mARC defines three distinct workload states, Unstable, Stable, and Unique Access, and reflects these states within the mARC state model. mARC bases its state detection on the observation of a single system metric, the cache hit rate. While mARC is able to use its knowledge of workload states to allow or disallow updates to the cache, it has important limitations. First, since it does not account for the Stable to Unique Access (and vice versa) state transitions, it is unable to handle certain workloads. Second, mARC fixes the values of various parameters, which compromises its ability to adapt and recognize workload states for arbitrary workloads. Third, mARC relies on cache hits in order to understand workload behavior, which leaves mARC analyzing not the workload itself but rather the effects of the workload on the cache. This method of analysis is slow, requiring an item to first be inserted into the cache.
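The first limitation, the missing direct transitions between Stable and Unique Access, can be expressed as a small transition table. This is our reading of the state model described above, not mARC's actual code:

```python
# Allowed transitions in mARC's state model as described in the text:
# Stable and Unique Access are only connected through Unstable, so moving
# between them always incurs a detour (and extra cache writes).
TRANSITIONS = {
    "Unstable": {"Stable", "Unique Access"},
    "Stable": {"Unstable"},
    "Unique Access": {"Unstable"},
}

def can_transition(src, dst):
    """True if mARC's state model permits a direct src -> dst transition."""
    return dst in TRANSITIONS.get(src, set())
```

A direct Stable to Unique Access hop is absent, which is exactly the transition FOMO's two-state design does not need to emulate.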
FlashTier [8] is a system architecture built for solid-state caches. FlashTier aims to address the main limitations of caching when using traditional SSDs by providing an interface designed specifically for caching. The limitations FlashTier focuses on relate to reducing the cost of cache block management, providing cache consistency guarantees, and silently evicting blocks during garbage collection for performance gains. While focused on improving non-datapath caches, FlashTier itself used Facebook's FlashCache for Linux [29] for evaluation, as FlashTier is not a caching algorithm itself.
Least Hit Density (LHD) [30] uses a calculation of hit density to determine what to filter and evict from a cache. Hit density is a predicted value based on age, frequency, application ID, and size. Additionally, the implementation uses several techniques to improve throughput, such as random sampling of the cache for eviction and reducing lock contention by reducing the number of updates to the data structure. LHD relies on precomputed ranks that adapt after a million requests, under the assumption that application behavior is stable for short periods of time. FOMO, in contrast, assumes more dynamism in workload behavior while working in a context where objects are of constant size.

Admission Policies
TinyLFU: TinyLFU [17] is a CDN cache admission policy that improves performance by keeping the items with the highest frequency in the cache. TinyLFU's motivation comes from how near-optimal LFU's hit rate can be on a static workload. However, LFU does not handle changes in a workload's working set very well. TinyLFU addresses these issues by introducing probabilistic tracking of item frequency using efficient data structures (a Bloom filter and a counting filter). To address working set changes in the workload, TinyLFU has a reset mechanism that triggers after a given number of insertions to its filter.
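The frequency-based admission idea can be sketched as follows. A real TinyLFU uses compact probabilistic counters; the plain dictionary counter and the halving reset rule here are simplifying assumptions for illustration.

```python
from collections import Counter

class TinyLFUAdmission:
    """Sketch of TinyLFU-style admission. A real TinyLFU uses a Bloom
    filter and counting filter; a plain Counter stands in here for
    clarity. The halving reset is an assumed form of the paper's
    periodic aging mechanism."""

    def __init__(self, reset_after=1000):
        self.freq = Counter()
        self.additions = 0
        self.reset_after = reset_after

    def record(self, addr):
        self.freq[addr] += 1
        self.additions += 1
        if self.additions >= self.reset_after:
            # Aging: halve all counters so stale working sets fade out.
            self.freq = Counter(
                {k: v // 2 for k, v in self.freq.items() if v // 2}
            )
            self.additions = 0

    def admit(self, candidate, victim):
        """Admit the candidate only if it is estimated to be more
        frequent than the eviction victim it would replace."""
        return self.freq[candidate] > self.freq[victim]
```

The admit-versus-victim comparison is what keeps high-frequency items resident, and the periodic reset is what lets a new working set eventually displace an old one.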
CacheSack [31] is a coarse-grained and static admission policy that performs cache admission at the workload granularity when multiple workloads share the cache. It assigns each workload one of four types of static admission policies: AdmitOnWrite, AdmitOnMiss, AdmitOnSecondMiss (LARC), and NeverAdmit. FOMO is distinct from CacheSack in two respects. First, FOMO is fine-grained: it operates at the workload phase granularity, where it detects multiple distinct behaviors within a single workload and adjusts admissions accordingly. Second, FOMO is a dynamic cache admission policy. As demonstrated in this work, individual workloads go through multiple phases defined by varying reuse patterns; a static admission policy is unable to react to such changes.

Write Optimizing Caches
Another approach is Clean-First LRU (CFLRU) [32], a cache replacement algorithm that reduces the number of writes to flash memory by first evicting clean pages that belong to a designated clean-first region. After all the clean pages of the region have been evicted, the algorithm falls back to evicting the least recently used dirty page available. CFLRU depends on the size of the clean-first region and relies on recency just as LRU does. CFLRU prioritizes minimizing dirty evictions and flash overwrites over maximizing cache hit rate. FOMO was designed with a different set of goals: it primarily attempts to maximize cache hit rate, with the secondary goal of minimizing cache updates.
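CFLRU's victim selection can be illustrated in a few lines. This is a simplified reading of the policy described above; representing the clean-first region as a window at the LRU end of the list is an assumption of this sketch.

```python
def cflru_victim(pages, window):
    """Pick a CFLRU eviction victim.

    pages: list of (page_id, is_dirty) ordered LRU-first.
    window: size of the clean-first region at the LRU end.
    Returns the LRU clean page inside the region if one exists
    (a cheap eviction with no flash write-back); otherwise falls
    back to the overall LRU page, even if dirty."""
    for page_id, is_dirty in pages[:window]:
        if not is_dirty:
            return page_id
    return pages[0][0]
```

The `window` parameter is the tuning knob the text refers to: a larger clean-first region saves more write-backs but strays further from pure LRU recency.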

Offline Optimality
Another line of work focuses on understanding the limits of the class of non-datapath caching algorithms by investigating their offline optimality properties. In particular, the question of optimizing for two metrics, the cache hit rate and the cache write rate, was investigated; formulations that prioritize cache hit rate over cache write rate were argued to be more reasonable on the grounds that one could trivially optimize for cache write rate by simply disallowing all cache insertions, to the significant detriment of cache hit rate. M [11] is an offline datapath caching algorithm that focuses on reaching the optimal number of writes needed to achieve maximum performance. M is a version of Belady's MIN algorithm that does not write into the cache items that are never accessed again in the future or that will be overwritten. Container-based (C) [11] uses a similar strategy at the granularity of a container instead of individual blocks. In this paper, we consider the online version of the problem in an attempt to create practically usable solutions that do not rely on future knowledge.
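The write-avoidance idea behind M can be illustrated with a short offline pass over a trace. This sketch captures only the "never accessed again" rule mentioned above, ignoring overwrites and Belady's MIN eviction itself, and the function name is ours:

```python
def worthwhile_writes(trace):
    """Offline sketch of M's write-avoidance rule: a cache write on a
    request is only worthwhile if the same address is requested again
    later in the trace. Returns one boolean per request."""
    last_use = {}
    for i, addr in enumerate(trace):
        last_use[addr] = i          # remember each address's final position
    return [i != last_use[addr] for i, addr in enumerate(trace)]
```

Everything after an address's final occurrence is a wasted write for any algorithm, which is why such offline formulations bound what online policies like FOMO can hope to achieve.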

Caches for CDNs
In the context of Content Delivery Network (CDN) caching, a recent approach proposes a machine learning framework, LFO (Learn From OPT) [33], to learn the best decision by running an OPT approximation [34] over recent requests. LFO extracts multiple features to later use in deciding whether or not to cache incoming requests. Because LFO learns to mirror OPT's decisions, the result is expected to maximize the number of hits with the minimum number of writes. In practice, LFO achieves competitive performance but incurs overhead to keep track of all the online features.
As LFO learned from OPT for decision making within CDN caches, further research has focused on the application of Belady's MIN for such caches. The Belatedly algorithm [35] notes the lack of design consideration, not only in caching algorithms but also within Belady's MIN itself, for the delay incurred when inserting an item into the cache. Belatedly recognizes this delay along with its effect on latency, especially when considering cache hits that must wait for the item to first enter the cache before returning. From Belatedly's insights, a method of evaluating items called Minimum-Aggregate Delay (MAD) was implemented, which, when applied to existing algorithms, provided latency improvements like those found with Belatedly. Another is the Learning Relaxed Belady (LRB) algorithm [36], which approximates Belady's MIN by incorporating the concept of a Belady boundary to enable a machine learning caching prototype with limited memory overhead and lightweight training and prediction. Finally, machine learning techniques have been used to learn admission policies for CDNs. SLAP [37] uses a Long Short-Term Memory (LSTM) model and admits objects that will be reused (before eviction) given the current cache size. In contrast to these, FOMO focuses on caching for block storage workloads, using heuristics with a lightweight design and without the need to train any machine learning components to achieve its benefits. Furthermore, FOMO aims for hits with a reduction in writes and as such does not consider latency performance.
AdaptSize [38] is a non-datapath caching algorithm that probabilistically determines whether to admit an item of a given size into the cache. The strength of AdaptSize is how closely it performs compared to SIZE-OPT, an approximation of the optimal obtained by observing the next million requests. AdaptSize was designed primarily for CDNs and workloads where objects of various sizes are encountered. Because AdaptSize ranks items based on size and recency, when all items have the same size it operates on a recency basis only. Furthermore, its calculated optimal size admission parameter (c) is recalculated periodically as part of an admission probability that decreases with object size; should all objects have the same size, they will all have the same probability of admission during that period. Comparatively, FOMO recognizes that workload behavior can change within an evaluation period and prefers items with reuse rather than small items with recency; without varied object sizes, the benefits of AdaptSize do not appear to be present.

Conclusions
Conventional caches are datapath caches because each accessed item must first reside in the cache before being consumed by the application. Non-datapath caches, being free of this requirement, demand entirely new techniques. Current non-datapath caches, typically represented by flash- or NVM-based SSDs, additionally have a limited number of write cycles, motivating cache management strategies that minimize cache updates. We developed FOMO, a non-datapath cache admission policy that operates in two states, Insert and Filter, allowing cache insertions in the former state and only selectively enabling insertions in the latter. Using two simple and inexpensive metrics, the cache hit rate and the reuse rate of missed items, FOMO makes judicious cache admission decisions dynamically.
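The two-state decision just described can be sketched as a simple rule over those two metrics. This is a deliberately simplified reading: FOMO's actual evaluation windows and hysteresis are omitted, and the bare comparison below is an assumption made for illustration.

```python
def fomo_state(hit_rate, miss_reuse_rate, state):
    """Hedged sketch of FOMO's bimodal state decision: insert while
    missed items show more reuse than the cache is already capturing,
    filter when the cache already captures the available reuse.
    Thresholds and windowing from the full algorithm are omitted."""
    if miss_reuse_rate > hit_rate:
        return "Insert"   # misses are reusable: admit them into the cache
    if hit_rate > miss_reuse_rate:
        return "Filter"   # cache captures reuse already: admit selectively
    return state          # no clear signal: keep the current state
```

Because the miss reuse rate comes from the Miss-History rather than the cache, this rule can react to reuse before any item is cached, which is the key difference from mARC's hit-rate-only analysis.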
FOMO was algorithmically analyzed and comparatively evaluated against state-of-the-art datapath (LRU, ARC, LIRS) and non-datapath (LARC, mARC) caching algorithms and the TinyLFU cache admission policy. FOMO improves upon other non-datapath caching algorithms across a range of production storage workloads, while also reducing the write rate.
FOMO is better able than LARC and TinyLFU to take advantage of moments where inserting into the cache benefits the hit rate. Compared to mARC, FOMO is better able to reduce writes and provides better performance overall. By allowing more writes and taking calculated risks, caching new items based on the surrounding behavior, FOMO provides better performance than LARC and TinyLFU on workloads with changing working sets. This reinforces the design decisions that led FOMO to mitigate the problems of both LARC and mARC, strengthening the soundness of FOMO's state decisions. The effectiveness of FOMO (and other non-datapath caching algorithms) relies heavily on the observability of reuse. For workloads that do not have a dominant behavior, as with CloudCache, FOMO is able to react quickly to any observed reuse patterns. It pairs observations of reuse in the cache with observations of reuse in FOMO's Miss-History to derive an excellent understanding of the workload behavior. Our implementation of FOMO is publicly available (https://github.com/sylab/fomo, accessed on 3 July 2024).

Figure 1 .
Figure 1. Performance comparison using mARC and ARC over time for MSR workload Proj3. Presented are various moments where mARC is not performing or deciding as well as it could.

Figure 2 .
Figure 2. Example of FOMO working in the Insert state handling the request stream X, Y, X. (a) Shows the starting state of the cache and Miss History. (b) Has X missing and being inserted into both the cache and the Miss History. (c) Has Y missing and being inserted into both the cache and the Miss History. (d) Has X hitting in the cache and being removed from the Miss History.

Figure 3 .
Figure 3. Example of FOMO working in the Filter state handling the request stream X, Y, X. (a) Shows the starting state of the cache and Miss History. (b) Has X missing and being inserted only into the Miss History. (c) Has Y missing and being inserted only into the Miss History. (d) Has X missing in the cache but hitting in the Miss History. Because X shows reuse, it is then inserted into the cache and removed from the Miss History.

Figure 5 .
Figure 5. Normalized write rate summary results as percentage difference from the best result. This includes results for all five workload sources (FIU, MSR, CloudCache, CloudVPS, and CloudPhysics) measured for six different cache size configurations, each sized as a percentage of the workload footprint, for each algorithm. TinyLFU has the fewest cache updates, regardless of the internal cache replacement algorithm. mARC aims to write more for performance gains, but incurs far more cache updates than LARC and FOMO. Notably, LARC and FOMO have similar overall write rates, showing that FOMO is capable of reducing the number of cache updates of its underlying cache replacement algorithm to levels similar to those of the non-datapath caching algorithm LARC.

Figure 6 .
Figure 6. Normalized hit rate summary results as percentage difference from the best result. This includes results for all five workload sources (FIU, MSR, CloudCache, CloudVPS, and CloudPhysics) measured for six different cache size configurations, each sized as a percentage of the workload footprint, for each algorithm. TinyLFU has the overall worst performance across traces and cache sizes, regardless of the internal cache replacement policy. mARC, LARC, and FOMO have similar hit rate performance characteristics when viewed as a whole, but show their individual strengths and weaknesses in a breakdown of the results, as seen in Figure 7.

Figure 8 .
Figure 8. Hit rate results for ARC, LRU, and LIRS, along with their FOMO- and TinyLFU-augmented counterparts, normalized to the best performing. Overall, we can see that FOMO either improves the internal cache replacement algorithm or, at worst, as seen in CloudCache, preserves the internal algorithm's performance as much as possible. When observing how TinyLFU fared by comparison, we see it primarily performing worse, often much worse, than its internal cache replacement algorithm alone. One such example is observed with the CloudCache workloads, but this can also be seen in the performance of LIRS and TinyLFU(LIRS) on FIU.

Figure 9 .
Figure 9. (top) The hit rates of both LARC and mARC for the CloudPhysics workload w54_vscsi2.itrace, on which LARC and mARC do not perform well. The background colors indicate the state of mARC. LARC stops seeing a significant number of hits after it fills the cache. mARC stops seeing a significant number of hits after it switches to the Stable state (A). mARC incurs unnecessary writes by transitioning to the Unstable state before transitioning to the Unique Access state (B). mARC does not see any cache hit rate activity and therefore cannot find a reason to change state (C), even though plenty of opportunities for cache hits exist, as can be seen in Figure 9b. (bottom) The cache hit rates of LRU and the reuse rates of FOMO's Miss-History. The caching algorithm that FOMO augments does not matter here, as the focus is on the Miss-History. To make both hit rates clearly visible, the hit rate of the Miss-History has been mirrored over the horizontal axis (negated). This, together with the hit rates of LARC and mARC, shows both that LRU is capable of having hits and that FOMO's Miss-History is seeing the same hits. Taken together, these plots demonstrate that the workload is mostly composed of items limited to second accesses. As such, both LARC and mARC (when acting similarly to LARC) only cache items on their second access, but receive no benefit in doing so.

Figure 10 .
Figure 10. The hit rates of LARC, mARC, and FOMO(ARC) for the CloudPhysics workload w54_vscsi2.itrace. FOMO(ARC) is shown as it performs the worst among FOMO(LRU) and FOMO(ARC) in this instance. The background is colored with the state of FOMO, where red = Filter and blue = Insert. As FOMO switches states, it adapts to the workload for the chance to improve the hit rate and achieves much more than both LARC and mARC thanks to FOMO's Miss-History. (A) FOMO(ARC) started filtering prior to mARC, missing some hits. (B) FOMO(ARC)'s changing states captures some opportunities for hits by switching to the Insert state rapidly. (C) FOMO(ARC) recognizes a pattern of reuse and is able to promptly respond and obtain many cache hits that LARC and mARC instead miss.

Figure 11 .
Figure 11. Hit rate plot of CloudCache workload webserver-2012-11-22-1.blk, focusing on ARC as the datapath caching algorithm against which to compare the non-datapath caching algorithms (LARC, mARC, and FOMO(LRU)). The background colors correspond to the state of FOMO(LRU) at the time, with blue = Insert and red = Filter. From around one million to four million requests, ARC achieves hits that LARC, mARC, and FOMO(LRU) are not able to obtain, though FOMO(LRU) obtains the most among the non-datapath caching algorithms, as seen at (A). Even as reuse ramps up at (B), FOMO(LRU) achieves many more cache hits than LARC and mARC, while performing close to ARC. Afterwards, the algorithms perform similarly.

Figure 12 .
Figure 12. Address plots for CloudCache workload webserver-2012-11-22-1.blk that highlight the various patterns that could be seen simultaneously during the time period where the non-datapath caching algorithms had poorer hit rates than their datapath counterparts. Among them we can notice scans, random accesses, and loops all occurring concurrently. These concurrent behaviors are why the non-datapath caching algorithms did not perform well.

Table 1 .
Table 1. Sources and descriptions of the five storage datasets used in this paper. Due to the size of many of the traces, only the first day of each trace was used for tests.