Efficient Memory-Mapped I/O on Fast Storage Device

Authors:
Nae Young Song, Seoul National University, Seoul, Republic of Korea
Yongseok Son, Seoul National University, Seoul, Republic of Korea
Hyuck Han, Dongduk Women's University, Seoul, Republic of Korea
Heon Young Yeom, Seoul National University, Seoul, Republic of Korea

ACM Transactions on Storage, Volume 12, Issue 4, Article No. 19, pp. 1–27. https://doi.org/10.1145/2846100

Published: 20 May 2016

Abstract

In modern operating systems, memory-mapped I/O (mmio) is an important access method that maps a file or file-like resource to a region of memory. The mapping lets applications access file data through memory semantics (i.e., load/store), which simplifies programming. The number of applications that use mmio is increasing because memory semantics can provide better performance than file semantics (i.e., read/write). As more data reside in main memory, application performance improves owing to the effect of a large cache. When mmio is used, hot data tend to reside in main memory and cold data are located on storage devices such as HDDs and SSDs; data placement in the memory hierarchy depends on the virtual memory subsystem of the operating system. Generally, the performance of storage devices has a direct impact on the performance of mmio, and it is widely expected that better storage devices will lead to better performance. However, this expectation is only partly met when fast storage devices are used, because the virtual memory subsystem does not take the performance characteristics of those devices into account.

In this article, we examine the Linux virtual memory subsystem and the mmio path to determine how fast storage affects the existing Linux kernel. Our investigation finds that the overhead of the Linux virtual memory subsystem, negligible on HDDs, prevents applications from using the full performance of fast storage devices. To reduce this overhead and fully exploit fast storage devices, we present several optimization techniques. We implement these techniques in the Linux kernel and evaluate our prototype with low-latency storage devices. Experimental results show that our optimized mmio performs up to 7x better than the original mmio. We also compare our system to a system that has enough memory to keep all data in main memory. The system with insufficient memory and our mmio achieves 92% of the performance of the resource-rich system. This result implies that our virtual memory subsystem for mmap can effectively extend main memory with fast storage devices.
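
The distinction the abstract draws between file semantics (read/write) and memory semantics (load/store) can be illustrated with a small user-space sketch. This is not code from the article, whose optimizations live inside the Linux kernel; it only uses Python's standard `mmap` module, and the helper names `make_demo_file`, `file_semantics_update`, and `memory_semantics_update` are hypothetical names chosen for this illustration.

```python
import mmap
import os
import tempfile


def make_demo_file(size=2 * mmap.PAGESIZE):
    """Create a zero-filled temporary file (hypothetical helper for the demo)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * size)
    return path


def file_semantics_update(path, offset, payload):
    """File semantics: every access is an explicit read or write call."""
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(len(payload))      # explicit read
        f.seek(offset)
        f.write(payload)                # explicit write
        f.flush()
        os.fsync(f.fileno())            # push the data to the device
    return old


def memory_semantics_update(path, offset, payload):
    """Memory semantics: the file is mapped and accessed like a byte array."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as m:
            end = offset + len(payload)
            old = m[offset:end]         # load: may fault the page in
            m[offset:end] = payload     # store: dirties the mapped page
            m.flush()                   # write dirty pages back to the file
    return old


if __name__ == "__main__":
    demo_path = make_demo_file()
    file_semantics_update(demo_path, 0, b"read/write")
    memory_semantics_update(demo_path, mmap.PAGESIZE, b"load/store")
    with open(demo_path, "rb") as f:
        contents = f.read()
    print(contents[:10], contents[mmap.PAGESIZE:mmap.PAGESIZE + 10])
    os.remove(demo_path)
```

In the mapped version, which pages stay in memory and when they are fetched from or written back to the device is decided by the operating system's virtual memory subsystem rather than by the application, which is the code path the article studies.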

Index Terms

Information systems
  Information storage systems
    Storage management
Software and its engineering
  Software organization and properties
    Contextual software domains
      Operating systems
        File systems management
        Memory management

Published in
ACM Transactions on Storage Volume 12, Issue 4
August 2016
213 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/2940403
Editor:
Darrell D. E. Long
University of California Santa Cruz, USA
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 May 2016
- Accepted: 1 November 2015
- Revised: 1 October 2015
- Received: 1 January 2015
Author Tags
Memory-mapped
data-intensive
nonvolatile memory
virtual memory system
Qualifiers
- research-article
- Research
- Refereed