research-article

Membrane: Operating system support for restartable file systems

Authors:
Swaminathan Sundararaman (University of Wisconsin-Madison)
Sriram Subramanian (University of Wisconsin-Madison)
Abhishek Rajimwale (University of Wisconsin-Madison)
Andrea C. Arpaci-Dusseau (University of Wisconsin-Madison)
Remzi H. Arpaci-Dusseau (University of Wisconsin-Madison)
Michael M. Swift (University of Wisconsin-Madison)

ACM Transactions on Storage, Volume 6, Issue 3, Article No. 11, pp 1–30
https://doi.org/10.1145/1837915.1837919

Published: 28 September 2010

Abstract

We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system failures, and does so while remaining transparent to running applications; upon failure, the file system restarts, its state is restored, and pending application requests are serviced as if no failure had occurred. Membrane provides transparent recovery through a lightweight logging and checkpoint infrastructure, and includes novel techniques to improve performance and correctness of its fault-anticipation and recovery machinery. We tested Membrane with ext2, ext3, and VFAT. Through experimentation, we show that Membrane induces little performance overhead and can tolerate a wide range of file system crashes. More critically, Membrane does so with little or no change to existing file systems, thus improving robustness to crashes without mandating intrusive changes to existing file-system code.
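
The recovery flow described in the abstract (checkpoint, log, restart, restore, re-service the pending request) can be illustrated with a toy sketch. This is not Membrane's implementation, which lives inside the operating system kernel; it is a minimal user-level model under stated assumptions. The names `ToyFS`, `RestartableFS`, `FileSystemCrash`, and the `fail_next` fault-injection hook are all hypothetical and exist only for this illustration.

```python
import copy


class FileSystemCrash(Exception):
    """Stands in for a fail-stop fault detected inside the file system."""


class ToyFS:
    """Hypothetical in-memory file system mapping path -> bytes."""

    def __init__(self, files=None):
        self.files = dict(files or {})
        self.fail_next = False  # illustration-only hook to inject one crash

    def write(self, path, data):
        if self.fail_next:
            self.fail_next = False
            raise FileSystemCrash(path)
        self.files[path] = self.files.get(path, b"") + data
        return len(data)


class RestartableFS:
    """Wrapper that hides file system crashes from the caller."""

    def __init__(self):
        self.fs = ToyFS()
        self.checkpoint_state = {}  # last known-good state
        self.log = []               # operations completed since the checkpoint

    def checkpoint(self):
        # Snapshot the current state and discard the log it supersedes.
        self.checkpoint_state = copy.deepcopy(self.fs.files)
        self.log.clear()

    def write(self, path, data):
        try:
            result = self.fs.write(path, data)
        except FileSystemCrash:
            self._recover()
            # Re-issue the in-flight request; the caller never sees the crash.
            result = self.fs.write(path, data)
        self.log.append((path, data))
        return result

    def _recover(self):
        # Restart: build a fresh instance from the checkpoint, then replay
        # the logged operations to reach the pre-crash state.
        self.fs = ToyFS(self.checkpoint_state)
        for path, data in self.log:
            self.fs.write(path, data)
```

In this sketch a caller of `RestartableFS.write` gets a normal return value even when the underlying `ToyFS` fails once, which mirrors the transparency goal stated above. A real system must also handle faults during recovery, concurrent requests, and on-disk state, none of which is modeled here.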

Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance

Published in

ACM Transactions on Storage Volume 6, Issue 3
September 2010
165 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1837915

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 September 2010
- Accepted: 1 June 2010
- Revised: 1 May 2010
- Received: 1 April 2010
Author Tags
checkpointing
fault recovery
file systems
restartability
Qualifiers
- research-article
- Research
- Refereed