Abstract
We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system failures, and does so while remaining transparent to running applications; upon failure, the file system restarts, its state is restored, and pending application requests are serviced as if no failure had occurred. Membrane provides transparent recovery through a lightweight logging and checkpoint infrastructure, and includes novel techniques to improve performance and correctness of its fault-anticipation and recovery machinery. We tested Membrane with ext2, ext3, and VFAT. Through experimentation, we show that Membrane induces little performance overhead and can tolerate a wide range of file system crashes. More critically, Membrane does so with little or no change to existing file systems, thus improving robustness to crashes without mandating intrusive changes to existing file-system code.
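The recovery flow described in the abstract (restart the file system, restore its state, then service the pending request) can be illustrated with a small user-level sketch. This is not Membrane's implementation, which lives inside the operating system. `ToyFS`, `RestartableFS`, and the `fail_next` crash hook are hypothetical names invented for illustration, and the sketch assumes that state can be rebuilt from an in-memory checkpoint plus a log of the writes made since that checkpoint.

```python
import copy


class ToyFS:
    """Hypothetical in-memory 'file system': maps path -> contents."""

    def __init__(self):
        self.files = {}
        self.fail_next = False  # illustrative hook that simulates a crash

    def write(self, path, data):
        if self.fail_next:
            self.fail_next = False
            self.files["<corrupt>"] = "garbage"  # simulate damaged state
            raise RuntimeError("simulated file system crash")
        self.files[path] = data


class RestartableFS:
    """Wraps ToyFS with a checkpoint and a log of writes made since it."""

    def __init__(self):
        self.fs = ToyFS()
        self.checkpoint = {}
        self.log = []
        self.restarts = 0

    def take_checkpoint(self):
        # Snapshot the current state; earlier log entries are now redundant.
        self.checkpoint = copy.deepcopy(self.fs.files)
        self.log.clear()

    def _recover(self):
        # Discard the failed instance, restore the checkpoint, replay the log.
        self.fs = ToyFS()
        self.fs.files = copy.deepcopy(self.checkpoint)
        for path, data in self.log:
            self.fs.write(path, data)
        self.restarts += 1

    def write(self, path, data):
        try:
            self.fs.write(path, data)
        except RuntimeError:
            self._recover()
            self.fs.write(path, data)  # re-issue the in-flight request
        self.log.append((path, data))

    def read(self, path):
        return self.fs.files[path]


rfs = RestartableFS()
rfs.write("/a", "one")
rfs.take_checkpoint()
rfs.write("/b", "two")
rfs.fs.fail_next = True    # the next write crashes the toy file system
rfs.write("/c", "three")   # the caller never sees the failure
print(rfs.restarts, sorted(rfs.fs.files))
```

In this toy version the caller of `write` never observes the crash: the wrapper restores the checkpoint, replays the logged write to `/b`, and retries the write to `/c`. The damaged entry left by the simulated crash is discarded along with the failed instance.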
Index Terms
- Membrane: Operating system support for restartable file systems