research-article

Membrane: Operating system support for restartable file systems

Published: 28 September 2010

Abstract

We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system failures, and does so while remaining transparent to running applications; upon failure, the file system restarts, its state is restored, and pending application requests are serviced as if no failure had occurred. Membrane provides transparent recovery through a lightweight logging and checkpoint infrastructure, and includes novel techniques to improve performance and correctness of its fault-anticipation and recovery machinery. We tested Membrane with ext2, ext3, and VFAT. Through experimentation, we show that Membrane induces little performance overhead and can tolerate a wide range of file system crashes. More critically, Membrane does so with little or no change to existing file systems, thus improving robustness to crashes without mandating intrusive changes to existing file-system code.
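The recovery flow described in the abstract (checkpoint, log, restart, restore, re-service the pending request) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Membrane's implementation: the names `ToyFS`, `RestartableFS`, and `FileSystemCrash` are hypothetical, the "file system" is an in-memory dictionary, and a crash is simulated with a test flag.

```python
import copy


class FileSystemCrash(Exception):
    """Hypothetical stand-in for a fault detected inside the file system."""


class ToyFS:
    """A trivial in-memory 'file system': a dict from path to contents."""

    def __init__(self):
        self.files = {}
        self.fail_next = False  # test hook: make the next operation crash

    def write(self, path, data):
        if self.fail_next:
            self.fail_next = False
            self.files.clear()  # simulate state lost or corrupted by the fault
            raise FileSystemCrash(path)
        self.files[path] = data
        return len(data)


class RestartableFS:
    """Wraps ToyFS with a checkpoint plus a log of operations since it."""

    def __init__(self):
        self.fs = ToyFS()
        self.checkpoint_state = {}
        self.log = []
        self.restarts = 0

    def checkpoint(self):
        # Snapshot the current state; logged operations are now covered by it.
        self.checkpoint_state = copy.deepcopy(self.fs.files)
        self.log.clear()

    def _recover(self):
        # Restart with a fresh instance, restore the checkpoint, replay the log.
        self.restarts += 1
        self.fs = ToyFS()
        self.fs.files = copy.deepcopy(self.checkpoint_state)
        for path, data in self.log:
            self.fs.write(path, data)

    def write(self, path, data):
        try:
            result = self.fs.write(path, data)
        except FileSystemCrash:
            self._recover()
            # Re-issue the in-flight request so the caller never sees the crash.
            result = self.fs.write(path, data)
        self.log.append((path, data))
        return result


fs = RestartableFS()
fs.write("/a", "one")
fs.checkpoint()
fs.write("/b", "two")
fs.fs.fail_next = True          # inject a crash into the next operation
written = fs.write("/c", "three")
print(written, fs.restarts, sorted(fs.fs.files))
```

In this sketch the caller of `write` receives a normal return value even though the underlying instance failed and was rebuilt, which is the transparency property the abstract describes. A real system would also have to handle on-disk state, open file handles, and concurrent requests, none of which this toy attempts.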

Published in

ACM Transactions on Storage, Volume 6, Issue 3
September 2010
165 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1837915

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 28 September 2010
• Accepted: 1 June 2010
• Revised: 1 May 2010
• Received: 1 April 2010

Qualifiers

• research-article
• Research
• Refereed
