
Pantheon: Exascale File System Search for Scientific Computing

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6809)

Abstract

Modern scientific computing generates petabytes of data spread across billions of files that must be managed. Most file systems organize these files by name in a hierarchical directory tree, and as data volumes have grown, this has proven to be a poor way to organize them. Recent tools let users navigate files by their metadata attributes instead, which provides a more meaningful organization. To make this metadata searchable, it is often stored on separate metadata servers. This approach has a drawback, however, because many large-scale storage systems are multi-tiered: as data is moved between tiers or modified, the overhead of keeping the metadata server consistent with every tier becomes very large. As scientific systems continue to push toward exascale, this problem will become more pronounced. A simpler option is to bypass the metadata server and search the metadata that the file system already stores, but few tools currently support this at large scale. This paper introduces a prototype of Pantheon, a file system search tool that uses the metadata stored within the file system itself, avoiding the overhead of separate metadata servers. Pantheon is designed with the scientific community's push toward exascale computing in mind, and it combines hierarchical partitioning, query optimization, and indexing to perform efficient metadata searches over large-scale file systems.
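
The abstract names three ingredients: metadata read directly from the file system, hierarchical partitioning, and query optimization with indexing. The following is a minimal, hypothetical Python sketch of how those pieces can fit together; it is not Pantheon's implementation. It assumes that metadata comes from `os.stat` (no separate metadata server), that files are partitioned by their subtree `depth` levels below the search root, that each partition keeps a per-attribute min/max summary as a lightweight index for pruning, and that range predicates are ordered by a crude selectivity estimate that assumes values are uniformly distributed. All names (`build_partitions`, `search`, `Partition`, `depth`) are illustrative.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical sketch only; names and design choices are illustrative assumptions.

Range = Tuple[str, float, float]  # (attribute, low, high), both bounds inclusive


@dataclass
class FileMeta:
    path: str
    size: int
    mtime: float
    uid: int


@dataclass
class Partition:
    root: str
    files: List[FileMeta] = field(default_factory=list)
    # Per-attribute (min, max) summary: a lightweight index used to skip partitions.
    bounds: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def summarize(self) -> None:
        for attr in ("size", "mtime", "uid"):
            values = [getattr(f, attr) for f in self.files]
            if values:
                self.bounds[attr] = (min(values), max(values))


def build_partitions(top: str, depth: int = 1) -> List[Partition]:
    """Group files by their ancestor directory `depth` levels below `top`."""
    parts: Dict[str, Partition] = {}
    for dirpath, _dirs, names in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        key = "." if rel == "." else os.sep.join(rel.split(os.sep)[:depth])
        part = parts.setdefault(key, Partition(root=key))
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)  # metadata is read from the file system itself
            part.files.append(FileMeta(full, st.st_size, st.st_mtime, st.st_uid))
    for part in parts.values():
        part.summarize()
    return list(parts.values())


def may_match(part: Partition, preds: List[Range]) -> bool:
    """Prune a partition if any predicate range misses its [min, max] bounds."""
    for attr, lo, hi in preds:
        if attr not in part.bounds:  # empty partition: nothing can match
            return False
        pmin, pmax = part.bounds[attr]
        if hi < pmin or lo > pmax:
            return False
    return True


def est_selectivity(part: Partition, pred: Range) -> float:
    """Estimated fraction of the partition matching `pred`, assuming uniform values."""
    attr, lo, hi = pred
    pmin, pmax = part.bounds[attr]
    if pmax == pmin:
        return 1.0
    overlap = min(hi, pmax) - max(lo, pmin)
    return max(0.0, overlap) / (pmax - pmin)


def search(parts: List[Partition], preds: List[Range]) -> List[str]:
    hits: List[str] = []
    for part in parts:
        if not may_match(part, preds):
            continue
        # Evaluate the most selective predicate first so all() short-circuits early.
        ordered = sorted(preds, key=lambda p: est_selectivity(part, p))
        for f in part.files:
            if all(lo <= getattr(f, a) <= hi for a, lo, hi in ordered):
                hits.append(f.path)
    return sorted(hits)
```

For example, `search(build_partitions("/data/runs"), [("size", 1 << 30, float("inf"))])` would list files of at least 1 GiB, skipping any subtree whose largest file is smaller. Ordering predicates by estimated selectivity is a deliberately simple stand-in for cost-based query optimization; a production system would also need finer-grained indexes inside each partition and a way to keep the summaries current as files change.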

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Naps, J.L., Mokbel, M.F., Du, D.H.C. (2011). Pantheon: Exascale File System Search for Scientific Computing. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_29

  • DOI: https://doi.org/10.1007/978-3-642-22351-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22350-1

  • Online ISBN: 978-3-642-22351-8

  • eBook Packages: Computer Science, Computer Science (R0)