Abstract
Modern scientific computing generates petabytes of data in billions of files that must be managed. These files are often organized by name in a hierarchical directory tree, as is common to most file systems. As the scale of data has increased, this has proven to be a poor method of file organization. Recent tools allow users to navigate files by their metadata attributes, which provides more meaningful organization. To make this metadata searchable, it is often stored on separate metadata servers. This solution has drawbacks, however, because of the multi-tiered architecture of many large-scale storage solutions. As data is modified or moved between tiers of storage, the overhead of maintaining consistency between these tiers and the metadata server becomes very large. As scientific systems continue to push towards exascale, this problem will become more pronounced. A simpler option is to bypass the overhead of the metadata server and use the metadata storage inherent to the file system. Few tools currently exist, however, to perform operations at a large scale with this approach. This paper introduces the prototype of Pantheon, a file system search tool designed to use the metadata storage within the file system itself, bypassing the overhead of metadata servers. Pantheon is also designed with the scientific community’s push towards exascale computing in mind. It combines hierarchical partitioning, query optimization, and indexing to perform efficient metadata searches over large-scale file systems.
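The idea of searching the metadata that the file system already stores, rather than a separate metadata server, can be illustrated with a minimal Python sketch. This is not Pantheon's implementation: the function names `scan_partition` and `search` are hypothetical, splitting the tree at the root's immediate subdirectories is only a crude stand-in for hierarchical partitioning, and the sketch performs no query optimization or indexing.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_partition(root, predicate):
    """Walk one subtree and collect files whose stat() metadata satisfies predicate."""
    matches = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        # Metadata comes straight from the file system (inode data).
                        if predicate(entry.stat(follow_symlinks=False)):
                            matches.append(entry.path)
        except OSError:
            continue  # skip directories that cannot be read
    return matches


def search(root, predicate, workers=4):
    """Assumed partitioning scheme: one partition per immediate subdirectory of root."""
    partitions = []
    matches = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                partitions.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                if predicate(entry.stat(follow_symlinks=False)):
                    matches.append(entry.path)
    # Scan the partitions concurrently and merge their results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for found in pool.map(lambda p: scan_partition(p, predicate), partitions):
            matches.extend(found)
    return sorted(matches)
```

A query such as `search(some_root, lambda st: st.st_size >= 1024)` would return every regular file of at least 1 KiB under `some_root`. Because each query re-reads the live file system, there is no separate metadata copy to keep consistent, but every query pays the full cost of the scan.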
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Naps, J.L., Mokbel, M.F., Du, D.H.C. (2011). Pantheon: Exascale File System Search for Scientific Computing. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_29
DOI: https://doi.org/10.1007/978-3-642-22351-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer Science, Computer Science (R0)