ABSTRACT
Advances in computational, storage, and network technologies, as well as middleware such as the Globus Toolkit, allow scientists to expand the sophistication and scope of data-intensive applications. These applications produce and analyze terabytes and petabytes of data distributed across millions of files or objects. Managing such large data sets efficiently requires managing metadata, that is, descriptive information about the data. There are various types of metadata, and Grid environments are likely to contain a range of metadata services, each specialized for cataloguing and discovering a particular type of metadata. In this paper, we present the design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes. We describe our experience in using the MCS with several applications and present a scalability study of the service.