
File and Object Replication in Data Grids

Cluster Computing

Abstract

Data replication is a key issue in a Data Grid and can be managed in different ways and at different levels of granularity, for example at the file level or at the object level. In the High Energy Physics community, Data Grids are being developed to support the distributed analysis of experimental data. We have produced a prototype data replication tool, the Grid Data Mirroring Package (GDMP), which is in production use in one physics experiment and relies on middleware from the Globus Toolkit for authentication, data movement, and other purposes. Here we present a new, enhanced GDMP architecture and prototype implementation that uses Globus Data Grid tools for efficient file replication. We also explain how this architecture can address object replication issues in an object-oriented database management system. File transfer over wide-area networks requires specific performance tuning to achieve optimal data transfer rates. We present performance results obtained with GridFTP, an enhanced version of FTP, and discuss the relevant tuning parameters.
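
The abstract does not spell out which tuning parameters matter. As a rough illustration only, the sketch below applies the common bandwidth-delay-product rule of thumb for sizing a TCP buffer on a wide-area path, plus an assumed even split of that window across parallel streams (GridFTP can move a file over several parallel TCP streams). The function names and the link figures are hypothetical, and neither formula is taken from the paper's measurements.

```python
# Rule-of-thumb sketch for wide-area TCP tuning; NOT the paper's method.
# Function names are hypothetical, chosen for illustration.


def tcp_buffer_bytes(bandwidth_mbit_s: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes: roughly the amount of data that
    must be in flight to keep a single TCP stream's path full."""
    # Mbit/s -> bit/s (x 1e6), ms -> s (/ 1000), bits -> bytes (/ 8)
    return int(bandwidth_mbit_s * 1_000_000 * rtt_ms / 8000)


def per_stream_buffer_bytes(bandwidth_mbit_s: float, rtt_ms: float,
                            streams: int) -> int:
    """Assumes the total window is shared evenly by `streams` parallel
    TCP streams, so each needs about 1/streams of it (rounded up)."""
    if streams < 1:
        raise ValueError("streams must be >= 1")
    total = tcp_buffer_bytes(bandwidth_mbit_s, rtt_ms)
    return -(-total // streams)  # ceiling division on integers


if __name__ == "__main__":
    # Hypothetical path: 100 Mbit/s bottleneck, 150 ms round-trip time.
    print(tcp_buffer_bytes(100, 150))            # single stream
    print(per_stream_buffer_bytes(100, 150, 4))  # four parallel streams
```

For the hypothetical 100 Mbit/s, 150 ms path, this gives 1,875,000 bytes for one stream, or 468,750 bytes per stream with four streams. Operating-system buffer limits and competing traffic also affect real transfers, so such figures are best treated as starting points for measurement rather than final settings.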

About this article

Cite this article

Stockinger, H., Samar, A., Holtman, K. et al. File and Object Replication in Data Grids. Cluster Computing 5, 305–314 (2002). https://doi.org/10.1023/A:1015681406220

  • DOI: https://doi.org/10.1023/A:1015681406220