
Persistence Management in Digital Document Repository

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)

Abstract

The CREDO Digital Document Repository enables short- and long-term archiving of large volumes of digital resources, ensuring bitstream preservation and providing most of the technical means needed for content preservation. The goal of this paper is to describe the design and implementation of an innovative component of the CREDO Repository: the Persistence Management Subsystem (PMS). This subsystem provides the file management system with guidelines on replica placement and data relocation. The module responsible for scheduling access to the archive improves energy efficiency by computing suboptimal schedules. The module responsible for diagnosing and replacing data carriers calculates failure probabilities; the scheduling module uses this information to select appropriate storage areas for reading or writing data and to mark areas as obsolete. Finally, the power management module starts up storage areas only when necessary.
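The interplay described above (failure probabilities feeding storage-area selection, obsolete marking, and starting areas only on demand) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the hypothetical `StorageArea` type, the probability threshold used for obsolescence, and the cost function that adds a fixed start-up penalty as a crude stand-in for energy use. It is not the PMS scheduling algorithm, whose suboptimal schedules are computed by means not described in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageArea:
    """Hypothetical model of one storage area in the archive."""
    name: str
    failure_prob: float       # estimate from a (hypothetical) diagnosis step
    powered_on: bool = False  # whether the area is currently running
    obsolete: bool = False    # set once the area is considered too risky


def mark_obsolete(areas: List[StorageArea], threshold: float) -> None:
    """Flag every area whose estimated failure probability exceeds threshold."""
    for area in areas:
        if area.failure_prob > threshold:
            area.obsolete = True


def select_area(areas: List[StorageArea],
                startup_penalty: float) -> Optional[StorageArea]:
    """Choose the non-obsolete area with the lowest illustrative cost.

    cost = failure probability, plus startup_penalty if the area is powered
    down (a crude stand-in for the energy cost of starting it up).
    Returns None when every area is obsolete.
    """
    candidates = [a for a in areas if not a.obsolete]
    if not candidates:
        return None

    def cost(area: StorageArea) -> float:
        return area.failure_prob + (0.0 if area.powered_on else startup_penalty)

    best = min(candidates, key=cost)
    # Lazy power management: an area is started only when it is selected.
    best.powered_on = True
    return best


if __name__ == "__main__":
    areas = [
        StorageArea("A", failure_prob=0.02),
        StorageArea("B", failure_prob=0.05, powered_on=True),
        StorageArea("C", failure_prob=0.40),
    ]
    mark_obsolete(areas, threshold=0.2)                    # C becomes obsolete
    print(select_area(areas, startup_penalty=0.1).name)   # B (already running)
    print(select_area(areas, startup_penalty=0.01).name)  # A (now started)
```

With a start-up penalty of 0.1, the already-running area B wins over the more reliable but powered-down area A; lowering the penalty to 0.01 flips the choice. A real scheduler would also account for replica placement and queued requests, which this sketch omits.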


Notes

  1. CREDO is the acronym of the Polish name Cyfrowe REpozytorium DOkumentów, meaning ‘Digital Document Repository’. In Latin, credo means ‘I believe’, a fitting watchword for a trustworthy digital repository.

  2. MTBF: Mean Time Between Failures, a parameter specified by storage media manufacturers.
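To illustrate how a manufacturer's MTBF figure could be turned into a failure probability for a given time horizon, the sketch below assumes a constant failure rate (an exponential lifetime model), under which P(failure within t) = 1 - exp(-t / MTBF). The function name and the exponential assumption are ours; the paper's diagnosis module may use a different model.

```python
import math


def failure_probability(mtbf_hours: float, horizon_hours: float) -> float:
    """Probability that a carrier fails within horizon_hours.

    Assumes an exponential lifetime with constant failure rate 1/MTBF,
    i.e. P = 1 - exp(-t / MTBF). expm1 keeps precision for small t/MTBF.
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if horizon_hours < 0:
        raise ValueError("horizon must be non-negative")
    return -math.expm1(-horizon_hours / mtbf_hours)


if __name__ == "__main__":
    # A datasheet MTBF of 1,000,000 hours over one year (8760 hours):
    print(round(failure_probability(1_000_000, 8760), 5))  # 0.00872
```

The constant-rate model is a simplification: field studies of large disk populations have reported replacement rates noticeably higher than datasheet MTBF values imply, so observed failure data would likely be needed alongside the manufacturer's figure.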


Acknowledgments

The project entitled Cyfrowe Repozytorium DOkumentów CREDO (Digital Document Repository CREDO) is co-financed by the European Union through the European Regional Development Fund under the Operational Programme ‘Innovative Economy’ for the years 2007–2013, Priority Axis 1 – Research and development of modern technologies, Grant No. WND-DEM-1-385/00.

Author information


Corresponding author

Correspondence to Piotr Pałka.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pałka, P., Śliwiński, T., Traczyk, T., Ogryczak, W. (2016). Persistence Management in Digital Document Repository. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_52


  • DOI: https://doi.org/10.1007/978-3-319-34099-9_52


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
