Abstract
The CREDO Digital Document Repository enables short- and long-term archiving of large volumes of digital resources, ensuring bitstream preservation and providing most of the technical means to ensure content preservation of digital resources. The goal of this paper is to describe the design and implementation of an innovative component of the CREDO Repository: the Persistence Management Subsystem (PMS). This subsystem sets guidelines for the file management system on replica placement and data relocation. The module responsible for scheduling access to the archive provides energy efficiency by setting suboptimal schedules. The module responsible for the diagnosis and replacement of data carriers calculates their probabilities of failure; the scheduling module uses this information to select appropriate storage areas for reading or writing data and to mark areas as obsolete. Finally, the power management module is responsible for starting up the storage areas only when necessary.
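The interplay between the diagnosis and scheduling modules described above can be illustrated with a minimal sketch. It is not the CREDO implementation: the `StorageArea` fields, the function names, the exponential lifetime model based on MTBF, and the rule of preferring areas that are already powered on are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class StorageArea:
    # All fields are hypothetical; the paper's actual data model is not shown here.
    name: str
    mtbf_hours: float      # producer-stated Mean Time Between Failures
    powered_on: bool       # whether the area is currently spun up
    obsolete: bool = False


def failure_probability(area: StorageArea, horizon_hours: float) -> float:
    """Probability of failure within the horizon, ASSUMING an exponential
    lifetime model with mean equal to the stated MTBF."""
    return 1.0 - math.exp(-horizon_hours / area.mtbf_hours)


def mark_obsolete(areas, horizon_hours: float, threshold: float) -> None:
    """Flag areas whose estimated failure probability exceeds the threshold."""
    for area in areas:
        if failure_probability(area, horizon_hours) > threshold:
            area.obsolete = True


def select_areas(areas, replicas: int, horizon_hours: float):
    """Pick storage areas for a write: skip obsolete areas, prefer areas that
    are already powered on (no start-up needed), then the most reliable."""
    candidates = [a for a in areas if not a.obsolete]
    candidates.sort(
        key=lambda a: (not a.powered_on, failure_probability(a, horizon_hours))
    )
    return candidates[:replicas]


areas = [
    StorageArea("A", mtbf_hours=1_000_000, powered_on=False),
    StorageArea("B", mtbf_hours=500_000, powered_on=True),
    StorageArea("C", mtbf_hours=10_000, powered_on=True),
]
mark_obsolete(areas, horizon_hours=8760, threshold=0.5)
chosen = select_areas(areas, replicas=2, horizon_hours=8760)
print([a.name for a in chosen])
```

In this sketch, area C is marked obsolete because its estimated one-year failure probability exceeds the threshold, and B is ranked before A because it is already running. A real scheduler would weigh start-up energy against reliability rather than apply a strict ordering.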
Notes
1. CREDO is an acronym of the Polish name Cyfrowe REpozytorium DOkumentów, which means ‘Digital Document Repository’. In Latin, credo means ‘I believe’, which seems a fitting watchword for a trustworthy digital repository.
2. Mean Time Between Failures, a parameter given by the media producers.
Acknowledgments
The project entitled Cyfrowe Repozytorium DOkumentów CREDO (Digital Document Repository CREDO) is co-financed by the European Union through the European Regional Development Fund under the Operational Programme ‘Innovative Economy’ for the years 2007–2013, Priority Axis 1 – Research and development of modern technologies, Grant No. WND-DEM-1-385/00.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Pałka, P., Śliwiński, T., Traczyk, T., Ogryczak, W. (2016). Persistence Management in Digital Document Repository. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_52
DOI: https://doi.org/10.1007/978-3-319-34099-9_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)