The Long Decade of Digital Preservation in Heritage Institutions in the Czech Republic: 2002–2014

This paper gives an English-speaking audience an overview of digital preservation activities in the Czech Republic during the last decade. We briefly describe the major phases of the “digital” projects, which took place mainly in libraries, with some references to museums, galleries and archives. We focus on aspects related to the preservation of collected born-digital and digitised content. Even now, digital preservation activities in heritage institutions are often on the periphery of all stakeholders' interest, and the infrastructure supporting the digital preservation of data in heritage institutions is neither well financed nor well coordinated. The “long decade”, which lasted from the dramatic events of 2002 until approximately 2014, saw a number of successful projects creating digital data in Czech libraries; however, the handful of projects that focused at least in part on digital preservation were not flexible enough to accommodate user requirements and failed to meet expectations. There is still much room for further development in the area of long-term preservation of digital data in the Czech Republic. This article is a shortened version of one of the analyses written under the “Strategy of the research, development and innovation for the years 2010–2015” program of the Moravian Library in Brno, Czech Republic.

Received 22 July 2014 | Revision received 16 January 2015 | Accepted 27 January 2015

Correspondence should be addressed to Jan Hutař, Archives New Zealand, Te Rua Mahara o te Kāwanatanga, 10 Mulgrave Street, Thorndon, Wellington 6011, New Zealand. Email: jan.hutar@dia.govt.nz

The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256.
URL: http://www.ijdc.net/ Copyright rests with the authors. This work is released under a Creative Commons Attribution (UK) Licence, version 2.0. For details please see http://creativecommons.org/licenses/by/2.0/uk/

Hutař and Melichar, International Journal of Digital Curation 2015, Vol. 10, Iss. 1, 173–183. http://dx.doi.org/10.2218/ijdc.v10i1.324 DOI: 10.2218/ijdc.v10i1.324


Introduction
When the waters of the Vltava River started to rise in the summer of 2002, nobody expected the catastrophe that followed. The first site to be flooded was Zbraslav Chateau, several kilometres south of Prague, which was surrounded by the river; its cellars housed a vast collection of Czech plastic art comprising thousands of items. The historical plans describing the floods of the 19th century did not help the managers of the National Gallery, which housed its collection both in Zbraslav Chateau and in St. Agnes' monastery in the centre of the city. For managers of libraries, archives and museums around the country, too, these descriptions of the 19th-century floods were useless: the plans dealt with regular 100-year floods, but what came from the south in 2002 was a 1,000-year flood. This was caused, among other things, by careless forestry and land management during the 20th century. A number of heritage institutions spent the next ten years recovering from this nightmare: defrosting the flooded collections, relocating them, building new storage, and also digitizing damaged material. The advance of the internet and digital technologies coincided with this moment, and the shaken employees of the heritage institutions realized the preservation potential of digitization.
This paper aims to show an English-speaking audience what preceded and followed the damaging flood of 2002 in the area of digitization and digital preservation in Czech libraries. Very little information about the Czech effort in the field of digital preservation is available in English, and this paper aims to help fill that gap. We outline the periods of development, describe the major projects and refer briefly to the smaller ones. For each period we describe both the goals of digitization and of collecting digital data, and the levels and methods of preservation. Information about other heritage institutions, such as archives and museums, provides the reader with broader context. The beginning of 2014 seemed an ideal point to undertake an impartial review of events, for a number of reasons. Firstly, in 2012 a major Ministry of Culture digitization funding program ended the decade-long practice of financing only hybrid digitization (that is, the transfer of content into both digital and microfilm media) and moved to new digitization standards. Secondly, two projects that are financed from EU Structural Funds and are crucial for the national preservation infrastructure, the National Digital Library and the National Digital Archive, are scheduled to be completed in 2014. Thirdly, 2013 saw the start of regional digitization projects, which bring a significant amount of digitization technology to regional government-funded institutions.

20th Century: Early Digitization Period
Before 2000, digital heritage data had not yet fully entered Czech libraries; digital technologies were used in collection management and cataloguing. Digitization started in the mid-1990s at the National Library1, with the digitization of old prints and manuscripts (later Manuscriptorium2). Small, isolated projects also digitized specially selected collections or items for publishing and exhibition purposes. Initially, under the UNESCO Memory of the World programme, the National Library published the world's first completely digitised manuscript on CD-ROM: the 13th-century Czech music book Antiphonarium Sedlecense. Subsequent similar projects were financed ad hoc or from research grants. At this time, digitization and metadata standards were either non-existent or nascent, and the digitized content was published not online but on offline CD-ROMs with a proprietary viewer application. In 2000 the Ministry of Culture created the VISK digitization support program, which aimed to finance digitization in libraries more systematically, with a new focus on fragile and deteriorated paper documents (19th-century newspapers).3 Digitization of old manuscripts was also part of VISK from the outset. In the same year, the National Library also started its web archiving project. The projects did not consider the preservation of the digitized images (Knoll and Psohlavec, 2002); preservation arose only later, and it was limited to bit-level preservation of the files. In 2002 it was clear to many of those involved that preserving large amounts of digital content over the long term is not a trivial issue, and many preferred microfilm as a more reliable solution (Polišenský, 2002). The metadata generated during digitization was limited to descriptive metadata and was stored in various formats and technologies: in the DOBM SGML4 and MASTER+5 schemas, both designed by the National Library. Only three elements of technical metadata were captured, and they were recorded at the entity level, not the file level. Their main aim was to provide information about the scanning process (hardware, software, resolution etc.) so that the file could be rendered with true colours, which is important for manuscript illuminations, among other things. Preservation was not considered by the data producers.
By the end of the 1990s the National Library used both optical media and magnetic tape for storage, which was unique among Czech heritage institutions. The technology used SAM FS (Storage Archive Manager File System) with a locally developed data management application layer (AIP Safe), using AIT2 and AIT3 tapes. The technology was managed as a service by an external company. The storage system had limited functionality, far too long a response time, little flexibility for metadata creation and management, no data consistency checks, low security and intrusion protection, and limited disaster recovery provision (Knoll, Polišenský, and Uhlíř, 2009). A robotic tape library was used for both the manuscript and the 19th-century newspaper data. Manuscripts were also stored in a state-of-the-art CD archive, with controlled media policies, continuous media quality checks, backups etc.
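The missing “data consistency checks” mentioned above are what preservation practice usually calls fixity checking: a checksum is recorded for each file at ingest and periodically recomputed to detect silent corruption or loss. The Python sketch below illustrates the idea under simple assumptions, namely SHA-256 digests held in an in-memory manifest keyed by relative path. The function names and the manifest layout are hypothetical and do not describe the National Library's actual system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Re-hash the stored files and report missing, altered and unexpected ones."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "altered": sorted(
            k for k in manifest if k in current and current[k] != manifest[k]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would itself need to be stored redundantly, and the audit would be run against every copy held on separate media, so that a damaged copy can be restored from an intact one.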
The National Archive of the Czech Republic had to face the challenges of born-digital items as early as the 1990s, when transfers from government agencies were already arriving in digital form on floppy disks (Crossczech, 2010). Before 2002 the employees of the National Archive were already familiar with digital preservation technologies and standards from their visits abroad and, in one research project, suggested building a national centre for digital preservation (Macek and Wanner, 2003). Their efforts initially generated no significant response, and adopting standards like EAD in XML, VRA Core or CDMA remained beyond the horizon for most Czech archives, museums and galleries.

2002-2011: Preservation Digitization Period
The floods of 2002 significantly advanced digitization efforts in all Czech heritage institutions. Collection managers realized that not only the gradual deterioration of paper but also dramatic climatic events can destroy the historical value of vast collections, and digitization for preservation was one way to improve the life expectancy of the documents. VISK funding opened up systematic financing to more institutions, and library patrons were happy to use the two newly created content management systems, Manuscriptorium6 and Kramerius. Kramerius, an open source system7, slowly spread to many Czech heritage institutions. It was designed mainly for 19th- and 20th-century newspapers and monographs, as well as sound and video documents, although the latter functionality was never used. All institutions receiving funding from the VISK program had to deliver data and metadata in defined formats to the National Library, where they were stored in Kramerius. The system supported only two proprietary descriptive metadata schemas (for monographs and periodicals); in 2008 both schemas were extended with basic technical and preservation information in PREMIS, and in MIX for images. When the National Digital Library project started in 2011, the National Library held almost 10 million scanned pages created across the country through these VISK programs, as well as several hundred terabytes of web archive data. The technical storage infrastructure was consolidated and virtualized: two mirrored locations each housed disk arrays and a tape library (LTO4 WORM), one further copy of the data was kept on offline tapes, and HSM technology and processes such as media replacement were used, thus achieving bit-level preservation. However, before the National Digital Library project, the National Library had no system for the ingest and management of archival data. Being responsible for collecting the digitized masters from many suppliers and from all libraries in the country posed significant risks to data security. One problem was that in-progress, unfinished and unchecked data stayed in preliminary storage for too long, and the processes were not formalized enough to enable secure management of large amounts of incoming data. In 2006 the National Library gradually began to establish a digital preservation department to review current best practice in long-term preservation, to introduce digital preservation to the library and especially into its digitisation function, to audit the library's processes and to prepare for the National Digital Library project. The national digitization infrastructure was strengthened by the central digitization registry8 and the use of persistent identifiers: first Handles and, from 2011, URN:NBNs. The digitisation registry started as a small project for tracking monographs and periodicals digitised in different Czech libraries, archives and museums, to avoid duplicated effort in digitising identical documents. Before the National Digital Library project, metadata was still being created in the proprietary schemas for monographs and periodicals, accompanied since 2008 by PREMIS and MIX.
The EU initiative i2010 Digital Libraries stimulated the formation of a Czech strategic plan called 'Strategy of the permanent preservation of the library collections of the traditional and electronic documents in Czech Republic until the year of 2010'. This strategic document, published by the Ministry of Culture in 2005, was the first to focus on digital preservation. Unfortunately, the 2005 strategy remained a paper exercise and its digital preservation plans were not realised. Later on, the National Library began to take part in the international preservation community, mainly as a member institution of the DigitalPreservationEurope (DPE) and Living Web Archives (LiWA) projects, and also by joining some activities of other related projects, such as CASPAR (Šimko, Máša and Giaretta, 2009). In 2009 the Czech translation of the DPE Planning Tool for Trusted Electronic Repositories, PLATTER (Rosenthal, Blekinge-Rasmussen and Hutař, 2009), was published. Together with the results of the 2007 DRAMBORA audit of the National Library, this slowly gave digital preservation a higher profile in the Czech and Slovak library and archives community.
Other significant sources of digitization funding for libraries in this period were the Norway funds (EEA Grants), which supported a number of projects such as 'HISPRA – Pragensia historica' in the Prague Municipal Library (Měřínská, 2010) and 'Ad Fontes' in the Prague Municipal Archives. These projects created digitization centres and generated and provided access to millions of digital pages, but aimed only at basic bit-level preservation, using disk arrays, tape libraries or UDO II disks. Logical preservation, which would include processes such as risk assessment, preservation planning, technical metadata extraction and format validation, was not pursued at all. In addition to the well-financed projects, a number of smaller activities produced important results. These focused on the digitization of, and access to, specific collections (such as maps or audio documents in the Moravian Library in Brno).9 Between 2005 and 2010 funding from the Ministry of Culture through the VISK grants decreased, and it has never reached the amount originally anticipated.
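Format identification is usually the first step of the logical preservation processes listed above: before a file can be validated or assessed for risk, one has to establish what it actually contains, regardless of its file extension. Dedicated tools such as DROID or JHOVE do this thoroughly; the Python sketch below only compares a file's leading bytes against a small, illustrative table of signatures for formats common in digitisation output, and flags files whose content does not match the format they are declared as. The table, the format ids and the function names are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for a few formats common in digitisation output.
# Illustrative only: real signature registries are far larger and more precise.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"II*\x00", "tiff"),  # TIFF, little-endian byte order
    (b"MM\x00*", "tiff"),  # TIFF, big-endian byte order
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jp2"),  # JPEG 2000 signature box
    (b"\xff\xd8\xff", "jpeg"),  # JPEG start-of-image marker, then a marker byte
    (b"%PDF-", "pdf"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the format id of the first matching signature, or None."""
    for magic, format_id in SIGNATURES:
        if header.startswith(magic):
            return format_id
    return None


def matches_declared_format(path: Path, declared: str) -> bool:
    """Check whether a file's content matches the format it is declared as."""
    with path.open("rb") as handle:
        header = handle.read(16)  # long enough for every signature above
    return identify(header) == declared
```

For example, a file named page_0001.tif that actually contains JPEG data would be reported as not matching the declared "tiff" format. Full validation, as performed by tools such as JHOVE, goes further and checks the internal structure of the file, not just its first bytes.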
After 2002 the National Archive lagged significantly behind the National Library. Digitisation as business-as-usual was established there very late, around 2005. Czech archives started slowly, with insufficient funding and a lack of qualified staff being the major limiting factors at first. Despite this, by 2010 the National Archive had scanned important collections, such as census information, population registries, and special collections like the K. H. Frank archives (Farkas, 2012). Although digitization in Czech archives was not as well coordinated or as systematically financed as in Czech libraries, during 2002–2012 archives across the country created huge amounts of digitized data, most of which is accessible through a portal called Monasterium10 (Křečková, 2012). From 2010 regional archives started to publish their birth, death and marriage registries from the Middle Ages to the 19th century on the Internet, usually via portals shared and owned by a few cooperating archives. This was welcomed by the user community, especially genealogists. Unfortunately, there is no central website for birth, death and marriage information yet. The preservation of the digitized masters was left to individual archives, so several archives built their own digital data storage systems, focusing on bit-level preservation only. Small archives and libraries had very poor storage management; it was not uncommon for them to keep master images on CD-ROMs alone, even after 2005. In 2004 the National Archive focused on standards for formats, metadata and data management for digital documents in the archives and in government, which was slowly starting to move into the e-government era, and it asked the Ministry of Interior to formulate a clear strategy for building the National Digital Archive. In 2008 a selected private company began the research and technological project called the National Digital Archive, which was to be financed through the EU Structural Funds Integrated Operational Program11.
This period also saw a rise in interest in digitization and digital preservation in the museums and galleries community. A major project called CITEM12 aimed, among other things, to build a preservation repository for this type of heritage institution. However, an inability to secure sufficient funds shelved the project temporarily. Even though the dream of a central museum repository has not been realised, the project brought digital preservation to the attention of this community.

2011-2014: Mass Digitization Period and Decrease in Barriers
The years 2011-2014 shifted the Czech heritage institutions that were striving to preserve digital heritage into a new phase.
First, Czech heritage institutions entered the mass digitization phase. The two major projects were the National Digital Library (planning to scan 26 million pages and build a certified long-term preservation system for the core of the Czech library digital heritage within three years) and the National Digital Archive (planning to build a certified preservation system for born-digital archival data as a key component of the e-government effort). The National Library also signed a contract with Google, under which Google would digitize more than 200,000 volumes of historical library documents published between 1620 and 1900 (Národní knihovna České republiky, 2011).
Second, the EU Structural Funds brought support for digitization to the Czech regional governments, which have started to build regional data centres, primarily for the purposes of e-government, but also open for use by local heritage institutions.As a result, automatic scanning technologies have become more easily accessible to regional institutions.
While the two major projects mentioned above are building mass digitization centres, in the regions we see smaller digitization projects with similar scanning technologies. The commercial suppliers of the regional digitization projects claim to run OAIS-compliant (ISO 14721:2012)13 solutions of their own development, such as ICZ DESA14. The main focus of these projects remains electronic records management, and the situation differs from region to region. In some regions the libraries receive only scanning facilities from the project's financing; in others the heritage institutions are more involved and can also finance staff or use the storage infrastructure. In general, the regional projects drew more attention to digital preservation issues, and both the National Library and the National Archive have fielded many questions and requests for support and guidance.
Both national projects included in their scope the establishment of an OAIS-compliant and certified (ISO 16363:2012) long-term preservation repository. However, the National Digital Archive project was stopped in early 2014 because of complications in the public procurement of the IT infrastructure and the preservation system. In addition to the supplier of the solution selected for the National Digital Library project (AIP Safe15), the National Digital Archive received bids from the Slovak company Tempest, which claims to have developed an OAIS-compliant solution for the University Library in Bratislava16, and from a number of local companies. The bids from local companies included IBM technology solutions based either on open-source systems such as RODA17 or on commercial systems such as SDB by Tessella (re-branded as Preservica18 in 2014). Unfortunately, the public procurement process had to be repeated twice, and as the project itself was supposed to finish in mid-2014, no time was left. Instead, the National Archive will try to develop a modest solution based on an open source system, Archivematica19, which will not be funded with EU money.
The National Digital Library project is building its preservation system on a system supplied by the local supplier AIP Safe. AIP Safe is an ECM system used in insurance companies and other midsize companies. It was previously used in the National Library in the late 1990s but was abandoned then because it was not considered flexible enough. The tender process for the National Digital Library project became controversial after some National Library employees, as well as the Library Board, claimed that the process was unfair and that the winning bid would never be able to fulfil the tender requirements. These claims were not accepted, and half of the project team decided to leave the Library at the end of 2011 (Menzelová, 2012). Of the four bids in the tender, one included the Rosetta system by Ex Libris20, two included SDB by Tessella, and one AIP Safe. The solution description for the project was not published; however, the AIP Safe producer promised to implement all the requirements of the tender. The system has no internal metadata data model and instead uses a digitization metadata profile proposed by the National Library (METS, MIX, PREMIS).21 All of the system's validation tools require this metadata for any data stored in the system. This is why the system does not yet house all the previously created data from the VISK projects, data in other structures (such as born-digital data from the web archive), or all of the national data as originally planned. Despite the current situation, we expect that the National Library will continue to open its long-term preservation system to other data types and formats in addition to the data already scanned through the National Digital Library project. We must wait until early 2015 for the final report of the National Digital Library project to see its results.
The above-mentioned Fedora-based Kramerius system, which many Czech libraries use as their digital library platform, will have a preservation layer added in 2014. The intention is to provide an open source OAIS-compliant repository to all institutions already using it as their digital library. A number of Czech institutions are already experimenting with Archivematica and RODA, but none of these projects is in the production phase. The project to extend Kramerius towards digital preservation22 has not yet published any analysis; it is very likely to use RODA or Archivematica as its preservation core.
In the Czech university sector, DSpace and some other repository systems (Invenio, EPrints) are widely used, but long-term preservation is not the primary concern of this community. The DSpace user community focuses more on access and less on preservation. Among other research and scientific projects, we should mention the large storage infrastructure for science built in the CESNET project23, but long-term archiving is not its primary goal either. In summary, we do not expect a fully certified, operational, OAIS-compliant long-term preservation repository for scientific and/or cultural heritage data in the Czech Republic any time soon.
A significant achievement of this period is the translation and publication of a number of relevant standards. The Czech Office for Standards, Metrology and Testing prepared Czech-language publications of OAIS, ISO 16363 and other records management standards. Other communities prepared a translation of the Data Seal of Approval guidelines24. This will certainly support and enlarge the preservation community in heritage institutions as well.

Summary and What's Next
Above we have described the major digitization activities and projects in Czech libraries and archives. Most of the financial support for projects that could potentially build certified, flexible long-term preservation systems available to multiple institutions (the National Digital Library and National Digital Archive projects) came from European Structural Funds. Other projects financed from the Czech public budget were usually too small to have such high ambitions. However, the large EU-financed projects mentioned above, including the regional data centres, have turned out to be highly politicized. As the public tenders for system integrators draw closer, expert-driven projects turn into politicized projects in which local lobbies and interest groups come into play and take over the steering of the projects. Small projects, on the other hand, are driven by librarians' interest in finding a solution for their library's needs. It is fair to say that as the size of projects and the available funding grow, non-professional influences increase rapidly and can steer the tender and its results towards a solution that does not fully address the original needs.
In Czech heritage institutions we find only a limited understanding of the distinction between logical and bit-level preservation. The people at the Ministry of Culture responsible for libraries, and thus library managers as well, do not emphasize long-term preservation issues sufficiently. The primary concerns are still future finances for data creation (digitisation), the clearing of access rights for digitized content, and the outline of future EU Structural Funds programs. Opening up the digitized collections to wider audiences outside the library premises through changes in copyright law is hindered by the need to leverage the financial resources invested in digitization. Similarly, the need for ongoing support of the long-term preservation of digitised data is still not recognised by many senior managers at heritage institutions in the Czech Republic. They do not see format obsolescence, invalid data and similar risks as real problems happening as we speak. There is still a great need for raising awareness in this field. We have to stress that since 2006, when the National Library joined the DigitalPreservationEurope FP6 project, knowledge and understanding of digital preservation issues have been enhanced significantly. Unfortunately, this has happened only at the level of individual professionals, with limited impact on national or multi-institutional cooperative efforts.
It is not only the heritage institution managers and their ministerial colleagues who are largely unaware of long-term preservation issues; the same is true of the institutions themselves. The mission statements and mandates of Czech heritage institutions usually do not refer explicitly to digital data and the need to preserve it in the long term, which leaves the institutions no basis for pressing their funding bodies to finance the preservation of digital data continuously. Most mandate documents do not mention digital data at all. Czech institutions do not routinely publish their preservation strategies or plans, nor do they create or publish documentation of their processes and trustworthiness in terms of ISO 16363:2012. Most heritage institutions have no data management systems in place and no controlled, standardized and formalized processes for managing data. Standardization and quality assurance would not only improve the efficiency of budgetary expenditure but also reduce the risk of data loss.
The first step towards written documentation and policies will be the ISO 16363 certification process for the National Library digital repository, planned to start in 2014. This will surely be a wake-up call for all interested parties, including the National Archive and other large libraries in the country that plan to certify their digital repositories.
In some cases archives and museums consider the scanned data to have documentary value only, and nothing obliges the state to continue financing its preservation, even though creating these collections consumed a decade of work. Unlike digitisation, long-term preservation receives no explicit funding from the Ministry of Culture. A major risk for the few preservation projects is that large sums are often available at the beginning of a project, and the infrastructure created in this phase is then left without much additional financing. Because the preservation of digital content means continuous work, it also requires continuous, clearly targeted financing. That is not happening, and there is no will in government to change this. We can only hope that this approach will change as more born-digital documents come into repositories, for example from publishers, or from government agencies into archives. Any funding program supporting digitisation, preservation and preservation actions should also ask applicants to provide total cost of ownership (TCO) analyses, and to be professional in planning and standardizing their processes and in quality assurance. Otherwise this money might be wasted on inefficient outsourcing and service purchases.
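A TCO analysis of the kind suggested above makes the ongoing costs explicit instead of leaving them to be discovered after the initial investment has been spent. As a simple illustration (the cost categories and notation are our assumptions, not a prescribed method), the total cost of ownership of a preservation service over a horizon of T years can be written as the initial investment plus the discounted recurring costs:

```latex
\mathrm{TCO}(T) = C_0 + \sum_{t=1}^{T} \frac{S_t + P_t + M_t + R_t}{(1 + r)^{t}}
```

Here C_0 is the initial investment (typically the part covered by project grants), S_t, P_t, M_t and R_t are the storage, staff, format migration and hardware refresh costs in year t, and r is a discount rate (r = 0 gives the undiscounted sum). Because the recurring terms accumulate for every year the data must be kept, over a long horizon they can easily exceed C_0, and they are precisely the costs that one-off project funding leaves uncovered.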
Finally, there is still a need for systematic support of education and knowledge sharing in the area of digital preservation. If Czech institutions want to use open source archiving technologies, they should consider building a platform for sharing experience. Establishing such a network, education, raising awareness, and seeking continuous funding for nationwide, centralized solutions should be priorities for the future.
Despite the negatives listed above, we have positive expectations. We are eagerly following the activity of several institutions and projects25 that aim to create informal consortia and are starting to cooperate in using the large storage capacities of CESNET and university computing centres to create services for AIP replication, while also coordinating their efforts to extend these services into a full-featured preservation system. This grassroots activity is still in its infancy, but we can already feel the enthusiasm of the parties involved. Individual activities in several areas (such as standards translations, certification efforts and testing of open source systems) also fill us with hope that the end of the next decade will see several certified preservation systems securely housing petabytes of data.
The above chronological account is not intended to be a detailed description of all activities which could be mentioned.For further details the reader may refer to the Czech language sources referenced.