Are You Ready? Assessing Whether Organisations are Prepared for Digital Preservation

In early 2009 the Planets project undertook a survey of national libraries, archives and other content-holding organisations in Europe to better understand the organisations’ digital preservation activities and needs and to ensure that Planets’ technology and services are designed to meet them. Over 200 responses were received including a cross-section of major libraries and archives especially in Europe. The results provide a snapshot of organisations’ readiness to preserve digital collections for the future. The survey revealed a high level of awareness of the challenges of digital preservation within organisations. Findings indicated that approximately half of those organisations surveyed have taken measures to develop digital preservation policies and to budget for it, while a majority have incorporated digital preservation into their organisational planning. Organisations predict that within a decade they will need to store large quantities of data in a wide range of formats from a variety of sources; three quarters of them are looking to invest in a solution within the next two years. However, the findings also point to varying degrees of readiness. Organisations with a digital preservation policy are significantly further advanced in their work to preserve digital collections for the long-term than others.


Introduction
In the last few years, digital preservation has developed from a theoretical discipline to one where real solutions are starting to be developed and implemented. While more research is still needed, practical steps are now possible. But how ready are libraries, archives and related organisations to begin taking those practical steps?
In early 2009 the Planets project conducted an online survey to assess the state of readiness of archives, libraries and other organisations interested in digital preservation. The survey aimed to understand the state of digital preservation and the digital preservation needs of European organisations that create or hold digital content.
Previous studies have provided snapshots of the state of digital preservation. In 2005, the Digital Preservation Coalition (Waller and Sharpe, 2006) surveyed 104 organisations in the UK. These included memory institutions, government departments, research institutions and companies in a range of sectors. The results showed that there was considerable confusion about how to address the problem of digital preservation. While 41 percent of respondents said there was a need to keep digital information alive for 50 years or more and 52 percent said they had a high level of commitment to digital preservation, just 18 percent had a strategy in place and 20 percent funding. Half (55 percent) were unclear about roles and responsibilities, half (55 percent) had not yet assessed the volumes of material they needed to preserve and half (49 percent) did not know the life spans of digital data. The same proportion (50 percent) stated that they printed out hard copies of digital information as a means to preserve it. The study revealed the scale of the problem (with a growing volume of digital information of increasing value) and the lack of good solutions in place in organisations. It concluded that despite the high levels of awareness: "the level of implementation of digital preservation solutions is significantly lower than would be expected given the awareness and commitment that were measured."
In 2006 and 2007, DigitalPreservationEurope surveyed 172 national libraries, archives, research institutions, ICT and media companies, and other organisations in Europe. The results similarly pointed to high levels of awareness. Seventy-seven percent of respondents considered long-term preservation to be a key strategic priority. However, only one-third (35 percent) had implemented a trusted repository. The findings demonstrated that where respondents had digital preservation systems in place these were a mix of open-source, commercial and software developed in-house. Organisations also considered cooperation across organisations to be important to digital preservation.
The Planets survey aimed to build on the earlier surveys and determine how awareness has grown, how far organisations are along the path to implementing a full digital preservation solution, what unfulfilled requirements organisations have and what barriers exist that hinder the adoption of solutions.

Method
The Planets survey of long-term management of digital information was conducted in February and March 2009 in the form of an on-line questionnaire. The survey was targeted at organisations and individuals with an interest in retaining and accessing digital content in the long term.
Invitations to participate in the survey were sent to around 2000 individuals, whose role could encompass the long-term maintenance of digital information, in libraries, archives and other organisations across Europe. These individuals were selected from a number of sources. Over half were individuals known to Tessella (who undertook the research on behalf of Planets) as having an interest in digital archiving. The majority of the remainder were individuals who had registered to receive updates of Planets' activities, and the final group was people who were personally contacted by members of the Planets Scientific Board and Executive Steering Committee and invited to participate. Follow-up telephone calls were made to 120 of these individuals to encourage them to take part in the survey. In particular, individuals in the 96 national archives and libraries in Europe, as listed on UNESCO's website, were targeted.
As well, initial announcements about the survey were placed on approximately 30 mailing lists related to digital preservation and followed up by two reminders during the lifetime of the survey. The lists included international digital preservation mailing lists such as PADI (Preserving Access to Digital Information), and specialist mailing lists targeting sub-sections of the digital preservation community, such as research institutes, government, and film and sound archives.
In addition, the survey was publicised through intermediary organisations and projects in EC member countries. Digital Preservation Europe, the Digital Curation Centre, and the Caspar, Shaman and Protage digital preservation projects were all asked to cascade notices on Planets' behalf, and the Council of European National Libraries (CENL), International Council on Archives (ICA) and the Association of European Research Libraries (LIBER) were asked to disseminate the message to their members. Finally, a news item announcing the survey and inviting participation was placed on Planets' website.
Respondents were promised confidentiality and anonymity in the introduction to the survey. The survey comprised 29 questions and took up to half an hour to complete; it is therefore not surprising that not every respondent answered every question.

Results
Two hundred and six responses were received before the survey closed.

Distribution of Responses
Countries. Fifty-six percent (115) of responses were from European Union countries, and 11 percent (23) from European countries outside the EU. Sixteen percent (33) came from Canada and the USA. Just three percent (six) of responses came from the rest of the world. Fourteen percent (29) did not disclose their country. Ten or more responses were received from: the UK (54), USA (26), Germany (16), Switzerland (15), and Netherlands (10).
Organisation Types. Forty-one percent (75) of responses represented libraries and 30 percent (55) archives. Fifteen percent (28) were from government departments and the public sector. Seven percent (12) were from suppliers and vendors and four percent (eight) from commercial organisations. Three percent (five) were from museums. See Figure 1 for the full breakdown.
Respondents' Professions. Respondents came from a wide range of professional backgrounds. Fifteen percent stated that they specialise in digital preservation. Twenty-two percent work in curation and records management, 16 percent work in preservation in general, and 16 percent work in IT. The remainder work in a variety of professions including management, research, and those that produce digital information.

Digital Information Requiring Preservation
The survey aimed to establish the volume and types of digital content that organisations need to hold now and expect to need to hold over the next ten years, and the source systems from which this content will be derived.

Data volumes.
Respondents were presented with categories of data volumes (from less than one terabyte (TB) to over one petabyte (PB)) and asked to indicate the volume of digital content they store now and the volume they expect to store in two, five and ten years' time. Eighty-seven percent of respondents hold less than 100 TB of content now, and the median volume of content is less than 20 TB (see Table 1).
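Because responses were collected as bands rather than exact figures, the summary statistics in Table 1 have to be derived from band counts. The footnote to Table 1 states that the mean uses the mid-point of each band, a value of 2 PB for the > 1 PB band, and rounding to 2 significant figures. The following Python sketch illustrates that calculation under stated assumptions: the band edges other than those named in the paper, the response counts, and the function names are all hypothetical and are not the survey's actual data or method.

```python
# Hypothetical band edges in TB. The paper names only some bands
# (e.g. 1-20 TB, 500 TB - 1 PB, > 1 PB); the others are assumed here.
BANDS = [
    ("<1 TB", 0.0, 1.0),
    ("1-20 TB", 1.0, 20.0),
    ("20-100 TB", 20.0, 100.0),
    ("100-500 TB", 100.0, 500.0),
    ("500 TB-1 PB", 500.0, 1000.0),
    (">1 PB", 1000.0, None),  # open-ended top band
]
OPEN_BAND_VALUE_TB = 2000.0  # 2 PB, the value stated in the footnote


def band_value(lower, upper):
    """Representative value of a band: its mid-point, or 2 PB if open-ended."""
    return OPEN_BAND_VALUE_TB if upper is None else (lower + upper) / 2


def round_sig(value, digits=2):
    """Round to a given number of significant figures."""
    return float(f"{value:.{digits}g}")


def banded_mean(counts):
    """Mean volume in TB from per-band response counts, to 2 significant figures."""
    total = sum(counts.values())
    weighted = sum(band_value(lo, hi) * counts.get(label, 0) for label, lo, hi in BANDS)
    return round_sig(weighted / total)


def median_band(counts):
    """Label of the band containing the median response."""
    total = sum(counts.values())
    running = 0
    for label, _, _ in BANDS:
        running += counts.get(label, 0)
        if running * 2 >= total:
            return label


# Hypothetical response counts, for illustration only.
example_counts = {
    "<1 TB": 10, "1-20 TB": 45, "20-100 TB": 25,
    "100-500 TB": 10, "500 TB-1 PB": 5, ">1 PB": 5,
}
print(banded_mean(example_counts), "TB;", "median band:", median_band(example_counts))
```

Note how strongly the mean depends on the value assigned to the open-ended band, which is one reason the median and mode bands are reported alongside it.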

Table 1
            Data Volumes Now    Data Volumes in 2019
Mean¹       150 TB              1.0 PB
Median      1-20 TB             500 TB - 1 PB
Mode        1-20 TB             > 1 PB

¹ The mean was calculated using the mid-point of each band of data volumes and a value of 2 PB for the > 1 PB band; it is given to 2 significant figures.

In ten years' time, 70 percent of respondents expect to hold more than 100 TB, and the median volume of content held is expected to be over 500 TB. Forty-two percent of respondents' organisations expect to hold more than one PB of data in ten years' time. Ten percent store nothing now; this is expected to fall to two percent in two years' time (see Figure 2).

Figure 2: Growth in Volumes of Digital Content that Organisations Intend to Store over the next Ten Years (129 Responses)

National archives and national libraries hold the largest volumes of data: a mean¹ of 190 TB in 2009 (200 TB and 180 TB respectively). They expect to hold a mean of 1.4 PB of digital information in 2019.

Types of Digital Information. Over 80 percent of organisations indicated that they currently have a need to preserve documents and images (see Table 2) and this rises to over 95 percent in ten years' time. Within ten years, over 70 percent of organisations expect to need to preserve video, audio, databases, websites and email. Almost half (49 percent) of organisations already have a need to preserve databases and by 2019, 85 percent expect to need to preserve them. In 2019 the percentage of libraries storing websites (99 percent), eBooks (81 percent), and eJournals (81 percent) is significantly higher (at the 99 percent confidence level) than the average of all types of organisations.

Source Systems. Organisations receive content from a range of source systems. Those used by more than half of the respondents were: file systems (77 percent), document scanning programmes (58 percent), the internet (55 percent), electronic document management systems (55 percent), email systems (54 percent), and media digitisation programmes (54 percent). The survey showed that niche or domain-specific source systems are used by far fewer organisations; CAD is used by 29 percent and lab systems by 18 percent of respondents' organisations. Libraries concentrate on archiving the internet (77 percent) and media digitisation programmes (75 percent), whereas archives have more of a focus on the systems used to manage organisations: email (64 percent), EDMS (66 percent), ERMS (52 percent).

Awareness of Digital Preservation.
Ninety-three percent of respondents stated that their organisation is aware of the challenges presented by digital preservation. Twenty-four percent of respondents currently have a solution in place or planned and over half (52 percent [...] Figure 3). However, only one-quarter (27 percent) of government departments and the public sector in general have a digital preservation policy in place. A high proportion of commercial organisations (88 percent) and suppliers and vendors (60 percent) have digital preservation policies, although these results should be treated with caution due to the small size (eight commercial organisations and 10 suppliers and vendors) and potentially unrepresentative nature of the sample.

Timescales for Investment. The majority (77 percent) of organisations plan to invest in a solution in the next two years. One-third (32 percent) of organisations are currently investing in a digital preservation solution and two-fifths (45 percent) are looking to make an investment in the next six months to two years. One-fifth (23 percent) do not plan to invest for over two years.

Digital Preservation Implementations
Implementation Phases. Respondents were asked to describe the stage that their organisation was at in working towards a digital preservation solution. They were allowed to select more than one option from the six options presented to them, resulting in the total percentage exceeding 100 percent. Eighty-five percent of organisations stated that they are working towards a solution or have one in place. The remaining fifteen percent of respondents have no plans to deal with the long-term management of digital content. Of those working towards a solution, 27 percent are assessing their needs using consultancy and 22 percent with a prototype; 13 percent are tendering for a solution; 48 percent have a long term solution in development and seven percent already have one in place.
Many of the respondents were at more than one stage in working towards a long-term solution. For example, of those who already have a long-term solution in place, 18 percent are assessing their needs and requirements and 32 percent are looking to improve or extend their current solution.

Solution Implementation. Respondents were asked about how they expect to implement their solution (respondents were allowed to select more than one answer resulting in the total exceeding 100 percent), who they expect to implement it and whether they use or plan to use open source or proprietary software. Two-thirds (64 percent) of organisations are integrating components into a custom solution, with the remainder evenly split between developing a custom solution (33 percent) and using an off-the-shelf package (32 percent). Respondents are combining these approaches, with half (50 percent) of those developing a custom solution also integrating components into that solution and two-fifths (40 percent) of those using an off-the-shelf package also integrating components into a custom solution. Approximately one-tenth (11 percent) of organisations are developing a bespoke or custom solution from scratch, i.e. without using existing components or off-the-shelf software packages.

Long-Term Digital Information Management
Sixty-nine percent of respondents indicated that they expect to use an in-house team to implement their solution. Forty-six percent said they expect to use a third-party development team and 21 percent a third-party system integrator. Forty-five percent of respondents are using more than one type of implementer. Of those using an in-house software team, 34 percent also expect to use a third-party development team and 12 percent a third-party system integrator.
Over half (57 percent) of respondents currently use a mixture of open source and proprietary software, with the rest of the responses evenly split between open-source only (13 percent), proprietary only (14 percent), and undecided (16 percent). When looking towards the future, the proportion of respondents who have not yet decided what type of software they will use increases to 25 percent, the proportion using proprietary-only software decreases to two percent and the other proportions remain essentially unchanged (at 59 percent and 14 percent).

Control over Formats.
Twenty-seven percent of respondents indicated they had complete control over the format of content in their digital archives. Two-fifths (42 percent) work with content providers to influence the formats that they will accept, and one-third (31 percent) said they have little or no control and are obliged to accept the formats provided to them.
Thirty-eight percent of archives have complete control over formats, compared with 13 percent of libraries, and 45 percent of libraries have no control, compared with 27 percent of archives. This difference is even more marked when just national libraries and national archives are compared: fourteen times more (56 percent versus four percent) national archives than national libraries state that they can completely control the formats of the content they receive.

Digital Repositories

Important Capabilities for a Digital Archive.
Respondents were asked to rate from 1 to 5 how important they thought various capabilities of a long-term digital information management system were. The ratings scale was: 1 = not applicable, 2 = least important, 5 = critical. So, any capability rated ≥ 3 is deemed important. The mean ratings assigned by respondents are given in Table 3, ordered by the mean rating from highest to lowest.
For archives, the three key capabilities are (with their mean ratings): maintains authenticity, reliability and integrity of records (3.8), ensures records are accessible for more than 50 years (3.5), and plans the preservation of content to deal with technical obsolescence (3.5). For libraries, the three key capabilities are: maintains authenticity, reliability and integrity of records (3.8), is able to store many different types of content (3.7), and checks records have not been damaged. For government departments and the public sector in general, the three key capabilities are: maintains authenticity, reliability and integrity of records (3.8), plans the preservation of content to deal with technical obsolescence (3.6) and complies with established data or digital information management standards (3.6).

Scalability of Digital Archives.
Respondents were asked to rate the importance of scalability for digital archives, using a scale from 1 (not important) to 5 (critical). The mean ratings assigned to each aspect of scalability were: 3.8 for scalable to large volumes of data (petabytes of content), 3.7 for scalable to high ingest rates (millions of objects per year), and 3.1 for scalable to high access rates (hundreds of objects per second). Significantly (at the 95 percent confidence level), more national libraries (73 percent) rate scalability of content as critical than national archives (27 percent).
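The paper does not say which statistical test underlies the statement that the difference is significant at the 95 percent confidence level, nor does it give the number of national libraries and national archives that answered this question. As an illustration only, the following Python sketch shows one common way to compare two proportions, a pooled two-proportion z-test. The counts used (11 of 15 versus 4 of 15) are hypothetical, chosen simply to reproduce 73 percent and 27 percent, and the function name is likewise an assumption.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts: 11 of 15 national libraries (73 percent) versus
# 4 of 15 national archives (27 percent) rating scalability as critical.
z, p_value = two_proportion_z_test(11, 15, 4, 15)
print(f"z = {z:.2f}, p = {p_value:.3f}, significant at 95%: {p_value < 0.05}")
```

With groups this small the normal approximation is rough, so an exact test might be preferred in practice; the sketch is meant only to show what a claim of significance at the 95 percent level involves.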

Metadata Standards.
The survey investigated the metadata standards used by organisations to describe stored digital objects. Dublin Core was the most popular standard with 51 percent of respondents already using it and 18 percent planning to use it. MARC came next with 34 percent already using it and five percent planning to, followed by ISAD(G) with 28 percent already using it and 10 percent planning to (see Figure 4).

Policy and Implementation
The overall results were further investigated by cross-correlating them with the information about which organisations have a digital preservation policy. Organisations with a digital preservation policy are less likely (three percent versus 11 percent) to have no experience or be unaware of the challenges presented by digital preservation and nearly three times more likely (36 percent versus 13 percent) to have a solution in place or planned. In addition, organisations with a policy are more likely to include digital preservation in their operational planning (92 percent versus 60 percent), their business continuity (85 percent versus 56 percent) and financial planning (78 percent versus 45 percent). Also, they are three times more likely to have a budget for digital preservation in place (72 percent versus 23 percent).
Organisations with a policy are four times more likely (51 percent versus 12 percent) to be investing in a solution now and just 13 percent expect to leave it longer than two years to invest, compared with 34 percent for those without a policy. Over three times (20 percent versus six percent) as many organisations without a digital preservation policy, as with, have no plans for the long-term management of digital information. Conversely, over three times (25 percent versus seven percent) as many organisations with a digital preservation policy, as without, already have a long-term solution.
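The "times more likely" comparisons in this section are simple ratios of the two reported percentages. The short Python sketch below recomputes them from the figures quoted in the text; the dictionary layout and the function name are illustrative assumptions, and the ratios are relative proportions, not odds ratios.

```python
# Percentages quoted in the text, as (with policy, without policy).
COMPARISONS = {
    "solution in place or planned": (36, 13),
    "budget for digital preservation": (72, 23),
    "investing in a solution now": (51, 12),
    "long-term solution already in place": (25, 7),
}


def times_more_likely(pct_with_policy, pct_without_policy):
    """Ratio of the two percentages (a relative proportion, not an odds ratio)."""
    return pct_with_policy / pct_without_policy


for label, (pct_with, pct_without) in COMPARISONS.items():
    ratio = times_more_likely(pct_with, pct_without)
    print(f"{label}: {pct_with}% vs {pct_without}% -> {ratio:.2f} times")
```

The same function applies to the reverse comparison (20 percent of organisations without a policy versus six percent with one having no plans) by swapping the arguments.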

Discussion
In 2005 there was widespread awareness within the information management community about the need to preserve digital content, but little action had been taken. Four years on, Planets' survey on long-term management of digital information indicates that significant strides have been made, in particular by those organisations that have established a digital preservation policy.
There was a relatively large response to the survey which included a cross-section of the major archives and libraries in Europe. The methods used to publicise the survey and its inclusive nature meant that although its primary target was European organisations, a fifth of responses were from outside Europe.
Digital preservation is not just a concern for archiving specialists in memory institutions such as archives and libraries. The ubiquity of digital information and its importance in business, governmental and private life means that preservation of digital content is an issue that affects us all. Therefore, it is good to see a broad range of organisations responding to the survey, and in particular that some digital information producers are taking an interest in digital preservation.
Digital preservation is maturing as a discipline in its own right, so it is unsurprising that fifteen percent of respondents specialise in this area. However, as demonstrated by respondents' roles, many of those involved in digital preservation still come from the more traditional backgrounds of preservation, curation, records management and IT. Digital preservation is also drawing the attention of senior management; eleven percent of respondents were directors or heads of IT. The findings also indicated that producers of digital content (four percent of respondents) are beginning to take an interest in the issue.
In contrast to the 2005 survey (Waller and Sharpe, 2006), organisations now have a clear understanding of the volume of data they must archive. While the current storage needs of most organisations are quite modest, organisations predict a large increase in the volume of content over the next decade. At the same time, respondents need to preserve a wide range of types of digital information from a variety of sources. Almost all organisations expect to need to preserve digital objects not only in "simple" forms such as documents and images where some solutions already exist but also in "complex" forms such as databases where solutions are still in development. Libraries in particular will need to preserve such dynamic content in the future.
Despite this need to deal with objects with behavioral properties, there was less interest expressed in emulation than migration. This may be because emulation is still a subject for research, rather than a practical preservation strategy. However, it does point to a need for education and understanding about the role of emulation as a preservation strategy.
Over nine in ten respondents were aware of the issues and the challenges associated with digital preservation, reinforcing the findings of the earlier surveys. Half of the respondents' organisations had taken the vital first step of developing a digital preservation policy.
Half have allocated a budget. However, where European organisations have a budget, it is five times more likely to be a capital-only one than a revenue-only one. The prevalence of capital over revenue budgets in Europe compared with North America may reflect the fact that many organisations are starting on the road to digital preservation and therefore need a high capital expenditure to put a solution in place. In that case, we would expect the percentage of organisations with a revenue budget to increase over time as the focus switches from the development of a digital preservation solution to its ongoing maintenance, including both the ingest of new material and the management of material already ingested. It is difficult to set a budget for on-going expenditure without experience of what the organisation needs to spend. It may also reflect the situation that many memory organisations operate under funding models where it is easier to obtain grants for individual projects than a long-term commitment from a funding body to support on-going investment.
Although awareness amongst respondents is high, it appears that organisations continue to face barriers to implementing solutions. Just one quarter currently has a solution in place or planned. Whether these barriers are due to lack of knowledge, lack of funding or some other cause, such as low priorities, is not known. Caution should be applied in generalising this result, as those people who responded to a survey on digital preservation are more likely to be aware of the problems of digital preservation in the first place. However, findings indicate that those organisations that do plan to invest, plan to do so within the next two years.
Organisations are familiar with open-source solutions but are less familiar with commercial solutions. They plan to follow a route of component-based development and customisation where a mix-and-match solution is used. Currently, open-source and proprietary software are used equally; however, findings indicate increased preference for open-source solutions in future. Such solutions need to be componentised with well-defined interfaces in order to fit in with the pick-and-mix approach used by organisations.
National archives are the most likely to develop, or to have developed, a custom solution, reflecting the fact that many national archives have pioneered solutions to digital archiving. Conversely, government departments and the public sector are least likely to develop their own custom solution and more likely to integrate components into a custom solution.
The ability of respondents' organisations to preserve digital content for the long term is limited by their ability to control the format of digital material that they need to store, mainly because such content is created externally. National archives are three times more likely to restrict the formats that they will accept than national libraries, suggesting that some digital preservation activities will have to occur before transfer to national archives. Much of the material that is transferred to national archives comes from government departments and the public sector, but this is the group which is least likely to have a digital preservation policy. Therefore these organisations will need to develop such a policy in order to prescribe the process required to transfer digital material to the national archive in an orderly manner, as well as to cover the pre-transfer preservation activities. They may need assistance and education in order to overcome the problems they have.
Respondents are generally agreed about the key capabilities required of a digital preservation system. Such systems must maintain digital information for up to 50 years in such a way as not to damage or corrupt it and so that it can be accessed in future. Other important attributes in choosing a solution are the ability to plan preservation and adherence to standards (although there is less clear agreement on which standards!).
Given the anticipated rises in volume, it is not surprising that scalability is generally regarded as one of the major criteria in assessing solutions. Given that libraries and archives predict that they will have similar levels of digital content in the future, it is surprising that archives are not as concerned about scalability, and scalability to total content in particular, as libraries. It is noticeable that scalability to high access rates is not ranked with the same importance as scalability to high volumes of content and high ingest rates. There are two possible explanations for this. One is that it reflects the fact that some organisations have restrictions on access; for archives this may be that parts of the collection are restricted for a period of time and for libraries this may be that access is restricted to a specific group of users such as on-site visitors to national libraries or members of the university for academic libraries. The other explanation is that it indicates that organisations are preoccupied with ingest and storage and have not yet reached the stage where users are requesting access to large volumes of content, which would again point to the relatively early stage of digital preservation.
The findings indicate that while archive, library and related organisations are making progress towards long-term management of digital content, some are considerably further down the road of implementation than others. The results suggest a divide between those that have established a digital preservation policy and those that have not. The existence of a policy is a critical early step. Organisations with a policy are three times more likely to have a budget and three times more likely to have either a solution in place or one planned for the near future than those without a policy. This points to a need amongst those who are serious about maintaining access to digital content to start by gaining internal consensus about what must be preserved, for how long and for whom as a first step towards establishing an internal business case and getting commitment to the task.
Organisations with a digital preservation policy currently store more data than organisations without a policy, although in ten years' time the difference will have been almost completely eroded away. Similarly, more organisations with a digital preservation policy currently store each of the different types of digital information, but again in ten years' time there is very little difference between the two groups. It appears that organisations with little data in relatively few formats do not prioritise developing a digital preservation policy, whereas organisations facing the challenge of preserving large volumes of valuable content, or content in a wide variety of formats, are taking steps to implement practical solutions. Over the next 10 years the increasing need to preserve digital information is likely to provide an impetus for many to put a digital preservation policy and solution in place.

Conclusions
The survey revealed that many organisations are beginning to make a transition from analyzing the problem to solving it. They remain concerned that mature solutions do not yet exist. Nevertheless, 85 percent of organisations with a digital preservation policy expect to make an investment to create a digital preservation system within two years. Such systems are likely to be componentised, mix-and-match solutions. They will need to be scalable, particularly to handle the predicted large volumes of content, and also to handle high ingest rates. In addition, they will need to handle a wide range of formats from a variety of sources and preserve the information contained therein for up to 50 years.
For organisations without a digital preservation policy, it is expected that the predicted increases in volume of digital information and the range of formats needing to be preserved will provide the impetus to focus on digital preservation and take practical steps to address its challenges.