Making Digital Curation a Systematic Institutional Function

Over the past decade, a rich body of research and practice has emerged under the rubrics of electronic records, digital preservation and digital curation. Most of this work has taken place as research activity (often financed by government agencies) within libraries and information/computer science departments. Many projects focus on one format of information, such as research publications or data, potentially de-contextualizing individual records. Meanwhile, most institutional archives and manuscript repositories, which possess a rich theoretical and practical framework for preserving context among mixed analog materials, have failed to extend their capabilities to digital records. As a result, relatively few institutions have implemented systematic methods to capture, preserve and provide access to the complete range of documentation that end users need to understand and interpret past human activity. The Practical E-Records Method attempts to address this problem by providing easy-to-implement software reviews, guidance/policy templates, and program recommendations that blend digital curation research findings with traditional archival processes and workflows. Using the method discussed in this paper, archives and manuscript repositories can use existing resources to incrementally develop digital curation skills, building a collaborative, expanding program in the process. Archival programs that make digital curation a systematic institutional function will systematically gather, preserve, and provide access to genres of documentation that are contextually-rich and highly susceptible to loss, complementing efforts undertaken by librarians, information scientists and external service providers. Over the next year, the suggested techniques will be tested and refined at the University of Illinois Archives and possibly elsewhere. 
The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. ISSN: 1746-8256 The IJDC is published by UKOLN at the University of Bath and is a publication of the Digital Curation Centre.

On November 19th, 2009, over 1,000 emails and 3,000 other private documents that had been stolen from the University of East Anglia's Climatic Research Unit were uploaded to a web server in Russia and immediately mirrored across the Internet (Anon., 2010b). Bloggers, journalists, political operatives, government officials and scientists expressed a wide range of opinions concerning these records and the manner in which they were acquired. No consensus emerged regarding what the documents demonstrated concerning the research practices of atmospheric scientists, much less global temperature trends (Monbiot, 2010). But what if the scientists in question had subsequently placed all of their private records into a publicly available research archives?
Whatever one thinks of the case for global warming, the "Climategate" controversy has shown that individuals will go to great lengths to procure content-rich documents, particularly when they believe the records will provide evidence of malfeasance. It also demonstrates that individuals and institutions work against society's interest (not to mention their own) when they try to selectively preserve information to their liking (Pearce, 2010). Even the perception that a party has fabricated or misrepresented evidence will undermine public confidence in his or her conclusions (Crook, 2010). For this reason, individuals and institutions must carefully plan to preserve (or appropriately dispose of) all materials related to a project or activity based on an assessment of the legal, administrative and historical value of the records. This includes electronic materials. Individuals, institutions, and society as a whole need an accurate, complete and usable record of human activities, and an appropriate legal and institutional framework in which to use that record. Without a trustworthy record, people and institutions cannot make informed decisions, verify existing information, evaluate evidence, hold others accountable, construct accurate histories or develop new knowledge. An authentic record does not preserve itself, and even the best-intentioned records creators often lack the resources or expertise to act as permanent custodians for non-current records. Nor can we rely on those who provide the service of temporarily storing and transmitting records to permanently preserve an interpretable record of human activity.

Ideally, neutral parties dedicated to records preservation, such as archives, libraries or disciplinary repositories, will take custody of records that hold enduring value. An institution can ensure that the entire range of records with long-term value is deposited, preserved and made available in the right location by assessing the records' true value and then depositing them with the agencies best equipped to deal with them. Archives, libraries, institutional repositories and other groups share a common purpose in attempting to preserve records of enduring value, but they specialize in different types of records. Archives have a particularly important role to play in preserving materials that were not, at the time of their creation, meant to be publicly available. Such records can be particularly important in helping users to judge whether evidence or information contained elsewhere is trustworthy.
Unfortunately, relatively few archives or manuscript libraries have implemented a set of systematic institutional functions to preserve digital records that fall within their documentary purview. These institutions need a practical method to capture, preserve and provide access to records like email, blogs, digital photographs and unpublished reports, which are at extreme risk of loss over the medium and long term. What can be done to make digital curation a systematic institutional function in archives and manuscript repositories? In order to answer this question, we must first understand the nature of digital curation research from the point of view of an archivist at a small or medium-sized repository.

Electronic Records, Digital Preservation and Digital Curation Research
Over the past decade, a rich body of research and practice has emerged under the rubrics of electronic records, digital preservation and digital curation. Most of the literature regarding these topics is based roughly on the concepts outlined in the Reference Model for an Open Archival Information System and formalized in a report outlining the attributes and responsibilities of a trusted digital repository (Consultative Committee for Space Data Systems, 2002).
A complete list of relevant projects, conferences, workshops and publications would fill many volumes. The InterPARES Project, a highly respected international effort to establish the functional, technical and administrative requirements for preserving authentic electronic records, has disseminated over 1,650 publications, presentations, lectures and other research outputs, and is still going strong. In the United States, the National Digital Information Infrastructure and Preservation Program (NDIIPP) officially lists over 50 "partner publications" and also supplies guidance documents, podcasts, tools, and services. The EU Planets project, which focused on building a preservation environment for national libraries and archives, lists approximately 172 papers, reports, articles and other publications on its website. In addition, print and online journals such as D-Lib Magazine, Code4Lib and First Monday routinely publish peer-reviewed articles related to digital preservation.
Many projects, individuals, and organizations have developed resources to assist others who wish to implement these research findings. Award-winning tutorials have been written (Cornell University, n.d.). Evaluation tools have been developed (Dale & Ambacher, 2007). A complex data dictionary was written and formalized into an XML schema (PREMIS Editorial Committee, 2008). Software tools and services have been developed. Flashy grant projects are announced annually, some funded with many millions of dollars. Electronic records, digital preservation and digital curation projects have multiplied at an alarmingly fast rate, producing a useful but bewildering array of theoretical frameworks, diagrams, software and services. But we have to ask a question that is particularly relevant to this conference's theme of "Growing the Curation Community": have libraries and archives made adequate progress in implementing the procedures, tools and services to actually preserve digital records?

Implementation in the Archives and Special Collections Community
The library, archives and special collections communities would certainly be much less able to capture and preserve digital materials if the work described above had not been completed. The existence of numerous conferences, workshops and doctoral curricula, along with a few well-placed success stories, might lead us to conclude that all is well. But a closer look reveals that few institutions are systematically addressing electronic records issues across the entire range of documentary formats for which they have a mandated responsibility. Few college and university archivists have made practical progress in preserving electronic information. A recent study found that only 49% of institutions have an electronic records program. The programs that do exist have, by and large, failed to capture a broad range of institutional records beyond research publications and reports (Zach & Peri, 2010).
A recent OCLC Research report echoes these findings. Many institutions are collecting born-digital materials on an ad hoc basis but lack systematic management plans for the materials. For example, 45% of surveyed institutions have not assigned responsibility for born-digital materials to a designated agency or agencies, such as a special collections unit, institutional repository or an outside custodian (Dooley & Luce, 2010).
Several factors impede archives from developing digital curation programs. Over one-third of Zach and Peri's interviewees noted that their programs lacked proper administrative support. Overwhelming majorities requested guidance in developing an appropriate policy framework (Zach & Peri, 2010). Respondents to the OCLC Research survey noted that they lacked funding (69%), planning time (54%), expertise (52%) and institutional support (42%) to implement programs for born-digital materials (Dooley & Luce, 2010). In the United Kingdom, those teaching digital preservation roadshows found great hunger for simple approaches to managing electronic records (Kilbride & Todd, 2010).
The complexity of electronic records, digital preservation, and digital curation literature, techniques and software (most of which is discussed outside of the major journals read by practicing archivists and records managers) is a clear concern to many archivists. A recent evaluation of the PLATO digital preservation planning tool concluded that the tool is too complex to be of immediate use to most practicing archivists (Belovari, 2010). As Mark Greene and Dennis Meissner have noted, most archives and special collections units face significant backlogs in paper materials, which prevent them from addressing other matters, such as the management of born-digital materials (Greene & Meissner, 2005; Greene, 2010).
Issue 1, Volume 6 | 2011
Given these problems, it is very tempting to say that the preservation of born-digital materials should be left to institutions other than archives. But several factors suggest that archives must play a central role in an institutional strategy to preserve records.

First, archives have traditionally preserved document genres that do not fall within the scope of other agencies. The theories and practices developed to manage unpublished resources must enrich the current practices of digital curation. Cox forcefully and eloquently articulates the need for archivists to guide modern organizations in collecting historical materials (Cox, 2005). Focusing attention on public and corporate archives, Berman and Lavoie argue that archives have played a long-term role in preserving analog materials of corporate bodies and of individuals, as well as records of their parent institutions (Berman & Lavoie, 2010). Specifically, archival theory and practice embody a framework and skill set that preserve context, making records optimally interpretable and usable. Provenance, hierarchical description and aggregate control serve this purpose for analog materials and may be beneficially applied to digital materials as well (Anderson, 2010). Archivists must update their skills, and several archives have provided examples of how traditional archival principles can be applied to making digital manuscripts or personal materials optimally accessible (Martin & Thomas, 2006; Kim, et al., 2006; Forstrom, 2009; Greene, 2009).
At my own institution, archival principles and practices are being used to improve record keeping and digital curation practices. For example, a University of Illinois campus committee recently issued a report concerning the future of the Library. The report recommended that the Library better meet scholars' needs by developing a data curation program as a complement to our institutional repository and in response to National Science Foundation requirements for data management plans. Preserving datasets and publications is a necessary job. But if these recommendations are implemented without considering other documentary genres, the result will be woefully insufficient to the tasks of documenting the university's functions in society or even understanding the significance of a single research project. Institutions must seek to fully document an individual or institution's digital lifestream in the same way that print and analog archives do for the past generation of scholars. At Illinois, the University Archives seeks to fulfill an essential element in this task. Records such as email, spreadsheets, digital photographs, blogs and other informal or unpublished materials are heirs to the analog record formats that have long constituted the stock in trade of archives and special collections libraries. It is incumbent upon archival programs to articulate a set of methods, policies, frameworks and techniques that can make the preservation of these materials a systematic institutional function, complementing the curation of other information (such as publications and research data) that is held in local or disciplinary repositories. Archivists must make this case, and institutions must support it.

Making Digital Curation Archival: The Practical E-Records Method
The archives and digital curation communities share guiding values and principles. By working together more directly, we will achieve greater relevance in our institutions and society. Fogerty (2008) notes that archives can achieve relevance by identifying and serving the parent organization's objectives and by identifying and cultivating external constituencies whose objectives align with those of the parent organization. Other types of repositories share this need. Digital curation will become a systematic institutional function only to the extent that those interested in pursuing it fully collaborate with each other.
In order to collaborate, archives need a method by which they can incrementally build competence (among their own staff) and trust (within their parent organizations and among external constituents) so that they can preserve records that are not being preserved by other repositories. Although the Society of American Archivists and other groups offer a number of training options concerning born-digital materials, these programs lack an organizing principle. Attendees are left to apply specific technical lessons (such as how to preserve PDF files or how to implement a repository system) without an overall framework for program development and growth.
The need is great and the archival community is receptive. The authors of the A*CENSUS survey report, a very comprehensive survey of US archivists, concluded that two of the biggest tasks that challenged practicing archivists were strengthening technical skills and "[i]dentifying effective methods for transferring the knowledge and values acquired through decades of experience to members of the next generation of archivists" (Irons-Walch et al., 2006). It is unrealistic to expect that our parent institutions will garner significant new funding to address electronic records issues in the period of austerity through which we are living. For these reasons, there is a strong need for the community to develop methods that can be used within existing budget lines and funding levels to gradually build a collaborative, expanding program.
During the 2009-10 academic year, while on sabbatical from the University of Illinois, I undertook a three-part project to supply archivists with a method to engage with digital curation issues. The Practical E-Records Project includes three outcomes: 1) an assessment of tools that can be used to appraise, process, preserve and provide access to electronic materials, written from the point of view of a small archives; 2) policy templates that are suitable for customization in a small archives or manuscript repository; and 3) a set of recommendations that will allow archivists to gradually develop a digital archives program within existing staff, resources, and technologies. The recommendations, which will undergo refinement and testing during the upcoming year, may be found at http://e-records.chrisprom.com/?page_id=508.

Project Methodology
The project began with a review of electronic records, digital preservation and digital curation literature from the perspective of a practicing archivist who wishes to apply traditional processing approaches in the new digital context. I undertook the reviews and subsequent software assessments from a very specific perspective: that of a lone archivist who has limited access to information technology and/or budgetary support, but who needs to manage records that have been deposited after their period of active use has passed. I assumed the records being accessioned had not benefited from much, if any, pre-custodial intervention. By approaching the problem of digital preservation in this fashion, I hoped that lessons learned could also be applied in cases where records had undergone more active involvement by an archivist during their period of active creation and use. Such involvement would, in theory, make the challenges of accessioning, processing, and preserving the records easier than if no such involvement had taken place.

Software Reviews and Evaluations
After reading some of the digital curation and preservation literature, I began testing software, using two sets of test records: files from the American Library Association's Office for Intellectual Freedom and the email correspondence of Nobel Prize-winning chemist Paul Lauterbur. Over the course of several months, I read project documentation, installed and uninstalled software, and used it to complete tasks like appraising, identifying, sorting and weeding records; migrating files to other formats; characterizing files; extracting metadata; or storing records in repository software. I rated the tools against a loosely defined set of evaluation criteria, measuring the software's ease of installation as well as its functionality, scalability, sustainability, metadata support and documentation.13 All software reviews were completed and posted to the Practical E-Records blog under the category "Software Reviews."14 While completing the software reviews and reading literature, I developed and posted customizable templates, to be used in a nascent electronic records pilot program. Over time, the software reviews were organized into the set of recommendations, available at the URL listed above.
While my assessments certainly are not intended to be the last word regarding any of these projects, it is my hope that they will offer archivists a rough and ready guide for comparison, using a set of "real world" problems, rather than a long list of functional requirements.
13 Those wishing to conduct a more rigorous assessment of open source software should consult Goh et al. (2006); Wasserman et al. (2006); Nichols & Twidale (2003); and Balnaves (2008). These resources provide some general guides to use in selecting or evaluating open source software. The LAM community has developed evaluation frameworks, under which specific projects are evaluated in reference to an external standard for trustworthiness in preserving digital materials. While these resources are useful for evaluating particular software implementations, they are not suitable for evaluating the development methods, level of community support and practical results of an entire open source project.
14 Practical e-Records: http://e-records.chrisprom.com/?cat=3.
What lessons have I learned? First, that there are many tools that would be useful in establishing a program to deposit and manage electronic records in an archives. Whether an archivist needs to develop a preservation plan, characterize the nature of files, migrate them or store them in a repository, there are many free or paid tools available. In many cases, these tools can be installed with a minimal amount of effort.

However, many tools are unduly difficult to use in a live archival setting. For example, appraising records to be considered for potential inclusion in a submission information packet proved to be extremely frustrating. Prior to deposit, the Office for Intellectual Freedom files comprised 25 gigabytes/31,927 files (of mixed formats) in 2,172 folders and subfolders. No single tool allows an archivist to quickly and easily identify records of enduring value from such a muddle. In the end, several specialized file managers and programs, such as TreeSize, OS X Finder, ReNamer, and the Danish National Archives SABA copying program, proved useful to me. Nevertheless, I was unable to assemble a submission information packet in a timely fashion. This experience of attempting to select files of enduring value points to a considerable hole in current software development efforts and in the OAIS conceptual model, which is silent on what Adrian Cunningham has called "pre-ingest activities" (Cunningham, 2008). Archivists and records producers need a purpose-built tool for conducting records appraisal and other actions (such as deletion and renaming) related to selection. Such tools must create a packet containing the information objects, technical metadata about them, and a record of the decisions and actions that led to their selection and the exclusion of other records. Until this gap is rectified, it will be very difficult for archivists to make any headway in dealing with the reality of the digital lives that records producers are leading.
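To make the idea of an appraisal record concrete, the following is a minimal sketch in Python, not a description of any existing tool. It walks a folder, records basic technical metadata (size and a SHA-256 checksum) for each file, and logs a keep/exclude decision with a reason. The names AppraisalEntry, appraise and write_log, the CSV layout, and the decide callback are all hypothetical and introduced only for illustration; in practice the decision would come from an archivist's review rather than a simple rule.

```python
import csv
import hashlib
import os
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class AppraisalEntry:
    """One row of a hypothetical appraisal log."""
    path: str        # path relative to the appraised folder
    size_bytes: int  # technical metadata: file size
    sha256: str      # technical metadata: checksum for later fixity checks
    decision: str    # "keep" or "exclude"
    reason: str      # the archivist's justification
    decided_at: str  # UTC timestamp of the decision


def sha256_of(path, chunk_size=1 << 20):
    """Compute a SHA-256 checksum without loading the whole file into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def appraise(root, decide):
    """Walk root and return one AppraisalEntry per file.

    decide is a hypothetical callback that takes a full file path and
    returns a (decision, reason) pair.
    """
    entries = []
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit subfolders in a stable order
        for name in sorted(files):
            full = os.path.join(folder, name)
            decision, reason = decide(full)
            entries.append(AppraisalEntry(
                path=os.path.relpath(full, root),
                size_bytes=os.path.getsize(full),
                sha256=sha256_of(full),
                decision=decision,
                reason=reason,
                decided_at=datetime.now(timezone.utc).isoformat(),
            ))
    return entries


def write_log(entries, log_path):
    """Save the appraisal log as CSV so it can travel with the selected files."""
    names = [f.name for f in fields(AppraisalEntry)]
    with open(log_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=names)
        writer.writeheader()
        for entry in entries:
            writer.writerow(asdict(entry))
```

A log of this kind records excluded files as well as selected ones, which is the part of the packet that general-purpose file managers do not produce.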
Tools that can be used to accession, identify, characterize and validate files illustrate a slightly different issue. Although they implement the functions that archivists require, they are difficult to use without extensive tech support. They are meant for integration into other services, but are very difficult for archivists to use on a stand-alone basis. JHOVE, for instance, can only be installed and used by someone who has significant systems administration experience. In addition, many of the tools are very difficult to use in a batch mode.
For example, the FITS toolset is a very useful tool for extracting identification, validation and characterization metadata from many common file formats, tying it together into an XML wrapper. Properly stored with a digital object, the FITS output provides many of the critical pieces of an archival information packet, such as a checksum and preservation description information, which are required to maintain the authenticity of a file over time. However, FITS is a command line tool and cannot be run in batch mode without a separate script or user interface.
A similar problem afflicts file migration tools. Many excellent options exist, but it is very difficult to select the proper tool for any given task or to use the tools in a systematic program. ImageMagick and OpenOffice, for instance, can convert images and documents in batch mode, but require that users submit commands from a terminal prompt. Both deliver inconsistent results unless all parameters are correctly configured at the command line and/or the command is applied to a set of homogeneous files. Both issues will likely cause problems for many archivists.
Xena, developed by the National Archives of Australia, bundles several open-source migration tools into one convenient package. In my experience, it can be easily installed and used on Windows, Mac and Linux platforms. For most archives, it will quickly convert a wide range of files to normalized formats. However, those using Xena should note that support for certain files, such as email, database and video files, is not extensive, so additional action will need to be taken in order to convert them. Although Xena's developers have supplied an excellent set of instructions for developing plugins to support other files, writing a plugin would be a job for only the most technically skilled archivist or, more likely, a developer. Xena also encloses each of the converted files in an XML wrapper, which means that the file can only be viewed in the Xena Viewer application. Although the application works well and allows users to export without the wrapper, viewing files is extremely cumbersome. In addition, some repositories may question the normalization actions that Xena undertakes. If you wish to convert the files to another format (for example, PDF/A instead of OpenOffice document format), separate software would be needed.19
Regarding storage of archival information packets: the installation and maintenance of digital repository software, especially that based on Fedora or DSpace, is beyond the capability of most archivists and even many seasoned IT professionals. In addition, repository software requires extensive configuration and integration with other tools or services, such as archival descriptive software.
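The kind of separate script that a command-line tool needs for batch use can be quite small. The following Python sketch is offered under stated assumptions rather than as a tested recipe for any particular tool: run_batch and fits_command are hypothetical names, and fits_command assumes that a FITS launcher called fits.sh is on the PATH and accepts -i (input file) and -o (output file), which should be checked against the documentation of the installed version. The same wrapper could be pointed at a migration tool by supplying a different command builder.

```python
import subprocess
import sys
from pathlib import Path


def run_batch(source_dir, output_dir, build_command, suffix=".xml"):
    """Run one external command per file found under source_dir.

    build_command(input_path, output_path) must return an argument list.
    Returns (input_path, return_code) pairs so that failures can be reviewed.
    Raises FileNotFoundError if the external program itself is missing.
    """
    source = Path(source_dir)
    target = Path(output_dir)
    results = []
    # Collect the file list first so that new output files are never re-processed.
    for item in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = item.relative_to(source)
        # Mirror the source folder structure under the output folder.
        out_path = target / relative.parent / (relative.name + suffix)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        completed = subprocess.run(
            build_command(item, out_path), capture_output=True, text=True
        )
        results.append((item, completed.returncode))
    return results


def fits_command(input_path, output_path):
    # Assumption: "fits.sh" is on PATH and supports -i and -o as used here.
    return ["fits.sh", "-i", str(input_path), "-o", str(output_path)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python batch.py SOURCE_DIR OUTPUT_DIR
    for path, code in run_batch(sys.argv[1], sys.argv[2], fits_command):
        print("ok  " if code == 0 else "FAIL", path)
```

Keeping the command builder separate from the loop means the list of return codes can double as a simple processing log, and files that failed can be rerun or examined by hand.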

Policy Templates and Guidance
The process of evaluating software demonstrates that many archivists need a framework in which they can overcome the technical hurdles that stand in the way of implementing digital curation. To address this problem, I gradually developed and provided a set of template policies, guidelines and advice that archivists at any institution can use as a starting point for local implementation. Currently, recommendations are provided in the following areas:
Over time, additional templates will be provided.
19 A more complete analysis of Xena is found at http://e-records.chrisprom.com/?p=1081.
The guidelines follow a deliberately light-handed approach, eschewing the mandates and long forms that have tended to characterize electronic records management policies. For example, I developed an "Email Management and Preservation Guideline." It offers easy-to-understand information concerning email storage formats and lists practical steps that any individual can take to ensure that email accounts are left in an archives-ready format.
The Do-It-Yourself Repository deserves special attention. As noted earlier, the digital preservation community established the requirements for a trustworthy digital repository, resulting in literature and standards that are very complex and difficult for practicing archivists to assimilate while meeting daily responsibilities. Like a secure preservation environment for analog materials, a trusted digital repository cannot be built overnight and will require significant skill and advocacy.

In the meantime, we may need to store digital records in the equivalent of the library basement, using existing, imperfect tools and technologies in a way that establishes an adequate level of trustworthiness. The DIY Repository outlines a set of steps that any archivist can use to ensure that records are appropriately managed using whatever pre-existing computing infrastructure might be available (within reason). Using this recommendation will ensure that the institution preserves records and contextual information under a simple set of standards and conventions, leaving them easy to migrate into a turnkey repository system that might be established at a later date. Figure 1 provides a schematic view of the recommendations, which are discussed in full detail online.
It is possible that, over time, an application such as Archivematica, which is currently in the proof-of-concept stage, may provide repositories with a means to replace many of the steps recommended above. In the meantime, the main elements of the program provide a means to bridge the gap between current institutional capacity and future dreams.

Figure 1. Selected Elements of the Do-It-Yourself Repository's Conceptual Model, Superimposed on the OAIS Reference Model's AIP Diagram.