34 DCC&U

The proliferation of Web, database and social networking technologies has enabled us to produce, publish and exchange digital assets at an enormous rate. This vast amount of information that is either digitized or born-digital needs to be collected, organized and preserved in a way that ensures that our digital assets and the information they carry remain available for future use. Digital curation has emerged as a new inter-disciplinary practice that seeks to set guidelines for disciplined management of information. In this paper we review two recent models for digital curation introduced by the Digital Curation Centre (DCC) and the Digital Curation Unit (DCU) of the Athena Research Centre. We then propose a fusion of the two models that highlights the need to extend the digital curation lifecycle by adding (a) provisions for the registration of usage experience, (b) a stage for knowledge enhancement and (c) controlled vocabularies used by convention to denote concepts, properties and relations. The objective of the proposed extensions is twofold: (i) to provide a more complete lifecycle model for the digital curation domain; and (ii) to provide a stimulus for a broader discussion on the research agenda.


Introduction
We live in a world where our personal and collective memory is informed by all kinds of digital information: personal images and videos, work documents, spreadsheets, e-books, emails, blogs, RSS feeds etc. The proliferation of Web and social networking technologies has enabled us to communicate, exchange and produce new digital assets at a phenomenal rate. The same trend is also evident in both private and public sector organisations. The adoption of database technologies and, more recently, the Web, has led to an increasing volume of digitized or born-digital objects that not only help streamline daily operations and services, but also shape our sociocultural experience and identity.
This deluge of digital information introduces new requirements for the process of appraisal, preservation and management, related to the need to ensure that our digital assets, and the information they carry, remain available for future use. Certain critical questions have emerged to this end, as researchers and practitioners have become aware of both the severity and the universality of the problem, in the context of digital preservation research and practice: How can we ensure the authenticity and integrity of digital objects? What should we preserve, in the face of this digital information deluge, and what not? How are we to ensure usability and accessibility of these evolving information assets, as their context of use changes? Digital curation has emerged as a new interdisciplinary practice, community of practice, and field of inquiry, that seeks to find answers to these questions. Its fundamental principle is that ensuring future fitness for purpose of digital information, as its context of use evolves, requires the active management and appraisal of digital assets over their entire lifecycle.
The UK's Digital Curation Centre (DCC) has been a leading advocate of the need to approach digital information management from a disciplined lifecycle perspective. The DCC curation lifecycle model was published in Higgins (2007). Issues of authenticity and integrity, strategies to provide for adequate knowledge representation and access, support for a predictable preservation lifecycle of assets, as well as attention to the interests of particular communities of practice, such as archivists and researchers, have been major areas of interest for the DCC. The DCC also plays a significant role as a centre for advocacy in learning on digital curation issues, not least on account of its ambitious digital curation manual publishing project.
Work in the Digital Curation Unit, Athena Research Centre, Athens, on the other hand, stems from extensive work in cultural heritage informatics, especially in the fields of cultural domain ontologies (Bekiari, Constantopoulos & Doerr, 2007; Constantopoulos, Doerr, Theodoridou & Tzobanakis, 2002, 2004; Constantopoulos & Dritsou, 2007; Kakali et al., 2007; Stasinopoulou et al., 2007), of natural language processing (Androutsopoulos, Aretoulaki & Mitkov, 2003; Androutsopoulos & Galanis, 2005; Galanis & Androutsopoulos, 2007; Malakasiotis & Androutsopoulos, 2007), metadata in the field of digital libraries, archives and public sector information (Bountouri, Papatheodorou, Soulikias & Stratis, 2008) and dynamic management and maintenance of views in databases and data warehouses (Kotidis & Roussopoulos, 2001).
A recent analysis of curation practices in the field of museums and heritage, based on an activity theory methodology (Kaptelinin & Nardi, 2007), advocated the need for further study of discipline- and context-dependent digital curation practices, in parallel with the existing codification of "digital curation" as a universal field, cutting across diverse use contexts and disciplines. It also highlighted the need to account for the evolving nature of occurrence-level information objects (in a scholarly knowledge or, even, public communication context) as these objects co-evolve with successive states of scholarly/scientific and common knowledge (Dallas, 2007). A conceptualisation of digital curation practice and field of inquiry, proposed on the basis of these insights, introduced a broader digital curation lifecycle including not only the activity of appraisal, but also that of use experience, consonant with recent Web 2.0 and social computing practices. It provided for semantic notions of "knowledge enhancement" and "presentation" for digital information assets, which go beyond the syntactic notion of "transform" in the DCC approach; and called for attention and explicit support for the overarching context management processes of goal and usage modelling, domain modelling, and management of authorities (Constantopoulos & Dallas, 2008).
The International Journal of Digital Curation
Despite their differences, it may be recognised that both approaches are geared towards achieving (a) trustworthiness of digital resources, (b) organisation, archiving and long-term preservation, and (c) added-value services and new uses for the resources. The DCU approach, however, further recognizes that stakeholders in the curation of digital information include not only the custodians of preserved assets (such as librarians or data managers), but also those concerned with the production and communication of knowledge (i.e., the full spectrum of research communities), as well as the data users who are accessing this knowledge. Consequently, the validity and usefulness of digital information objects as "fit for purpose" depends, crucially, on knowledge representation that is adequate and appropriate to such a broad context.
In this paper we re-examine and augment the model of digital curation proposed by the DCC, using some of the key elements of the process-oriented approach of the DCU. We do not, at present, attempt to integrate the context management processes of goal and usage modelling, domain modelling and authority management. While these processes constitute much needed components of an effective digital curation approach, as argued in Constantopoulos and Dallas (2008), we adopt at present a stepwise approach and focus on harmonization of the process lifecycle, which, after all, is the focus of the DCC model.
In particular, we argue that the digital curation lifecycle should be extended by:
• The registration of the user experience while accessing the data. This user experience is recorded in session logs, in observational data and in traces produced by the interaction of the user with the resources, such as social tags, annotations, and other Web 2.0 artefacts.
• An action of adding knowledge to repositories of digital resources. This added knowledge represents a new way of looking at, or combining, the primary resources and prior knowledge. Any added knowledge may itself evolve, thus producing secondary, autonomous digital resources.
• The inclusion of controlled vocabularies (e.g., geographic names, historical periods, chemical molecules, biological species, etc.) used by convention to denote concepts, properties and relations.

Our proposed lifecycle model for digital curation is intended to support the planning and organization of digital content management by pinpointing important curation activities that were not included in the DCC model. The enhanced model proposed here is more comprehensive than either of its constituent models; the added complexity presents, admittedly, additional implementation challenges, but is, in our view, a better match to the complexity of real-world digital curation.
In the next section we review the research background to digital curation. We then present the digital curation lifecycle model introduced by the DCC. Subsequently, we discuss the main features of the process-oriented approach for digital curation introduced by the DCU. Based on these two models, we outline some crucial extensions to the DCC model and introduce DCC&U as an extended lifecycle model. The last section of this paper presents concluding remarks.

Related Work
Digital curation is an interdisciplinary domain that combines skills and practices from many disciplines such as computer science, archival science, librarianship and information science. The practice of digital curation extends to multiple fields of activity, embracing research disciplines from the humanities to the sciences, as well as the collections of outputs from these disciplines, whether they are to be found in e-science repositories, in the custody of institutional records managers, or in museums, libraries and archives. An activity-theoretic study of digital curation in the field of museums and cultural heritage demonstrated that it may take place at all stages of engagement with cultural heritage information, from excavation, fieldwork and collection, to museum-based scholarly research, exhibition curating, and visitor-centric experiences. Actors involved in digital curation may include field researchers, registrars and documentalists, information managers, scholars, educationalists, exhibition curators, and the public (Dallas, 2007). Owing to this multidisciplinary landscape, diverse lifecycle models have been created to reflect the needs of different fields of application. An example in the area of Personal Digital Archives can be found in the Paradigm Project's Lifecycle for the long-term preservation of digital archives (Paradigm Project, 2006). The SHERPA DP Project (Knight, 2006) and the LIFE Project (Wheatley, Ayris, Davies, McLeod, & Shenton, 2007) can serve as examples in the area of institutional repositories. In the area of electronic records management, the InterPARES Chain of Preservation Model provides a comprehensive and well-documented process.
Apart from lifecycle models found in projects implemented to meet the needs of various disciplines, there also exist standards and specifications for individual key areas of digital curation such as lifecycle planning for digital resources. ISO 15489 parts 1 and 2 (International Organisation for Standardisation [ISO], 2001) provide both a best practice framework and implementation guidelines for the management of digital and physical information. OAIS (ISO, 2003) provides a generic conceptual framework for building a complete archival repository. The MoReq Specification (MoReq2 Team, 2007) considers the administrative stages required, as well as applicable standards to implement, when developing technical solutions for digital curation in a corporate environment.
Since its establishment in 2004, the DCC has carried out extensive work to define the principles of digital curation and to provide appropriate tools. Among the DCC's key contributions is a detailed lifecycle model (Higgins, 2007), depicted in Figure 1. The DCC lifecycle model represents the complex processes found in digital curation in a comprehensive and generic model that can be applied to any discipline.

The DCC Curation Lifecycle Model
In this section we briefly present the Curation Lifecycle Model (see Figure 1), first developed by the DCC in 2007 as a "generic graphical high-level overview of the stages required for successful curation and preservation of digital material from initial conceptualisation". Additional information about the DCC Curation Lifecycle Model (referred to as "DCC model" hereafter) can be found in Higgins (2007).
The DCC model is designed to facilitate the organisation and planning of curation and preservation activities within an organisation through the introduction of a series of planned actions and policies. The discrete functions within the DCC model can be conceptually partitioned into three categories: full lifecycle, sequential and occasional actions.
Full lifecycle actions encompass the set of actions that need to be performed throughout the lifecycle of digital objects. There are four full lifecycle actions in the DCC model, depicted as the four inner cycles that encompass the data in Figure 1:
• Description and Representation Information: This includes the administrative, descriptive, technical, structural and preservation metadata that are necessary to adequately describe a digital object in the long term, as well as the information necessary for the understanding and rendering of the object and its metadata.
• Preservation Planning: This action includes the necessary administrative and management plans for the actions of the lifecycle model.
• Community Watch and Participation: This action includes not only using appropriate standards and tools, but also helping in their development and evolution.
• Curate and Preserve: This action includes the need to be aware of, and to undertake all the management and administrative actions planned to promote curation and preservation throughout the curation lifecycle.
Sequential lifecycle actions specify a set of activities that must be undertaken in a specific order, so as to facilitate the curation and preservation process. These actions include:
• The conceptualisation and planning of the creation and storage of data
• The creation or reception of data, and their necessary metadata
• The appraisal of the data for long-term preservation using well-documented guidelines, policies and legal requirements
• The data ingestion, by transferring the data to appropriate repositories, while ensuring that appropriate standards are used during this action
• Undertaking all the necessary preservation actions, such as data cleaning and validation, generating preservation metadata and ensuring acceptable data structures or file formats
• The secure storage of the data
• Ensuring that the data are available to both users and re-users
• The migration of data to different formats and the storage of the results of different selection queries on the data
Finally, occasional actions describe those activities that need to be undertaken less frequently, such as the disposal of data that have not followed proper curation and preservation guidelines, the reappraisal of data that fail current validation procedures for further appraisal and reselection, and the migration of data to a different format.

The DCU Model
In the more recent work of Constantopoulos and Dallas (2008), an alternative approach for viewing the processes encompassed in digital curation was proposed. The goal of these processes is to achieve trustworthiness of digital resources, organisation, archiving and long-term preservation, added-value services and new uses for the resources. The distinctive feature of this model is the explicit consideration of contextual information resources as an object of curation. The processes depicted in Figure 2 are:

• Appraisal: The development of criteria for the evaluation of potential resources as well as the actual selection of the resources that may become subject to subsequent curation processes.
• Ingest: The process involves (a) the digital recording of image, sound, text and data, (b) the digitisation of analog recordings on various physical carriers, and (c) importing digital resources from other sources, including repositories.
• Classification, indexing and cataloguing: Three actions necessary to the production of logical indices for information management and, most importantly, subject indices and indices related to the intended or possible uses of digital resources.
• Knowledge enhancement: A process that refers to the real-world entities, situations and events represented by digital resources, their wider context and domain, and the digital resources themselves; for example, annotating documents with the entities of an ontology they refer to, representing formally the situations or events mentioned in documents, and linking documents to other documents that support or contradict them would all be cases of knowledge enhancement.
• Presentation, publication and dissemination: Processes that include the generation of new artefacts (scientific, scholarly, artistic, etc.) from existing primary or secondary digital resources.
• User experience: This process captures the interaction between users and resources, as well as the effects of this interaction.
• Repository management: A function concerning both actual (centralized or distributed) and virtual repositories, as well as access mechanisms.
• Preservation: A process aimed at safeguarding against risks to longevity, arising from either physical causes or due to technological evolution.

These procedures for digital resource lifecycle management rely on three supporting processes:
• Goal and usage modelling: These two actions capture, respectively, the intentions of the creators and the users of a given class of digital resources, together with the usage patterns of the resources.
• Domain modelling: This process produces or refines representations of expert knowledge about a domain of interest.
• Authority management: A process dealing with the controlled vocabularies (e.g., geographic names, historical periods, chemical molecules, biological species, etc.) used by convention to denote concepts, properties and relations.

DCC&U: A Proposed Extension to the DCC Model
The main purpose of this section is to propose a set of enhancements to the DCC lifecycle model. The objective of these enhancements is twofold: (i) to provide a more complete lifecycle model of digital curation; and (ii) to provide a stimulus for broader discussion on the research agenda.
In comparison to the DCC model, the DCU model emphasizes more explicitly the need for registering and maintaining how the stored, curated and preserved information is utilized and accessed by the users through their queries and their interaction. This need becomes pertinent with the development of Web 2.0, whereby users interact with each other and form user communities that create and modify information assets. This interaction is also visible in social tags, annotations and similar Web 2.0 artefacts (Gavrilis, Kakali & Papatheodorou, 2008; Mika, 2007). We believe that user experience, related to content evolution in such contexts, should be regarded as a "first-class citizen" in a curation lifecycle model and, thus, propose to add an action after the "Access and Reuse" action, in which to record and maintain the User Experience.
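One way to picture the proposed User Experience action is as an append-only registry of usage events attached to resources. The sketch below is illustrative only: the class names `UsageEvent` and `UsageRegistry`, the event kinds and the identifiers are all assumptions, not part of either model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class UsageEvent:
    """One trace of a user interacting with a resource (hypothetical schema)."""
    resource_id: str
    user_id: str
    kind: str   # assumed kinds: "query", "tag" or "annotation"
    value: str


@dataclass
class UsageRegistry:
    """Append-only record of usage events, kept alongside the curated data."""
    events: List[UsageEvent] = field(default_factory=list)

    def record(self, resource_id: str, user_id: str, kind: str, value: str) -> UsageEvent:
        event = UsageEvent(resource_id, user_id, kind, value)
        self.events.append(event)
        return event

    def tag_counts(self, resource_id: str) -> Counter:
        """Count how often each social tag was applied to one resource."""
        return Counter(
            e.value
            for e in self.events
            if e.resource_id == resource_id and e.kind == "tag"
        )
```

Keeping the events immutable and append-only is a design choice of this sketch: it lets the registry itself be treated as a digital resource that can be curated and preserved like any other.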
Another important aspect that is not specifically addressed in the DCC model is the provision for maintaining authorities and how these authorities evolve over time. While domain models deal with definitions of concepts, properties, relations and rules in a domain, a good part of expert knowledge about the domain is captured in authorities used by convention to denote relevant concepts, properties and relations, or their instances. Authorities are bound to evolve as significant changes to the body of domain knowledge occur. Specific procedures need to be adopted so as to safeguard important qualities of digital resources capturing such knowledge, such as coverage, specificity, coherence, consistency and cost-efficiency. Hence, we propose to augment the "Description and Representation Information" action, so as to include information regarding the main entities, concepts and relations, as well as their instances.
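The idea that authorities evolve while remaining usable can be illustrated with a small sketch in which a term keeps a stable identifier and a record of its former labels. The names `AuthorityTerm`, `Authority`, `relabel` and `resolve` are hypothetical, and the approach shown (retaining superseded labels) is only one possible procedure, assumed here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorityTerm:
    """A controlled-vocabulary term with a stable identifier (hypothetical schema)."""
    term_id: str
    label: str
    former_labels: List[str] = field(default_factory=list)


class Authority:
    """A controlled vocabulary whose preferred labels may change over time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._terms: Dict[str, AuthorityTerm] = {}

    def add(self, term_id: str, label: str) -> AuthorityTerm:
        if term_id in self._terms:
            raise ValueError(f"term already exists: {term_id}")
        term = AuthorityTerm(term_id, label)
        self._terms[term_id] = term
        return term

    def relabel(self, term_id: str, new_label: str) -> None:
        """Change the preferred label while keeping the old one resolvable."""
        term = self._terms[term_id]
        term.former_labels.append(term.label)
        term.label = new_label

    def preferred_label(self, term_id: str) -> str:
        return self._terms[term_id].label

    def resolve(self, label: str) -> Optional[str]:
        """Map a current or former label to its term identifier, if any."""
        for term in self._terms.values():
            if label == term.label or label in term.former_labels:
                return term.term_id
        return None
```

Because descriptions written against an earlier state of the authority still resolve to the same identifier, consistency of existing metadata is preserved when the vocabulary changes.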
We consider knowledge enhancement as a crucial process for digital curation. Scholarly and scientific research, as well as professional practice, incrementally generate new knowledge about the real-world entities, situations and events represented by digital resources, about their wider context and domain, or even about the digital resources themselves. This knowledge could be encoded and organized in terms of annotations, rules and/or ontologies using Semantic Web technologies (Papatheodorou, Vassiliou & Simon, 2002). Furthermore, this knowledge could be exploited by intelligent agents, capable of reasoning and inferring new semantics. Each knowledge addition is related to a different view, angle of interpretation or application, thus representing a new way of looking at, or combining, primary resources and prior knowledge. As explained in the previous section, added knowledge may itself evolve. Arguably, the information regarding the curator, the protagonist of each knowledge enhancement task, needs to be maintained and preserved as well.
Thus, knowledge enhancement is an essential component of the curation lifecycle model, one in which knowledge is added on top of an existing repository of digital resources and across its related knowledge base. Since this is an action that needs to be performed throughout the lifecycle of the digital objects, we propose to augment the existing "Curate and Preserve" action of the DCC model into a new "Preservation, Curation and Knowledge Enhancement" action.
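A knowledge enhancement action of this kind can be sketched as a log of statements layered on top of the primary resources, each recording the responsible curator and, where applicable, the earlier statement it revises. The names `Enhancement` and `EnhancementLog`, and the `supersedes` link, are assumptions made for this illustration rather than features of the DCC or DCU models.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Enhancement:
    """One unit of added knowledge about a resource (hypothetical schema)."""
    entry_id: int
    resource_id: str
    statement: str
    curator: str                      # who performed the enhancement
    supersedes: Optional[int] = None  # entry_id of the statement this one revises


class EnhancementLog:
    """Knowledge added on top of a repository; earlier entries are never deleted."""

    def __init__(self) -> None:
        self._entries: List[Enhancement] = []

    def add(self, resource_id: str, statement: str, curator: str,
            supersedes: Optional[int] = None) -> Enhancement:
        if supersedes is not None and not 0 <= supersedes < len(self._entries):
            raise ValueError("unknown entry to supersede")
        entry = Enhancement(len(self._entries), resource_id, statement,
                            curator, supersedes)
        self._entries.append(entry)
        return entry

    def current(self, resource_id: str) -> List[Enhancement]:
        """Entries for a resource that have not been revised by a later entry."""
        superseded = {e.supersedes for e in self._entries
                      if e.supersedes is not None}
        return [e for e in self._entries
                if e.resource_id == resource_id and e.entry_id not in superseded]

    def history(self) -> Tuple[Enhancement, ...]:
        """All entries, including superseded ones, in the order they were added."""
        return tuple(self._entries)
```

Retaining superseded entries reflects the observation that added knowledge may itself evolve: the log records each successive state together with the curator responsible for it.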
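The new model, incorporating these enhancements, is presented in Figure 3. Compared to the original DCC model, this one contains:
• A User Experience action, placed after the "Access and Reuse" action, in which the user experience is recorded and maintained.
• An augmented "Description and Representation Information" action that also covers authorities, that is, information regarding the main entities, concepts and relations, as well as their instances.
• A "Preservation, Curation and Knowledge Enhancement" action, which augments the original "Curate and Preserve" action.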

Conclusions
In this paper we looked at two recent models of the digital curation lifecycle and proposed a new extended model that combines their best elements, while at the same time preserving overall simplicity and comprehensiveness. Our endeavour focuses on harmonizing lifecycle-related processes, setting aside, for the time being, context management processes incorporated explicitly in one of the two models. We hope that this stepwise harmonisation exercise may stimulate further discussion to achieve a more comprehensive understanding of digital curation processes.