DCC DIFFUSE Standards Frameworks: A Standards Path through the Curation Lifecycle

This paper is based on the paper given by Sarah Higgins at the 4th International Digital Curation Conference, December 2008. It reports on an ongoing Project, DCC DIFFUSE Standards Frameworks, 'to identify and map the standards relevant to Digital Curation against the lifecycle process they support.'


Introduction
Effective long-term curation and preservation of digital information rely on the implementation of appropriate standards and technologies which support curation processes over the entire lifecycle of digital material. With hundreds of standards to choose from, many in multiple versions, selecting those suitable for curation and preservation actions can be a daunting task.
The DCC DIFFUSE Standards Frameworks Project offers domain-specific advice on relevant standards. The DCC Curation Lifecycle Model (Higgins, 2008) is used to contextualise standards and visually present searchable frameworks. This helps digital curators identify which standards they should be using, and where they can be appropriately implemented. The use of Web 2.0 technology encourages community engagement and ensures information regarding standards usage is maintained. Meanwhile, the Project is actively working with organisations from different disciplines to identify and develop appropriate frameworks.

Standards in Digital Curation
Information technology standards facilitate the implementation of solutions for creating and storing digital material, as well as supporting its subsequent access, use and reuse. Many standards are generic, and offer functionality which can be used across different disciplines to support curation. They include file formats, reference models such as OAIS, persistent identifier standards and standards designed to support remote access, deposit and authentication. Other standards are discipline specific, and have been developed for a particular purpose which is not widely applicable. In particular, metadata standards, authority files and XML-compliant markup languages are often very specific to the material being described, and represent the result of intra-disciplinary collaborations. Examples are widespread and include: the metadata structure standard ISAD(G) (General International Standard Archival Description), designed for describing archives, which can be marked up in EAD (Encoded Archival Description); and markup languages such as Chemical Markup Language and MathML.
Implementers combine both types of standard to develop frameworks which can be used to manage their digital information effectively. These frameworks can also be the result of disciplinary collaborations, with domains sharing the problem of identifying sets of standards which can achieve a community aim.
Implementing standards frameworks can have multiple benefits to a community, including: encouraging the achievement of community objectives through consistent and increased participation; sharing of resources, procedures, architectures, metadata profiles and access terminologies; and interoperability of hardware, software and data.
Developing and sharing effective standards frameworks can increase business effectiveness through efficiency savings and by ensuring legislative compliance. At the same time, sustainable and viable systems with effective workflows can be implemented with reduced organisational design work. For some disciplines frameworks are de facto, relying on community agreement for their application; an example is the Joint Information Systems Committee (JISC) Standards Catalogue, which details recommended standards for the projects JISC funds. Other frameworks, such as the UK Government's e-Government Interoperability Framework (eGIF), are mandated to ensure interoperability across a sector.
The benefits of standards frameworks in ensuring consistency of approach, and consequent interoperability and collaboration, have been explored by the UKOLN Interoperability Focus. Interoperability with the cultural and heritage sector is discussed by Gill and Miller (2002), and reports from orchestrated meetings with the sector noted a willingness to collaborate on defining standards frameworks (Miller, Dawson & Perkins, 2001). The benefits of open standards to the digital libraries community, in avoiding vendor lock-in and supporting effective collaborative implementations, have also been examined (Dunning et al., 2005).

Standards Frameworks and the Curation Lifecycle
The continuity of digital material is best assured by a lifecycle approach to its management. The benefits of this approach to archiving digital objects were discussed by Hodge (2000), and to the curation of digital information by Pennock (2007). The DCC is committed to promoting the lifecycle management of digital assets, and has developed the DCC Curation Lifecycle Model (Figure 1) to facilitate planning. This lifecycle approach to curation needs to be underpinned by the implementation of appropriate standards and technologies. The Model can facilitate the planning of frameworks which ensure support for all parts of the lifecycle.
Standards frameworks intended to support the curation lifecycle should ensure that the recommended technologies maintain the authority of digital material, as defined by ISO 15489 (International Organization for Standardization [ISO], 2001a, 2001b). Authenticity (i.e., where the material is what it purports to be) is maintained through: access controls; appropriate metadata; consistent use of persistent identifiers; and bitstream calculations such as checksums to ensure data have not been corrupted or tampered with. Reliability (i.e., where the contents can be trusted) is ensured through the maintenance of complete, organised and accessible material. Integrity (i.e., where the material is complete and unaltered) relies on protection by authority control. Usability (i.e., where the material can be located, retrieved, presented and interpreted) is maintained through: the implementation of systems appropriate to the business aim; inclusion of a comprehensive range of material for contextual understanding; and systematic management of material throughout the lifecycle. Additionally, standards frameworks for curation will ideally support interoperability, maximise accessibility, avoid vendor lock-in, provide architectural integrity, and help to ensure long-term preservation.
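The bitstream calculation mentioned above as a support for authenticity can be illustrated with a minimal sketch. It assumes SHA-256 as the checksum algorithm; ISO 15489 does not prescribe one, and the function names and sample data are hypothetical.

```python
import hashlib
import hmac


def fixity_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a bitstream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_checksum: str) -> bool:
    """Compare a freshly computed digest with the one recorded earlier."""
    return hmac.compare_digest(fixity_checksum(data), recorded_checksum)


# Hypothetical digital object; the checksum would normally be stored
# alongside the object's metadata when it enters the archive.
original = b"minutes of the records committee, 2008"
recorded = fixity_checksum(original)

print(is_unaltered(original, recorded))         # True
print(is_unaltered(original + b"!", recorded))  # False
```

Repeating such a comparison at intervals throughout the lifecycle gives evidence that the material has not been corrupted or tampered with, although it says nothing about who may have altered it, which is where access controls and metadata come in.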

DCC DIFFUSE Standards Frameworks
In some domains, the benefits of standards are well understood, and comprehensive frameworks have been developed, documented and made accessible for potential adopters. DCC DIFFUSE Standards Frameworks has started to capture these frameworks to make them more widely accessible. The Project graphically explains the way in which the standards included in a framework can be concurrently implemented to achieve curation aims. At the same time, the Project can act as a depository for organisations, consortia and projects, enabling them to document the frameworks they have developed in a consistent manner, manage them in one location and advertise them to others seeking curation solutions.
The resource consists of a browsable database of standards relevant to digital curation and preservation. Users can opt to browse by choosing a relevant Framework. The DCC Curation Lifecycle Model then offers a graphical searching tool, indicating the appropriate stage for implementation of the standards documented within the Framework. These contextualisations will help users to identify readily which of the many standards included in the database are appropriate to their own situation. They can identify which standards are designed to support the curation actions they wish to plan, aiding informed choices regarding implementation. The contextualisation also enables users to identify gaps in the curation planning process, as well as areas where additional standards need to be considered, or even developed. The database also offers a number of other browsing options: by title; by the technical function the standards support; and by the organisation responsible for their development.
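The idea of using the lifecycle to expose gaps in curation planning can be sketched as follows. The action names are the sequential actions of the published DCC Curation Lifecycle Model; the framework itself, and the assignment of standards to actions, are invented for illustration and are not taken from the DCC DIFFUSE database.

```python
# Sequential actions of the DCC Curation Lifecycle Model (Higgins, 2008).
LIFECYCLE_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]

# Hypothetical framework: the standards a community has assigned to each
# lifecycle action. These assignments are assumptions, not real data.
example_framework = {
    "Create or Receive": ["ISAD(G)", "EAD"],
    "Ingest": ["OAIS"],
    "Store": ["OAIS"],
}


def find_gaps(framework: dict[str, list[str]]) -> list[str]:
    """List lifecycle actions for which the framework names no standard."""
    return [action for action in LIFECYCLE_ACTIONS if not framework.get(action)]


gaps = find_gaps(example_framework)
for action in gaps:
    print("No standard identified for:", action)
```

A planner reading such a list would then decide whether an existing standard can be adopted for each uncovered action, or whether one needs to be developed.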
Information regarding individual standards is being formulated using a profile of the Standards Metadata Element Set, v3.0, which was specifically developed for documenting standards by the ANSI-hosted Standards Registry Committee. All standards are classified according to the frameworks in which they are included, the lifecycle function they support and their technical function. Users are able to identify: previous versions of a standard; those standards which are referenced within a standard and need to be used in conjunction with it; and those which have been created by the same body. Descriptions link to: standards documentation; information concerning sponsoring bodies; and further useful documentation concerning a standard, such as implementation guidelines, XML schemas or best practice guidelines.
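A description of this kind, and the browsing it supports, can be sketched as a simple record structure. The field names below are hypothetical and do not reproduce the Standards Metadata Element Set; the sample classifications are likewise illustrative assumptions rather than entries from the DCC DIFFUSE database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandardRecord:
    """Hypothetical description of one standard (not the real element set)."""
    title: str
    developer: str
    technical_function: str
    frameworks: list[str] = field(default_factory=list)
    lifecycle_actions: list[str] = field(default_factory=list)
    previous_versions: list[str] = field(default_factory=list)
    referenced_standards: list[str] = field(default_factory=list)


# Illustrative records only; the classifications are assumptions.
records = [
    StandardRecord("ISAD(G)", "International Council on Archives",
                   "metadata structure", frameworks=["Archives"]),
    StandardRecord("EAD", "Society of American Archivists",
                   "markup language", frameworks=["Archives"],
                   referenced_standards=["ISAD(G)"]),
    StandardRecord("MathML", "W3C", "markup language"),
]


def browse(records: list[StandardRecord], *,
           framework: Optional[str] = None,
           technical_function: Optional[str] = None,
           developer: Optional[str] = None) -> list[StandardRecord]:
    """Return records matching every supplied criterion, sorted by title."""
    matches = []
    for record in records:
        if framework is not None and framework not in record.frameworks:
            continue
        if (technical_function is not None
                and record.technical_function != technical_function):
            continue
        if developer is not None and record.developer != developer:
            continue
        matches.append(record)
    return sorted(matches, key=lambda record: record.title)


archives_titles = [r.title for r in browse(records, framework="Archives")]
markup_titles = [r.title for r in browse(records, technical_function="markup language")]
print(archives_titles)
print(markup_titles)
```

Holding the classifications as lists allows one standard to appear in several frameworks, which matches the way generic standards are shared across disciplines.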
The English-language version of Wikipedia is being used to manage standards descriptions and encourage community participation. Early test descriptions created for the Project showed that this was the first resource consulted when researching data for the fields documenting both functions and usage of standards, particularly as the actual standards documentation is not always readily available. This, and the enormous task of keeping the descriptive information up to date with limited staffing and budget, led to the decision to encourage the community to undertake the maintenance of this information. The possibility of a custom-built DCC DIFFUSE wiki for this purpose was considered, but it was decided that this was unlikely to achieve the community buy-in possible with the high-profile and apparently stable Wikipedia. DCC DIFFUSE encourages users and collaborators to create Wikipedia entries where none currently exist, to keep a weather eye on existing descriptions, and to make occasional corrections and updates. The feasibility of linking to a particular "DCC-endorsed" version of a Wikipedia entry, and of formatting the standards pages within Wikipedia for future harvesting into the DCC resource, was examined.
Figure 1. The graphical component of the DCC Curation Lifecycle Model (Higgins, 2008).

Contributions to DCC DIFFUSE Standards Frameworks
The Project has completed the documentation of the framework for the Records Management community identified by MoReq2 (Serco Consulting, 2008), and is currently working with both the digital repositories community and the archives community to capture the frameworks identified by the Driver Project (Foulonneau & Francis, 2007) and the UK Society of Archivists Data Standards Group respectively. The latter have been developing a framework, which is currently the focus of a series of articles in their members' newsletter, Arc. This framework has been developed, in conjunction with the DCC, for presentation in DCC DIFFUSE. It is hoped that it will also be accessible from the Society of Archivists website. It is planned that these two frameworks will be completely documented by the end of February 2010.
Collaborative work with the Open Geospatial Consortium (OGC) has been suspended for the time being, owing to pending funding changes across the DCC. The OGC intended to undertake data entry for the Project for the standards it develops and recommends, ensuring maximum exposure to relevant adopters. It is hoped this work will continue in the near future. A number of other communities have shown an interest in depositing their Standards Frameworks with DCC DIFFUSE. They include representatives of the web archiving, museums, particle physics and eScience communities.

Possible Further Growth
Initially the DCC approached organisations asking them to participate in DCC DIFFUSE. Latterly the situation has reversed, with increased demand both for participation in building the resource and for its use as an information source. Unfortunately, future funding for developing the activity within the DCC has been significantly reduced, and so further development will depend on increased community participation to build a collaborative repository owned by the community. This can be undertaken using the approaches reported here.

Origins of DCC DIFFUSE Standards Frameworks
The Diffuse Project (Dissemination of InFormal and Formal Useful Specifications and Experiences to Research, Technology Development and Demonstration Communities) was originally funded under the European Commission's Information Society Technologies (IST) 5th Framework Programme and ran from 1 February 2000 until 31 January 2003. When funding terminated for this project, the data created by its partners were retained online as a valuable information resource. Unfortunately no funding was available to maintain the resource, and in the rapidly moving world of information technology it became outdated. In 2005, the DCC secured permission to re-purpose the content and has redeveloped the concept into the browsable database described above.
The redeveloped DCC DIFFUSE contextualises the standards it includes, both by the specific domains which find them useful and by the lifecycle action that they support. It shows how particular sectors use domain-specific and generic standards together to achieve their curation aims and maintain the authority of their digital material. This should give users a greater understanding of the applicability of standards to their own situation, and offer a more informed choice regarding which to implement. Additionally, the harnessing of Wikipedia's volunteers should ensure that the resource does not become so badly outdated if it is not actively maintained for a period.

Conclusions
The DCC has further extended the concept of the original Diffuse Project to create a newly dynamic resource which enables the documentation of domain-specific standards frameworks that support the lifecycle management of authoritative records. Contextualisation using the DCC Curation Lifecycle Model enables users to identify appropriate standards and the suitable lifecycle stages for their implementation, facilitating detailed curation planning and the realisation of curation activities.
Originally developed in XML, HTML snapshots from the Diffuse Project were retained online by IS-Thought until early 2009.