Toward distributed infrastructures for digital preservation: the roles of collaboration and trust

This paper first explores some of the reasons why collaboration is becoming increasingly important in supporting scientific data curation, digital preservation initiatives and institutional repository development. It then investigates the concepts of trust and control used in the organisation science literature and attempts to apply them to the work on trustworthy repositories being carried out by various international initiatives.


Introduction
The long-term preservation of digital heritage is gradually being recognised as one of the grand challenges of the present age, one that will require ongoing attention for many generations to come. The past twenty years have seen a gradual shift in the amount of attention given to this problem, a pattern that has accelerated rapidly since the Task Force on Archiving of Digital Information published its final report in 1996 [1]. In the decade since its publication, this highly influential report has helped to catalyse and focus a large number of institutional, national and international responses to the digital preservation challenge.
One of the key outcomes of the task force was its recognition of the need for a 'deep infrastructure' to underpin the organisational challenges of digital preservation. The task force concluded that digital preservation was not primarily a set of technological problems that could be solved by what it described as "fine tuning a narrow set of technical variables," but instead involved the "grander problem of organizing ourselves over time and as a society ... [to manoeuvre] effectively in a digital landscape" (p. 7). It considered that an important part of this deep infrastructure would be the ability to support a distributed system of digital repositories and other services (p. 21).

Both informal collaborations (associations and alliances) and formal partnerships among contractors and subcontractors will also surely arise, in which responsibilities for archiving are allocated among various other interests in digital information.
Moreover, shared interests in, for example, intellectual discipline, in type of information, in function, such as storage or cataloging, and even interests in the output of information within national boundaries will all form a varied and rich basis for the kinds of formal and informal interactions that lead to the design of particular archival organizations.
A decade later some of these collaborative issues are finally being explored. This paper will first investigate some of the reasons why collaboration is becoming increasingly important in scientific data curation contexts, in digital preservation initiatives and in institutional repository development. It will then investigate the concepts of trust and control used in the organisation science literature and attempt to apply them to the work on trustworthy repositories being carried out by various international initiatives.

The growing importance of collaboration
Inter-organisational collaboration is becoming an increasingly important part of the socio-economic context of the modern world, a trend at least in part promoted by developments in network technology. This is especially true in the commercial sector, where outsourcing, strategic alliances and other types of inter-organisational relationships underpin many developments [2]. Scientific research and development has not been immune from this trend, and research collaboration is recognised as a key part of the scientific research process, including the curation of the data produced by experiment and observation. Collaboration is also seen as a key part of the development of the organisational infrastructures that underpin institutional repository networks and digital preservation more generally.

Scientific collaboration and data curation
Collaboration, both between individuals and institutions, has long been a crucial part of the scientific research process. Research collaboration is a well-established phenomenon that has been studied in detail by sociologists of science and other social scientists interested in the development of research policy and practice [3,4]. This research has shed light on the ways in which the nature of collaboration can differ markedly between scientific disciplines and sub-disciplines. For example, early research in the sociology of science concentrated on the nature of the informal social networks used by scientists to keep in touch with each other and the way in which these were used to create collectives that define disciplinary norms and interpretational paradigms [5]. Other forms of scientific collaboration are considerably more formal and, often emboldened by technology, have become embedded in what are often semi-permanent organisational structures. Ziman [6, pp. 69-70] has commented that the expense and sophistication of scientific instrumentation in fields like high-energy physics or space science means that science in those fields is extremely collaborative, often based on multi-centred research teams that work together for many years. It has been noted, for example, that large experiments in high-energy physics bring together many hundreds of scientists for up to twenty years [7, p. 122]. The rise of e-research has accelerated the collaborative nature of science, and large-scale collaborations are no longer just typical of traditional 'big science' disciplines like high-energy physics or astronomy, but have become an important part of recent initiatives in chemistry, bioinformatics, healthcare and other disciplines [8].
From a study of scientific collaborations in the physical sciences, Chompalov et al. [9] identified four different ways in which they have been organised. The taxonomy they identified includes bureaucratic collaborations with formalised and hierarchical structures and clear lines of authority, leaderless collaborations that also have formalised structures but are managed collegially, non-specialised collaborations that are broadly hierarchical but have an unspecialised division of labour, and participatory collaborations that are fundamentally egalitarian. This last category was most representative of high-energy physics, although the researchers did not discover any other significant relationship between organisation type and disciplinary speciality (p. 752). The types of collaboration adopted by researchers within scientific disciplines and sub-disciplines seem to have an impact on data sharing practices and on data curation more generally. Chompalov et al. also explored the relationships between different collaboration types and the production of scientific knowledge. They found, for example, that non-specialised collaborations tended not to design their own instrumentation. This suggested that such collaborations did not always require the innovation inherent in instrumentation design, but instead were more representative of domains where data collection needs to be standardised over a range of different collecting sites (p. 760). The relationships between collaboration type and data acquisition and sharing practices were quite complicated. All participatory collaborations had data sharing agreements and used data collectively, while leaderless collaborations tended to focus on autonomous data acquisition from different experiments (pp. 761-2). It is not entirely clear what (if anything) these findings mean for the organisation of data curation more generally, but they may suggest that collaborative data curation facilities will emerge first in areas that have a more participatory collaboration pattern or a strong emphasis on data sharing, e.g. bioinformatics, astronomy or the social sciences. Being able to share standardised data may also be the motivation of some non-specialised collaborations.
Scientific data curation, where it exists, tends to be focused at the disciplinary or sub-disciplinary level. This has potential benefits in that data repositories can be embedded within particular research communities and can take advantage of the existence of specialised knowledge and, where necessary, common standards. Data standards often emerge where there is a recognised need for data sharing within particular domains. The existence of standards can also make the development of data centres and repositories viable, whether these be run on a commercial basis or supported by scientific societies or research funding bodies.
The nature of the traditional scientific enterprise (and its underpinning funding structures) meant that historically there was relatively little demand for collaboration on data curation activities across multiple subject disciplines, except in specific instances where sharing was considered useful, e.g. the linking of historical biodiversity information with geographical data [10]. However, the global collaborations that characterise e-research, combined with the development of new generations of high-throughput instrumentation capable of generating vast amounts of data, have refocused attention on the need to "pool resources and to access expertise distributed across the globe" [11]. This has major implications for the development of supporting infrastructures for collaborative e-research. As David [12] has pointed out, the successful development of such infrastructures will depend on dealing with all of the social and legal challenges that will be associated with them:

Curiously, the institutional infrastructure requirements have tended to be overlooked, as though fulfilling them will be easily arranged; whereas they are every bit as complicated as the hardware and computer software, and indeed may prove much harder to devise and implement. This is particularly likely to be the case in regard to collaborative activities that are inter-organizational -- the very sphere in which the vision of Grid-support seems to hold the greatest transformative potentialities.
We will now turn to consider the role of collaborative infrastructures in digital preservation initiatives more generally.

Collaborative infrastructures for digital preservation
Until fairly recently, much of the focus in digital preservation research has been concentrated on technological issues, e.g. on the development and testing of different preservation strategies and the metadata schemas that have been defined to support them. However, as Lavoie and Dempsey have pointed out, digital preservation is as much about socio-economic and cultural processes as about technology [13].

[Digital preservation] is also a social and cultural process, in the sense of selecting what materials should be preserved, and in what form; it is an economic process, in the sense of defining what rights and privileges are needed to support maintenance of a permanent scholarly and cultural record. It is a question of responsibilities and incentives, and of articulating and organizing new forms of curatorial practice. And perhaps most importantly, it is an ongoing, long-term commitment, often shared, and cooperatively met, by many stakeholders.
Consequently, digital preservation activities have often been the focus of collaborative initiatives. At one level, these include national strategic alliances like the Digital Preservation Coalition (DPC) in the UK and nestor (Network of Expertise in long-term STORage) in Germany, as well as programmes like the National Digital Information Infrastructure and Preservation Program (NDIIPP) in the US. Two recent reports have outlined in some detail those national networks that have been created to deal with digital preservation challenges. Verheul [14] focused on national library initiatives in 15 countries. Half of the libraries assessed were part of national networks but their nature varied widely: "sometimes the framework primarily provides funding and sometimes more practical facilities are offered to improve cooperation, such as coordinating offices, embedding within project organisation, websites, facilitating meetings and seminars" (p. 56). The report also noted that national libraries are often key players in facilitating co-operation on digital preservation on a national level and that a number of countries are beginning to develop national strategies. International co-operation, at least for national libraries, seems to be mainly focused through well-established organisations like the International Federation of Library Associations and Institutions (IFLA) and the Conference of Directors of National Libraries (CDNL), although more specialised initiatives like the International Internet Preservation Consortium (IIPC) also exist.
The second study was undertaken by Severiens and Hilf [15] for the German nestor initiative and was focused on developing a profile for a national long-term preservation policy. In this, three organisational models are sketched: centralised, decentralised and a hybrid one which is mostly decentralised but with a level of coordination. The report also contained a short review of co-ordination efforts in Europe, Australia and North America.
Other collaborative initiatives in the digital preservation domain have a far more practical focus, e.g. work on preservation metadata and object packaging standards, on repository architectures, and on the testing and implementation of preservation strategies. Specific collaboration on preservation infrastructures tends to focus on shared services like registries, e.g. for file format or other types of representation information [16].

Collaboration and institutional repositories
An area that has focused attention on collaboration within the broader digital preservation context has been the development of institutional repositories [17]. Politically motivated initiatives to encourage 'open access' (OA) to the outputs of publicly funded research (including data), coupled with the widespread availability of open-source repository software and interoperability tools like the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), have led to the growing deployment of services known as 'institutional repositories.' Crow [18, p. 16] has defined these as "a digital archive of the intellectual product created by the faculty, research staff, and students of an institution." While initially conceived as a means to facilitate access to research outputs (chiefly peer-reviewed research papers), it was soon realised that institution-based repositories also offered an opportunity for universities and other research organisations to reclaim responsibility for the management and distribution of a wide range of digital assets, including research papers, technical reports and working papers, scientific datasets, learning resources and, in some cases, administrative records. In many cases, the setting-up of a repository implied an institutional commitment to the ongoing management of such information. For example, Lynch [19] has said that university-based institutional repositories represent an 'organisational commitment' to the stewardship of digital materials, "including long-term preservation where appropriate." However, institutional repository advocates have never claimed that all institutions with a repository (or repositories) would need to preserve content themselves. Instead, they argue that repositories will need to collaborate with each other as well as with third party services like registries or dedicated long-term preservation services, all of which could be co-ordinated to a lesser or greater extent on a national level [20].
In these scenarios, digital preservation functions are often assumed to be the responsibility of a third party, e.g. preservation services provided by regional consortia, the larger national or research libraries, or data centres. A good example of this is the DARE (Digital Academic Repositories) programme in the Netherlands, where the national library, the Koninklijke Bibliotheek, was given the task of developing and implementing a strategy and infrastructure for providing long term access to the content deposited in participating institutional repositories [21].
Sometimes the focus is on third party services filling gaps left by other stakeholders.
There have been a number of institutional repository projects that have been specifically concerned with integrating digital preservation functionality. For example, the SHERPA DP [22] project investigated the design of a shared preservation environment. In this, the project partners first articulated a disaggregated framework based on the Reference Model for an Open Archival Information System (OAIS) that would enable preservation services to be outsourced to third parties. Within this general framework, it was envisaged that participating repositories would regularly transfer content (and its appropriate metadata) to a third party service for long-term preservation. Other projects have focused on more modular approaches. In these, repositories interact with multiple services within a common infrastructure or framework. In the UK, for example, the PRESERV project [23] developed a simple model to show how institutional repositories might interact with multiple third-party services, e.g. for bit-level preservation, object characterisation and validation, and preservation planning (e.g., risk assessments, technology watch, etc.). To provide a concrete example of collaboration within such a network of preservation services, the PRESERV project has explored in detail how format identification tools like PRONOM-DROID [24], available from The National Archives, can be utilised to provide format profiles at the repository level. The project applied this to the Registry of Open Access Repositories (ROAR) to provide format profiles for over 200 repositories. In their final report, the project team suggested that other modular services (e.g. Web-based services) could be developed to deal with other aspects of preservation functionality, e.g. for format validation, preservation planning or migration [25].
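The idea of a repository-level format profile can be illustrated with a minimal sketch. It is not the PRESERV implementation: it assumes that guessing a media type from the file extension (via Python's standard `mimetypes` module) can stand in for the signature-based identification that tools like PRONOM-DROID perform, and the function names `format_profile` and `at_risk`, as well as the notion of a "watch list" of formats, are hypothetical.

```python
from collections import Counter
import mimetypes


def format_profile(filenames):
    """Count files per guessed media type.

    Assumption: extension-based guessing is only a crude stand-in for
    signature-based format identification.
    """
    profile = Counter()
    for name in filenames:
        media_type, _encoding = mimetypes.guess_type(name)
        profile[media_type or "unknown"] += 1
    return profile


def at_risk(profile, watched_types):
    """Return the share of holdings whose type is on a hypothetical watch list."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    flagged = sum(
        count for media_type, count in profile.items()
        if media_type in watched_types
    )
    return flagged / total
```

A profile of this kind could then be passed to a separate preservation planning service, which is the sort of division of labour the modular approach envisages.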

The role of trust and control in supporting collaboration
Management scientists studying the nature of inter-organisational relationships argue that successful co-operation is built upon 'trust,' a concept that is typically defined in terms of confidence in the actions, intentions or goodwill of other parties within a given context. Thus, Ring and Van de Ven [26] have argued that the need to work cooperatively "over sustained periods of time means that ... [managers must] concern themselves with the trustworthiness of other parties to a deal" (p. 488).

Trust and control
The concept of trust has been explored from many different disciplinary perspectives [27], but an important theoretical paper in management science by Mayer et al. [28] has defined it as "the willingness of a party to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor, irrespective of the ability to monitor or control that other party" (p. 712). In inter-organisational networks, therefore, trust is at least partly about participants accepting a level of vulnerability in exchange for certain perceived benefits, e.g. in terms of sharing risk or knowledge. It is also understood that inter-organisational trust is developmental, as it usually builds up as organisations work together over time. For example, Ring and Van de Ven [29] have proposed that trust in the goodwill of other parties "is a cumulative product of repeated past interactions among parties through which they come to know themselves and evolve a common understanding of mutual commitments" (p. 110). Each successive interaction brings higher levels of trust, which may in turn have benefits in terms of reduced transaction costs (e.g. for negotiating contracts) and improved reputation. However, higher levels of trust may also bring additional risks if things go wrong, as evidenced by events like the collapse of Enron [30]. Trust is increasingly seen as an important issue in organisational theory and research, reflecting a growing preoccupation with Internet-based communication and commerce [31] and the development of virtual organisations [32].
Because trust is understood in terms of vulnerability to the actions or intentions of others, it is often contrasted with a related concept known as 'control.' Control refers to the processes that are used to monitor and enforce activities, e.g. through governance structures, contracts or adherence to standards. Control mechanisms can be formal or informal [33, p. 259]. Formal control mechanisms might include the establishment of rules, policies and procedures backed by the monitoring and measurement of business processes or outcomes. Informal value-based control is focused on the development of shared organisational cultures that encourage certain behaviours and outcomes. To the extent that formal control mechanisms are seen to exclude any notion of trust, the two concepts have often been held to be opposites. The importance of control has been justified by the use of phrases like "trust, but verify," and "trust is good, control is better," the latter usually being attributed to Lenin (see note 2). However, some recent studies view trust and control as being more interdependent. For example, Castelfranchi and Falcone argue that control mechanisms can themselves build trust [34]. Möllering views trust and control as a 'duality,' i.e. that they "each assume the existence of the other, refer to each other and create each other, but remain irreducible to each other" [35]. In the establishment of successful networks, both will be necessary.

Trustworthy repositories - towards evaluation principles and frameworks
In the digital preservation domain, most discussions about trust have focused on the development of criteria for the evaluation of repositories and other preservation services. The rationale for this was outlined in the 1996 report of the Task Force on Archiving of Digital Information [1]:

For assuring the longevity of information, perhaps the most important role in the operation of a digital archive is managing the identity, integrity and quality of the archives itself as a trusted source of the cultural record. Users of archived information in electronic form and of archival services relating to that information need to have assurance that a digital archives is what it says it is and that the information stored there is safe for the long term.
Note 2: Some authorities suggest that this aphorism, which is often cited in German ("Vertrauen ist gut, Kontrolle ist besser"), may be derived from Lenin's 1914 essay on Adventurism: "Put no faith in words; subject everything to the closest scrutiny - such is the motto of the Marxist workers". Lenin, V. I.: Collected works, vol. 20, tr. B. Isaacs and J. Fineberg (p. 356). Progress Publishers, Moscow (1964).
The report additionally suggested the need for some kind of certification process that would be able to help establish a climate of trust. Following this, the OAIS Reference Model defined six 'mandatory responsibilities' that organisations needed to discharge in order to operate as an OAIS [36], although a later application of these criteria to the UK National Archives and the UK Data Archive suggested that it would be relatively difficult for any functioning archive not to comply with them [37, p. 10].
The first attempt to identify specific evaluation criteria for 'trusted digital repositories' came in 2002, when an international working group sponsored by the Research Libraries Group (RLG) and OCLC Online Computer Library Center published a set of seven attributes [38]. The first of these was compliance with the OAIS model, the remainder covering a wide range of organisational matters, including administrative responsibility for operational matters, organisational viability in terms of a commitment to long-term stewardship and financial sustainability, demonstrating the existence of appropriate (and accountable) levels of technical and procedural suitability, and basic system security. The working group (p. 17) also defined in more detail the main responsibilities of trusted repositories, including the essential need for organisations to understand their own requirements but also to identify which other organisations might be able to share certain responsibilities and how this might be arranged. They make the important point that comprehensive coverage within collections and effective interoperability across repositories will rely on a shared understanding of duties and roles.
Archivists and librarians need a more thorough understanding of how cooperative digital repositories and repository networks can be implemented and managed, including the use of third-party service providers.Models for the establishment of cooperative archiving services will be useful and necessary, as will be examples of service-level agreements as they apply to digital repositories (e.g., service-level agreements for external suppliers of archival storage).
Like the Task Force on Archiving of Digital Information, the working group assumed that certification would be an essential part of supporting co-operative networks of repositories and other third party service providers (e.g. registries of representation information or storage services). The working group, therefore, recommended the development of a framework and process to support the certification of digital repositories. This led to the formation of a follow-up task force, this time sponsored by RLG in conjunction with the US National Archives and Records Administration (NARA).
The RLG-NARA Digital Repository Certification Task Force focused on the identification of particular certification criteria and the delineation of a certification process that would be applicable to a wide range of different types of preservation repository. Following the issue of a draft version in 2005, the audit checklist was tested in various projects supported by the US Center for Research Libraries (CRL) and the UK Digital Curation Centre (DCC) [39,40]. This work had several important outcomes. The first was the issue of version 1.0 of the TRAC (Trustworthy Repositories Audit & Certification) criteria and checklist, published by the CRL and OCLC in early 2007 [41]. Recognising the increasing diversity of repositories, and the fact that for many of them long-term preservation was not an immediate priority, the compilers of the criteria and checklist encouraged repositories to use the checklist as an audit tool for objective evaluation (p. 5). Those organisations that required a more formal evaluation process could choose to pursue certification, although it was recognised that there would need to be differences, depending on specific organisational or geopolitical contexts (p. 7). Underpinning all this was the need for repositories to document policies, repository development and implementation as part of the audit process (for many organisations, this might be a useful outcome in itself). The list of criteria was divided into three sections. The first of these dealt primarily with organisational infrastructures, including specific criteria related to governance and organisational viability, structure and personnel, policies and procedural accountability, financial sustainability, and legal issues related to contracts, licenses and liabilities. The second set of criteria was concerned with the more practical aspects of managing digital objects. Heavily underpinned by OAIS concepts and terminology, this section provided detailed criteria relating to the acquisition and ingest of content, preservation planning, archival storage, information management (including metadata), and the provision of access. The third section provided more detailed evaluation criteria on technologies, technical infrastructure and security.
Other audit and certification initiatives also built, at least in part, on the RLG-NARA Task Force's work. For example, in Germany a working group of the nestor initiative published a draft Catalogue of Criteria for Trusted Digital Repositories in 2006 [42]. Taking into account the draft RLG-NARA checklist, the DINI-Zertifikat [43], and other approaches, the working group published a comprehensive list of criteria that also fed back into further development of the TRAC methodology.
The Digital Curation Centre, in conjunction with the European Union 6th Framework Programme project Digital Preservation Europe (DPE), has built upon the principles that underpin TRAC and other repository assessment initiatives to develop a draft self-assessment toolkit known as the DCC/DPE Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) [44]. As the name implies, the toolkit is primarily concerned with identifying risks, thus helping to transform "controllable and uncontrollable uncertainties into a framework of manageable risks" (p. 11). The developers of the toolkit consider that it will mostly be used for self-assessment, i.e. as a means of guiding repository administrators and other staff "to identify the risks that carry the most profound implications with respect to their own organisation's business continuity, to determine the success with which they are able to anticipate, avoid, mitigate and treat risks, and to maintain appropriate evidential documentation to ensure that any conclusions of this assessment are verifiable, even if only needed internally" (pp. 23-24). Use of the tool will help organisations to fully document their mission, aims and objectives, but will also help to identify and categorise specific risks and provide a means of directing resources to meet the most important areas of concern. In addition, it might also help to prepare the organisation for a formal external audit using assessment criteria based on TRAC or something similar.
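The way a risk-based self-assessment can direct resources to the most important areas of concern may be easier to see in a small sketch. The following is illustrative only and is not taken from the DRAMBORA toolkit: it assumes that each risk is scored on two small ordinal scales (probability and potential impact) and ranked by their product, and the class `Risk`, the function `prioritise`, the scale values and the example risks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int  # assumed ordinal scale: higher means more likely
    impact: int       # assumed ordinal scale: higher means more damaging

    @property
    def severity(self) -> int:
        # Assumption: severity is simply probability multiplied by impact.
        return self.probability * self.impact


def prioritise(risks):
    """Return the risks ordered from highest to lowest severity."""
    return sorted(risks, key=lambda risk: risk.severity, reverse=True)


register = [
    Risk("loss of funding", probability=2, impact=6),
    Risk("format obsolescence", probability=3, impact=3),
    Risk("storage media failure", probability=4, impact=4),
]

for risk in prioritise(register):
    print(f"{risk.severity:>3}  {risk.name}")
```

In practice the scores would need to be backed by the evidential documentation the toolkit calls for, so that the resulting ranking can be verified by others.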

Trust and control in digital preservation networks
In terms of the management and organisational science concepts discussed above, it is clear that most discussion of trust in the digital preservation domain has been concerned with the establishment of control mechanisms, i.e. the identification of suitable criteria for the evaluation and assessment of repositories. For example, the principles and best practices identified in the TRAC and nestor checklists could form the basis of a benchmark standard against which organisations in certain operating contexts could be assessed. This may be a suitable approach, for example, where third party repositories take responsibility for digital storage and where depositing organisations need confidence that these services are able to do what they claim. Self-assessment tools like DRAMBORA are also a type of control mechanism, although they perhaps tend towards the more informal side of the control continuum. In DRAMBORA, documentation and risk analysis could be used to help develop shared organisational cultures that are focused on solving long-term preservation challenges in an incremental and managed way.

Conclusions
Trust comes to the fore in many of the other areas of digital preservation where collaboration is necessary. This includes, for example, participation in strategic alliances and research initiatives, and in the provision of shared services like registries. For example, cultural heritage organisations with a long history of managing and preserving non-digital objects may not be able to demonstrate immediate competence with digital materials, but third parties may still have justified confidence in their institutional (or legal) mandates, proven sustainability and long-term track record. Similarly, data archives in the sciences can gain trust by their close integration into particular research communities or through mandates from research funding bodies or similar. Trust in the continued existence of infrastructure components like registries may be slightly more problematic, although a focus on distributed governance and ownership may be of some help here. At the very least, control mechanisms like DRAMBORA may help to identify the specific risks of working with third party services within collaborative networks and may in time help solve them.