Design and Implementation of the Australian National Data Service

This paper will describe the genesis and realisation of the Australian National Data Service (ANDS). It will commence by outlining the context within which ANDS was conceived, both in the international research and Australian research support domains. It will then describe the process that brought about the ANDS vision and the principles that informed the realisation of that vision. The paper will then outline each of the four ANDS programs (Developing Frameworks, Providing Utilities, Seeding the Commons, and Building Capabilities) while also discussing particular items of note about the approach ANDS is taking. The paper concludes by briefly examining related work in the UK and US.


Introduction
There is a rich research literature dealing with the changes to the scholarly communications system brought about by the rise of the Internet (Borgman, 2007). These changes have been underway since the late 1980s, but have accelerated rapidly in the last decade. As a result, a majority of all the scholarly publications produced each year are available online, and in time this will be true for all scholarly publications produced in any given year, as well as an increasing backfile. Some of these publications are available in toll-access publications, some in open-access publications. The latter category, augmented by pre-prints, is often captured and made available through various institutional repositories (Lynch, 2003).
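The rise of data-driven research (Hey & Trefethen, 2003) has meant that the research publication alone, as a largely textual document, will increasingly be regarded as insufficient. The reader will need access to the data, and perhaps even the associated computational models or workflows. Moreover, the data will need to be managed for the long term as part of the scholarly record. Consequently, a number of significant reports (American Council of Learned Societies Commission on CyberInfrastructure for the Humanities and Social Sciences [ACLSCCHSS], 2006; Association of Research Libraries [ARL], 2006; National Science Board [NSB], 2005) have argued in recent years for a more systematic set of solutions to long-term curation of research data outputs. It was in this context that Australia started to look at a national approach.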

The ANDS Context
The Australian National Collaborative Research Infrastructure Strategy (NCRIS) was funded in the financial year 2004/5, and the programs commenced in financial year 2006/7. In addition to fifteen areas of identified research capability, the NCRIS plan recognised the need for a sixteenth area: Platforms for Collaboration. This area is made up of a number of discrete services, addressing the need for ongoing network investment (Australian Research and Education Network, AREN), a coordinated approach to authentication (Australian Access Federation, AAF), collaboration and middleware (Australian Research Collaboration Services, ARCS), high-performance computing (National Computational Infrastructure, NCI), and data management and sharing (Australian National Data Service, ANDS). This paper will focus on ANDS.
ANDS arose from a series of consultations undertaken during 2006 and early 2007 by Dr Rhys Francis (now Executive Director of the Australian eResearch Infrastructure Committee, the advisory committee for Platforms for Collaboration). A consistent theme arising from the consultation meetings was the need to improve data management and availability. In April 2007, a forum with wide representation met and agreed to set up an ANDS Technical Working Group (ANDS TWG). The TWG produced a report (Australian National Data Service Technical Working Group [ANDS TWG], 2007) in October 2007 that laid out a vision of how ANDS might operate. This document envisaged four programs of activity within ANDS: Frameworks, Utilities, Repositories, and Researcher Practice. In late 2007, the then Australian Commonwealth Department of Education, Science and Training (DEST) asked Monash University to be the lead agent (working in close collaboration with the Australian National University (ANU) and Commonwealth Scientific and Industrial Research Organisation (CSIRO)) on a project to establish ANDS. This project commenced in January 2008 and concluded in late 2008 with the submission of its final report. The conclusion of the project overlaps with the commencement of ANDS itself, which came into existence in September 2008. The author was a member of the ANDS TWG, was subsequently appointed as the Director of the ANDS Establishment Project, and will likely be one of the ANDS Deputy Directors.

Establishing ANDS
The Australian National Data Service (ANDS) is being established to ensure that Australian research data is well managed, made available for access, and discoverable, so that: researchers can find and access any relevant data in the Australian "research data commons"; Australian researchers are able to discover, exchange, reuse and combine data from other researchers and other domains within their own research in new ways; and Australia is able to share data easily and seamlessly to support international and nationally distributed multidisciplinary research teams. The overall goal for ANDS is to "deliver greater access to Australia's research data assets in forms that support easier and more effective data use and reuse" (ANDS TWG, 2007, p. 18). This will be achieved by the creation of the Australian Research Data Commons (ARDC), a virtual space which aggregates information about available datasets (ANDS TWG, 2007, pp. 10-17). The centrality of the ARDC is demonstrated by the decision to refer to "ANDS: Building the Australian Research Data Commons" in the marketing materials.
Consistent with these goals, over the course of 2008 the ANDS Organisational Network (ANDS ON) developed an Interim Business Plan (Australian National Data Service [ANDS], 2008). It will be updated in March 2009 and accepted (after discussion with the funding agency) by June 2009. At the same time the ANDS Project Management Committee focussed on the Agreement for the collaboration that will be responsible for ANDS, the contract between this collaboration and the Department of Innovation, and the search for the ANDS Executive Director.
The funding for ANDS will be A$21M over three years, commencing in the second half of 2008 and concluding mid-2011. The 10-year goals for ANDS (bearing in mind that it will initially only be funded for three years) are that:

ANDS Principles
In seeking to turn the ANDS TWG (2007) report into a business plan and work towards these goals, the ANDS Establishment Project identified a number of principles to be followed. These are essentially the theoretical underpinnings for the implementation approaches that will be followed.

Commons Framework
ANDS will start in a way that anticipates the need to scale up and adapt over time via an extensible framework of data stores, federations and services that enable better data creation, capture, management and sharing.

Focus
ANDS will identify and work with those who are ready, willing and able to contribute significantly to the Australian Research Data Commons vision, and who provide the most strategic return to the Australian Research Data Commons for the effort expended.

Content
ANDS will initially focus on content recruitment into stores and federation across stores so as to achieve a wide coverage of data quickly at an agreed level of quality; in later years the emphasis will shift towards quality improvement.

Service Provision
ANDS is focussed on service provision, not research and exploration; its programs will develop, integrate, and continually improve production-level systems in support of well-understood services. A separate process will fund the development of more innovative and exploratory domain-focused initiatives that may become ANDS services in later years.

Strategic Partners
ANDS recognises the need to be open to, and engage appropriately with, innovations and external institutions relevant to the nascent Australian Research Data Commons, including the Australian Access Federation (AAF, a national authentication system based on Shibboleth and PKI) and the Australian Research Collaboration Services (ARCS, providing collaboration tools and an SRB-based data fabric).

Stores
ANDS assumes an environment where storage and long-term curation primarily occur in institutionally supported stores, either existing or created over the life of ANDS. ANDS will facilitate public and restricted access and re-use across these institutional stores. These stores will preferably hold objects described by various discipline-specific and documented metadata schemas. Note that ANDS is not funding storage, nor is it setting up discipline-aligned data centres (the lack of which is one of the key differences between the Australian and UK environments).

Sustainability
Research data management requires a long-term commitment. ANDS is developing its three-year plan on the assumption that it does not represent a one-off investment in data. The enduring changes forecast in this document within each program are also intended to be sustainable beyond the end of the ANDS planning period.

ANDS Programs
To enable researchers to work in the new world of data-intensive research outlined in the introduction to this paper, they will need:
• Policies that support a new way of working
• A technical data fabric that enables storing and moving data
• A repository to store their data effectively
• A referencing mechanism that supports input data, modelling outputs (such as visualisations), software code and documents to be cross-referenced
• The ability to search across all the collections that have been registered
• The training and training materials that enable the infrastructure to be used effectively
The research infrastructure to be provided by ANDS to realise this will be delivered under four programs of activity (Developing Frameworks, Providing Utilities, Seeding the Commons, Building Capabilities). They are different to the programs originally envisaged in ANDS TWG (2007). The need to focus what ANDS does has been driven by the complexity of the environment and the limited funding available. This meant that it was not possible to achieve critical mass in the planned Repositories and Researcher Practice programs. As a result, the original programs envisaged in ANDS TWG (2007) have been transformed into Developing Frameworks, Providing Utilities, Seeding the Commons (the Research Data Commons, that is), and Building Capabilities. Some elements of the underlying data fabric will also be delivered through the Australian Research Collaboration Services (ARCS). ANDS will work with the National e-Research Architecture Taskforce (NeAT) in defining projects that will primarily contribute towards the Seeding the Commons and Providing Utilities programs. Some of the outputs of these activities will appear in ANDS programs in later years. So, what exactly are the ANDS Programs going to achieve, and how are they approaching their task?

Overview
The Frameworks program aims to influence and simplify the overall policy framework within which the ANDS goal is to be achieved, as well as defining how activities by researchers (in order to comply with their grant conditions) and institutions (in order to comply with funding requirements) can contribute to a national research data commons. The Frameworks program will progressively work to simplify and reduce the number of licences under which data is created and shared. The primary collaborators in the Frameworks Program are institutional data holders, national initiatives such as the National Committee for Data for Science, cross-governmental groups, research-funding government departments, research-funding schemes, discipline leaders within institutions, and research office staff at institutions. The focus of Frameworks will, where possible, be on the higher-level aggregations of data-managing entities rather than the individual entities themselves.

Discussion
ANDS is able to benefit in the Frameworks program from a unique environment. Demographically, Australia is a small country (population of 21 million), with a fairly homogeneous university sector due to reforms that took place in the 1980s and 1990s. All Australian universities cover a broad range of disciplines, and offer the usual range of undergraduate through to doctoral degrees. This is quite different from the very heterogeneous situation in the US, for instance. Because of the structure of the Australian Constitution, the Commonwealth Government is responsible for funding the higher education sector (although the proportion that such funding represents of the total funding required has steadily fallen to an average of 40% across the sector). This has led to a tradition of centralized direction and policy setting by government, particularly with respect to research funding. ANDS plans to take advantage of this to encourage desired data management behaviours. A positive development was the release last year of the Australian Code for the Responsible Conduct of Research (National Health and Medical Research Council [NHMRC], Australian Research Council [ARC] & Universities Australia [UA], 2007). This laid out, in addition to the normal sections on ethics and misconduct, obligations on researchers for data management and on institutions regarding data management support. Recent statements from research-funding bodies also indicate that they are keen to move towards progressively requiring greater data deposit and sharing.

Overview
This program will provide a range of utility data services at a sector-wide level (such as cross-discipline discovery services, a national collection registry, a persistent identifier service, a federation registry and an access policy registry), as well as working to improve existing services and develop new ones. The human-facing utility services will use the AAF authentication services and ARCS authorisation services (informed by ANDS requirements) for access control. Note that some of the proposed registries (collection, federation, access policy) are essentially enabling; most users will never see them directly but they will contribute critically to elements that the users will see. As well as the provision of utility services, ANDS will develop a Federation Utilities National Framework. This will stimulate and support data federation utilities at the level of research communities and their federations. It will also work to develop and maintain community consensus within the research and public sector on a technical framework for the interoperability of data utility services (registries, metadata catalogues, schemas and harvesting guidelines, search services, etc.). Development of this framework will be undertaken in collaboration with related UK and US activity. The primary beneficiaries of the Utilities program will be content providers and consumers in the

Discussion
One of the challenges in the design of the Utilities program has been the relationship between potential Australian services and international equivalents. Anecdotally, most researchers see their primary loyalty as being to their disciplines, rather than their institutions. This means that whatever ANDS chooses to do cannot be inconsistent with disciplinary practice. For some disciplines, this means that the ANDS role will be quite limited. It also means that there may be little value in running (for instance) a narrowly Australian service when the researchers in Australia will be working with colleagues overseas who are using an established international equivalent.
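As an illustration only, the enabling role of a collection registry combined with a persistent identifier service can be pictured with a minimal Python sketch. Everything in it is an assumption made for the purposes of illustration: this paper does not specify data models or interfaces for the ANDS utilities, and the names CollectionRecord and CollectionRegistry, the "example" identifier prefix and the store locations are hypothetical. The sketch shows only the general idea that the registry holds descriptions of collections (not the data, which stays in institutional stores), that an identifier keeps resolving after a collection moves, and that discovery searches across everything registered.

```python
from dataclasses import dataclass, replace
from itertools import count


@dataclass(frozen=True)
class CollectionRecord:
    """Hypothetical description of a collection held in an institutional store."""
    identifier: str       # persistent identifier minted by the registry
    title: str
    institution: str      # the organisation whose store holds the data
    location: str         # where that store currently exposes the collection
    keywords: frozenset   # lower-cased subject terms used for discovery


class CollectionRegistry:
    """Toy in-memory registry: it stores descriptions, never the data itself."""

    def __init__(self, prefix="example"):
        self._prefix = prefix        # hypothetical identifier namespace
        self._serial = count(1)      # simple counter used to mint identifiers
        self._records = {}

    def register(self, title, institution, location, keywords):
        """Record a collection description and return its new identifier."""
        identifier = f"{self._prefix}:{next(self._serial):06d}"
        self._records[identifier] = CollectionRecord(
            identifier, title, institution, location,
            frozenset(k.lower() for k in keywords),
        )
        return identifier

    def resolve(self, identifier):
        """Return the current location for a persistent identifier."""
        return self._records[identifier].location

    def relocate(self, identifier, new_location):
        """Update the location; the identifier itself never changes."""
        record = self._records[identifier]
        self._records[identifier] = replace(record, location=new_location)

    def search(self, term):
        """Cross-collection discovery: match a term against keywords or title."""
        term = term.lower()
        return sorted(
            r.identifier for r in self._records.values()
            if term in r.keywords or term in r.title.lower()
        )
```

In this sketch a user-facing discovery service would call search and resolve, while register and relocate stand in for the enabling services that most users would never see directly.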

Overview
The Commons Program will start to seed the Australian Research Data Commons by seeking to make more content available. The amount of funding available for ANDS is insufficient to provide or support a repository solution across the entire research-producing sector. As the marginal cost of working with people is high, and ANDS only has limited funding, this program will have very specific aims, providing focussed assistance to a number of target groups. The aim of the program is to ensure that the data and metadata generated within the program targets are captured, stored and made accessible through the Australian Research Data Commons. ANDS will target automated and semi-automated data and metadata capture, to simultaneously improve the quality of what is captured and increase its quantity (by reducing the barriers to capture). ANDS will also aim to recruit information about data collections to be made available through the ANDS discovery services. Once groups have been selected (according to criteria that will be developed in year one), ANDS will fund staff to work within them, and the selected groups will provide the long-term storage infrastructure. It is expected that the researchers will need to be key players in this process for it to be effective. In the first year of ANDS the main activities related to this theme will be finding and training these staff (through the Capabilities program described below), and deciding on the most appropriate selection option and the detailed criteria for selection.
At the same time, in year one ANDS will also concentrate on the recruitment of existing content into repositories, identifying existing repositories of useful content, and making all that content discoverable through the Australian Research Data Commons. Where institutions with valuable existing content do not have the required systems, ANDS will provide small agile teams to set up data repository infrastructure based on toolkits that have already been developed. If demand exceeds available capacity, ANDS will develop a transparent process for allocation of ANDS resources. Where repositories (or federations) already exist, this theme will assist with their integration into the Australian Research Data Commons. This may be work performed by staff in the program target (with ANDS advice if needed) or by ANDS staff (building on the technical consultancy expertise in the Capability program). This theme will primarily operate in year one (because the focussed assistance program will not start until year two), but will also continue in years two and three if there is proven demand and the theme is delivering strategic value.

Discussion
One of the initial decisions made by the ANDS TWG was that ANDS would not provide storage for data (this is one of the ANDS Principles). ANDS assumes that data will be stored in institutional repositories, consistent with the recommendations contained in the National Health and Medical Research Council document (NHMRC, ARC & UA, 2007). This is different to the UK discipline data centre approach, and was a deliberate decision. Institutions have obligations under NHMRC, ARC & UA (2007) with which they are now starting to grapple. They employ the researchers who produce the research outputs, and they are likely to have a longer-term presence than a funded data centre. On the other hand, each institutional repository will not necessarily have the expertise to curate a widely heterogeneous assemblage of data objects for the long term. ANDS will need to work closely with institutions to ensure that they are aware of the range of curation challenges that they have to address. ANDS will also need to deal with a very heterogeneous mix of repositories that will be integrated into the Australian Research Data Commons. The solution here will rely on open standards wherever possible. ANDS will retain specialist expertise to assist with integration issues. Of course, none of this activity will happen in a vacuum, and ANDS will seek to learn from related activity overseas. A number of JISC projects are of interest: the Data Audit Framework, SCARP and DISC Datashare projects. ANDS is aiming to pilot the Data Audit Framework at a higher level and in less detail, and to feed back its experiences with the online tool.

Overview
The Capabilities Program will build capability across the research and scholarly communications lifecycle in organisations, systems, services and people. This program will work with the sector to produce a capability maturity model for e-research and information infrastructure. It will provide audit, rating and certification systems and services. ANDS will also work with institutions to coordinate the development of national curricula for capability development. It will coordinate, enhance, and add national focus to institutionally based training initiatives. ANDS will base the activities of this program in the community by establishing a forum drawn from data stewards who are providing retention and access services to research data. These will include research-intensive organisations as well as government instrumentalities holding data of interest to researchers. Staff from this program will collaborate with and support the more targeted activity within the Seeding the Commons Program. As cohesive networks of research data are increasingly regarded as an important and enduring part of the collaborative research infrastructure, this program will focus in particular on building the capability of researchers and support staff to contribute to and better exploit national data infrastructure.

For Government
The benefits of ANDS for government are at least twofold.
In the first place, the Australian Commonwealth in general (and the Department of Innovation, Industry, Science and Research in particular) is concerned to ensure that Australian research is as effective as possible. ANDS, through its provision of improved access to better managed data, will encourage data sharing and re-use, thus facilitating synergistic and serendipitous research outcomes.
The Commonwealth Government is also keen to ensure that taxpayer-funded research is accessible to the taxpayers who provided the funding. This is consistent with other initiatives in the UK and EU around improved access to Public Sector Information (PSI). This accessibility for publicly funded research outcomes is the subject of a Commonwealth Government Accessibility Framework currently being developed. While not everything that is discoverable through the Australian Research Data Commons will be publicly accessible, much will. There is also the potential for ANDS to facilitate greater engagement between the public and the researchers, both as consumers and in some cases co-creators of the research.

Beyond ANDS
The review of the NCRIS Roadmap has already indicated a need to move beyond the ANDS approach of relying on institutional repositories. The final report (National Collaborative Research Infrastructure Strategy [NCRIS], 2008a) has foreshadowed the need for a national data grid to provide for long-term preservation of data. A dedicated high-performance network would link the nodes in this grid, allowing researchers to move data rapidly from instrumentation to computing resources and to institutional storage. Interestingly, this proposal is expected to extend to partnering with research organisations for the development of institutional nodes of the storage grid. This would be on the condition that the storage is used exclusively for research data; that the institutes co-invest in the infrastructure; that each institute publishes and adopts a data management plan; and that each institute ensures its researchers use and abide by the data management plan. Of course, there is no way of anticipating when this will become reality, or whether it will be funded.
At the same time as ANDS commences its work, two related initiatives are underway in the UK and the US. The United Kingdom Research Data Service is undertaking work to develop an understanding of the UK's current and future research data service needs, work with other UK stakeholders to identify priorities for action, develop a number of scenarios for a possible service, and develop a detailed business plan for the preferred option(s). The Interim Report from this activity (UK Research Data Service [UKRDS], 2008) recommends a model with a number of similarities to the approach taken by ANDS. In the US, the National Science Foundation has announced the successful projects to be funded under the first round of the Sustainable Digital Data Preservation and Access Network (DataNet) call. The DataNet call aims to build sustainable infrastructure by creating a new type of organization that the NSF does not believe exists today. It is looking for librarians, archivists, and computer/computational/information scientists who will work together to build excellent infrastructure for science and/or engineering, while engaging closely with intended users; domain scientists will be full partners in the process.

Conclusions
The Australian National Data Service is taking an approach that tries to pragmatically blend learning from existing and preceding initiatives in Australia and overseas, while working within the constraints of the available budget. The task of achieving the hoped-for cultural change in data management practice is significant, but there appears to be a readiness in the various sectors to engage with the ANDS program of activities. As we learn from our early experiences and fine-tune our activities over the next three years, we in Australia hope to be able to continue to learn from our colleagues in the data management community internationally and share our findings and outputs in a spirit of collaboration.
[ACLSCCHSS], 2006; Association of Research Libraries [ARL], 2006); National Science Board [NSB], 2005) have argued in recent years for a more systematic set of solutions to long-term curation of research data outputs.It was in this context that Australia started to look at a national approach.
In late 2007, the then Australian Commonwealth Department of Education, Science and Training (DEST) asked Monash University to be the lead agent (working A positive development was the release last year of the Australian Code for the Responsible Conduct of Research (National Health and Medical Research Council [NHMRC], Australian Research Council [ARC] & Universities Australia [UA], 2007 data facility managers and administrators, and research communities building and operating (new or already existing) data federations

The International Journal of Digital Curation Issue 1, Volume 4 | 2009
6. Australian researchers are able to discover, exchange, reuse and combine data from other researchers and other domains within their own research in new ways.
7. Australia is able to share data easily and seamlessly to support international and nationally distributed multidisciplinary research teams (ANDS TWG, 2007, p. 6).