RESEARCHER’S WILLINGNESS TO SUBMIT DATA FOR DATA SHARING: A CASE STUDY ON A DATA ARCHIVE FOR PSYCHOLOGY

Data sharing has gained importance in scientific communities because scientific associations and funding organizations require long-term preservation and dissemination of data. To support psychology researchers in data archiving and data sharing, the Leibniz Institute for Psychology Information developed an archiving facility for psychological research data in Germany: PsychData. In this paper we report different types of data requests that were sent to researchers with the aim of building up a sustainable data archive. The resulting response rates were rather low but comparable to those published by other authors. Possible reasons for the reluctance of researchers to submit data are discussed.


INTRODUCTION
The importance of data sharing has been recognized by many scientific organizations that, in consequence, have devised data sharing policies, declarations, or recommendations, for example, the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003); the OECD Declaration on Access to Research Data from Public Funding (OECD, 2004); and the final report of the High Level Expert Group on Scientific Data (Wood, Andersson, Bachem, Best, Genova, Lopez, et al., 2010). For economic and quality-assurance reasons, research funding agencies in the US already have data sharing policies in place (National Science Foundation-Directorate for Social, Behavioral and Economic Sciences, n.d.; US National Institutes of Health, 2003). In Germany, the country's largest research funding organization, the German Research Foundation (DFG), demanded in its Recommendations of the Commission on Professional Self Regulation in Science - Proposals for Safeguarding Good Scientific Practice (DFG, 1998: recommendation 7) that "Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin." A few years later these recommendations were elaborated in the Recommendations for Secure Storage and Availability of Digital Primary Research Data, January 2009 (DFG, 2009), which give more detailed advice on the storage, documentation, quality, and dissemination of data.
Whereas the main intention of the Proposals for Safeguarding Good Scientific Practice (DFG, 1998) is the short to medium term storage of research data to reinforce transparency and prevent misconduct in science, the recommendations of 2009 (DFG, 2009) go beyond this. They aim to guarantee the availability of research data in the long run for future analysis such as in comparative historical analysis. Research data should be preserved not only for research quality control but also as a scientific cultural heritage.
Scientific associations in the field of psychology support the availability of scientific data. The American Psychological Association (APA) declared in its ethical principles that psychologists should "… not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis …" (APA, 2010). This principle was adopted verbatim in a German version of the ethical principles by the German Psychological Society (DGPs) and the Association of German Professional Psychologists (BDP, 1999/2005). Scientific associations as well as funding organizations require the preservation and dissemination of research data. But what about the actual research practice? In psychology many scientists criticize the absence of a functioning data sharing culture (Breckler, 2009; Wicherts, Borsboom, Kats, & Molenaar, 2006; Wicherts, Bakker, & Molenaar, 2011; Weichselgartner, 2008).
To support researchers in fulfilling their responsibility for keeping their data available regardless of their personal resources and data management practices and to offer a place of deposit for data sets that are suitable for use in research and teaching, the Leibniz Institute for Psychology Information (ZPID) in 2002 started developing an archiving facility for primary research data in psychology. This work was partly funded by two consecutive grants from the German Research Foundation. In 2010, PsychData (www.psychdata.de) was accredited by the German Data Forum (www.ratswd.de/en) thus becoming an established research data infrastructure for the social, economic, and behavioral sciences in Germany. From the beginning, PsychData contacted researchers to motivate them to deposit data with the aim of building up a sustainable data archive. Many data requests were made, but the response rates were rather modest. In the following we will discuss the causes for this level of response from psychology researchers.

THE PSYCHOLOGICAL DATA ARCHIVE, PSYCHDATA
The psychological research data center PsychData serves as a data sharing platform enabling researchers to archive and disseminate their data within the scientific community. Moreover PsychData supports scientists in data documentation and management.
The development of PsychData began in 2002 with support from the German Research Foundation (DFG). In the following year archiving of the first data sets began. In spite of limited financial and personnel resources compared with other data archives, PsychData has developed an infrastructure that has gained international attention (de la Sablonnière, Auger, Sabourin, & Newton, 2012; Ruusalepp, 2008).

Data depositing at PsychData
Quantitative digital research data from all areas of psychology are archived in the PsychData archives. Data deposited at PsychData are subject to high quality standards. PsychData adds value to the data by providing qualified data documentation and ensuring data quality.
The minimum criterion for data deposits at the archive is the existence of a peer-reviewed publication based on the data. Before data submission, depositors must sign a license agreement to establish the terms and conditions of use of their data collection. This license agreement safeguards the researcher's copyright in the data. With very few exceptions in industrial, administration and ministry contract research (which is at least in part confidential) there are no copyright constraints for funding agencies in German-speaking countries (i.e., Austria, Germany, Liechtenstein, parts of Switzerland, and Luxembourg) because their research is financed by public funds.
All data deposits should include the documentation necessary to describe and interpret the data completely. PsychData provides a documentation form that researchers can use to create specific domain metadata based on the established international standards Dublin Core and DDI. The Dublin Core Metadata Initiative (DCMI) element set is a standard aimed at facilitating descriptions and searching resources using the Internet. The Data Documentation Initiative (DDI) standard provides a set of rules specifically for describing social, behavioral, and economic data.
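As an illustration of what element-based metadata of this kind looks like, the following minimal sketch builds a study description from the fifteen elements of the Dublin Core Metadata Element Set and serializes it as XML. It is not the PsychData documentation form: the function name `build_dc_record`, the wrapper element `record`, and all field values are hypothetical and chosen only for illustration. DDI-specific description of variables is not covered here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Return an XML element holding one Dublin Core description.

    `fields` maps element names to lists of string values; every
    element is optional and repeatable, as in Dublin Core itself.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        for value in fields.get(name, []):
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Hypothetical study description; none of these values are real.
example = {
    "title": ["Example longitudinal study of reading development"],
    "creator": ["Doe, Jane", "Roe, Richard"],
    "subject": ["developmental psychology"],
    "date": ["2010"],
    "type": ["Dataset"],
    "language": ["de"],
    "rights": ["Scientific use only; data use agreement required"],
}

record = build_dc_record(example)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Restricting the input to the known element names means that a misspelled field is rejected rather than silently written into the record.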
For all the submitted material, PsychData creates complete data documentation, thereby ensuring the long-term interpretability and usability of the data. The research data are checked for consistency and completeness. If necessary the data depositor is contacted to correct inconsistencies or answer open questions. After a final revision of the documentation by the data depositor, the research data are made available. In cooperation with DataCite, a Digital Object Identifier (DOI) is assigned to guarantee permanent citability. Likewise, the published research data can be added to the publication list of the data depositor.

Finding and re-using data at PsychData
When searching for data, a researcher can browse or search for study descriptions on the PsychData website. The research data are available to other scientists for research purposes only. A restricted data use agreement has to be signed to ensure the confidentiality of respondents and author rights. The data set(s) and corresponding documentation are sent to the data user by mail on compact disc (CD-R).

Selection criteria
The number of data sets archived is growing at a slow but steady pace. The active acquisition of data sets has played an important role from the beginning. While the minimum criterion for data deposits is the existence of a peer-reviewed publication based on the data, specific selection criteria were implemented for active acquisition. These selection criteria are based on the principle that PsychData should mainly preserve psychological data sets of unique value for the psychological research community (Weichselgartner, Günther, & Dehnhard, 2011).
Valuable psychological data sets are seen as a kind of "cultural heritage" and should be of particular interest for re-use. In particular the following data sets are considered important for psychological research:
• Data of longitudinal studies because of their lengthy survey period (several years to decades), their often complex design, and their historical significance.
• Data of large-scale cross-sectional studies that would be difficult to replicate because of the extensive effort in conducting.
• Data collected for the standardization of psychological inventories. These normative data are an important reference measurement in the application of psychological inventories, for example, in diagnostics.
• Epidemiological studies of psychological and psychosomatic diseases. These data are relevant for general research and for historic or cultural studies.
• Data assessed during historically unique conditions. These data cannot be replicated and gain their relevance at the least in terms of the preservation of cultural heritage.
Up to now, these selection criteria have been applied in different acquisition procedures carried out at irregular intervals. In the future they may be relaxed so that a broader selection of researchers can be contacted for their data.

ACQUISITION PROCEDURES 2002-2012
Two different kinds of acquisition procedures were implemented:
1. General data requests combined with information letters to introduce the archive to the scientific community.
2. Requests for concrete data sets that were identified on the basis of the selection criteria mentioned.

Data requests in general
At the end of 2001 with the start of PsychData, 811 psychology researchers were contacted, informed about the project, and asked for data submissions. Because of the initial project funding a small compensation could be offered for preparation and documentation of the data sets. A total of 17 researchers (2.1 %) deposited data at the archive immediately. One author contacted the archive seven years later and submitted six data sets altogether.
At the beginning of 2004, the psychological community in general was informed about the project via email. An information letter including a general request for research data was sent to 1305 psychological scientists in Germany, Austria, and Switzerland. Following advice from the DFG, in the same year information about PsychData was enclosed whenever a new research project in psychology was accepted for funding by the DFG. Of the 1305 psychologists contacted, 15 answered, seven of them positively. None of these answers resulted in a data deposit. Only one researcher showed interest in data sharing because of the information added via the DFG, but in the end no data were submitted.
For the database PsychAuthors (www.psychauthors.de) and the data archive PsychData, a common acquisition procedure was developed in 2011. PsychAuthors is a "who's who" of German psychology: researchers in psychology can create an author profile, which contains a complete publication list as well as information about the current place of employment, professional career, research and teaching interests, professional positions, and activities in the scientific community. The information is updated regularly.
The joint acquisition was addressed to two groups of researchers: The first group consisted of 49 psychologists whose 60th, 65th, or 70th birthday was announced in the "Psychologische Rundschau", a well-known German psychological journal. Because these researchers were shortly before or after retirement, a certain interest in preserving their own "scientific life's work" was assumed. The second group consisted of 129 postgraduates whose theses were completed in 2010 and added to PSYNDEX (www.psyndex.de), the German reference database of psychological literature and tests. In this younger population of scientists more openness to new media, networking opportunities, and data sharing was expected. Both groups were informed via mail about the services of PsychAuthors and PsychData. Only one person showed interest in the possibility of data archiving at PsychData, but no data deposit resulted as the data had already been published in another repository.

Requests for concrete studies
In addition to the general data requests combined with information letters, individual data requests were sent asking researchers for certain data sets in order to build up a relevance database.
In 2003 and in 2010 appropriate studies for the data archive were identified by conducting searches based on the selection criteria in all publications documented in PSYNDEX from 1992 through subsequent years. The identified studies were long-running longitudinal studies and short-running longitudinal studies on selected topics, twin studies, established questionnaires, and cross-sectional studies with a sample size greater than 1000 from the fields of clinical psychology, developmental psychology, educational psychology, psychogeriatrics, and organizational psychology. The 89 authors of these studies were asked for data submissions by personal request and also sent a leaflet with more detailed information about PsychData and a feedback form. From the authors, 34 answers were received, about 16 of them (17 % of the contacted researchers) showing interest in data archiving and data sharing; eight (9 %) data deposits resulted, one of them consisting of a longitudinal study with about 300 data sets.
In 2009, 11 authors of well-known and widely used psychological questionnaires for which new standardizations had been conducted were asked to provide the data from the older standardization samples. Standardization data for psychological tests and measures are of particular significance for psychological research. The responses, however, were for the most part negative (82 %) because the data had already been deleted or lost. Only two authors showed interest in data archiving, and only one of them submitted data.

Total response rates
In the years 2003 to 2011 a total of 2302 information letters were sent by mail or electronic mail, providing information about PsychData services and combined with a data request. These more general requests resulted in 21 positive reactions and 18 data deposits (0.8 %).
Between 2003 and 2010 personal requests asking for concrete data sets were sent to 97 researchers. A total of nine (9.3 %) data deposits resulted. These reactions to data requests are similar to those reported by other psychologists in recent times: for a (planned) meta-analysis Wicherts et al. (2006) initially received data sets from only 11 % of the contacted authors; after one reminder, the number increased to at least 27 %. Also for a meta-analysis, Botella and Ortego (2010) asked the authors of 109 studies for their research data but obtained data from only 13 studies (12 %).
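The percentages in this section follow directly from the reported counts. The short sketch below recomputes them, using only figures stated in the text; the helper name `rate` and the dictionary labels are illustrative.

```python
def rate(deposits, contacted):
    """Return deposits as a percentage of contacts, rounded to one decimal."""
    return round(100 * deposits / contacted, 1)


# (data deposits or data sets obtained, researchers or studies contacted)
reported = {
    "general information letters, 2003-2011": (18, 2302),
    "personal requests for concrete data sets, 2003-2010": (9, 97),
    "Botella and Ortego (2010)": (13, 109),
}

for label, (obtained, contacted) in reported.items():
    print(f"{label}: {rate(obtained, contacted)} %")
```

The personal requests thus yielded a deposit rate roughly twelve times that of the general information letters (9.3 % versus 0.8 %).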

DISCUSSION
Several reasons can be discussed for the lack of responses to the data requests:
1. Some authors answered that "it is still too early" to share the data because they had not finished analyzing them (see also Wicherts et al., 2011; Koslow, 2000). In a survey in the Netherlands, 39 % of psychologists indicated that research data should only be provided after an embargo period following the first publication has elapsed (Voorbrood, 2010).
2. For another group of authors, data requests came "too late" as the data were not understandable, readable, or were even lost due to inadequate documentation and archiving practices (see Freedland & Carney, 1992; Wicherts et al., 2006; Wicherts et al., 2011; Voorbrood, 2010).
3. Another reason for withholding data could be that data are seen as a personal "valuable/precious good" (Koslow, 2000).
4. Researchers might be afraid that another scientist could discover something new in the data they themselves did not see (Koslow, 2000), for example, by using new statistical methods.
5. Another obstacle to data sharing could be the fear that weaknesses or errors in the statistical analyses would be discovered (see Wicherts et al., 2011; Ceci & Walker, 1983).
6. Very rarely, demands for "more" open access were expressed, particularly by younger scientists. They would like to provide their data directly via download without any restrictions such as signing a license agreement. These demands are unrealistic since psychological research data usually include personal information that is subject to data protection regulations.
Data Science Journal, Volume 12, 4 December 2013
Though there might be many arguments against data sharing, each of them can be rebutted.
With regard to the "perfect timing" of data sharing, Koslow (2000) argues that the publication of research results implies the completion of the whole research data analysis and the underlying data should then be available for re-analysis. PsychData gives a certain flexibility to scientists willing to submit their data: A supplementary agreement can be signed that allows the data depositor to set an embargo period (maximum five years) during which data can only be made available to a user when the author has personally agreed.
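The embargo rule described above can be expressed as a simple release decision. The following is a minimal sketch, not PsychData's actual procedure: it assumes that the embargo starts on the deposit date and is counted in whole years, and the names `Deposit`, `embargo_end`, and `may_release` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

MAX_EMBARGO_YEARS = 5  # maximum embargo period stated in the text


def add_years(day, years):
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


@dataclass
class Deposit:
    deposited_on: date      # assumption: the embargo starts on this date
    embargo_years: int = 0  # 0 means no supplementary agreement was signed

    def __post_init__(self):
        if not 0 <= self.embargo_years <= MAX_EMBARGO_YEARS:
            raise ValueError("embargo must be between 0 and 5 years")

    def embargo_end(self):
        return add_years(self.deposited_on, self.embargo_years)

    def may_release(self, on, author_agrees=False):
        """During the embargo, release requires the author's agreement."""
        if on < self.embargo_end():
            return author_agrees
        return True


deposit = Deposit(deposited_on=date(2012, 3, 1), embargo_years=5)
print(deposit.may_release(date(2015, 1, 1)))                      # within embargo, no agreement
print(deposit.may_release(date(2015, 1, 1), author_agrees=True))  # within embargo, author agrees
print(deposit.may_release(date(2017, 3, 1)))                      # embargo has ended
```

After the embargo has ended, the sketch treats the data like any other deposit; the restricted data use agreement still applies to every user.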
Deficiencies in data documentation and management are repeatedly mentioned as a serious problem in sharing data. Retrospective documentation is often associated with a heavy workload, and sometimes information cannot be reconstructed because the records, notes, or former employees needed to shed light on the old material are no longer available. To support researchers, PsychData has developed a web-based tool that enables the coding, description, metadata generation, and long-term preservation of research data. Automatic validity checks are implemented to guarantee a minimum quality standard for the research data as well as the documentation. After registration, the tool can be used for self-archiving purposes. Optionally, data can be transferred to PsychData and thus be made available to the scientific community.
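The text does not specify which validity checks the tool performs. As an illustration of the general idea, the following sketch compares data rows against a codebook and reports undocumented variables, missing variables, and values outside the documented range. The codebook layout and the function name `check_rows` are assumptions made for this example only.

```python
def check_rows(rows, codebook):
    """Compare data rows with a codebook and return a list of problems.

    Each problem is a tuple (row number, variable name, message).
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        # Every documented variable should be present in every row.
        for variable in codebook:
            if variable not in row:
                problems.append((number, variable, "missing variable"))
        # Every value should be documented and within its valid range.
        for variable, value in row.items():
            spec = codebook.get(variable)
            if spec is None:
                problems.append((number, variable, "undocumented variable"))
                continue
            if value in spec.get("missing_codes", ()):
                continue  # declared missing-value code, accepted as is
            if not spec["min"] <= value <= spec["max"]:
                problems.append((number, variable, f"value {value} out of range"))
    return problems


# Hypothetical codebook and data, for illustration only.
codebook = {
    "age": {"min": 18, "max": 99, "missing_codes": (-9,)},
    "item1": {"min": 1, "max": 5, "missing_codes": (-9,)},
}
rows = [
    {"age": 34, "item1": 4},
    {"age": -9, "item1": 7},   # 7 lies outside the 1-5 scale
    {"age": 51, "item2": 3},   # item1 missing, item2 undocumented
]

for problem in check_rows(rows, codebook):
    print(problem)
```

Checks of this kind are cheapest when the codebook is written at the start of a project, which is the point the paragraph above makes about retrospective documentation.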
Researchers who consider their data as a private good are reminded that most data that are part of a research project are financed by public funding. Thus they should be considered a "public good" rather than "personal property". On the other hand, this view disregards the time and effort that researchers often spend on the development of the research design, methodology, and data collection coordination. These in a sense "creative" aspects of data generation may be the reason for considering data one's "own property". As opposed to publication, the creation and sharing of data do not provide compensation for researchers. Recently, calls for a greater recognition of scientific effort in data production (see Nature, 2009; Cambon-Thomsen, Thorisson, & Mabile, 2011) are signaling possible changes in the practice of gaining scientific reputation. Citation of research data is seen as the crux of receiving credit for data creation and sharing. Organizations such as DataCite (www.datacite.org) have already created significant momentum. DataCite is an international consortium bringing together data centers to assign persistent identifiers to data sets with the aim of supporting data citation, data discovery, and access. Thomson Reuters released a Data Citation Index in 2012 in order to make data visible and accessible from data repositories across disciplines and around the world (Thomson Reuters, 2012). Possibly these innovative developments will increase the importance attached to data production and data sharing. To give credit to data producers, all data sets archived by PsychData are assigned a digital object identifier (DOI) via da|ra (www.da-ra.de/en), one of the DataCite registration agencies. By signing a restricted data use agreement, data users are obligated to cite the data sets in their publications.
The discovery of new research results as well as the detection and correction of statistical inaccuracies by anyone other than the principal investigator actually are not arguments against but in favor of data sharing. The primary objective of a scientist is to explain, describe, or even predict real phenomena as comprehensively and accurately as possible. Having the data of a study available would help to clarify inconsistencies and thus provide clearer scientific results, especially if they are needed for decisions in politics and society (Botella & Ortego, 2010; Nature, 2006). Data sharing as a regular part of scientific culture would lead to a better quality of research results (Wicherts et al., 2011; Hedrick, 1985). Hedrick explains (1985, pp. 130-131): "Although researchers are assumed to carefully check the accuracy of quality control, the pressure to work and publish quickly, the complexity of many analytic techniques, and simple human fallibility not surprisingly sometimes result in errors. Once again, an acknowledged policy of open access to data and the attendant risk that one's mistakes could be publicly exposed might increase the attention researchers give to their work and, therefore, might improve its quality. Both the scientific community and society would benefit from any reductions in errors resulting from an open-access policy."

CONCLUSIONS
Although PsychData implemented different acquisition procedures, response rates were rather low. The reactions received are similar to those of researchers who tried to access data from other scientists for reanalysis. Response rates are even lower compared with those of individual scientists requesting data for a certain research project. In the future the data selection criteria for PsychData should be reconsidered. Much of the data collected in the empirical sciences do not possess the characteristics of unique value and potential historical significance. It may therefore be advisable to relax the selection criteria so that a larger pool of researchers can be contacted about sharing their data.
Sharing data via a data archive means sharing it with the whole scientific community, a fact that might be more "frightening" to researchers than leaving data to an individual researcher with a clearly defined research objective. Nonetheless it should be mentioned that data submitted to the archive include longitudinal studies consisting of more than 400 data sets altogether.
As a consequence of the unwillingness of researchers to share their data, some authors argue for mandatory data archiving policies (see Wicherts et al., 2011; Wicherts et al., 2006). Data archiving is suggested to be a part of the publishing process (Wicherts, 2006) or of test reviewing and test evaluation processes (Fahrenberg, 2012).
If mandatory policies of data sharing were to be established, who should be responsible for data storage, archiving, and dissemination? Neither the individual researcher nor the research institutes or universities hold the resources to ensure data availability in the long term. If journal publishers or commercial services were responsible for data sharing, additional costs for data publishers and/or data users would probably arise. Additionally, depositing data could possibly mean the transfer of the copyright from the scientist to the journal publisher or even a commercial service, which is hardly desirable, specifically if the study was financed by public funds. There are very few exceptional cases of publication outlets (e.g., InterStat) that up to now abstain from copyright transfer. In addition, these enterprises are of little relevance for basic and applied research in psychology in the German-speaking countries.
It must be noted that, by far, most of the psychological research in the German-speaking countries (Austria, Germany, Liechtenstein, parts of Switzerland, and Luxembourg) is financed by public funds that originate in the resources of state universities (there are very few private universities currently in these countries), national research institutes (financed by taxes but under the control of the sciences), public endowment funds, or public organizations of the scientific community (e.g., Deutsche Forschungsgemeinschaft, DFG, the German Research Foundation, and similar organizations in Austria and Switzerland, which are financed by direct and indirect taxes but which are under autonomous control and organization of the sciences). The few exceptions are found in contract research, which is financed by industrial firms, ministries, or administrations and is at least in part confidential. Such confidential data in applied psychological research refer mostly to psychopharmacology, technology, and military domains and are excluded from data sharing by definition because the research results are not published at all and are not available to the public. Therefore, with the above exceptions, data centers financed by public funds can be the appropriate solution as they hold the expertise to guarantee the quality and re-usability of data sets in the long run.
But are mandatory policies really the best solution? Academic freedom relies on the self-regulatory abilities of the scientific community. Following the principles of openness and transparency of research, reviewers of journal articles, books, funding proposals, or conference submissions could require data sharing as a precondition for acceptance. Data sharing could also be taken into consideration by members of appointments committees. These self-regulation activities presuppose that data sharing has already become a regular part of the "research culture" in a scientific discipline.
To change the data sharing culture, more scientific recognition of data production is necessary (Cambon-Thomsen et al., 2011; Koslow, 2000). Data sharing must be a gain for the scientist, not an additional, unrewarded workload. This aspect is particularly important because (in contrast to a mandatory obligation) incentives would not only increase the self-motivation for data sharing but may also stimulate improvements in the quality of the data.
Finally, a fundamental precondition for a data sharing culture is the improvement of the practices of data documentation and management. Data sharing requires efficient data management during the whole data lifecycle. Data documentation at the beginning of the research project means a relatively small investment of time and workload but adds considerable value to the research. Consequently, the development of standards and data management tools in close collaboration with the research community represents a future challenge.