Computational Reproducibility in The Wildlife Society's Flagship Journals

Scientific progress depends upon the accumulation of empirical knowledge via reproducible methodology. Although reproducibility is a main tenet of the scientific method, recent studies have highlighted widespread failures in adherence to this ideal. The goal of this study was to gauge the level of computational reproducibility, or the ability to obtain the same results using the same data and analytic methods as in the original publication, in the field of wildlife science. We randomly selected 80 papers published in the Journal of Wildlife Management and Wildlife Society Bulletin between 1 June 2016 and 1 June 2018. Of those that were suitable for reproducibility review (n = 74), we attempted to obtain study data from online repositories or directly from authors. Forty-two authors did not respond to our requests, and we were further unable to obtain data from authors of 13 other studies. Of the 19 studies for which we were able to obtain data and complete our analysis, we judged that 13 were mostly or fully reproducible. We conclude that the studies with publicly available data or data shared upon request were largely reproducible, but we remain concerned about the difficulty in obtaining data from recently published papers. We recommend increased data-sharing, data organization and documentation, communication, and training to advance computational reproducibility in the wildlife sciences. © 2020 The Authors. The Journal of Wildlife Management published by Wiley Periodicals, Inc. on behalf of The Wildlife Society.

The scientific process can be generally described as developing a hypothesis that expands upon current scientific understanding, designing a study to test the hypothesis, collecting and analyzing data with appropriate and well-described methods, and disseminating the study's results and conclusions into the scientific body of knowledge. Thus, the nature of science is inherently iterative as researchers build upon the works of their predecessors. Integrity, transparency, and access are crucial aspects of this process, influencing the acceptance of findings, further research, and the application of results to real-world problems.
One cornerstone of the scientific process is the replicability and repeatability of research methods and results. Specifically, replicability is the ability to consistently reach fundamentally similar scientific conclusions whenever data are collected under similar conditions using similar protocols. Within the last 10 years, researchers have reported glaring issues in replication in several scientific fields, including cancer biology (Begley and Ellis 2012), experimental psychology (Open Science Framework 2015), economics (Camerer et al. 2016), and animal behavior (Wang et al. 2018). For example, in a comprehensive investigation of replication in social and cognitive psychology, Nosek et al. (2015) were able to replicate only 39 of 100 studies despite close collaboration with the original authors. Further, they reported that the strength of replicated effects averaged roughly half that demonstrated in the original studies. The study conducted by Nosek et al. (2015) highlights the need for repeated studies, as research findings must be considered preliminary until they are determined to apply to multiple situations and systems (Johnson 2002, Cassey and Blackburn 2006). A similar meta-replication study is unlikely to be conducted in wildlife ecology because it is expensive and often impossible to replicate large-scale or long-term studies or those conducted on rare or cryptic animals.
Replicability may not be generally feasible in wildlife ecology, but a reasonable partial test of the repeatability of studies in this field is reproducibility. Reproducibility is the ability of an independent researcher to use the same data and statistical analyses to reach the same conclusions as in the original study. For a study to be computationally reproducible, a reviewer needs access to the original data and the ability to follow the methods as described in the study to reach similar numerical results and conclusions as originally reported (Sandve et al. 2013). Reporting upon data collection, pre-processing, management, and analysis in a clear, well-documented, and appropriate manner is increasingly important as datasets continue to grow in size (Lewis et al. 2018). Sharing data and analysis code also engenders confidence in conclusions and can lead to an increase in citation rates (Vandewalle 2012, Piwowar and Vision 2013). Even though reproducibility is considered a more feasible goal for research than replicability, it is not always easily attained. For example, even with a relatively straightforward analysis program (STRUCTURE), Gilbert et al. (2012) were unable to reproduce 30% of published results in population genetic analyses. Reproducibility may be hindered by a lack of access to data, missing or unclear methodology, or because of different decisions made when attempting to reproduce an analysis. A 2016 comprehensive survey of researcher perspectives on reproducibility in science reported that 52% of respondents stated that the reproducibility crisis was significant and 38% stated it was slight (Baker 2016). Only 3% of the respondents did not believe there was any reproducibility crisis.
Furthermore, 33% of respondents had incorporated procedures for reproducibility in their own research endeavors within the previous 5 years, such as double-checking their work or asking others to review their process, but 34% of respondents had not established such procedures. The top reasons cited by respondents for the reproducibility crisis included selective reporting, low statistical power or poor analysis, and pressure to publish. The recognition of a general reproducibility crisis has prompted scientists in many fields to reflect on their own fidelity to reproducibility and scientific culture, and wildlife science is no exception, with growing calls for more discussion on open data and a shift towards a more collaborative culture in the wildlife sciences. The rapid increase in the volume and complexity of data has led to a nearly endless supply of new techniques to accumulate, store, and analyze data in wildlife ecology. Given the difficulty of reporting sufficient detail in a scientific paper for reproducibility of complex analyses, it has become even more important for authors to document and be willing to share their data, code, techniques, and specific decisions made during the analysis (Lewis et al. 2018). Our goal was to determine the extent of computational reproducibility for recently published wildlife research studies and to highlight specific ways that the discipline can enhance the transparency and integrity of its scientific research body.

METHODS
We randomly selected 40 publications each in the Journal of Wildlife Management (JWM) and in the Wildlife Society Bulletin (WSB). Eligible publications included research articles, reviews, invited papers, and notes that were published between 1 June 2016 and 1 June 2018. We selected the 80 papers from a pool of 417 eligible articles published between volume 80, issue 5 and volume 82, issue 4 for JWM (n = 242), and between volumes 40 and 42 for WSB (n = 175). We then assessed the suitability of each paper for review. We based suitability on whether each paper had quantitative results (e.g., numerical summary statistics, tables, or figures) that could potentially be computationally reproduced. For each of the studies that contained quantitative results, we searched the text, supplementary material, and online information to determine if data were readily available. If so, we downloaded those data and any related information (e.g., metadata, documentation, analysis code). If data were not readily available, we emailed a request for the data to the corresponding author 3 times, unless authors responded to the first or second request. Our requests for data were standard, explained our overall study, and promised to keep the identity of reviewed studies confidential. Concordia College's Institutional Review Board judged that our request to authors represented common practice of data sharing within the scholarly community and was thus exempt from needing Human Subjects Review approval. We assigned a primary reviewer for each paper for which we successfully found or were given data. We assigned studies by matching the statistical methods and specific analysis program cited in the paper, if any, with a primary reviewer's expertise, areas of research, and programming fluency (i.e., proficiency in R, MARK, SAS, or JMP).
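The two-journal random draw described above can be sketched as follows; this is a minimal Python illustration, assuming placeholder article IDs and an arbitrary seed (the paper does not report the selection code or a seed).

```python
import random

# Arbitrary seed so this illustrative draw is repeatable; the paper reports none.
random.seed(42)

jwm_pool = list(range(1, 243))  # 242 eligible JWM articles (placeholder IDs)
wsb_pool = list(range(1, 176))  # 175 eligible WSB articles (placeholder IDs)

# Draw 40 papers without replacement from each journal's eligible pool.
jwm_sample = random.sample(jwm_pool, 40)
wsb_sample = random.sample(wsb_pool, 40)

selected = jwm_sample + wsb_sample
print(len(selected))  # 80 papers total
```

Sampling each journal separately (rather than 80 from the combined pool of 417) guarantees the balanced 40/40 split across JWM and WSB.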
Each reviewer thoroughly read their assigned papers to identify quantitative results that they would subsequently try to reproduce. Reviewers then explored the data and any available metadata, noting whether the data were preprocessed or raw and if the original analyses were conducted with code (e.g., SAS, R) or menu-based programs (e.g., JMP, Excel). We considered data to be raw if they needed processing prior to analysis. Processing included steps such as packaging data into input files for MARK or JAGS, calculating secondary variables like catch-per-unit-effort, or aggregating duplicate global positioning system (GPS) recordings. We then attempted to reproduce the results in each study by following the methods in the published paper using the original analysis software. If code was available, we referred to it as a guide and made sure to ultimately follow the methodology stated in the paper. We gave each study a 6-digit code to ensure author privacy. We used the codes and a Google form (Table S1, available online in Supporting Information) to populate a de-identified dataset with reproducibility scores and additional metadata associated with each study. We evaluated reproducibility in 3 ways: whether the original figures matched our reproduced figures, accounting for variation in the formatting of the figure (i.e., question [Q]11, Table S1); whether the numerical results cited in the tables or text matched our numbers, allowing for rounding error within the publication's significant digits (i.e., Q12, Table S1); and whether the conclusions reported in the original study were supported, regardless of whether specific numerical results matched (i.e., Q13, Table S1). We ranked these criteria on a 5-point scale with 5 indicating total reproducibility and 1 indicating complete lack of reproducibility (Table S1). 
We focused on these 3 reproducibility criteria with the idea that even if effect-size estimates varied slightly, the most important aspect of reproducibility would be to match the ultimate conclusions, which in TWS journals are often used to support management recommendations (Merrill 2015, Krausman and Cox 2017). To reduce subjectivity in our scoring, a second reviewer read the reproducibility notes and scored the study, and we reported the overall reproducibility scores as the average of the 2 reviewers' scores. For reported proportions, we also provided 95% binomial confidence intervals (95% CIs) calculated using Wilson score intervals via the binconf function in the Hmisc package in R (Wilson 1927, Agresti and Coull 1998, Harrell 2019). Our data and analysis scripts are available for download or viewing on GitHub (www.github.com/aaarchmiller/reproducibility_in_wildlife_ecology) and the Data Repository for the University of Minnesota (https://doi.org/10.13020/jny1-wy60). These repositories include our reproducibility scores for each study, procedural notes that were generated during the review process, and the R code required to reproduce our study's results and figures. Data from the original studies that we collected during the review process are not available on the repositories and will not be shared.
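The Wilson score interval behind the reported proportions is straightforward to compute by hand; the original analysis used binconf in R's Hmisc package, and the following is a minimal Python sketch (the function name wilson_ci is ours). Applied to the 19 of 74 reviewable studies reported in the Results, it recovers the published interval.

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """Wilson (1927) score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n                       # >1, shrinks the interval's center toward 1/2
    center = (p_hat + z**2 / (2 * n)) / denom  # adjusted point estimate
    margin = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# 19 of 74 suitable studies could be obtained and analyzed for reproducibility.
lo, hi = wilson_ci(19, 74)
print(round(19 / 74, 2), round(lo, 2), round(hi, 2))  # 0.26 0.17 0.37
```

Unlike the simpler Wald interval, the Wilson interval behaves well for small samples and proportions near 0 or 1, which suits the small counts reported here.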

RESULTS
We deemed 6 of the originally selected 80 studies unsuitable for review (Fig. 1). Of the remaining 74 studies, we were unable to review 42 because data were not available online and the corresponding author did not respond to any of the 3 email requests. Authors of 11 studies opted out for various reasons, such as wanting to save the data for a future manuscript (n = 1), not enough time to compile the data (n = 3), proprietary or confidential data (n = 5), or no reason given (n = 2). Authors from 2 studies agreed to send data but did not do so before our established deadline. We analyzed 19 of 74 studies (0.26, 95% CI = 0.17, 0.37) for reproducibility on our 5-point scale.
Of the studies that we were able to analyze, 8 of 19 were completely reproducible (i.e., conclusion reproducibility score of 5.0) and 5 more had scores that reflected at least some reproducibility (i.e., conclusion reproducibility scores of 3.0 to 4.5; Fig. 1). Six of the 19 studies (0.32, 95% CI = 0.15, 0.54) that we reviewed were not reproducible (i.e., conclusion reproducibility scores <3). For the degree to which figures were reproduced (i.e., Q11, Table S1), the average reproducibility score was 3.3 ± 1.6 (SD; Fig. 2). The overall average reproducibility score for the degree to which numerical results were reproduced (i.e., Q12, Table S1) was 3.2 ± 1.4 (Fig. 2), and the mean score for conclusions (i.e., Q13, Table S1) was 3.6 ± 1.6 (Figs. 1 and 2). Given our small sample size, we were unable to attribute variation in reproducibility scores to any of the metadata we collected (Fig. 2; Table 1).
Of the 11 studies that were not fully reproducible (i.e., scores of <5 on our 5-point scale), we determined that 5 lacked sufficient documentation to reproduce their results. Examples of insufficient documentation included cases in which data or methods were inadequately described or subjective data-processing decisions (e.g., choosing how to visually cluster GPS locations) were implemented without complete documentation. Other reproducibility failures were due to technical software compatibility issues (n = 1), an apparent transcription error in a table (n = 1), and incomplete data in an online shared repository (n = 1). We found what we believe was a discrepancy between the analysis described and that implemented by the study's authors in 1 paper. Two studies were mostly reproducible (i.e., scores of 4 or 4.5 on our 5-point scale), but we found small discrepancies when we attempted to reproduce the effect sizes cited in the original papers.

Figure 1. We determined studies to be either suitable or unsuitable for review. Of the suitable studies (n = 74), we attempted to get data from online resources or directly from authors. Of the studies that we were able to review (n = 19), we judged whether or not we could reproduce each study's conclusions on a 5-point reproducibility scale, with 5 indicating total reproducibility.

DISCUSSION
We were moderately successful in reproducing those studies for which we were able to obtain data (13/19 studies), suggesting that authors who are willing to share their data, whether preemptively online or after being asked, were generally practicing reproducible data management and analysis. We were not able to evaluate non-response bias but suspect that this reproducibility rate is likely optimistic as it was based on papers whose authors may have been more prepared to share data because they were already following reproducible practices. A few patterns in studies with high scores point to ways to increase reproducibility. For example, a common feature of high-scoring studies was thorough documentation of data processing and analysis steps. Clear documentation, such as well-commented code and metadata, and explicitly stated analytical methods made it easier for reviewers to understand the data and follow subsequent analyses. We do recognize that an exhaustive description of methods, such as the steps used in menu-based analysis programs, is at odds with concise scientific writing. Thus, we recommend authors provide additional supplemental documents with specific code and steps required to reproduce all analyses. Generally, we recommend that wildlife ecologists review the many additional ways to improve reproducibility of data analysis, such as documentation of analysis program versions and creation of thorough metadata, described by Whitlock (2011), Sandve et al. (2013), British Ecological Society (2017), and Carey and Papin (2018).
There are 2 ways in which we may have failed to reproduce a study: the study's authors may not have provided enough data or documentation, or we, as reviewers, may have made mistakes in interpreting the original work. To avoid the second type of failure, we chose reviewers experienced with the software and analytic methods used in each study. We believe only 1 of the low reproducibility scores was due to differences between how we and the original authors implemented the analyses; all the other low scores were due to insufficient data, documentation, or technical issues.
Sharing well-documented data and analysis code offers a straightforward way to increase the reproducibility of wildlife science and has been encouraged by recent TWS editors (Merrill 2014, Krausman 2018). These practices are also recommended, although not mandated, under the data-sharing policy in the submission guidelines for both JWM and WSB. Our review was impeded by the lack of response or unwillingness to share data from our randomly selected study authors, and we were disappointed by our data recovery rate. In the context of this study, if more data had been readily available online, we would have had less need to contact the corresponding authors. Also, if more authors had shared their data, we probably would have been able to reproduce more studies. We also recognize that sharing data alone is not enough to ensure reproducibility (e.g., 1 of the datasets that an author archived in an online repository was incomplete), and it is not an option under some specific circumstances, such as with confidential or proprietary data. Anyone who has tried to document their analyses and share their data likely has realized that it is much easier to maintain a reproducible workflow from the beginning of a study than it is to retroactively compile and document the data and analyses after a manuscript has been prepared. The effort and time necessary to clean up a dataset is a common explanation for why archiving data and code is supported by many in theory but is not typically followed in practice (Nelson 2009). To ameliorate this situation, we believe that more training in the development and maintenance of reproducible workflows would benefit the discipline as a whole. Two of the authors of this study (AAA and JF) recently led a workshop on reproducible practices in data analysis at a national TWS conference.

Table 1 footnotes: (a) We excluded 1 simulation-based study with no associated data from this analysis. (b) Applicable for studies with any code-based analyses.
Attendees anecdotally reported that they did not have access to such training in their undergraduate or graduate programs. Likewise, the majority of respondents to a survey on data sharing self-reported that their institution did not provide training or incentives for responsible data management (Tenopir et al. 2011). Notably, in the reproducibility crisis survey recently reported by Baker (2016), 3 of the top 4 actions that respondents thought would increase reproducibility related to better training, mentoring, and supervision.
Although additional training in data management and documentation would certainly help advance reproducibility within the wildlife sciences, it may not be enough to resolve the reproducibility crisis without a stronger commitment to data sharing. Meticulously collected and organized data do relatively little to advance scientific understanding if they are stored on someone's hard drive until he or she retires, after which they may simply be lost (Whitlock 2011). We understand that many researchers feel protective of their data, whether because the data are proprietary, confidential, involve regulated threatened or endangered species, or because researchers have invested so much effort and time in collecting the data. For the former 3 reasons, there are ways to share aggregated, summary, or de-identified data that would at least begin to improve reproducibility (Alter and Gonzalez 2018, Pérignon et al. 2019). For authors concerned about receiving credit for their hard-earned data, we note that the availability of data has been associated with higher citation rates (Vandewalle 2012, Piwowar and Vision 2013) and may lead to new collaborations. Public archiving of data also supports ecoinformatics, which is the process of compiling ecological data to draw larger-scale conclusions than possible with individual research studies (Dengler et al. 2011, Michener and Jones 2012). We recommend sharing data through an archival service that provides a stable Digital Object Identifier (DOI) and citation, such as the broadly available Dryad (datadryad.org) server. Researchers may also have concerns about losing control of information once it becomes publicly available, resulting in data being used inappropriately (Merrill 2014). We suggest that complete and thoughtful documentation and metadata should help mitigate the potential for accidental misuse of information; however, further discussion on strategies to minimize data misuse is warranted as data archiving becomes more common.
Publishers can also encourage data sharing in a few ways, some of which may require new approaches to publishing. Our recommendations complement the comprehensive report by Lin and Strasser (2014), who argued that increased data access is an integral part of research transparency and integrity. Data sharing engenders trust: a group of TWS advisors stated, "… the stipulation that datasets are published with articles will increase transparency" in a report commissioned by the TWS president J. Haufler in 2013 (Kroll et al. 2014:6), and Vasilevsky et al. (2017) reported that strong data-sharing requirements led to higher journal impact factors in the biomedical field. Thus, we propose that public archiving of data (unless legally or otherwise prohibited) be a requirement for publication, rather than a suggestion, and that publishers should also require thorough data documentation. If the policy of a journal is instead to require that data are made available upon request, then we recommend that publishers require a data storage and access plan to demonstrate that data would be readily and permanently available if requested. Data and code could also be made available to reviewers during the peer-review process regardless of whether data are otherwise publicly available. As described by Merrill (2014), journals that require statements of data availability in addition to data submission have increased data access compared to journals that require data submission without explicit documentation. Data sharing can be incentivized, such as with reduced open science publishing fees or reduced page fees for those who readily share well-documented data. Another way to incentivize data sharing could be to develop a publisher-controlled archival service to create accessible and documented data products with associated DOIs.
Whether or not data sharing is required, corresponding authors should be required to directly acknowledge that they are responsible for vouching for the integrity of the data and analysis indefinitely and for providing stable and monitored email addresses.
Where data sharing is not possible or authors prefer to make data available only upon request, our study highlights the general need to improve communication among wildlife professionals. We were surprised by how difficult it was to get data from authors, and especially how hard it was to contact them in the first place. Of the 68 studies whose authors we sought to contact via email for data, we never heard back from the authors of 42 studies. In 24 of the 68 studies, the corresponding author had changed affiliation after the paper had been published and we needed to identify a new email address. Authors responded to requests sent to the updated email address in 13 of 24 cases, although some (n = 7) declined to share data. The lack of response to multiple emails suggests that our messages went to obsolete or neglected email addresses, were ignored, or were mistaken as fraudulent. Wicherts (cited in Stokstad 2018) also reported a low data recovery rate; 73% of authors (from 141 studies) did not reply or refused to share their data. It is notable that Wicherts reported to Stokstad (2018) that all these studies were published in journals that encouraged data sharing and open data access. Similarly, Alsheikh-Ali et al. (2011) reported that authors from only 9% of 351 studies published in high-impact journals with data-sharing requirements shared raw data online. Although JWM encourages open science and data sharing (Krausman 2018), only 9 of our target 74 studies had associated data available online. We can do better as researchers by providing and maintaining stable email addresses for corresponding authors or by linking identities, manuscripts, and current contact information with the ORCID (Open Researcher and Contributor ID) database. Beyond benefitting future researchers, a commitment to increasing data access and communication is of critical importance to the advancement of wildlife science and management.
The obstacles to progress and clarity are not insurmountable. We believe that if we all collectively commit to serving as individual stewards of wildlife science by implementing more reproducible workflows, increasing transparency, providing access to data, and ultimately fostering trust and cooperation, we can continue to stand as a discipline that believes in and advances scientific integrity for the long-term good.