The Replication Database: Documenting the Replicability of Psychological Science

In psychological science, replicability, that is, repeating a study with a new sample and obtaining consistent results (Parsons et al., 2022), is critical for affirming the validity of scientific findings. Despite its importance, replication efforts remain scarce in psychological science, and many attempts fail to corroborate past findings. This scarcity, compounded by the difficulty of accessing replication data, jeopardizes the efficient allocation of research resources and impedes scientific advancement. Addressing this crucial gap, we present the Replication Database (https://forrt-replications.shinyapps.io/fred_explorer), a novel platform hosting 1,239 original findings paired with replication findings. The infrastructure of this database allows researchers to submit, access

(1) BACKGROUND

In scientific research, almost every new hypothesis is based on previous findings; this epistemic connectedness is a core feature of science (Hoyningen-Huene, 2013). Scientific replication, the process of retesting a hypothesis with new data to determine whether the original study's conclusions can be supported (Parsons et al., 2022), is essential for building a robust body of knowledge and ensuring the integrity and reliability of scientific research. From a theory-driven perspective, if the findings on which a theory has been built cannot be replicated, the theory needs to be discarded or modified. From a phenomenon-driven perspective, replication failures can shed light on important confounding factors that need to be addressed for the phenomenon or "effect" to be detected (e.g., Calder, Phillips, & Tybout, 1981). From an efficiency standpoint, knowing which scientific findings are replicable helps ensure the optimal allocation of resources and the strategic steering of future work. Finally, replicability is an important part of building a more coherent body of evidence capable of informing practice and policy, as it tests the generalizability of a theory or procedure, especially its causal claims (Syed, 2023). This can be done by more rigorously testing the heterogeneity of an effect through replication (Bryan et al., 2021; Syed, 2023). Establishing the robustness of effects through replication is one way to increase the quality of evidence for policy making (Brown et al., 2014). As a consequence, a lack of emphasis on replication research, or reduced visibility of replications, can hinder scientific progress and contribute to an unnecessary waste of resources.
We propose that continually and transparently tracking replication attempts in an organized and systematic way can increase trust in science, promote the development of robust theory-driven research, and optimize the use of academic and institutional resources. For this tracking, we have created the Replication Database. Our database will provide researchers, educators, students, and practitioners with systematized and low-barrier open access to previous findings. Thereby, it will help reduce the waste of research resources, as the results of studies traditionally considered "unsuccessful" are often not published and land in the metaphorical "file drawer" (e.g., Kulke & Rakoczy, 2018). By using a public and crowdsourced database for replications, researchers may further circumvent journal gatekeeping (Mynatt et al., 1977; Sterling, 1959). Moreover, a replication database can be used by researchers to monitor and evaluate meta-scientific factors that may affect replicability, contributing both to the theoretical development of metascience as a discipline and to evidence-based reforms improving replication research and its evaluation. For example, this curated resource of replication attempts could be a first step toward standards and guidelines for determining when an effect or non-effect can be considered "replicable", ensuring clearer, multidimensional, and more nuanced understanding and definitions when we talk about "failed" or "(not) replicated" effects.
Therefore, we aggregated, transformed, and expanded datasets from large-scale replication attempts (e.g., Open Science Collaboration, 2015), publicly available lists of replications (e.g., LeBel et al., 2018; CurateScience, https://web.archive.org/web/20220128104303mp_/https://curatescience.org/app/replications), and individual replications conducted by ourselves or other researchers, with the ultimate aim of creating a comprehensive replication database. Although the inclusion criteria for the database are not limited to psychology, most of the existing entries are based on original studies published in psychology journals. The current report provides a snapshot of 1,239 replication findings entered into the database. However, the database is intended as a living resource, and we are committed to updating it regularly as more replications occur, to continually facilitate finding, publishing, teaching, monitoring, and analyzing replications.
Researchers can freely use the dataset and/or an interactive Shiny Application (https://forrt-replications.shinyapps.io/fred_explorer, see Figure 1) to search and analyze the data. In addition, the Replication Database provides a short guide on best practices for understanding replications, discussing key topics around replicability, such as: What is the overall replication rate? What features characterize successful replication attempts? What attributes are associated with original studies that are replicable? How do replication rates vary over time and across fields? These can be used as additional introductory teaching and learning resources.

(2) STUDY DESIGN

Inclusion Criteria
Inclusion criteria for the Replication Database were chosen liberally a priori. According to Hüffmeier et al. (2016), every study that tests the same hypothesis as a previous study could be deemed a replication. In our case, we required studies to specify which original study they had planned to replicate. As for research areas, studies from all social, cognitive, and behavioral sciences as well as medicine can be entered and validated.
The liberal definition of what constitutes a replication leads to variance in the closeness of replication studies. For example, some may reuse the same instructions, items, and analysis code, while others "merely" test the same hypothesis with newly created materials, in another language, and with a different type of sample. To capture these differences, we included optional variables about the similarities between the original and the replication study. These stem mostly from the Replication Recipe (Brandt et al., 2014).
Apart from an open-ended variable in which all differences can be explained and evaluated, specific variables let researchers indicate whether the closeness of instructions, measures, stimuli, etc. is "exact", "close", or "different", whether it cannot be evaluated ("does not apply"), or whether it is "unknown". Arguably, we cannot define for all possible cases whether, say, changing the language of a validated questionnaire should be considered close, which is why we rely on contributors to make informed assessments and to specify the differences in the open-ended question. We advise researchers using these variables to have additional people code them and to assess inter-rater agreement.
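For example, if two coders independently rated a closeness variable, agreement could be checked in a few lines of R; a minimal sketch using the psych package (the object names and example ratings below are hypothetical):

    # Minimal sketch: inter-rater agreement for one closeness variable,
    # assuming two hypothetical coders rated the same entries on the
    # 1-5 closeness scale (1 = exact ... 5 = unknown).
    library(psych)

    ratings <- data.frame(
      coder1 = c(1, 2, 2, 3, 5, 1, 4),
      coder2 = c(1, 2, 3, 3, 5, 1, 4)
    )

    # Cohen's kappa (and weighted kappa) for the two coders
    psych::cohen.kappa(ratings)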
Most replication studies feature a limited number of focal hypothesis tests that can be paired with tests from previous studies (e.g., two paired standardized effect sizes). The database structure allows for entering multiple results per sample, so that results from structural equation models, functional magnetic resonance imaging (fMRI) data, or other types of data may also be entered (see also section "Database Structure"). For completeness, we also decided to include results from studies that cannot be converted to correlation coefficients (e.g., Cramer's V, Hazard Ratios, Bayes Factors). These cannot be included in meta-analyses or other kinds of quantitative summaries but are displayed when searching the database (e.g., via the reference list annotation tool). Finally, entries can optionally include test statistics, from which standardized effect sizes can be calculated.
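For instance, a reported t statistic and its degrees of freedom can be converted to a correlation with the standard formula r = sqrt(t² / (t² + df)); a minimal sketch with illustrative numbers (this is not the project's conversion code, which is available on the OSF):

    # Minimal sketch: convert a reported t statistic into a correlation
    # coefficient using r = sqrt(t^2 / (t^2 + df)), keeping the sign of t.
    t_to_r <- function(t, df) {
      sign(t) * sqrt(t^2 / (t^2 + df))
    }

    t_to_r(t = 2.31, df = 58)  # approximately 0.29 (illustrative values)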

Database Structure
The dataset has a multilevel structure (see Figure 2). Each row represents one phenomenon or effect (e.g., "Facial redness increases perceived anger"), for which the original finding's reference, the replication study's reference, study numbers (when an article features multiple studies), standardized effect sizes, and sample sizes are coded. Additional metadata variables (e.g., differences between the replication study and the original study, journal that published the original study) are optionally coded.
In cases where a single replication study replicated an original effect in multiple ways (e.g., with several different items), we recommend documenting each effect separately for thoroughness, although this is not feasible for all projects (e.g., if results are only shared in an aggregated way as in Vaidis et al., 2024).
The database structure accommodates various complex scenarios, such as multiple independent replications of the same original study, one single study that replicated multiple original studies, or one replication of two different original studies. Several frequent scenarios are discussed in detail below and depicted in Figure 2.

One Single Study that Replicated One Original Study
In the least complicated case, one replication attempt entered into the database corresponds to one original study. For example, Simmons and Nelson (2019) replicated Study 1b from Jami (2019) and reported the average effect size (effect sizes for all items separately are only visible in a plot). Thus, in the database, the average effect for each study is entered as one row.
If effect sizes for each of multiple items were coded, each pair of original and replication effect sizes would correspond to one row in the dataset, and each row would be assigned the same value for the variable id_sample. If, for example, there is an entire correlation matrix for the pair of original and replication study, each pair of correlations is entered in one row. Finally, if effect sizes for the original items plus a new item (i.e., an extension) are available, there can be five entries, with the extension being coded as differing from the original study.
More complex studies may also nest replication effects of items or dependent variables within hypotheses (i.e., effect sizes are available for multiple dependent hypotheses and dependent variables). In the database, hypotheses and items can be specified in the "description" variable. As for collapsing or aggregating, coding was guided by which original effect sizes were available (ideally, every replication effect should be matched with an original effect).

Multiple Independent Replications of the Same Original Study
Independence of tests can refer to samples consisting of different people or to studies stemming from different laboratories. In the Replication Database, we refer to independence of samples. In the case of registered replication reports (e.g., Bouwmeester et al., 2017), one original study is replicated by many different laboratories. In such a case, each laboratory's replication effect size is entered into the database with a different value for the variable id_sample. The same pattern emerges if an effect is replicated by different laboratories. Note that for registered replication reports, it is also possible to "only" enter the aggregated replication effect size into the database (e.g., Vaidis et al., 2024 only shared the aggregate effect size in their report).
Note that the database entries' references are also supplemented by study number if more than one study is included in either report (e.g., "Cheung, B. Y., & Heine, S. J. (2015). The double-edged sword of genetic accounts of criminality: Causal attributions from genetic ascriptions affect legal decision making. Personality and Social Psychology Bulletin, 41(12), 1723-1738. Study 3" [emphasis added]). We plan to disentangle references and study numbers in the future (i.e., code them as two separate variables instead of one merged variable).

One Single Study that Replicated Multiple Original Studies
Occasionally, data is collected in one study (in other words, from one sample) and used to test multiple hypotheses. For example, Soto (2019) collected data from N = 1,504 participants to compute 78 correlations for which previously published estimates were available. In the Replication Database, these findings are represented as 78 rows that all share the same value for the variable id_sample but have different original references, effect sizes, and descriptions.
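To illustrate the resulting structure, the first rows of such an entry might look as follows; a minimal sketch with fictitious values, where column names such as ref_original and es_replication are illustrative rather than the database's exact variable names (except id_sample):

    # Minimal sketch of the multilevel structure: one sample (one value of
    # id_sample) paired with several original findings. Values are fictitious
    # and only a few variables are shown.
    soto_like_entries <- data.frame(
      id_sample       = "sample_042",
      ref_original    = c("Author A (2005), Study 1",
                          "Author B (2010), Study 2",
                          "Author C (2013), Study 1"),
      ref_replication = "Soto (2019)",
      es_original     = c(0.21, 0.35, 0.10),
      es_replication  = c(0.18, 0.22, 0.02)
    )
    soto_like_entries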

One Replication of Two Different Original Studies
If a replication report does not specify which original study it strives to replicate, the replication findings cannot be entered in the database. If, however, the replication targets multiple original studies, several options arise. First, if, for example, an original study has been replicated and a second replication study is then conducted, both replication studies are coded as replications of the original study. If, however, the first replication study introduces new features (e.g., the experimental manipulation has been altered) and the second replication study sticks with the alteration, it can be coded as a replication of the first replication. In a case where a replication is a mix of two original studies (e.g., items from both original studies were mixed), the replication findings are entered twice (i.e., once for each original study). This duplication can be identified via identical values in the variable id_sample. The upside of duplicating entries this way is that users of the database can find the replication via both of the original studies. Note that such cases are very rare.

Effect Size Conversion
The dataset includes effect sizes as reported in the original and replication studies and, where possible, effect sizes converted to correlation coefficients to achieve commensurability. Effect sizes were converted to Pearson correlation coefficients using R (version 4.3.2; R Core Team, 2018) with the packages esc (Lüdecke, 2018), metafor (Viechtbauer, 2010), and psychometric (Fletcher, 2022). Data was further processed with dplyr (Wickham et al., 2018), lubridate (Grolemund & Wickham, 2011), pwr (Champely, 2020), and openxlsx (Schauberger & Walker, 2021). The code to convert entries from the submission portal to the database structure (see section "Submission of Individual Entries") is freely available on the OSF at https://osf.io/2rv9z.
We kept the original effect sizes. In addition, we converted Odds Ratios, Cohen's d, η², R², and Cohen's f to correlation coefficients. φ coefficients were used as correlations without conversion. Standardized regression coefficients, Cramer's V, Bayes Factors, Hazard Ratios, Cohen's q, Risk Ratios, Spearman's Rho, and Kendall's Tau were not converted and thus cannot be included in meta-analyses of effect sizes (see Table 1). Effect sizes were coded as reported in the research articles (reported effect sizes) and remained unchanged. For converted effect sizes, original effect sizes were coded to be positive. To maintain uniformity of interpretation, replication effect sizes were matched so that positive values indicate effects in the same direction, while negative values indicate reversals (i.e., the replication study shows an effect in the direction opposite to that of the original study). For example, if the original effect size was r_original = .24 and the replication effect was r_replication = -.04, no changes were made. If, however, r_original = -.60 and r_replication = .01, the converted effect sizes were coded as r_original = .60 and r_replication = -.01.
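The sign convention and the most common conversions can be expressed compactly. The following is a minimal sketch using textbook formulas (the d-to-r conversion assumes equal group sizes); it is not the project's actual conversion script, which relies on the packages listed above:

    # Minimal sketch of common effect size conversions to r and of the sign
    # convention (original effects coded positive, replication signs matched).
    # Textbook formulas; d -> r assumes equal group sizes, eta^2 -> r and
    # f -> r assume single-df effects.
    d_to_r    <- function(d)    d / sqrt(d^2 + 4)
    or_to_r   <- function(or)   d_to_r(log(or) * sqrt(3) / pi)
    eta2_to_r <- function(eta2) sqrt(eta2)
    f_to_r    <- function(f)    sqrt(f^2 / (1 + f^2))

    # Sign convention: flip both effects if the original effect is negative,
    # so that positive replication values indicate the same direction.
    harmonize_signs <- function(r_orig, r_rep) {
      if (r_orig < 0) c(r_orig = -r_orig, r_rep = -r_rep)
      else            c(r_orig =  r_orig, r_rep =  r_rep)
    }

    harmonize_signs(-0.60, 0.01)  # returns r_orig = 0.60, r_rep = -0.01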

Coded Variables
An overview of all variables included in the database is provided in Table 2. The result variable can take the following values:

Success: Original and replication effect were both significant or both non-significant and effect sizes were in the same direction (if applicable).
Informative failure to replicate: The condition for success is not met. This can be due to the effect being in the same direction but not significant (e.g., due to a lack of precision in the measurements), a significant effect in the opposite direction, or a null effect.
Practical failure to replicate: Reporting beyond significance testing indicated that reasons other than effect sizes led to the replication study not being interpretable (e.g., the target sample size was not reached, the study had to be discontinued).
Inconclusive: Reporting beyond significance testing indicated that the result is unclear (e.g., there were multiple tests, and some were successful and some were not, or the hypothesis is not sufficiently specific).
Mixed [only on aggregated levels and auto-coded]: When all replication findings for one original result are considered, results were not the same for all attempts.

Table 2 (excerpt; VARIABLE: DESCRIPTION, EXAMPLE VALUES AND NOTES, MANDATORY?):

closeness_measures: Closeness between the original study and replication study regarding measures. See also Replication Recipe; 1 = exact, 2 = close, 3 = different, 4 = does not apply, 5 = unknown. (Mandatory: no)
closeness_stimuli: Closeness between the original study and replication study regarding stimuli. See also Replication Recipe; 1 = exact, 2 = close, 3 = different, 4 = does not apply, 5 = unknown. (Mandatory: no)
closeness_procedure: Closeness between the original study and replication study regarding the procedure. See also Replication Recipe; 1 = exact, 2 = close, 3 = different, 4 = does not apply, 5 = unknown. (Mandatory: no)
closeness_location: Closeness between the original study and replication study regarding the location where the study was conducted (e.g., city-country-continent, lab vs. field). See also Replication Recipe; 1 = exact, 2 = close, 3 = different, 4 = does not apply, 5 = unknown. (Mandatory: no)
closeness_renumeration: Closeness between the original study and replication study regarding remuneration (e.g., payment, feedback on personal data such as IQ values, course credit). See also Replication Recipe; 1 = exact, 2 = close, 3 = different, 4 = does not apply, 5 = unknown. (Mandatory: no)
closeness_language: Closeness between the original study and replication study regarding language. See also Replication Recipe; 1 = exact, 2 = close, 3 = different, 4 = does not apply, 5 = unknown. (Mandatory: no)
differences: Specification of all differences between the original study and the replication, written in bullet points or plain text. (Mandatory: no)
ci.lower_replication: Lower bound of the confidence interval for the standardized effect size (replication effect), automatically computed. Example: "-0.0564059". (Mandatory: no)
ci.upper_replication: Upper bound of the confidence interval for the standardized effect size (replication effect), automatically computed. (Mandatory: no)

TIME OF DATA COLLECTION
The database as of October 2023 contains results from original studies published between 1935 (Stroop, 1935) and 2023 (e.g., Röseler, Doetsch, et al., 2023). As in most meta-analytical datasets, data collection times for the included studies are mostly unknown, and only publication years are provided. Collection of metadata is ongoing and will continue for the foreseeable future (e.g., via hackathons and workshops at conferences, collaborations with large-scale projects, and literature alerts). After collecting the currently hosted data, aggregating and formatting of the datasets began in May 2022 using the Open Science Framework Registries webpage (https://www.osf.io/registries).

LOCATION OF DATA COLLECTION
Worldwide/asynchronously/remote.

SAMPLING, SAMPLE AND DATA COLLECTION
The presented dataset represents the Replication Database as of 16 October 2023 and consists of multiple sub-datasets and individual replications. Historically, the basis was formed by an aggregation of data from OSF's registries (Röseler et al., 2022) and replications conducted by Feldman and colleagues ("Collaborative Open-science and meta REsearch", CORE; CORE Team, 2024). We then added large-scale projects, such as data from the Reproducibility Project: Psychology (Open Science Collaboration, 2015) and others. All further entries that we had to code manually were labeled as individual submissions. These include data from CurateScience.org and specific journal issues dedicated to replications. We issued a call for results (https://osf.io/v4xjk) via 14 channels (i.e., conferences, social networks, and mailing lists) in March and April 2023 (for an overview see https://osf.io/d5r7c). Since then, project leads and research assistants have been manually coding studies from further lists, databases, and literature searches. We have also been reaching out to large-scale replication projects and asking them to help add their data. In late 2023, the Replication Database and the Framework for Open and Reproducible Research Training (FORRT) Replications and Reversals project joined forces, with the two databases being merged by late 2024. In parallel, we have been validating entries submitted by other researchers. An overview of data sources and the distribution of the original publications across years is provided in Tables 3-5 and Figures 3-4. Dataset descriptions and plots were created with R version 4.3.2 (R Core Team, 2018) and the packages ggplot2 (Wickham, 2016), openxlsx (Schauberger & Walker, 2021), and psych (Revelle, 2024). Code to reproduce the results is available online (https://osf.io/j8qav).
In total, there are 1,239 entries (i.e., pairs of original and replication effects). Note that effect sizes and sample sizes could not be coded for 201 cases. The entries stem from 336 independent original studies and 468 independent replication findings. By independent, we mean non-overlapping samples. For example, single research articles reported results from up to 80 independent studies (see also Table 3 for a summary).
Replication outcomes were taken from the reported replications in the OSF registries, coded from author statements, and, in some cases, computed from reported effect sizes. Most findings were informative failures to replicate (k = 641), followed by successes (k = 447). Assessments could not be made for k = 133 findings, k = 15 were inconclusive, and k = 3 entries were practical failures to replicate (see also Table 4 for definitions of outcomes).
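For the entries coded from effect sizes, the success versus informative-failure distinction in Table 4 amounts to a simple decision rule. The sketch below is an illustration with hypothetical function and argument names; it deliberately ignores practical failures and inconclusive outcomes, which require reading the reports:

    # Minimal sketch of the success / informative-failure rule from Table 4,
    # applied to one pair of p-values and (sign-harmonized) effect sizes.
    code_outcome <- function(p_orig, p_rep, es_orig, es_rep, alpha = 0.05) {
      both_sig       <- p_orig < alpha && p_rep < alpha
      both_nonsig    <- p_orig >= alpha && p_rep >= alpha
      same_direction <- sign(es_orig) == sign(es_rep)
      if ((both_sig && same_direction) || both_nonsig) "success"
      else "informative failure to replicate"
    }

    code_outcome(p_orig = 0.01, p_rep = 0.03, es_orig = 0.24, es_rep = 0.20)  # success
    code_outcome(p_orig = 0.01, p_rep = 0.40, es_orig = 0.24, es_rep = 0.04)  # informative failure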

Table 4 (OUTCOME; NUMBER OF ENTRIES; DEFINITION OF OUTCOME):

Inconclusive (15 entries): Reporting beyond significance testing indicated that the result is unclear (e.g., there were multiple tests, and some were successful and some were not, or the hypothesis is not sufficiently specific).
Informative Failure to Replicate (641 entries): The condition for success is not met. This can be due to the effect being in the same direction but not significant (e.g., due to a lack of precision in the measurements), a significant effect in the opposite direction, or a null effect.
Practical Failure to Replicate (3 entries): Reporting beyond significance testing indicated that reasons other than effect sizes led to the replication study not being interpretable (e.g., the target sample size was not reached, the study had to be discontinued).
Success (447 entries): Original and replication effect were both significant or both non-significant and effect sizes were in the same direction (if applicable).
Not Available (133 entries): No assessment of outcome has been coded (e.g., due to missing original or replication effect size or sample sizes, or no clear evaluation in the replication report).
Data from the original projects (e.g., Open Science Collaboration, 2015) have been reformatted. In some cases, effect sizes have been standardized, and most references have been added (original materials mostly included short references without DOIs, only author names, or references in formats other than APA). Further, we added variables such as the journals that published the original findings, 95% confidence intervals for original and replication effect sizes, outcomes, and replication study power. An overview of the number of effect sizes by source is provided in Table 5.
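The added confidence intervals and power values follow standard procedures; the following minimal sketch (assuming effect sizes already converted to r, and using the pwr package cited above) illustrates the idea but is not the project's exact code:

    # Minimal sketch: 95% CI for a correlation via the Fisher z transformation,
    # and replication power from the replication N and the original r.
    library(pwr)

    r_ci <- function(r, n, level = 0.95) {
      z    <- atanh(r)                      # Fisher z
      se   <- 1 / sqrt(n - 3)
      crit <- qnorm(1 - (1 - level) / 2)
      tanh(c(lower = z - crit * se, upper = z + crit * se))
    }

    r_ci(r = 0.24, n = 120)  # illustrative values

    # Power of the replication study to detect the original effect size
    pwr::pwr.r.test(n = 200, r = 0.24, sig.level = 0.05)$power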
On average, replication effect sizes were smaller than original effect sizes. The ratio of replication to original effect sizes was computed for k = 1,050 pairs (M = 0.52, SD = 0.98, Min = -6.90, Max = 22.82, Md = 0.387; excluding cases with original effect sizes of 0). Figure 3 provides a scatterplot of original and replication effect sizes in the style of Open Science Collaboration (2015). An interactive version with an up-to-date dataset is available online (https://forrt-replications.shinyapps.io/fred_explorer). The distribution of relative effect sizes is displayed in Figure 4.

QUALITY CONTROL

Validation for Individual Submissions
As part of the collaborative community effort, all mandatory fields (Table 2) were systematically verified by one person per entry (listed in the variable validated_person). These seven contributors were students fulfilling course credits or research assistants. They were acquainted with statistical methods (e.g., effect sizes and null hypothesis significance testing) and used standardized instructions (https://osf.io/y3fm8). For example, they tested hyperlinks and assessed descriptions and keywords for plausibility. The attribution of effects to one or multiple samples and the accurate naming of Sample IDs were also examined.
The person indicated by the variable validated_person scrutinized both the original and replication papers to ensure the congruence of reported sample sizes with the submitted information.Special focus was placed on the accuracy of sample sizes with regard to the removal of participants.Additionally, the effect sizes and their types were individually examined in both the original and replication papers.In case of uncertainties encountered during these steps, we contacted the contributor of the results for further clarification and LR was informed of the potential problems.

Validation for Batch Submissions
By batch submissions, we refer to submissions of many findings at once, as is the case for large-scale projects (e.g., Open Science Collaboration, 2015). In these cases, the original dataset was converted and entered into the database. For each batch of submissions, a project team member checked whether the entries regarding effect sizes, sample sizes, and references in the Replication Database aligned with those of the submitted studies. This work was again done by research assistants or the project lead. In some cases, the original authors of large datasets validated the entries or converted the data.

Dealing with Inconsistencies
In cases of inconsistencies, we corrected values to match the source material. We identified an error in a replication report, confirmed it with the author(s), and commented on PubPeer. If authors were unreachable, we relied on the original or replication reports. After other researchers flagged two errors in the CurateScience data (LeBel et al., 2018), we revalidated all CurateScience entries by comparing effect and sample sizes directly with the original reports rather than with the database. For future errors in our database, we encourage researchers to submit a comment to this article via PubPeer (https://pubpeer.com).

Limitations
Several limitations arise due to the large size of the database, limited resources, collaborative data collection, and ongoing discussions about replication methodology.
• Deprecation of entries: Variables such as publication status may change over time, for example from "preprint" to "journal article". Although we ask all contributors to let us know when such variables change, there is currently no procedure that guarantees this variable is up to date.
• Outcome variables: There are numerous ways to measure replication outcomes with regard to the original study's findings. Effect sizes or relative effect sizes are the most fine-grained and comparable way to code outcomes, but some researchers or practitioners may prefer categorical values such as success or failure. Although the database includes the evaluations suggested by Brandt et al. (2014), the current coding scheme is inconsistent: some entries were taken from what replicators coded in the OSF registries when publishing result reports using the Replication Recipe post-completion template (Brandt et al., 2014), some were computed from the entered effect sizes, and some were filled out by contributors of the findings who would otherwise not have categorized the replication attempt using these labels. Note that more objective classifications, such as those suggested by LeBel et al. (2019), can be computed from the present values (e.g., signal vs. no signal, direction); a minimal sketch follows after this list.
• Replication closeness: As described above, replication closeness is difficult to measure, hard to validate, and should be used with caution. Currently, coding replication closeness is optional, which is why it is also missing for a large proportion of entries.
• Ignorance of nested designs: Although commensurability of different effect sizes is statistically possible through conversion, caution should be exercised when interpreting effects from between-subject designs compared to those from within-subject or nested designs, as estimates such as significance level or power will be skewed. Note, however, that the design has been coded and cases can be filtered by it.
• Quality control: Due to crowdsourcing and limited resources, the dataset is likely to contain errors. In the trade-off between comprehensiveness and correctness, we strive for the former to maximize the visibility and findability of replications. For better or worse, researchers can easily go from our database to the original reports. Data from large-scale projects was only compared to their data, as not every single study could be checked. Checks do not include reproductions of analyses but only comparisons of values. In many cases, we noticed discrepancies between entered sample sizes and the degrees of freedom of the respective tests because researchers entered the total sample size rather than the sample size used for the respective test. For individual submissions, we reached out to the contributors and could resolve all inconsistencies.
• Coding of samples, studies, dependent variables, and items: Entries are coded so that dependent samples (i.e., samples that belong to the same replication study but were used to replicate different original findings) and study numbers from original and replication findings can be identified. However, there is no standardized procedure to code hypotheses, dependent variables, or items. These are usually collapsed in the description, but future research or a revision of the database may benefit from a more differentiated coding procedure.
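As noted in the outcome-variables point above, coarser signal-based classifications can be derived from the stored values. The following minimal sketch is in the spirit of LeBel et al. (2019), assuming sign-harmonized correlation coefficients and replication sample sizes; it is an illustration, not the authors' implementation:

    # Minimal sketch of a signal-based classification: "signal" if the
    # replication CI excludes zero, plus whether the replication effect
    # goes in the original (positive) direction.
    classify_signal <- function(r_rep, n_rep) {
      z    <- atanh(r_rep)
      se   <- 1 / sqrt(n_rep - 3)
      ci   <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
      signal    <- ci[1] > 0 | ci[2] < 0        # CI excludes zero
      direction <- ifelse(r_rep >= 0, "consistent", "opposite")
      data.frame(ci_lower = ci[1], ci_upper = ci[2],
                 signal = signal, direction = direction)
    }

    classify_signal(r_rep = 0.15, n_rep = 300)   # signal, consistent direction
    classify_signal(r_rep = -0.02, n_rep = 300)  # no signal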

DATA ANONYMIZATION AND ETHICAL ISSUES
Because all entries concern scientific contributions such as research articles or datasets, we did not anonymize the data.

EXISTING USE OF DATA
Subsets of the data (e.g., data from Many Labs) or aggregated versions have been used for meta-research (e.g., Sotola, 2023).At the time of publication, we are aware of two projects that have used the entire database.
(3) DATASET DESCRIPTION AND ACCESS

The datasets and materials are openly available in the OSF repository (https://osf.io/9r62x/) and will be updated continuously as the database grows.

DATA TYPE
Secondary data, processed data, aggregated data.

FORMAT NAMES AND VERSIONS
Datasets are available in .csv and .xlsx formats.

LICENSE
CC-By Attribution 4.0 International.

LIMITS TO SHARING
The data is not under embargo. It contains the names of researchers who conducted original studies and replication studies (i.e., references) and the names of researchers who contributed to the dataset. The data may be updated with further replication findings, and we plan to maintain and extend the Shiny Application for several more years.
Please cite this article and, along with it, the most recent version of the OSF project (https://osf.io/9r62x), which includes a version number and contributors who joined the project since 04/2023.

PUBLICATION DATE
An initial version of the dataset was shared on 22/01/2023 on the Open Science Framework (OSF; https://osf.io/2a3gb). The reported results are based on the version from 16/10/2023.

(4) REUSE POTENTIAL
We encourage others to use the Replication Database for their research or for educational purposes, to add their replication findings to the database, or to merge it with other existing databases. It can serve a wide variety of purposes.
• Increase findability of replications: Researchers, teachers, policy-makers, and professionals often rely on scientific evidence. With the database, they can easily and quickly get an overview of the potential robustness, generalizability, and heterogeneity of effects.
• Summarize replication efforts by area: The dataset can be used to summarize the robustness of findings by discipline, research area, phenomenon, journal, time of publication, or researcher. This way, researchers can identify areas where replications are common or uncommon, which may aid in planning replication attempts, monitoring replication affinity, or determining directions of future research. For example, if some replications of a phenomenon are successful and others are not, they can be compared to reveal potentially relevant background variables.
• Inclusion in traditional meta-analyses: With meta-analyses often struggling to include unpublished findings, replications, and null findings, we believe that the Replication Database, as a low-threshold opportunity to publish replication attempts, can help researchers find studies to include in their meta-analyses, which may correct for publication bias.

• Validation data for bias-correction methods: Methods that predict replication rates or correct meta-analytical effect sizes for publication bias and questionable research practices are often evaluated using simulated data (e.g., Carter et al., 2019), and validations with existing data need to rely on few and scattered large-scale projects (e.g., Sotola & Credé, 2022). With the Replication Database, these proposed methods can easily be tested against a large set of real data. In turn, the dataset can inform simulation studies about the characteristics of replication studies from different research areas in psychology.
• Inform replication guidelines: With replication guidelines still being developed, we believe that the Replication Database can support the development of evidence-based replication guidelines and evaluation protocols. For example, if certain features of replication studies affect replication outcomes positively (e.g., preregistration of the study's methods and analysis plan), recommendations to preregister replication studies can rest on this evidence.
• Teaching: At the moment, textbooks and teaching materials are highly likely to include findings that could not be replicated. In the past, problems with the findability of replication attempts made it difficult to provide a more nuanced discussion. The Replication Database can help researchers revise these materials and include more recent findings for the discussed phenomena or theories via a reference list annotation tool. This way, references can be read and annotated with respect to replication attempts (e.g., whether there have been any replication attempts and what their outcomes were). Moreover, instead of relying on singular findings, teachers and lecturers can, for example, ask students to examine replications, compare them with the original findings, and thereby help them develop skills to critically evaluate bodies of research.
Finally, replication studies have become an integral part of undergraduate research (Boyce et al., 2023; Jekel et al., 2020; Korell et al., 2023; Quintana, 2021). The database provides a low-threshold opportunity to make student replications visible.
We invite researchers to join our effort to make replications in psychological science and beyond transparent in a systematic manner.
Note. Contributors of database entries received the CRediT role "Resources". Contributors coding variables or converting values were assigned the CRediT role "Data Curation".
Please note that, because this is a static report, the up-to-date database is necessarily larger, and mistakes in the present version have been corrected in more recent versions. We took some of the text for this manuscript from our previous dataset publication in the JOPD (https://openpsychologydata.metajnl.com/articles/10.5334/jopd.67).

Figure 1 Replication Tracker and example functions. Note. Researchers can access the database to filter findings (e.g., for statistical power, validation status) and search among the entries. On the "Replicability Tracker" tab, replication rates for all selected findings are visualized. The high number of findings in the Figure is due to a more recent dataset included on the website.

Figure 2 Multilevel structure of the Replication Database using fictitious data. Note. OSF: Open Science Framework.

"
which the original effect size can be found in the publication of the replication study.no p_n_orig Page number on which the original sample size can be found in the publication of the original study.no p_n_rep Page number on which the original sample size can be found in the publication of the replication study.no result Result of the respective replication test.

power
Replication study power based on replication N and original effect size converted to r, automatically computed."0.358" no orig_journal Journal that published the original findings."Scientific Reports" no

Figure 3 Original and replication effect size by significance of replication effect and power of the replication study. Note. k = 1,051 pairs of original and replication effect sizes converted to correlation coefficients. Some code for the plot was taken from Open Science Collaboration (2015). Power: statistical power of the replication study given the replication sample size and the original effect size. The p-value of the replication study was estimated based on converted effect sizes and may be skewed for nested designs (α = 5%). Points on the diagonal solid line reflect cases where the replication effect size equals the original effect size. Points on the horizontal dashed line represent replication effect sizes close to 0.

Figure 4 Histogram of relative replication effect sizes. Note. The x-axis was truncated for readability, and some relative replication effect sizes are not visible. The dashed line represents the median of 0.387, k = 1,050. The solid line represents 1, that is, the relative effect size that results from both effects being the same. Cases where the original effect size was zero were removed because the ratio yields infinity.

Table 1
Conversion of standardized effect sizes.

Table 3
Description of entries from the Replication Database.