Preregistration specificity and adherence: A review of preregistered gambling studies and cross-disciplinary comparison


Introduction
A preregistration is a time-stamped, immutable document posted on an online repository that outlines the details of a proposed research study, including the hypotheses, methods, outcomes of interest, and data analysis plan. Historically, preregistration has been used primarily for randomised controlled trials (RCTs) and later for systematic reviews and meta-analyses (Dickersin & Rennie, 2003; Stewart et al., 2012). More recently, researchers performing other forms of quantitative and qualitative studies (Haven & Van Grootel, 2019) have begun to adopt this practice. The number of researchers preregistering these types of studies is increasing year-on-year (Kuperschmidt, 2018), with 17,000 new preregistrations posted on the online repository Open Science Framework (OSF) in 2020 alone (Centre for Open Science, 2020). This trend has been largely prompted by concerns regarding the replicability and reproducibility of the extant literature (Allen & Mehler, 2019; Simmons et al., 2011), and preregistration is one of several practices (e.g., open data, preprints) that researchers are using to improve the transparency and rigour of their research as part of the open science movement.
Proponents of study preregistration have advanced three overlapping and mutually compatible perspectives regarding its value. First, preregistration increases transparency (Nosek et al., 2018). Transparency in the research process has multiple benefits, such as improving the ability to detect questionable research practices (QRPs; e.g., selective outcome reporting) and publication bias (Munafò et al., 2017; Norris et al., 2012), and enabling the differentiation of planned, a priori analyses from unplanned, post hoc analyses (Nosek et al., 2019).
Second, preregistration assists with reducing Researcher Degrees of Freedom (RDoF), that is, the many methodological choices available to researchers when collecting, analysing, and reporting their findings (Bakker et al., 2020; Wicherts et al., 2016). Reducing RDoF can be important as the freedom to make data-contingent decisions during the research process (e.g., when deciding which inference criteria to use or how to deal with outliers) can inflate the risk of finding false-positive results or Type-I errors (Wicherts et al., 2016), which when done strategically is known as p-hacking or asterisk hunting (Head et al., 2015). Third, Lakens (2019, p. 1) argues that preregistration is valuable as it allows for "others to transparently evaluate the capacity of a test to falsify a prediction". The degree to which a test is capable of falsifying a prediction is termed its "severity" and, as Lakens discusses, more severe tests are more impressive and indicative of a solid theoretical underpinning.¹ Several QRPs can reduce the severity of tests by reducing the likelihood of researchers being able to falsify their hypothesis, including optional stopping (i.e., continuously checking and analysing data during the collection phase and only stopping when a statistically significant result is observed) and HARKing (Hypothesising after the results are known). Thus, readers can better evaluate the severity of tests reported in preregistered compared to non-preregistered studies as these QRPs can be more easily detected in the former (Lakens, 2019).
While we focus on just three possible benefits of preregistration above, many others have been proposed. For example, Rubin (2020) lists 17 potential benefits, including tangential or secondary benefits such as reducing publication bias and increased reporting of null results. However, Rubin questions whether some of these benefits may be equally or better achieved by other means (e.g., transparent reporting at publication). Further, several arguments against study preregistration have been made, including that the practice is time-consuming (for authors to write and reviewers to evaluate), it discourages exploratory research or serendipitous findings, and pre-specified study plans can be flawed (Pham & Oh, 2021). A comprehensive discussion of the potential advantages and disadvantages of preregistration is beyond the scope of this article, so we refer interested readers to the article by Simmons et al. (2021).
Despite ongoing debate, the limited available evidence supports the value of study preregistration. Preregistration of RCTs has aided in the detection of outcome switching (Chen et al., 2019; Vassar et al., 2020) and appears to have resulted in an increase in published null findings, indicating a potential reduction in the likelihood of statistical false-positives (Kaplan & Irvin, 2015). Preliminary evidence shows that the effect sizes reported in preregistered studies in the psychology literature are considerably smaller than non-registered studies, suggesting the latter contain effects that are inflated by QRPs and publication bias (Schäfer & Schwarz, 2019). Yet, the value of preregistration is limited by at least two factors. First is the degree to which preregistrations specifically describe all aspects of the planned study. If key study details like hypotheses, primary outcomes, sampling procedures, and analysis plans are not clearly and comprehensively specified, then the many benefits of preregistration listed above fail to materialize. Second is the extent to which researchers actually follow (i.e., adhere to) their pre-specified plans and declare any deviations (Nature Human Behaviour, 2020). If, post-preregistration, a researcher changes their criterion for outlier removal or the cut-off score used to divide groups and fails to declare such deviations, the anticipated benefits of the practice (i.e., restricted RDoF and, to some extent, transparency) are again diminished.
To date, three studies have evaluated modern study preregistration practices according to the specificity of the research plans and the degree to which the researchers adhered to them. Bakker et al. (2020) examined the specificity of preregistrations registered on OSF (osf.io) during 2016 that used either structured or unstructured templates. The authors adapted the list of RDoFs developed by Wicherts et al. (2016) to create a scoring protocol that assessed the extent to which the preregistration restricted each RDoF (e.g., "Deciding on how to deal with outliers in an ad hoc manner") by being "specific" (i.e., all phases of the research process are described), "precise" (i.e., descriptions of the research plan can only be interpreted in one way), and "exhaustive" (i.e., explicit acknowledgment that the plan will not be deviated from). We use the term specificity here as shorthand for these three principles. Bakker and colleagues found that specificity was higher in the sample using a structured template but was relatively low for both samples, particularly regarding the selection of measured variables and covariates. Claesen et al. (2019) investigated 16 articles published in Psychological Science between 2015 and 2016 with 27 corresponding preregistrations (some articles contained multiple, separately preregistered studies). They assessed whether the authors of these publications adhered to their preregistrations in eight areas (e.g., exclusion criteria, statistical model), finding that 26 studies (96%) included at least one deviation that was not declared. Only one study disclosed all deviations, and all studies deviated from their preregistration in at least one of the eight areas. Ofosu and Posner (2019) evaluated 195 preregistrations from the economics and political science fields registered between 2011 and 2016. Only 49.7% of the sample was judged to contain sufficiently detailed descriptions of the four key areas they deemed necessary for a complete preregistration (i.e., hypotheses, primary dependent variables, treatments or independent variables, and the statistical model[s] to be tested). Of the 95 preregistrations with a corresponding published article, more than a third failed to include at least one preregistered hypothesis and 18% presented tests of unregistered hypotheses.
Collectively, the findings from Bakker et al. (2020), Claesen et al. (2019), and Ofosu and Posner (2019) highlight a need to continue to examine preregistration practices and how they can be improved to maximise the potential scientific benefits of preregistration. In the present study, we aimed to better understand the strengths and weaknesses of more recent preregistration practices (i.e., 2017 onwards). We did this by simultaneously determining the degree to which researchers studying gambling sufficiently specify all aspects of their studies in preregistrations and the extent to which they adhere to their pre-specified plans. Preregistered studies from the gambling field were selected as the sample for two reasons. First, most of our research team are experienced gambling researchers, which uniquely positioned us to determine whether all relevant details were specified when describing the use of field-specific measures and datasets (e.g., online gambling account data). Second, the gambling field is fraught with concerns regarding impartiality and QRPs due to the frequent involvement of the gambling industry in funding and supporting research (Livingstone & Cassidy, 2014), and open science practices such as preregistration have been proposed as a strategy to combat the risk of bias when undertaking industry-supported research (Louderback et al., 2020). Accordingly, we aimed to understand how effectively gambling researchers are currently preregistering their studies by comparing their preregistration specificity scores (according to Bakker et al.'s [2020] scoring protocol) with the specificity scores recorded for the randomly selected, cross-disciplinary preregistrations in Bakker and colleagues' study. As the discussion of open science principles and practices in the gambling field has been limited until recently (Blaszczynski & Gainsbury, 2019; Heirene, 2020; Heirene & Gainsbury, 2020; LaPlante, 2019; Louderback et al., 2022; Wohl et al., 2019), we hypothesized that preregistrations of gambling-focused research studies would have lower specificity levels (i.e., have lower scores on the RDoF scoring protocol) than the cross-disciplinary sample.

Methods
The hypotheses, methods, and analysis plan for this study were preregistered on OSF (https://osf.io/3jy6q). Unless otherwise stated, we adhered to the methods outlined in our preregistration. We have included a "Deviations from preregistration" subsection later in the methods section outlining any major deviations from our preregistration, and minor deviations are presented in footnotes. The study data, analysis scripts, and materials, including details of transparent changes, can all be accessed on our OSF page (our project's OSF Wiki lists and describes all documents related to this study).

Search and Selection Process
Our complete process of searching for and selecting registrations is presented in Figure 1. We searched the OSF repository (www.osf.io) on three occasions throughout 2020 for preregistrations of gambling studies by searching the terms "gambling", "wagering", and "betting" separately. To be included, a preregistration had to:
• outline the plan for a study that was primarily focused on a gambling-related concept or concepts;
• be written in English;
• report at least one hypothesis;
• not be for a review and/or meta-analytic study as these studies involve unique forms of RDoF and risks of bias that would require a separate scoring system (e.g., PRISMA-P; Moher et al., 2015);
• not be for a clinical trial (defined according to the National Institutes of Health as the prospective placement of participants to an experimental condition using randomisation methods and testing the effects of an intervention) as these also involve unique forms of RDoF and risks of bias that would require a separate scoring system² (e.g., CONSORT; Schulz et al., 2010).
OSF searches and the selection of preregistrations were performed by BK. RH checked 20% of included and excluded registrations for the accuracy of the selection process according to the above eligibility criteria and agreed with all original selection decisions. In our preregistration we stated that a second researcher would only check 10% of included and excluded registrations, but we decided to review a larger sample of selections to ensure the accuracy of the process.

Sample Size Determination
To compare our sample with the 52 cross-disciplinary preregistrations analyzed by Bakker et al. (2020) and thereby test our hypothesis, we aimed to include a minimum of 53 gambling study preregistrations. This was based on an a priori power analysis conducted using G*Power V3.1.9.4 for a one-tailed, normal parent distribution, Wilcoxon-Mann-Whitney test with power of 0.80, an effect size of 0.5 (d, which corresponds to a Cliff's D [the effect size we report] of 0.33), and alpha (α) at 0.05, which estimated that 53 preregistrations per group (N = 106 overall; 53 gambling and 53 cross-disciplinary studies) would be required. Bakker et al. originally selected 53 preregistrations for evaluation but had to remove one as it was withdrawn from OSF. Our effect size was based on Bakker and colleagues' suggestion that a medium effect (d = 0.5) is indicative of a practically meaningful difference between two samples of preregistrations. As Bakker et al. only included 52 preregistrations, we conducted a post-hoc power analysis to determine our obtained power using the specifications for our a priori calculation. Our actual power was 0.79, slightly less than the desired value of 0.80 (our a priori and post-hoc power analysis protocols can be found on OSF: https://osf.io/dnfqa).
We needed to conduct three separate searches of the OSF repository between March and October 2020 in order to identify 53 preregistrations meeting our criteria (see Figure 1). We did not summarize or analyze the data until all 53 preregistrations were identified and coded by two researchers. Although there were 55 gambling preregistrations meeting our eligibility criteria available on OSF at the time of our third and final search (see Figure 1), we restricted our sample size to the number provided by our a priori power analysis.

Sample Description
The characteristics of our sample are presented in Table 1 alongside the characteristics of the 52 cross-disciplinary preregistrations evaluated by Bakker et al. (2020) for comparison. None of the preregistrations were for Registered Reports. We extracted the design and overarching research question for preregistered gambling studies (https://osf.io/ad6wj). Overall, experimental studies were most common (N = 24), followed by cross-sectional (N = 18), cohort (N = 5), and longitudinal survey studies (N = 5; we were unable to identify the design for one preregistered study). Examples of research questions included: "How did the COVID-19 lockdowns affect gambling participation in Australia?", "Does the presence of unclaimed prize information result in a greater urge to gamble along with higher frequencies of scratch card play?", and "What is the role of debt stress in the relationship between gambling participation and mental health comorbidities?". The data for the cross-disciplinary sample studied by Bakker et al. were accessed from the authors' OSF page.³ All of these preregistrations were posted on OSF as part of the "Preregistration Challenge" (or "Prereg Challenge"), a competition held between 2015 and 2018 by the Centre for Open Science. The competition aimed to increase researchers' experience with preregistration and required participants to use a highly structured template to preregister their studies (a cash prize of $1,000 was awarded to all researchers who preregistered their studies using this template and published their findings in an eligible journal). The template asked researchers 26 questions about their planned study, including the research questions, hypotheses, sampling plan, variables, design, and analysis plan. This template remains available on OSF as the "OSF Preregistration" format (the form can be accessed here). Bakker et al. (2020) labeled the Prereg Challenge template as a "structured format", compared to the "Standard Pre-Data Collection" template which they labeled an "unstructured format" as it only contains two questions that ask authors whether they have begun data collection and whether they have looked at data. We compared our sample with Bakker et al.'s structured format preregistrations instead of their unstructured sample as our preliminary scans of OSF indicated that the OSF Preregistration format was most commonly used by gambling researchers. This template was the most frequently used format in our final sample (Table 1). There was no overlap between the two samples.

Table 2 (final row)
R6: Presenting exploratory analyses as confirmatory (HARKing). Same as RDoFs T1 (Question 1) and D4 (Question 7).
Note: Specificity questions are summarized here for space purposes. The scoring protocol containing all full questions can be found on our OSF page: https://osf.io/a34u7.

Scoring Preregistration Specificity
We used Bakker et al.'s (2020) scoring protocol to evaluate the specificity of preregistrations. This protocol contains 23 questions⁴ which provide scores for 29 RDoFs from Wicherts et al.'s (2016) checklist (all 29 RDoFs and the associated preregistration specificity scoring questions are presented in Table 2). Specificity scores represent the extent to which the preregistration restricts potential RDoFs arising during the research process. Specificity scores range between 0 and 3:
• 0 = Not specified: opportunistic use of RDoF not restricted at all.
• 1 = Some specification but lacking details: opportunistic use of RDoF is restricted to some extent.⁵
• 2 = Detailed specification: opportunistic use of RDoF is completely restricted, but no explicit statement confirming that authors will not deviate from this plan by adding additional methods/processes.
• 3 = Detailed specification and statement that authors will not deviate from their plan by adding additional methods/processes: opportunistic use of RDoF is completely restricted. For example, in a recent preregistration written by two of the present authors, we outlined the reasons why a participant's data may be excluded from analyses before stating: "Individuals will not be excluded from analyses for reasons other than those stated here."
• N.A. = RDoF item not relevant to preregistration.
Like Bakker et al. (2020) and Ofosu and Posner (2019), we counted the number of hypotheses proposed in each preregistration. Further, given concerns regarding many gambling researchers' potential conflicts of interest due to their connections with industry and/or government, we also scored preregistrations on whether relevant disclosures were reported. We used the journal International Gambling Studies' (IGS) three-factor disclosure framework to structure our assessment. IGS' framework requires authors to disclose [1] funding sources for the work, [2] any competing interests, and [3] any constraints on publishing the findings made by funders or stakeholders. We scored preregistrations on each of the three factors as either 0 (no mention) or 3 (relevant disclosure reported).
During the scoring process, we found it necessary to add our own "decision rules" to Bakker and colleagues' protocol that helped inform how we scored questions and enhanced our consistency across preregistrations.
For example, in order to obtain a score of two or higher on question 10 (corresponding to RDoFs D7 and C4), researchers need to specify various details of the sampling plan, including "how many and how additional participants or data points are sampled when pre-set sample size is not reached?"; however, many of the studies preregistered in our sample involved online convenience sampling with minimal criteria for eligibility and, as a result, these researchers had almost total control over the number of participants they recruited. Therefore, not reaching their pre-set sample size was not a concern for them and an associated plan did not need to be pre-specified. As such, we developed a decision rule that stipulated that preregistrations of these studies could score ≥ 2 for question 10, provided they had specified all other required details of their sampling plan. Our full scoring protocol, including these decision rules, is shared on OSF and the original protocol by Bakker et al. can be accessed on their OSF page.

Scoring Preregistration Adherence
We developed a protocol for scoring gambling researchers' adherence to their preregistrations with 32 questions: 29 corresponding to the 29 RDoFs and three corresponding to disclosures (i.e., funding, conflicts of interest, and constraints on publishing). For example, for RDoF A1 ("Choosing between different options of dealing with incomplete or missing data on ad hoc grounds") we asked: "Are the procedures used to deal with missing data consistent with those reported in the preregistration?". Our full adherence scoring protocol is available on OSF and summarized versions of the questions are outlined in Table 5. There were eight possible responses to each question:
• 0 = Yes, consistent with preregistration (no deviation).
• 1 = No, deviation from preregistration made and declared by the authors and a justification for change is provided.
• 2 = No, deviation from preregistration made and declared, but no justification for deviation is provided.
• 3 = No, deviation made and not declared or justified by the authors.

• U = Unable to determine due to lack of detail reported in: [1] the preregistration (scored as UP) (e.g., randomization procedure not reported in preregistration but used in study), [2] the article (scored as UA) (e.g., randomization procedure described in preregistration but not in the article), or [3] both (scored as UB) (e.g., randomization is used but is not specified in either the preregistration or article).

Scoring Risk of Bias in Reporting
As we scored all articles for adherence according to the 29 RDoFs proposed by Wicherts et al. (2016), we decided (post-preregistration of the present study) to provide further information about the quality of the preregistered study articles by assessing them according to the remaining six RDoFs proposed by Wicherts et al. relating to the risk of bias in reporting. For example, for RDoF R3 ("Failing to mention, misrepresenting, or misidentifying the study preregistration") we asked: "Is the preregistration clearly mentioned and linked/signposted to in the article and easily accessible (e.g., not embargoed)?". We developed seven questions to cover these RDoFs (see Table 6) and appended them to our adherence scoring protocol; all were scored as "1" (yes) or "2" (no). These items were separate from the preregistration scoring RDoFs and were only used to assess articles.

Scoring Procedure
Two researchers (RH and either BK or AS) independently coded each preregistration and associated article⁶ using the scoring protocols outlined above, before convening to discuss any inconsistencies and to agree on final scores. Coders documented their scores in two separate "scoring frameworks" (Microsoft Excel files). All disagreements were resolved by the two coding pairs without the need to consult a third team member. No researcher was involved in coding their own preregistered study, and the scores of preregistrations authored by one or more of our research team (N = 17) were checked by an external researcher for accuracy.
In our preregistration, we stated that we would pilot code 10% of our sample. There were 33 preregistrations in our sample after the first OSF search, and so we selected four of those with associated articles for pilot coding. After independently coding these, the level of inter-coder reliability achieved for specificity and adherence scores was quantified using Krippendorff's alpha (κ). We used the R package "irr" (Gamer et al., 2019) to calculate κ (analysis script shared on OSF: https://osf.io/67x8k). We achieved a level of inter-coder consistency of κ = 0.859 (2 raters, 104 items) for specificity scores and κ = 0.809 (2 raters, 156 items)⁷ for adherence scores. As we achieved our pre-specified minimum level of consistency (i.e., κ ≥ 0.7), we proceeded to score the remainder of the sample. The master scoring framework containing the final, agreed-upon scores used to compute the findings presented here can be accessed on OSF (https://osf.io/b7cyu). The time required to score preregistrations and associated articles was considerable: approximately 1 hour for specificity scoring, 1.5 hours for adherence scoring, and 15 minutes for scoring risk of bias in study reporting per researcher.

Data Analysis
All data analyses were performed using R version 4.0.2 (R Core Team, 2020). We have shared all of the analysis scripts used for this study on OSF, along with an HTML document presenting the annotated analysis code (and associated outputs) used to preprocess the data and compute all of the results presented here.
We summarized specificity scores by computing the arithmetic mean, standard deviation (SD), and median values for each RDoF and overall (i.e., mean scores on all items were summed and divided by the total number of items [N = 29]). For adherence and risk of bias in reporting scores, we simply tallied the number of each type of response for every question. To compare gambling and cross-disciplinary preregistration specificity scores, we employed 30 Wilcoxon-Mann-Whitney (Wilcoxon) tests (29 RDoF specificity scores and 1 overall score). The decision to use non-parametric Wilcoxon tests is consistent with the strategy used by Bakker et al. (2020) and did not require data to be normally distributed (scores were right-skewed; see Figure 2). As NA scores were common, particularly for some items (i.e., RDoFs D1, C1, C2, A2, A8, A9, and A11; see Table 3), we used the same method of dealing with missing values employed by Bakker and colleagues. This involved a two-way imputation procedure based on corresponding row and column means, performed using the following calculation: i + j − OM for missing observation (i, j), where i is the mean for the item (e.g., RDoF 1), j is the mean score for the preregistration across items, and OM is the mean for all observed items (see Bernaards and Sijtsma, 2000).
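The two-way imputation calculation can be illustrated with a minimal Python sketch. It is not the authors' R script: the function name `two_way_impute`, the use of `None` for NA scores, and the fallback to the overall mean for an item or preregistration with no observed scores are all assumptions made for illustration, and imputed values are left unrounded and unclipped.

```python
from statistics import mean

def two_way_impute(scores):
    """Fill missing cells with item mean + preregistration mean - overall mean.

    scores: list of rows (one per preregistration); each row is a list of
    item scores, with None marking an NA score (hypothetical encoding).
    """
    observed = [v for row in scores for v in row if v is not None]
    overall_mean = mean(observed)  # OM: mean of all observed scores

    n_items = len(scores[0])
    item_means = []  # "i" in the text: mean per item (column)
    for col in range(n_items):
        values = [row[col] for row in scores if row[col] is not None]
        # Assumption: fall back to OM when an item has no observed scores.
        item_means.append(mean(values) if values else overall_mean)

    prereg_means = []  # "j" in the text: mean per preregistration (row)
    for row in scores:
        values = [v for v in row if v is not None]
        prereg_means.append(mean(values) if values else overall_mean)

    return [
        [
            v if v is not None
            else item_means[col] + prereg_means[r] - overall_mean
            for col, v in enumerate(row)
        ]
        for r, row in enumerate(scores)
    ]

# Toy data: 3 preregistrations x 3 items, scored 0-3, two NA cells.
toy = [
    [0, 1, None],
    [2, 3, 3],
    [1, None, 2],
]
imputed = two_way_impute(toy)
```

In the toy data the overall mean is 12/7, so the first missing cell becomes 2.5 + 0.5 − 12/7 and the second becomes 2 + 1.5 − 12/7; observed scores are returned unchanged.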
To minimize the false discovery rate (FDR), we used the Benjamini-Hochberg correction technique (Benjamini & Hochberg, 1995). This process involved multiplying all 30 p-values returned from our Wilcoxon tests by their rank after ordering them from largest to smallest (e.g., if our fifth largest p-value was 0.006, this would become: 0.006 × 5 = 0.03). In our preregistration, we stated that we would compare all original p-values to their corresponding Benjamini-Hochberg "critical value", calculated as: (i/m)Q, where i is the rank of the p-value (ordered from smallest to largest), m is the total number of tests undertaken, and Q is the FDR we selected (i.e., 0.05). However, instead we multiplied p-values by their rank to produce "corrected p-values" to make for easier interpretation of our findings in our summary table (Table 4).
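For orientation, the preregistered critical-value comparison, (i/m)Q, can be sketched in Python as follows. This is the textbook Benjamini-Hochberg step-up procedure, not a reproduction of the authors' analysis script; the function names and the toy p-values are hypothetical. The second function shows the commonly used adjusted p-value form (p × m / i, made monotone), which yields the same decisions as the critical-value comparison at a given Q and may differ from the rank-multiplied values reported in Table 4.

```python
def bh_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up decisions using critical values (i/m)*q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # smallest p first
    k_max = 0  # largest rank i whose p-value is at or below (i/m)*q
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * q:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject

def bh_adjust(p_values):
    """Standard BH-adjusted p-values: p * m / i, made monotone from the top."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy example with four hypothetical p-values.
toy_p = [0.001, 0.2, 0.03, 0.04]
decisions = bh_reject(toy_p, q=0.05)
adjusted = bh_adjust(toy_p)
```

With these toy values only the first test survives correction: its p-value (0.001) is below its critical value (1/4 × 0.05 = 0.0125), whereas 0.03, 0.04, and 0.2 each exceed theirs.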
To determine the magnitude of differences in specificity scores between the samples, we calculated Cliff's Delta (D) effect sizes (Cliff, 1993). When used as an effect size, D represents the extent to which two distributions of ordinal values overlap (Romano et al., 2006). D values range between -1 (all scores in Group 2 > all scores in Group 1) and 1 (all scores in Group 2 < all scores in Group 1), with 0 representing total overlap between samples. Romano and colleagues compared D values to benchmark values for effect sizes when using Cohen's d (Cohen, 1988), finding a d of 0.2 (small effect) corresponds to a D of approximately 0.147, a d of 0.5 (medium effect) corresponds to a D of approximately 0.33, and a d of 0.8 (large effect) corresponds to a D of approximately 0.474.
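A minimal Python sketch of Cliff's Delta, following the sign convention described above (positive when Group 1 scores tend to exceed Group 2 scores), is given below. The function names are hypothetical, and the labeling helper simply applies the three benchmark values quoted from Romano et al. as cut-offs; treating values below 0.147 as "negligible" is an assumption, and the finer labels used in the results (e.g., "very large") are not modeled.

```python
def cliffs_delta(group1, group2):
    """Cliff's Delta: P(x1 > x2) - P(x1 < x2) over all cross-group pairs."""
    greater = sum(1 for a in group1 for b in group2 if a > b)
    less = sum(1 for a in group1 for b in group2 if a < b)
    return (greater - less) / (len(group1) * len(group2))

def label_magnitude(delta):
    """Label |delta| using the benchmarks 0.147, 0.33, and 0.474."""
    size = abs(delta)
    if size < 0.147:
        return "negligible"  # assumption: label for values below "small"
    if size < 0.33:
        return "small"
    if size < 0.474:
        return "medium"
    return "large"

# Toy specificity scores (0-3) for two hypothetical groups.
group_a = [0, 1, 2]
group_b = [1, 1, 3]
delta = cliffs_delta(group_a, group_b)
label = label_magnitude(delta)
```

Of the nine cross-group pairs in the toy data, two favor the first group, five favor the second, and two are ties, giving D = (2 − 5) / 9 ≈ −0.33.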

Deviations From Our Preregistration
We made a small number of deviations from our preregistered plan to best address the aims of the present study. We recorded all deviations and our reasoning for each in Transparent Changes Documents (hereafter "changes documents") that were uploaded to OSF. All major deviations are also reported here.
First, as described in our changes document, we decided to score specificity by providing a response for each of the 23 questions in Bakker et al.'s (2020) protocol and then later use these question responses to impute a score for each of the 29 RDoFs. This enabled us to provide a more detailed overview of preregistration specificity because of the dependencies present when scoring according to RDoFs. For example, RDoF A14 is "Choosing the estimation method, software package, and computation of SEs [standard errors]" and, when using Bakker et al.'s original protocol, a single specificity score is assigned to this RDoF based on two questions with unique answers: 21a and 21b (see Table 2). Thus, we prevented the loss of granular information provided by paired questions like 21a and 21b. The outcomes for each question (as opposed to RDoF) are shared on OSF (https://osf.io/b7cyu).
Second, in our preregistration, we stated that we would perform a maximum of two search and selection processes and stop sampling after the second, regardless of whether we had identified 53 preregistrations (our pre-specified target). However, after the second search, we had identified 49 relevant preregistrations (see Figure 1), and as we were still coding these several months later (thus sufficient time had lapsed to ensure more gambling studies had been preregistered), we decided to undertake a third search to try and reach our desired sample size (see changes document 2).
Third, as stated in our changes document 3, we planned to calculate summary descriptive values (i.e., arithmetic mean and median) for adherence scores, but we agreed that the scores 1-3 represented qualitative categories that described whether and how authors deviated from their preregistration and not an ordinal scale from "no deviation" to "major deviation." Additionally, we added the option to assign "U" (unable to determine) scores (see changes document 1), meaning any summary values (e.g., means) would not have accounted for these categorical scores.
Finally, we initially hypothesized that gambling registrations would have consistently lower specificity scores than the cross-disciplinary sample and chose to use one-tailed Wilcoxon tests; however, after performing the one-tailed tests as preregistered, it became clear that the direction of differences was not consistent, and therefore two-tailed tests were warranted to detect all differences between the samples. As such, we have recorded the outcomes from the one-tailed tests and reported these on OSF, but report two-tailed test outcomes here (see changes document 3).

Table 3
Preregistration
Note: Specificity scores range between 0 and 3 (higher scores indicating greater specificity). See subsection 'Scoring preregistration specificity' for more details on the scoring protocol. All figures reported here were calculated using non-imputed specificity scores. Frequency counts for all RDoF item scores can be found in our analysis process document: https://osf.io/wqrn8.
Outcomes from the Wilcoxon tests comparing the groups' specificity scores are presented in Table 4. Gambling studies' preregistrations were significantly more likely to include hypotheses that clearly described the variables of interest (RDoF H1: medium effect size) and stated the direction of the hypothesized effect (RDoF H2: medium effect), potentially reducing the risk of HARKing (RDoF R6: small effect).
In relation to study design, gambling preregistrations contained significantly more specification of sampling plans (D7: large effect) than cross-disciplinary preregistrations and were more likely to explicitly exclude the possibility of studying additional dependent variables other than those preregistered (D4: small effect). Conversely, descriptions of manipulated variables were significantly more specific in cross-disciplinary preregistrations (D1: medium effect).
In relation to data collection procedures, gambling preregistrations were significantly more specific in their descriptions of blinding (C2: very large effect), data handling during collection (C3: small-medium effect), and when data collection will end (i.e., "stopping rules"; C4: large effect).
Gambling preregistrations were also significantly more specific in their descriptions of four (of 15) RDoFs relating to the analysis process, including data preparation when working with complex datasets requiring preprocessing (A2: very large effect), the process of measuring or scoring the primary dependent variable (A6: medium effect), excluding the possibility of studying additional dependent variables (A7: small effect), and the process of measuring or scoring non-manipulated independent variables (A11: large effect). Descriptions of how manipulated variables will be used in analyses, however, were significantly more specific in cross-disciplinary preregistrations (A8: medium-large effect).
Overall, the mean specificity score for the gambling sample (mean = 0.97, SD = 0.40, median = 0.83) was greater than for the cross-disciplinary sample (mean = 0.78, SD = 0.23, median = 0.81; medium-large effect), although this difference was not statistically significant after correcting for multiple testing with the Benjamini-Hochberg procedure.
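To make the two quantities behind these comparisons concrete, the following is a minimal sketch (not our analysis code) of Cliff's delta and the Benjamini-Hochberg step-up procedure. The function names are hypothetical, the sign convention is assumed to match the one described for D in the note accompanying the test outcomes (negative values when gambling preregistrations tend to score higher), and the false discovery rate of 0.05 is an assumption for illustration.

```python
from typing import List, Sequence


def cliffs_delta(gambling: Sequence[float], cross: Sequence[float]) -> float:
    """Cliff's delta over all cross-sample pairs.

    Assumed sign convention: -1 when every gambling score exceeds every
    cross-disciplinary score, +1 in the opposite case.
    """
    greater = sum(1 for c in cross for g in gambling if c > g)
    less = sum(1 for c in cross for g in gambling if c < g)
    return (greater - less) / (len(gambling) * len(cross))


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected under BH at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

Because the procedure compares each ordered p-value against its own threshold, a test with an unadjusted p-value below 0.05 can still fail to reach significance after correction.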
Exploratory Analyses. We calculated the mean overall score per gambling study preregistration, grouped them by year of registration, and plotted them in Figure 3A. The mean specificity score of preregistrations increased year on year from 2017 (median = 0.73), through 2018 (median = 0.78) and 2019 (median = 0.98), and then dropped slightly in 2020 (median = 0.86).
We also grouped the mean overall score by the template used and plotted this in Figure 3B. Open-ended preregistrations had the highest specificity score (median = 1.46), followed by those using the OSF preregistration template (formerly "Prereg Challenge"; median = 0.82), the template from AsPredicted.org (median = 0.83), and finally the OSF standard pre-data collection template (median = 0.59). However, 10 (91%) Open-ended preregistrations actually used the OSF preregistration template in a Word document format. Combining all preregistrations that used the OSF template in some form results in a median specificity score of 0.90.
The conspicuous outlier in both panels of Figure 3 (mean score = 2.64) was a preregistration written by the first and last authors before we conceived of this study and was developed specifically to achieve high scores on the RDoF scoring protocol developed by Bakker et al. (2020). Overall, the mean specificity score was higher for the 17 preregistrations written by one of the present authors (M = 1.27, SD = 0.47) compared to the rest of the sample (M = 0.83, SD = 0.26).
We performed Spearman's rank-order correlations between the aggregated scores for all RDoF categories (e.g., data collection, analysis). Specificity scores in every domain were strongly and positively correlated with every other (see Figure 4).
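For readers unfamiliar with the statistic, Spearman's rank-order correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below illustrates this under stated assumptions: the function names are hypothetical, both inputs are assumed to contain at least two distinct values, and it is not the code used for Figure 4.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)
```

Ranking before correlating is what makes the statistic suitable for aggregated scores built from ordinal 0-3 ratings, where only a monotonic relationship is assumed.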

Number of Hypotheses.
Many hypotheses reported in preregistrations could be interpreted as either single predictions or multiple independent but related predictions. For example, one hypothesis was: "We predict that participants will report a higher likelihood of winning, excitement, and urge to gamble as well as hypothetically purchase more scratch cards when scratch cards are presented with unclaimed prize information compared to when scratch cards are presented without unclaimed prize information (i.e., ticket remaining information and game number conditions)" which, while reported as a single hypothesis (no. 2 in a list of 4), contains four predictions that could be tested separately. The number of hypotheses therefore varied depending on whether all predictions reported as one hypothesis were assumed to be one hypothesis (M = 3.96, SD = 3.51, min = 1, max = 22) or multiple independent hypotheses (M = 6.4, SD = 7.54, min = 1, max = 44). Eleven (20.75%) preregistrations presented their hypotheses in this way.

Figure 2
Distribution of Specificity Scores for Gambling and Cross-Disciplinary Preregistrations. These density plots show the relative distribution of specificity scores given for each RDoF item for both samples of preregistrations (non-imputed scores used). * and # indicate statistically significant difference between samples: * = gambling preregistrations > cross-disciplinary; # = cross-disciplinary > gambling preregistrations (see Table 4 for test outcomes). Note: Scores of 1 were not possible for the following RDoFs: T1, T2, D1, D3, A2, A5, A8, and A9. Scores of 1 and 2 were not possible for the following RDoFs: D2, D4, A7, and A10. Also, while this figure displays the relative distribution of scores for each RDoF rather than exact frequency counts, the number of scores contributing to each density plot varies between the samples due to differences in the number of NA scores (see Table 3).

Figure 3
Gambling Preregistration Specificity Scores. Figure 3A shows each preregistration's mean overall specificity score, grouped by the year of registration. Figure 3B shows the same values but grouped by the template used to structure the preregistration. Both use non-imputed, original scores.

Figure 4
Correlation Matrix for Aggregated Specificity Scores. All Spearman's rank-order correlations were significant at the p < 0.05 level. Only gambling preregistrations were included.
D values can range from -1 (all gambling preregistrations score higher than all cross-disciplinary ones) to 1 (all cross-disciplinary preregistrations score higher than all gambling ones).

Table 5
Gambling researchers' adherence to their preregistrations: We answered all questions in relation to the confirmatory hypothesis tests. Undeclared deviations (i.e., scores of 3) are colored red for ease of detection. While we developed 29 questions for each of the 29 RDoFs (and 3 related to disclosures), due to dependencies in the RDoFs the same question was asked for 6 pairs of items (e.g., RDoFs D4 and C4) and so we removed all responses to duplicated questions before performing calculations to prevent weighting some questions more than others. UP = Unable to determine due to lack of specificity in preregistration. UA = Unable to determine due to lack of specificity in article. UB = Unable to determine due to lack of specificity in both the preregistration and article.
Reporting of Disclosures. Sixteen (30.2%) preregistrations included a funding disclosure, 10 (18.9%) reported a conflict of interest statement, and 9 (17.0%) reported whether there were any restrictions on publishing. However, almost every preregistration that included a disclosure was authored by one or more of the present team. After removing our preregistrations, only 1 (2.8%) of the remaining 36 included a funding disclosure, and none reported conflicts of interest statements or restrictions on publishing.
Adherence to Preregistrations. We found 17 articles associated with 20 preregistrations. Scoring was done at the level of the preregistered study and thus scores for 20 articles are presented. We found 13 (65%) articles included at least one undeclared deviation (i.e., a score of 3). The number of undeclared deviations per study ranged from 0 to 8 (M = 2.25, SD = 2.34). The number of articles containing at least one undeclared deviation was 3 (100.0%) in 2017, 4 (66.7%) in 2018, 4 (50.0%) in 2019, and 2 (66.7%) in 2020. Only 4 articles declared a deviation from the preregistration and provided a rationale for the change (i.e., a score of 1; the range of this score [per article] was 0-8, M = 0.85, SD = 2.3).
Figure 5 presents the proportion of each adherence score given across all questions and articles. A score of 0 was most common, indicating no deviation from the preregistration. The different "U" scores were also common, indicating that it was frequently difficult to determine whether authors had deviated from their preregistrations.
Combined, U scores made up 40.6% of the total responses given, with most (36.5%) made up by UP (unable to determine due to a lack of information in the preregistration) and UB (unable to determine due to a lack of information in both the preregistration and article) scores. A score of 2 was not awarded to any article, indicating that all reported deviations were accompanied by rationale. Table 5 presents the distribution of adherence scores for each question. Undeclared deviations most commonly related to the hypotheses tested, the variables included in tests, and the statistical analyses selected to test hypotheses.
UP scores, which indicate that there was poor specificity of an item in the preregistration despite being relevant to the study, were common in relation to the operationalization of independent variables, the estimation techniques used to estimate the statistical model(s), the statistical software used to conduct analyses, inference criteria, research funding, and competing interests. UB scores, which indicate a lack of specificity in both the preregistration and article despite being relevant to the study, were common in relation to the procedures used to randomly allocate participants to conditions, coding and handling data during data collection (e.g., dealing with mistakes made by participants or equipment), dealing with missing data, handling outliers, testing statistical assumptions, the software used to perform analyses, and possible constraints on publishing findings.
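The proportions reported above treat adherence scores as categorical codes rather than points on a numeric scale, which is why they are tallied and not averaged. A minimal sketch of that tally follows; the function names and the example responses are hypothetical and are not data from our sample.

```python
from collections import Counter
from typing import Dict, Sequence

# Adherence codes described in the scoring scheme (stored as strings because
# they are categories, not an ordinal scale).
VALID_CODES = ("0", "1", "2", "3", "UP", "UA", "UB", "NA")


def adherence_proportions(responses: Sequence[str]) -> Dict[str, float]:
    """Proportion of each adherence code across all question-by-article responses."""
    unknown = set(responses) - set(VALID_CODES)
    if unknown:
        raise ValueError(f"Unrecognised adherence codes: {sorted(unknown)}")
    counts = Counter(responses)
    total = len(responses)
    return {code: counts[code] / total for code in VALID_CODES}


def combined_u_share(proportions: Dict[str, float]) -> float:
    """Share of responses where adherence could not be determined (UP + UA + UB)."""
    return sum(proportions[code] for code in ("UP", "UA", "UB"))


# Hypothetical responses for illustration only.
example = ["0"] * 5 + ["UP"] * 2 + ["UB"] + ["3"] * 2
shares = adherence_proportions(example)
```

In practice the input would hold one code per question per preregistered study, after removing responses to duplicated questions.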

Risk of Bias in Reporting
The outcomes from scoring the risk of bias in study reporting are presented in Table 6. We operationalized RDoF R5 (misreporting results and p-values) as failing the online tool 'statcheck' (http://statcheck.io), which uses the test statistic and degrees of freedom from reported outcomes to recalculate p-values and highlight any discrepancies between reported and recalculated values. Statcheck was able to identify all of the components required to recompute 60 p-values in seven articles (the tool may have been unable to find the information required to compute p-values in some articles for several reasons, including because none were reported, results were not reported in APA style, or difficulty reading PDF files). We found six (10.0%) statistical reporting errors, one (1.67%) of which was a decision error (i.e., a p-value misreported in a way that may affect whether it is interpreted as statistically significant [it crosses the 0.05 threshold]), spread across two articles (which reported four preregistered studies between them). However, we decided to manually inspect all errors and found that one non-decision error and the one decision error were mistakes made by statcheck misidentifying outcome values.
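The logic of this consistency check can be illustrated with a deliberately simplified sketch. It is not statcheck's implementation: it handles only a two-tailed z test (statcheck also covers statistics that require degrees of freedom), the function names are hypothetical, and the fixed tolerance standing in for rounding of the reported value is an assumption.

```python
import math
from typing import NamedTuple


class PValueCheck(NamedTuple):
    recomputed_p: float
    inconsistent: bool     # reported p differs from the recomputed p
    decision_error: bool   # the discrepancy crosses the alpha threshold


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_p(z: float, reported_p: float,
                     alpha: float = 0.05, tol: float = 0.0005) -> PValueCheck:
    """Compare a reported p-value with the one implied by the test statistic.

    `tol` is an assumed allowance for rounding in the reported value.
    """
    recomputed = two_tailed_p_from_z(z)
    inconsistent = abs(recomputed - reported_p) > tol
    decision_error = inconsistent and ((reported_p < alpha) != (recomputed < alpha))
    return PValueCheck(recomputed, inconsistent, decision_error)
```

A flagged discrepancy is only a prompt for manual inspection, since an automated tool can extract the wrong value from the text, as happened with two of the errors flagged in our sample.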

Figure 5
Distribution of Adherence Scores. The proportion of each type of adherence score for the entire set of responses across all questions and articles. There were 520 total responses (26 questions × 20 articles). Scoring: 0 = Yes, consistent with preregistration (no deviation); 1 = No, deviation from preregistration made and declared by the authors and a justification for change is provided; 2 = No, deviation from preregistration made and declared, but no justification for deviation is provided; 3 = No, deviation made and not declared or justified by the authors; U = unable to determine due to lack of detail reported in the preregistration [UP], the article [UA], or both [UB]; NA = Not applicable.

Table 6
Summary Are any hypotheses reported that weren't stated in the preregistration? 3 17 0 Note: Scores were assigned for each preregistered study reported as opposed to each article, other than for RDoF R5, which had to be scored at the article level; scores for two of the 17 articles are therefore represented five times in the frequency counts presented, as these articles reported results from five of the preregistrations in our sample.

Discussion
The aim of this study was to better understand modern preregistration practices and how these can be improved to maximize their potential scientific benefits. We assessed the degree to which gambling researchers sufficiently specified all aspects of their studies in preregistrations (N = 53), the extent to which they adhered to their plans, and the risk of bias in reporting preregistered studies in the field. We also compared the results for our sample with the results from a similar study that analyzed a cross-disciplinary sample of 52 preregistrations (Bakker et al., 2020). In the following subsections, we discuss the results from each of these assessments, the implications and limitations of our findings, and recommendations for improving preregistration practices.

Preregistration Specificity
Similar to Bakker et al. (2020), we found that gambling researchers' level of specificity was low for many RDoFs, indicating that RDoF in these particular areas was not restricted by preregistrations. Mean specificity scores were less than 1 (which represents the minimal possible specificity; 0 represents 'not specified') for 15 RDoFs, including descriptions of the independent variables and how they will be measured (D1 & A8); all variables (e.g., covariates, moderators) included in analyses (D2 & A10); the primary dependent variable(s) (D4 & A7); power analyses (D6); participant randomization (C1); blinding procedures (C2); coding and handling data during collection (C3); handling missing data (A1); dealing with statistical assumptions testing (A3); handling outliers (A4); the estimation method, software package, and computation of standard errors (A14); and the hypotheses, sufficiently so as to prevent HARKing (R6). These findings suggest the intended benefits of preregistration, such as restricting RDoF and enabling an evaluation of test severity, are not fully achieved by current levels of reporting. One area where specificity levels were relatively high (mean >2) was in the description of study hypotheses. While some hypotheses were vaguely specified (see Number of Hypotheses subsection of results), most researchers presented hypotheses that enabled us to discern the key variables under study as well as the direction of the predicted effect(s). This is positive given the centrality of hypotheses to preregistrations, and represents an area of good practice.
Despite generally low specificity levels and contrary to our hypothesis, 12 RDoF specificity scores from our gambling studies sample were significantly higher than those from the cross-disciplinary sample in Bakker et al. (2020). There are a number of possible reasons for this.
First, all studies in the cross-disciplinary sample were registered in 2016 and mean specificity scores appear to have improved over time (42.6% of articles in our sample were published in 2020, 31.5% in 2019, 16.7% in 2018, and 7.4% in 2017). Second, there may have been differences in scoring between our study and that of Bakker and colleagues. As stated in the Scoring Preregistration Specificity subsection, we developed multiple decision rules to guide our scoring and these often focused on how we could award more scores in circumstances where the proposed methods were not aligned with the scoring system so as not to unfairly disadvantage these preregistrations. For example, question two in the scoring protocol asks, "Is the direction of the hypothesis specified?" and in order to obtain a high score of 3, a preregistration must also state the sidedness of the statistical test of the hypothesis; however, some of the preregistrations used Chi-Squared tests and/or analysis of variance and the sidedness of these tests cannot be specified. As such, we awarded a score of 3 in these cases so long as the direction of all predicted differences was clearly specified. Third, 17 (32.1%) of the gambling preregistrations were authored by one or more of the present study's team, all of whom are dedicated to improving the transparency of their work through preregistration. The mean overall specificity score for registrations authored by one of the present team was considerably higher than the remaining sample of registrations (1.27 and 0.83, respectively).

Adherence to Preregistrations
Researchers may deviate from their preregistration for a number of reasons: due to requests from referees or editors during the peer review process; after finding a more appropriate statistical test of their hypothesis or unexpected, but logical, reasons to exclude particular participants; or more concerningly, to increase the chance of observing statistically significant findings and/or to inflate effect sizes. Thus, deviations can be positive, resulting in more informative and/or scientifically rigorous outcomes, or negative, resulting in misleading or inaccurate findings. Either way, it is essential that researchers transparently report any deviations so that others can judge their appropriateness and potential impact on the validity of the findings reported.
Our findings support existing research on clinical trial registration (Goldacre et al., 2019; Vassar et al., 2020) and general study preregistration (Claesen et al., 2019; Ofosu & Posner, 2019) in suggesting that many researchers do not transparently declare deviations from their pre-specified plans. We found a lower proportion of articles included undeclared deviations (65%) than Claesen et al. found in their sample of preregistered studies published in Psychological Science (96%). This could be explained by the outlet of publication (none of our sample were published in Psychological Science) or, perhaps more likely, improved reporting standards since the 2015-2017 period studied by Claesen and colleagues. Unreported deviations in our sample were most common in relation to hypotheses (35% of articles), the variables included in hypothesis tests (25%), and the statistical models used to test hypotheses (25%). These results are consistent with Ofosu and Posner's (2020) observations in the economics and political science literature, who found the median article failed to report 25% of registered hypotheses, 18% included tests of non-registered hypotheses, and 19% of articles deviated in the statistical models used (only one of which declared this deviation). Breaking down the types of hypothesis deviations in our study, four articles (20%) failed to report preregistered hypotheses, two (10%) reported non-registered hypotheses, and one (5%) altered preregistered hypotheses (e.g., by changing non-directional to directional predictions). These findings suggest changes to hypotheses post-registration are more diverse than simply developing post-hoc hypotheses most consistent with the outcomes (i.e., what Kerr [1998] termed "pure HARKing").
Our findings expand on previous fidelity studies (Claesen et al., 2019; Ofosu & Posner, 2020) by also reporting the number of instances when we were unable to tell whether authors deviated from their preregistrations due to insufficient detail in their preregistration (UP), article (UA), or both (UB). These outcomes are essential for understanding the value of current preregistration practices. If, as was frequently the case in our study, one cannot determine whether the methods reported in an article are consonant with the allied preregistration, then the value of the practice is seriously diminished. Our breakdown into UP, UA, and UB scores revealed that ambiguous and/or incomplete reporting in both preregistrations and resulting articles often precludes efforts to cross-check pre-planned methods with those actually used. Preregistrations often included insufficient details of statistical estimation methods to enable comparisons with published articles, and both preregistrations and articles frequently failed to provide any detail regarding procedures used to handle outliers, data handling during collection, testing of statistical assumptions, dealing with missing data, the software used to perform analysis, and randomization procedures. Claesen et al. (2019) also reported that they found it difficult to assess whether authors had deviated from their preregistrations because neither "preregistrations nor the published studies were written in sufficient detail" (p. 9).

Risk of Bias in Reporting Preregistered Studies
Our evaluation of the risk of reporting bias is, to our knowledge, the first study to use Wicherts et al.'s (2016) checklist for this purpose and provides further insights into preregistration and reporting practices. Of 20 preregistered studies, data were shared for 12 and analysis scripts were available for six. These rates are substantially higher than those found in the wider gambling literature for sharing data and analysis scripts, which were both found in less than 4% of studies in a random sample of 500 gambling research studies for the 2016-2019 period (Louderback et al., 2022). The higher rates found in our study might be because researchers who preregister their studies are more likely to engage in other open science practices.
We found four articles (of 17) that did not mention the study preregistration or link to it, hampering attempts by readers to compare the article with the preregistration. One article (for two preregistrations) did not report a third study that was preregistered. When we contacted the author to inquire about this, they stated that they had originally submitted the preregistered study to a journal and reviewer comments led them to perform two additional experiments, but they did not explain why the outcomes from the original study were not reported anywhere. Further, three articles were not reported sufficiently to enable replication and two (for four preregistrations) contained statistical reporting errors, obfuscating interpretations of findings and replication attempts. These instances represent opportunities for additional education about transparency in reporting that will help advance the gambling field and beyond.

Limitations
There are several limitations of our findings that are important to note. First, our sample of preregistrations and articles was restricted to the gambling studies field. Although this conferred the benefits discussed in our introduction (i.e., subject expertise aided evaluation of reporting; concerns of bias in the field), gambling researchers typically come from the fields of psychology, neuroscience, and public health. Therefore, our outcomes might not generalize beyond these disciplines, despite the similarities between our findings and evaluations of preregistered studies in economics and political science (Ofosu & Posner, 2019). Second, we restricted our search for gambling preregistrations to OSF and excluded other repositories like AsPredicted.org, which may have implications for the generalizability of our findings to all gambling preregistrations. However, AsPredicted.org does not currently offer the ability to search for relevant preregistrations (we contacted AsPredicted.org to see whether we could search their database but did not receive a response).
Third, our statistical power was likely lower than aimed for as we specified our α level at 0.05 in our a priori power analysis but used a multiple testing correction method (i.e., Benjamini-Hochberg) that essentially sets a separate (often much lower) α level for each test. Further, we performed this power analysis under the assumption that we would perform one-tailed tests, but (as explained in the "Deviations from our preregistration" section) later determined that two-tailed tests were more appropriate. We conducted a post-hoc sensitivity power analysis to determine the effect size required to obtain statistical significance, given our use of two-tailed tests and actual α (0.05), target power (i.e., 0.8), and group sample sizes. This determined that an actual effect size of d = 0.69 (equivalent to a Cliff's D of 0.425) was required. We report the protocol and outcomes from this analysis on OSF (https://osf.io/dnfqa).
Fourth, changing our Wilcoxon-Mann-Whitney tests from one-tailed to two-tailed after determining that group differences were not unidirectional may have inflated our Type-I error rate (i.e., the risk of finding false-positive outcomes). Fifth, our exploration of preregistration adherence was limited because articles were only available for 20 of the preregistrations in our sample.
Sixth, there might also be limitations to the specificity scoring protocol we used to evaluate preregistrations. To achieve a maximum score of 3 on most RDoF items requires preregistration authors to explicitly state that they will not deviate from their pre-specified method by, for example, using additional eligibility criteria or reasons for excluding data points. Although such statements may add value in restricting RDoF, this approach is unconventional in scientific research and therefore scores of 2 and 3 could be viewed as equivalent until the value of making explicit promises not to deviate from preregistrations has been empirically evaluated (for interested readers, we have recreated Table 2 and Figure 3 after recoding all scores of "3" to "2" and uploaded this in a supplemental document on OSF: https://osf.io/93hxe). Additionally, some parts of the scoring system largely apply to experimental research (e.g., RDoF items D1 & A9 relate to manipulated independent variables & RDoF C1 relates to blinding procedures). This meant there was a high proportion of NA values for these RDoF items and therefore the imputed values for these items (and test outcomes based on their use) should be viewed cautiously.
Finally, while no author scored their own preregistration, the specificity level of the gambling preregistration sample was augmented by the inclusion of those authored by one or more of the present research team. The overall mean specificity score without our preregistrations was 0.83 (SD = 0.26), changing to 0.97 (SD = 0.40) with our preregistrations. After removing preregistrations authored by one or more members, we plotted the overall specificity scores for the gambling sample by year (2017-2020; replicating the format of Figure 3A) and found year-on-year increases in scores, suggesting that improvements in preregistration reporting are not simply the result of our team's progression in this area (see the section of our analysis process document titled "Impact of our registrations on our outcomes" for this plot; https://osf.io/wqrn8).

Implications of Findings
Our findings have several important implications for understanding and advancing the value of preregistration in scientific research. At present, study plans presented in preregistrations would benefit from additional specificity to prevent researchers needing to make data-contingent decisions (e.g., when to cease data collection) that could potentially bias findings (Wicherts et al., 2016). Further, the majority of articles reporting preregistered studies contain at least one undeclared deviation from the preregistration, and a notable proportion do not mention that the studies were preregistered, precluding evaluations of test severity (Lakens, 2019) and preregistration fidelity. What is more, the failure to clearly describe methods in both preregistrations and corresponding articles was problematic and obfuscated evaluations of consistency. In one case, it took two researchers six hours each to score one preregistration for specificity and adherence due to ambiguity and a lack of clarity in the preregistration and inconsistencies with the article.
There are several factors that likely contribute to these difficulties beyond the control of researchers. For example, strict journal word counts can prevent authors from fully explaining their methods, and requests from reviewers and editors made during the review process can lead to changes in the terms used or the analyses conducted that make comparisons with preregistrations difficult. While these issues are not present when writing preregistrations, preregistration remains a relatively new component of the research process and, to date, research institutions have provided little formal training and guidance for preparing preregistrations. Additionally, the time and resources required to undertake preregistration have not been factored into existing funding structures.
Overall, our findings indicate that, if an overarching goal of preregistration is to reduce RDoF and this can be achieved via writing highly specific research plans, gambling researchers are not currently achieving this goal as their preregistered plans are often vague and lacking in details about the proposed methods. If the goal is to allow readers to evaluate the severity of hypothesis tests, then this too may not be achieved by current gambling study preregistrations as frequently too few details of planned methods and analyses (e.g., alpha level, stopping rule) are reported to enable proper evaluation of the extent to which the tests could falsify the predictions made in the preregistration. Finally, if the goal of preregistration is to enhance transparency, then disclosing any deviations from the preregistration is an obvious and useful way to further that overarching goal; yet it seems like researchers in gambling studies are not currently doing so for many preregistration deviations.
These conclusions are concerning as the time required to preregister studies is not insubstantial. Ofosu and Posner (2019) found 88% of economics and political science researchers surveyed spent, on average, at least a week writing their preregistration, 32% spent 2-4 weeks, and 26% spent more than a month; although the majority of those surveyed agreed that the time dedicated to preregistration was worthwhile and that it allowed them to receive useful pre-study feedback and/or it saved time downstream. Still, the time investment has been raised as an objection to preregistration (Ofosu & Posner, 2020), and preregistering one's study with sufficient detail is challenging (Nosek et al., 2019). Evaluations of how preregistering studies impacts the reporting quality, reproducibility, and replicability of published research are needed to confirm whether the benefits justify the additional effort required to review preregistrations.
Preregistration practices appear to be improving. We observed increases in specificity and decreases in the proportion of articles containing undeclared deviations from 2017 to 2020. We provided further evidence that more structured templates like the OSF preregistration and AsPredicted.org formats typically result in higher levels of specificity than less structured templates like the OSF standard pre-data collection format. Future research in this area could compare additional templates to identify those that result in higher levels of specificity, such as the recently developed Psychological Research Preregistration-Quantitative (PRP-QUANT) Template (Bosnjak et al., 2021). Finally, undertaking this study has provided unique insights into the difficulties faced when trying to interpret preregistrations and evaluate researchers' adherence to them, which we have used to proffer suggestions for improving the value of preregistration for researchers and organizations involved in the scientific enterprise (journals, research institutions, and funding bodies) below.

Five Recommendations for Researchers Preregistering Their Studies
1. State what it takes to falsify your hypothesis: Lakens (2019) recommended that authors of preregistrations do this, and this strategy would overcome many of the issues we observed in gambling study preregistrations. As described, several authors presented multiple predictions as a single hypothesis without specifying whether one or all needed to be supported in order to view the hypothesis as being supported by their data (and possibly increasing the likelihood of authors being able to state that their hypothesis was at least "partially supported"). Further, some hypotheses were so vague as to be almost impossible to falsify (e.g., "The removal of opportunities to bet on live sporting events [due to COVID-19 shutdowns] will lead some sports bettors to engage in other forms of gambling.") and thus tests of these predictions will lack severity (Lakens, 2019). Scheel (2022) reports similar issues when assessing Registered Reports: hypotheses were so vaguely specified that it was unclear how they could be operationalized and tested. These issues can be at least partially avoided by stating what outcome(s) would falsify one's hypotheses.

2. Use a structured preregistration template: Structured templates like the OSF preregistration format are associated with better specificity and can help researchers understand what information they need to include in their preregistrations to ensure their study plan is sufficiently specified. The highly detailed PRP-QUANT template may be of particular value for quantitative researchers in psychology and related fields (Bosnjak et al., 2021). Authors can further enhance the specificity of their preregistrations by using Bakker et al.'s (2020) scoring protocol as a guide, as we did when preregistering this study.

3. Ensure consistency between preregistration and article: Researchers should make it as easy as possible for others to compare their pre-specified study plan with the resulting article. This can be achieved by using consistent terminology between the two (e.g., for variables and statistical models); by providing each hypothesis with the same, consistent label (e.g., H1); and, if using OSF to post preregistrations, by (re)naming their overarching project page (or relevant subcomponent) with the title of the final article. We found many OSF pages contained multiple preregistrations with similar names and overlapping content, making it difficult to discern which preregistration belonged to which article. Users can now rename past preregistrations on OSF and so we encourage all researchers to do this in retrospect, if necessary.
4. Clearly and directly link to your preregistration: Difficulties in connecting preregistrations and articles were also found by Claesen et al. (2019) and, as they recommended, could be further avoided by including a clear link directly to the allied preregistration(s) in articles and not simply a link to the overall project page.
5. Report all deviations from your preregistration: We recommend that authors report all protocol deviations within their study article under a clear heading like "Deviations from preregistration," as we have done here. However, space constraints may make it difficult to fully report each deviation, the rationale for the change, and the likely effect on study outcomes. Claesen et al. (2019) have developed a document for recording all of this information (https://osf.io/xv5rp/) and we have used similar "Transparent changes documents" for this study (https://osf.io/qep2a/) and others (https://osf.io/j6tud/). Whichever format is chosen, researchers should share these documents on an accessible repository (e.g., OSF) and/or alongside their article as supplemental material.
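As one possible way to keep such records consistent, the three elements named in recommendation 5 (the deviation, the rationale for the change, and the likely effect on study outcomes) could be captured in a small structured log. The Python sketch below is purely illustrative: the `Deviation` class, its field names, and the `format_deviations` helper are hypothetical and are not taken from the Claesen et al. (2019) document or from the Transparent changes documents linked above. The example entry is invented for demonstration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deviation:
    """One deviation from a preregistered plan (hypothetical structure)."""
    preregistered_plan: str  # what the preregistration stated
    actual_procedure: str    # what was done instead
    rationale: str           # why the change was made
    likely_effect: str       # expected influence on study outcomes


def format_deviations(deviations: List[Deviation]) -> str:
    """Render deviations as numbered plain-text lines for a
    'Deviations from preregistration' section."""
    if not deviations:
        return "No deviations from the preregistration."
    lines = []
    for i, d in enumerate(deviations, start=1):
        lines.append(
            f"{i}. Planned: {d.preregistered_plan} | "
            f"Actual: {d.actual_procedure} | "
            f"Rationale: {d.rationale} | "
            f"Likely effect: {d.likely_effect}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example entry, for illustration only.
    log = [
        Deviation(
            preregistered_plan="Exclude outliers beyond 3 SD",
            actual_procedure="No outliers excluded",
            rationale="Exclusion rule was ambiguous for skewed data",
            likely_effect="Estimates may be more variable",
        )
    ]
    print(format_deviations(log))
```

A record of this kind forces every deviation to carry a rationale and a judgement about its likely effect, and the same entries can be reused in the article and in a supplemental document.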
Five Recommendations for Journals, Research Institutions, and Funding Bodies to Improve the Value of Preregistration
1. Support transparency, not a clean narrative: Echoing the arguments made by the Nature Human Behaviour editorial team (2020), journals should encourage researchers to transparently report all aspects of their studies, including deviations, regardless of whether this makes the findings appear less conclusive or compelling. Others (e.g., Frankenhuis and Nettle, 2018) have suggested that a fully transparent presentation of results, including clear labelling of confirmatory and exploratory analyses, can actually foster creativity and knowledge sharing because all results are presented instead of only significant or "interesting" findings.
2. Remove word count restrictions on methods sections: Understanding exactly how research data were obtained, analyzed, and interpreted is fundamental to scientific understanding. Yet, many journals' word limit policies leave researchers with too little space to fully describe these processes. Word restrictions, if required at all, should be reserved for the introduction and discussion sections of articles so that researchers can freely describe all aspects of their methods and results.
3. Review preregistrations alongside articles: As highlighted by Claesen et al. (2019), existing systems (e.g., open science badges) reward authors for simply performing the act of preregistration, regardless of what information is included. Reviewing preregistrations alongside submitted manuscripts could determine whether authors have preregistered a minimum set of study details (e.g., hypotheses, sample size rationale, measurements, analyses) and whether there are any deviations. However, this would likely require incentivizing reviewers, whether monetarily or via increased recognition of peer-reviewing contributions when considering candidates for jobs, promotions, and funding opportunities (see Moher et al., 2020).
4. Provide training and guidance on preregistration: Teaching researchers about the scientific benefits conferred by study preregistration and providing training courses and guidance on how to write preregistrations will help to ensure that we maximize the benefits of this practice and avoid wasting resources on insufficiently detailed and poorly followed preregistrations.
5. Make preregistration mainstream: Research institutions and funding bodies should consider study preregistration a normal component of conducting hypothesis-testing research. The time and resources required to preregister studies should be factored into funding programs and workloads so that researchers have sufficient time to write their preregistrations in a way that will achieve the intended benefits. Journals can also support this effort by including links to preregistrations alongside their articles' key information (e.g., DOI, author list), by considering the development of novel direct integration strategies within methods sections, and by requiring manuscript sections dedicated to highlighting deviations.

Conclusions
A preregistered study is not necessarily better, more rigorous, or more impactful than a non-preregistered one. Preregistration allows others to better evaluate studies by being able to detect deviations from prespecified plans and to differentiate confirmatory from exploratory analyses. It may also reduce the number of data-contingent decisions researchers need to make when performing their studies and thereby reduce the effects of (conscious or unconscious) bias on study outcomes. Our evaluation of preregistration practices in gambling studies indicates that preregistration activity is increasing in the field and improvements in specificity are occurring, although our sample was limited to only four years (2017-2020). Further improvements in writing preregistrations and reporting the associated studies are necessary if we want to maximize the value of this process and improve the quality of the scientific literature. We hope the recommendations provided here will be useful for all researchers in achieving these goals, both in gambling-focused research and in science more generally.
available. It has been verified that the analysis reproduced the results presented in the article. The entire editorial process, including the open reviews, is published in the online supplement.
specificity: Summary of specificity scores for gambling & cross-disciplinary preregistrations

Analysis
Are the procedures used to deal with missing data consistent with those reported in the preregistration?
of risk of bias in reporting scores Code Researcher Degrees of Freedom Question Yes (N) No (N) NA (N) R1 Failing to assure reproducibility (verifying the data collection and data analysis) Are data shared and accessible to all? replication (re-running of the study) Are the methods reported sufficiently, to allow replication? Including all study materials used?, misrepresenting, or misidentifying the study preregistration Is the preregistration clearly mentioned and linked/signposted in the article and easily accessible? so-called "failed studies" that were originally deemed relevant to the research question Are any experiments that were preregistered not reported?
Does it indicate details of the estimation technique used to estimate the statistical model and compute standard errors?
21b: Does it specify which statistical software package and version is used for running the analyses?
A15 Choosing inference criteria (e.g., Bayes factors, alpha level)
22: Does it indicate the inference criteria (e.g., Bayes factors, Alpha level)?