An observational study on the adherence to study registrations in German interventional and observational studies from various fields

Background: The registration of studies, especially clinical trials, is required by the Declaration of Helsinki and the policies of various scientific journals. However, numerous analyses have found considerable discrepancies between published articles and the accompanying trial registrations. The aim of this study is to assess such discrepancies for a sample of studies with recruiting locations in Germany. Additionally, we tested the association between adherence to registrations and the possible involvement of Coordinating Centers for Clinical Studies (KKS) as well as Universities of Excellence. Methods: For a sample of 376 interventional or observational study registrations, we found 115 published articles. We then searched for discrepancies in the study design, the key inclusion criteria, the interventions, the blinding, and one primary and one secondary outcome. Results: We found discrepancies in 26% of all studies, most frequently in the secondary outcomes: 16.5% of the secondary outcomes that were registered in most detail per study had discrepancies. In regression models for adherence discrepancies, the only variable with a statistically significant association with better adherence was registration on ClinicalTrials.gov. The association of potential KKS involvement with adherence ratings was also positive, but statistically insignificant. Conclusions: In summary, the number of discrepancies between registrations and published articles remains high.


INTRODUCTION
Clinical trial registries serve many different users, from researchers to clinicians and patients. All rely on registry data being correct and up to date, as it is challenging to identify errors or inconsistencies. One way to identify such inconsistencies, however, is to compare registry data to other released documents of the same study, such as study protocols or published articles. Previous work has shown, for example, that the results from many clinical trials are published only with substantial delays or not at all (Riedel et al., 2022).
The aim of the present study is to conduct a similar assessment for interventional trials and observational studies that were run in Germany and to check the association of specific German structural factors with adherence, namely Universities of Excellence and Coordinating Centers for Clinical Studies (KKS). Universities of Excellence are universities that have been selected to participate in a German national support program, including substantial financial support, with the aim of fostering research and competitiveness. The KKS are organized as a network of currently 26 member centers that share the common goal of supporting clinical research, e.g., through quality control, advisory services, and data management.
We defined a rating scheme for registration quality, applied these ratings to a sample of studies from several trial databases, searched for accompanying publications, and lastly compared registered and published study information. Results on registration quality have already been published elsewhere (Thiele & Hirschfeld, 2022). The aim of the article at hand is, first, to manually assess the adherence of published articles to their accompanying registrations. Secondly, we analyze the association of study characteristics and the German structural factors with adherence errors.
R code and data for reproducing the results are available at https://zenodo.org/record/7920215.

METHODS

Data sources and study eligibility
Complete details on the procedure for creating the database of registrations from which studies were sampled were described previously (Thiele, Hirschfeld & Von Brachel, 2021). The result was a comprehensive database derived from joining ClinicalTrials.gov, the German Clinical Trials Register (DRKS), and the International Clinical Trials Registry Platform (ICTRP). The ICTRP is a meta-register that aggregates study registrations from 17 international registries, including ClinicalTrials.gov and the DRKS. Data from all registries were downloaded during the first weeks of 2021. Instead of downloading directly from ClinicalTrials.gov, we used the pipe-delimited files from the Aggregate Analysis of ClinicalTrials.gov (AACT). We filtered the database to include only registrations with at least one recruiting location in Germany. The resulting database consisted of 35,912 study registrations and could be considered largely free of internal duplicates. We then drew a sample of 400 registrations and excluded extension studies, studies with multiple parts, registry studies, and follow-up studies. We did not filter the sample further, so it included interventional trials and observational studies from various fields. We searched for publications for all remaining studies.
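The merge-and-filter step above can be sketched as follows. This is an illustrative reconstruction, not the paper's actual pipeline (which is in R at the Zenodo link); the column names and trial IDs are assumptions.

```python
import pandas as pd

# Illustrative sketch: combine registrations from the three sources, drop
# cross-registered duplicates by trial ID, and keep studies with at least
# one German recruiting location. Column names are hypothetical.
ctgov = pd.DataFrame({"trial_id": ["NCT001"], "country": ["Germany"]})
drks = pd.DataFrame({"trial_id": ["DRKS0001"], "country": ["Germany"]})
ictrp = pd.DataFrame({"trial_id": ["NCT001"], "country": ["Germany"]})  # cross-registration

merged = (pd.concat([ctgov, drks, ictrp], ignore_index=True)
            .drop_duplicates(subset="trial_id"))
german = merged[merged["country"].str.contains("Germany")]
print(len(german))
```

In the real pipeline, deduplication across registries is harder than a single-column drop, since cross-registrations carry different primary IDs; the paper notes the final database could only be considered largely, not fully, free of internal duplicates.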

Automatically extracted factors
Some factors were extracted automatically: the presence of a KKS, the presence of a University of Excellence (UoE), and whether a study was pre-registered. Both the KKS factor and the UoE factor were determined using regular expressions. There is no information on which studies were supported by a KKS, so instead we flagged studies that a KKS could potentially have supported, based on the KKS' formal establishment date at the sponsor's location.
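The flagging logic described above can be sketched as follows. The city list, establishment dates, and function name are purely hypothetical; the paper does not publish its actual patterns, which are in the R code on Zenodo.

```python
import re

# Hypothetical sketch: the actual city lists, patterns, and dates are not
# given in the paper; these are illustrative assumptions only.
KKS_CITIES = ["Leipzig", "Marburg", "Heidelberg"]  # illustrative subset
KKS_PATTERN = re.compile(r"\b(" + "|".join(KKS_CITIES) + r")\b", re.IGNORECASE)

def potentially_kks_supported(sponsor: str, start_year: int,
                              kks_established: dict) -> bool:
    """Flag a study whose sponsor is co-located with a KKS that already
    existed when the study started (potential, not confirmed, support)."""
    match = KKS_PATTERN.search(sponsor)
    if not match:
        return False
    city = match.group(1).title()
    return kks_established.get(city, 9999) <= start_year

established = {"Leipzig": 2001, "Marburg": 1999, "Heidelberg": 2000}  # made up
print(potentially_kks_supported("University of Leipzig", 2010, established))
```

Note that this is deliberately a proxy: it identifies co-location with an established KKS, not actual involvement, which is why the paper speaks of "potential" KKS support throughout.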

Publication search
All manual searches and assessments were conducted for the sample drawn from the merged database. Publications were searched using the trial ID, the trial title, and the sponsor on Google Scholar and PubMed. If multiple matching publications were found, we used the one that seemed to present the main results. We only included full articles and pre-prints of full articles, but no poster abstracts or conference proceedings.

Registration quality and adherence
A point rating scheme for registration quality has been described and applied previously (Thiele & Hirschfeld, 2022). This rating scheme incorporated quality factors from STROBE, CONSORT, the WHO Data Set, and previous literature. It included the following categories, with maximum point ratings in parentheses: study design (1), inclusion criteria (2), interventions (2), blinding (1), primary outcomes (5), and secondary outcomes (5). Adherence could then be checked for all pieces of study information in the aforementioned categories that obtained a registration score higher than 0 (insufficient information). To handle the heterogeneous sample, which had not been filtered based on, e.g., discipline or study type, we only checked adherence for the primary and the secondary outcome with the highest registration quality score, i.e., the most detailed registration.
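The selection rule above (check adherence only where the registration score exceeds 0) can be expressed as a small sketch; the category keys and function name are mine, not the paper's.

```python
# Maximum point ratings per category, as listed in the rating scheme.
MAX_POINTS = {
    "study_design": 1, "inclusion_criteria": 2, "interventions": 2,
    "blinding": 1, "primary_outcomes": 5, "secondary_outcomes": 5,
}

def checkable_categories(ratings: dict) -> list:
    """Adherence is only checked for categories whose registration
    quality score is above 0 (i.e., not 'insufficient information')."""
    return [cat for cat, score in ratings.items() if score > 0]

print(checkable_categories({"study_design": 1, "blinding": 0,
                            "primary_outcomes": 3}))
```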
We assessed adherence to the study registration for the entries in the sample for which a matching publication was found. For the adherence ratings, we compared the reported study design and characteristics for each of the six above-mentioned criteria. Discrepancies were marked as present or not, and we additionally recorded the type of discrepancy. If the information in the published article was more or less specific than in the register, we did not mark this as a discrepancy. However, for the key inclusion criteria, we recorded these differences in specificity to check their frequency and whether published articles or registry entries tended to be more specific. The adherence ratings were done independently by two raters, and differences in ratings were subsequently discussed and settled. Regarding primary outcomes, we marked it as a discrepancy if, according to the article, any outcome other than the registered primary outcome was the primary outcome. In many articles, however, the discrimination between primary and secondary outcomes is not made explicit; we did not mark that as a deviation from the registration.
We checked three criteria concerning the study design: the allocation, the sample size, and the inclusion criteria. Primary and secondary outcomes were checked for the type of the outcome, the measure (e.g., BDI-II), the time frame, the metric (e.g., 'change from baseline' or 'end value'), and the aggregation method (e.g., proportion of patients or mean value). For the intervention, we checked the active ingredient and the dosage or, for non-pharmacological studies, the intervention type and its description. We did not assess observational studies or control arms for this criterion. Finally, we also checked if the blinding was carried out as registered, but did not include errors from that assessment in the regression model, for reasons discussed in the final section.

Regression models for structural factors
We estimated logistic regression models for the occurrence of at least one adherence error per study. Independent variables were the registration year, the sample size (divided by 1,000), UoE involvement, the study type, registration on ClinicalTrials.gov, registration on the DRKS, and potential involvement of a KKS. To allow for comparisons between different model specifications, we estimated separate single regression models for UoE and KKS, those two models with the additional control variables, and a full model.
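The full model can be sketched as follows. This is not the paper's analysis (which was done in R on the real dataset available via Zenodo); it fits a logistic regression by Newton-Raphson on simulated stand-in data and reports odds ratios as exponentiated coefficients, as in Table 4.

```python
import numpy as np

# Simulated stand-in data; variable layout mirrors the full model described
# in the text, but all values here are random placeholders.
rng = np.random.default_rng(1)
n = 115
X = np.column_stack([
    np.ones(n),                            # intercept
    rng.integers(2005, 2020, n) - 2012,    # registration year (centered)
    rng.exponential(1.0, n),               # sample size / 1,000
    rng.integers(0, 2, n),                 # UoE involvement
    rng.integers(0, 2, n),                 # interventional study
    rng.integers(0, 2, n),                 # registered on ClinicalTrials.gov
    rng.integers(0, 2, n),                 # registered on the DRKS
    rng.integers(0, 2, n),                 # potential KKS involvement
]).astype(float)
y = rng.integers(0, 2, n).astype(float)    # at least one adherence error

# Fit logistic regression by Newton-Raphson (equivalent to IRLS).
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

odds_ratios = np.exp(beta)                 # as reported in Table 4
print(np.round(odds_ratios, 2))
```

An odds ratio below 1 for a binary predictor (e.g., registration on ClinicalTrials.gov) indicates lower odds of an adherence error when that factor is present.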

RESULTS
Of the 400 registrations sampled from the overall database, we excluded 7 extension studies, 8 studies with multiple parts, 8 registry studies, and one follow-up study. The remaining 376 studies had a mean start date in September 2011 and a mean enrollment of 1,175 participants. We found published articles for 115 of these studies. With 320 out of 376, the vast majority of studies in our sample were interventional. A total of 56% of all studies were pre-registered. Seventy-seven studies were run by sponsors that were co-located with a KKS, and only 22 had Universities of Excellence as their sponsors. A total of 82% of all studies were registered on ClinicalTrials.gov, 40% on the European Clinical Trials Register (EUCTR), and 22% on the DRKS. Due to cross-registrations, these numbers do not add up to 100% (see Table 1).
When considering only the subset of studies for which published articles were found, most of the above means and percentages stay roughly the same.Only the difference in the percentage of pre-registered studies was statistically significant, with 40% of studies with articles being pre-registered, compared to 63% of studies without published articles that were pre-registered.
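The pre-registration difference above can be checked with a two-sample test for equality of proportions. The paper used R (likely `prop.test`, per Table 1's notes); the sketch below is a plain z-test approximation in Python, and the counts are illustrative roundings of the reported 40% of 115 and 63% of 261.

```python
import math

# Illustrative counts: ~40% of the 115 studies with articles vs. ~63% of the
# 261 studies without articles were pre-registered (rounded from the
# percentages in the text; the exact counts are not stated there).
x1, n1 = 46, 115
x2, n2 = 164, 261

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
# Two-sided p-value from the standard normal CDF (via the error function).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 2), p_value < 0.05)
```

With these counts the difference is clearly significant, consistent with the paper's finding that pre-registered studies were underrepresented among studies with published articles.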
We found the most adherence discrepancies for secondary outcomes, with 16 out of 97 (16.5%) studies having a discrepancy there. Primary outcomes had better adherence, with nine studies with adherence errors out of 111 (8.1%). We found adherence errors in the inclusion criteria for nine out of 107 (8.4%) studies, in the interventions for six out of 95 (6.3%) studies, and in the study design for three out of 114 (2.6%) studies (see Table 2). All in all, we found adherence discrepancies for 30 out of 115 (26.1%) studies.
We categorized the adherence errors found (see Table 3) for the 115 studies that had published articles. The most frequent errors were differing cutpoints for inclusion criteria, which occurred in five studies; non-reporting of outcomes, which occurred three times for primary and 11 times for secondary outcomes; and differing time frames for the assessment of secondary outcomes, which occurred five times. Comparing the rating categories, we found the most adherence errors in the secondary outcomes, with 11 occurrences of non-reporting and five occurrences of differing time frames. Additionally, three studies had differing study designs.
Concretely, some examples of discrepancies across the rating categories were the following. A study registered its design as 'a randomised multi-centre partial cross-over trial'; however, the article does not mention any cross-over of patients from one treatment group to the other. As part of the inclusion criteria, a study registered a cutoff on the Body Mass Index of 35, while the article mentions a cutoff of 40. One article reported Methotrexate as the intervention, as registered, but Ifosfamide was additionally given. This discrepancy was acknowledged as a protocol amendment in the published article. One study analyzed a training program for cancer patients and registered the reduction in fatigue as the primary outcome. However, 'feasibility' was introduced as a new primary outcome and fatigue was reported as a secondary outcome. While the most common discrepancy for secondary outcomes was non-reporting, some studies had more subtle discrepancies. For example, one study registered that response rates would be determined at 3-month intervals. However, the published article mentions only one point in time (after six months).
After estimating five logistic regression models for the occurrence of at least one adherence error per study, we found reduced odds ratios when the structural factors in question, namely UoE and KKS, were present. However, all of these coefficients were statistically insignificant, irrespective of the included control variables. The only statistically significant variable was registration on ClinicalTrials.gov. Thus, registration on ClinicalTrials.gov corresponded to better adherence than would be expected by chance. The odds ratio for adherence errors when a study was registered on ClinicalTrials.gov ranged from 0.14 to 0.17, depending on the model, compared to odds ratios between 1.12 and 1.26 for registration on the DRKS (see Table 4).
Lastly, we also assessed whether the registered inclusion criteria were identical, more specific, or less specific in the published article.We did this for the 112 studies that had a registration quality rating for the inclusion criteria larger than zero.The key inclusion criteria were more specific compared to the registration in 21 published articles, less specific in 21 articles, and one study had additional key inclusion criteria in both the registration and the published article.We found no differences in the key inclusion criteria for the remaining 69 studies.
As an additional analysis, and to be able to discuss some limitations of this study, we selected 50 papers for studies that were registered on ClinicalTrials.gov. For these 50 papers, we checked whether changes to the registry entry had been made after the paper was submitted or published. To do so, we used the 'last_update_submitted' variable from AACT and manually extracted the earliest submission or publication date per paper. We used the publication date when the submission date was unclear. For 26 out of 50 studies, changes had been made to the registry entry after the paper was submitted or published.
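The date comparison described above reduces to a simple check per study; the sketch below is illustrative (the dates and the function name are made up), using the AACT `last_update_submitted` field mentioned in the text.

```python
from datetime import date

# Hypothetical sketch: compare AACT's 'last_update_submitted' date with the
# article's earliest submission/publication date for each study.
def registry_changed_after_paper(last_update: date, paper_date: date) -> bool:
    return last_update > paper_date

records = [
    (date(2018, 5, 1), date(2017, 11, 3)),   # registry updated after the paper
    (date(2015, 2, 10), date(2016, 6, 30)),  # registry updated before the paper
]
changed = sum(registry_changed_after_paper(u, p) for u, p in records)
print(changed)  # count of studies with post-publication registry changes
```

Note that `last_update_submitted` only captures the most recent change, so this check can confirm that some change happened after publication but not what was changed; the Discussion returns to this limitation.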

DISCUSSION
We did not find statistically significant associations with the German structural factors, namely Universities of Excellence and Coordinating Centers for Clinical Studies. The only statistically significant factor was registration on ClinicalTrials.gov, which was positively associated with adherence. This may be due to actually better study conduct, or simply due to more frequent updates to the study registration, so that the registration matches later protocol modifications. As our data suggest, changes to registry entries after the publication of articles might indeed be frequent: 26 of the 50 selected studies on ClinicalTrials.gov had changes made to the registry entry after submission or publication. In summary, we found a considerable number of discrepancies, similar to previous literature (Ewart, Lausen & Millian, 2009; Fleming et al., 2015; Rosenthal & Dwan, 2013). Thus, we suggest always validating pieces of information contained in trial registries against other published information, when possible. The structural factors were not statistically significantly associated with adherence ratings. Rather, registration on ClinicalTrials.gov was statistically significantly associated with adherence, but it is unclear what the driving factor behind this finding is.
The data for assessing the association of adherence with KKS and Universities of Excellence were limited. We found published articles for 20 studies that were potentially supported by a KKS and for only five studies with a University of Excellence as their sponsor. Accordingly, this resulted in rather wide confidence intervals. These intervals indicate a wide range of odds ratios for the association of any adherence error with the source registry and the involvement of a KKS. For example, the association of registration on the DRKS with adherence errors ranges from a decrease to about a third of the odds to a fourfold increase of the odds. While also statistically insignificant, the association of KKS involvement with adherence errors has 95% confidence intervals consistent with a sharp decrease in adherence errors, ranging from odds ratios around 1.6 at the upper bounds down to 0.06 at the lower bounds. The 95% confidence intervals for the involvement of a UoE are, however, extremely wide (e.g., [0.09, 10.76] for the full model), and consistent with both a tenfold increase and a tenfold decrease of the odds. Thus, these results do not rule out a practically meaningful association, when leaving statistical significance aside. Notably, the upper limits of the 95% confidence intervals for the study being interventional are relatively high, at odds ratios around 20. It is debatable whether this points to an actual tendency of interventional studies to have more adherence errors. An alternative explanation could be that interventional studies were registered in significantly more detail than observational studies: a study being observational instead of interventional was associated with a decrease of 0.1 in the sum of rating points (Thiele & Hirschfeld, 2022).
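The wide intervals above follow directly from how odds-ratio confidence intervals are constructed: the 95% CI of the log-odds coefficient, beta +/- 1.96 * SE, is exponentiated, so a large standard error (here driven by the small number of studies with the factor present) blows up the interval on the odds-ratio scale. The numbers below are illustrative, not the paper's estimates.

```python
import math

# Illustrative only: a hypothetical coefficient (odds ratio 0.5) with a large
# standard error, as arises when few studies carry the factor of interest.
beta, se = math.log(0.5), 0.8
lower = math.exp(beta - 1.96 * se)
upper = math.exp(beta + 1.96 * se)
print(round(lower, 2), round(upper, 2))
```

Even though the point estimate suggests halved odds, the interval spans values well above 1, i.e., the data are also consistent with an increase. This mirrors the UoE interval of [0.09, 10.76] reported above.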
The registration information was missing or unclear for the study design of 15 studies, the primary outcome of seven studies, the secondary outcome of nine studies, the inclusion criteria of eight studies, the intervention of one study, and the blinding of 16 studies. Adherence could not be rated in these cases of unclear registrations. About one third each of primary and secondary outcomes were completely registered.
Putting our results into perspective, there are already results from other studies on the non-reporting of secondary outcomes (Jones et al., 2017). These results were usually reported on a per-study level, which makes comparisons not quite straightforward. Assuming a median of four secondary outcomes per study (Jones et al., 2017) and a lognormal distribution of the number of secondary outcomes, we estimate the probability of a single secondary outcome being omitted at around 4%. In our sample, non-reporting of secondary outcomes was more common, at 9.5%.
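The conversion between per-study and per-outcome rates can be sketched by simulation: if each of a study's k secondary outcomes is independently omitted with probability q, the study shows at least one omission with probability 1 - (1 - q)^k, averaged over the distribution of k. This is a hedged reconstruction of the back-of-the-envelope logic above, not the paper's actual computation; the lognormal sigma is my assumption, and only the median of 4 comes from the text.

```python
import numpy as np

# Number of secondary outcomes per study: lognormal with median 4 (from the
# text); the spread (sigma = 0.6) is an illustrative assumption.
rng = np.random.default_rng(0)
k = np.maximum(1, np.round(
    rng.lognormal(mean=np.log(4), sigma=0.6, size=100_000)))

def per_study_rate(q: float) -> float:
    """Probability that a study with k outcomes omits at least one,
    given a per-outcome omission probability q."""
    return float(np.mean(1 - (1 - q) ** k))

# With a per-outcome omission probability of 4%, roughly what share of
# studies would omit at least one secondary outcome?
print(round(per_study_rate(0.04), 2))
```

Inverting this relationship (finding the q that reproduces a reported per-study rate) is how a per-study figure from the literature can be translated into the per-outcome ~4% estimate quoted above.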
Originally, it was planned to also include adherence errors regarding the blinding in the regression models. However, the blinding turned out to be difficult to check for discrepancies. Nearly all interventional trials had registered sufficient information on the blinding, but trials were sometimes registered as, e.g., single blind without specifying whether the patient or the assessor was blinded. According to the CONSORT statement (Schulz, Altman & Moher, 2010), it is a requirement to state who exactly was blinded, but in practice this is rarely done for both the registration and the article. Most problematic was the apparently very general use of the term 'double blind' in many articles. Often, a trial was registered as, e.g., quadruple blind while the published article described the trial as double blind. A few trials gave the necessary details on who exactly was blinded, so that in these cases 'double blind' may have turned out to be in line with the definition of quadruple blinding (blinding of participant, care provider, investigator, and outcomes assessor). In most cases, though, there was no information on who exactly was blinded, leading to high apparent error rates in the blinding category. Concretely, when not assuming that 'double blind' also encompasses triple or quadruple blinding, 20 out of 56 trials had worse blinding in the published article compared to the registration, 26 trials were in line with the registration, four had not published sufficient information to judge what kind of blinding was carried out, and one article reported better blinding than was registered. Thus, including these discrepancies would have had a severe impact on the results of the regression model, while it was often unclear if the blinding was actually not carried out as registered, due to the very general use of the term 'double blind' mentioned before. Alternatively, a relaxed version of the blinding assessment would have been conceivable, which assumes that double, triple, and quadruple blind are equivalent as long as any further pieces of information do not show discrepancies. However, that assessment would have identified very few discrepancies at all, having virtually no impact on the estimated models and the qualitative conclusions based on them.
Although there can be errors in multiple categories, adherence errors were modeled as a binary variable.This was done because there was a low number of studies with more than one adherence error, not allowing for reliable modelling of differences in the number of adherence errors.Concretely, there were 22 studies with one adherence error, five studies with two adherence errors, one study with three adherence errors, one study with four adherence errors, and one study with five adherence errors.
The methodology of this study was more forgiving than other methodologies from the literature in that it did not penalize, e.g., the addition of new secondary outcomes. This was done to facilitate the manual analysis of a relatively large sample. It also accounts for the large heterogeneity in the sample, which was not restricted to certain study types or topics; on the other hand, this allowed for an assessment across scientific fields and study types. Thus, we generally arrived at higher, but still not satisfying, adherence rates: 30 out of 115 (26.1%) studies had adherence errors. In line with previous findings (Ewart, Lausen & Millian, 2009; Fleming et al., 2015; Rosenthal & Dwan, 2013), we identified particularly many discrepancies in secondary outcomes.

Limitations
In the context of this study, we only aimed to analyze how well a snapshot of registrations reflected published information, so we did not track possible modifications to these trial registrations.
Given the large heterogeneity in the sample at hand, which contained interventional trials as well as observational studies from various fields, we did not rate all registered outcomes, but only the primary and the secondary outcome with the most detailed registration. Rating the most detailed registrations makes finding a discrepancy in a given outcome more likely, but all in all this procedure probably led to the detection of fewer discrepancies overall.
A further research question that could have been pursued is the comparison of published results with the results data contained in trial registries. Some studies have done so (Hartung et al., 2014), but most trial registries, apart from ClinicalTrials.gov, have no results data, or only for very few studies. Another possible outcome could have been whether discrepancies favor statistical significance, which again some studies have assessed (Jones et al., 2017).
Regarding changes to registry entries made after publication, it is important to note that these can of course be unproblematic from an ethical perspective. Additionally, such changes might also pertain to pieces of information that are irrelevant for this analysis. It would still be preferable to use date-aligned versions of the registry entries when checking adherence, if possible. At least for ClinicalTrials.gov and the DRKS, software was recently introduced that should simplify this task (Carlisle, 2022).
Lastly, there is some evidence that study documents other than registry entries, e.g., study protocols, offer more detailed information on clinical trials (Wieseler et al., 2012). However, analyzing these documents is much more involved than analyzing registry entries due to their length and their free-form way of presenting information, in contrast to the standardized structure of registry entries. Again, given the size and large topical variety of the sample at hand, we did not try to incorporate these sources.

Table 1 General descriptive information of the sample and the subset of studies for which published articles were found. P-values are obtained from two-sample tests for equality of proportions, and 95% confidence intervals are estimated using Wilson's score method with continuity correction.
Notes. d, difference in proportions; KKS, Coordinating Center for Clinical Studies; U of Exc., University of Excellence; CT.gov, ClinicalTrials.gov; DRKS, German Clinical Trials Register; EUCTR, EU Clinical Trials Register.

Table 4 Summary of logistic regression models for the occurrence of adherence errors, with p-values in parentheses and 95% confidence intervals.
All estimates show odds ratios.