A Systematic Online Living Evidence Summary of experimental Alzheimer’s disease research



Background
Alzheimer's disease (AD) is a devastating neurodegenerative disorder characterised by progressive cognitive decline and memory loss. By 2050, it is estimated that over 100 million people will be living with the condition worldwide (1). As the global population ages, AD continues to weigh heavily on healthcare systems, society, and the wider economy. Over the last two decades, many billions have been spent on laboratory research conducted across the pharmaceutical industry and academic institutions in concerted efforts to develop disease-modifying treatments (2). The use of preclinical models has helped deepen our understanding of disease aetiology and has enabled researchers to evaluate thousands of potential therapeutic compounds for safety and efficacy prior to testing in humans. Positive data from preclinical studies have encouraged numerous clinical trials; unfortunately, nearly all therapies tested have failed to demonstrate significant therapeutic benefit for those living with AD (3-5). To illustrate the scale of the problem, it has been estimated that since the millennium, over 400 trials testing AD-targeted therapeutics have failed (6). The recent Food and Drug Administration approval of two monoclonal antibody therapies, Lecanemab and Aducanumab, is an encouraging development. However, their "real world" efficacy is still to be evaluated, and both approvals have been accompanied by controversy over the strength of the evidence justifying their use (7,8).
When planning laboratory experiments to investigate AD pathology or evaluate a new therapeutic target, it is important to consider how the resulting data will fit into the broader context of existing knowledge. Thought leaders in AD emphasise the need for robust and reproducible target validation to facilitate drug discovery (9). Rather than relying on evidence from a single study, we should seek incremental evidence from a range of experiments that attempt to answer the same and related research questions across different laboratories and model systems. AD is a complex, multifactorial disorder. No model can be fully representative of the human condition, though some may be more relevant for investigating specific aspects of the disease (e.g. tau pathology) than others (10), and could provide mechanistic insights into how these facets of the disease manifest (11). As pointed out by others (12), 15 new therapies have been identified over the last decade to treat multiple sclerosis (MS), another highly complex neurological disease, which indicates that the lack of a "perfect" animal model does not prevent progress. In MS, there have been efforts to selectively target the inflammatory aspects of the disease which are reproduced in animal models (13), and successful clinical trials have appropriately aligned their outcomes with preclinical efficacy studies.
By improving our understanding of the quantity and quality of existing research, we could significantly reduce research waste. By taking stock of the literature in its entirety, we could prevent the unnecessary duplication of experiments. Furthermore, by combining effect sizes across different studies measuring similar outcomes, we can develop recommendations on the sample sizes required in different contexts for adequate statistical power (14). Through examining gaps in our existing knowledge, we can ensure that we prioritise and fund experiments which are likely to be of most scientific value. However, in practice, the approach to identifying prior studies is often haphazard and too reliant on the journals that we subscribe to, the conferences we attend, or the study findings shared within our networks. Systematic reviews and meta-analyses seek to provide an unbiased overview of the evidence, assess rigour and reporting, and identify which experimental design factors may influence reproducibility and predictive value (15,16). Taking a systematic approach can facilitate a deeper understanding of what makes research reliable and how it can most effectively be improved, and promote more informed decision-making (17,18). Across biomedical research, most preclinical systematic reviews have focused on the internal validity of in vivo experiments modelling human diseases. Past work has identified methodological weaknesses and poor reporting quality (where studies do not report the details of experimental design, conduct, and analysis). Persistent failures to report such measures have been associated with inflated estimates of treatment efficacy and likely lead to false positive results, where a drug appears to improve outcome but in reality does not (19-21). Previous work has highlighted the extent of the problem within the in vivo preclinical AD literature (22-26), with poor reporting of key experimental design characteristics and measures to reduce the risk of bias. In a retrospective review of the preclinical evidence that informed six high-profile AD clinical trials (27), the authors concluded that some were "very unlikely to succeed" based on the prior evidence. Of those reviewed, four (Tramiprosate, Semagacestat, Bapineuzumab, Solanezumab) had incomplete or inconsistent in vivo data from animal models and two (Tarenflurbil, Gammagard) had in vivo data which did not support progression to phase 1 clinical trials. Although some of the pitfalls of the compounds were known at the time, a thorough and rigorously conducted systematic review of the evidence may have provided clear guidance about where the gaps were, how strong the evidence was for a specific outcome to be measured in patients (e.g. the ability to reduce levels of existing amyloid plaques or cognitive improvements), and the likelihood of clinical benefit.
In recent years, we have endeavoured to perform wide-ranging and comprehensive systematic reviews of the in vivo and, more recently, in vitro literature, often retrieving tens of thousands of potentially relevant citations from biomedical database searches (28). We have observed that the specific animal or cell-based model(s) used and outcome(s) evaluated are not always clear without reading the full text of a published article, due to insufficient detail in the title, abstract, and other searchable fields (29,30). This can result in a trade-off between retrieving too many irrelevant studies and missing potentially important studies. In highly research-intensive fields, including AD, the pace of evidence generation, plus the time, expertise, and resources required to complete such a review, likely presents a significant barrier to systematic scientific advancement. After billions of pounds, millions of animals, and thousands of experiments, there is a huge body of potentially useful data dispersed across the literature. Curating these data in a form which allows them to be accessed and exploited quickly, with minimal manual effort, could be transformative.
Harnessing technological advancements such as natural language processing and machine learning (28), we have developed an integrated workflow of automated tools to systematically collect, curate, and visualise evidence from in vivo experiments. Systematic Online Living Evidence Summaries, or SOLES projects (31), are available as publicly accessible interactive dashboards, refreshed with new evidence on a regular basis. Here, we describe the development of AD-SOLES: a dashboard to accelerate evidence-driven preclinical research in AD models. Using the dashboard, all AD research stakeholders, including researchers, funders, patients, and the public, can gain a better understanding of the quantity and quality of the existing evidence. We intend that AD-SOLES be a platform to support (i) research-on-research (including systematic reviews) of in vivo AD models, (ii) research improvement activities, and (iii) evidence-based decision making.

Methodology

Automated citation retrieval
We retrieve relevant citations from three biomedical sources: PubMed, Web of Science (WoS), and Scopus. Instead of limiting our search to specific models or in vivo research, we use broad and simple search terms (Table 1) to identify all potentially relevant research related to Alzheimer's disease. This is partly due to uncertainty about whether citations have been indexed in enough detail to selectively retrieve experiments in animal models, and partly to preserve AD studies in in vitro systems and clinical populations for future expansions of AD-SOLES. On a weekly basis, new citations are retrieved programmatically using modified versions of existing R packages to query WoS (wosr (32)), Scopus (ScopusAPI (33)), and PubMed (RISmed (34)). Each tool uses application programming interfaces (APIs) to find and retrieve relevant citations. We modified each function to format the retrieved data and retain the most important metadata (including title, authors, abstract, DOI, pages, volume, issue, journal, URL, and database accession numbers). Once new citations are identified, we use the Automated Systematic Search Deduplicator (ASySD) (35) to remove any duplicate copies of citations. Citations from the previous two months of search results are also compared by ASySD to remove any citations that have been retrieved previously.

Once complete, the unique set of new citations is added to the AD-SOLES database. To capture any additional duplicate citations that have been missed, we also perform a comprehensive deduplication process (using automated and manual deduplication functions in ASySD) every 6 months.
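The matching logic behind deduplication can be illustrated with a minimal sketch (in Python for illustration, though the pipeline itself is R-based); the `normalise` function and the DOI/title keys below are our own simplification, not ASySD's actual matching algorithm.

```python
import re

def normalise(text):
    """Lowercase and strip all non-alphanumeric characters, so trivial
    formatting differences do not prevent a match."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())

def deduplicate(citations):
    """Keep the first copy of each citation: match on DOI when present,
    otherwise on a normalised title + year key."""
    seen, unique = set(), []
    for cit in citations:
        if cit.get("doi"):
            key = ("doi", normalise(cit["doi"]))
        else:
            key = ("title", normalise(cit.get("title")), cit.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(cit)
    return unique

records = [
    {"doi": "10.1000/abc", "title": "Tau pathology in 3xTg-AD mice", "year": 2020},
    {"doi": "10.1000/ABC", "title": "Tau pathology in 3xTg-AD mice.", "year": 2020},
    {"doi": None, "title": "Amyloid clearance in vivo", "year": 2021},
]
print(len(deduplicate(records)))  # 2
```

The real tool also supports fuzzy matching and flags ambiguous pairs for manual review, which this sketch omits.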

Screening for in vivo research
Using screening decisions from human reviewers, we trained a machine learning algorithm hosted at the EPPI Centre, University College London, having applied this tool successfully in previous systematic reviews (14,36) and classification tasks (30). To train the algorithm for our classification task, we collated 4,182 verified screening decisions (where at least two human reviewers were in agreement about whether a publication did or did not include reports using an in vivo AD model) from four systematic review projects: an earlier attempt to create a living systematic review of AD models (37), an ongoing review of Open Field Test measurements in animal models of AD (38), an ongoing review of in vitro slice electrophysiology measurements in AD models (39), and an older review of interventions in transgenic AD models (23). Human decisions were sent to the machine learning algorithm alongside their corresponding titles and abstracts. Each time new publications are retrieved, the machine learning algorithm is re-trained and applied, leading to marginal differences in performance between iterations. We keep a log of performance (see Table 2 for the performance metrics used) with unique identifiers for each run. This ensures we can track any significant changes over time that may suggest we need to generate more training data to improve performance.

Retrieving full texts
To retrieve full texts (in PDF, XML, or text format), we use the DOI of included studies to query the Unpaywall (40) and CrossRef (41) APIs and retrieve downloadable links to open access full texts. We also make use of the Elsevier (42) and Wiley (43) APIs to programmatically access and download additional full texts available via our institutional subscriptions (University of Edinburgh).

Study feature tagging
To tag each publication by animal model(s), outcome measure(s), intervention(s), species, and sex(es), we developed customised dictionaries of regular expressions or "regex" patterns, which are highly specialised to search bodies of text for specific instances of characters, words, and phrases (44). All regex dictionaries used within AD-SOLES are available on the Open Science Framework (OSF) at https://osf.io/yhxq4/. For any future updates, we will upload a versioned file to this OSF project.
Model dictionary: We extracted a list of models identified in a previous review of transgenic Alzheimer's disease models (23). We supplemented this list with a curated database of transgenic models and alternative names available via the Alzforum website (45). We first converted the list of models into regex strings using an in-house R script. This conversion included adding word boundaries between each word (to ensure that "APP" did not match with "PAPP") and placing Boolean "OR" operators between each possible variation for each model (to ensure that 3xTg or 3 x TG or 3 x Tg-AD signalled a match for the 3xTg-AD model). Early validation results indicated frequent matches with references to other work mentioning a specific model. To improve this, we created a regex to extract model sentences and applied the regex dictionaries for model, sex, and species to those sentences.

Intervention dictionary: Across SOLES projects, we use a list of over 12,000 compounds obtained from DrugBank (46) which has been programmatically converted into regular expressions. For AD-SOLES, we also extracted a list of interventions, target types, and drug classes from the Alzforum website (47) and developed regexes for each drug to capture synonyms, alternate spellings, and punctuation differences.
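The conversion from model names to regex strings can be sketched as follows (a minimal Python illustration; the pipeline itself uses an in-house R script, and the model variants shown here are examples rather than the full OSF dictionary):

```python
import re

# Illustrative entries only; the real, versioned dictionaries are on OSF.
model_variants = {
    "3xTg-AD": ["3xTg", "3 x TG", "3 x Tg-AD", "3xTg-AD"],
    "APP/PS1": ["APP/PS1", "APPswe/PS1dE9"],
}

def build_regex(variants):
    """Join variants with '|' (Boolean OR) and wrap in word boundaries,
    so that e.g. 'APP' cannot match inside 'PAPP'."""
    alternatives = "|".join(re.escape(v) for v in variants)
    return re.compile(r"\b(?:%s)\b" % alternatives, re.IGNORECASE)

patterns = {model: build_regex(v) for model, v in model_variants.items()}

text = "Male 3 x Tg-AD mice were compared with wild-type littermates."
matches = [m for m, pat in patterns.items() if pat.search(text)]
print(matches)  # ['3xTg-AD']
```

Note that `re.escape` protects characters such as "/" that are meaningful in regex syntax, so model names can be added to the dictionary verbatim.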
Outcome dictionary: We extracted a list of behavioural outcomes identified in a previous review of transgenic Alzheimer's disease models (23) and converted these terms to regular expressions through manual review of studies in this annotated dataset to check for variations in language. For example, the "Morris water maze" may also be called "water maze", "MWM", or "Morris maze". We also developed additional regular expressions to support an ongoing review of in vitro hippocampal slice electrophysiology in AD models (39).
Sex dictionary: We developed a simple regular expression pattern for male and female animals.

Species dictionary: We developed simple regular expression patterns for the most commonly used animals in neurodegeneration research.

Model sentence extractor: A specialised regex pattern was also created to extract sentences within a publication containing a description of the animal model, i.e. where the model was obtained from and/or details of model generation. Regex dictionaries for model, sex, and species were then applied to this extracted text directly, with the aim of improving specificity compared to full-text performance.
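The two-stage logic can be sketched as below (a Python illustration; the cue phrases and naive sentence splitter are our own simplified assumptions, not the versioned pattern deposited on OSF):

```python
import re

# Cue phrases are illustrative assumptions only.
MODEL_CUES = re.compile(
    r"(obtained from|purchased from|Jackson Laborator\w*|"
    r"generated (?:by|using)|transgenic (?:mice|rats))", re.IGNORECASE)
SEX = re.compile(r"\b(male|female)s?\b", re.IGNORECASE)

def model_sentences(full_text):
    """Naive sentence split, keeping only sentences that look like a
    description of where the model came from or how it was generated."""
    sentences = re.split(r"(?<=[.!?])\s+", full_text)
    return [s for s in sentences if MODEL_CUES.search(s)]

text = ("AD is a progressive disorder. Female APP/PS1 mice were obtained "
        "from the Jackson Laboratory. Behaviour was tested in the water maze.")
extracted = model_sentences(text)
print(len(extracted))                  # 1
print(bool(SEX.search(extracted[0])))  # True
```

Restricting the sex and species dictionaries to the extracted sentence avoids spurious matches elsewhere in the full text, such as in the discussion or reference list.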

Study feature tagging: validation
To estimate the usefulness of feature tagging and determine the optimal approach, we performed a validation study. We applied regex dictionaries to the title/abstract/keyword fields (tiabkw method) and full text of each included study in AD-SOLES. For model, sex, and species tagging, we also extracted model sentences and applied regex dictionaries to the extracted text. When applying regex dictionaries for interventions throughout the development of SOLES projects, we have become aware that there are often many spurious matches within the full text due to non-specific drug synonyms and uses in other contexts. For example, a compound could be used as an intervention in one study and as a culture medium in another. At present, the intervention dictionary is not specific enough for use on full texts. For this reason, we only apply intervention regex dictionaries using the tiabkw method.
We collated citations in AD-SOLES which had been tagged with at least one model, outcome, sex, and species using multiple approaches. From the fully tagged studies, we obtained a random subset of 100 articles to check manually. A single reviewer read the full text of each study and checked whether each of the identified tags was accurate, providing a TRUE/FALSE decision beside each tag in a Google Sheets spreadsheet. Following this, decisions were imported into R for analysis. It is not possible to calculate the true sensitivity of a regex approach using the approach described here; for example, it is unclear how many studies using a certain model in the AD-SOLES database have not been tagged. Instead, we estimated the positive predictive value (precision) and specificity of each approach for each tag type. We also estimated the sensitivity of different approaches based on the validated model, sex, outcome, species, and intervention tags that had been identified in the subset. In other words, of all the validated model tags across the 100 studies, what proportion were correctly identified using only the tiabkw approach?
To identify optimal approaches going forward, we also compared full-text regex match frequencies at different thresholds (1 match or more, >1 matches, >2 matches), 1 or more title/abstract/keyword matches, and 1 or more model sentence matches. Optimal approaches were defined as having a precision of >0.80, indicating that when a study was tagged, there was at least an 80% likelihood that the tag had been correctly applied. Where multiple approaches had similar precision, we preferentially selected the one with higher sensitivity.
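The metrics can be computed directly from the TRUE/FALSE decisions (a Python sketch with illustrative numbers only; the actual analysis was done in R, and the published estimates are those in Table 3):

```python
def precision(decisions):
    """decisions: one boolean per applied tag (True = tag judged correct).
    Precision = correct tags / all tags applied by this method."""
    return sum(decisions) / len(decisions)

def method_sensitivity(validated_tags, method_tags):
    """Of all tags confirmed correct across any method, what proportion
    did a single method (e.g. tiabkw) also find?"""
    found = sum(1 for t in validated_tags if t in method_tags)
    return found / len(validated_tags)

# Illustrative numbers, not the published validation results.
tiabkw_decisions = [True] * 45 + [False] * 5       # 50 tags applied, 45 correct
print(round(precision(tiabkw_decisions), 2))        # 0.9

validated = {("study1", "APP/PS1"), ("study1", "MWM"), ("study2", "5xFAD")}
tiabkw_found = {("study1", "APP/PS1"), ("study2", "5xFAD")}
print(round(method_sensitivity(validated, tiabkw_found), 2))  # 0.67
```

This also makes the limitation above concrete: `method_sensitivity` is relative to the pool of validated tags, so tags missed by every method never enter the denominator.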

Transparency assessments
To obtain estimates of data sharing and code sharing practices across the AD literature, we employed the ODDPub tool (RRID:SCR_018385), developed to support automated open data detection in biomedical research articles (48). ODDPub was previously validated on randomly sampled publications from PubMed and had an estimated sensitivity of 0.73 and a specificity of 1.0. Any articles in PDF format were converted before running the tool, as ODDPub requires articles in text format for processing. To obtain the open access status of publications, we queried the CrossRef database via the rcrossref R package (49) using the DOI of included articles.

Risk of Bias assessment
To assess risk of bias reporting, we developed an automated tool for use in preclinical experiments (50). The tool uses natural language processing models to provide a probability score on the following measures to reduce the risk of bias: (1) random allocation to groups, (2) blinded outcome assessment, (3) conflict of interest statement, (4) compliance with animal welfare regulations, and (5) reporting of animals excluded from the analysis. Probability scores of greater than 0.5 indicate that a measure is reported. The tool is Python-based (51), and we implemented it into our R-based workflow using the reticulate R package. This tool was previously validated and found to achieve F1 scores of 0.82, 0.82, 0.83, 0.91, and 0.47 for random allocation, blinded outcome assessment, conflict of interest statement, compliance with animal welfare, and reporting of exclusions, respectively.

Additional metadata
Often, newly identified citations may lack abstracts, DOIs, or other important metadata. To retrieve information that is missing from a citation record, we pull additional metadata from the CrossRef and OpenAlex (52) databases (via openalexR (53) and rcrossref (49)). We also use OpenAlex to maintain a record of retracted studies.

A "living" workflow
Each week, we run an R script containing each step of the workflow to retrieve, screen, and tag new evidence as it emerges. Newly curated datasets are sent to the underlying AD-SOLES database, which feeds into the web application in real time. In this way, we are able to continually refresh AD-SOLES with minimal human effort.
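In outline, the weekly update behaves like the following sketch (Python for illustration; the real workflow is an R script whose steps wrap the APIs, classifier, and dictionaries described above, and the toy stand-ins here are purely hypothetical):

```python
def run_weekly_update(search, screen, tag, store):
    """One pass of the living workflow: each stage is a pluggable function,
    mirroring (in outline only) the weekly R script."""
    citations = search()                            # query databases, deduplicate
    included = [c for c in citations if screen(c)]  # ML in vivo classifier
    tagged = [tag(c) for c in included]             # regex dictionaries, RoB tools
    store(tagged)                                   # push to the AD-SOLES database
    return len(tagged)

# Toy stand-ins to show the flow; real steps call external APIs and models.
db = []
n = run_weekly_update(
    search=lambda: [{"title": "APP/PS1 mice"}, {"title": "a clinical trial"}],
    screen=lambda c: "mice" in c["title"],
    tag=lambda c: {**c, "model": "APP/PS1"},
    store=db.extend,
)
print(n, db[0]["model"])  # 1 APP/PS1
```

Keeping each stage behind a simple function boundary is what allows individual tools (e.g. the classifier) to be retrained or swapped without disturbing the rest of the pipeline.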

Web dashboard
We created a web application using R Shiny to allow users to visualise, interrogate, and download subsets of the AD-SOLES database. The code underlying the Shiny application is available on GitHub (54) and the website is openly accessible at: https://camarades.shinyapps.io/AD-SOLES/.

Data integrity and version control
All data are stored in a PostgreSQL database hosted on Amazon Web Services. Starting from June 2023, we deposit weekly database snapshots on the Open Science Framework following each search update (available at https://osf.io/8r3p7/).

Research included in AD-SOLES
As of 8th June 2023, we have retrieved a total of 510,217 citations from across biomedical databases, of which 335,642 were considered by ASySD to be unique (see Figure 1). Following classification by the machine learning algorithm, 35,546 studies are included in the database. 3,219 publications were removed from our pipeline as they are highly likely to be abstracts only, and 32 were removed as they were retracted. Of the included publications, we were able to retrieve 27,692 (77.8%) full texts. The machine classifier performs with an average sensitivity of 95.1% and an average specificity of 93.7%. Figure 2 shows performance across different machine classifier runs (#1 representing the first time the classifier was applied and #30 the 30th run).
The pace of publication of research in in vivo Alzheimer's disease models has grown substantially over time (Figure 3).

Study tagging validation
The random sample of 100 studies used to validate study tagging approaches was derived from a subset of 2,837 included studies that had at least one tag for model, sex, species, and outcome using each method (full text, tiabkw, model sentence) and a tag for an intervention using the tiabkw method. Of the 100 selected studies, 2 were excluded from the tagging validation as they were conference abstracts (identified by human reviewers). Within each of the 98 studies assessed, several classifications were often applied (e.g. multiple models). In total, across all tagging methods there were n=352 (model), n=134 (sex), n=190 (species), and n=240 (outcome) tags identified. For drugs, using only title, abstract, and keyword fields, we identified n=202 tags. Overall, tiabkw matches were less sensitive than other methods but highly specific, i.e. if a model was mentioned in the title, keywords, or abstract, it was highly likely to be used for the experiments. Full-text matches and model sentence matches were less specific, but are likely to be useful in addition to tiabkw matches for enhancing sensitivity.
For model, sex, and species, the best performing approaches were the tiabkw regex and searching within the extracted model sentence (Table 3). For model, the precision of the model sentence method was not as high as expected (0.793) but was deemed good enough for application in AD-SOLES. For outcome measures, more than one mention in the full text or one or more mentions using the tiabkw method were the most favourable approaches. The logic underlying the AD-SOLES application was guided by the optimal approaches (Table 4). Using these approaches, included studies have been successfully tagged with at least one model, outcome, and intervention. Sunburst plots visualising the number of publications in each category are shown in Figures 4-6. The most common model is APP/PS1 (Figure 4), described as "Generic APP/PS1" in the SOLES platform due to the inability to distinguish between different APP/PSEN1 mutation models. The Morris Water Maze is the most commonly measured behavioural outcome (Figure 5), while Donepezil is the most commonly used treatment (Figure 6).
In the most commonly used model category (transgenic mice with APP + PSEN1 mutations), we see a continued preference to use male animals or a combination of male and female animals within experiments (Figure 7).
Using the interactive matrix functionality within AD-SOLES, it is possible to visualise overlap across tags. For example, looking across interventions which target the cholinergic system (Figure 8), Donepezil has been tested in 5xFAD, APPswe/PSEN1dE9, and APP/PS1 (generic) models, while Tacrine, Huperzine A, and Galantamine have only been tested in a small number of experiments in APP/PS1 models. Currently, 21,980 of 35,546 articles in the AD-SOLES database have been assessed for risk of bias reporting. A subset of full texts have not yet been assessed because large file sizes (>50000 bytes) caused memory issues during processing. Reporting of conflict of interest statements (61.0%) and welfare approval (59.6%) was moderate across the dataset, while reporting of randomisation to experimental groups and blinded outcome assessment was low (20.0% and 21.3%, respectively). Very few studies were found to have reported exclusion criteria for animals/datapoints (7.6%).

Downloading relevant research
Using the study tags, research users can download relevant citation lists from within the web application. The searchable study table (3) contains all of the citations present in the AD-SOLES database with associated metadata (year, author, and title, with a link to the publication if available). Study feature tags which have been applied to each citation are also visible. Users are able to search the title, abstract, and keywords of included studies using Boolean AND/OR logic and filter results by model, intervention, outcome measure, and year of publication. To support systematic reviews, where it is essential that no study is missed, we provide a "high sensitivity" toggle that can be switched on to include studies where an intervention, model, or outcome is mentioned anywhere within the full text.
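The toggle's behaviour amounts to widening which tag set is searched (a hedged Python sketch; the field names `tiabkw_tags` and `fulltext_tags` are our own illustration, not the application's actual schema):

```python
def filter_by_tag(studies, tag, high_sensitivity=False):
    """Match the tag against title/abstract/keyword-derived tags by default;
    with the high-sensitivity toggle on, also accept full-text matches."""
    out = []
    for s in studies:
        fields = set(s.get("tiabkw_tags", []))
        if high_sensitivity:
            fields |= set(s.get("fulltext_tags", []))
        if tag in fields:
            out.append(s)
    return out

studies = [
    {"id": 1, "tiabkw_tags": ["APP/PS1"], "fulltext_tags": ["APP/PS1", "MWM"]},
    {"id": 2, "tiabkw_tags": [], "fulltext_tags": ["APP/PS1"]},
]
print(len(filter_by_tag(studies, "APP/PS1")))                         # 1
print(len(filter_by_tag(studies, "APP/PS1", high_sensitivity=True)))  # 2
```

This mirrors the precision/sensitivity trade-off from the validation study: the default view favours precision, while the toggle trades precision for recall when completeness matters.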

Discussion
Data curation to support evidence-based research and discovery

Based on our analysis of the AD-SOLES dataset, the evidence from AD animal models is continually expanding, presenting a mounting challenge for research users to keep up to date with the emerging data. To address this, we have developed AD-SOLES, an openly available dashboard with an integrated workflow of automated tools, to harness the full potential of existing data to inform future research. To date, we have identified over 35,000 publications likely to contain experimental data from AD models. Through synthesising this vast evidence base, we can reduce the burden on basic scientists to continually stay up to date with the latest research developments relevant to their line of enquiry. Using AD-SOLES, we aim to make it easier to grasp the quantity and quality of existing evidence in a specific animal model, for a specific intervention, or for a given outcome measure.

We envisage benefits not only for laboratory researchers, but also for research funders, charities, and other stakeholders who need to stay up to date with the current research landscape. Those conducting literature reviews of specific areas (for example, a PhD student starting a research project, or a team of researchers aiming to conduct a meta-analysis) can use AD-SOLES as an accelerated starting point, enjoying the benefits of automation without the need to apply machine learning or other automation approaches themselves.
Going forward, we aim to expand AD-SOLES to encompass both in vitro and clinical data relating to Alzheimer's disease pathology and potential drug targets. It has been argued that promoting reproducibility of preclinical research alone is not sufficient, and that we need to triangulate the evidence from multiple approaches to answer one question (55). Through integration with drug discovery databases and the application of our existing natural language processing tools, we will look to map pathological concepts and drug targets across different levels of evidence to support target validation. Given the sheer volume of data available, we will leverage technological advancements to identify patterns and insights that would be challenging to identify manually.

Continuous monitoring of rigour and transparency
Through continuous monitoring of key aspects of rigour and reporting quality over time, AD-SOLES could provide insights into areas where improvement is most needed. From the data currently available, it is clear that measures to reduce the risk of bias are severely under-reported, in agreement with other recent reports in the preclinical AD literature (26). To facilitate efficient and cost-effective drug development, it is important that experiments are adequately powered and rigorously conducted to reach the correct conclusions on efficacy (56). It is important to note that a failure to report does not necessarily mean that an experiment was conducted without appropriate controls. However, by taking steps to improve conduct and reporting, we can more adequately assess the strength of the cause-and-effect relationship (internal validity) within an experiment.
Mismatches between the animal models used, the contexts in which they are used, and the human condition may also contribute to translational failures (57). In the dashboard currently, it is clear that male rodents continue to be employed preferentially over female rodents for modelling AD. Variations in husbandry conditions, background strains, genetics, age, comorbidities, and a host of other factors may impact upon results (58,59). In future, we anticipate integrating tools to extract other important methodological details pertaining to the rigour and transparency of AD experiments into the AD-SOLES pipeline. Efforts will concentrate on key criteria, such as those laid out in specialised guidelines developed to improve preclinical design and efficacy in AD (24).

Through integration with OpenAlex, we hope to expand the capabilities of AD-SOLES to monitor the impact of research funded from different sources. As others have suggested (3), the vast amounts of grant funding provided for hundreds of translational AD projects in experimental models could be retrospectively evaluated to inform future decision making and encourage a more open dialogue between research stakeholders on the uptake of open research practices.

Reciprocity with systematic reviews and curation efforts
The development of AD-SOLES has relied heavily on existing annotated datasets from preclinical systematic reviews of Alzheimer's animal models. If automated tools are to perform optimally, on a par with a human reviewer, tool developers need as much training data as possible. Going forward, we hope to initiate a reciprocal relationship with researchers who conduct systematic reviews using our curated datasets. Annotated data could be fed back into the database, to be used for future tool development and validation. Where possible, we will also seek to align, integrate, and collaborate with other initiatives to curate dementia research, such as AlzPed (26), a manually curated database of over 1,000 preclinical AD experiments.
Given the utility of systematic reviews to identify new research avenues and guide research progress, we hope to foster and encourage a greater uptake of these approaches in the preclinical AD literature.

Limitations
An important consideration in the AD-SOLES workflow is the omission of potentially relevant records at multiple points. With the current machine classifier performance, we expect to include ~95% of relevant research and exclude >90% of irrelevant research. This trade-off was deemed necessary to ensure that the metrics displayed on the dashboard were specific to the AD literature. However, this could pose an issue under some circumstances when relevant records have not entered the pipeline. In the near future, we plan to manually screen subsets of the publications which fall close to the boundary of inclusion to provide additional edge-case data to train the classifier. Currently, over 30% of relevant publications either do not have openly accessible full texts or have full texts which are not accessible under our institutional subscriptions. This prevents us from applying automated natural language processing tools to determine risk of bias reporting, open data reporting, and other study characteristics. There is still some way to go before all full texts will be fully accessible via automation technologies.
The tagging of studies by model, intervention, and outcome measure requires additional validation and improvement to reach optimal performance. Most studies are not fully tagged, which may reflect that those experiments have experimental features which are not on our list, that the regex approach is not sensitive enough to detect all instances, or that those studies are less relevant or do not use AD models. Many elements, including pathological outcome measures, non-rodent models, and novel therapeutics, are not sufficiently managed by our current approach. While employing regular expression dictionaries is suitable for cases with limited variations, leveraging natural language models (NLMs) such as PubMedBERT (62) holds promise. Recent work to identify chemical entities within AD research found that significant performance gains were made when NLMs were combined with a dictionary-based approach (63). We recently developed and validated an NLM-based preclinical PICO (population, intervention, comparator, outcome) tool for this purpose (64). We plan to validate both dictionary and natural language processing-based methods for the extraction of key study characteristics against gold standard human annotations. In the future, the widespread adoption of recognised ontology terms and identifiers such as research resource IDs (65), mouse genome database identifiers, or strain numbers would simplify this process and reduce reliance on increasingly sophisticated language models. Alternatively, requesting researchers to tag studies with a predefined list of characteristics during journal submission or publication could achieve a similar outcome.
Comparable to other software tools, the AD-SOLES pipeline and dashboard will require ongoing maintenance to remain usable, accessible, and up to date. We will engage with dementia research users and stakeholders to determine where further development is required, and will seek long-term funding to support sustainability.

Conclusion
AD-SOLES aims to provide a valuable resource for researchers, funders, and other stakeholders in the AD research community. Through AD-SOLES, we hope to facilitate evidence-based research, promote rigour and transparency, and foster collaborative evidence synthesis projects within AD research.

Figure 1 :
Figure 1: Sankey flow diagram of publications currently in the AD-SOLES database

There were 1,221 new articles in 2010, 1,741 in 2015, and 2,312 in 2020. Since the start of 2023, we have already retrieved 2,679 new included articles (79.3% of 2022's total) as of 8th June 2023.

Figure 3 :
Figure 3: Number of citations included in AD-SOLES per year

Figure 2 :
Figure 2: Performance metrics of the machine classifier for in vivo AD research over successive iterations

Figure 4 :
Figure 4: Sunburst plot of models in AD-SOLES. Segment size indicates the relative proportion of the literature in that category.

Figure 5 :
Figure 5: Sunburst plot of outcomes in AD-SOLES. Segment size indicates the relative proportion of the literature in that category.

Figure 6 :
Figure 6: Sunburst plot of interventions in AD-SOLES. Segment size indicates the relative proportion of the literature in that category.

Figure 7 :
Figure 7: Articles with APP/PSEN1 models stratified by sex of animals in AD-SOLES. Note: bar height reflects a downward trend in use of APP/PS1 models between 2021 and 2023, while the colour fill shows the number of publications using male and female animals.

Figure 8 :
Figure 8: Matrix gap map of drugs targeting the cholinergic system tested in APP/PSEN1 models in AD-SOLES

Study quality and transparency
At present, 31,245 out of 35,546 (87.9%) citations were findable (via DOI) in the CrossRef database. The proportion of open access publications has increased considerably since 2008 (Figure 10). Overall, 57.5% of publications in this dataset are open access. Stratifying open access (OA) status by type, we see that green OA (depositing the accepted manuscript in an open repository) and bronze OA (available via the publisher but not formally licensed for re-use) publication became less popular, while gold OA (immediate, unrestricted access) publishing gained traction (Error! Reference source not found.).
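The OA stratification described above amounts to counting OA categories within each publication year. A minimal sketch follows; the `oa_status` field name, its values, and the toy records are assumptions for illustration and do not reflect the actual AD-SOLES or CrossRef schema.

```python
from collections import Counter

# Toy records standing in for CrossRef-linked publication metadata.
records = [
    {"year": 2010, "oa_status": "closed"},
    {"year": 2015, "oa_status": "green"},
    {"year": 2020, "oa_status": "gold"},
    {"year": 2020, "oa_status": "closed"},
    {"year": 2020, "oa_status": "bronze"},
]

def oa_share_by_year(records):
    """Fraction of records per year with any OA status (not 'closed')."""
    totals, open_counts = Counter(), Counter()
    for r in records:
        totals[r["year"]] += 1
        if r["oa_status"] != "closed":
            open_counts[r["year"]] += 1
    return {year: open_counts[year] / totals[year] for year in totals}
```

Grouping the non-closed counts by individual OA type rather than pooling them yields the stratified view (green vs. bronze vs. gold) discussed in the text.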

Figure 9 :
Figure 9: Number of open access publications over time in AD-SOLES. Green bars indicate that a paper is open access; grey indicates closed access. Data source: CrossRef linkage, n = 31,245 articles.

Figure 10 :
Figure 10: Overall percentage of publications reporting key measures to reduce the risk of bias in AD-SOLES. Tool: preclinical risk-of-bias tool (50) applied to N = 21,980 full-text articles.

Table 2 :
Performance metrics for machine classifier

Table 3 :
Results from AD-SOLES study tagging validation. TP: true positive, FP: false positive, TN: true negative, FN: false negative. N/A cells indicate that the measure cannot be calculated. Rows highlighted in blue indicate optimal approaches.

Table 4 :
Optimal logic for AD-SOLES study tagging

Models, outcomes, and interventions
Of the 35,546 studies included, 20,670 (58.2%), 16,390 (46.1%), and 20,446 (57.6%) were tagged with at least one model, outcome, and intervention, respectively. [...] (61), the latest version of the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines. This could, in future, feed into the development of specialised "living" guidelines (60), research improvement targets, and initiatives to maximise the validity, transparency, and reproducibility of in vivo experiments, while minimising research waste. The dashboard currently shows an upward trajectory in open-access publication, with a greater proportion being fully open access than ever before. However, data sharing and code sharing are still not commonplace. In light of the recent Aβ*56 controversy, in which a prominent researcher was accused of fabricating data derived from AD mouse models (61), transparency in how an experiment was conducted and analysed is paramount. Over the entire dataset, it is concerning that less than 1% of publications report open data or open code availability statements. However, it is extremely encouraging to see the continual trend of increasing data sharing over the last 10 years (from 3% in 2013 to 18% of the dataset in 2023).