Improving laboratory animal genetic reporting: LAG-R guidelines

The biomedical research community addresses reproducibility challenges in animal studies through standardized nomenclature, improved experimental design, transparent reporting, data sharing, and centralized repositories. The ARRIVE guidelines outline documentation standards for laboratory animals in experiments, but genetic information is often incomplete. To remedy this, we propose the Laboratory Animal Genetic Reporting (LAG-R) framework. LAG-R aims to document animals’ genetic makeup in scientific publications, providing essential details for replication and appropriate model use. While verifying complete genetic compositions may be impractical, better reporting and validation efforts enhance reliability of research. LAG-R standardization will bolster reproducibility, peer review, and overall scientific rigor.

The ARRIVE guidelines also recommend core documentation on animal model characteristics, such as genetic modification status, genotype, and manipulated gene(s), as well as genetic methods and technologies used to generate and validate the animals 8. However, in addition to environmental factors (sanitary status, diet, etc.), the genotype-phenotype relationship can be markedly influenced by the genetic background of experimental animals 9,10, and the reproducibility of the genotype-phenotype relationship is significantly impacted by breeding paradigms, source colony, and genetic drift 11. Even genetic differences that are too often perceived as subtle, such as those between C57BL/6 substrains in mice, can have a significant impact 12. Therefore, a need remains for a more comprehensive description of the genetics of research animals to enable data interpretation and reproducibility. Such information is also needed to allow other scientists to acquire, maintain, and use experimental models for investigations building on published data. Here, we propose a framework of reporting guidelines, complementary to the ARRIVE guidelines, to support the documentation in scientific publications of the genetics of animals used in research. To be clear, this framework does not aim to impose standardization of animal genetics 13 but, rather, to improve the documentation associated with animals used for scientific research. It is intended to be applicable to all animals used for research.
Our proposed framework applies to the full range of animal species used in life-science research and, in the case of genetically engineered animals, to those modified by either classical or current methods of genome engineering. Here, we discuss how this reporting framework is designed to document the genetic background and validation (defined here as the act of verifying) of animal models and to link this information to infrastructures that support the community in sharing data and materials. We also examine the fundamentals of genetic validation for animal models and we present the role of supporting frameworks for reporting animal genetics. We call these recommendations the Laboratory Animal Genetic Reporting (LAG-R) guidelines. By standardizing information, the LAG-R guidelines will improve the sharing and replicability of models across research teams and will guide peer reviewers in their assessment of the essential genetic information for animal models provided in manuscripts.

Reporting genetic backgrounds and genetic alterations
The limitations of current standards for comprehensive reporting of the genetics of animals in research publications have been the subject of many discussions, both informal and structured, among relevant research societies and consortia in recent years. The lack of standardization in reporting currently results in a deterioration of information regarding research animal models, particularly as they transition between different laboratories. This has many negative consequences, particularly if the animal model was not fully described in the initial publication and subsequent breeding records are partial or absent. This can prevent re-analysis of data, as fundamental and basic variables on research materials are not documented. It can also lead to misinterpretation of published studies. In addition, imprecise definition of genetic background may lead to the use of experimental animals with different genetics and phenotypes in subsequent studies, which is a known contributing factor in true or perceived irreproducibility of research 14. This wastes significant research resources 15, including those used to reconstitute missing information on both genetic background and genetic alterations 2, as well as those used to re-establish genotyping assays 16. The worst possible consequence can be the waste of experimental animals. By contrast, appropriate documentation and reporting will contribute to reduction and refinement practices in keeping with the 3Rs 17 and will result in better management of animal use 6.
The development and implementation of documentation and reporting standards of laboratory animal genetics present an opportunity to improve research reproducibility. Here, we propose two sets of features to be documented (Table 1): the first is applicable to all laboratory animals; the second, with additional criteria, is to document the genetic alterations of engineered and other genetic models.
The genetic background of animals can be described by species 18,19, strain 9,20, sub-strain 12, breed/stock 21, and breeding history (to trace contamination as an impact of the breeding scheme and genetic drift) 22. Standards for their descriptions, such as the use of species-appropriate nomenclature conventions, are already defined. These are major intrinsic factors that have biological impact, and that must be fully documented to strengthen research reproducibility, complementing the ARRIVE guidelines. Table 1 summarizes the minimal information needed to correctly identify species and lineages. Criteria 1-5 report on information that is part of animal records in the laboratory. Criterion 6 reports the documentation of genetic assays that have recently become more accessible, both practically and financially, for many animal species. This last item validates the information provided in items 1-5. Examples of documentation are shown in Supplementary Table 1.
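As an illustration only, the six background criteria could be held in a small machine-readable record so that gaps are easy to spot before submission. The sketch below is not part of the LAG-R guidelines: the class name, field names, and the `missing_criteria` helper are hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class GeneticBackgroundRecord:
    """Hypothetical record for the background criteria summarized in Table 1.

    Field names are illustrative and are not defined by the guidelines.
    """
    official_name: Optional[str] = None            # species, strain, sub-strain (or breed)
    breeding_type: Optional[str] = None            # e.g. inbred, outbred, hybrid, congenic
    breeding_scheme: Optional[str] = None          # how stock and experimental animals are bred
    colony_strategy: Optional[str] = None          # how genetic quality of the colony is maintained
    origin: Optional[str] = None                   # origin of strain, supplier or repository
    background_verification: Optional[str] = None  # if, when, and how the background was verified


def missing_criteria(record: GeneticBackgroundRecord) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Placeholder example: only two of the six entries have been filled in.
record = GeneticBackgroundRecord(
    official_name="Mus musculus, C57BL/6J",
    breeding_type="inbred",
)
print(missing_criteria(record))
```

A list of empty fields such as this could be attached to a manuscript draft as a reminder of which items still need a value or an explicit statement that the information is not available.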
The genetic alterations carried by laboratory animals can affect phenotypes and require documentation. However, they are rarely fully documented, for both technical and historical reasons. Naturally occurring alleles or mutations obtained by chemical mutagenesis (e.g., ENU) require significant work to be defined 23-26. Alleles obtained by gene targeting in embryonic stem (ES) cells are typically described by a schematic, with the full sequence of the targeting event rarely provided (with the particular exception of alleles that are produced by some high-throughput programs). Genome editing requires careful validation (including by sequencing) of the resulting allele (for example, ref. 27), but documentation of the sequence of the entire region of interest is typically not provided. Furthermore, a consensus on the criteria for molecular validation of genetically engineered animals is yet to be defined. This is of particular importance because methods for both generation and validation of mutations are evolving rapidly, and new methods to validate larger regions of interest and identify both discrete and structural variations in genomes are emerging (for example, refs. 28-30). Of as much importance as the genetic background, the documentation of genetic alteration(s) and their validation represent essential areas for improvement in research reproducibility. The second part of Table 1 presents a set of criteria that should be used to describe genetic alterations in laboratory animals, which includes information about experimental design and confirmatory validation data. Examples of documentation for a genetically altered line and a checklist to support reviewers in assessing the documentation of genetics are shown in Supplementary Tables 1 and 2, respectively.
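Table 1 asks that deposited sequences be annotated with the genome assembly version and coordinates and shared in a universal format such as FASTA. The following is a minimal sketch of one way to do that. The `key=value` header layout, the function name, and the example allele name are assumptions made for illustration; FASTA itself does not prescribe a header convention, and the sequence shown is a placeholder rather than real data. Coordinates are assumed to be 1-based and inclusive.

```python
def fasta_record(name, assembly, chrom, start, end, sequence, width=60):
    """Format one FASTA record whose header carries assembly and coordinates.

    Assumes 1-based, inclusive coordinates; the header layout is a
    hypothetical convention, not a standard.
    """
    expected_length = end - start + 1
    if len(sequence) != expected_length:
        raise ValueError(
            f"sequence length {len(sequence)} does not match "
            f"coordinates ({expected_length} bases expected)"
        )
    header = f">{name} assembly={assembly} location={chrom}:{start}-{end}"
    seq = sequence.upper()
    # Wrap the sequence into fixed-width lines, as is customary for FASTA files.
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


# Placeholder sequence of 100 bases; not a real allele.
print(fasta_record("example_allele", "GRCm39", "chr6", 1001, 1100, "acgt" * 25))
```

The length check is a design choice: it catches the common mistake of pasting a sequence that no longer matches the coordinates quoted in the text.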

Fundamentals of genetic validation for animal models
Validation of genetics refers to the verification of the overall genome for all animal models and, in the case of genetically altered animals, to the characterization of a specific region of interest. In some instances, the initial step will be to ascertain the most accurate taxonomy of the model at hand. In other contexts, the aim will be to check the pedigree of the stock. Specifically, inbred and outbred models present different challenges, as inbreeding aims to maintain genetic stability, whereas outbreeding aims to maintain genetic diversity 31. Variations can be discrete alterations or structural changes that are likely to accumulate over time, and which should be identified and/or managed whenever possible.

Genomic stability and quality
The genomes of animals change over their lifetime 32 and with each generation 33,34: both the accumulation of natural mutations (with fixation of variations resulting in genetic drift) and the modification of allelic diversity as a result of crossing patterns (inbreeding for outbred lines or contamination of lines by other backgrounds) could affect the genome composition of a laboratory animal 31. For example, MMRRC UNC has performed a preliminary analysis of 230 lines, and they have estimated that approximately 40% of these lines do not match their name. The most common discrepancies are lack of congenicity, inbreeding, or the presence of additional genetic backgrounds (information from F.P.M.V., MMRRC). In addition, contaminating transgenes or unexpected altered alleles are also observed but at a low frequency. Good documentation and breeding strategies play an essential role in managing the quality of the genetic background of laboratory animals 35,36 (Table 1), but additional techniques for capturing genetic variation are becoming available and affordable for many animal models (Supplementary Table 3). In particular, the Mouse Universal Genotyping Array (MUGA) panels can be used to verify the presence of many commonly used constructs, as well as to corroborate the composition of the genetic background of mice 22. Similar panels are available for other species 37,38. Similarly, quantitative polymerase chain reaction (qPCR) or digital polymerase chain reaction (dPCR) assays can be designed to detect common constructs used in a field of research or in a particular species. With the development of new technologies to evaluate genetic quality, a full genome assessment (including an understanding of the frequency of both discrete and structural variations) will become more accessible. Different laboratory animal species and scientific questions call for a different depth of genetic validation. However, in all cases, the more complete the validation of the genetics of the animal models, the more reliable and reproducible the experiments will be. Advances in the understanding and documentation of genetic variation do not prevent the occurrence of genetic changes during animal breeding but they allow researchers to monitor such changes and to re-evaluate phenotypes with the knowledge of newly described causative or modifying variants 39. However, although genetic control is complementary to good practices in animal colony management, it has limited power when downstream crosses are made without rigor. Finally, many other factors affect the reproducibility of a given experiment, and a number of these are already covered in the ARRIVE guidelines 6.
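To make the idea of corroborating a genetic background with a marker panel concrete, the toy sketch below scores a sample against the alleles expected for an inbred reference strain. It is a simplification under stated assumptions, not the analysis used for MUGA or any other array: the marker names, genotype encoding (two-letter strings such as "AA"), and the `background_match` function are hypothetical, and real panels come with their own calling and analysis pipelines.

```python
def background_match(sample_calls, reference_alleles):
    """Fraction of typed markers at which the sample is homozygous for the
    allele expected in the reference strain.

    sample_calls maps marker name to a two-letter genotype, or None when the
    marker failed; reference_alleles maps marker name to one expected allele.
    """
    typed = [m for m in reference_alleles if sample_calls.get(m) is not None]
    if not typed:
        raise ValueError("no typed markers to compare")
    matches = sum(1 for m in typed if sample_calls[m] == reference_alleles[m] * 2)
    return matches / len(typed)


# Hypothetical panel of six markers; m5 is heterozygous and m6 failed to call.
reference = {"m1": "A", "m2": "C", "m3": "G", "m4": "T", "m5": "A", "m6": "G"}
sample = {"m1": "AA", "m2": "CC", "m3": "GG", "m4": "TT", "m5": "AG", "m6": None}

print(f"{background_match(sample, reference):.0%} of typed markers match the reference")
```

A value well below 100% for an animal reported as inbred would suggest residual heterozygosity or contribution from another background, and would be worth reporting alongside the assay used.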
For a small number of laboratory animal species, advanced frameworks designed to manage their genetics already exist, mainly as a result of the length of time and the context in which they have been studied. These frameworks include knowledge of the species' sequences and pedigrees, and support structures dedicated to maintaining genetic integrity. In those cases, reporting, traceability, and control of the genetic background of animals are even more essential. Where possible, crossing with wild-type reference animals is good practice to reduce de novo variations and construct contaminations within the genome. For example, current practice in mice is to backcross for two or three generations, depending on the level of inbreeding (https://www.jax.org/news-and-insights/jax-blog/2018/April/how-to-refresh-your-mutant-or-transgenic-mouse-strains 40). Likewise, to avoid inbreeding depression in zebrafish stocks, each new generation should be produced by an outcross, and crosses between siblings should be performed only when absolutely necessary 41-43. Although practices may vary across research communities and as fields evolve, the traceability of breeding patterns remains important in all circumstances.
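The effect of backcrossing to a reference strain can be approximated with simple arithmetic. The sketch below uses the textbook expectation that each backcross halves the share of the genome derived from the original colony. This is an assumption introduced here for illustration, not a calculation from the cited sources: it holds only on average, for loci unlinked to any allele being retained, and ignores selection. The function name is hypothetical.

```python
def expected_colony_fraction(backcrosses: int) -> float:
    """Expected fraction of the genome still derived from the original colony.

    Assumes unlinked loci and no selection: each backcross to the reference
    strain halves the expected colony-derived share.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 0.5 ** backcrosses


for n in range(4):
    print(f"N{n}: {expected_colony_fraction(n):.1%} colony-derived")
```

Under these assumptions, two or three backcross generations leave an expected 25% or 12.5% of colony-derived genome, respectively. The region flanking a retained allele stays with the original strain for longer than this average suggests, which is one reason the breeding history should be reported rather than inferred.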
Furthermore, all genome engineering techniques have the potential to introduce additional genetic changes while modifying the region of interest: both chromosome number and structure vary in embryonic stem (ES) cells when cultured 44; both gene targeting and genome editing have the potential to insert additional copies of the donor template away from the target 45; random insertion of DNA (transgenics) can introduce structural variation at the site of insertion 46; and nucleases used in genome engineering activities are not entirely specific and can cause off-target variation 47, though this is rare when specific design practices are used and must be evaluated in the context of known variation in the animals being studied 48. Techniques have also been developed to identify these unwanted events 49,50. These are discussed in the context of genome engineering validation in the next section.

Assessing the genetic alterations
Genetic engineering was once restricted to a limited number of laboratory animal species 33, but as a result of advances in genome-editing techniques, there is now almost no limit to the species that can be genetically engineered. Standards for validation are evolving in parallel with the technology.
Different modes of alteration have differing potential for unwanted outcomes. For example, whereas random insertions are the aim of additive transgenesis, other engineering technologies aim to target a specific locus. Therefore, no universal recommendation can be made for the validation of a genetic modification. When a specific locus is targeted, validation should aim to characterize the genetic change at the region of interest. In all cases, genetic changes resulting from the engineering method should be assessed throughout the genome. As for maintaining genomic stability and quality, multiple crosses to the reference genetic background will mitigate the potential genome-wide impact of genetic engineering and should be reported (Table 1). For targeted events, both the sequence of the region of interest and the local structural integrity should be examined, the latter for exclusion of deletion, duplication, and inversion events. Supplementary Table 4 lists the various molecular assays that can be employed to interrogate these two aspects. Ideally, genetic quality would be regularly assayed, but importing or onboarding animals is the most important point at which to check the quality of newly acquired or generated models. A combination of methods that elucidate both the sequence and structure of the target locus and the genetic makeup of samples (Mendelian, mosaic, or chimeric animals) is required. The methods used will depend on a number of factors, such as the laboratory setup, whether a donor sequence was used in the mutagenesis process, and the length of the modified segment. For example, point mutations are easily characterized using Sanger or next-generation short-read (NGS) sequencing, whereas verification of very large knock-ins is likely to require a long-read-based sequencing approach. The functional characterization of the products of mutated genes is also an important aspect of research quality but is outside the scope of these recommendations.

Further genome validation following genome engineering
All mutagenesis techniques have the potential to generate unpredictable and/or additional changes in the genome, outside of any region of interest, and these can be transmitted through generations. These may be discrete mutations 51, additions, insertions 52, or structural rearrangements (including chromothripsis 53,54), in addition to naturally occurring genetic variation, as discussed above. It is essential to be aware of the occurrence of these nonconforming events to ensure the correct interpretation of results and research quality. A number of technologies and simple assays can be employed to screen animals for the presence of off-target events (including random integrations, which can be detected by dPCR), and these are summarized in Supplementary Table 5. However, some molecular changes have no recognizable pattern, meaning that no specific genotyping assay can be designed, or may affect difficult-to-characterize features, such as large segments or repeated sequences. Changes of this type will, therefore, require more sophisticated investigations, such as next-generation sequencing (Supplementary Table 5). No single technology yet allows for the unbiased and complete acquisition of the sequence of a whole genome 47,55.

The role of supporting frameworks for reporting animal genetics
There are numerous supporting frameworks for standardization initiatives that facilitate the knowledge and management of the genetic quality of laboratory animals. These include nomenclature guidelines, as well as repositories of information (such as ontologies, research data, metadata, and annotations) and materials.
Advanced systems of nomenclature are continuously being developed to describe animals and genes in a standardized fashion. Taxonomy resources include the National Center for Biotechnology Information taxonomy database 56. In addition, the Vertebrate Gene Nomenclature Committee assigns standardized names to genes in vertebrate species that currently lack a nomenclature committee, ensuring that genes are named in line with their human orthologs 57,58. These resources are essential to develop a common and unambiguous vocabulary with which to name genetic models and characteristics. They support the continuous refinement of nomenclature systems in sync with the evolution of animal models and molecular tools so that nomenclature remains compatible with state-of-the-art research.
The use of most laboratory models is supported by dedicated databases that aggregate genetic and phenotypic information and that link to other resources, such as sequencing databases, scientific publications, and animal model repositories (see examples in Supplementary Table 6). Researchers have a role to play in registering new animal models to publicly accessible databases, thus helping to avoid the generation of lines that already exist. Commercial breeders also distribute information on the biology of the animals they produce. Additional information with a focus on animal welfare can be collected through the establishment of identity cards 59.
The integrity of genetic model materials is preserved through repositories that archive and distribute animals, gametes, and embryos, as well as plasmids and ES cells. These support structures are federated in international networks that collaborate to ensure the availability of quality-controlled materials to the research community worldwide. The collections available in these repositories can be interrogated at their individual web portals or through web pages that allow querying of the entire repository network to locate and source animal models (https://www.alliancegenome.org/ 4). Together with academic and commercial research animal breeders, these repositories play a crucial role in ensuring the genetic quality and stability of laboratory animals and the reproducibility of research that employs animal models.
Acquiring knowledge of appropriate standards of documentation, with the ability to understand and employ these, is an integral part of scientific training. This includes a knowledge of genetics. Beyond formal education, many web resources and training opportunities are available (e.g., https://oacu.oir.nih.gov/training-resources; https://www.aalas.org/iacuc/iacuc_resources/training-programs; https://resources.jax.org/). In this respect, learned societies, breeders, and repositories of laboratory animals are important sources of information and educational material.
Finally, the FAIRsharing portal aims to aggregate the resources that support standardization in the life sciences (https://fairsharing.org 7). Likewise, learned societies and dedicated consortia play an essential role in establishing these research-support frameworks and in facilitating the training of researchers to understand and manage the challenges of using laboratory animals for reproducible research.

Perspectives
Recognizing concerns about reproducibility, the LAG-R guidelines aim to standardize the information about animal models in scientific reports. This is becoming increasingly important as the diversity of laboratory animals expands, along with new methods and designs for the generation of genetic alterations and for in-depth characterization of genomes. However, we must not ignore that there are barriers to overcome. In particular, adoption of the guidelines requires a consensus within the community, greater expertise in genetics, and additional editorial work on the part of authors, reviewers, and editors.
It still does not seem realistic, or even possible, to fully validate the entire genetic composition of every animal used in research. On the other hand, improved reporting of all available information regarding the genetic makeup of laboratory animal models and on which validations have been carried out will allow us to better reinforce and evaluate the reliability of animal experiments. More in-depth animal model validation is increasingly feasible but requires specific expertise and the availability of dedicated funding, two aspects that will require significant investment.
Going forward, it is for the community to improve laboratory animal genetic reporting, and the LAG-R guidelines will help to facilitate this, but only with the commitment of scientists, funding bodies, journals, reviewers, and editors.

Table 1 | Minimal information needed for correct identification of species, lineages, and genetic alterations

Criteria to report on for all research animals a
Official name of species, strain, and sub-strain, as applicable, of the animal. Alternatively, for farm animals, indicate breed. https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi
Describe type of breeding for strain/stock/breed using species standard wording. For example, in rodents: outbred, inbred, hybrid, congenic b, documented mixed or non-documented genetic background. For example, in zebrafish: specify which wild-type background (AB, TU, TL); if mixed background, provide a clear explanation of the breeding scheme and estimation of the percentage of each background.
Specify breeding schemes used to maintain stock and generate experimental animals. Include the genotype of the parents when possible. This is particularly important to trace the origin of sex chromosomes in congenic strains b.
Specify breeding strategy to maintain genetic quality of the colony; indicate known family tree.
Name origin of strain(s). Name supplier or repository or other origin of animals used in the experiment.
Indicate if, when (at what breeding generation) and how the genetic background was verified (i.e., sequencing, SNP (single-nucleotide polymorphism) panel, STR panel, genetic testing chip panel).

Detail the shorthand used in article, and official nomenclature d. Use unique identifier (e.g., MGI ID) when applicable.
Detail whether allele is a frameshift, deletion, coding or non-coding variant, overexpression, conditional allele, humanization, reporter, structural variation. Detail new gene product if known.
If done, describe the precise method used for validation of chromosomal or allele structure, and the outcome g.
If done, describe the method and outcome of analyzing the material for additional integrations of donor templates e or reintegration of deleted segments g.
Evaluation of potential off-target activity (21): Genome editing off-target is defined as a genomic position and/or nucleic acid sequence distinct from the target. If done, describe the method, selection criteria and outcome of off-target analysis.
DNA sequence (11): Provide access to the sequence of the genetic modification: targeting vector, donor template or vector for transgenesis. If employed in the mutagenesis process, provide the sequence of donor (e.g., targeting vector, oligodeoxynucleotide, transgene or template sequence used for mutagenesis; DNA or prime editing guide) e. Annotate genomic sequences with corresponding genome assembly version and coordinates. Use universal format; i.e., .fasta or .gb. Annotate features.
Allele schematic (12): Consider presenting a map of the genetic modification.
Material availability/source of materials (13): Describe how to access available materials (plasmids, mutant cells, animals and/or germplasm). RRID and/or repository identifier.
Obvious phenotype and welfare concern (14): Specify salient phenotypes, such as issues with viability and/or fertility, or immunodeficiency. Describe the severity of the associated phenotype. If necessary, include any requirement to mitigate welfare concerns. Include publication, archive or database reference if available. For mice, consider SHIRPA (SmithKline Beecham/Harwell/Imperial College/Royal London Hospital/phenotype assessment) description 60.
Initial reference (15): Detail whether report is the initial description of mutant and/or mention initial publication of materials.
Genotyping assay (16): Describe assay and sequence of primers used for genotyping of established colony.
Enzyme and other reagents used for genome engineering (17): Describe enzymes (nuclease, recombinase) if used to generate mutation, including number and sequence of guide(s) for ribonucleoproteins if relevant. Detail reagents f.

a Essential criteria are indicated in bold; recommended criteria are indicated in italic. The information itself, or a reference to a source, should be detailed.
b Congenic strains are examples of the importance of correct breeding scheme, as their genetic composition varies according to the parental origin of the sex chromosomes and the mitochondrial genome. In addition, the identity of the region around a transmitted allele remains that of the original strain.
c Definitions and guidelines for nomenclature of mouse and rat strains are described at https://www.informatics.jax.org/mgihome/nomen/strains.shtml.
d Guidelines for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat are described at https://www.informatics.jax.org/mgihome/nomen/gene.shtml.
e Note that donor sequence can differ from mutagenesis outcome.
f Some recommendations for genome-editing formulations are reported in ref. 61.
g If not done, indicate that this assay was not performed.

Perspective https://doi.org/10.1038/s41467-024-49439-y Nature Communications | (2024) 15:5574