Cognitive deficits in human ApoE4 knock-in mice: A systematic review and meta-analysis

Apolipoprotein-E4 (ApoE4) is an important genetic risk factor for Alzheimer's disease. The development of targeted-replacement human ApoE knock-in mice facilitates research into mechanisms by which ApoE4 affects the brain. We performed meta-analyses and meta-regression analyses to examine differences in cognitive performance between ApoE4 and ApoE3 mice. We included 61 studies in which at least one of the following tests was assessed: Morris Water Maze (MWM), novel object location (NL), novel object recognition (NO) and Fear Conditioning (FC) test. ApoE4 vs. ApoE3 mice performed significantly worse on the MWM (several outcomes, 0.17 ≤ g ≤ 0.60), NO (exploration, g = 0.33; index, g = 0.44) and FC (contextual, g = 0.49). ApoE4 vs. ApoE3 differences were not systematically related to sex or age. We conclude that ApoE4 knock-in mice in a non-AD condition show some, but limited, cognitive deficits, regardless of sex and age. These effects suggest an intrinsic vulnerability in ApoE4 mice that may become more pronounced under additional brain load, as seen in neurodegenerative diseases.


Introduction
Alzheimer's disease (AD) is the most common dementia type and a growing global health concern with a significant economic burden [13,68]. The disease is progressive, severely affecting the independence and quality of life of patients and their families [68]. While a small proportion of AD cases is hereditary, the vast majority results from an interplay of genetic and environmental factors [8]. The heritable form of AD, known as familial AD, is primarily caused by autosomal dominant mutations in genes involved in amyloid-beta processing. It manifests at a relatively early age, before the age of 65. However, in over 95 % of all AD cases, the cause of the disease cannot be traced back to a single gene [13,14,74]. These are sporadic AD cases. From here onward, we use the abbreviation AD to refer specifically to sporadic AD cases.
Several non-genetic risk factors for the development of AD have been identified. The main risk factor is advanced age. Other non-genetic risk factors include lifestyle factors such as physical inactivity, poor nutrition, smoking and low educational level [35,64].
The primary genetic risk factor for AD is the Apolipoprotein-E (ApoE) ε4 allele [8,70]. ApoE is a lipid transporter involved in reverse cholesterol transport in the periphery and functions as the main cholesterol transporter of the brain [34]. The ApoE gene is polymorphic.
Compared to carriers of only the neutral ApoE ε3 (ApoE3), heterozygous ApoE ε4 (ApoE4) carriers have a two-to-threefold increased risk of AD and homozygous carriers have a twelvefold increased risk [22]. Moreover, the age of onset of AD is lower: 7-9 years earlier for heterozygous carriers and double that for homozygous carriers [21]. Additionally, ApoE4 carriers are at higher risk for hypertension, cardiovascular diseases and cerebrovascular diseases [41,45,63]. The AD risk associated with ApoE4 is more pronounced in women than in men [2]. Conversely, the ApoE ε2 (ApoE2) allele is protective against AD, but is linked to hereditary type III hyperlipoproteinemia [54]. For a comprehensive review of associations between ApoE and AD or other pathologies, see Belloy et al. [6].
Surprisingly, the mechanistic role of ApoE4 in AD remains poorly understood. The relation between ApoE and AD could be explained by direct involvement in amyloid pathology, but ApoE is also implicated in tau pathology, synaptic plasticity, lipid and glucose metabolism, (neuro)inflammation and vascular function [105]. A meta-analysis of data from cognitively healthy older adults indicated that ApoE4 carriers showed worse cognitive performance than non-carriers, with larger differences emerging with age [96]. In line with this, longitudinal data revealed accelerated decline in general cognition in female ApoE4 carriers and accelerated memory decline in male carriers [56]. This suggests that ApoE4-related risk might result from changes that are not directly involved in AD pathogenesis, but rather pose a risk or vulnerability for AD onset. In either case, a better understanding of the functioning of ApoE in its various isoforms can help to understand the mechanisms underlying AD.
Transgenic animal models are valuable to investigate various pathological aspects of AD, but the ApoE genotype has received relatively little attention. This is partly due to the focus on the protein aggregation hypotheses of AD, which have led to the development and investigation of animal models based on amyloid plaques or tau tangles. The development of targeted-replacement human ApoE knock-in mice has provided new opportunities to investigate the function of different ApoE alleles and their effects on cognition [86]. Mice are normally monomorphic for ApoE, and murine ApoE is thought to behave like human ApoE3, making it impossible to investigate the effects of various isoforms of ApoE in normal mouse strains. In targeted-replacement ApoE mice, the human isoforms of ApoE are knocked in to replace the murine ApoE without disrupting any known regulatory sequences.
Recently, mouse models have been developed that combine autosomal dominant mutations associated with familial AD with human ApoE isoforms [100]. These models allow researchers to explore interactions between ApoE and plaque- or tangle-associated neurodegeneration. However, this does not reduce the necessity to investigate the function of different ApoE isoforms in an otherwise unaffected brain. A better understanding of ApoE brain functioning outside the context of amyloid-beta and tau aggregation is essential to investigate the origins of AD. It remains unclear which processes cause AD or to what extent ApoE exerts interactive or independent effects explaining the risks of the ApoE4 allele [23,26,61,87].
Our primary aim was to summarise current knowledge about the cognitive performance of targeted-replacement human ApoE4 knock-in mice, in the absence of AD pathology, through a systematic review and meta-analysis. We analysed differences in cognitive performance between ApoE4 and ApoE3 mice across various functional domains. Our secondary aim was to determine if age and sex moderate differences in cognitive performance between ApoE4 and ApoE3 mice, which can help to enhance the efficacy of future studies using this transgenic mouse model.

Methods
We followed the PRISMA guidelines (except for the assessment of the methodological quality) for reporting systematic reviews [66]. This review was registered in PROSPERO under ID number CRD42019125030.

Search strategy
We conducted searches on two databases: PubMed and Web of Science. The final searches were completed on 14-12-2021. For both databases, the search terms were related to the population (mice or mouse), comparison (ApoE genotypes) and outcome (behavior, cognition or behavioral/cognitive test).
For PubMed, we used the search terms: Apolipoprotein For Web of Science, we used the search terms: Title: ("Apolipoprotein E" OR "Apolipoprotein E4" OR APOE*) AND Topic: (Cogni* OR "Object recognition" OR Learn* OR Conditioning OR Memory OR "Exploratory behavior" OR Anxi* OR Fear OR Impulsiv*) AND Topic: (Mice OR mouse).

Screening process and eligibility criteria
The screening of titles and abstracts was done independently by two authors (MBvdL and PR). In case of disagreement, the full text was screened and discussed with the other authors. The full-text screening was done by one reviewer (MBvdL); doubts were first discussed with PR and, if they persisted, with the other authors.
Titles and abstracts were screened using the following inclusion criteria: (1) any human ApoE4 knock-in mouse model was used and (2) one or more outcomes were related to cognition. In case of doubt (e.g., ApoE4 was not explicitly mentioned, but ApoE genotype/polymorphism was; mouse model was not explicitly mentioned, but rodents were), we screened the full text of the study. For our initial full-text screening, we used additional inclusion criteria: (3) ApoE4 mice were directly compared with the neutral ApoE3 mice and (4) a standardized behavioral test related to cognition was assessed. The exclusion criterion was (5) not a primary study (e.g., review). In our final screening, we selected studies that used a behavioral test that was used in at least 5 studies: the Morris Water Maze test (MWM), Novel Location recognition test (NL), Novel Object recognition test (NO), contextual Fear Conditioning test (FC) and cued FC. We used no exclusion criteria related to research design or sex and age of the mice.

Data extraction
Data extraction was done by three authors (PMA, PR, and MBvdL). Uncertainties were discussed with the other authors. The following data related to study characteristics were extracted: authors, year of publication, sex, age at test date, tests assessed (restricted to MWM, NO, NL, contextual FC and cued FC), number of mice per genotype (ApoE3, ApoE4) and whether the study design was longitudinal. Test-specific outcomes were extracted for ApoE3 and ApoE4 separately and, if applicable, by sex and/or age and/or test moment. In the case of therapeutic/detrimental interventions or in the context of another transgenic alteration, data from the control groups were extracted. For the MWM, protocol data were extracted (pool or maze diameter, target platform diameter, and temporal aspects of sessions and trials). As outcomes, swimming speed, acquisition distance and time to platform (first day and last day) and retention percentage distance and percentage time in target quadrant (24 hr and 72 hr after the last acquisition trial) were extracted. For the NO and the NL test conditions, percentage exploration time and recognition index were extracted. Finally, for the FC test, we extracted test characteristics, baseline motion index, percentage cued freezing time and percentage contextual freezing time. If only graphed data were presented for the outcome variables, we extracted data points (means with standard errors or standard deviations) with a validated web-based tool (GetData Graph Digitizer, [28]).

Risk of bias
Two authors (MJGvH and EAvdZ) independently assessed the risk of bias using SYRCLE's risk of bias tool for animal studies [38]. This tool consists of 10 items. Three items (1. sequence generation, 3. allocation concealment and 5. blinding investigators) were considered not applicable in the context of this review. These items refer to the experimental intervention part of the study, while only baseline data were extracted. For item 2 (baseline characteristics), we considered age and sex as potential confounders. For item 10 (other sources of bias), we considered three sources of bias: a) for the MWM, the water temperature (19-25 degrees Celsius) should be mentioned, as water that is too cold or too warm potentially induces a stress response and hampers learning; b) for the NL and/or NO, the type and dimensions of the objects should be mentioned, because whether mice are able to climb on an object affects the duration of exploration; and c) for the FC, the pre-shock values of freezing should be provided, as baseline differences in response to the apparatus influence post-shock freezing values. For the remaining items we used the signaling questions as described in [38].

Data synthesis for the meta-analyses and meta-regression
We used Review Manager Software (RevMan) 5.4.1 [19] to carry out a meta-analysis for each test and selected outcome separately. For the MWM test, we selected distance to platform (first day and last day) and time to platform (first day and last day) as acquisition outcomes, and the percentage of time in the target quadrant after 24 hr and after 72 hr as retention outcomes. For the NO test and the NL test, we selected percentage of exploration time and recognition index as outcomes. Finally, for the FC test we selected percentage of cued freezing time and percentage of contextual freezing time as outcomes. If a range was given for the sample size (e.g., n=10-12), we used the mean of the minimal and maximal n for further analysis. We used the inverse variance method with a random-effects model. Hedges' adjusted g effect sizes with 95 % confidence intervals (CI) were calculated for each continuous variable (benchmarks 0.20, 0.50 and 0.80 for small, medium and large effects, respectively). We calculated pooled effect sizes for the total group and subgroups by gender (male-only, female-only or mixed samples). In the case of unknown gender, we classified the study into mixed samples.
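The effect-size step described above can be sketched as follows. This is a minimal illustration, not the RevMan implementation: the function names (`sd_from_se`, `midpoint_n`, `hedges_g`) are hypothetical, and the standard error uses a common large-sample approximation that may differ slightly from the formula RevMan applies.

```python
import math


def sd_from_se(se, n):
    """Convert a reported standard error of the mean to a standard deviation."""
    return se * math.sqrt(n)


def midpoint_n(n_min, n_max):
    """Mean of the minimal and maximal n when only a range (e.g. 10-12) is reported."""
    return (n_min + n_max) / 2


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Hedges' adjusted g, its approximate SE and a 95 % CI.

    Assumption: SE is approximated as J * sqrt(N / (n_a * n_b) + d^2 / (2 * N)).
    """
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / pooled_sd      # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    g = j * d
    n_total = n_a + n_b
    se_g = j * math.sqrt(n_total / (n_a * n_b) + d**2 / (2 * n_total))
    ci = (g - 1.96 * se_g, g + 1.96 * se_g)
    return g, se_g, ci


# Hypothetical numbers for illustration only (not taken from any included study).
g, se_g, ci = hedges_g(10.0, 2.0, 10, 8.0, 2.0, 10)
print(f"g = {g:.3f}, SE = {se_g:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Which group is entered as `mean_a` determines the sign of g, so the direction convention has to be fixed per outcome before pooling.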
I² is interpreted as the main heterogeneity measure with the following benchmarks: 0-40 % not important, 30-60 % moderate, 50-90 % substantial and 75-100 % considerable heterogeneity [36]. To explore publication bias, we made funnel plots for each outcome measure using RevMan 5.4.1. Finally, we used IBM SPSS Statistics version 28.0.1.1 to execute a meta-regression to examine if sex and age relate to effect size. For this analysis, we selected the male-only and female-only subgroups and considered for each sample the mean age or, if studies only reported an age range, the median of the age range. We used a random-effects model with restricted maximum likelihood estimators, sex as factor, age as covariate and weights according to the reciprocal of the square of the standard error and expressed as percentages. We tested with an alpha of 5 %.
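The inverse-variance random-effects pooling and the I² statistic used in the meta-analyses can be illustrated with the sketch below. It assumes the DerSimonian-Laird estimator of the between-study variance, which the text does not name explicitly; the function name `pool_random_effects` and the input values are hypothetical.

```python
import math


def pool_random_effects(effects, std_errors):
    """Inverse-variance random-effects pooling (DerSimonian-Laird assumed).

    Returns the pooled effect, its standard error, tau^2, Cochran's Q and I^2 (%).
    """
    w = [1 / se**2 for se in std_errors]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum_w
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # heterogeneity in percent
    w_star = [1 / (se**2 + tau2) for se in std_errors]    # random-effects weights
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return {"pooled": pooled, "se": se_pooled, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical effect sizes and standard errors, for illustration only.
result = pool_random_effects([0.2, 0.4, 0.6], [0.1, 0.1, 0.1])
print(result)
```

Because the I² benchmark ranges overlap, the label attached to a given value is a judgment call rather than something to compute mechanically.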

Study selection
Fig. 1 shows the flowchart of the article selection process. The initial database searches resulted in 1542 hits, of which 1203 were unique. Of these, 342 articles were selected for full-text screening, with an agreement between the independent reviewers of 97.2 %. Finally, 62 articles were included in this review and data of 61 studies were synthesized in the meta-analyses.

Study characteristics
Table 1 shows the characteristics of the included studies. The 61 studies included about 1034 ApoE3 mice and 1125 ApoE4 mice. 22 studies included only male subjects, 11 studies only female subjects, 25 studies used mixed samples and 4 studies did not report on gender. The age of the mice varied between 6 weeks and 25 months. The MWM test was assessed in 35 studies, the NL test in 9 studies, the NO test in 28 studies and the FC test in 17 studies (15 studies used the FC contextual test and 13 studies the FC cued test).

Risk of bias
The risk of bias of the individual studies is reported in Appendix A. Fig. 2 reflects the overall results of the risk of bias analysis. For the applicable items in the context of this review, risk of bias is generally low for item 2 (baseline characteristics) and item 9 (selective outcome reporting). For item 4 (random housing) and item 6 (random outcome assessment), risk of bias is unclear for all studies because insufficient information was reported. For item 7 (blinding outcome assessor), item 8 (incomplete outcome data) and item 10 (other source of bias), the results are mixed, with low risk in about 29-36 % of the studies and generally unclear risk in the other studies because insufficient information was reported.

Test results in ApoE3 and ApoE4 mice
The test protocols of the MWM test are presented in Appendix B Table 1 and the individual study results for all reported outcomes in ApoE3 and ApoE4 mice are presented in Appendix B Table 2. Appendix B Tables 3-5 show the individual study results for all reported outcomes in ApoE3 and ApoE4 mice for the NL, NO and FC tests, respectively. The individual Hedges' g effect sizes for the selected outcomes are given in Table 1 and Appendix C1-C12 present the forest plots. Table 2 shows an overview of the main outcomes of these forest plots for the total sample ("all mice", including male, female and mixed samples) and for the male and female samples separately. Table 3 presents the results of the meta-regression analysis.
For the MWM test, we analyzed acquisition (time and distance) on the first and last day and retention after 24 hr and 72 hr. On the first day of acquisition, the ApoE4 mice needed more time to reach the platform than the ApoE3 mice (significant for all mice and females), and this difference was still present by the last day of acquisition (now also significant for males) (Table 2 and Appendices C1 and C2; Hedges' g effect sizes around medium, except for males first day (small); heterogeneity not important, except for females last day (moderate)). However, there were no significant differences in distance covered to reach the platform at either time point during acquisition (Table 2 and Appendices C3 and C4; Hedges' g effect sizes small or less; heterogeneity varying from not important to substantial). During retention, the total sample of ApoE4 vs. ApoE3 mice spent significantly less time in the target quadrant 24 hr after the learning trial (Table 2 and Appendix C5; Hedges' g effect size small; heterogeneity not important). At 72 hr after the learning trials, female ApoE4 vs. ApoE3 mice spent significantly less time in the target quadrant (Table 2 and Appendix C6; Hedges' g effect size small to medium; heterogeneity not important).
For the NL test, we analyzed the percentage of exploration time and the index scores. The percentage exploration scores were not significantly different for ApoE4 vs. ApoE3 mice (Table 2 and Appendix C7; Hedges' g effect sizes around small; heterogeneity not important). The male ApoE4 vs. ApoE3 mice scored worse (i.e., lower) on the index, with a trend towards significance (p = 0.07) (Table 2 and Appendix C8; Hedges' g effect sizes around small, except for males (medium); heterogeneity moderate, except for females (substantial)). For the exploration outcome, the effect size was higher for the male-only sample than for the female-only sample, with borderline significance (Table 3, p = 0.05).
For the NO test, we analyzed the percentage exploration time and the index scores. For both outcomes, the total ApoE4 sample performed significantly worse (i.e., lower percentage exploration time and lower index) than the total ApoE3 sample (Table 2 and Appendices C9 and C10; Hedges' g effect sizes small to medium; heterogeneity substantial), with a significantly worse performance on percentage exploration time for ApoE4 vs. ApoE3 males (Table 2 and Appendix C9; Hedges' g effect size medium; heterogeneity substantial). For the exploration outcome of the NO, there was a trend of a higher effect size for the male-only vs. the female-only sample (Table 3, p = 0.080).
The contextual FC test showed a worse performance (i.e., lower percentage time) for the total and the male-only ApoE4 samples compared to the corresponding ApoE3 samples (Table 2 and Appendix C11; Hedges' g effect sizes medium and large, respectively; heterogeneity substantial). There was a trend of a higher effect size for the male-only vs. the female-only sample (Table 3, p = 0.061).
For the cued FC test, there were no significant differences in performance between ApoE4 and ApoE3 mice (Table 2 and Appendix C12; Hedges' g effect sizes small or less; heterogeneity moderate, except for females (not important)).
Finally, the Hedges' g effect sizes were not significantly related to age in any of the tests (Table 3).

Publication bias
Fig. 3 presents the funnel plots for each test. Visual inspection may suggest (some) publication bias for MWM last day (distance), NL (% exploration), NO (index and % exploration) and FC (contextual and cued).

Discussion of results
Our primary aim was to investigate differences in cognitive performance between targeted-replacement human ApoE4 and ApoE3 knock-in mice. Additionally, we aimed to determine if age and sex moderate differences in cognitive performance between ApoE4 and ApoE3 mice. Below, we discuss observed differences in cognitive performance, their potential relationship to sex and age, and the implications for the field of ApoE research.

Differences in cognitive performance between ApoE4 and ApoE3 mice related to the hippocampus?
The MWM test revealed a notable effect of the ApoE4 allele. During MWM acquisition, a worse performance was found for ApoE4 vs. ApoE3 mice. ApoE4 mice, particularly females at the beginning of training and both sexes at the end of training, needed more time to find the platform.
However, such a difference was not present for the distance the mice swam. This suggests that the reported performance difference may not reflect a functional difference in spatial learning and hence hippocampal functioning per se. One explanation is that ApoE4 mice simply swim slower. In that case, this difference would not reflect a functional difference in spatial orientation and hippocampal functioning, but rather a difference in swimming speed. Thirteen studies reported on swimming speed, often measured during a free-swimming trial with the platform in sight before the actual training started. Of these thirteen studies, five studies [9,72,89,99] reported that ApoE4 mice were (somewhat) slower, whereas five studies [12,31,50,90,97] reported that they swam (somewhat) faster. Two studies [47,49] revealed identical swimming speed, and one study (Chunsun [39]) reported an age-dependent difference: young (3 months of age) ApoE4 mice swam slower than young ApoE3 mice, but old (17 months of age) ApoE4 mice were faster than ApoE3 mice. Four studies reported both time and distance, but not swimming speed. For these studies we calculated the swimming speed and found that ApoE4 mice were slower in all of them [71,75,101,102]. Taken together, there is no systematic difference in swimming speed (5 studies ApoE4 faster, 9 studies ApoE3 faster, 2 studies no difference and in one study it depended on age), so the longer time to reach the platform for ApoE4 mice cannot be fully explained by differences in swimming speed. An alternative explanation for the longer time is that ApoE4 mice more regularly slow down (or float) to orient themselves, compensating for reduced spatial mapping function of the hippocampus. Indeed, a recent study [4] showed that ApoE4 mice, especially females, showed considerably more winding behavior (tight 360° turns) in their swimming path, which most likely slows down the mouse. In this way, mice may also try to compensate for worse spatial orientation in finding the platform. To better understand the differences in swimming speed found between studies, future MWM studies should pay more attention to the swimming patterns of ApoE mice (and especially ApoE4 mice), as also suggested by Badea and coworkers [4].
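For the studies that reported time and distance but not swimming speed, the speed can be derived as path length divided by latency. The sketch below illustrates that arithmetic under stated assumptions: the function name `swim_speed`, the units (cm and s) and the example values are hypothetical and are not taken from any included study.

```python
def swim_speed(distance_cm, time_s):
    """Mean swimming speed (cm/s) from path length and time to reach the platform."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return distance_cm / time_s


# Hypothetical group means: similar path length, but a longer latency for ApoE4.
apoe3_speed = swim_speed(600, 30)
apoe4_speed = swim_speed(630, 35)
print(f"ApoE3: {apoe3_speed:.1f} cm/s, ApoE4: {apoe4_speed:.1f} cm/s")
```

A speed derived this way is an average over the whole trial, so it cannot distinguish a mouse that swims slowly throughout from one that swims fast but pauses or circles to orient itself.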
Overall, the findings in the MWM test suggest that ApoE4 mice may have a small deficit in hippocampal function, since the hippocampus is strongly involved in the spatial aspects of MWM learning ([25] and references therein).
The results of the contextual FC test align with the MWM findings. The contextual FC test revealed a worse performance, particularly for male ApoE4 vs. ApoE3 mice. This test requires the association of a foot shock with the environment where the foot shock was received. This process heavily relies on hippocampal functioning ([53] and references therein). However, no deficits were evident in the NL test, a test which depends on hippocampal function for spatial mapping ([17] and references therein). The lack of significant differences in NL between ApoE4 and ApoE3 mice challenges the suggestion of hippocampal deficits in ApoE4 mice. The discrepancy may be partly due to the lower number of studies available for the NL test (9 studies, no additional studies appeared after the search) and the resulting lower statistical power of the analyses. This explains why a medium effect size for males on the index score is non-significant. However, lower statistical power does not explain the absence of ApoE4 vs. ApoE3 differences in females. Therefore, it remains remarkable that the NL findings contrast with the MWM and contextual FC findings.
The presence of stressful environmental factors may confound the assessment of ApoE4-specific hippocampal dysfunction. Both the MWM test and FC test are more threatening and stressful to the mice than the NL test, and may go together with more anxiety-like behaviors and higher corticosterone levels negatively affecting hippocampal function. Differences in anxiety levels may affect cognitive outcomes. Mice with high levels of anxiety behave differently in a (new) testing environment or towards a novel experience such as encountering an unknown object. The question is now whether ApoE4 vs. ApoE3 mice demonstrate higher anxiety levels. The best validated test for anxiety is the elevated plus maze (EPM; [37]). In this test, the time spent in the open arms versus the time spent in the closed arms relates to anxiety level, with less time in the open arms indicating higher anxiety. Eight studies compared ApoE3 and ApoE4 mice in the EPM, using either 5 or 10 minutes of exploration in the maze. Most studies (4 out of 7) [16,50,51,104] did not find a difference between the genotypes, whereas three studies [85,89,93] reported higher anxiety levels in the ApoE4 mice. This discrepancy between the studies could not be explained by differences in age or sex. In addition, Salomon-Zimri et al. [76], who used a light/dark anxiety test, also found no difference in anxiety between the two ApoE genotypes. Similarly, the cued FC, also considered a stressful and anxiety-inducing test, showed no differences between ApoE3 and ApoE4 mice. Altogether, the differences in results between MWM and contextual FC on the one hand, and the NL on the other, are probably not related to differences in anxiety-like behaviors and remain largely unexplained, although a difference in statistical power may play a role. Of note, the role of ApoE in modulating anxiety levels [69,70], notably in interaction with other factors, is not disputed.
The performance in the NO test was worse in ApoE4 (male) mice as compared to ApoE3 mice. This test evaluates the ability to recognize familiar versus new objects, which inherently depends on the time the mice spend exploring the object. The amount of exploration time during the acquisition phase is significantly reduced in ApoE4 mice, which may hamper their ability to successfully recognize whether an object is familiar or new during the test phase. The role of the hippocampus in the NO test is under debate. Some studies observed reduced performance in the NO as a result of deficits in the hippocampus, but the majority did not (for an overview see [17] or [20]). So, it remains unclear whether the worse performance of ApoE4 vs. ApoE3 mice in the NO test is related to hippocampal deficits. An alternative explanation is that ApoE4-related deficits in the perirhinal cortex may contribute to the lower scores in the NO test. The perirhinal cortex, a brain region involved in visual perception and memory, plays a role in both the NO test [17] and MWM acquisition [25], but not in the NL test [17]. Thus, the mild spatial memory deficits in the ApoE4 mice might not be limited to the hippocampus.
Brain regions other than the hippocampus contribute to the behavioral performance in the examined learning tasks. Next to the hippocampus, the amygdala is involved in the contextual FC test [53]. However, it is unlikely that reduced amygdala functioning explains the worse performance of the ApoE4 vs. ApoE3 mice, given the lack of a difference in the cued FC test. For cued FC tests (in which a tone is often used, which needs to be associated with getting a foot shock), the amygdala is essential (LeDoux, 2000). The prefrontal cortex plays a role in both MWM acquisition [25] and contextual FC (Lopresta et al., 2016), but not in the NO and NL tests [17]. Consequently, it remains unclear whether ApoE4 mice suffer to some extent from prefrontal cortex deficits.
In summary, the test results suggest some hippocampal deficits in ApoE4 mice, but the perirhinal cortex may also be affected. Whether the prefrontal cortex is affected remains unclear, while amygdala functioning seems relatively intact in ApoE4 mice.

Differences in cognitive performance between ApoE4 and ApoE3 mice related to sex and age?
Our meta-regression analysis (Table 3) did not show significant sex differences in ApoE4 vs. ApoE3 effects, although a few trends were noted (for NL exploration, FC contextual and NO exploration), with generally stronger effects observed in males compared to females. Notably, although not confirmed by the meta-regression, we observed a significant ApoE4 vs. ApoE3 effect for the MWM time first day score in females, but not in males (Table 2). If we restrict the analysis to the studies reporting the results stratified by sex [49,71,95], a stronger ApoE4 vs. ApoE3 effect for females was evident in only one study and limited to 16-month-old mice [49]. Similarly, we observed a significant ApoE4 vs. ApoE3 effect for the MWM 72 hr retention score in females, but not in males (Table 2). This pattern was confirmed when we restricted the analysis to the studies reporting the results stratified by sex ([49,71,72]). However, this finding should be interpreted with caution given the limited number of studies in male mice. The meta-regression did not reveal any significant relationship between age and ApoE4 vs. ApoE3 differences in cognitive performance. Even when we looked at the individual studies reporting results for different age groups [9,39,49,50,71,83], we observed inconsistent patterns within and between the studies. Overall, we cannot confirm that differences in cognitive performance between ApoE4 and ApoE3 are related to sex and age. This finding remains robust despite the heterogeneity among the studies.

Miscellaneous results
Even slight differences in genetic background can influence behavioral phenotype, including memory performance [58]. This variability may contribute to the differences observed in behavioral testing across different laboratories. To address this, we performed some additional analyses and extracted the genetic backgrounds from the papers (data not reported). A slight majority (32 studies) reported C57BL/6J as genetic background, 17 studies C57BL/6NTac (from Taconic), 5 studies Taconic with subsequent cross-breeding other than C57BL/6J, 4 studies C57BL/6, and 4 studies other or unclear genetic backgrounds. In most tests with a heterogeneity ≥ 40 %, the C57BL/6NTac background was underrepresented (only 1 study) or absent. This was the case for the MWM acquisition distance on the last day (males); NL index (both sexes); NO % exploration (males); and NO index (both sexes). Only for the MWM acquisition time on the last day in females (heterogeneity 46 %) was a more balanced pattern of genetic backgrounds present. Here we observed a non-significant (p = 0.167), but potentially relevant difference in effect size between mice with a C57BL/6J background (4 studies, average Hedges' g effect size −0.25) and a C57BL/6NTac background (6 studies, average Hedges' g effect size −0.94). Thus, genetic backgrounds other than C57BL/6NTac may have contributed to heterogeneity in a limited number of tests. For only one outcome, the MWM acquisition time on the last day in females, we have indications that the genetic background may have systematically affected the outcomes, with stronger deficits in ApoE4 vs. ApoE3 for C57BL/6NTac mice.
In general, the results in the "mixed or unclear" subgroup as indicated in the forest plots are in line with the major conclusions based on the findings of males and females separately, as well as in the "Total" group. However, in some cases the results are more complex. This is especially true for the MWM retention 24 hr data and both contextual and cued FC tests. In all these cases, the "mixed or unclear" subgroup included fewer studies than the female-only and/or male-only subgroup. This likely contributes to the slightly deviating results of the "mixed or unclear" subgroup. Due to the poor reporting on the sexes, we decided not to focus on this subgroup in Table 2, but we did include them in the "all mice" group.
Taken together, our results indicate that hippocampal functioning may be slightly affected in ApoE4 mice. The functional impact in the behavioral tasks likely depends on the interaction between the hippocampus and other potentially affected brain regions. No clear indications were found that sex and/or age influenced the findings. This might be different, however, in the presence of AD pathology (e.g., amyloid-β, tau hyperphosphorylation and associated neuroinflammation, as seen in humans). Hence, in such a neuronal environment (i.e., an AD brain), ApoE4 could lead to even more pronounced behavioral and cognitive effects that manifest in a clearer sex- and/or age-related manner than observed in ApoE4 mice that are otherwise devoid of AD pathology. As the search for ApoE-targeted therapeutics is ongoing, the use of ApoE4 knock-in mice without AD pathology remains relevant as a starting point for testing therapeutics aimed at alleviating cognitive deficits associated with the ApoE4 allele.
ApoE4 likely interacts in a complex way with other genetic and environmental factors. For example, McLean et al. [59] demonstrated that individual levels of locomotor activity interact with the cognitive impact of ApoE4. Therefore, while the ApoE4 knock-in mouse seems to be a useful model for testing ApoE-related therapeutics, it is crucial to consider interacting factors that may complicate the impact of ApoE4.
Finally, in humans, young ApoE4 carriers have been suggested to have an advantage over young non-carriers [60]. There is some evidence of antagonistic pleiotropy, in which ApoE4 might offer benefits during development and early adulthood at the cost of accelerated cognitive decline at an older age. This phenomenon has been observed in both humans [40] and mice. For the latter, Zhang et al. [103] reported that young ApoE4 mice performed slightly better in the NL test than ApoE3 mice, a trend that reversed with age. To substantiate this, further behavioral studies with hippocampus-dependent tasks should be conducted in young ApoE4 mice.

Limitations
Our current review has several limitations. Factors in the included studies that may limit the evidence include risk of bias, indirectness, imprecision, heterogeneity and publication bias. The risk of bias remains largely unclear. Only a few studies showed a high risk of bias in two items (selective outcome reporting and other bias). For the remaining studies and items, the risk of bias is low or unclear due to insufficient reporting. To address this, we recommend using the SYRCLE's risk of bias tool [38] as a reporting guideline for animal studies to facilitate a more accurate risk of bias assessment. Indirectness is likely minimal, as we solely included ApoE4 vs. ApoE3 mice and focused on commonly used outcome measures. However, heterogeneity due to the genetic background of the ApoE4-transgenic mice, as discussed before, may to some extent hinder the consistent detection of more subtle differences in the behavioral performance of the mice. Imprecision may be a concern, particularly in the relationship between sex and effect size in the NL, NO and FC tests, as fewer than 5 studies included females. The generally small sample sizes and wide confidence intervals in these studies add to the imprecision. Heterogeneity for the MWM is generally not important and non-significant, except for the distance on the last day for the total and male samples. For this outcome, the funnel plot suggests some publication bias. However, the asymmetry is mainly due to a single study with an unexpectedly better performance in ApoE4 vs. ApoE3 male mice and a relatively large standard error [12]. Excluding this study increased the magnitude of the pooled effect size to −0.21 (−0.48, 0.06), which is still non-significant (p = 0.13). For the NL (index), NO and FC tests, we observed moderate to substantial and significant heterogeneity. Funnel plots for these tests suggest that the asymmetry is caused by a few studies reporting lower performance in ApoE4 vs. ApoE3 with relatively large standard errors. This may indicate an overestimation of the reduced performance in the ApoE4 mice. Overall, the evidence for a reduced performance in ApoE4 vs. ApoE3 is stronger for the MWM test than for the other tests.
ApoE2 has been shown to be protective against AD neuropathology [46] and may therefore contrast more strongly with ApoE4 than ApoE3 does. However, we did not include ApoE2 in our analyses for two reasons. First, data on ApoE2 are limited, especially for tests other than the MWM test. Using our search criteria, we identified only three studies on FC (two studies examined only contextual FC and one study both contextual and cued FC) and two studies on object recognition (NO only). These numbers are too low for analysis. Second, in humans, ApoE3 is by far the most prevalent ApoE allele [1]. Nevertheless, including ApoE2 could be of interest in future studies. A final remark relates to the genetic background of the ApoE4 mice. As discussed, it may have a limited impact on the variability in outcomes, but this impact appears to be largely non-systematic.
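The sensitivity check mentioned for the MWM distance outcome, in which the pooled effect is recomputed after excluding one influential study, can be sketched as a leave-one-out analysis. The example below assumes a DerSimonian-Laird random-effects model and a normal-approximation 95 % confidence interval; the software and model settings of the review are not stated in this section, so these are our assumptions, and all numbers are hypothetical.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird pooled effect, 95 % CI, and I^2 in percent."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sw
    # Cochran's Q measures the dispersion of study effects around the fixed-effect mean.
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2


def leave_one_out(effects, variances):
    """Re-pool the data with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        rest_g = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append((i, *random_effects(rest_g, rest_v)))
    return results


# Hypothetical data: the last study favors ApoE4 and has a large variance.
g = [-0.30, -0.20, -0.40, 0.90]
v = [0.05, 0.05, 0.05, 0.40]

print("all studies:", random_effects(g, v))
for idx, pooled, ci, i2 in leave_one_out(g, v):
    print(f"without study {idx}: g = {pooled:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), I2 = {i2:.0f} %")
```

In this toy example, dropping the outlying study moves the pooled estimate further from zero and lowers I², which mirrors the pattern described for the MWM distance outcome without reproducing its values.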

Conclusion
In conclusion, this review demonstrates that ApoE4 knock-in mice in an otherwise healthy brain condition (i.e., without AD or any other disease-causing perturbation) show some cognitive deficits, irrespective of sex or age, but depending on experimental test conditions. However, compared to the more severe impairments observed in mouse models of neurodegenerative diseases, the cognitive deficits are relatively mild. To what extent these deficits can be attributed to hippocampal dysfunction is currently unclear. Nevertheless, our findings indicate that the presence of the ApoE4 allele can drive cognitive impairments independent of other disease-related perturbations.
E [MeSH terms] OR Apolipoprotein E* [tiab] OR APOE* [tiab]) AND (Cognition [MeSH terms] OR Learning [MeSH terms] OR Conditioning [MeSH terms] OR Memory [MeSH terms] OR Exploratory Behavior [MeSH terms]

Table 1
Study characteristics and Hedges's g effect sizes for selected outcomes included in the meta-analyses.

Abbreviations: M = male; F = female; MWM = Morris Water Maze test; NO = novel object recognition test; NL = novel location recognition test; FC = fear conditioning test.
a Restricted to the tests included in this review/meta-analysis.
b Test performed, but data not displayed or no distinction made between ApoE3 and ApoE4.

Fig. 2. Risk of bias assessed with the SYRCLE's risk of bias tool for animal studies [38].

M.J.G. van Heuvelen et al.

Table 2
Overview of the behavioral performance of ApoE4 mice compared to ApoE3 mice. Total and subtotal (by sex) Hedges's g effect sizes with 95 % confidence intervals, p-values, and I² values for heterogeneity are given.
Abbreviations: MWM = Morris Water Maze test; FC = Fear Conditioning test; ↓ = worse performance of ApoE4 mice compared to ApoE3 mice. Test for overall effect as derived from the forest plots: *p < 0.05; **p < 0.01; ***p < 0.001; ns = non-significant.
a The category "all mice" includes male, female and mixed samples.
b Number of included studies: MWM n = 35, NL n = 9, NO n = 28, FC n = 17.

Table 3
Results of meta-regression analyses including male-only and female-only samples with sex and age as potential predictors of effect size of ApoE4 vs. ApoE3.
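
For readers unfamiliar with meta-regression, the following minimal sketch fits effect size as a linear function of sex and age by inverse-variance weighted least squares. It is a fixed-effect simplification; the model specification used for Table 3 is not given in this section. The coding of sex (0 = male, 1 = female), age in months, and all data values are hypothetical assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def meta_regression(effects, variances, sex, age):
    """Weighted least squares for g = b0 + b1*sex + b2*age with weights 1/variance."""
    rows = [[1.0, float(s), float(a)] for s, a in zip(sex, age)]
    w = [1.0 / v for v in variances]
    p = 3
    # Normal equations: (X' W X) b = X' W y
    xtwx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * r[i] * g for wi, r, g in zip(w, rows, effects)) for i in range(p)]
    return solve(xtwx, xtwy)


# Hypothetical single-sex samples: Hedges's g, variance, sex code, age in months.
g = [-0.45, -0.20, -0.60, -0.35, -0.50, -0.10]
v = [0.10, 0.20, 0.15, 0.10, 0.30, 0.20]
sex = [0, 1, 0, 1, 0, 1]
age = [3, 3, 6, 12, 12, 18]

b0, b_sex, b_age = meta_regression(g, v, sex, age)
print(f"intercept = {b0:.3f}, sex = {b_sex:.3f}, age = {b_age:.4f}")
```

A slope near zero for sex or age would correspond to the absence of a systematic relationship with effect size; a full analysis would also report standard errors and allow for between-study variance.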