Both male and female APPswe/PSEN1dE9 mice are impaired in spatial memory and cognitive flexibility at 9 months of age

Alzheimer's disease (AD) is the most common cause of dementia. Despite many years of research, very limited treatment options are available. Here we aim to establish a well-defined learning and memory performance test for an AD mouse model, which can be used in future studies to evaluate the effect of novel drugs, treatments, and interventions. We exposed 9-month-old APPswe/PSEN1dE9 mice to a battery of memory tests to determine which test is best suited to study memory deficits in this specific AD mouse model. Since in more recent years it has become clear that there are sex-dependent differences in AD pathology, we also assessed differences in performance between male and female mice. From our test battery, we conclude that the Barnes maze task, which spans multiple days, is better suited to study subtle learning and memory deficits in 9-month-old APPswe/PSEN1dE9 mice than the 2-trial T-maze and fear conditioning tasks. This test revealed deficits in both spatial memory and cognitive flexibility in the APPswe/PSEN1dE9 mice compared to wildtype littermates. Furthermore, we conclude that there are no sex-dependent differences in memory deficits in this AD mouse model at this age.


Introduction
Dementia is among the top 10 causes of death worldwide (Department of Data and Analytics et al., 2020). Alzheimer's disease (AD) is the number 1 cause of dementia, accounting for around 60%-70% of all cases. Worldwide each year almost 10 million people develop AD, resulting in various socio-economic problems (Prince et al., 2016). The main neuropathological hallmarks of AD are amyloid-β (Aβ) plaques, neurofibrillary tangles, synapse loss, and neuroinflammation (Kent et al., 2020). AD is also characterized by cognitive deficits, of which memory loss is the most important clinical symptom. It is widely accepted that there is an interaction between the neuropathological processes and memory deficits. Despite many years of research, treatment possibilities for this disease are still limited and no therapy exists that effectively inhibits the cognitive decline (Vaz & Silvestre, 2020).
Overall, women develop AD more often than men (Viña & Lloret, 2010). When assessing memory performance of men and women with AD, women score significantly lower than men (Ryan et al., 2018). In addition, women have higher levels of tau tangles in the brain and show a trend towards higher Aβ-plaque load (Oveisgharan et al., 2018). The exact cause of these sex differences in AD is still unknown. It cannot be attributed solely to the fact that women live longer, since at all ages the percentage of women suffering from AD is higher compared to men (Viña & Lloret, 2010). As treatments may affect males and females differently, it is important to include both sexes in preclinical studies.
To properly test novel drugs and treatments it is important to have a well-established disease model with pathology and associated cognitive decline. The APPswe/PSEN1dE9 mouse model (hereafter referred to as APP/PS1 mice) is one of the most studied animal models for AD (The Jackson Laboratory, MMRRC Stock No: 34832-JAX) (Smit et al., 2021). These mice develop Aβ-plaques (Kamphuis et al., 2012), have neuroinflammation and reactive astrocytes throughout their brain (Orre et al., 2013), and show synaptic deficiencies and a reduction in the number of synapses (Viana da Silva et al., 2016). They also have sex-related differences in disease pathology, with females showing higher concentrations of soluble Aβ, increased plaque burden, especially in the hippocampus, and more tau pathology and inflammation (Jiao et al., 2016; Ordóñez-Gutiérrez et al., 2015).
Different types of memory can be assessed in mice. Although many studies have assessed memory deficits in the APP/PS1 mouse at various stages of AD-like/amyloid pathology only a few studies report the assessment of several different types of memory in 1 study, such as spatial memory, cognitive flexibility, and cued fear memory ( Bonardi et al., 2011 ;Cheng et al., 2013 ;O'Leary et al., 2018 ). None of those studies, however, assessed performance in both males and females.
In our study, we assessed different types of memory in tasks performed by both male and female APP/PS1 mice. We chose to test mice at the age of 9 months, when both sexes show a clear disease pathology and when memory deficits can be observed ( Janus et al., 2015 ;Végh et al., 2014 ). Since this age is still considered to relate to relatively early disease pathology, mainly affecting the hippocampus and the cortex, and the number of plaques, synapse loss and memory deficits will continue to increase with age, it is suitable for studying disease-modifying therapies.
All mice in our study were subjected to the same battery of memory tasks, which included (1) the forced alternation T-maze to assess working memory and short-term spatial memory (Shoji et al., 2012), (2) the Barnes maze for long-term spatial memory and cognitive flexibility (Gawel et al., 2019), and (3) fear conditioning, in which contextual and cued fear conditioning memory were tested (Selden et al., 1991). We observed that both spatial memory and cognitive flexibility were affected and that the Barnes maze was the most sensitive task to assess memory deficits in the APP/PS1 mouse at around 9 months of age. We conclude that both male and female APP/PS1 mice show very similar deficits in learning and memory performance at this age, and do not show clear sex-dependent differences in memory deficits.

Mice
We used male and female APPswe/PSEN1dE9 mice on a C57BL6/J background (Jackson Laboratory;Stock No: 34832) and their wild-type (WT) littermates. These mice were bred in our facility for several generations by crossing male APP/PS1 mice with female C57BL6/J mice. All mice were genotyped for the presence of the APP and the PS1 transgene using PCR and gel electrophoresis as described before ( Jankowsky et al., 2001 ;Kamphuis et al., 2012 ).
Where possible, mice were housed in groups, with APP/PS1 and WT littermates housed together. Males and females were separated at weaning and were housed in different rooms. Mice had ad libitum access to water and food. Two weeks before the start of the behavior assessments the mice were housed on a reversed day-night cycle, with the light phase between 19:00 and 07:00. At the start of the experiments, the mice were between 8 and 9 months old. Mice were euthanized and transcardially perfused the day after the final behavior test, when they were between 9 and 10 months old. Throughout all experiments and in all housing rooms, radio background noise was played. In total 49 mice were included in these experiments: 12 WT males, 12 APP/PS1 males, 12 WT females, and 13 APP/PS1 females. In addition, there was 1 mouse with a blind eye, which showed aberrant behavior and was therefore excluded from all analyses. Experiments were reviewed and approved by the Animal Ethics Committee of the Central Authority for Scientific Experiments on Animals of the Netherlands (CCD, approval protocol AVD1150020174314), and complied with the Directive of the European Parliament and of the Council of the European Union of September 22, 2010 (2010/63/EU).

General set-up
One week before the behavioral test the mice were handled twice a day. All mice went through the following behavioral test battery (Fig. 1): Open Field, Forced T-maze, Novel Object Recognition (NOR), Barnes maze, and Fear conditioning. The NOR experiments were part of the test battery, but were later removed from analysis due to technical issues. Results are described in supplementary figure 1. All experiments were performed in the same room, with the exception of the fear conditioning test. For habituation purposes, mice were housed in the room adjacent to the testing room, starting 1 day before each experiment, and they continued to stay there throughout all experiments. Males and females were kept in separate rooms and tested on separate days. At the start of each testing day, except for fear conditioning, mice were moved to individual cages for the duration of the testing day, after which they were placed back in their home cages. Throughout the entire experiment and analyses, researchers were blinded to the genotype of the mice. All experiments were recorded with a camera capable of infrared recordings and all videos, except for the fear conditioning, were recorded with MediaRecorder version 1.0.138 software from Noldus. Both at the start of the testing day and in between trials, the experimental apparatus and used objects were cleaned with Anistel (Animal Health High level Surface disinfectant, nonfragranced), diluted 1:200.

Open field
Open field was performed in white light conditions. The mice were placed in an open field arena consisting of a dark grey polyvinylchloride (PVC) cylinder with a diameter of 77 cm and a height of 32.5 cm, on a floor made of similar dark grey PVC. On top of the arena, a bright light (150 ± 25 lux) served as an aversive stimulus. The mouse was placed in the center of the arena and movement was recorded for 10 minutes. Videos were analyzed using EthovisionXT 11.5. An automated setup was used to detect the mouse. Only when this did not result in accurate tracking the static subtraction method was used for that specific recording. Paths were reviewed and if needed manually altered where mouse tracking was incorrect. The center area was defined as a circle with exactly half the total surface area, with the same center point as the open field area. The periphery was the area outside the center area.
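The center-zone definition above can be made concrete with a short calculation. The following is a minimal sketch, assuming only the 77 cm arena diameter stated in the text; the function name is hypothetical and this is not the authors' EthoVision configuration. A concentric circle holding half the arena's area has a radius equal to the arena radius divided by the square root of 2.

```python
import math

ARENA_DIAMETER_CM = 77.0  # arena diameter as stated in the text


def center_zone_radius(arena_diameter_cm: float, area_fraction: float = 0.5) -> float:
    """Radius of a concentric circle covering `area_fraction` of the arena area.

    Area scales with radius squared, so the radius scales with the
    square root of the requested area fraction.
    """
    arena_radius = arena_diameter_cm / 2.0
    return arena_radius * math.sqrt(area_fraction)


radius_cm = center_zone_radius(ARENA_DIAMETER_CM)
print(f"Center zone radius: {radius_cm:.2f} cm")  # prints 27.22 cm
```

Under this definition the periphery is a ring roughly 11 cm wide, even though it covers the same surface area as the center zone.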

Forced T-maze
The T-maze device was a symmetrical dark grey PVC T-shaped maze and it was placed 1 m above the ground. The arms were 29 cm long, 6.5 cm wide, and 15 cm high, and were connected at the center at a 90° angle to a 6.5 by 6.5 cm square. Both arms of the T could be blocked with a barrier. The T-maze tests were performed in the dark, with only red light to allow for recording. For the first part of this test, 1 of the 2 arms was blocked (alternated). The mouse was placed at the base of the leg of the T-maze. The mouse was able to freely explore the leg and the open arm of the T-maze for 15 minutes, after which the mouse was returned to its individual home cage. After 1 hour, the blockage was removed from the T-maze and the mouse was allowed to freely explore the leg and both the old and novel arm of the T-maze for 5 minutes. Videos were analyzed using EthovisionXT 11.5. An automated setup was used to detect the mouse. Only when this did not result in reliable tracking the static subtraction method was used for that specific recording. Paths were reviewed and if needed manually altered where mouse tracking was incorrect. In Ethovision, areas were assigned to the leg, the novel, and the old arm. When the center point of the mouse entered one of these areas this was counted as an arm entry. Mice with over 80% preference for 1 arm over the other were excluded, as this is a sign of nonactive participation in the task.
Figure legend (timeline of the test battery): Fourteen days prior to testing, mice were put on a reversed day-night light cycle. At that time, the mice were between 8 and 9 months old. Seven days prior to testing mouse handling started. Memory tests were conducted in this order: Open field, T-maze, Novel Object Recognition, Barnes maze, and lastly Fear Conditioning. The NOR experiments were part of the test battery, but were later removed from analysis due to technical issues. At the end of the experiments, mice were between 9 and 10 months old.

Novel Object Recognition (NOR)
Before the main test, several objects were tried before being included in this task. Two prerequisites must be met: the mice should not be able to climb the object, and there should not be an initial preference bias of 1 object over the other. Three objects were selected for use: a plastic bottle, an angular tall glass, and a round glass with green dots. All objects are around 15 cm in height and 7 cm in diameter.
NOR testing was performed in a transparent plastic rodent cage (26 cm long by 42 cm wide and 18 cm high). The tests were performed in red-light conditions and consisted of 4 phases. The first phase was the habituation phase, during which the mouse was placed in the empty cage and was allowed to explore for 10 minutes, before returning the mouse to its individual home cage. A 15-minute familiarization phase took place approximately 1 hour after habituation, during which the mouse was placed in the test environment with 2 identical objects. An hour after familiarization the mouse was again placed in the test cage for 10 minutes, now with 1 object identical to the objects in the familiarization phase and 1 novel object (1-hour test). Overnight, mice were placed back in their group-housed home cages. Twenty-four hours after the familiarization phase a similar test was performed as during the 1-hour test, again with 1 object identical to the objects in the familiarization phase, and 1 different novel object (24-hour test). Object location (right or left) and which object was used as the familiar or novel object was pseudo-randomly altered between trials and between mice.
Interaction with the objects was scored manually. Object interaction was defined as when the mouse was oriented towards the object and its snout was within a head's length distance from the object, or when the mouse was actively touching the object. If the mouse was immobile for more than 10 seconds, the exploration interaction was paused until the mouse moved again. This was because immobility was considered to be "not active object interaction" and therefore was excluded. Throughout all phases, if total object interaction was less than 20 seconds this was considered to be an exclusion criterion as it is a sign of not active participation in the task and could result in weak memory formation. Trials in which mice showed more than 80% preference for one of the objects were excluded as well, as this also is a sign of nonactive participation in the task and could lead to failure of memory formation. Novelty preference was calculated by dividing the exploration time of the novel object by the total exploration time on that trial.
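The novelty preference calculation and the two exclusion criteria described above can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure; the function name and the example durations are hypothetical, and applying both criteria per trial is an assumption.

```python
from typing import Optional

MIN_TOTAL_INTERACTION_S = 20.0  # trials with less total interaction are excluded
MAX_OBJECT_PREFERENCE = 0.80    # trials with a stronger preference are excluded


def novelty_preference(novel_s: float, familiar_s: float) -> Optional[float]:
    """Return novel-object time as a fraction of total exploration time.

    Returns None when the trial meets one of the exclusion criteria.
    """
    total_s = novel_s + familiar_s
    if total_s < MIN_TOTAL_INTERACTION_S:
        return None  # too little interaction: not actively participating
    preference = novel_s / total_s
    # Exclude a strong preference for either object, novel or familiar.
    if max(preference, 1.0 - preference) > MAX_OBJECT_PREFERENCE:
        return None
    return preference


# Hypothetical durations in seconds, for illustration only.
print(novelty_preference(18.0, 12.0))  # 0.6
print(novelty_preference(5.0, 10.0))   # None (total below 20 s)
print(novelty_preference(27.0, 3.0))   # None (over 80% preference)
```

A value of 0.5 corresponds to chance level, since both objects were then explored equally.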

Barnes maze
The Barnes maze was a circular white metallic table with a diameter of 1 m elevated 1 m above the ground (Noldus Information Technology). In the periphery, there were 20 holes with a diameter of 5 cm at a regular interval. A rectangular black plastic box of 16.2 cm long, 6.6 cm wide, and 8.5 cm high was attached underneath the maze using magnets and was used as the escape chamber. The escape chamber contained a small staircase for the mice to climb down. Four distal cues were placed 30 cm from the edge of the maze at regular intervals. Cues were roughly 50 cm² and were made of cardboard and colored paper, resulting in different shapes and colors: a blue square, a yellow star, a red oval, and an orange triangle. All mice showed normal searching behavior and clearly looked around for visual cues to orient themselves. As an aversive stimulus, the light conditions were set at 175 ± 25 lux; this was used throughout all trials, except for the habituation phase, which was performed under red-light conditions. The complete Barnes maze test consisted of 11 consecutive days: day 0 habituation, day 1-5 acquisition trials, day 6 probe trial, day 7-9 reversal trials, and day 10 reversal probe trial. For the habituation, the mouse was placed in the center of the maze and was allowed to freely explore the maze for 1 minute. If the mouse did not enter the escape chamber within 1 minute, it was gently guided towards the escape hole. When the mouse entered the escape chamber, the escape hole was covered and the mouse was left in the escape chamber for 2 minutes, before returning the mouse to its individual cage.
To overcome orientation bias, at the start of the acquisition, probe, reversal, and reversal probe trials the mouse was placed in the center of the maze and then covered with a plastic box for 15 seconds before being allowed to explore the maze. Throughout the acquisition trials, the location of the escape chamber was kept constant for each mouse, but altered between mice. The selected escape hole was at least 5 holes away from the location of the escape hole during the habituation phase. The mouse was allowed to explore the maze for 3 minutes. If by the end thereof the mouse had not entered the escape chamber, the mouse was gently guided towards the target hole and escape latency was set at 180 seconds. When the mouse entered the escape chamber, the escape hole was covered and the mouse was left in the escape chamber for 30 seconds before being moved back to its individual cage. After 10-15 minutes the trial was repeated for a total of 2 acquisition trials per mouse per day. In the probe phase, the escape chamber was removed and the mouse was allowed to explore the maze for 90 seconds, after which the mouse was put back in its individual cage. Mice performed only 1 probe trial. The reversal phase was the same as the acquisition phase with the exception that this phase only lasted 3 days instead of 5 and that the location of the target hole was on the exact opposite location in respect of the target hole location during the acquisition phase. The reversal probe trial was identical to the probe trial.
Escape latency was scored manually. A trial ended when the mouse found the target hole, or when the trial duration had elapsed. The number of off-target holes visited (mistakes) was also scored manually. Checking the same off-target hole twice in a row counted as only 1 mistake. For quantification of path length until finding the target hole (primary path length), average speed, and time spent in different quadrants, Ethovision was used. An automated setup was used to detect the mouse. Only when this did not result in accurate tracking the static subtraction method was used for that specific recording. Paths were reviewed and manually altered when automatic mouse tracking was incorrect.
The average of the 2 trials on 1 day was taken for data analysis of the acquisition and the reversal phase. If 1 of the 2 trials on a single day was excluded, that complete day was excluded from analysis for that specific mouse. Search strategies were determined based on the number of errors and the number of crossings through the center zone: (1) random search strategy: if a mouse had more than 2 crossings through the center zone; (2) spatial search strategy: if there were no more than 2 crossings and the mouse made no more than 2 mistakes and the mistakes were adjacent holes; (3) serial search strategy: if the mouse made no more than 2 crossings, but more than 2 mistakes or nonadjacent mistakes. χ² statistical analysis was used to analyze differences in search strategies. To reduce the effects of multiple comparisons we chose only 4 time points for our analysis: A1, A5, R1, and R3. First, differences between APP/PS1 and WT were compared on A1, A5, R1, and R3, and only if this resulted in a significant difference, another χ² test was used to determine which specific search strategy was used differently. In a similar way differences over time (A1 vs. A5, A1 vs. R1, and R1 vs. R3) were determined.
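The three search-strategy rules above can be written as a small decision function. This is a minimal sketch rather than the authors' scoring procedure: the function names and the hole-index representation are hypothetical, and "adjacent holes" is interpreted here as the holes directly next to the target hole, which is an assumption the text does not spell out.

```python
from typing import Sequence

N_HOLES = 20  # number of holes on the maze, as stated in the text


def is_adjacent(hole: int, target: int, n_holes: int = N_HOLES) -> bool:
    """True if `hole` is a direct neighbor of `target` on the circular maze."""
    return (hole - target) % n_holes in (1, n_holes - 1)


def classify_search_strategy(
    center_crossings: int, mistake_holes: Sequence[int], target_hole: int
) -> str:
    """Classify one trial as 'random', 'spatial', or 'serial'."""
    if center_crossings > 2:
        return "random"
    few_mistakes = len(mistake_holes) <= 2
    # Assumption: "adjacent" means neighboring the target hole.
    all_adjacent = all(is_adjacent(h, target_hole) for h in mistake_holes)
    if few_mistakes and all_adjacent:
        return "spatial"
    # No more than 2 crossings, but more than 2 mistakes or nonadjacent mistakes.
    return "serial"


# Hypothetical trials with the target at hole 0.
print(classify_search_strategy(3, [4, 11], 0))       # random
print(classify_search_strategy(0, [1, 19], 0))       # spatial
print(classify_search_strategy(1, [5, 6, 7, 8], 0))  # serial
```

Because the rules are checked in order, every trial falls into exactly one category, which is what the χ² comparison of strategy counts requires.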

Contextual and cued fear conditioning
The fear conditioning test consisted of 3 parts: conditioning, contextual testing, and cued testing. The cued testing consisted of both a before cue and a cued phase. The 3 parts were executed on 3 consecutive days. Throughout the fear conditioning tests, the mice were housed in their home cage and transported to the test room each day just before testing. Testing was performed in white light conditions. The test environment consisted of a soundproof chamber of 28 cm in width, 42 cm in length, and 18 cm in height. The bottom consisted of an electrifiable grid. One wall was covered with a chessboard pattern. The chambers were equipped with a light, a fan, a sound source, and a camera. All test cages were connected to a computer on which timing of light, footshock, and tone was programmed with TRANS-IV in MED-PC IV software, from Med Associates Inc. For the conditioning trial, the mouse was placed in the test cage to freely explore the environment for 2 minutes, after which a tone was presented for 30 seconds, with a footshock of 0.75 mA given during the last 2 seconds of the tone. Next, there was a 90-second resting phase, after which another 30-second tone with a 2-second footshock was given. This was repeated 1 more time for a total of 3 shocks. The mouse was then returned to its home cage and brought back to the home room. For the second day, which is the contextual test, the mouse was placed in the same test environment and was allowed to explore for 5 minutes before being returned to its home cage. On the third day, which is the cued test, the test environment was changed by placing a plastic white platform on top of the electrifiable grid and by replacing the chessboard wall with a white wall. The mouse was placed in this environment and was first allowed to explore for 3 minutes, after which the same auditory tone was presented as used during conditioning. This tone sounded for 3 minutes, the cued phase, after which this test was ended.
All videos were analyzed manually using Noldus Observer XT 12. Freezing behavior was scored, defined by the complete lack of movement of the mouse. No time restrictions were applied for the duration of the freezing. For the conditioning phase, the complete 8 minutes of the trial were scored for freezing behavior. Contextual freezing and freezing before and during the cued phase were scored only for the first 3 minutes. All videos were scored twice, and the difference in freezing time had to be less than 10 seconds to take the average as the final score. If the difference in scoring was more than 10 seconds, the video was scored again. The percentage of freezing was calculated per segment of 30 seconds. We did not remove outliers for the fear conditioning analysis.
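The double-scoring rule and the per-segment percentage described above can be illustrated with a short sketch. The function names and the example values are hypothetical; the text does not say how a difference of exactly 10 seconds was handled, so sending that case back for rescoring is an assumption.

```python
from typing import List, Optional, Sequence

MAX_SCORING_DIFFERENCE_S = 10.0  # two scorings must differ by less than this
SEGMENT_LENGTH_S = 30.0          # freezing is expressed per 30-second segment


def reconcile_scores(first_s: float, second_s: float) -> Optional[float]:
    """Average two freezing scores, or return None if the video must be rescored."""
    if abs(first_s - second_s) < MAX_SCORING_DIFFERENCE_S:
        return (first_s + second_s) / 2.0
    return None  # assumption: a difference of exactly 10 s is also rescored


def percent_freezing(freezing_s_per_segment: Sequence[float]) -> List[float]:
    """Convert seconds of freezing in each 30-second segment to a percentage."""
    return [100.0 * s / SEGMENT_LENGTH_S for s in freezing_s_per_segment]


# Hypothetical scores in seconds, for illustration only.
print(reconcile_scores(42.0, 48.0))         # 45.0
print(reconcile_scores(40.0, 55.0))         # None, score the video again
print(percent_freezing([0.0, 15.0, 30.0]))  # [0.0, 50.0, 100.0]
```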

Statistics and figures
For statistical analysis and graph development, GraphPad Prism 8.4.2 was used. Unless otherwise specified, outliers were removed first using ROUT; Q = 1. For all analyses, we first combined the data from the 2 sexes to analyze the differences between WT and APP/PS1. An unpaired 2-tailed t-test was used to assess differences between WT and APP/PS1. A 1-sample t-test was used to determine whether performance of a group differed from chance. To determine the effect of sex we used a 2-way ANOVA, with genotype and sex as variables. In case of experiments over time (the Barnes maze and the shock phase of the fear conditioning) we used a 3-way ANOVA with genotype, sex, and time as variables. If appropriate, we next used a Tukey multiple comparison test to check differences between subgroups or at a specific time point. The nonparametric chi-square (χ²) test was used to determine differences in first arm entry on the T-maze task, and differences in search strategy in the Barnes maze task. Throughout our statistical analyses we used α = 0.05. Throughout all figures purple lines and error bars are used when comparing genotype differences, while green colors are used when comparing sex differences. When comparing sexes, darker green shades are used to represent male data, while lighter green shades represent female data.

APP/PS1 mice are more active than WT, without a difference in anxiety behavior
The open field test was used to assess differences in baseline locomotion and anxiety levels between APP/PS1 mice and WT littermates. One WT female mouse was excluded from open field analysis due to excessive travel distance (200 m compared to a group average of 88 m with a SEM of 8.9 m).
When analysing the effects of genotype and sex using a 2-way ANOVA we found an effect of genotype (F(1, 44) = 8.184, p = 0.0064), but no effect of sex (F(1, 44) = 0.4694, p = 0.4968), nor of the interaction (F(1, 44) = 0.5855, p = 0.4483). The travel distance of APP/PS1 mice was significantly longer than that of the WT mice (Fig. 2A). In addition, we did see a pattern (p = 0.0646) showing that female APP/PS1 mice were generally more active compared to WT females (Fig. 2B), also visualized by 2 representative tracings of their travel distance (Fig. 2C). The maximum speed of each mouse throughout the trial was not different between the groups (data not shown). Taken together, these data indicate that APP/PS1 mice are more active.
To determine whether APP/PS1 mice differ in anxiety from WT mice, we examined the percentage of distance travelled in the center and the frequency of visits in the center of the open field. Since a bright light was used as an aversive stimulus, we expected that anxious mice avoid the center area. Neither the distance in center (t(44) = 1.802, p = 0.0785) ( Fig. 2 D) nor the number of entries into the center area (t(46) = 1.685, p = 0.0987) ( Fig. 2 E) differed significantly between the APP/PS1 and the WT mice, indicating a similar anxiety level in both groups. All mice travelled a smaller distance in the center, indicating healthy avoidance of the center in all groups.
Overall, the open field test showed no difference in anxiety levels between APP/PS1 and WT mice, independent of sex, and therefore there is no need to take this into account for further testing. The APP/PS1 mice did show a hyperactive phenotype, which was more evident in the female mice.

Impairment in spatial memory of APP/PS1 mice
The T-maze (Fig. 3A) tests short-term spatial memory and is based on the willingness of mice to explore new environments. We examined 3 different parameters to test the spatial working memory of WT and APP/PS1 mice: (1) the percentage of visits to the novel arm, (2) which arm was the first to be entered, and (3) the percentage of time spent in the novel arm. One APP/PS1 male mouse was excluded from analysis due to over 80% arm preference, and for 1 WT male the first arm entry data were missing. When analysing the effects of genotype and sex on visits in the novel arm, we found an effect of genotype (F(1, 44) = 6.678, p = 0.0132), but no effect of sex (F(1, 44) = 0.7762, p = 0.3831), nor of the interaction (F(1, 44) < 0.001, p = 0.9806). APP/PS1 mice visited the novel arm significantly less than the WT mice and while WT mice performed above chance level (t(23) = 2.751, p = 0.0114), APP/PS1 mice did not (t(23) = 0.7960, p = 0.4342) (Fig. 3B).
Both sexes show a pattern of decreased performance in the APP/PS1 group, but this is not significant. There are no differences between male and female performance (Fig. 3C). When examining whether the mice entered the novel or the old arm first, we observed a difference between males and females. Using a chi-square (χ²) test, we showed that APP/PS1 male mice perform worse and only entered the novel arm first a few times (3 out of 11 times), while WT males more often visit the novel arm first (9 out of 11) (Fig. 3D). This difference was not observed in females, where neither WT nor APP/PS1 mice showed a preference for the novel or the familiar arm on the first entry (WT: 6 out of 12, APP/PS1: 8 out of 13). There were no differences in total arm visits between WT and APP/PS1 mice (WT mean: 24.17 ± SEM 1.95, APP/PS1 mean: 26.21 ± SEM 1.65; not shown), indicating no differences in motivation and willingness to explore. When comparing the percentage of time spent in the novel arm, WT performance was at chance level (t(23) = 0.7538, p = 0.4586), indicating that this is not a proper parameter to assess differences in memory performance. We did not see differences between WT and APP/PS1 performance (F(1, 44) = 0.03676, p = 0.8488), no effect of sex (F(1, 44) = 0.5135, p = 0.4774), or the interaction (F(1, 44) = 0.01685, p = 0.8973) (Supplemental figure 2).
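The first-arm-entry comparison can be reproduced as a worked example from the counts given in the text. The sketch below computes a Pearson χ² statistic for a 2 by 2 table without continuity correction. The text does not report the statistic for this comparison or the exact test settings used in Prism, so the choice of an uncorrected Pearson test is an assumption and the resulting values are illustrative rather than the authors' reported results.

```python
def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table

        [[a, b],
         [c, d]]
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-square critical value for 1 df, alpha = 0.05

# Rows: WT, APP/PS1. Columns: novel arm first, old arm first (counts from the text).
males = pearson_chi_square_2x2(9, 2, 3, 8)
females = pearson_chi_square_2x2(6, 6, 8, 5)

print(f"Males:   chi2 = {males:.2f}")    # 6.60, above the critical value
print(f"Females: chi2 = {females:.2f}")  # well below the critical value
```

With 11 mice per group in the males, the expected count in each cell is 5 or 6, which is at the lower limit of what is commonly considered adequate for a χ² test.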

Spatial memory and cognitive flexibility deficits in the APP/PS1 mice
Multiple different parameters were analyzed to assess learning and memory performance in the Barnes maze (Fig. 4A). We first determined the latency to escape. There was a clear effect of time (F(2.452, 108.9) = 44.20, p < 0.0001) indicating that both groups performed better over time and memorized the location of the target hole. There also was an effect of time * genotype (F(7, 311) = 3.875, p = 0.0005), showing that over time APP/PS1 mice performed differently compared to WT mice. In most trials the APP/PS1 mice needed more time to find the escape hole (Fig. 4B), specifically, on the third and fifth day of the acquisition phase and the second and third day of the reversal phase (A3 t(31.19) = 3.059, p = 0.0357; A5 t(32.55) = 3.514, p = 0.0105; R2 t(28.45) = 3.423, p = 0.0151; R3 t(43.09) = 2.884, p = 0.0478).
Figure legend (T-maze): The novel arm is the arm that was previously blocked during the initial phase, while the old arm is the arm that the mice were previously allowed to explore. (B) Percentage of visits to the novel arm from WT and APP/PS1 comparison. (C) Comparing sexes on performance of percentage of visits to novel arm. (D) Graph depicting per group which percentage of mice first entered the novel arm and which percentage first entered the old arm. Spheres represent data from WT mice, crosses represent data from APP/PS1 mice. Lines and error bars represent mean and SEM. Gray dashed lines indicate chance level and red stars indicate where the group significantly differs from chance. Purple colors for WT versus APP/PS1 comparison and green for comparison between males (dark green) and females (light green). For WT males N = 12 (except for first entry, where N = 11), WT females N = 12, APP/PS1 males N = 11, APP/PS1 females N = 13.
Both WT and APP/PS1 mice performed worse on the first day of the reversal trial than on the fifth day of acquisition (t(90) = 4.960, p < 0.0001), showing that both had to adjust to find the new target hole. However, both groups quickly learned the new target location, as indicated by a similar performance on the last day of the acquisition phase and the last day of the reversal phase. There was a significant increase in performance between trial A1 and R1 for the WT mice (t(42) = 4.445, p < 0.001) and not for APP/PS1 mice (t(42) = 1.520, p = 0.2537), indicating that WT mice have a more flexible memory and can transfer previously learned skills to a similar trial. When testing the effect of sex using a 3-way ANOVA, no effect was found (F(1, 45) = 1.586, p = 0.2144), neither was there an effect of sex * time (F(7, 301) = 0.4888, p = 0.8426) nor sex * genotype (F(1, 45) = 0.01355, p = 0.9078) (Fig. 4C).
When analysing the search strategies of all mice to find the escape hole, we found that on day A5 WT mice used the spatial search strategy significantly more often than APP/PS1 mice (χ² = 6.00, df = 1, p < 0.05) (Fig. 4G). No differences in search strategy were observed on day A1, R1, or R3. Both APP/PS1 and WT mice reduced the random search strategy after multiple acquisition or reversal trials. Throughout the acquisition phase WT mice show an increase in spatial search strategy (A1 to A5: χ² = 14.76, df = 1, p < 0.001). Throughout the reversal trials, the APP/PS1 mice applied the serial search strategy more (R1 to R3: χ² = 4.44, df = 1, p < 0.05). In the first reversal trial, APP/PS1 mice did not adjust their search strategy compared to the first acquisition trial, while the WT mice significantly decreased the usage of the random search strategy (χ² = 8.33, df = 1, p < 0.01) and increased the use of the more effective serial search strategy (χ² = 6.49, df = 1, p < 0.05).
Figure legend (Barnes maze): Bottom purple and pink pie charts represent APP/PS1 data. (B/D/E) Circles with black lines represent data from WT mice, crosses with purple lines represent data from APP/PS1 mice. * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001, # p = 0.0515, ns is not significant. Error bars represent SEM. A1-A5 = Acquisition trial 1 to 5, R1-R3 = Reversal trial 1 to 3. Vertical dashed line indicates transition from acquisition phase to reversal phase. WT males N = 12, WT females N = 12, APP/PS1 males N = 12, APP/PS1 females N = 13.
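The significance levels attached to the search-strategy χ² statistics above can be checked directly, because for 1 degree of freedom the upper-tail probability has a closed form: p = erfc(sqrt(χ²/2)). The sketch below applies it to the statistics reported in the text; the function name and the dictionary labels are hypothetical, and the computed p-values are derived here rather than quoted from the authors.

```python
import math


def chi_square_p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    A chi-square variable with 1 df is a squared standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square statistics as reported in the text (all with 1 df).
reported = {
    "A5, WT vs APP/PS1, spatial": 6.00,
    "WT, A1 vs A5, spatial": 14.76,
    "APP/PS1, R1 vs R3, serial": 4.44,
    "WT, A1 vs R1, random": 8.33,
    "WT, A1 vs R1, serial": 6.49,
}

for label, chi2 in reported.items():
    p = chi_square_p_value_df1(chi2)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Each computed p-value falls below the significance threshold stated for that comparison, so the reported thresholds are consistent with the statistics.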
Overall, the results from the Barnes maze show clear differences in performance between APP/PS1 and WT mice during the acquisition and reversal phases, indicating deficits in spatial learning and memory as well as in cognitive flexibility in the APP/PS1 mice. No sex-dependent differences in performance on the Barnes maze were found.
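The χ² statistics reported for the search-strategy comparisons can be converted to approximate p-values as a sanity check on the stated significance levels. The sketch below is illustrative only and does not reproduce the authors' analysis software, which is not specified here; the function name `chi2_sf_1df` and the dictionary labels are hypothetical. It relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    For 1 df, X = Z**2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


# Statistics as reported in the text; the labels are shorthand for illustration.
reported = {
    "A5, spatial strategy, WT vs APP/PS1": 6.00,
    "WT, spatial strategy, A1 to A5": 14.76,
    "APP/PS1, serial strategy, R1 to R3": 4.44,
    "WT, random strategy, A1 vs R1": 8.33,
    "WT, serial strategy, A1 vs R1": 6.49,
}

for label, stat in reported.items():
    print(f"{label}: chi2 = {stat:.2f}, p = {chi2_sf_1df(stat):.4f}")
```

Each resulting p-value falls below the significance level quoted alongside the corresponding statistic in the text.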

No differences between APP/PS1 and WT mice in either contextual or cued fear memory
The fear conditioning task consisted of 3 phases: (1) conditioning phase, (2) contextual test, and (3) cued test, performed on 3 consecutive days ( Fig. 5 A). There was no difference in baseline freezing between WT and APP/PS1 mice, and none of the groups showed freezing before the first shock ( Fig. 5 B). In all groups, freezing behavior increased after the first shock and continued to increase after the second and third shock, indicating a normal fear response. However, we observed less freezing in the APP/PS1 mice over time, as both genotype and time had a significant effect on freezing (F(1, 752) = 21.54, p < 0.0001; F(15, 752) = 31.02, p < 0.001). We next validated that our WT mice have proper memory performance. Freezing was significantly increased from the before-cued to the cued phase (t(74) = 4.889, p < 0.0001). In addition, they froze more during the context phase than during the before-cued phase (t(36) = 4.783, p < 0.001). We observed no differences in freezing behavior between WT and APP/PS1 mice during the contextual (t(47) = 0.6835, p = 0.4977) and cued test (t(47) = 0.05363, p = 0.9575) ( Fig. 5 C and D).

Discussion
We observed deficits in both spatial memory and cognitive flexibility of 9-month-old APP/PS1 mice compared to WT littermates. These deficits were most clear in the Barnes maze test, but also apparent during the T-maze task. We observed no differences between APP/PS1 and WT mice in either contextual or cued fear memory. We are the first to focus specifically on differences in learning and memory performance between APP/PS1 and WT mice with a C57BL/6J background by using the Barnes maze. Because the Barnes maze can detect more subtle differences and changes in memory deficits than most other memory tasks, without inducing much stress in the mice, it is a very suitable test to use in future experiments. Furthermore, in contrast to many other studies, we examined sex-based differences, but conclude that throughout our behavior tests this specific mouse model, at this age, does not show sex differences in learning and memory performance.
(Fig. 5 legend, partial) (B/C/D) Percentage of freezing by both WT and APP/PS1 mice during conditioning, context testing, or cued testing, respectively. ****p ≤ 0.0001, *p ≤ 0.05. "Tone" indicates where the tone is sounded and * indicates where the 2-second shock was given. Lines and error bars represent mean and SEM. Crosses represent APP/PS1 data, while spheres represent WT data. WT males N = 12, WT females N = 12, APP/PS1 males N = 12, APP/PS1 females N = 13.
Our Barnes maze results showed a clear learning and memory deficit in the performance of the APP/PS1 mice. These mice had longer escape latencies, longer travel distances, and slower speed, made more errors, and displayed fewer memory-dependent search strategies than the WT mice. Deficits were present during both the acquisition and reversal phase, indicating deficits in both spatial memory and cognitive flexibility compared to wildtype littermates. Spatial memory performance is dependent on hippocampal activity ( McHugh et al., 2008 ; O'Keefe & Dostrovsky, 1971 ), and cognitive flexibility depends on the frontal cortex ( Gawel et al., 2019 ; Livingston-Thomas et al., 2015 ). We therefore conclude that both of these brain regions are affected in our APP/PS1 mouse model at 9 months of age, as shown before ( Jackson et al., 2013 ; Minkeviciene et al., 2008 ). These deficits were most clear in the Barnes maze test.
Quadrant preference is a commonly tested parameter in the Barnes maze. We expected to see a deficit in quadrant preference during the probe trial in our APP/PS1 mice; however, this was not observed. Other reports have shown that both age and the number of trials prior to the probe task affect performance on the probe trial; differences are more apparent in older mice, or with fewer trials ( Attar et al., 2013 ; Sakakibara et al., 2018 ). We tried 2 other types of analysis to possibly detect more subtle differences: analysing only the first 30 seconds of the probe trial instead of the total 90-second probe trial duration, and analysing the percentage of visits to the target hole or an adjacent hole among the first 15 hole visits a mouse made. Neither analysis showed a difference between APP/PS1 and WT performance (data not shown).
We observed hyperactivity in the APP/PS1 mice, as they showed longer travel distances in the open field task and longer travel distances and higher average speed in the T-maze (data not shown). We assume that hyperactivity did not affect memory performance analysis in the T-maze, as we consider performance on this task to be independent of travel distance and speed. Hyperactivity could, however, affect performance on the Barnes maze. On the one hand, it could shorten escape latency because the mice move faster, which could also explain why the APP/PS1 mice performed better than the WT mice on the first day of the Barnes maze task. On the other hand, it could also result in mice paying less attention to visual cues, which could lead to poor memory formation, and thereby decreased test performance. Although female mice travelled longer distances in the Barnes maze, mostly in the APP/PS1 group, both the APP/PS1 and WT mice sometimes paused and looked around to orient themselves, and average movement speed throughout the Barnes maze task was lower in APP/PS1 mice than in WT mice (supplementary figure 3A). Therefore we believe that the observed differences in performance are due to memory deficits, and are not caused by hyperactivity. Hyperactivity can also affect the results of the fear conditioning tests, as a hyperactive mouse might show altered freezing behavior ( Jager et al., 2019 ). As we expected more freezing behavior in the WT mice than in the APP/PS1 mice, the effect of hyperactivity should have enhanced the difference. We did not see a difference between APP/PS1 and WT performance, and therefore conclude that the hyperactivity did not skew our results on the fear conditioning task.
There are several possible reasons why we do not see differences in most of our memory tasks, even though we see clear differences in Barnes maze performance. First of all, performance on some tasks is strongly affected by chance. During the T-maze task the mouse has only 2 options to choose from, while in the Barnes maze the mice can choose between 20 holes. In addition, while the T-maze task was completed within 1 day, the Barnes maze spans multiple days, making it possible to assess learning and memory performance over time. The day on which deficits become apparent is an indicator of the degree of deficits; earlier detectable differences indicate more severe impairment. As a result, the Barnes maze can detect more subtle differences in performance and is also suited to detect changes in learning and memory performance throughout disease progression. One hypothesis is that in our model at this age memory deficits are still too small to be detected with the T-maze, but that as pathology progresses, deficits will also become apparent using such tests. This is partly in line with previous literature. On the T-maze or the very similar Y-maze, either with spontaneous alternations or forced choice, no memory deficits were found in 4-, 6-, 7-, 12-, and 13-month-old APP/PS1 mice, independent of their sex ( Bonardi et al., 2011 ; Chaney et al., 2018 ; Lalonde et al., 2004 , 2005 ; O'Leary et al., 2018 ; Onos et al., 2019 ; Reiserer et al., 2007 ). Only when the mice were tested at an older age, 18 or 24 months old, were memory deficits apparent in the T- or Y-maze ( Chaney et al., 2018 ; Huang et al., 2016 ). To improve performance in the T-maze we suggest using within-maze cues to improve recognition of the different arms. Overall, however, we would not recommend using the T-maze to assess memory performance in APP/PS1 mice at 9 months of age.
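The role of chance in a 2-option versus a 20-option task can be made concrete with a short calculation. The sketch below is a simplification for illustration only: it assumes a mouse with no memory that picks options uniformly at random and, for the error count, never revisits an option within a trial, which real mice do not strictly obey. The function names are hypothetical.

```python
from fractions import Fraction


def chance_first_choice(n_options: int) -> Fraction:
    """Probability that a purely random first choice is the correct one."""
    if n_options < 1:
        raise ValueError("need at least one option")
    return Fraction(1, n_options)


def expected_errors_random_search(n_options: int) -> Fraction:
    """Expected number of wrong options visited before the target.

    Assumes a random visiting order without revisits: the target is equally
    likely to sit at any position 1..n in that order, so the mean number of
    wrong visits before it is (n - 1) / 2.
    """
    if n_options < 1:
        raise ValueError("need at least one option")
    return Fraction(n_options - 1, 2)


# 2 = T-maze arms, 20 = Barnes maze holes (counts taken from the text).
for name, n in (("T-maze", 2), ("Barnes maze", 20)):
    p = chance_first_choice(n)
    e = expected_errors_random_search(n)
    print(f"{name}: chance of a correct first choice = {p}, "
          f"expected errors under random search = {float(e)}")
```

Under these assumptions a memoryless mouse is correct on half of all T-maze choices but on only 1 in 20 first hole visits in the Barnes maze, which leaves much more room for performance above chance to reflect memory.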
Because in AD the hippocampus is often severely affected, while the amygdala is not ( Jackson et al., 2013 ; Minkeviciene et al., 2008 ), we would expect a decreased performance of the APP/PS1 mice in the contextual test of the fear conditioning task, but not in the cued test. However, we did not find any differences between APP/PS1 and WT performance in either the contextual or the cued test. Previous studies testing learning and memory performance in APP/PS1 mice using fear conditioning are inconclusive. Some studies show deficits in contextual fear memory ( Kilgore et al., 2010 ; Kommaddi et al., 2018 ; Végh et al., 2014 ), others show deficits in cued fear memory ( Janus et al., 2015 ; Knafo et al., 2009 ), and yet other studies report no deficits at all ( Bonardi et al., 2011 ; Cheng et al., 2013 ). The strength of the footshock, the number of shocks, and the age of the mice all differ between studies, which could partly explain the differences in results. However, there is no clear indication that older APP/PS1 mice or more and stronger shocks are linked to more deficits. A noteworthy difference between most studies and ours is the amount of freezing observed at the end of the conditioning phase. Normally 40%-70% freezing is observed ( Kilgore et al., 2010 ; Knafo et al., 2009 ; Kommaddi et al., 2018 ; O'Leary et al., 2018 ) during context or cued tests, while the highest average amount of freezing we observed was 30% during the cued test. Although all our mice showed a clear response when the shock was administered, it is possible that an increase in shock frequency, duration, or strength could improve the fear response and strengthen memory formation. The relatively weak fear response limits the power of this test to detect differences between APP/PS1 and WT mice.
From studies in patients, it is known that there are differences in AD pathology between men and women, with women being more severely affected by the disease ( Ryan et al., 2018 ; Viña & Lloret, 2010 ). There is, however, a very limited number of papers that look at possible sex differences in learning and memory performance in APP/PS1 mice. One previous experiment testing both male and female APP/PS1 mice in either the Barnes maze or the Morris Water Maze (MWM) showed a sex-dependent difference in the reversal phase; only male WT mice were more affected by the reversal of the escape hole compared to APP/PS1 mice ( O'Leary & Brown, 2009 ). Throughout our experiments we only found sex-dependent memory differences in the first arm entry in the T-maze task. We therefore conclude that, throughout our battery of memory tasks and at this age, this APP/PS1 mouse model shows very limited sex-dependent memory differences. We do still believe that it is important to examine sex-dependent differences in AD, and therefore urge future studies to include this in their tests. It is possible that sex-dependent differences would become more apparent at later ages in this mouse model, and this could therefore be examined in future studies. The lack of sex-dependent differences in our experiments is in contrast to our expectations based on observations in humans. One possible explanation for these differences between our APP/PS1 mouse model and humans is the lack of Tau pathology in our model. Women have higher levels of tau tangles in the brain compared to men ( Oveisgharan et al., 2018 ). Several studies have shown sex-dependent memory deficits in AD mouse models with Tau pathology ( Fertan et al., 2019 ; Roddick et al., 2014 ; Yue et al., 2011 ). It would be interesting to compare sex-dependent learning and memory performance between our AD mouse model and the mouse models used in those studies. In addition, we only tested performance at 1 specific age. We conclude that more extensive studies regarding sex-dependent differences are needed.

Conclusions
Both male and female APP/PS1 mice are impaired in spatial memory and cognitive flexibility at 9 months of age. Almost no differences between male and female performance were found in the APP/PS1 mice, showing that, at 9 months of age, biological sex, within the focus of the present study, is not sufficient to explain sex-related disparities in AD. At this age, the T-maze and contextual and cued fear conditioning do not show differences in performance between APP/PS1 and WT mice. The Barnes maze allows for detection of subtle differences and changes in memory deficits, and is a very suitable test to use in future experiments testing novel drugs and treatments to prevent or slow down AD.

Submission declaration and verification
The authors declare that this work has not been published previously, nor is it under consideration for publication elsewhere. In addition, if accepted, these data will not be published elsewhere. All authors approved submission of the manuscript to this journal.

Disclosure statement
The authors declare no conflict of interest.