Glutamatergic and Serotonergic Modulation of Rat Medial and Lateral Orbitofrontal Cortex in Visual Serial Reversal Learning

Adapting behavior to a dynamic environment requires both steadiness when the environment is stable and behavioral flexibility in response to changes. Much evidence suggests that cognitive flexibility, which can be operationalized in reversal learning tasks, is mediated by cortico-striatal circuitries, with the orbitofrontal cortex (OFC) playing a prominent role. The OFC is a functionally heterogeneous region, and we have previously reported differential roles of lateral (lOFC) and medial (mOFC) regions in a touchscreen serial visual reversal learning task for rats using pharmacological inactivation. Here, we investigated the effects of pharmacological overactivation of these regions using a glutamate transporter 1 (GLT-1) inhibitor, dihydrokainate (DHK), which increases extracellular glutamate by blocking its reuptake. We also tested the impact of antagonism of the serotonin 2A receptor (5-HT2AR), which modulates glutamate action, in the mOFC and lOFC on the same task. Overactivation induced by DHK produced dissociable effects in the mOFC and lOFC, with more prominent effects in the mOFC, specifically improving performance in the early, perseveration phase. Intra-lOFC DHK increased the number of omitted responses without affecting errors. In contrast, blocking the 5-HT2AR in the lOFC impaired reversal learning overall, while mOFC 5-HT2AR blockade had no effect. These results further support dissociable roles of the rodent mOFC and lOFC in deterministic visual reversal learning and indicate that modulating glutamate transmission by blocking GLT-1 and by blocking the 5-HT2AR has different effects in these two structures.

Cognitive flexibility, the ability to adapt behavior in response to a changing environment, is disrupted in several psychiatric and developmental disorders including obsessive-compulsive disorder (OCD), schizophrenia, and autism (Chamberlain et al., 2008; D'Cruz et al., 2013; Leeson et al., 2009; Waltz & Gold, 2007). In OCD patients, inflexible behavior is typically treated with selective serotonin reuptake inhibitors, though often without full remission and with a large subgroup of nonresponders (Robbins, Vaghi, & Banca, 2019). Thus, more recently, drugs modulating cortical glutamate neurotransmission have gained attention, appearing beneficial in improving cognitive flexibility in OCD patients (Marinova, Chuang, & Fineberg, 2017). However, the underlying neural mechanisms of glutamatergic and serotonergic modulation of flexible behavior are not yet understood and need to be further investigated.
In particular, the excitatory 5-HT2ARs primarily localized on pyramidal neurons (Amargós-Bosch et al., 2004; Santana et al., 2004) and inhibitory 5-HT2CRs primarily localized on inhibitory parvalbumin neurons (Liu, Bubar, Lanfranco, Hillman, & Cunningham, 2007) seem to be involved in reversal learning as systemic 5-HT2AR blockade impairs reversal learning performance, while systemic blockade of 5-HT2CRs improves performance (Boulougouris, Glennon, & Robbins, 2008). While local 5-HT2CR antagonism in the lOFC reproduces this improvement, probably through inhibition of parvalbumin neurons leading to increased excitatory lOFC activity, intra-lOFC 5-HT2AR blockade does not affect spatial reversal learning (Boulougouris & Robbins, 2010). However, 5-HT2AR blockade with M100907 in the lOFC does impair odor-based reversal learning (Furr, Lapiz-Bluhm, & Morilak, 2012), and high levels of perseveration in rats are associated with decreased levels of 5-HT2AR in the lOFC and mOFC (Barlow et al., 2015), consistent with decreased levels of OFC 5-HT2AR predicting clinical severity in OCD patients (Perani et al., 2008). In the visual serial reversal learning paradigm used in the present study, intra-lOFC blockade of 5-HT2CR improves performance in the early phase of reversal learning (Alsiö et al., 2015), but the role of 5-HT2ARs in the OFC on this task remains to be investigated.
In the present study, we compared the effects of modulating glutamatergic transmission by DHK treatment and 5-HT2AR blockade in the lOFC and mOFC on a deterministic visual serial reversal learning task in rats that we have previously shown to be dissociably affected by lOFC and mOFC inactivation (Hervig et al., 2020). We hypothesized that DHK-induced activation of the lOFC and mOFC would produce effects opposite to those of inactivating the lOFC and mOFC (Hervig et al., 2020). We further hypothesized that blocking 5-HT2ARs would produce dissociable effects in the mOFC and lOFC, with hypothetical early reversal learning impairments in the lOFC as blocking the inhibitory 5-HT2CR in the lOFC produces early reversal learning improvements (Alsiö et al., 2015).

Animals
Subjects were male Lister hooded rats (N = 42; Charles River, United Kingdom; Supplementary Table S1) housed in groups of three or four during behavioral pretraining and testing and single-housed following guide cannula implantation to protect the implant. The rats were housed under a reverse 12-hr light/dark cycle with lights off at 7:00 a.m. All training and testing were performed during the dark phase. To ensure sufficient motivation for task performance, the animals were food restricted with ad libitum access to water and fed once daily at random times after testing. Their body weights were maintained at 85% of their free-feeding weight. All experiments were subject to regulation by the United Kingdom Home Office (PPL 70/7548) in accordance with the Animals (Scientific Procedures) Act 1986.
Drugs were aliquoted in the quantities required for each test day and frozen at −80°C. For the intracranial microinfusions, the drugs were administered at 0.5 μl/side 10 min prior to testing.

Behavioral Training (Touchscreen Serial Visual Reversal Learning)
Behavioral training was performed as previously described in Hervig et al., 2020. For the experimental timeline and design, see Figure 1.

Apparatus
We trained and tested the animals on a touchscreen serial visual reversal learning task using 16 operant chambers (Med Associates, Georgia, VT) placed in sound- and light-attenuating wooden cabinets equipped with a fan for ventilation and masking of external noise. The chambers measured 30 cm × 39 cm × 29 cm and consisted of a clear Perspex ceiling, front door, and back panel and metal paneling on the sides of the chamber. A metal grid with a removable metal tray below made up the floor of the chamber. A central food magazine coupled to an external pellet dispenser was located on one side of the chamber. It was equipped with light and infrared beam sensors to detect magazine entry, allowing delivery of one 45-mg sucrose pellet (TestDiet 5TUL; Sandown Scientific, Middlesex, United Kingdom) upon correct responses. A house light (~3 W) was located near the ceiling directly above the magazine. A touch-sensitive screen (29 × 32 cm) presenting visual stimuli was located on the opposite side to that of the magazine. Task schedules were developed and implemented by A. C. Mar (Mar et al., 2013) using Visual Basic 2010 and have been published previously (Alsiö et al., 2015; Hervig et al., 2020).

Pretraining: Touchscreen Serial Visual Reversal Learning
A five-stage pretraining phase began after the rats were food restricted, involving Pavlovian and instrumental conditioning prior to visual discrimination and serial reversal learning, and lasted until a stable baseline was reached. In Stages 1 to 3, rats were trained to respond to a single white box at the bottom center of the touchscreen for sucrose reward pellets during 60-min daily sessions until they reached the criterion of receiving the maximum 100 pellets in one session. The box decreased in size across the three stages until a final size of 3 × 4 cm ("start box") in Stage 3. In pretraining Stages 4 and 5, two additional stimuli were introduced (horizontal and vertical bars). The first was at the bottom of the screen to ease touch (Stage 4); then, the stimulus was raised 5 cm to the final location on the screen to avoid accidental touches (Stage 5). At this point, touching the white start box was no longer reinforced but instead led to the presentation of one of these novel stimuli to the left or right (pseudorandomized location). Responding to the presented stimulus was reinforced with a sugar pellet, whereas responding to the blank side was signaled as incorrect by the illumination of the house light for a 5-s time-out period. Eighty percent or more correct touches on one stimulus in a session led to training sessions with the other stimulus. When the criterion of > 80% correct touches was also reached on this stimulus, the rat moved on from Stage 4 to Stage 5, and after ≥ 80% correct touches were reached on both stimuli on Stage 5, visual discrimination training ensued.

Visual Discrimination Training
In visual discrimination, the rats were presented with both stimuli simultaneously, of which one was reinforced. For session initiation, the rats would collect a free reward delivery, which led to presentation of the start box. The rat initiated a trial by responding to the start box, which initiated a simultaneous presentation of the stimulus pair. Responding to the correct stimulus (conditioned stimulus; CS+) was reinforced with a sugar pellet, while responding to the incorrect nonreinforced stimulus (CS−) triggered a house-light-signaled 5-s time-out period. Failure to make a choice of either stimulus within 10 s of trial initiation was recorded as an omission. A 5-s intertrial-interval period preceded the next trial. To prevent the rats from developing a side bias, the stimuli were presented on the screen (left or right side) in a pseudorandom order (maximum three consecutive trials to the same side). The daily session ended after either 60 min, 150 rewards, or 250 trials, whichever was the first to occur. The rats reached criterion by 24 correct out of a running window of 30 trials. Prior to serial reversal learning training, a retention session with the same reward contingencies was given, as well as on the day following attainment of the learning criterion, to ensure that the rat had acquired the discrimination.

[Figure 1 caption, partial: Infusion sites were characterized from brain sections prepared with cresyl violet. Coordinates are given as millimeter distance from bregma. CS = conditioned stimulus; ITI = intertrial interval; VD = visual discrimination.]
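The pseudorandom side assignment and the session-termination rule described for visual discrimination can be sketched as follows. This is only an illustrative Python sketch, not the Visual Basic task schedule used in the study; the function names and the way a forbidden fourth repeat is handled (by forcing the opposite side) are assumptions.

```python
import random

def pseudorandom_sides(n_trials, max_run=3, seed=None):
    """Return a list of 'L'/'R' sides with at most `max_run` identical sides in a row.

    Assumption: when a random draw would create a run longer than `max_run`,
    the opposite side is used instead. The original schedule may differ.
    """
    rng = random.Random(seed)
    sides = []
    for _ in range(n_trials):
        side = rng.choice("LR")
        # If the previous `max_run` trials were all on the drawn side, switch.
        if len(sides) >= max_run and all(s == side for s in sides[-max_run:]):
            side = "R" if side == "L" else "L"
        sides.append(side)
    return sides

def session_over(elapsed_min, rewards, trials):
    """Session ends after 60 min, 150 rewards, or 250 trials, whichever comes first."""
    return elapsed_min >= 60 or rewards >= 150 or trials >= 250

def longest_run(sides):
    """Length of the longest run of identical consecutive sides."""
    best = run = 0
    prev = None
    for s in sides:
        run = run + 1 if s == prev else 1
        best = max(best, run)
        prev = s
    return best
```

Forcing the opposite side after three repeats is the simplest way to respect the constraint; it slightly biases the sequence toward alternation, which a rejection-sampling scheme would also do.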

Serial Visual Reversal Learning Training
Following the retention session during visual discrimination, the contingencies reversed so the rats then had to respond to the previously nonrewarded CS− stimulus (now CS+) for reinforcement until they reached the reversal learning criterion (24/30). A retention session both preceded and followed a reversal block. A stable serial reversal performance was achieved once the rat reached criterion within three consecutive daily sessions, with more than 200 trials completed on the first reversal day. The rats underwent surgery after they acquired a stable reversal learning performance.
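The 24/30 running-window criterion can be illustrated with a short Python sketch. The function name and the treatment of omissions (assumed here to have been removed from the outcome list beforehand) are assumptions, not details given in the text.

```python
def trials_to_criterion(outcomes, window=30, required=24):
    """Return the 1-based trial number at which criterion is first met, else None.

    `outcomes` is a sequence of booleans (True = correct response) in trial order.
    Assumption: omitted trials are not part of `outcomes`.
    """
    for end in range(window, len(outcomes) + 1):
        # Count correct responses in the most recent `window` trials.
        if sum(outcomes[end - window:end]) >= required:
            return end
    return None
```

For example, a rat that makes 10 errors followed by 24 consecutive correct responses first has 24 correct within its last 30 trials on trial 34.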

Stereotaxic Surgery
Rats were initially anesthetized with 5% isoflurane gas, which for the duration of the aseptic surgical procedure was reduced and maintained between 1% and 3%. We secured the rats in a stereotaxic frame (KOPF, Tujunga, CA) with atraumatic ear bars, set the tooth bar to −3.3 mm, and adjusted for flat skull position. Bilateral guide cannulas (22-GA; PlasticsOne, Roanoke, VA) were implanted in the lOFC (anteroposterior [AP] +3.5, mediolateral [ML] ±2.5, dorsoventral [DV] −1.7) and the mOFC (AP +4.0, ML ±0.6, DV −1.4) and secured with four screws and dental cement. Removable obturators were inserted into the guide cannulas to prevent occlusion and protected with a dust cap. We obtained the surgical coordinates by using a stereotaxic atlas and made adjustments according to pilot surgeries. AP and ML coordinates were referenced to bregma, and DV was referenced to dura.

Intracerebral Microinfusions and Reversal Learning Testing
Following the surgery recovery week, the rats were rebaselined on the serial reversal learning task to ensure a continued stable performance after the surgery. Following the baseline reversal week, which also included microinfusion habituation with sham infusions, we started the bilateral drug infusions of either M100907 or DHK across reversals according to a within-subject, crossover/Latin square design. The procedure was as follows: Prior to testing, the rats were gently restrained, and injectors (PlasticsOne; 28-GA) extending 2 mm below the guide were inserted into the guide cannulas. The injectors were left in place for 1 min before and after infusion, and drug was infused in a volume of 0.5 μl over 2 min. The rats were allowed to move freely around in the experimenter's lap during infusion. Ten minutes after drug infusion, the rats were tested on the reversal task. Infusions were administered each day of reversal, that is, from the session when contingencies first shifted to the day criterion was reached, followed by a retention session with no infusion. Thus, an animal that reached criterion on the third day received three infusions across 3 consecutive days. On the day before the next reversal, another retention session was given in which the rats received saline infusion to ensure habituation to the infusion procedure as the rats typically had 2 days without testing between these retention sessions. Thus, a complete reversal with retention sessions and break took 7 days, during which the rats typically received three drug infusions.
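A within-subject Latin square assignment of treatments to successive reversals could be generated as in the sketch below. The text does not say how the square was constructed, so the cyclic construction, the function name, and the condition labels are all assumptions used purely for illustration.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: row i is the treatment order for subject group i.

    Each condition appears exactly once per row (order within a subject)
    and once per column (position across reversals).
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

# Hypothetical labels; the actual doses per drug are reported with the results.
example_orders = latin_square(["vehicle", "low dose", "high dose"])
```

A cyclic square balances treatment position but not first-order carryover; a Williams design would be needed if carryover balance matters.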

Histology
To confirm cannula and injector-tip placements, we performed cresyl violet staining. Briefly, after the experiments, the rats were given a lethal dose of sodium pentobarbitone (Euthatal) and transcardially perfused with 0.01 M PBS followed by 4% (vol/vol) paraformaldehyde solution. The brains were removed, postfixed in 4% paraformaldehyde for 24 hr at room temperature, and dehydrated and preserved in 30% (wt/vol) sucrose in 0.01 M PBS for at least 2 days until sectioning. For sectioning, the brains were fast-frozen, embedded in optimal cutting temperature compound (O.C.T, VWR Chemicals, #361603E), and sectioned into 60-μm coronal sections using a cryostat (Leica, CM3050 S). The sections were stored in cryoprotectant at −20°C until cresyl violet staining.

Experimental Design and Statistical Analyses
Only animals with intact cannulas during the course of the experiments and with correct regional placement of injector tips (see Figure 1) were included in the analyses (Supplementary Table S1). All experiments employed a within-subject complete crossover/Latin square design with separate cohorts for each region and drug. Data across days within one reversal were collapsed, and trial outcomes were coded as perseverative, random, or late learning depending on performance over bins of 30 trials in a rolling window, as described in detail and illustrated previously (Hervig et al., 2020), following binomial distribution probabilities (Jones & Mishkin, 1972). Postcriterion data (>24 correct) were excluded from analysis.
Behavioral data were subjected to analysis of variance (ANOVA) using a general linear model with significance at α = .05. Data were initially tested for normality with the Shapiro-Wilk test, and data that did not pass the Shapiro-Wilk test were appropriately transformed to obtain normal distribution before analysis (as described in further detail next). Outliers were tested by inspection of studentized residuals and would only be excluded from the analyses if the subject was consistently an outlier across all drug doses and behavioral phases; no animals were excluded. Homogeneity of variance was verified using Levene's test; for repeated-measures analyses, Mauchly's test of sphericity was applied to assure the sphericity assumption was not violated.
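To make the binomial logic behind the phase coding concrete, the sketch below derives cutoffs for a 30-trial bin under chance responding (p = .5) and labels a bin by its number of correct responses. The exact procedure is the one described in Hervig et al. (2020); the one-tailed α = .05 per tail, the function names, and the classification of whole bins rather than individual trials are assumptions made here for illustration.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def phase_cutoffs(n=30, alpha=0.05):
    """Return (low, high): scores <= low are below chance, >= high above chance.

    Assumption: a one-tailed test at `alpha` in each direction.
    """
    low = max(k for k in range(n + 1) if binom_cdf(k, n) < alpha)
    high = min(k for k in range(n + 1) if 1 - binom_cdf(k - 1, n) < alpha)
    return low, high

def classify_bin(n_correct, n=30, alpha=0.05):
    """Label a bin of n trials as perseverative, random, or late learning."""
    low, high = phase_cutoffs(n, alpha)
    if n_correct <= low:
        return "perseverative"   # significantly below chance
    if n_correct >= high:
        return "late learning"   # significantly above chance
    return "random"              # not distinguishable from chance
```

Under these assumptions the cutoffs for 30 trials are 10 or fewer correct (below chance) and 20 or more correct (above chance).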
The dependent variables were trials, errors, reward collection and response latencies, omissions, as well as win-stay and lose-shift probabilities. Errors were square root transformed and analyzed to learning criterion and in each phase across regions. Lose-shift and win-stay probabilities were arcsine transformed and analyzed to criterion. Nonparametric testing was applied to analyze omissions in each phase and to criterion (Wilcoxon's; note that omissions only occurred if the animals actively initiated a trial by touching the start box). Latencies to respond to the stimuli (after initiating a trial) and to collect earned reward pellets were analyzed to criterion.
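The two transformations named above can be written out as follows. The text does not give the exact form of the arcsine transform; the common variance-stabilizing form arcsin(√p) is assumed here, and the function names are illustrative.

```python
import math

def sqrt_transform(error_counts):
    """Square-root transform for count data such as errors."""
    return [math.sqrt(e) for e in error_counts]

def arcsine_transform(probabilities):
    """Arcsine square-root transform for proportions in [0, 1].

    Assumption: the form asin(sqrt(p)); some authors use 2 * asin(sqrt(p)).
    """
    return [math.asin(math.sqrt(p)) for p in probabilities]
```

Both transforms are monotonic, so they change the shape of the distribution without changing the rank order of subjects.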
To investigate whether treatment had an impact on the overall learning strategy, we analyzed the win-stay and lose-shift behavior as a proxy for learning from positive and negative feedback, respectively. We calculated the win-stay strategy as the probability of making a correct choice after a correct trial (P[stay|win]) and the lose-shift strategy as the probability of making a correct choice after an incorrect trial (P[shift|loss]; Clarke et al., 2008; Riceberg & Shapiro, 2012). Thus, P(shift|win) + P(stay|win) = 1 and P(shift|loss) + P(stay|loss) = 1.
The "criterion of learning" and "behavioral phase" data analyses across regions were performed with two-way mixed ANOVAs in a within-subject (Treatment) × between-subjects (Region) design for regional inactivation. Data were analyzed within each region using planned pairwise comparisons with Student's t tests and repeated-measures one-way ANOVAs as appropriate.
All statistical analyses were performed using SPSS Version 25.0.0.1, and graphs were generated using GraphPad Prism 8 (San Diego, CA). Data are presented as mean ± standard error of the mean. Effects were considered significant at p < .05, while effects with p > .1 are reported as noneffects. Effect sizes are indicated with partial eta squared (ηp²; Cohen, 1988).
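The win-stay and lose-shift probabilities defined above can be computed from a trial-by-trial outcome sequence as in this sketch. It assumes a two-choice deterministic task (so a correct choice after a correct trial is a "stay" and a correct choice after an incorrect trial is a "shift"), that omissions have been removed, and that the sequence is treated as continuous; how session boundaries were handled in the study is not stated, and the function name is illustrative.

```python
def win_stay_lose_shift(outcomes):
    """Return (P(stay|win), P(shift|loss)) from a list of booleans (True = correct).

    Returns NaN for a probability whose conditioning event never occurred.
    """
    wins = losses = stay_after_win = shift_after_loss = 0
    for prev, curr in zip(outcomes, outcomes[1:]):
        if prev:                      # previous trial rewarded ("win")
            wins += 1
            stay_after_win += curr    # correct again = stayed with the CS+
        else:                         # previous trial unrewarded ("loss")
            losses += 1
            shift_after_loss += curr  # correct now = shifted to the CS+
    p_win_stay = stay_after_win / wins if wins else float("nan")
    p_lose_shift = shift_after_loss / losses if losses else float("nan")
    return p_win_stay, p_lose_shift
```

P(shift|win) and P(stay|loss) follow as the complements of the two returned values.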
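For readers who want to recover the effect size from reported ANOVA statistics, partial eta squared can be computed either from sums of squares or from an F ratio and its degrees of freedom. The sketch below uses the standard definitions; the function names are illustrative and are not taken from SPSS.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from sums of squares: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using the F ratio: F*df_effect / (F*df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

The two forms agree because F = (SS_effect / df_effect) / (SS_error / df_error).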

Effects of DHK Infusion in the mOFC and lOFC on Reversal Learning
In sum, DHK infused into the mOFC selectively reduced perseveration without affecting later learning phases. By contrast, DHK in the lOFC did not affect errors committed but increased omissions selectively in the perseveration phase.

Effects of 5-HT2AR Blockade in the mOFC and lOFC on Reversal Learning
Intra-OFC 5-HT2AR blockade with M100907 produced dissociable effects on trials to criterion; intra-lOFC M100907 significantly increased overall trials required for learning, likely driven by increases across all phases, while intra-mOFC M100907 had no effect on trials (see Figure 3) but … (Figure 3A).
In sum, 1 μg M100907 induced more effects than did 3 μg M100907, with these effects being found mainly in the lOFC. Intra-lOFC M100907 (1 μg) reduced reversal learning performance overall by increasing trials to criterion, probably driven by an increase in errors committed in the late learning phase. This reversal learning impairment was associated with faster response latencies. By contrast, 1 μg M100907 infused into the mOFC mainly increased omissions.

Discussion
We observed dissociable effects of intra-OFC blockade of the GLT-1 following DHK (presumably resulting in increased extracellular glutamate) and of the 5-HT2AR with M100907 (presumably resulting in diminished 5-HT2AR-mediated glutamatergic transmission) on deterministic serial visual reversal learning. Intra-mOFC DHK reduced perseverative errors, while intra-lOFC DHK had no effect on errors committed. By contrast, intra-lOFC M100907 impaired overall reversal learning as reflected by increased trials required to reach the learning criterion, presumably driven by errors increasing cumulatively at each stage and reaching significance during late learning. This impairment was also associated with faster response latencies. These results add to our previous finding of dissociable roles of the rodent mOFC and lOFC in visual reversal learning (Hervig et al., 2020), which has also been reported across other tasks such as probabilistic reversal learning (Dalton et al., 2016), delay discounting (Mar et al., 2011), and instrumental action (Gourley, Lee, Howell, Pittenger, & Taylor, 2010).

Effects of Intra-OFC Blockade of GLT-1 on Serial Visual Reversal Learning
The present study shows that blockade of the astrocytic glutamate transporter GLT-1 with DHK in mOFC and lOFC affected reversal learning in a dissociable manner, though not in the direction that we expected. Based on our previous study (Hervig et al., 2020) showing that inactivating the lOFC impaired reversal learning, while inactivating the mOFC improved it, we expected to see somewhat opposite, and still dissociable, effects with DHK microinfusions. This is because DHK increases prefrontal extracellular glutamate levels (Pintor et al., 2004) and neuronal metabolic activity after local administration (at a comparable dose to the dose used in the present study) in the prefrontal cortex (PFC), while also decreasing related subcortical activity (Gasull-Camós et al., 2017). Thus, we predicted that microinfusion of DHK into lOFC would improve reversal learning, while it would impair reversal in the mOFC. Apparently paradoxically, intra-mOFC DHK improved reversal learning performance selectively in the early phase, as also occurred following inactivation of this structure. However, this improvement occurred in the absence of decreased collection latencies and enhanced negative feedback sensitivity produced by inactivation of the mOFC (Hervig et al., 2020). In contrast, intra-lOFC DHK had no effect on reversal learning performance.
We have previously suggested that the mOFC facilitates exploitative behavior (Hervig et al., 2020). DHK-induced excess glutamate in the mOFC likely disturbs the finely tuned glutamate homeostasis required for optimal neuronal functioning in learning and plasticity (Kalivas, 2009), in turn disrupting synchronized neuronal firing (Gray, 1994). This could hypothetically lead to inefficient cortico-striatal control over behavior and consequently enhanced exploration. This account may explain why intra-mOFC DHK to some degree mimics part of the effects from pharmacological mOFC inactivation on reversal learning observed previously (Hervig et al., 2020), while not fully reproducing those effects as the neural mechanisms are fundamentally different. As the mOFC, in contrast to lOFC, is the area most affected by DHK application in this study, it may be the region reflecting the brain circuitry responsible for the beneficial role of glutamate in reversal learning.
While it has been shown that optogenetic stimulation of lOFC-striatal projections suppresses compulsive grooming behavior (Burguière, Monteiro, Feng, & Graybiel, 2013), another study has shown that deep brain stimulation of the lOFC impairs spatial reversal learning, although not initial acquisition, in rats (Klanker, Post, Joosten, Feenstra, & Denys, 2013). Thus, the functional effect of lOFC activation on compulsive behavior is not straightforward, a conclusion further supported by the lack of effect of intra-lOFC DHK on visual reversal learning in the present study. As impaired reversal learning after lOFC inactivation or lesioning is well established across species, we expected to see some effect of "overactivating" the lOFC, but, at least in this paradigm, excessive glutamate in the lOFC does not seem to affect the lOFC's control over dorsostriatal regions thought to be responsible for adapting behavior to altered response-reward contingencies in humans (Balleine & O'Doherty, 2010;Gillan et al., 2015;Morris et al., 2016), monkeys (Groman et al., 2013), and mice (Gremel & Costa, 2013). Alternatively, it is possible that overall glutamate excess does not affect lOFC neurons overall, but only subpopulations, due to the presence of functionally different individual neurons that exhibit different activational profiles depending on task after optogenetic stimulation (Jennings et al., 2019). Thus, variations in DHK infusion placements could in theory mask any specific effects mediated by individual lOFC neurons. This is further supported by studies showing that subpopulations of lOFC neurons exhibit task-dependent firing patterns during reversal learning (Gremel & Costa, 2013;Marquardt, Sigdel, & Brigman, 2017). 
At least, we can conclude that a hypothetical subpopulation effect in the lOFC is not transmitted to subcortical regions, such as the dorsolateral striatum, which is part of the neural circuitry mediating habitual learning (Gremel & Costa, 2013;Groman et al., 2013).
While intra-lOFC DHK did not affect primary measures of reversal learning performance, it did increase omissions specifically in the perseveration phase. Although this result should be interpreted with caution as it encompasses only a few omissions in total, it does indicate some impairment in the early phase, possibly due to an attentional deficit resulting from hallucinatory-type actions (Jardri et al., 2016) or possibly due to some degree of anhedonia as shown for global and PFC DHK treatment in rats (Bechtholt-Gompf et al., 2010; John et al., 2012). Overall, our observations support a role for the GLT-1-mediated regulation of glutamate availability in the mOFC, not in the lOFC, in controlling reversal learning.

Effects of Intra-OFC Blockade of the 5-HT2AR on Serial Visual Reversal Learning
We found that selective blockade of 5-HT2ARs (by M100907) in the lOFC, not the mOFC, impaired reversal learning overall, as reflected by the increased number of trials required to reach learning criterion, an effect that presumably arose from increased errors committed cumulatively at each stage, reaching significance during late learning. This impairment was associated with faster response latencies, which could reflect overconfidence or impulsivity affecting decision-making. Blocking 5-HT2ARs in the mOFC had no effect on reversal learning but increased omissions.
This finding is consistent with a role of orbitofrontal serotonin in reversal learning as previous studies have shown that serotonin and serotonin transporter levels/polymorphisms predict individual variation in reversal learning performance in rodents (Barlow et al., 2015; Lapiz-Bluhm et al., 2009; Stolyarova, O'Dell, Marshall, & Izquierdo, 2014) and monkeys (Groman et al., 2013; Vallender, Lynch, Novak, & Miller, 2009), that orbitofrontal serotonin depletion selectively impairs visual reversal learning in monkeys (Clarke et al., 2004, 2005, 2007; Rygula et al., 2015), associated with poor response suppression (Rygula et al., 2015), and that OFC serotonin is important for reinforcer devaluation (West, Forcelli, McCue, & Malkova, 2013). Our result is also consistent with previous systemic administration of M100907 impairing reversal learning performance on an operant two-choice spatial reversal learning task, whereas systemic blockade of 5-HT2CRs had the opposite effect, improving performance (Boulougouris et al., 2008). While local 5-HT2CR antagonism in the lOFC reproduced this improvement, intra-lOFC 5-HT2AR blockade had no effect on spatial reversal learning (Boulougouris & Robbins, 2010). However, 5-HT2AR blockade with M100907 in the lOFC does impair odor-based reversal learning (Furr et al., 2012). Also, low reversal learning performance in rats is associated with decreased levels of 5-HT2AR, and serotonin, in the lOFC (Barlow et al., 2015), supporting our result that lOFC 5-HT2AR blockade impairs reversal learning performance.
In the visual serial reversal learning task used in the present study, intra-lOFC blockade of 5-HT2CR (primarily localized to inhibitory parvalbumin interneurons) improves performance in the early phase of reversal learning (Alsiö et al., 2015), which together with a reversal learning impairment after blockade of 5-HT2AR (primarily localized to glutamatergic pyramidal neurons) in the lOFC in the present study is consistent with 5-HT2CRs controlling and 5-HT2ARs facilitating reversal learning. However, it is important to note that although we did see an impairment, as expected, this impairment was not due to increased perseverative errors specifically, but rather to an increase in errors committed across all reversal learning phases.
The discrepancy in effects of intra-lOFC M100907 administration on reversal learning is likely due to differences in reversal learning task design and sensory modalities involved. Boulougouris et al. used a spatial reversal learning task and only saw effects on the first, not the second or third, reversal the rats experienced, suggesting that novelty was also an important factor (Boulougouris et al., 2008;Boulougouris & Robbins, 2010). In the present study, we used a visual task, where the rats were trained in serial reversals to obtain stable reversal performance, allowing for within-subject analysis across reversals. Thus, our task is less dependent on circuitries involved in spatial cognition and excludes novelty as a possible factor.
It is important to note that only the lowest dose of 1 μg M100907 affected reversal learning significantly. This dose has been used in lOFC in a previous reversal learning study with no effects (Boulougouris & Robbins, 2010) and in the mPFC with effects on compulsivity (Mora et al., 2018). Both studies showed dissociable effects from intra-lOFC 5-HT2CR antagonist treatment; thus, this dose is presumably not targeting 5-HT2CRs. Also, in Furr et al. (2012), a dose comparable to the lowest dose in the present study impaired reversal learning when infused into the lOFC. However, no previous studies have used 3 μg M100907 in the OFC. Our results indicate that 3 μg M100907 had different effects from 1 μg, probably reflecting an inverted U-curve effect, which has also been reported for 5-HT2AR antagonists previously (Marek, Martin-Ruiz, Abo, & Artigas, 2005; Roth, 2011). Thus, our high dose may have induced receptor internalization, which is a known mechanism for the 5-HT2AR (Roth, 2011), supported by dose-response studies showing that systemic moderate doses of M100907 are more effective than low and high doses on a response inhibition task (Marek et al., 2005) and that intra-lOFC infusions with moderate M100907 doses induce the strongest detrimental effects on reversal learning compared with the low and high doses (Furr et al., 2012).
It is also worth speculating whether the differential behavioral effects from the low and high doses of M100907 could be due to other factors. One factor is that the high M100907 dose could be targeting other receptors, in addition to the 5-HT2AR, to which M100907 has lower affinity, such as the 5-HT2CR. Since blocking the 5-HT2CR has opposite effects on reversal learning (Alsiö et al., 2015), this could in theory mask/counteract potential M100907 effects. However, this seems unlikely both because no opposite effects of low and high M100907 doses are observed in the early phase and because M100907 has subnanomolar affinity for 5-HT2ARs, at least 100-fold lower affinity for 5-HT2CRs, and negligible affinity for other receptors (Johnson, Siegel, & Carr, 1996; Kehne et al., 1996; Pehek, Nocjar, Roth, Byrd, & Mabrouk, 2006). It is also worth noting that, while the majority of prefrontal 5-HT2ARs are postsynaptic, a small proportion are presynaptic (Cornea-Hébert, Riad, Wu, Singh, & Descarries, 1999; González-Maeso et al., 2007; Jakab & Goldman-Rakic, 1998; Miner, Backstrom, Sanders-Bush, & Sesack, 2003). The high dose of M100907 may, in theory, affect presynaptic, in addition to postsynaptic, receptors to a greater extent than the low dose and may thus modulate not only downstream neuronal excitation and long-term potentiation (Aznar & Klein, 2013) but also afferent neurotransmission (Barre et al., 2016). However, more investigations are needed to elucidate these potential underlying mechanisms. Moreover, although the findings were statistically significant with a high effect size within the lOFC, the lack of statistical significance and low to moderate effect sizes in the overall ANOVA indicate that this experiment eventually will require replication.
We hypothesize that the lOFC promotes exploration, and our present study suggests that 5-HT2AR in the lOFC may be responsible not so much for the initial switch from exploitation to exploration strategies that occurs at the time of reversal but for implementing the information acquired through exploration. Our study shows that the rats are able to initiate exploration (as perseverative errors are not statistically altered), but they commit increasingly more errors in the random (early learning) and late learning phases. Thus, the information acquired initially is not properly implemented in the existing task set encoded by the lOFC as more trials/errors to update, or create new, task sets were required following intra-lOFC M100907 treatment, consistent with the well-established role for the 5-HT2AR in learning and memory (Harvey, 2003; Zhang & Stackman, 2015). This impairment is not due to deficient feedback sensitivity as win-stay/lose-shift parameters were not affected by intra-lOFC M100907, but the observed speeding of response latency could reflect altered decision-making as a result of increased "guessing" before actually having made the correct decision. The OFC is involved in decision confidence and decision-making processes (Izquierdo, 2017; Kepecs & Mainen, 2012; Kepecs, Uchida, Zariwala, & Mainen, 2008) and required for optimal waiting based on decision confidence (Lak et al., 2014). Moreover, orbitofrontal serotonin depletion is associated with poor response suppression (Rygula et al., 2015), and prefrontal 5-HT2ARs are thought to play a role in decision-making (Aznar & Klein, 2013).
That the reversal learning impairment is associated with faster responding may seem paradoxical, but this could be due to a potential role for 5-HT2AR and OFC in impulsivity. Although systemic 5-HT2AR antagonism decreases impulsive responding (Winstanley, Theobald, Dalley, Glennon, & Robbins, 2004), an effect presumably mediated through the nucleus accumbens (Robinson et al., 2008), an opposing role for 5-HT2AR in the lOFC is plausible.

Concluding Summary
We found that increasing glutamate availability in the mOFC, not lOFC, improved early reversal learning, while blocking 5-HT2ARs in the lOFC (presumably resulting in diminished glutamatergic transmission), not mOFC, led to an overall impairment in visual reversal learning. These results further support dissociable roles of the rodent mOFC and lOFC in deterministic visual reversal learning and indicate that glutamate transmission and 5-HT2AR have different roles in these two structures.