Delay of Reinforcement Versus Rate of Reinforcement in Pavlovian Conditioning

Conditioned stimulus (CS) duration is a determinant of conditioned responding, with increases in duration leading to reductions in response rates. The CS duration effect has been proposed to reflect sensitivity to the reinforcement rate across cumulative exposure to the CS, suggesting that the delay of reinforcement from the onset of the cue is not crucial. Here, we compared the effects of delay and rate of reinforcement on Pavlovian appetitive conditioning in mice. In Experiment 1, the influence of reinforcement delay on the timing of responding was removed by making the duration of cues variable across trials. Mice trained with variable duration cues were sensitive to differences in the rate of reinforcement to a similar extent as mice trained with fixed duration cues. Experiments 2 and 3 tested the independent effects of delay and reinforcement rate. In Experiment 2, food was presented at either the termination of the CS or during the CS. In Experiment 3, food occurred during the CS for all cues. The latter experiment demonstrated an effect of delay, but not reinforcement rate. Experiment 4 ruled out the possibility that the lack of effect of reinforcement rate in Experiment 3 was due to mice failing to learn about the nonreinforced CS exposure after the presentation of food within a trial. These results demonstrate that although the CS duration effect is not simply a consequence of timing of conditioned responses, it is dependent on the delay of reinforcement. The results provide a challenge to current associative and nonassociative, time-accumulation models of learning.

Temporal factors play a crucial role in determining the rate of conditioned responding in Pavlovian procedures. One factor is the duration of the conditioned stimulus (CS). Short duration CSs typically elicit higher response rates than long duration CSs (e.g., Gibbon, Baldock, Locurto, Gold, & Terrace, 1977; Harris & Carpenter, 2011; Holland, 2000; Lattal, 1999; but see Davis, Schlesinger, & Sorenson, 1989). An account of this CS duration effect is that it reflects the sensitivity of conditioned responding to the rate of reinforcement across cumulative exposure to a CS (Gallistel & Gibbon, 2000). Recent support for the role of reinforcement rate in the CS duration effect has come from experiments demonstrating no significant difference in the rate of responding elicited by CSs that differ in duration if they are matched for cumulative reinforcement rate (Harris, Patterson, & Gharaei, 2015). Matched reinforcement rate was achieved by reinforcing the short duration CS on only a proportion of trials such that its cumulative reinforcement rate was the same as the long duration CS. For example, in the second experiment by Harris et al. (2015), rats were trained with a CS that was on average 30 s long across trials and another CS that was on average 10 s long. The 30-s CS was reinforced on every trial. The 10-s CS was presented three times as often as the 30-s CS, thereby matching cumulative exposure between the CSs, and was reinforced on a random third of trials. Because of the differences in average CS duration and probability of the unconditioned stimulus (US) per trial, both CSs were reinforced on average every 30 s across cumulative exposure to the CS. The rate of responding to the short- and long-duration CSs did not differ over training. In contrast, conditioned responding was greater for a variable duration 10-s CS compared to a variable duration 30-s CS when both cues were reinforced every trial.
Therefore, the advantage of the short-duration CS over the long-duration CS was removed by matching the rate of reinforcement over cumulative exposure. We have also found similar results with mice (Austen, Pickering, Sprengel, & Sanderson, 2018). Using procedures that were similar to Harris et al. (2015), mice received appetitive Pavlovian conditioning of magazine approach behavior. The rate of responding elicited by a 10-s CS did not differ significantly from a 40-s CS when the 10-s CS was reinforced on a random 25% of trials and was presented four times as often as the 40-s CS, thereby matching the number of CS-US pairings. In contrast, when both CSs were reinforced on 100% of trials the 10-s CS elicited a higher rate of responding than the 40-s CS.
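The rate-matching arithmetic underlying these designs can be made concrete with a short sketch. This is a minimal illustration of the published designs, not code from either study; the function name and trial counts are illustrative assumptions.

```python
# Cumulative reinforcement rate = total USs / total CS exposure time.
# Illustrates the rate matching used by Harris et al. (2015) and
# Austen et al. (2018); trial counts are arbitrary illustrative values.

def reinforcement_rate(cs_duration_s, n_trials, p_reinforcement):
    """USs per second of cumulative CS exposure."""
    total_us = n_trials * p_reinforcement
    total_exposure_s = n_trials * cs_duration_s
    return total_us / total_exposure_s

# Harris et al. (2015): 30-s CS reinforced on every trial vs. a 10-s CS
# presented three times as often and reinforced on a random third of trials.
rate_long = reinforcement_rate(30, n_trials=12, p_reinforcement=1.0)
rate_short = reinforcement_rate(10, n_trials=36, p_reinforcement=1 / 3)
assert abs(rate_long - rate_short) < 1e-9  # both: one US per 30 s of exposure

# Austen et al. (2018): 40-s CS at 100% vs. a 10-s CS at 25%,
# presented four times as often (matching the number of CS-US pairings).
rate_40 = reinforcement_rate(40, n_trials=9, p_reinforcement=1.0)
rate_10 = reinforcement_rate(10, n_trials=36, p_reinforcement=0.25)
assert abs(rate_40 - rate_10) < 1e-9  # both: one US per 40 s of exposure
```

Note that the matching holds because tripling (or quadrupling) the presentations of the short cue equates cumulative exposure, while the lower per-trial reinforcement probability equates the total number of USs.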
In a continuously reinforced delay conditioning procedure in which the US is presented at the termination of the CS on every trial the reinforcement rate is the reciprocal of the delay of reinforcement (CS-US interval). Therefore, under such conditions it is not possible to isolate the role of reinforcement rate from the role of delay of reinforcement. The experiments described above, however, demonstrated an independent role of reinforcement rate by comparing partially reinforced cues with a short delay of reinforcement with continuously reinforced cues with a longer delay of reinforcement such that cumulative reinforcement rate was matched. This suggests that when rate of reinforcement is controlled, delay of reinforcement does not determine the rate of responding. Thus, it does not matter if reinforcement occurs after a short or long delay. One problem with this conclusion is that cues that were matched for reinforcement rate differed in probability of reinforcement per trial as well as delay of reinforcement. One way of disentangling the role of reinforcement rate from the delay of reinforcement that avoids confounding the delay of reinforcement with probability of reinforcement per trial is to compare cues of equal duration that differ in delay of reinforcement by virtue of reinforcement occurring at different time points within the trial. Therefore, reinforcement rate is matched for cues with different CS-US intervals by manipulating the duration of continued exposure to the CS after the US. If conditioned responding is determined by the reinforcement rate over cumulative exposure to the CS regardless of how the exposure is structured relative to the presentation of the US, then it would be predicted that delay of reinforcement will have little effect on response rates.
The prediction that delay of reinforcement will not have an effect on conditioned responding is at odds with other work suggesting that delay of reinforcement does influence behavior. The delay of reinforcement is known to influence conditioned responding, affecting the distribution of responding within a trial such that responses are timed to the occurrence of reinforcement (e.g., Delamater, Chen, Nasser, & Elayouby, 2018). Based, in part, on the timing of conditioned responding that animals show, it has been proposed that the associative strength of a CS reflects the estimated delay of reinforcement from CS onset (Balsam, 1984; Gibbon & Balsam, 1981). Moreover, there are a number of reports from studies examining choice behavior in appetitive operant conditioning, primarily in pigeons, but also in rats, that the delay of reinforcement has a greater effect than the rate of reinforcement (Davison, 1968; Gentry & Marr, 1980; Herrnstein, 1964; Killeen, 1968; Lea, 1979; Logan, 1965; Mazur, Snyderman, & Coe, 1985; McDiarmid & Rilling, 1965; Shull, Spear, & Bryson, 1981). Thus, responses that lead to a short delay of reinforcement but a low rate of reinforcement are preferred to responses that lead to a long delay of reinforcement but a high rate of reinforcement.
A further reason for suggesting that the delay of reinforcement is important is that Harris et al. (2015) found that the CS duration effect was abolished by matching reinforcement rates only when the CSs were of variable durations that changed trial by trial such that rats could not time the occurrence of the US within the trial. Specifically, CS durations varied uniformly around a mean duration, which led to rats displaying a flat response rate within a trial. When cues were of a fixed duration, and, therefore, reinforcement occurred after a fixed delay, rats showed faster acquisition of conditioned responding with the long duration cue that was reinforced on every trial compared to the short duration cue that was reinforced on a proportion of trials. Although this may suggest the probability of reinforcement per trial may also play a role in determining response rates in some circumstances (see also Chan & Harris, 2017; Harris & Andrew, 2017; Harris & Kwok, 2018), it demonstrates that timing of conditioned responding may confound measures of overall response rates, masking the relationship between reinforcement rates and response rates.
In contrast to Harris et al. (2015) we found that fixed duration CSs, for which the occurrence of reinforcement could be timed, failed to elicit different response rates when they were matched for reinforcement rate over trials (Austen et al., 2018). The difference between our results with mice and those of Harris et al. with rats may be due to a number of reasons. Regardless of these differences, although the collective results suggest that reinforcement rate is an important determinant of response rates, these experiments confounded CS duration (whether variable or fixed) and consequently delay of reinforcement with probability of reinforcement per trial. Therefore, it is not clear whether delay of reinforcement plays a role in determining overall response rates in Pavlovian conditioning that is independent from reinforcement rate.
Identifying the factors that determine the strength of conditioned responding is crucial for assessing the merits of theoretical accounts of learning. Associative theories that have been developed to explain sensitivity to temporal information (e.g., Ludvig, Sutton, & Kehoe, 2012;Sutton & Barto, 1981;Vogel, Brandon, & Wagner, 2003;Wagner, 1981) assume that both rate and delay of reinforcement will influence the strength of responding, but do not necessarily anticipate that the factors will be dissociable without making assumptions about a number of free parameters. Theories that are explicitly nonassociative, in that they assume learning requires symbolic encoding of quantitative variables, do make claims about which variables are crucial. For example, scalar expectancy theory proposes that responding reflects expectancy of reinforcement based on the comparison of time since the onset of the CS and the remembered CS-US interval (Gibbon, 1977). In contrast, rate estimation theory proposes that conditioned responding reflects comparison of the reinforcement rate of a CS with the background reinforcement rate over cumulative exposure (Gallistel & Gibbon, 2000).
The purpose of the experiments presented here was to assess the role of delay of reinforcement in determining response rates in Pavlovian conditioning in mice. Similar to the study by Austen et al. (2018), appetitive conditioning of magazine approach behavior was used with the rate of head entries into the food magazine as the measure of responding. In Experiment 1, we assessed whether the ability to time the occurrence of the US affected the CS duration effect. This was achieved by comparing the CS duration effect in mice that were trained with short and long duration CSs that were of either a fixed or a variable duration. Experiments 2 and 3 tested the effects of delay of reinforcement and rate of reinforcement in order to determine whether they had independent effects. The effects of delay were assessed by comparing CSs of the same duration but which differed in delay of reinforcement due to manipulation of the time within a trial at which the US was presented. The independent effect of reinforcement rate was assessed by comparing CSs of different durations, but that were reinforced at the same time point from the onset of the CS. Experiment 4 tested an account of the results of Experiment 3 in which, within a trial, continued nonreinforced exposure to a CS after the US failed to reduce response rates.

Experiment 1
The aim of Experiment 1 was to test the extent to which timing of conditioned responses influences the CS duration effect. Although CSs of different durations that are reinforced at the end of their presentation elicit different overall rates of responding, they also elicit distinct patterns of responding, with responses being timed to the presentation of reinforcement. Therefore, any comparison of overall response rates between cues of different durations is potentially confounded by differences in the distribution of responding across cues. Indeed, Harris et al. (2015) have argued that the sensitivity of response rates to reinforcement rates can only be detected under conditions in which animals are unable to time their responses due to the time of reinforcement being variable. In contrast, we have used fixed duration cues and have failed to find any difference in acquisition with cues of different durations and probability of reinforcement per trial, but that have the same cumulative reinforcement rate (Austen et al., 2018). This was true regardless of whether responding was compared across equivalent numbers of trials or equivalent numbers of reinforcements. This suggests that the cue duration effect that we have observed in mice, using continuously reinforced, fixed duration 10-s and 40-s cues, may solely reflect differences in reinforcement rate rather than the timing of conditioned responding. Thus, it is possible that mice failed to time responses sufficiently for timing to affect overall response rates.
To test whether timing of conditioned responding influences the extent of the CS duration effect, we trained one group of mice with fixed duration 10-s and 40-s CSs, replicating the design used by Austen et al. (2018), and another group with CSs that varied in duration trial by trial, but averaged 10 s for one CS and 40 s for the other (see Figure 1A and 1B). Harris, Gharaei, and Pincham (2011) have shown that, in rats, when CS durations vary in a uniformly distributed manner, response rates are constant across the duration of the CS. In contrast, when variable CS durations are drawn from an exponential distribution, response rates decline across the duration of the CS. Therefore, we chose to sample variable cue durations from a uniform distribution, to reduce the likelihood that responding would change as a function of the CS duration and, as a consequence, appear to be timed. In addition to the reinforced CSs, mice also received presentations of nonreinforced (fixed or variable duration, according to group allocation) 10-s and 40-s CSs. These nonreinforced cues served as control cues for determining baseline response rates for the cues of different durations.

Method
Subjects. Thirty-two experimentally naïve female C57BL/6J mice (Charles River UK Ltd., Margate, United Kingdom), approximately 10 weeks old at the start of testing, with a mean free-feeding weight of 19.1 g (range: 15.9–22.6 g), were used. Mice were caged in groups of 4–8 in a temperature-controlled housing room on a 12-hr light-dark cycle (lights on at 8:00 a.m.). Prior to the start of the experiment, the weights of the mice were reduced by placing them on a restricted diet. Mice were then maintained at 85% of their free-feeding weights throughout the experiment. Mice had ad libitum access to water in their home cages. All procedures were conducted under Home Office UK project license number PPL 70/7785.
Apparatus. A set of eight identical operant chambers (interior dimensions: 15.9 × 14.0 × 12.7 cm; ENV-307A, Med Associates, Inc., Fairfax, VA), enclosed in sound-attenuating cubicles (ENV-022V), was used. The operant chambers were controlled by Med-PC IV software (SOF-735). The side walls were made from aluminum, and the front and back walls and the ceiling were made from clear Perspex. The chamber floors each comprised a grid of stainless steel rods (0.32-cm diameter), spaced 0.79 cm apart and running perpendicular to the front of the chamber (ENV-307A-GFW). A food magazine (2.9 × 2.5 × 1.9 cm; ENV-303M) was situated in the center of one of the side walls of the chamber, into which sucrose pellets (14 mg, TestDiet) could be delivered from a pellet dispenser. An infrared beam (ENV-303HDA) across the entrance of the magazine was used to record head entries at a resolution of 0.1 s. A fan (ENV-025F) was located within each of the sound-attenuating cubicles and was turned on during sessions, providing a background SPL of approximately 65 dB. Auditory stimuli were provided by a white noise generator (ENV-325SM) outputting a flat frequency response from 10 to 25,000 Hz at 75 dB and a clicker (ENV-335M) operating at a frequency of 4 Hz at 75 dB.
Visual stimuli were a 2.8 W house light (ENV-315M), which could illuminate the entire chamber, and two LEDs (ENV-321M) positioned to the left and right of the food magazine, which provided more localized illumination.
Procedure. Mice received 12 sessions of training with two short duration cues and two long duration cues. Mice were randomly allocated to one of two groups (N = 16 per group). For group fixed, the duration of short cues was 10 s and the duration of long cues was 40 s. For group variable, the durations of the cues varied from trial to trial, but within a session they had a mean duration that was the same as the corresponding duration in group fixed. Therefore, the short cues had a mean of 10 s, but the duration on each trial varied, according to a uniform distribution, around the mean (shortest = 2 s, longest = 18 s). Similarly, the long cues had a mean of 40 s, and trials varied according to a uniform distribution around the mean (shortest = 2 s, longest = 78 s). For both groups, one of the short and one of the long duration cues was reinforced by presentation of a sucrose pellet at the termination of the cue (CS+). The remaining short and long cues were nonreinforced (CS−). Within each group, for half of the mice the short cues were auditory (noise, clicker) and the long cues were visual (house light, flashing LEDs [0.25 s on/0.25 s off]). The opposite was true for the remaining mice. Within each of these subgroups the identity of the reinforced and nonreinforced stimuli was fully counterbalanced. Each of the four cues was presented nine times per session with a fixed interval of 120 s between the offset of one cue and the onset of the next. Trials were presented in a random order with the constraint that an equal number of each cue was presented in every block of 12 trials. For each session, all mice received the stimuli presented in the same order (e.g., 1st trial = noise, 2nd trial = house light, 3rd trial = clicker, etc.). Because the identity of short and long duration cues and the identity of reinforced and nonreinforced cues were counterbalanced across mice, the order of these factors was also counterbalanced across mice.
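The variable-duration schedule described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions (the function name, trial count, and random seed are hypothetical; the experiment's actual trial-generation code is not given here):

```python
import random

def sample_durations(mean_s, half_range_s, n_trials, rng=random.Random(0)):
    """Draw per-trial cue durations from a uniform distribution
    centered on the mean, e.g. mean 10 s with range 2-18 s."""
    return [rng.uniform(mean_s - half_range_s, mean_s + half_range_s)
            for _ in range(n_trials)]

# Group variable: short cues average 10 s (2-18 s), long cues
# average 40 s (2-78 s), nine presentations of each per session.
short = sample_durations(10, 8, n_trials=9)
long = sample_durations(40, 38, n_trials=9)
assert all(2 <= d <= 18 for d in short)
assert all(2 <= d <= 78 for d in long)
```

Because the distributions are symmetric around the mean, cumulative exposure per session matches group fixed in expectation, while the uniform (rather than exponential) sampling keeps within-trial response rates roughly flat, as in Harris, Gharaei, and Pincham (2011).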

Data and statistical analysis. The frequency of head entries into the food magazine was recorded per second during the CS presentations and for the 10-s pre-CS period. Responding during the CS is reported as a difference score (responses per minute [RPM]) in which responding during the nonreinforced cue was subtracted from responding during the reinforced cue of the same duration and modality. For mice trained with variable duration cues, overall response rates were calculated by averaging response rates per trial independent of trial duration.
Timing of conditioned responding was analyzed by calculating the mean response rate per 1-s time bin for individual mice. For mice trained with variable duration cues the response rate per 1-s time bin was calculated by averaging over the response rates for individual trials that lasted long enough to include the relevant time period. Thus, average response rates for the first two seconds of the CS could be calculated from all trials, because all trials lasted at least 2 s. For time bins beyond 2 s, however, the average response rate was calculated from the relevant proportion of trials. In this manner, response rates per time bin were corrected for opportunity of sampling. Response rates across time bins were then normalized by each mouse's overall response rate in order to derive the proportion of responses made in each time bin. Linear slopes were then fitted to the normalized response rates. For comparisons of timing of different intervals (i.e., 10 s and 40 s), the durations were normalized by comparing responding over equivalent proportions of time.
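The opportunity-corrected binning and slope-fitting procedure described above can be sketched as follows. This is a simplified illustration with hypothetical trial data and hypothetical function names, not the authors' analysis code:

```python
def binned_rates(trials, bin_s=1):
    """Mean response rate per time bin, averaging each bin only over
    trials long enough to contain it (opportunity correction).
    `trials` is a list of (duration_s, responses_per_bin) pairs, where
    responses_per_bin[i] counts head entries in second i of that trial."""
    max_bins = max(int(dur // bin_s) for dur, _ in trials)
    rates = []
    for b in range(max_bins):
        sampled = [resp[b] for dur, resp in trials if len(resp) > b]
        rates.append(sum(sampled) / len(sampled))
    return rates

def normalized_slope(rates):
    """Normalize bin rates to proportions of the total response rate,
    then fit a least-squares line; the slope indexes response timing."""
    total = sum(rates)
    props = [r / total for r in rates]
    n = len(props)
    mean_x = (n - 1) / 2
    mean_y = 1 / n  # proportions sum to 1
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(props))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Hypothetical variable-duration trials: responding ramps up over the cue.
trials = [(4, [0, 1, 2, 3]), (2, [0, 1]), (4, [1, 1, 2, 4])]
rates = binned_rates(trials)   # bins 3-4 are averaged over 4-s trials only
slope = normalized_slope(rates)
assert slope > 0               # responding increases across the cue
```

The early bins are averaged over all trials (every trial lasts at least 2 s), whereas later bins use only the subset of trials long enough to reach them, which is what corrects the per-bin rates for opportunity of sampling.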
All data were analyzed using one-way or multifactorial analysis of variance (ANOVA). The modality of the 10-s/40-s cues was included as a nuisance variable in all analyses. We have previously found that response rates are higher for auditory cues than visual cues (e.g., Sanderson, Jones, & Austen, 2016). The inclusion of this counterbalancing factor allowed assessment of the other factors independent of the variance caused by the counterbalancing factor. Consequently, in all analyses the main effect of this nuisance factor and any interactions involving the nuisance factor were ignored. Interactions were analyzed with simple main effects analysis using the pooled error term from the original ANOVA or separate ANOVAs for repeated measures with more than two levels. Where sphericity of within-subjects variables could not be assumed, a Greenhouse-Geisser correction was applied. In instances in which manipulations of the main factors of interest led to nonsignificant results, Bayesian statistics were used to evaluate the degree to which the results provided evidence for the null hypothesis. Bayesian analyses were conducted in JASP using default priors. The reported Bayes Factor compares models containing the effect of interest to equivalent models stripped of the effect, excluding higher-order interactions. The analysis was suggested by Mathôt (2017). Within JASP, this was achieved by conducting a Bayesian repeated measures ANOVA and outputting effects across matched models.
Given that there was no significant effect of cue duration variability on responding to cues of different durations, the results fail to support the claim that response rates are determined by reinforcement rate only when responding is not timed to the occurrence of the US. To assess whether the data provide evidence for there being no effect of cue variability on sensitivity to differences in cue duration, a Bayesian analysis was conducted. Although there was very strong evidence that any CS Duration × Group interaction did not interact with session, CS Duration × Group × Session BF = 0.01, there was moderate evidence for an interaction between CS duration and group, suggesting that the CS duration effect was larger in the variable group than the fixed group (BF = 5.09). Given the lack of a significant interaction in the previous ANOVA, we are cautious about drawing any conclusions from this result. At the very least, the Bayesian analysis did not provide strong evidence for a lack of difference in the CS duration effect between groups.
A corresponding analysis on responding during the nonreinforced cues (see Table 1 for means and SEMs) showed that mice responded more to the short nonreinforced cue than the long nonreinforced cue, F(1, 28) = 50.2, p < .001, ηp² = .64, 90% CI [.43, .74]. There was a significant effect of session, F(11, 308) = 34.7, p < .001, ηp² = .55, 90% CI [.48, .59], and significant Group × Session, F(11, 308) = 2.57, p = .049, ηp² = .08, 90% CI [.02, .10], and CS Duration × Session, F(11, 308) = 5.89, p < .001, ηp² = .17, 90% CI [.09, .21], interactions. The main effect of group, F(1, 28) = 1.46, p = .24, ηp² = .05, 90% CI [.00, .21], and the CS Duration × Group, F(1, 28) = 1.36, p = .25, ηp² = .05, 90% CI [.00, .21], and CS Duration × Group × Session, F < 1, p = .64, interactions were all nonsignificant. Further analysis of the significant Group × Session interaction showed a significant effect of group for Session 8 only, F(1, 28) = 4.78, p = .037. There were significant effects of session for both groups, F values > 16.1, p values < .001. Further analysis of the significant CS Duration × Session interaction showed a significant effect of CS duration, with mice responding more to the short than the long duration nonreinforced cue, on Sessions 1–7 and 10, F values > 4.68, p values < .040. There were significant effects of session for both CS durations, F values > 22.4, p values < .001. Given that mice responded more to the short duration nonreinforced cue than the long duration nonreinforced cue, this difference would only have led to underestimating the size of the CS duration effect for reinforced cues when rates of responding were converted to difference scores. Therefore, the effect of cue duration on the difference scores was not an artifact of the differences in rates of responding to the nonreinforced cues of short and long duration.
The rates of responding across the duration of the 10-s and 40-s cues, restricted to the first 10 s and 40 s, respectively, for both groups can be seen in Figure 2C and 2D. Mice in group fixed showed an increase in responding over the duration of the cue presentations, with this increase being more pronounced for the 10-s cue than the 40-s cue. Mice in group variable also showed an increase in responding over time, but, in contrast to group fixed, responding tended to level out before the average time of reinforcement for the short and long duration cues. Timing of conditioned responding was examined by fitting linear slopes to the normalized rates of responding during comparable time periods for both groups (i.e., the 10 s of the short cue and the 40 s of the long cue for group fixed, and the first 10 s of the short cue and the first 40 s of the long cue for group variable). The durations of the short and long cues were normalized by examining responding across equivalent proportions of time (see Figure 2E and 2F). Thus, normalized response rates were calculated for each tenth of the short and long durations. The gradients of these normalized response rates were then calculated by fitting linear trends (fixed 10+: M = 0.00965, SEM = 0.00236; fixed 40+: M = 0.00314, SEM = 0.00092; variable 10+: M = 0.00565, SEM = 0.00146; variable 40+: M = 0.00104, SEM = 0.00136). An ANOVA of CS Duration (10-s or 40-s) × Group (fixed or variable) × Counterbalance (10-s CS auditory or visual; nuisance factor) showed that the gradients of normalized responding were steeper for the 10-s cue than for the 40-s cue, F(1, 28) = 21.7, p < .001, ηp² = .44, 90% CI [.19, .59]. Mice in the variable condition showed significantly shallower gradients than mice in the fixed condition, F(1, 28) = 5.19, p = .031, ηp² = .16, 90% CI [.01, .35]. The interaction between CS duration and group was not significant, F < 1, p = .43.
Although the distribution of responding during the CSs differed depending on whether the durations of the CSs were fixed or variable across trials, it was clear that the CS duration effect was not significantly affected by cue duration variability. The lack of difference between mice trained with variable duration CSs and those trained with fixed duration CSs is consistent with other findings in mice (Ward et al., 2012) but is in contrast to findings in rats (Jennings, Alonso, Mondragón, Franssen, & Bonardi, 2013) showing that variable duration cues elicit weaker levels of responding compared to fixed duration cues. Also, Harris et al. (2015) found that equating reinforcement rates between continuously and partially reinforced cues led to similar levels of conditioned responding only when the cue durations were variable, but not when constant, suggesting that the opportunity to time conditioned responding affected the overall rates of responding. Our results failed to show that variable duration cues elicited weaker responding, and the fact that varying the duration of cues did not affect the cue duration effect suggests that differences in conditioned responding between the short and long duration cues were determined by differences in duration of cumulative exposure rather than by the opportunity to time responding.
It is of note that the analysis of timing failed to show equivalent timing for short and long duration cues when the durations were normalized. It has been claimed that timing ability is scale invariant such that the variance in timing ability scales with changes in the timed duration (Gibbon, 1977). If this were the case, then the distribution of normalized responding over the short and long duration cues should be the same when responding is expressed as a function of the proportion of the timed interval. This clearly was not the case in the present experiment. This instead suggests that shorter intervals are timed better than longer intervals.

Experiment 2a and 2b
In Experiment 1, regardless of whether responding was timed to the occurrence of reinforcement, the overall rates of responding were sensitive to the cumulative duration of the cues. Although this may suggest that delay of reinforcement is not important for determining the rate of responding, it remained the case that, for the group conditioned with variable duration cues, long duration cues had on average a longer delay of reinforcement than short duration cues, even though some individual cue presentations were very short.
Experiments 2a and 2b directly tested the contributions of delay of reinforcement and rate of reinforcement, respectively. In Experiment 2a, mice received conditioning with two 40-s CSs (see Figure 1C). One was reinforced 10 s into the CS presentation and the other after 40 s, at the termination of the CS. Although the cues differed in the delay at which reinforcement occurred within the trial, both cues had the same reinforcement rate. In Experiment 2b, mice were trained with a 10-s CS and a 40-s CS (see Figure 1D). Both CSs were reinforced 10 s after the onset of the cue, such that for the 40-s CS reinforcement occurred during the cue, whereas for the 10-s CS reinforcement occurred at its termination. Although both cues had the same delay of reinforcement, they differed in reinforcement rate.

Method
Subjects and apparatus. Experiment 2a used 16 female C57BL/6J mice, 14–15 weeks old at the start of testing, with a mean free-feeding weight of 17.7 g (range: 16.0–19.5 g). Experiment 2b used 16 female C57BL/6J mice, 14–15 weeks old at the start of testing, with a mean free-feeding weight of 18.7 g (range: 16.0–21.4 g). Mice for both experiments had previously been used in an unrelated experiment involving consumption of flavored sucrose solutions, conducted in a different room in operant boxes that were distinct from those used in the current experiment. All other details were the same as in Experiment 1.
Procedure. In Experiment 2a, mice received 12 sessions of training with four 40-s cues (two auditory and two visual). One of the cues was reinforced by the presentation of a sucrose pellet 10 s into the presentation of the cue (40/10+, see Figure 1C). Another cue was reinforced by the presentation of a sucrose pellet after 40 s, at the termination of the cue (40/40+, see Figure 1C). The remaining cues were nonreinforced (CS−). For half of the mice, the modality of the cue reinforced after 10 s was auditory (noise, clicker) and the modality of the cue reinforced after 40 s was visual (house light, flashing LEDs with alternating 0.5-s illumination of the left and right LEDs). The opposite was true for the remaining mice. Within each of these subgroups the identity of the reinforced and nonreinforced stimuli was fully counterbalanced. In Experiment 2b, mice received 12 sessions of training with two 10-s cues and two 40-s cues. One cue of each duration was reinforced. The 10-s cue was reinforced by the presentation of a sucrose pellet after 10 s, at the termination of the cue (10/10+, see Figure 1D). The 40-s cue was reinforced by the presentation of a sucrose pellet 10 s into the presentation of the cue (40/10+, see Figure 1D). The remaining short and long cues were nonreinforced. For half of the mice, the short cues were auditory (noise, clicker) and the long cues were visual (house light, flashing LEDs with alternating 0.5-s illumination of the left and right LEDs). The opposite was true for the remaining mice. Within each of these subgroups the identity of the reinforced and nonreinforced stimuli was fully counterbalanced. For both experiments, mice received six presentations of each cue per session, with a fixed interval of 120 s between the offset of one cue and the onset of the next. The order of trials was random with the constraint that there was an equal number of each trial type every eight trials.
For each session all mice received the stimuli presented in the same order (e.g., 1st trial = noise, 2nd trial = house light, 3rd trial = clicker, etc.). Because the identity of the stimuli used in the different conditions was counterbalanced across mice, this resulted in the order of the different conditions across trials also being counterbalanced across mice.

Data analysis. The frequency of head entries into the food magazine was recorded per second during the CS exposure prior to the presentation of reinforcement. Therefore, in the condition in which the 40-s cue was reinforced 10 s after CS onset, rates of responding are reported for the initial 10 s of the CS and not the subsequent 30 s after reinforcement. Rates of responding during reinforced cues were converted to difference scores by subtracting the rate of responding during the equivalent period of the nonreinforced cue of the same modality. Responding was also recorded for a 10-s pre-CS period. All other details were the same as Experiment 1.
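As an illustration, the difference-score calculation described above can be sketched as follows; the head-entry counts, trial numbers, and function names are hypothetical, not taken from the actual analysis code.

```python
# Hedged sketch of the difference-score calculation: responses per minute during
# the pre-reinforcement window of a reinforced cue, minus the rate during the
# equivalent window of the matched-modality nonreinforced cue.

def response_rate(counts_per_second, n_trials, window):
    """Responses per minute over the first `window` seconds, averaged over trials."""
    total = sum(counts_per_second[:window])
    return (total / (n_trials * window)) * 60.0

# Hypothetical data: head entries summed over six trials, one value per second.
# For a 40/10+ cue only the 10 s prior to reinforcement are scored.
n_trials = 6
reinforced = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]
nonreinforced = [1, 1, 2, 1, 2, 2, 1, 2, 2, 1]

cs_plus_rpm = response_rate(reinforced, n_trials, window=10)
cs_minus_rpm = response_rate(nonreinforced, n_trials, window=10)
difference_score = cs_plus_rpm - cs_minus_rpm
```

Scoring both cues over the same window keeps the subtraction comparable across conditions that differ in delay of reinforcement.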

Results and Discussion
Experiment 2a. Responses per minute, displayed as difference scores (rate of responding during the reinforced cues, prior to reinforcement, minus the rate of responding during the equivalent periods for the nonreinforced cues of the same modality as the reinforced cue), are shown in Figure 3A. Mice responded more in the 40/10+ condition than in the 40/40+ condition. An ANOVA of Delay (40/10+ vs. 40/40+) × Counterbalance (40/10+ CS auditory or visual; nuisance factor) × Session showed significant main effects of delay, F(1, 14). A corresponding analysis of responding during the nonreinforced cues (see Table 2 for means and SEMs) showed that mice responded more to the nonreinforced cue that was of the same modality as the cue reinforced after 10 s than to the nonreinforced cue that was of the same modality as the cue reinforced after 40 s, F(1, 14). Similar to Experiment 1, the difference between the nonreinforced cues would only have led to underestimating the size of the effect of delay for the reinforced cues when rates of responding were converted to difference scores. Therefore, the effect of delay on the difference scores was not an artifact of the differences in rates of responding to the nonreinforced cues.
The rate of responding during each second of the reinforced cues is shown in Figure 3C. For both reinforced cues the rate of responding increased over the course of the cue. As for Experiment 1, timing was analyzed by normalizing response rates across equivalent proportions of the 10-s and 40-s delays (see Figure 3E). The gradients of these normalized response rates were then calculated by fitting linear trends (40/10+: M = 0.01123, SEM = 0.00079; 40/40+: M = 0.00887, SEM = 0.00172). There was no significant difference in the gradients between the two conditions, F(1, 14) = 1.93, p = .19, ηp² = .12, 90% CI [.00, .37]. This is in contrast to Experiment 1, in which we found a significant difference in the distribution of responses for the short and long delays of reinforcement. It is not clear why the results of the two experiments differ. One potential difference in Experiment 2a was the degree of temporal contiguity between the CS and US for each delay. This is discussed further below.

Experiment 2b. Responses per minute, displayed as difference scores (i.e., 10/10 CS− subtracted from 10/10 CS+, 40/10 CS− subtracted from 40/10 CS+), are shown in Figure 3B. Mice responded more in the 40/10+ condition than in the 10/10+ condition. An ANOVA of Reinforcement Rate (10/10 vs. 40/10) × Counterbalance (10/10+ CS auditory or visual; nuisance factor) × Session showed significant main effects of reinforcement rate, F(1, 14) = 18.7, p = .001, ηp² = .57, 90% CI [.22, .74], and session, F(11, 154) = 24.2, p < .001, ηp² = .63, 90% CI [.53, .67], and a significant Reinforcement Rate × Session interaction, F(11, 154) = 3.18, p = .001, ηp² = .19, 90% CI [.05, .22]. Further analysis of this interaction showed that responding in the 40/10+ condition was higher than in the 10/10+ condition on Sessions 6, 7, 9, 10, 11, and 12, F values > 7.0, p values < .02.
In Experiment 2a, a shorter delay of reinforcement within a CS presentation led to greater responding than a longer delay of reinforcement, even though both CSs were matched for their overall rate of reinforcement. In Experiment 2b, even though the CSs were matched for delay of reinforcement, the CS with the lower reinforcement rate elicited greater responding than the CS with the higher reinforcement rate. Although these results contradict previous findings in suggesting that increases in reinforcement rate do not necessarily lead to increases in response rate, they can instead be explained in terms of differences in temporal contiguity. In Experiment 2a, when the CS with the long delay of reinforcement was reinforced, the pellet was presented at the termination of the CS and mice would have consumed the pellet some moments later. For the short delay of reinforcement CS, however, the pellet was presented during the CS and mice would likely have consumed the pellet during the continued presentation of the CS. Therefore, the greater temporal contiguity between the CS and reinforcement may have led to the greater rate of responding for the short delay of reinforcement CS compared to the long delay of reinforcement CS.
Temporal contiguity between the CS and reinforcement may also explain the performance of mice in Experiment 2b. For the 40-s CS that was reinforced after 10 s, consumption of the pellet would have likely been contiguous with the presentation of the CS. For the 10-s CS, however, mice would have consumed the pellet some moments after the termination of the CS. Therefore, despite the 40-s CS having a lower reinforcement rate than the 10-s CS, the degree of temporal contiguity between the cue and reinforcement would have been higher.
Other experiments that have compared conditioning with a CS that is extended past the presentation of the US with a CS that terminates at the onset of the US have found mixed results that may reflect differences in the nature of the conditioned response that was measured (Wagner & Brandon, 1989). For example, extending the duration of the CS after the onset of shock reduces conditioned suppression in rats (Ayres & Albert, 1990; Ayres, Albert, & Bombace, 1987), but similar manipulations, using conditioning of the nictitating membrane response in rabbits, increase conditioning (Kehoe, 2000). Although the results of Experiments 2a and 2b are consistent with the proposal that the offset of a CS engages inhibitory processes (Klopf, 1988), the fact that it is not possible to have high temporal control over the consumption of the food pellet makes it likely that the superior acquisition of responding with the CS that was extended past the US reflects increased temporal contiguity.
Given the confound in the degree of temporal contiguity between the CS and US, it is clear that Experiments 2a and 2b did not provide an unambiguous test of delay of reinforcement and rate of reinforcement. The results of these two experiments do suggest, however, that temporal contiguity has a greater effect on responding than delay and rate of reinforcement. Thus, demonstrations of the role of reinforcement rate, and any potential role of delay of reinforcement, may only be revealed under conditions in which the temporal contiguity between events is equal.

Experiment 3
The purpose of Experiment 3 was to test the role of delay of reinforcement and rate of reinforcement under conditions in which differences in the degree of temporal contiguity between the CS and reinforcement were avoided, or at the very least substantially reduced. Four groups of mice were used. Two groups of mice were trained with a 40-s and a 70-s CS. For half of these mice, reinforcement occurred 10 s after the start of both CSs (40/10+ and 70/10+, see Figure 1E). For the other half, the 40-s CS was reinforced 10 s after the start of the CS (40/10+), and the 70-s CS was reinforced 40 s after the start of the CS (70/40+). For both of these groups reinforcement was presented during the CS and there was a substantial period of time (at least 30 s) left within the CS presentation in which mice would likely have consumed the pellet. Therefore, the degree of temporal contiguity between reinforcement and the CS would have been similar between the CSs. Certainly, temporal contiguity was matched for the group in which the 40-s CS was reinforced after 10 s and the 70-s CS was reinforced after 40 s, because both CSs were presented for 30 s after the presentation of reinforcement. For the first group of mice delay of reinforcement was matched, but the rate of reinforcement was lower for the 70-s CS than for the 40-s CS. In contrast, delay of reinforcement and rate of reinforcement were confounded in the second group such that the 70-s CS had a longer delay of reinforcement and a lower reinforcement rate than the 40-s CS. If the CS duration effect were weaker for the group in which delay of reinforcement was matched compared to the group in which it was not, this would suggest that delay of reinforcement plays a role, independent of reinforcement rate, in determining the cue duration effect.

Method
Subjects and apparatus. Sixty-four experimentally naïve female C57BL/6J mice were used. They were 10-16 weeks old at the start of testing, with a mean free-feeding weight of 19.7 g (range: 16.9-23.0 g). All other details were the same as Experiment 1.
Procedure. Mice received 12 sessions of training, one per day. All mice received training with two 40-s cues, one of which was reinforced after 10 s, and the other was nonreinforced. Mice also received training with a longer duration cue that was reinforced after either the same or a longer delay than the reinforced 40-s cue. Half of the mice received additional training with two 70-s cues. The remaining mice received training with two additional 160-s cues. For half of the mice trained with either 70-s or 160-s cues, one long (70-s/160-s) cue was reinforced by presentation of a sucrose pellet after 10 s. For the remaining mice the long cue (70 s/160 s) was reinforced after 40 s. Therefore, mice were trained in one of four groups, in two of which the long cue had the same time of reinforcement as the short cue (group 70/10+ and group 160/10+), and in the other two the long cue had a different time of reinforcement to the short cue (group 70/40+ and group 160/40+). The remaining short and long duration cues were nonreinforced. Within each group, for half of the mice the short (40-s) cues were auditory (noise, clicker) and the long (70-s/160-s) cues were visual (house light, flashing LEDs with alternating 0.5-s illumination of the left and right LEDs). The opposite was true for the remaining mice. Within each of these subgroups the identity of the reinforced and nonreinforced stimuli was fully counterbalanced. Each of the four cues was presented six times per session with a fixed interval of 120 s between the offset of one cue and the onset of the next. Trials were presented in a random order with the constraint that an equal number of each cue was presented in every block of eight trials. For each session all mice received the stimuli presented in the same order (e.g., 1st trial = noise, 2nd trial = house light, 3rd trial = clicker, etc.). Because the identity of short- and long-duration cues and the identity of reinforced and nonreinforced cues were counterbalanced across mice, this resulted in the order of these factors also being counterbalanced across mice.
Data analysis. The frequency of head entries into the food magazine was recorded per second during the CS exposure prior to the presentation of reinforcement. Rates of responding during reinforced cues were converted to difference scores by subtracting the rate of responding during the equivalent period of the nonreinforced cue of the same modality. Responding was also recorded for a 10-s pre-CS period. All other details were the same as Experiment 1.

Results and Discussion
Responses per minute, displayed as difference scores, are shown in Figure 4A-4D. Mice responded similarly to all cues that were reinforced after 10 s, but responded less to those cues that were reinforced after 40 s. This was true regardless of whether the long duration cue was 70 s or 160 s. Further analysis of the Cue Duration × Delay of Reinforcement of Long Cue interaction showed that when the long cue was reinforced after 10 s, responding was similar for both the short and long duration cues, F < 1, p = .75. When the long cue was reinforced after 40 s, responding to the short cue was greater than to the long cue, F(1, 56) = 16.8, p < .001. Responding to the long cue was significantly higher when reinforced after 10 s rather than 40 s, F(1, 56) = 25.5, p < .001, but this comparison did not reach significance for responding to the short cue, F(1, 56) = 3.60, p = .063.
The results demonstrate that when cues were matched for delay of reinforcement there was no significant effect of reinforcement rate. To assess whether the data provided evidence for the absence of an effect of reinforcement rate when delay of reinforcement was controlled, a Bayesian analysis was conducted. There was strong evidence for a lack of an effect of cue duration (BF = 0.089) and a lack of a Cue Duration × Session interaction (BF = 0.001) for those animals for which the long cue was reinforced after 10 s.
A corresponding analysis of responding during the nonreinforced cues (see Table 3 for means and SEMs) was conducted. Mice responded more to the short duration cue than to the long duration cue, and this effect was greater for groups in which the long duration cue was reinforced after 40 s than for those in which reinforcement occurred after 10 s. There was a significant effect of cue duration, F(1, 56) = 14. Further analysis of the Cue Duration × Delay of Reinforcement of Long Cue interaction showed a significant effect of delay of reinforcement of long cue for the long duration cue, F(1, 56) = 16.4, p < .001, but not for the short duration cue, F(1, 56) = 1.06, p = .31. In addition, there was an effect of cue duration when the delay of reinforcement for the short and long duration cues differed, F(1, 56) = 31.1, p < .001, but not when it was matched, F(1, 56) = 1.11, p = .30. Given that the difference between nonreinforced cues followed the same pattern as the difference scores, it is unlikely that the effects found with the difference scores were an artifact of differences in responding to the nonreinforced cues.

Table 3 Mean (SEM) Responses per Minute (RPM) During the Nonreinforced Cues in Experiment 3
None of these differences in baseline responding can account for the patterns of responding to the reinforced cues as measured by the difference scores. In addition, given that the trial order of cues was counterbalanced in terms of whether they were of a short or long duration or were reinforced or nonreinforced, these differences likely reflect chance variation in response rates. There were no other significant effects or interactions, p values > .11.

The rates of responding during each second of the short and long duration CSs are shown in Figure 5. The rates of responding were similar for those cues reinforced after 10 s, but the rate of responding was lower for the cues reinforced after 40 s. Analysis of the gradients of normalized responding (group 70/10+: 40/10+: M = 0.01305, SEM = 0.00149; 70/10+: M = 0.01244, SEM = 0.00146; group 70/40+: 40/10+: M = 0.01157, SEM = 0.00120; 70/40+: M = 0.00640, SEM = 0.00113; group 160/10+: 40/10+: M = 0.01349, SEM = 0.00156; 160/10+: M = 0.01185, SEM = 0.00096; group 160/40+: 40/10+: M = 0.01002, SEM = 0.00129; 160/40+: M = 0.00344, SEM = 0.00111; see Figure 6) across equivalent proportions of the delay to reinforcement showed significant main effects of cue duration, F(1, 56) = 26.5, p < .001, ηp² = .32, 90% CI [.16, .45], and delay of reinforcement of long cue, F(1, 56) = 39.9, p < .001, ηp² = .42, 90% CI [.25, .54], and a significant interaction between these factors, F(1, 56) = 12.2, p = .001, ηp² = .18, 90% CI [.05, .32] (see Figure 6). All other main effects and interactions were not significant, F values < 2.02, p values > .16.
Further analysis of the Cue Duration × Delay of Reinforcement of Long Cue interaction showed that for animals for which the long cue was reinforced after 40 s, the responding gradient was steeper for the short cue than for the long cue, F(1, 56) = 37.4, p < .001, but this was not the case when the delay of reinforcement was matched between short and long duration cues, F(1, 56) = 1.37, p = .25. Gradients for the long cue were significantly shallower when it was reinforced after 40 s than when it was reinforced after 10 s, F(1, 56) = 48.66, p < .001. The gradients for the short duration cue were also affected by the delay of reinforcement for the long duration cue, with gradients being shallower when the delay of reinforcement for the long duration cue was 40 s rather than 10 s, F(1, 56) = 5.93, p = .018 (see Figure 6). The difference between short and long delays of reinforcement on the distribution of responding replicates the effect found in Experiment 1, suggesting that timing does not scale with duration. The fact that mice trained with a 40-s delay of reinforcement showed worse timing of the cue reinforced after 10 s than mice that only ever received reinforcement after a 10-s delay suggests that there was some generalization or interference between the 10-s and 40-s intervals that affected timing of conditioned responses to the short duration cue.
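The normalization-and-gradient analysis used throughout these experiments can be sketched as follows; the ten-bin resolution, the least-squares fit, and all names are illustrative assumptions rather than the authors' actual analysis code.

```python
# Hedged sketch of the timing analysis: per-second response rates are binned
# into equal proportions of the delay to reinforcement, normalized so the bins
# sum to 1, and a linear trend (gradient) is fit by least squares.
# A steeper positive gradient indicates responding concentrated toward the
# time of reinforcement.

def normalized_gradient(rates, n_bins=10):
    """Least-squares slope of normalized responding across equal proportions
    of the delay. Assumes len(rates) is divisible by n_bins."""
    bin_size = len(rates) // n_bins
    binned = [sum(rates[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]
    total = sum(binned)
    norm = [b / total for b in binned]                  # proportions summing to 1
    xs = [(i + 0.5) / n_bins for i in range(n_bins)]    # bin midpoints on a 0-1 scale
    mx = sum(xs) / n_bins
    my = sum(norm) / n_bins
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, norm))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical 40-s delays (40 per-second rates):
rising = list(range(40))   # responding increases toward reinforcement
flat = [5] * 40            # responding evenly distributed across the delay

steep = normalized_gradient(rising)
shallow = normalized_gradient(flat)
```

Because the rates are normalized within each cue before fitting, the gradient indexes the distribution of responding rather than its overall level, which is what allows timing to be compared across cues that differ in response rate.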
When the time of reinforcement was matched across CSs that differed in reinforcement rate, the rates of responding were similar. In contrast, a CS that was reinforced after 40 s elicited weaker responding than a CS that was reinforced after 10 s. By comparing across groups, it was clear that there was an effect of delay of reinforcement even when cues were matched for reinforcement rate. These results suggest that the time that reinforcement occurs within a cue is more important than the rate at which reinforcement occurs across the CS for determining response rates. In addition, it was clear that reinforcement rates failed to affect timing of conditioned responding.
Given that the results of Experiment 2 suggested that presenting reinforcement during the presentation of a CS, prior to termination of the cue, aids conditioning by increasing temporal contiguity, it is possible that reinforcing a cue earlier in its presentation increased temporal contiguity in Experiment 3. Thus, the long duration CSs may have had increased temporal contiguity with reinforcement when reinforced after 10 s compared to when reinforced after 40 s. This, however, seems unlikely. Sanderson, Cuell, and Bannerman (2014) have shown in mice, using a similar conditioning procedure, that when the presentation of a pellet preceded a CS by 10 s there was no excitatory conditioning (see also Delamater, Sosa, & LoLordo, 2003). In the present experiment, in all conditions, the long-duration cue was presented for at least 30 s after the presentation of the pellet. Furthermore, even if the reinforcing effects persisted for more than 30 s, it is unlikely that differences in temporal contiguity could account for differences in the rates of responding to the 160-s cue when reinforced after 10 or 40 s (i.e., it is unlikely that an extra 150 s of exposure after the US would lead to greater temporal contiguity than an extra 120 s). There was also no evidence that longer cues elicited greater responding than shorter duration cues when the delay of reinforcement was matched. Therefore, the results suggest that, in the absence of differences in temporal contiguity, delay of reinforcement has a greater effect on conditioned responding than rate of reinforcement.

Experiment 4
In Experiment 3 conditioned responding was unaffected by differences in cue duration (i.e., 40 s, 70 s, and 160 s) if the cues were reinforced 10 s after their onset. This suggests that differences in the duration of nonreinforcement after reinforcement, within the trial, failed to affect performance. Therefore, the subsequent 150 s of nonreinforced exposure after reinforcement in the 160-s cue did not reduce responding on subsequent trials compared to a cue that was four times shorter. There are a number of possible explanations for why this nonreinforced exposure failed to extinguish responding. One simple possibility is that the nonreinforced exposure was not processed sufficiently for extinction to occur, perhaps as a consequence of the recent reinforcement. The purpose of Experiment 4 was to test whether mice are able to learn about events that occur after a US presentation but within the same trial.
In Experiment 4 mice received conditioning with two cues that were 80 s in duration. Both cues were reinforced 10 s after the onset of the cue. One cue was reinforced again 30 s later, 40 s after the onset of the cue (10+/40+), but the other cue was not (10+/40−). Mice also received training with a third cue that was not reinforced and served as a control cue to determine baseline levels of responding.

Method
Subjects and apparatus. Twenty-four experimentally naïve female C57BL/6J mice were used. They were 10-11 weeks old at the start of testing, with a mean free-feeding weight of 18.8 g (range: 16.9-21.0 g). All other details were the same as Experiment 1, with the addition of a pure tone generator (ENV-323AM) that produced a 2,900 Hz tone at 75 dB.
Procedure. Mice received 10 sessions of training with three auditory cues (tone, noise, and clicker), each with a duration of 80 s. During training, one cue (10+/40−) was reinforced with a sucrose pellet 10 s after cue onset. Another cue (10+/40+) was similarly reinforced after 10 s, but was also reinforced again 40 s after onset. A third cue (CS−) was nonreinforced. During each session, there were eight presentations of each cue, separated by a fixed interval of 120 s. Cues were presented in a random order with the constraint that there were four of each type every 12 trials. The allocation of the tone, noise, and clicker to the three conditions (10+/40−, 10+/40+, and CS−) was counterbalanced across mice using equal numbers of the six possible combinations of cues. For each session all mice received the stimuli presented in the same order (e.g., 1st trial = noise, 2nd trial = tone, 3rd trial = clicker, etc.). Because the identity of the stimuli assigned to the different conditions was counterbalanced across mice, this resulted in the order of these conditions across trials also being counterbalanced across mice.
Data analysis. Responding was analyzed during seconds 1-10 and 31-40 for each cue, corresponding to the 10 s prior to the delivery of each of the two reinforcements for the 10+/40+ cue and allowing comparable control periods for the other cues. Rates of responding during the reinforced cues were converted to difference scores by subtracting the rate of responding during the equivalent period of the nonreinforced cue. Responding was also recorded for the 10-s pre-CS period. All other details were the same as Experiment 1.

Results and Discussion
The rates of responding during the first 10 s and the period from the 31st to the 40th second are shown in Figure 7A-7B. Mice responded more to cue 10+/40+ than to cue 10+/40− during both periods, although by the end of training this difference was not present in the first period. A Cue (10+/40− vs. 10+/40+) × Period (1-10 s vs. 31-40 s) × Session ANOVA revealed a significant three-way interaction between factors, F(9, 207) = 4.53, p = .001, ηp² = .16, 90% CI [.06, .21]. All other main effects and interactions were also significant (p values < .005). The three-way interaction was analyzed by conducting separate ANOVAs for the first (1-10 s) and second (31-40 s) periods. For the first period (1-10 s) there was a significant Cue × Session interaction, F(9, 207) = 4.54, p < .001. Simple main effects analysis of the interaction revealed that there were significant effects of cue on Sessions 3-5 and 7, smallest F(1, 23) = 4.83 (p = .038), but not on the other sessions (p values > .08). For the second period (31-40 s) there was also a significant Cue × Session interaction, F(9, 207) = 3.95, p = .015. Simple main effects analysis revealed a significant effect of cue on Sessions 1 and 3-10, smallest F(1, 23) = 4.43 (p = .047), but not Session 2 (p = .071).
The results demonstrate that mice are able to learn about events that occur after reinforcement, within the same trial. Mice responded more to the cue that was reinforced with two pellets per trial, one after 10 s and one after 40 s, than to the cue that was reinforced once after 10 s. This was true for the 10-s period prior to the time of the second reinforcement (31-40 s) and also for the first 10 s of the cues prior to the first reinforcement. These results provide clear evidence that mice are able to effectively process the continued presentation of a cue after reinforcement occurs. This suggests that the lack of difference between cues that were matched for delay of reinforcement but had differing rates of reinforcement in Experiment 3 was not due to insufficient processing of the nonreinforced exposure after the occurrence of reinforcement.

General Discussion
We previously found that the CS duration effect was abolished if the rate of reinforcement was equated across CSs (Austen et al., 2018), suggesting that the cause of the CS duration effect was sensitivity to reinforcement rate. The present results, however, provided little support for that conclusion. In Experiment 3, when CSs differed in reinforcement rate but were matched for delay of reinforcement, there was no significant difference in rate of conditioned responding. In contrast, when CSs were matched for reinforcement rate, but differed in delay of reinforcement then the CS with a short delay elicited higher rates of responding compared to the CS with a long delay.
These findings contradict rate estimation theory (Gallistel & Gibbon, 2000), which states that the acquisition of conditioned responding reflects calculation of the rate of reinforcement across cumulative exposure to a cue. The model proposes that animals represent and store CS durations and the number of reinforcements in memory such that these variables can be used to derive rate information. Because rate is calculated across cumulative CS exposure, independent of how that exposure is divided into specific trial durations and of the delay of reinforcement within those trials, the model predicts that CSs that differ in reinforcement rate will elicit different rates of responding. The assumption that rate is calculated across cumulative exposure is key to the model's ability to explain various properties of conditioning such as the contingency effect (Rescorla, 1968), in which the background reinforcement rate (i.e., the rate in the absence of the CS) affects the rate of responding to the CS.
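To make the cumulative-exposure assumption concrete, the rate calculation can be sketched as follows; the function and variable names are hypothetical, and the sketch assumes one US per reinforced trial with the six-trials-per-session design of Experiment 3.

```python
# Sketch of the cumulative reinforcement-rate computation assumed by rate
# estimation theory: reinforcements per second of total CS exposure,
# regardless of where the US falls within each trial.

def cumulative_rate(cs_duration_s, n_trials, us_per_trial=1.0):
    """Reinforcements per second of cumulative CS exposure."""
    return (n_trials * us_per_trial) / (n_trials * cs_duration_s)

# Experiment 3 conditions (six reinforced trials per session):
r40 = cumulative_rate(40, 6)     # 40-s cue, reinforced after 10 s
r70 = cumulative_rate(70, 6)     # 70-s cue, reinforced after 10 s or 40 s
r160 = cumulative_rate(160, 6)   # 160-s cue, reinforced after 10 s or 40 s

# Rate matching as in Harris et al. (2015): a 10-s CS presented three times
# as often and reinforced on a random third of trials has the same cumulative
# rate as a 30-s CS reinforced on every trial (1 US per 30 s of exposure).
r10_partial = cumulative_rate(10, 18, us_per_trial=1 / 3)
r30_full = cumulative_rate(30, 6)
```

On this calculation the Experiment 3 cues are ordered r40 > r70 > r160, so the theory predicts responding ordered the same way regardless of the matched 10-s delay, which is not the pattern Experiment 3 found.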
The results are also problematic for a simple associative account of the role of reinforcement rate in learning that assumes that changes in associative strength occur moment by moment rather than trial by trial. On this account, sensitivity to reinforcement rate follows from assuming that associative strength is incremented during periods of CS exposure in which reinforcement occurs and decremented during periods of CS exposure in which reinforcement does not occur. This simply results in a CS gaining associative strength that is proportional to the reinforcement rate. Therefore, CSs may differ in duration and delay of reinforcement, but as long as their reinforcement rate and cumulative duration of exposure are matched they will gain the same associative strength.
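This moment-by-moment account can be illustrated with a toy simulation; the linear update rule and all parameter values below are assumptions chosen purely for illustration, not a model fitted to the present data.

```python
# Toy sketch of a moment-by-moment associative account: associative strength v
# is incremented in each second containing a US and decremented in each
# nonreinforced second of CS exposure. Parameter values are arbitrary.

def train(cs_duration_s, us_time_s, n_trials=500, alpha=0.1, beta=0.01, lam=1.0):
    v = 0.0
    for _ in range(n_trials):
        for t in range(cs_duration_s):
            if t == us_time_s:
                v += alpha * (lam - v)   # reinforced moment: increment toward asymptote
            else:
                v -= beta * v            # nonreinforced moment: decrement
    return v

v_40 = train(40, us_time_s=10)    # higher reinforcement rate
v_160 = train(160, us_time_s=10)  # same 10-s delay, lower rate

# Such a model settles at a strength proportional to the reinforcement rate,
# so it predicts v_40 > v_160 even with matched 10-s delays, a difference
# that Experiment 3 failed to observe.
```

The extra nonreinforced seconds of the 160-s cue produce more decrements per increment, which is exactly why this class of model predicts an effect of reinforcement rate independent of delay.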
Although rate estimation theory (Gallistel & Gibbon, 2000) proposes that the duration of cumulative CS exposure is the critical variable over which reinforcement rate is calculated, it is tempting to speculate whether other intervals could be used to derive estimations of rate that could also account for the results of Experiment 3. For example, rate estimation theory also assumes that animals encode the delay of reinforcement in order to time conditioned responses. If the reinforcement rate simply reflected the inverse of the delay of reinforcement on reinforced trials alone (e.g., Gibbon & Balsam, 1981), rather than the number of reinforcements across the cumulative duration of CS exposure, it would provide a potential account of the results of Experiment 3, in which cues with matched delay of reinforcement but different reinforcement rates elicited similar levels of responding. In this situation the differences in duration of continued CS exposure after the US would have no effect on reinforcement rate. This account, however, would not explain how matching reinforcement rates over cues that differ in probability of reinforcement per trial leads to matched rates of responding. For nonreinforced trials there is no CS-US interval, which raises the issue of what temporal information is encoded on such trials. Models such as rate estimation theory (Gallistel & Gibbon, 2000) and the application of scalar expectancy theory to the acquisition of conditioned responding (Gibbon & Balsam, 1981) assume that nonreinforced trials simply add to the cumulative CS exposure and therefore reduce estimations of rate or temporal expectancy of reinforcement in the same manner as increasing the CS duration for reinforced trials.
Although it is possible that variables other than cumulative exposure may account for the present results, such variables are unlikely to account for findings such as the contingency effect (Dweck & Wagner, 1970; Murphy & Baker, 2004; Rescorla, 1968), in which the reinforcement rate over cumulative exposure does appear to be the crucial factor. Thus, it is not readily obvious what other variables may be encoded which would provide a satisfactory account of the results.
It is clear from Experiment 3 that extending the duration of CS exposure beyond the presentation of the US failed to extinguish conditioned responding. This was true even when there was an extra 150 s of CS exposure beyond the time at which reinforcement occurred compared to a cue for which there was an extra 30 s after reinforcement. As mentioned above, this is problematic for time-sensitive associative accounts of learning that assume that changes in learning occur moment by moment and, therefore, that nonreinforced CS exposure will lead to extinction of excitatory learning regardless of when it is presented within a trial. The lack of extinction, however, may be accounted for by a number of assumptions. For example, if it is assumed that the associability of the CS decreases within a trial due to short-term habituation (Wagner, 1981), then it is possible that the nonreinforced CS exposure after reinforcement was not sufficient for extinction of learning to occur. Alternatively, if a componential view of CS representations is assumed, such that a stimulus consists of a series of temporally ordered microstimuli (e.g., Ludvig et al., 2012; Sutton & Barto, 1981; Vogel et al., 2003), it is possible that the continued CS exposure after the US had no effect because the nature of the CS representation during those periods was different to its representation prior to reinforcement. Temporal difference learning models have appealed to changes in the nature of stimulus representations within a trial in order to explain various aspects of timing behavior (e.g., Williams, Todd, Chubala, & Ludvig, 2017).
In the case of the results of Experiment 3, there would have had to be little or no generalization of learning between elements of the CS representation that were processed prior to reinforcement and those processed after reinforcement. Temporal difference learning models predict this to be the case when it is assumed that a stimulus is represented as a complete serial compound in which each temporally activated element is entirely distinct (Sutton & Barto, 1990). The complete serial compound assumption, however, leads to incorrect, and perhaps unrealistic, predictions about the precision of timing in terms of behavior and neural correlates of prediction-error learning (Gershman, Moustafa, & Ludvig, 2014; Ludvig et al., 2012). One reason to doubt explanations of the failure to observe extinction in Experiment 3 in terms of reduced associability or changes in the nature of the stimulus representation is provided by the results of Experiment 4. In that experiment, we found that mice could learn about reinforcement that occurred after an initial reinforcement within a trial, suggesting that, at the least, the associability of the stimulus at that time point was sufficient for learning to occur, although the learning was excitatory rather than reflecting extinction. In addition, we found that mice responded more during the first 10 s of the cue that received two reinforcements within a trial than during the cue that was reinforced at the first time point but not the second. Therefore, learning about the cue prior to the second reinforcement generalized to the 10-s period of the cue prior to the first reinforcement. This suggests that, at the least, there was some commonality in the nature of the CS representation at different time points within the CS. Temporal difference models could account for these data only by making extreme assumptions about how the CS representation changes within a trial.
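The complete serial compound assumption can be illustrated with a minimal temporal difference simulation. The sketch below is not a model taken from this article: the learning rate, discount factor, 1-s step size, and US placement are all illustrative choices. It shows why, under a complete serial compound, continued CS exposure after a mid-CS US cannot extinguish pre-US learning: each time step activates its own entirely distinct element, so prediction errors after the US never touch the weights acquired before it.

```python
import numpy as np

# TD(0) over a complete serial compound (Sutton & Barto, 1990): each 1-s
# step of the CS activates one distinct element. Parameter values are
# illustrative, not drawn from the article.
ALPHA, GAMMA = 0.1, 0.98   # learning rate, temporal discount
CS_STEPS = 40              # a 40-s CS; US delivered 10 s in (index 9),
US_STEP = 9                # followed by 30 s of nonreinforced CS exposure

w = np.zeros(CS_STEPS)     # one associative weight per CSC element

for trial in range(200):
    for t in range(CS_STEPS):
        x_now = np.zeros(CS_STEPS)
        x_now[t] = 1.0                               # distinct element at step t
        v_now = w @ x_now
        v_next = w[t + 1] if t + 1 < CS_STEPS else 0.0
        r = 1.0 if t == US_STEP else 0.0             # US occurs mid-CS
        delta = r + GAMMA * v_next - v_now           # TD prediction error
        w += ALPHA * delta * x_now                   # only element t is updated

# Pre-US elements acquire strength; post-US elements stay at exactly zero,
# so the extra CS exposure after the US produces no extinction of the
# pre-US prediction -- the absence of generalization described above.
```

Because the post-US elements share no features with the pre-US elements, their prediction errors are always zero and the learned pre-US weights are never revised downward; the results of Experiment 4, however, suggest that real CS representations at different time points are not this cleanly separated.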
The experiments presented here sought to identify the role of delay of reinforcement in the CS duration effect. A complication in trying to dissociate the effects of delay and rate of reinforcement was that attempts to control one factor while manipulating the other risked introducing new confounds. Thus, in Experiment 2, manipulating rate by comparing a short, delay-conditioned cue with a long, simultaneously conditioned cue introduced a confound in temporal contiguity. Experiment 3, however, provided clear results demonstrating an effect of delay of reinforcement but not of rate of reinforcement. This dissociation was revealed only by comparing conditioned responding to cues for which reinforcement occurred during the CS presentation. It is also important to note that the results were obtained with female mice using an appetitive Pavlovian magazine-approach procedure. Future work will therefore need to establish the generality of these findings to other species and conditioning paradigms.
In conclusion, it is not clear how delay of reinforcement alone can account for why matching reinforcement rate between cues of different durations and different probabilities of reinforcement per trial leads to matched response rates. The present results do, however, suggest that delay of reinforcement may be more important than reinforcement rate in particular circumstances. The results provide a challenge to accounts of learning that assume that the cumulative exposure to a cue is a critical variable in determining response rates.
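The rate-matching arithmetic referred to here can be made concrete with a short calculation based on the Harris et al. (2015) design described in the introduction; the trial counts below are hypothetical and chosen only to make the ratios exact.

```python
# Rate matching between a long CS reinforced every trial and a short CS
# presented three times as often but reinforced on a third of trials.
# Trial counts are illustrative; durations follow Harris et al. (2015).
long_cs_duration = 30.0        # s; reinforced on every trial
short_cs_duration = 10.0       # s; presented three times as often

n_long_trials = 100
n_short_trials = 3 * n_long_trials

us_on_long = n_long_trials           # every long trial ends in food
us_on_short = n_short_trials // 3    # a random third of short trials do

# USs per second of cumulative CS exposure
long_rate = us_on_long / (n_long_trials * long_cs_duration)
short_rate = us_on_short / (n_short_trials * short_cs_duration)

# Both cues deliver one US per 30 s of cumulative exposure, despite
# differing in duration and in per-trial reinforcement probability.
assert long_rate == short_rate == 1 / 30
```

An account based purely on delay of reinforcement has no obvious way to explain why these two schedules, which differ in the delay from CS onset to the US on reinforced trials, nevertheless yield matched response rates.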