In nature, it is advantageous for an animal to be able to anticipate the spatiotemporal variability of a biologically significant event, such as food, a mate, or the threat of predation. For example, oystercatchers, a type of seabird, are known to travel long distances to specific beaches only during the times when the tide is low so that they can optimize mussel foraging (Daan & Koene, 1981). Gallistel (1990) posited that time–place–event memory codes are automatically formed for biologically significant events and that these codes can then be retrieved to guide an animal’s behavior during a current biological event. This theory has led to the inception of time–place learning (TPL) studies (see Thorpe & Wilkie, 2006, for a review), in which an animal must associate an event with a time and place to receive reinforcement (referred to as a TP discrimination). In daily TPL, which is the focus of the present study, the location of the event varies depending on the time of day. For example, food is located in one place in the morning and another place in the afternoon.

To solve daily TPL tasks, animals can use a circadian, interval, ordinal, or alternation strategy (Carr & Wilkie, 1997a). An animal that is using a circadian strategy learns that the times of events have a fixed periodicity that is associated with different phase angles of an endogenous circadian oscillator. The animal is then able to use this information to accurately predict the time that these events occur. An animal that is using an interval timer has learned that the event of interest occurs after a period of time since the start of an external event, such as feeding or the turning on of the colony lights (see Pizzo & Crystal, 2002, 2006). An animal that is using an ordinal timing strategy has learned to anticipate the sequence of events that occur during a specific time period, but not necessarily the exact time that these events occurred (Carr & Wilkie, 1997a). Finally, animals can also acquire the tasks using a nontiming alternation strategy, which involves alternating the locations visited from session to session. Skipping one of the daily sessions and then analyzing the animals’ behavior in the next session can elucidate the type of strategy that the animals are using to complete the task (Carr & Wilkie, 1997a).

While daily TPL has been well documented in a variety of species, including birds (garden warblers, Biebach, Gordijn, & Krebs, 1989; pigeons, Saksida & Wilkie, 1994), fish (inangas, Reebs, 1999; golden shiner, Reebs, 1996), honeybees (Wahl, 1932, as cited in Reebs, 1993), ants (Schatz, Beugnon, & Lachaud, 1994), and mice (Van der Zee et al., 2008), research with rats has, surprisingly, yielded inconsistent results (see Thorpe & Wilkie, 2006, for a review). The ease with which researchers demonstrate daily TPL in rats seems to be dependent, in part, on the type of task. Specifically, many of the successful daily TPL studies have used free operant procedures. In a free operant daily TPL task, the rat must perform a response, such as a lever press or a head entry, at the correct location at the appropriate time of day. An animal is considered to have formed a TP discrimination by performing a greater proportion of responses at the correct location at the correct time of day.

Two early studies of daily TPL in the rat utilized free operant procedures that required rats to press one lever in morning sessions and another lever in afternoon sessions (Carr & Wilkie, 1997b; Mistlberger, de Groot, Bossert, & Marchant, 1996). In the study by Mistlberger et al., the levers were located at the ends of T-mazes, while in the Carr and Wilkie (1997b) experiment, the levers were on opposite sides of an operant box. Evidence of learning was found only if one considered the percentage of presses to the correct lever prior to the first reinforcer; if one considered the first arm choice or the first press, respectively, it did not appear that the rats learned the task. Skipped sessions indicated that the rats in the first study used a circadian timer, while the rats in the second study used an ordinal timer.

Similarly, both Pizzo and Crystal (2002) and Aragona, Curtis, Davidson, Wang, and Stephan (2002) successfully demonstrated daily TPL in free operant tasks that used head entries in the open field maze and operant boxes, respectively. However, not all free operant procedures have successfully produced TPL. Boulos and Logothetis (1990) found that only a few of their rats showed daily TP discriminations when two levers on opposite sides of a cylindrical chamber provided food at two different times daily. Boulos and Logothetis attributed this poor performance to the fact that the levers were close together and, therefore, it was nearly as efficient to switch between the levers as it was to learn the contingency and respond on only a single lever.

Surprisingly, in discrete trial procedures in which the rat is removed from the apparatus at the end of each trial, successful demonstrations of daily TPL are more difficult to obtain. For example, Thorpe, Bates, and Wilkie (2003) investigated rats’ ability to form TP discriminations in a variety of discrete trial tasks, such as the water maze, food-rewarded place preference task, and radial arm maze. While the rats did learn the locations that provided food, as evidenced by an increased tendency to go to those locations, they did not go to the correct locations at the correct time of day.

Means, Ginn, Arolfo, and Pence (2000) found that only 63 % of rats demonstrated TP associations, and only after many training trials. Furthermore, with additional training postcriterion (9 correct trials out of 10), performance declined to 70 %. In a follow-up experiment, various aspects of the procedure were manipulated in an attempt to improve acquisition and performance (Means, Arolfo, Ginn, Pence, & Watson, 2000). The experiment showed that performance did not improve when one of the arms was made more distinct, when two trials were administered for each session, or when one of the daily sessions was conducted in the light and the other was conducted in the dark. Furthermore, using natural light cycles or extinguishing repeated responding to one of the arms also did not improve performance (Means, Arolfo, et al., 2000).

One factor that does usually improve performance in discrete trial daily TPL studies is response cost—in particular, the effort involved in choice and recovery cost. When the choice and recovery costs were increased by requiring the rats to climb a barrier at the start arms, by placing weighted covers over the food cups, and giving a 15-min time-out for incorrect choices, there was no improvement in TP discrimination. However, when the effort involved in choice and recovery cost was increased by placing the food at the top of a tower, a correlation was found between the height of the towers (i.e., effort involved in climbing up to locate food and climbing down when an error was made) and the proportion of rats successfully completing the task (Widman, Gordon, & Timberlake, 2000). Similarly, in water maze discrete trial versions of daily TPL tasks, effort has also been shown to be an important determinant of whether the rats learn the task. Lukoyanov, Pereira, Mesquita, and Andrade (2002) demonstrated that only severely food-deprived rats (fed 60 % of the food eaten by ad lib rats) acquired TP associations in the water maze. Although the authors hypothesized that a food-entrained oscillator mediated successful performance, Widman, Sermania, and Genismore (2004) theorized that because the amount of food restriction was drastic, the cost of the task was higher for these rats because their caloric intake was depleted and the water maze is an energetically taxing task. With this in mind, Widman and colleagues (2004) increased the response cost (effort) by adding weighted vests to the rats and found that satiated rats could acquire TP associations in this water maze task.

Given that response cost is an important factor in successful TP discrimination in discrete trial tasks, it is surprising that TP discrimination is found so easily in free operant tasks in which the response cost would appear to be low. It is difficult to argue that a VR15 or VR16 schedule (as in Carr & Wilkie, 1997b, 1999, respectively) has a higher response cost than climbing barriers, lifting weighted food covers, and waiting for a 15-min time-out to expire (as in Widman et al., 2000). Widman and colleagues proposed that the free operant tasks are successful because they tap into the natural foraging behaviors of the rat; that is, the foraging ecology is a possible source of response cost, rather than the physical effort of the task. In free operant tasks, the trials are of a limited duration (e.g., 10 min in Carr & Wilkie, 1997b, 1999; Mistlberger et al., 1996), and rats are very sensitive to the amount of food available in a patch (Widman et al., 2000). Thus, in these tasks, the more time that the rat spends responding at the incorrect location, the less food it will receive. The authors suggested that although the physical effort required in these tasks is marginal, there is a limited duration for foraging, which increases the cost associated with maximizing the amount of food obtained.

Another possible reason that free operant versions of TPL might be more successful at demonstrating learning is that they allow the rats time to explore the environment at the start of each session, without that exploratory behavior being scored as an error (Thorpe, Jacova, & Wilkie, 2004). When rats are placed in an operant box, they initially patrol the environment, and as part of this patrolling, they will occasionally press levers. Carr and Wilkie (1997b, 1999) found that if these early responses were included in the data analysis, it appeared that the rats did not learn the task. However, if a short nonreinforced period was included at the start of each session, the rats did learn the task. Similarly, in the Mistlberger et al. (1996) study, in which rats had to press levers located at the ends of the arms in a T-maze, if only the rats’ first arm choice data had been used, the authors would have concluded that the rats did not learn the discrimination. However, it was found that the rats focused most of their responding at the appropriate lever during the correct times, suggesting that the rats did acquire TP discriminations. In discrete trial tasks, exploratory behavior into one of the arms would be scored as an error.

The goal of the present study was to determine the relative role of response cost, in terms of both effort and foraging ecology, and the intrusion of species-typical patrolling behaviors in the discrimination of TP associations. A free operant paradigm similar to that used by Mistlberger et al. (1996), in which a lever located at an arm of a T-maze provided reinforcement in morning sessions and a lever located at the other arm provided reinforcement in afternoon sessions, was used. The response cost associated with the physical effort of the task was manipulated by varying the ratio of reinforcement (VR2 vs. VR30), while the response cost associated with foraging ecology was manipulated by varying the time of the trials for the low response cost groups (approximately 2 min vs. 10 min). If the physical effort of the task is an important factor in determining whether or not rats successfully learn the TP discrimination, it was expected that the VR30 group should acquire the task more quickly than the VR2 group. If foraging behavior is an important component of the response cost, the rats that were on the maze for a shorter duration (e.g., VR2) should perform better than the rats that were on the maze longer (e.g., VR2 10-min).

To determine the effect of species-typical behaviors, both of the VR2 groups were compared with groups that were also reinforced on a VR2 schedule but had a 2-min time-out at the start of each session (TO-VR2 and TO-VR2 10-min). The time-out period has been shown to be effective in the control of species-typical behaviors in many TPL studies and has ranged from as little as 4 s to a maximum of 2 min. We chose to use a 2-min time-out because it had been used successfully in past TPL studies (Carr, Tan, Thorpe, & Wilkie, 2001; Thorpe, Floresco, Carr, & Wilkie, 2002; Thorpe, Petrovic, & Wilkie, 2002), and we hypothesized that the more time that the rat was allowed to patrol the maze the better.

As in the Carr and Wilkie (1997b, 1999) studies, during the time-out period, the maze lights remained off, and the responses of the rats had no effect on reinforcement. Once the maze lights were turned on, the rat was reinforced for pressing according to a VR2 schedule. If the opportunity to patrol the maze is important, it would be expected that the groups with the time-outs (i.e., TO-VR2 and TO-VR2 10-min) would perform better than the groups with no time-outs (i.e., VR2 and VR2 10-min).

Method

Subjects and apparatus

To make running the experiment more feasible, the 33 male Long Evans rats were separated into two cohorts. All rats were obtained from Charles River (St. Constant, Quebec). The 16 rats in cohort 1 were approximately 57 days old at the start of training and approximately 104 days old at the start of discrimination training. One rat was dropped from this cohort because it did not consistently press the levers. The 17 rats in cohort 2 were approximately 55 days old at the beginning of training and approximately 84 days old at the start of discrimination training. Two rats were dropped from the second cohort because one did not consistently press the levers and the other rat was ill.

All of the rats received a standard rat diet (PMI Nutrition International, St. Louis, MO). Their weights were maintained at 85 % of their free-feeding weight (adjusted for age), and the rats were allowed to gain approximately 5 g per week to allow for continued growth. (This level of deprivation was not considered to be extreme, and rats maintained by us in this manner have remained in good health.) The rats were fed every day at approximately 4:00 p.m., even on days that they were not tested. The rats were housed individually in transparent plastic cages (45 × 25 × 21 cm) that were lined with aspen woodchip bedding (Necto Company, New York, NY). The rats were given paper cups twice weekly to make additional bedding. The rats were kept in a colony room that was maintained on a 12:12-h light:dark cycle, with light onset at 7:00 a.m. and offset at 7:00 p.m. During pretraining and discrimination training, 45-mg pellets (Bio Serv, Frenchtown, NJ) were used as reinforcers. The rats had free access to water at all times, except during experimental sessions.

Before and during training, the rats were handled extensively, and all of the rats received individual 20-min sessions in an enriched environment, approximately three times a week. The enrichment environment consisted of a Plexiglas enrichment box (61 × 61 × 61 cm) that was lined with aspen woodchip bedding (Necto Company, New York, NY) and contained several plastic tubes and containers, as well as a standard running wheel.

The rats were trained to lever press in a Plexiglas operant conditioning box (47 × 47 × 32 cm) that had a retractable lever (Med Associates, Inc., St. Albans, VT) in the center of each of the four walls of the box. Pellet dispensers (Model ENV-203045, Med Associates, Inc., St. Albans, VT) were used to deliver the 45-mg pellets (Bio Serv, Frenchtown, NJ) to food wells that were mounted 6 cm from the floor. The box was lined with aspen woodchip bedding (Necto Company, New York, NY). The operant conditioning box was located in a room (170 × 160 cm) that contained a cabinet, a radio, and a door.

A painted wood T-maze with nonretractable levers (Model ENV-110M, Med Associates, Inc., St. Albans, VT) attached at each end of the choice arms was used during discrimination training. Each arm of the T-maze was 53.5 cm long × 15.0 cm wide, and the T-maze was elevated 84 cm above the floor. There were no walls along the sides of the stem or choice arms of the T-maze, nor was there a covering on top of the maze. However, Plexiglas walls were attached to the end of each of the choice arms so that they could each support a lever, food cup, light, and pellet dispenser. These components were arranged in the same way as in the operant box that was used for shaping. The food cup was located 6 cm above the T-maze, while the lever and light were located 8 and 15 cm above the T-maze, respectively. The pellet dispenser was located 28 cm above the T-maze. An in-house designed controller box and computer program (Python) were used to run the maze and collect the data. The T-maze was located in a room (604 × 248 cm) that contained two tables, a window, a sink with a cabinet, two doors, a poster, and a radio.

Procedure

Pretraining

Rats were randomly assigned to one of five groups: VR30 (n = 7), VR2 (n = 8), TO-VR2 (n = 7), VR2 10-min (n = 4), and TO-VR2 10-min (n = 4). For the VR30 group, there were 3 rats from cohort 1 and 4 rats from cohort 2. For the VR2 group, there were 4 rats from each of the cohorts. For the TO-VR2 group, there were 4 rats from cohort 1 and 3 rats from cohort 2. For the VR2 10-min and TO-VR2 10-min groups, there were 2 rats from each of the cohorts in both groups. The rats were first trained to lever press in the operant conditioning box. Only one lever was available at a time, and its wall location varied across days. Rats in all groups were initially shaped to a VR30 schedule of reinforcement. This training took an average of 17 days.

Once rats were successfully pressing on a VR30 schedule in the operant box, they began habituation sessions on the T-maze. Once the rats were habituated to the maze, they were trained to press the levers according to a CRF schedule of reinforcement. Some of the rats required additional training on the maze to ensure that they responded on both levers. All of these pretraining sessions were conducted at times different from the eventual discrimination training times.

Finally, rats received 1 week of pretraining in which they received two daily sessions as in the discrimination training (see the next paragraph). In this phase, the incorrect lever was blocked, and they were reinforced according to the appropriate reinforcement schedule for their assigned group.

Discrimination training

Discrimination training then began, and the rats were tested twice daily, 5 days a week, for a total of 70 days (fourteen 5-day blocks). The testing began at 8:30 a.m. and 2:30 p.m. One lever provided reinforcement in morning sessions, and the other lever in afternoon sessions. The morning and afternoon locations were counterbalanced across rats. Rats were tested individually and in the same order each session. Rats were held in their home cages on a cart in the experimental room while they awaited their turn to be tested.

To begin each session, the rat was placed on the end of the stem of the T-maze, and the corresponding computer program for that rat was started immediately. Once the rat was placed on the maze, the experimenter exited the room and observed the rat’s behavior through a window in a door. The rats rarely fell off the maze, and if they did, they were placed back on immediately to continue the trial.

Rats in the VR30 group were on the maze for 10 min each session. Rats in the VR2 and TO-VR2 groups were yoked to a partner in the VR30 group such that they received the same number of pellets. For the VR30 and VR2 groups, the lights were turned on immediately when the trial was started. However, for the TO-VR2 rats, the lights were not turned on until the 2-min time-out period had elapsed, and although the levers were accessible during the time-out period, presses did not count until this period had elapsed. The VR2 10-min and TO-VR2 10-min groups were the same as their counterparts, except that the rats remained on the maze for 10 min instead of being yoked to the rats in the VR30 group. Although these new groups were equivalent in terms of time on the maze, the rats in these groups received many more pellets than did those in the VR30 group (maximum of approximately 150 pellets for the VR2 10-min and TO-VR2 10-min groups, as compared with approximately 20 pellets for the other three groups). As with the TO-VR2 group, the TO-VR2 10-min group also had a 2-min time-out period at the start of every session, but in this case, the time-out period was followed by 10 min in which the levers were active. For all groups, reinforcement was contingent on presses on the correct lever, and all presses were recorded with 0.2-s accuracy by the computer.
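
The session contingencies described above can be sketched in Python. This is an illustrative sketch only, not the in-house program used in the study: the class name `Session`, the method `press`, and the rule for drawing each ratio requirement (uniform over 1 to 2 × mean − 1) are assumptions, because the source does not report how the variable-ratio requirements were generated.

```python
import random


class Session:
    """Hypothetical model of one T-maze session (all names are illustrative)."""

    def __init__(self, vr_mean, correct_lever, timeout_s=0.0, seed=None):
        self.vr_mean = vr_mean              # e.g., 2 for VR2, 30 for VR30
        self.correct_lever = correct_lever  # "left" or "right"
        self.timeout_s = timeout_s          # 120.0 for the time-out groups
        self.rng = random.Random(seed)
        self.pellets = 0
        self.remaining = self._draw_requirement()

    def _draw_requirement(self):
        # Assumption: requirements are uniform on 1..(2 * mean - 1),
        # which averages to vr_mean presses per pellet.
        return self.rng.randint(1, 2 * self.vr_mean - 1)

    def press(self, lever, t_s):
        """Register a press at time t_s (seconds); return True if a pellet is delivered."""
        if t_s < self.timeout_s:
            return False  # presses during the time-out have no effect
        if lever != self.correct_lever:
            return False  # only the correct lever is reinforced
        self.remaining -= 1
        if self.remaining > 0:
            return False
        self.pellets += 1
        self.remaining = self._draw_requirement()
        return True
```

Under these assumptions, a TO-VR2 session would be created as `Session(vr_mean=2, correct_lever="left", timeout_s=120.0)`, and a VR30 session as `Session(vr_mean=30, correct_lever="left")`.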

Various dependent measures were used, including the rat’s first arm choice (entire body minus the tail in an arm), first press, and the percentage of presses on the correct lever, as compared with the incorrect lever, before the first reinforcer was administered (referred to as prereinforcement presses). The computer automatically recorded all of the lever presses, whereas the rat’s first arm choice was recorded manually by an experimenter who observed the rats’ behavior through a doorway into the experiment room. For the first cohort, the rats’ first arm choice data were not recorded until 19 days into discrimination training. This was because we were following Mistlberger et al.’s (1996) procedure and they did not report this measure, since they said that there was no evidence of learning when it was used. However, as the present experiment progressed, we thought that this measure might be interesting in regard to how it compared with the other two measures. Thus, for the first arm choice data, we considered only the last 10 blocks (5 days per block) of the experiment for both cohorts.

To examine the effect of training over time, the data were grouped into blocks of 5 days (five morning sessions and five afternoon sessions). Each of the dependent measures was calculated as a percentage of trials that were correct on that measure within each of the blocks.

A rat was considered to have learned the task when it had achieved a criterion of 18/20 correct trials. This criterion was calculated for each of the three measures. The rat’s first press and prereinforcement press data were analyzed separately. When the prereinforcement press data were considered, a trial was coded as correct if the percentage of presses on the correct lever, as compared with the incorrect lever, was greater than 50 %.
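
As a rough illustration of the scoring rules above, the following Python sketch codes a prereinforcement trial as correct when more than 50 % of presses are on the correct lever, checks for 18 correct trials within a window of 20, and computes how likely that criterion is under pure guessing. The function names are hypothetical, and treating the 20 trials as a sliding window of consecutive trials is an assumption; the source states only the 18/20 criterion.

```python
from math import comb


def trial_correct(correct_presses, incorrect_presses):
    """Code a prereinforcement trial as correct if > 50 % of presses were correct."""
    total = correct_presses + incorrect_presses
    return total > 0 and correct_presses / total > 0.5


def first_trial_at_criterion(outcomes, needed=18, window=20):
    """Return the 1-based trial at which the criterion is first met, else None.

    Assumption: the criterion is evaluated over consecutive trials.
    """
    for end in range(window, len(outcomes) + 1):
        if sum(outcomes[end - window:end]) >= needed:
            return end
    return None


def chance_probability(needed=18, window=20, p=0.5):
    """Binomial probability of at least `needed` correct out of `window` by guessing."""
    return sum(comb(window, k) * p**k * (1 - p)**(window - k)
               for k in range(needed, window + 1))
```

Under these assumptions, `chance_probability()` equals 211/2^20, or about 0.0002, for any single fixed set of 20 trials, which suggests that a rat responding at random would be very unlikely to meet the criterion in a given set.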

Skipped session probes

To determine whether rats were relying on a circadian, ordinal, or alternation strategy to solve the task, probe sessions were conducted in which morning or afternoon sessions were omitted and performance on the subsequent session was analyzed. If the rats were using a circadian strategy, they should always have chosen the correct location in the session following the omitted one, regardless of whether a morning or an afternoon session was skipped. If the rats were using an ordinal strategy, they should have gone to the morning location when an afternoon session was skipped, but when a morning session was skipped, they should have incorrectly gone to the morning location in the afternoon session. If the rats were using an alternation strategy, they should always have gone to the incorrect location, regardless of which session was skipped.
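
The predictions for the three strategies can be summarized in a short Python sketch. The function name `predicted_location` is hypothetical, and the alternation branch assumes that the rat visited the correct location in the last session that was actually run.

```python
def predicted_location(strategy, skipped):
    """Predict which location a rat visits in the session after a skipped one.

    strategy: "circadian", "ordinal", or "alternation"
    skipped:  "morning" or "afternoon" (the omitted session)
    Returns (location, is_correct), where location is "morning" or "afternoon".
    """
    # The session that follows the skip: a skipped morning is followed by that
    # day's afternoon session; a skipped afternoon by the next morning session.
    next_session = "afternoon" if skipped == "morning" else "morning"
    other = {"morning": "afternoon", "afternoon": "morning"}

    if strategy == "circadian":
        # Time of day alone determines the choice.
        location = next_session
    elif strategy == "ordinal":
        # The first session of any day is treated as the "morning" session.
        location = "morning"
    elif strategy == "alternation":
        # The last session actually run was at the same time of day as the
        # next one, so alternating away from it gives the wrong location.
        last_visited = next_session
        location = other[last_visited]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return location, location == next_session
```

For example, `predicted_location("ordinal", "morning")` returns `("morning", False)`: the rat goes to the morning location in the afternoon session, which is an error.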

These probe sessions were conducted once a rat had achieved criterion in any of the three measures. To ensure that there was enough data to analyze the prereinforcement presses measure, on the probe trials following the skipped session, the rat had to respond on the correct lever a minimum of five times before a reinforcer was given. A total of six skip session probe trials were conducted (three morning and three afternoon trials) for each rat. Only one probe was conducted a week, and it was administered only if the rat had been run the day before.

Results

Data were included in the analyses only if the rats had been tested in both sessions (morning and afternoon) that day (see Footnote 1). Furthermore, the data for probe trial days were not included in the analyses of the discrimination training data for all three of the measures. This resulted in either 1 or 2 days of data being omitted per probe trial, depending on whether the morning or afternoon session had been skipped. Since the block factor follows a continuum indicating the passage of time, this factor was analyzed using trend analyses, and only the linear and quadratic effects of the block factor and interactions involving this factor are reported. Also, because there were so many one-sample t-tests conducted (50 for the first arm choice data and 70 each for the first press and prereinforcement press data), the alpha was reduced to .005 to control for inflated family-wise error for all of the t-tests.

For the TO-VR2 and TO-VR2 10-min groups, the rats’ first arm choice was recorded when the rats were first placed on the maze, but the first press and percentage of presses on the correct lever data were analyzed after the 2-min time-out period had elapsed. We included the first arm choice data for the TO-VR2 and TO-VR2 10-min rats for the sake of completeness; however, given our hypothesis that the time-out allowed the rats to patrol the maze, it was expected that the first arm choice would be at chance levels for these rats. We did not look at the first arm choice after the time-out because the rats were typically already at a lever and pressing when the time-out elapsed. Therefore, the first arm choice data at the 2-min point would have been similar to the first press data.

First arm choice

The first arm choice was recorded only after the 19th session for the first cohort of rats. Initially, we ignored this variable because we assumed that, similar to Mistlberger et al. (1996), we would not find any evidence of task acquisition for this measure. However, we later decided that the first arm choice data might be interesting, given the other results of the study, and then started to record it. Because of this, the first arm choice was analyzed only for the final 10 blocks of the training (i.e., blocks 5–14). Also, because there were some missing values for the first arm choice data, each block was calculated by taking an average of the data for the available days, and the missing days were not included. As a result, some of the blocks did not contain all 10 data points (i.e., 5 from the morning and 5 from the afternoon). For the VR30, VR2, TO-VR2, VR2 10-min, and TO-VR2 10-min groups, there was an average of 96, 96, 97, 92, and 92 of the total 100 data points, respectively, for the entire experiment.

To determine whether the individual rats learned the task, we first considered whether they reached a criterion of 18/20 correct trials at any point during the final 10 blocks of the experiment. When the first arm choice data were considered, only 1 of the rats (in the VR2 group) reached criterion, and it did so at day 50 of discrimination training.

To determine whether each group learned the task, one-sample t-tests were conducted for each group to determine in which blocks the percentage of correct responses differed from chance (50 %). For the VR30 group, performance was not statistically greater than chance for any of the blocks [block 14: M = 63.21; t(6) = 2.06, p = .085]. For the VR2 rats, performance was also not statistically greater than chance for any of the blocks; however, three of the last four blocks approached significance [block 14: M = 67.81; t(7) = 2.74, p = .029]. Similarly, for the TO-VR2 group, performance was not statistically greater than chance for any of the blocks [block 14: M = 50.36; t(6) = 0.08, p = .936]. For the VR2 10-min rats, performance was not statistically greater than chance in any of the blocks [block 14: M = 58.75; t(3) = 1.48, p = .235]. Finally, for the TO-VR2 10-min group, performance was not statistically greater than chance for any of the blocks [block 14: M = 45.00; t(3) = −0.58, p = .604].
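
To make the decision rule concrete, the Python sketch below recomputes two-tailed p-values from two of the reported t statistics and compares them with the reduced alpha of .005. It is a minimal illustration, not the analysis software used in the study: the function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule so that only the standard library is needed.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area


ALPHA = 0.005  # reduced alpha used for all one-sample t-tests in the study

# Block 14 values reported for the first arm choice measure.
for label, t, df in [("VR30", 2.06, 6), ("VR2 10-min", 1.48, 3)]:
    p = two_tailed_p(t, df)
    print(f"{label}: t({df}) = {t:.2f}, p = {p:.3f}, significant = {p < ALPHA}")
```

The recomputed values agree with the reported p = .085 and p = .235 to three decimals, and neither falls below the .005 threshold.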

To test for evidence of learning over time, a 10 (block: blocks 5–14) × 2 (time of day: morning vs. afternoon) × 5 (group) mixed-model ANOVA was conducted with block and time of day as within factors and group as the between factor. The dependent measure was the average percentage of correct first arm choices per block. The analyses indicated that there was no linear, F(1, 25) = 1.47, p = .237, or quadratic, F(1, 25) = 0.13, p = .725, effect for block (refer to Fig. 1a). Nor was there a linear group × block interaction, F(4, 25) = 0.92, p = .467. However, there was a main effect of time of day, F(1, 225) = 18.42, p < .001, with the rats performing better in the afternoon (M = 69.98) than in the morning (M = 37.72). There was also a main effect of group, F(4, 25) = 3.78, p = .016.

Fig. 1
a The rats’ percentages of correct first arm choices for the final 10 blocks of the experiment (chance was 50 %). b Average percent correct first arm choices averaged across the 14 blocks of the experiment for each group. The error bars represent the standard errors of the means.

Finally, to test our hypotheses laid out in the introduction, four contrasts were conducted: (1) VR30 versus VR2, (2) VR30 versus [VR2 + TO-VR2 + VR2 10-min + TO-VR2 10-min], (3) [VR2 + TO-VR2] versus [VR2 10-min + TO-VR2 10-min], and (4) [VR2 + VR2 10-min] versus [TO-VR2 + TO-VR2 10-min]. The first two contrasts were designed to test for response cost differences due to effort. The third contrast was designed to test for response cost differences due to foraging. The last contrast was designed to test for the effect of allowing for a time-out. The first contrast showed no significant effect of effort, F(1, 25) = 2.073, p = .162. The second contrast also showed no significant effect of effort, F(1, 25) = 0.114, p = .738. The third contrast showed no significant differences between the groups based on how long they were on the maze, F(1, 25) = 0.031, p = .862. And finally, the fourth contrast showed a significant difference based on whether the rats received a time-out, F(1, 25) = 12.789, p = .001, with the time-out groups performing worse (M = 47.94) than the groups with no time-out (M = 59.53) (see Fig. 1b).
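
The four planned contrasts can be written as weight vectors over the five groups, as in the Python sketch below. The helper names are hypothetical, and splitting each side's weight equally among its groups is an assumption; the source does not say whether the groups were weighted by sample size.

```python
GROUPS = ["VR30", "VR2", "TO-VR2", "VR2 10-min", "TO-VR2 10-min"]


def contrast_weights(side_a, side_b):
    """Weights comparing the mean of the groups in side_a with that of side_b.

    Assumption: each side's weight is split equally among its groups.
    """
    weights = {}
    for g in GROUPS:
        if g in side_a:
            weights[g] = 1 / len(side_a)
        elif g in side_b:
            weights[g] = -1 / len(side_b)
        else:
            weights[g] = 0.0
    return weights


CONTRASTS = {
    "effort (1)": contrast_weights(["VR30"], ["VR2"]),
    "effort (2)": contrast_weights(["VR30"], GROUPS[1:]),
    "foraging (3)": contrast_weights(["VR2", "TO-VR2"],
                                     ["VR2 10-min", "TO-VR2 10-min"]),
    "time-out (4)": contrast_weights(["VR2", "VR2 10-min"],
                                     ["TO-VR2", "TO-VR2 10-min"]),
}


def contrast_value(weights, group_means):
    """Estimated contrast: the weighted sum of the group means."""
    return sum(weights[g] * group_means[g] for g in GROUPS)
```

Each weight vector sums to zero, so a contrast value of zero corresponds to no difference between the two sides being compared.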

On the basis of the first arm choice data, it would be concluded that the rats did not learn the TP discrimination. In fact, their performance did not appear to improve across training. However, there was evidence to suggest that the rats with the time-outs performed more poorly than the other groups.

First press

The same analyses were conducted using first press. To determine whether the individual rats learned the TP discrimination, we first looked at how many rats in each group reached criterion of 18/20 correct first presses. All 7 of the rats in the VR30 group (M = 41 days), 7 of the 8 rats in the VR2 group (M = 36 days), 5 of 7 rats in the TO-VR2 group (M = 54 days), all 4 of the rats in the VR2 10-min group (M = 27 days), and 2 of the 4 rats in the TO-VR2 10-min group (M = 50 days) reached criterion during the 14 blocks of the experiment.

To determine whether each group learned the task, one-sample t-tests were conducted for each group to determine in which blocks the percentage of correct responses differed from chance (50 %). For the VR30 group, performance was statistically greater than chance in all of the blocks after block 5 [smallest significant t-value, block 8: M = 85.71; t(6) = 5.21, p = .002]. For the VR2 group, performance was statistically greater than chance in all of the blocks except for the first two [smallest significant t-value, block 3: M = 78.75; t(7) = 4.50, p < .001]. For the TO-VR2 group, performance was statistically greater than chance only in three of the blocks (blocks 5, 11, and 12) [smallest significant t-value, block 5: M = 77.14; t(6) = 5.20, p = .002]. For the VR2 10-min group, performance was statistically greater than chance in eight blocks. After block 3, only blocks 6, 7, and 12 were not significant [smallest significant t-value, block 9: M = 87.50; t(3) = 7.83, p = .004]. Finally, for the TO-VR2 10-min group, performance was not statistically greater than chance in any of the blocks [block 14: M = 62.50; t(3) = 1.99, p = .141].

To test for learning over time, a 14 (block: blocks 1–14) × 2 (time of day: morning vs. afternoon) × 5 (group) mixed-model ANOVA was conducted with block and time of day as within factors and group as the between factor. The dependent measure was the average percentage of first presses on the correct lever per block. The analysis indicated that there were linear, F(1, 25) = 103.80, p < .001, and quadratic, F(1, 25) = 34.05, p < .001, effects of block (refer to Fig. 2a). There was no main effect of time of day, F(1, 325) = 0.91, p = .350, nor was there a main effect of group (VR30, M = 78.47; VR2, M = 79.02; TO-VR2, M = 68.88; VR2 10-min, M = 82.86; TO-VR2 10-min, M = 70.36), F(4, 25) = 2.72, p = .053. However, the linear block × group interaction was significant, F(4, 25) = 3.22, p = .029.

Fig. 2

(a) The rats’ percentages of correct first presses for the entire 14 blocks of the experiment (chance was 50 %). (b) Percentage of correct first presses averaged across the 14 blocks of the experiment for each group. The error bars represent the standard errors of the means.

Because there was a block × group interaction, follow-up simple main effects (repeated measures) analyses were conducted for each group. For the VR30 group, there was a linear effect for the block factor, F(1, 6) = 51.71, p < .001, ψ = 1,481.43, slope = 3.26. For the VR2 group, there was also a linear effect for the block factor, F(1, 7) = 14.62, p = .007, ψ = 856.25, slope = 1.88. For the TO-VR2 group, there was also a linear effect for the block factor, F(1, 6) = 25.87, p = .002, ψ = 921.42, slope = 2.03. For the VR2 10-min group, there was also a linear effect for the block factor, F(1, 3) = 54.77, p = .005, ψ = 1,305.00, slope = 2.87. For the TO-VR2 10-min group, however, there was not a significant linear effect for the block factor, F(1, 3) = 8.11, p = .065. These results suggest that the VR30 and VR2 10-min groups learned the task more quickly than did the VR2, the TO-VR2, and the TO-VR2 10-min groups (the last of which might not have acquired the task at all).
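The reported ψ and slope values are related by a fixed scaling. The sketch below assumes that ψ is the linear trend contrast computed with the standard integer orthogonal polynomial coefficients for 14 equally spaced blocks (−13, −11, ..., 13). The text does not state the coefficients, but this assumption reproduces the reported slopes to two decimals. Under it, the slope in percentage points per block is 2ψ / Σc² = ψ / 455.

```python
# Assumed linear trend coefficients for 14 equally spaced blocks: -13, -11, ..., 13.
N_BLOCKS = 14
COEFFS = [2 * j - (N_BLOCKS + 1) for j in range(1, N_BLOCKS + 1)]


def slope_from_psi(psi):
    """Convert a linear contrast value into a per-block slope.

    Each coefficient equals 2 * (block - mean block), so the least-squares
    slope is 2 * psi / sum(c ** 2).
    """
    return 2 * psi / sum(c * c for c in COEFFS)


# psi values as reported in the text for the first press measure.
reported_psi = {"VR30": 1481.43, "VR2": 856.25, "TO-VR2": 921.42, "VR2 10-min": 1305.00}
for group, psi in reported_psi.items():
    print(f"{group}: slope = {slope_from_psi(psi):.2f}")
```

Read this way, a slope of 3.26 corresponds to a gain of roughly 3 percentage points of correct first presses per block.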

Finally, to test our hypotheses laid out in the introduction, the four contrasts were again conducted. (1) No significant differences were found between the VR30 and VR2 groups, F(1, 25) = 0.016, p = .902, suggesting no effect of effort. (2) No significant differences were found when the VR30 group was compared with the others combined, F(1, 25) = 0.736, p = .398, again suggesting no effect of effort. (3) There was no significant difference between the [VR2 + TO-VR2] and [VR2 10-min + TO-VR2 10-min] groups, F(1, 25) = 0.509, p = .482, suggesting that the amount of time did not have an effect on performance. (4) There was a significant difference between the [VR2 + VR2 10-min] and [TO-VR2 + TO-VR2 10-min] groups, F(1, 25) = 9.231, p = .006, suggesting that the groups without time-outs (M = 80.30) performed better than the groups with a time-out (M = 69.42) (see Fig. 2b).
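The pooled means reported for contrast 4 can be recovered from the group means given with the ANOVA. The sketch below assumes that each pooled mean is the average of the two group means weighted by group size, with group sizes taken from the criterion counts reported for this measure. The variable and function names are illustrative.

```python
# Group sizes and overall first press means (% correct) as reported in the text.
GROUPS = {
    "VR30": (7, 78.47),
    "VR2": (8, 79.02),
    "TO-VR2": (7, 68.88),
    "VR2 10-min": (4, 82.86),
    "TO-VR2 10-min": (4, 70.36),
}


def pooled_mean(names):
    """Mean of several groups combined, weighted by the number of rats."""
    total_n = sum(GROUPS[name][0] for name in names)
    return sum(GROUPS[name][0] * GROUPS[name][1] for name in names) / total_n


no_timeout = pooled_mean(["VR2", "VR2 10-min"])
timeout = pooled_mean(["TO-VR2", "TO-VR2 10-min"])
print(f"no time-out: {no_timeout:.2f}, time-out: {timeout:.2f}")
```

Under this weighting the two values come out at 80.30 and 69.42, matching the means reported for the fourth contrast.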

On the basis of the first press data, it appears that the majority of the rats learned the TP discrimination. This is in contrast to the first arm choice data, which showed no evidence of learning the discrimination. Again, it appears that a time-out at the start of each session impairs performance, since both of the TO groups performed worse than all of the non-TO groups. The only part of the analyses that was not entirely consistent with this picture is the linear block × group interaction, which suggests that the VR2 group also learned the task at a slower rate than did the other non-TO groups. Further inspection of overall performance, however, demonstrates that the VR2 group performed just as well as the VR30 and VR2 10-min groups (see Fig. 2b). This suggests that the linear trend for the VR2 group is smaller only because the learning for this group was more quadratic.

Prereinforcement presses

To determine whether each individual rat learned the TP discrimination, we first looked to see how many rats in each group reached the criterion of 18/20. A trial was coded as correct if the percentage of presses on the correct lever, as compared with the incorrect lever, was greater than 50 %. All of the rats in the VR30 (M = 25 days), VR2 (M = 29 days), VR2 10-min (M = 30 days), and TO-VR2 10-min (M = 35 days) groups achieved criterion, but only 5 of the 7 rats in the TO-VR2 (M = 51 days) group achieved criterion when the prereinforcement presses were considered. Both of the rats that failed to reach criterion when this measure was used also failed to reach criterion when the first press data were considered.
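The coding rule for this measure can be sketched as follows. A session counts as correct when more than 50 % of prereinforcement presses fall on the correct lever. How the 18/20 criterion was evaluated is not spelled out in this section, so the sliding window over 20 consecutive sessions used here is an assumption, and all names and press counts are hypothetical.

```python
def session_correct(correct_presses, incorrect_presses):
    """Code a session as correct if over 50 % of presses hit the correct lever."""
    total = correct_presses + incorrect_presses
    return total > 0 and correct_presses / total > 0.5


def sessions_to_criterion(outcomes, needed=18, window=20):
    """Return the 1-based session at which criterion is first met, else None.

    Assumption: criterion means at least `needed` correct sessions within
    any run of `window` consecutive sessions.
    """
    for end in range(window, len(outcomes) + 1):
        if sum(outcomes[end - window:end]) >= needed:
            return end
    return None


# Hypothetical press counts (correct, incorrect) for 22 sessions.
counts = [(1, 3), (2, 2), (3, 1)] + [(4, 1)] * 19
outcomes = [session_correct(c, i) for c, i in counts]
print(sessions_to_criterion(outcomes))
```

Note that an even split (for example, 2 presses on each lever) is coded as incorrect, because the rule requires strictly more than 50 %.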

To determine whether each group of rats learned the task, one-sample t-tests were conducted for each group to determine in which blocks the percentages differed from chance (50 %). For the VR30 group, performance was statistically greater than chance in all of the blocks except for the first four [smallest significant t-value, block 5: M = 86.51; t(6) = 6.44, p = .001]. For the VR2 group, performance was statistically greater than chance in all of the blocks except for the first one [smallest significant t-value, block 2: M = 73.83; t(7) = 7.04, p < .001]. For the TO-VR2 group, performance was statistically greater than chance in all of the blocks except for the first three [smallest significant t-value, block 14: M = 77.99; t(6) = 4.36, p = .005]. For the VR2 10-min group, except for block 6, performance was statistically greater than chance in all of the blocks after block 3 [smallest significant t-value, block 4: M = 82.65; t(3) = 10.42, p = .002]. Finally, for the TO-VR2 10-min group, performance was statistically greater than chance only in blocks 6 and 8 [smallest significant t-value, block 6: M = 86.86; t(3) = 7.23, p = .005].

A 14 (block: blocks 1–14) × 2 (time of day: morning vs. afternoon) × 5 (group) mixed-model ANOVA was conducted with block and time of day as within factors and group as the between factor. The dependent measure was the average percentage of presses on the correct lever prior to the first reinforcer, per block. The analyses indicated that there was a linear, F(1, 25) = 104.46, p < .001, and quadratic, F(1, 25) = 77.25, p < .001, effect (refer to Fig. 3a). There was not a main effect for time of day, F(1, 325) = 0.47, p = .498; however, there was a main effect of group, F(4, 25) = 3.04, p = .036. There was also a significant linear block × group interaction, F(4, 25) = 3.09, p = .034.

Fig. 3

(a) The rats’ average percentages of presses on the correct lever before reinforcement for the entire 14 blocks of the experiment (chance was 50 %). (b) Average percentage of presses on the correct lever before reinforcement averaged across the 14 blocks of the experiment for each group. The error bars represent the standard errors of the means.

Because there was a block × group interaction, follow-up simple main effects (repeated measures) analyses were conducted for each group. For the VR30 group, there was a linear effect for the block factor, F(1, 6) = 29.57, p = .002, ψ = 1,351.74, slope = 2.97. For the VR2 group, there was also a linear effect for the block factor, F(1, 7) = 23.72, p = .002, ψ = 718.58, slope = 1.58. For the TO-VR2 group, there was also a linear effect for the block factor, F(1, 6) = 38.11, p = .001, ψ = 792.75, slope = 1.74. For the VR2 10-min group, there was also a linear effect for the block factor, F(1, 3) = 102.17, p = .002, ψ = 1,035.48, slope = 2.28. Finally, for the TO-VR2 10-min group, there was not a significant linear effect for the block factor, F(1, 3) = 6.71, p = .081. These results parallel those for the first press, suggesting that the VR30 and VR2 10-min groups learned more quickly than the VR2, TO-VR2, and TO-VR2 10-min groups (with the TO-VR2 10-min group showing no significant linear improvement). However, if overall performance is taken into account (see Fig. 3b), it appears that the VR2 group performed just as well as the VR30 and VR2 10-min groups, implying a stronger quadratic component to learning in the VR2 group.

Finally, to test our hypotheses laid out in the introduction, four contrasts were conducted: (1) VR30 versus VR2, (2) VR30 versus [VR2 + TO-VR2 + VR2 10-min + TO-VR2 10-min], (3) [VR2 + TO-VR2] versus [VR2 10-min + TO-VR2 10-min], and (4) [VR2 + VR2 10-min] versus [TO-VR2 + TO-VR2 10-min]. Both the first and the second contrasts indicated that there was no effect of effort, F(1, 25) = 0.032, p = .859, and F(1, 25) = 0.628, p = .436, respectively. The third contrast showed no difference between groups based on the length of the session, F(1, 25) = 0.603, p = .445. Again, the fourth contrast indicated that the groups with a time-out (M = 77.19) did significantly worse than the groups without a time-out (M = 85.91), F(1, 25) = 10.686, p = .003 (see Fig. 3b).

A pattern of results based on the prereinforcement presses was found that was similar to that based on the first press data. Namely, the majority of the rats learned the task, and the groups with a time-out at the start of the session performed more poorly than the other groups.

Skipped session probes

To determine which strategy the rats were using to solve the task, accuracy on the sessions following a skipped session was analyzed. If the rats tended to be correct following both skipped morning and skipped afternoon sessions, they were labeled circadian timers. If the rats tended to be correct following skipped afternoon sessions but incorrect following skipped morning sessions, they were labeled ordinal timers. And if they were incorrect following both types of skipped sessions, they were labeled alternators. In the VR30 group, 4 rats used a circadian strategy, and 3 rats used an ordinal strategy. In the VR2 group, 7 rats used a circadian strategy, and 1 rat used an alternation strategy. And in the TO-VR2 group, 4 rats used a circadian strategy, and 1 rat used an ordinal strategy. One rat in the TO-VR2 10-min group did not receive any probe trials, because it failed to reach criterion. Of the 3 rats that received probes in the TO-VR2 10-min group, 1 rat used an ordinal strategy, and the remaining 2 rats had a pattern of results that could not be interpreted as one of the known strategies, because they tended to choose the correct lever following skipped morning sessions and to choose incorrectly following skipped afternoon sessions. In the VR2 10-min group, 2 rats used a circadian strategy, and 1 rat used an ordinal strategy. The strategy used by the remaining rat in this group could also not be determined. Overall, the majority of rats (17/27) that received probe trials used a circadian strategy.
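The labeling rules above amount to a small decision table. The sketch below encodes them directly. The function name and the boolean inputs (whether a rat tended to be correct after each probe type) are hypothetical simplifications, and how "tended to be correct" was operationalized is not specified here.

```python
def classify_strategy(correct_after_skipped_am, correct_after_skipped_pm):
    """Label a rat's strategy from its accuracy after skipped sessions."""
    if correct_after_skipped_am and correct_after_skipped_pm:
        return "circadian"
    if correct_after_skipped_pm and not correct_after_skipped_am:
        return "ordinal"
    if not correct_after_skipped_am and not correct_after_skipped_pm:
        return "alternation"
    # Correct after skipped mornings only: matches none of the known strategies.
    return "uninterpretable"


for am, pm in [(True, True), (False, True), (False, False), (True, False)]:
    print(am, pm, classify_strategy(am, pm))
```

The final branch corresponds to the pattern shown by the 2 rats in the TO-VR2 10-min group whose results could not be interpreted.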

Discussion

The purpose of the present study was to determine the relative role of response cost, in terms of both effort and foraging ecology, and the intrusion of species-typical exploratory behaviors in the discrimination of TP associations. We hypothesized that either response cost or species-typical behaviors could explain why some studies find that rats learn daily TPL tasks and other studies do not. The results suggest, however, that success or failure on a free operant daily TPL task does not depend on response cost, nor does it depend on the inclusion of a time-out for species-typical behaviors.

In the present task, response cost through effort was manipulated by requiring one group of rats to respond 15 times more often for reinforcement than another group. Nevertheless, there was no difference in performance between these two groups. This is in contrast to the findings of Widman et al. (2000; Widman et al., 2004), which suggested that the physical effort of the task is an important component of response cost that ultimately leads to successful discrimination. One possible explanation is that learning requires only a minimum amount of response cost and that a VR2 schedule, while seeming to require little effort, actually meets this lower threshold. Although it is possible that a VR2 schedule exceeds this minimum, it seems unlikely that pressing on a VR2 schedule is much more difficult than some of the requirements of discrete trial tasks, such as moving weighted food covers or climbing over barricades, where rats did not learn the task (Widman et al., 2000).

Response cost in terms of foraging conditions was also manipulated in this study by varying the amount of time that the rat was on the maze per session. Widman and colleagues (2000) proposed that a high response cost is inherent in free operant tasks, because the nature of the task engages rats’ natural foraging behavior, and it was this high response cost that explained the success of rats learning TPL in free operant tasks. Because the rats have only limited access to food, they would therefore place a high cost on maximizing the amount of food obtained in the patch. If this is true, we should have found that groups with a shorter amount of time on the maze (i.e., VR2 and TO-VR2) would outperform the groups that were on the maze longer (i.e., VR2 10-min and TO-VR2 10-min). However, there was no difference between the groups despite a large difference in the length of the sessions. These results challenge the Widman et al. (2000) theory, which suggested that the response cost in the free operant tasks was still high because rats try to maximize the amount of food obtained in a patch and there is a higher cost placed on this behavior when there is a limited patch duration. Therefore, it would appear that high response cost, in terms of either effort or time on the maze, does not predict improved performance in free operant daily TPL tasks.

The results also did not support the species-typical behavior hypothesis. Surprisingly, the addition of a 2-min time-out period at the start of each session resulted in poorer performance. This is in direct contradiction to our hypothesis that the time-out period would result in improved performance because it would allow rats time to patrol the maze without that exploratory behavior being scored as an error. Carr and Wilkie (1997b, 1999) successfully implemented time-out periods in their operant box studies of daily TPL (see Footnote 2). In these studies, the rats’ first lever press was not indicative of learning, because the rats patrolled the box and pressed a variety of levers before settling on the correct lever. However, in our case, the first lever press did show evidence of task acquisition, which suggests that in our paradigm, the rats did not need a time-out period in which lever presses were not counted toward reinforcement. Instead, rats in the present study may have satisfied their exploratory tendencies simply by exploring the maze, rather than pressing the levers. This might also explain the discrepancies between the first arm choice and first press data. Therefore, the first press and prereinforcement presses are a more accurate measure of what the rat has learned, because the first arm choice is confounded by species-typical exploratory behavior.

This raises an important question about whether rats in discrete trial daily TPL tasks did, in fact, learn the discrimination but, because of the dependent measures used, this learning was not evident. In the majority of the unsuccessful discrete trial daily TPL designs, the dependent variable was first arm choice (e.g., T-maze in Means et al., 2000; radial arm mazes in Thorpe et al., 2003, and White & Timberlake, 1990). The role of response cost in discrete trial experiments may not be to increase “learning” but, rather, to inhibit species-typical exploratory behaviors. What is puzzling, however, is why exploratory behavior is a problem in TPL designs but not in similar fields, such as spatial learning. For example, in successful spatial learning studies (e.g., Skinner et al., 2003), the rat’s first arm choice data show evidence of learning.

Given that neither the intrusive effect of exploratory behaviors nor response cost is able to fully explain why rats perform better on free operant versions of the task, we must look for other possible explanations. First, it is possible that performing an operant behavior in the goal location strengthens the TP association; however, it is not readily apparent why this would be so. Second, most free operant studies use variable ratio reinforcement schedules, and the desired response in the correct location may be bolstered by the partial reinforcement schedule. Third, in free operant tasks, the behavior is performed many times in a session, as compared with the single arm choice that is often employed in discrete trial tasks, and this increase in the number of trials may have a positive effect on performance. However, if this were the case, we would have expected the 10-min groups to outperform the other groups. And finally, sessions are typically longer in free operant tasks than in discrete trial tasks, and therefore, the rats spend significantly more time in the correct location than they do in discrete trial tasks. As a result of increased responding and session duration, the rats in free operant tasks also receive much more of the reward than they would in discrete trial daily TPL studies. But again, if either the time on the maze or the amount of food was important, we would have expected to see improved performance in the 10-min groups.

Another possibility comes from an occasion-setting hypothesis proposed by Means, Ginn, et al. (2000), which suggests that rats use time of day as an occasion setter rather than as a discriminative stimulus. When the rat is in the start location, time might simply signal that food is available, rather than signaling what response should be made. Expanding on this, it is possible that when the maze is traversed, visual, tactile, olfactory, or positional cues act as discriminative cues that signal what response should be performed. Therefore, rats might not be able to choose the correct response (i.e., pressing the correct lever) until they are in the correct physical location. This would explain why the first arm choice is not a good indicator of learning but the first lever press is.

The distinction between the species-typical exploratory behaviors and the occasion-setting hypotheses is similar to the performance versus learning distinction. In the species-typical behavior situation, it is assumed that the rats know the TP contingencies and a high response cost is necessary to inhibit the exploratory behaviors that mask evidence of this learning. In the occasion-setting scenario, it is assumed that the rats do not know the TP contingency and a high response cost is necessary for the rats to bind the information together. Future research is still needed to determine which of these two hypotheses is correct.

The results of this study underscore the importance of carefully choosing the dependent measure in daily TPL tasks. In the present study, it would have been concluded that none of the groups learned the task if we had measured and analyzed only the first arm choice. It was only when we included analysis of the first lever press and the percentage of presses to the correct lever prior to reinforcement that we saw concrete evidence that the rats did, in fact, learn the TP association. Future research on daily TPL, using either free operant or discrete trial designs, needs to be cognizant of this issue.