Turning traditional neurophysiology on its head: A naturalistic environment to study natural behaviors and cognitive tasks in freely moving monkeys

Macaque monkeys are widely used to study the neural basis of cognition. In the traditional approach, the monkey is brought into a lab to perform tasks while it is restrained to obtain stable gaze tracking and neural recordings. This unnatural setting prevents studying brain activity during natural, social and complex behaviors. Here, we designed a naturalistic environment with an integrated behavioral workstation that enables complex task training with viable gaze tracking in freely moving monkeys. We used this facility to train monkeys on a challenging same-different task. Remarkably, this facility enabled a naïve monkey to learn the task merely by observing trained monkeys. This social training was faster primarily because the naïve monkey first learned the task structure and then the same-different rule. We propose that such hybrid environments can be used to study brain activity during natural behaviors as well as during controlled cognitive tasks.


INTRODUCTION
flexibility to go back and forth freely between them. The design principles for such naturalistic environments as well as standard procedures to maximize animal welfare are
animals. This is a non-trivial challenge because most infrared eye trackers require restraining the head (Machado and Nelson, 2011), and do not work at close quarters with an elevated line of sight as would be the case for a monkey interacting with a touchscreen.
Here, we designed a hybrid naturalistic environment with a touchscreen workstation that can be used to record brain activity in controlled cognitive tasks as well as during natural and social behaviors. We show that this facility can be used to obtain viable gaze signals and that it can be used to train monkeys on a same-different task by taking them through a sequence of subtasks with increasing complexity. Remarkably, we show that a naïve monkey was able to learn the entire task much faster merely by observing two other trained monkeys.
We designed a novel naturalistic environment for recording brain activity during controlled cognitive tasks as well as natural and social behaviors (Figure 1). Monkeys were group-housed in an enriched living environment with access to a touchscreen workstation where they could perform cognitive tasks for juice reward while their brain activity is recorded (Figure 1A; see Methods). The enriched environment comprised log perches and dead trees with natural as well as artificial lighting with several CCTV cameras to monitor movements (Figure 1B). We also included tall perches to which animals could retreat to safety (Figure 1C). The continuous camera recordings enabled us to reconstruct activity maps of the animals with and without human interactions (Figure 1D; Video S1).
To allow specific animals access to the behavior room, we designed a corridor with movable partitions so that the selected animal could be induced to enter while restricting others (Figure 1E). We included a squeeze partition that was not used for training but was used if required for administering drugs or for routine blood testing (Figure 1F). This squeeze partition had a ratchet mechanism and locks for easy operation (Figure 1G). After traversing the corridor (Figure 1H), monkeys entered a behavior room containing a touchscreen workstation (Figure 1I). The behavior room contained copper-sandwiched high-pressure laminated panels that formed a closed circuit for removing external noise (Section S1). The entire workflow was designed so that experimenters would never have to directly handle or contact the animals during training. Even though the facility contained safe perches out of reach from humans, we nonetheless developed standard protocols to easily isolate each monkey and give it access to the behavior room (see Methods).
Touchscreen workstation with eye tracking in unrestrained monkeys
The touchscreen workstation is detailed in Figure 2. Monkeys were trained to sit comfortably at the juice spout and perform tasks on the touchscreen for juice reward. The workstation contained several critical design elements that enabled behavioral control and eye tracking, as summarized below.
First, we developed a juice delivery arm with a drain mechanism that would take any extra juice back out to a juice reservoir (Section S1). This was done to ensure that monkeys drank juice directly from the juice spout after a correct trial instead of subverting it and accessing the spillover. Second, we developed several modular head frames that were tailored to the typical monkey head shape (Figure 2B; see also Section S1).
In practice, monkeys comfortably rested their chin/head against these frames and were willing to perform hundreds of trials even while using the most restrictive frames. Third, we included a removable hand grill to prevent the monkeys from accessing the touchscreen with the left hand (Figure 2A). This was critical not only for reducing movement variability, but also to enable a direct line of sight for the eye tracker. Finally, we affixed two transparent viewports above and below the touchscreen, one for an optical video camera and the other for an infrared eye tracker (Figure 2A
Figure 3B illustrates the example gaze signals recorded from monkey M1 during two trials of the same-different task, one with a "SAME" response and the other with a "DIFFERENT" response. The monkey initially looked at the hold, then at the sample, and
Can a naïve monkey learn the task by observing trained monkeys?
Our novel facility has the provision to allow multiple monkeys to freely move and access the touchscreen workstation. We therefore wondered whether a naïve monkey could learn the same-different task by observing trained monkeys. This would further obviate the need for the TAT paradigm by allowing monkeys to learn from each other.
To investigate this possibility, we ran social training sessions in which a naïve monkey (M2) was introduced along with a trained monkey into the behavior room.
Figure 4A). After this interaction, M2 initiated more trials by touching the hold button (Section S3) but still did not earn any juice reward by himself.
increased. On this day, first chance accuracy of M2 was 53% on responded trials (Figure 4B), though this was still a small proportion of all trials (7.6%, Figure 4C).
Figure 4B). The M2-M3 interactions are summarized in Figure 4B (inset).
From Days 14-29, M2 was trained alone and learned the task by trial and error.
We included an immediate repeat of error trials (
Here, we designed a novel hybrid naturalistic environment with a touchscreen workstation that can be used to record brain activity in controlled cognitive tasks as well as during natural and social behaviors. We demonstrate two major outcomes using this facility. First, we show that viable gaze tracking can be achieved in unrestrained, freely moving monkeys working at the touchscreen on a complex cognitive task (same-different task). Second, we show that a naïve monkey could learn the same-different task much faster by socially observing two trained monkeys doing the task. We discuss these advances in relation to the existing literature below.
Social learning of complex tasks in monkeys
We have found that a naïve monkey can learn the complex same-different task
can be challenging to prevent a monkey from getting distracted from other events in its living environment and to ensure access for a specific monkey. In contrast, the latter approach is better suited to control for disturbances in the living room but with the caveat that it has commonly been designed for use by one monkey at a time and thus precludes studying interesting behaviors where multiple monkeys can observe or interact with the behavior station.
Here, we combined the best of both approaches such that monkeys traversed away from the living room to a behavior room that had a single behavior station mounted flush to its walls. As a result, monkeys did not require human intervention to get access to the behavior station and were thus free to come to the behavior station when they wished and interact with it for juice rewards. Our approach also enabled the dynamics between each monkey to play out and determine the duration of access by each monkey. Such an arrangement was of benefit to us as the experimenters as well.
We commissioned a facility meeting our requirements which can house a small
Finally, the behavior room hosts the behavior station on the wall separating it from the control room (Figure 1A). The behavior station consists of a touchscreen monitor and juice delivery arm (Figure 1I) and is mounted on stainless steel channels which allow the panels and hence the behavior station to be modular. These panels can be repositioned or swapped with modified panels, so any change can be accommodated. The rest of the behavior room is also covered in these panels (except the floor but affixed to walls unlike the behavior station panels) which are layers of high-pressure laminate with a thin copper sheet sandwiched in the middle (Figure 1I and Figure S1A). This was done to insulate the
Because monkeys had to sip juice from the reward arm, this itself led to fairly stable head position during the task. To further stabilize the head, we designed modular head frames at the top of the reward arm (Atatri Inc), onto which monkeys voluntarily rested their heads while performing tasks (Figure S1D). We formed a variety of restraint shapes with stainless steel based on 3D scans of our monkeys with progressively increasing levels of restriction (left to right, Figure S1D). Positioning their heads within the head restraint was not a challenge for the monkeys and they habituated to it within tens of trials. We also iterated on the structure of the reward arm, head restraint and fabricated custom attachments (hand grill, Figure 2A) that allow the monkey to comfortably grip at multiple locations with its feet and with the free hand and this in turn greatly reduced animal movement while providing naturalistic affordances on the reward arm (Figure 1H, rightmost panel).
The reward for performing the task correctly was provided to the monkey as juice drops delivered at the tip of a custom reward delivery arm (
We find that this method gives a satisfactory quality of raw eye-data to touch screen location mapping. We used these session-wise calibration models to transform eye-data if a higher degree of accuracy was required than what is provided by the initial coarse offset and scaling of the eye-signal that we manually perform in the beginning of each trial. We find that even the coarse centering and scaling of raw eye-data is sufficient
response and no response. We felt that maintenance of hold before the sample is shown is not crucial to task performance. Hence, we chose to make the task much easier and reduce errors by reducing the initial hold time to 100 ms, which reduced the hold maintenance time to 700 ms from 1.1 s. When the monkey started to get reward on 50% of responded trials, we increased the initial hold time to 300 ms on day 16 and 500 ms on day 17. After that the hold was 500 ms throughout the training. We modified inter-trial intervals (for correct and incorrect responses) and reward amount to keep M2 motivated to learn the task.
Custom design elements in the behavior room. (A) Schematic of the copper sheet sandwiched between layers of high-pressure laminate panels. These panels are installed on the walls and roof of the behavior room and electrically connected so as to form a closed circuit to block external radio frequency noise. (B) Power spectrum (in dB) of noise recorded from the behavior room with shielding (red) and the control room without shielding (blue). The copper sandwiched panels in the behavior room and all stainless steel supporting frames were connected electrically to the ground of the pre-amplifier (Plexon Inc).
Signals were recorded at 40 kHz for 1 s using a 24-ch Uprobe electrode floating in air connected to a 32-channel data acquisition system (Plexon Inc). (C) Schematic of juice reward arm. At top right, a close-up view of the spout portion of the juice reward arm showing how the juice pipe and drain-pipe are concealed within a tubular stainless-steel pipe. This prevents monkeys from licking any run-off juice or tampering with the thin steel juice pipe itself. Bottom close-up shows how

METHODS
Animals. M1 and M3 participated in the Tailored Automated Training (TAT). The animals were each provided a 45-minute period of access (session) to the behavior station with no fixed order of access. Training was conducted only if animals voluntarily moved to the behavior room. Animals were moved one at a time to the behavior room, with partition doors closed behind them. If an animal was not willing to go forward to the behavior room, training was not done on that day and the animal was supplemented with 50 ml of water later in the day. The animals' weight was checked twice a week, and if any sudden drop in weight was measured, the animal was given time to recover (by removing water restriction and pausing training).
Stimuli. For TAT, stimuli were selected from the Hemera Objects Database and consisted of natural and man-made objects with a black background to match the screen background.
Training. The aim of the TAT was to teach monkeys the temporal same-different matching task (SD task), a schematic of which is shown in Figure 3A. We employed TAT as a proof of concept to show that it is possible to achieve unsupervised training for animals on a complex same-different (SD) matching task. We automated the training by dividing the SD task into sub-tasks (stages) with further levels within each stage to titrate task difficulty. Animals progressed to successive levels and stages based on their performance (when accuracy on the last 50 attempted trials within a session was greater than 80%). Like recent automated training paradigms (Berger et al., 2018), we provided an opportunity to go down a level if the animal performed poorly, but we ultimately moved to a more stringent level progression in which the animals were not allowed to slide back to an earlier level or stage. We started from a lower level only when training was resumed after a long break due to unavoidable circumstances such as equipment failure or issues related to animal health. Overall, we find that the rate of learning depends on the animal's underlying learning capability and on the design of the automated training regime. Hence, to achieve the fastest learning rates, we optimized the level-wise difficulty of the automated design. In general, the progression of task difficulty across levels and stages was selected such that the animal could always perform the task at above-chance performance. Although we set out to train animals using a completely automated pipeline, we also wanted to ensure that both our naive animals could complete the learning process in full without dropping out, as is common in many automated regimes (Calapai et al., 2017; Tulip et al., 2017; Berger et al., 2018). We therefore took a pragmatic approach, intervening to tailor the training parameters at particularly difficult stages so as to avoid the monkey dropping out of the training process entirely.
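The progression criterion described above (accuracy on the last 50 attempted trials within a session greater than 80%) can be illustrated with a minimal sketch. The class name `LevelTracker`, its method names, and the choice to restart the trial window after each level change are assumptions made for illustration only; they are not taken from the training software used in the study.

```python
from collections import deque

WINDOW = 50        # last 50 attempted trials (from the text)
THRESHOLD = 0.80   # accuracy must be greater than 80% (from the text)


class LevelTracker:
    """Hypothetical helper that tracks trial outcomes within one session."""

    def __init__(self, level=0):
        self.level = level
        # Keeps only the most recent WINDOW outcomes.
        self.outcomes = deque(maxlen=WINDOW)

    def record(self, correct):
        """Record one attempted trial; return True if the level advanced."""
        self.outcomes.append(bool(correct))
        window_full = len(self.outcomes) == WINDOW
        if window_full and sum(self.outcomes) / WINDOW > THRESHOLD:
            self.level += 1
            # Assumption: the window restarts when a new level begins.
            self.outcomes.clear()
            return True
        return False
```

In this sketch, 41 correct responses in the last 50 attempted trials (82%) trigger an advance, whereas 40 correct (exactly 80%) do not, because the criterion is a strict inequality.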
The SD task was divided into ten conceptual stages. A single parameter was varied across levels within a stage. The smallest unit of the TAT is a trial, but the composition of each trial depended on the current level. Each trial started with the presentation of a trial initiation button, and trials were separated by a variable inter-trial interval (ITI). The duration of the ITI depended on the outcome of the current trial (500 ms for correct trials; 2000 ms for incorrect trials). Provision was made to change some parameters quickly without aborting the experiment. The ITI and reward per trial were adjusted within a session based on the animal's performance. We increased the ITI to give another level of feedback when animals showed a very high response bias by pressing only one button, or when the animals were satisfied with 50 percent (chance) performance.
Juice reward was delivered after every correct trial. We started each session with 0.2 ml of juice reward per trial. Juice reward was increased for consistent behavior but never decreased within a session. The motive behind increasing the reward was to keep motivation high when learning a new task, as any kind of error made by the animal aborted the trial. Monkeys received two distinct audio feedback tones: a high-pitched tone for correct responses and a low-pitched tone for incorrect responses (including uninitiated, aborted or no-response trials).

TAT Stages
Stage-1 (Touch): A green button (square) was presented on the touchscreen, which the monkey had to touch for reward. Any touch outside the button was considered an error. There were two levels in this stage (button size: 200 x 600 pixels in level 1.1 and 200 x 200 pixels in level 1.2). The center of the buttons was the same as that of the hold button in Figure 3A.

Stage 2 (Hold):
The hold button was presented and monkeys had to touch and maintain the touch within the button area until it was removed. Any touch outside the hold button was considered an error. There were thirty levels in this stage, in which hold time varied from 100 ms to 3 s in equally spaced intervals. M3 cleared all the levels but M1 was trained only up to a hold time of 2.6 s.
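As a quick check of the level structure described above, thirty equally spaced levels between 100 ms and 3 s correspond to steps of 100 ms. The sketch below assumes that both endpoints are included as levels; the variable names are illustrative only.

```python
N_LEVELS = 30       # from the text
MIN_HOLD_MS = 100   # from the text
MAX_HOLD_MS = 3000  # from the text

# Equal spacing, assuming the first and last levels sit at the two endpoints.
step_ms = (MAX_HOLD_MS - MIN_HOLD_MS) // (N_LEVELS - 1)
hold_times_ms = [MIN_HOLD_MS + i * step_ms for i in range(N_LEVELS)]

# Level (1-indexed) at which the hold time reaches 2.6 s, where M1 stopped.
m1_last_level = hold_times_ms.index(2600) + 1
```

Under these assumptions, a hold time of 2.6 s corresponds to level 26 of 30.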

Stage 3 (1-Response Button):
A temporal same-different task with only the correct choice button was presented. Choice buttons were green squares presented above and below the hold button for same and different choices, respectively. The image presentation sequence was the same as that shown in Figure 3A. The wait-to-hold time for initiating a trial was 8000 ms, with a pre-sample delay of 500 ms, a sample-on time of 400 ms and a post-sample delay of 400 ms. We reduced the time to respond in this stage from 5 s to 400 ms in several steps (1000 ms steps down to 1 s, 100 ms steps down to 500 ms and 50 ms steps down to 400 ms). Four image pairs formed from two images were used to construct the same-different task.
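The stepwise reduction of the response window described above can be written out explicitly. This is a minimal sketch that only enumerates the windows implied by the stated step sizes; the function name is hypothetical, and whether each window corresponded to exactly one training level is an assumption.

```python
def response_window_schedule():
    """Return response windows in ms, from 5000 ms down to 400 ms."""
    schedule = [5000]
    # (step size, lower bound) pairs in ms, as described in the text.
    for step, floor in [(1000, 1000), (100, 500), (50, 400)]:
        while schedule[-1] - step >= floor:
            schedule.append(schedule[-1] - step)
    return schedule


windows_ms = response_window_schedule()
```

Under these assumptions the schedule contains twelve windows: 5000, 4000, 3000, 2000, 1000, 900, 800, 700, 600, 500, 450 and 400 ms.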

Stage 4 (2-Response Buttons):
In this stage, the wrong choice button (of similar dimensions and color to the hold button) was also displayed, with a brightness that increased from 0 to the maximum intensity (the same as the correct choice button). This is a full temporal same-different task with an intensity difference between the correct and wrong choice buttons. The wrong button was introduced in ten steps, with brightness scaled relative to the maximum intensity (scaling factor for each level: 0.2, 0.4, 0.5, 0.8, 0.85, 0.90, 0.925, 0.95, 0.975, 1). A scaling factor of 1 meant that there was no intensity difference between the choice buttons, and the monkey would have to use the visual cues (sample and test images) to perform the task. Time to respond was 800 ms and all other task parameters were the same as in Stage 3.

Stage-5 (Ad-hoc Strategies):
We introduced two new strategies (Immediate Repeat and Overlay) to facilitate same-different training. With the immediate repeat strategy, after every incorrect trial we repeated the same trial with a lower reward (0.1 ml) for a correct response. This allowed the animal to switch its response upon making an error. In the overlay strategy, we presented the sample and test images side by side, blended onto the correct choice button (blended image = α*image + (1-α)*choice button), where α is a fraction between 0 and 1. We started the first level of this stage by giving three kinds of additional information (button intensity difference, immediate repeat and overlay) to identify the correct response. As the levels progressed, we removed the cues slowly. First, we removed the button intensity difference in 6 levels (scaling factor of wrong button intensity in each level: 0.2, 0.3, 0.5, 0.7, 0.9, 1). Second, we removed the overlay cue in 15 levels (blending factor α: 0.5, 0.4, 0.3, 0.2, 0.15, 0.1, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0). We removed the immediate repeat of error trials when the blend cue reached α = 0.06.
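The overlay blending rule above (blended image = α*image + (1-α)*choice button) can be illustrated per pixel. The following is a minimal sketch in plain Python; the function names, the RGB tuple representation and the rounding to integer intensities are assumptions for illustration, not details of the original stimulus code.

```python
# Blending factors across the overlay-removal levels (from the text).
ALPHAS = [0.5, 0.4, 0.3, 0.2, 0.15, 0.1, 0.09, 0.08,
          0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0]


def blend_pixel(image_px, button_px, alpha):
    """Blend one RGB pixel: alpha * image + (1 - alpha) * choice button."""
    return tuple(round(alpha * i + (1 - alpha) * b)
                 for i, b in zip(image_px, button_px))


def immediate_repeat_enabled(alpha):
    """Immediate repeat was removed once the blend cue reached alpha = 0.06.

    Assumption: it stayed off for all later (smaller) alpha values.
    """
    return alpha > 0.06
```

With α = 0 the function returns the plain choice button, and with α = 1 it returns the image unchanged, so decreasing α across levels fades the image cue out of the button.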
Stage-6 (Test Stimulus Association): Stages 6, 7, 8 and 9 were based on a spatial version of the same-different task. In Stages 6 and 7, a new condition was introduced with an overlay on the correct response, and this happened on 50% of trials in the trial bag. The remaining trials were already-learned conditions, which were shown with no overlay. Each level with overlay on the correct response was followed by a repeat of that level without overlay. This spatial task differed from the temporal task in the position of the test image (shifted right or between sample and hold button) and in the sample ON time (the sample image was presented until the trial ended). Each level introduced two new images through two specific image pairs (images A and B were introduced through trials AA and AB). The trials differed only in the test image, so the monkey could do the task only by associating a test stimulus with the correct choice button. In all, we introduced 20 new images and 20 image pairs across levels. Since we presented newly introduced image pairs more often (the ratio of new to learned image pairs was 1:1), the monkeys could reach 80% accuracy without attempting all learned image pairs. Hence, to check the monkey's performance on all learned image pairs, we created a final level in which all 20 image pairs were presented with equal probability and without the cue.

Stage-7 (Sample Stimulus Association):
In this stage, we introduced image pairs formed from two images that differed in the sample image (images A and B were introduced through image pairs AA and BA, but not AA and AB). In total, we introduced 8 new image pairs formed from 8 images. All other experimental conditions were the same as in Stage-6.

Stage-8 (Sample and Test Association):
Here we presented 16 image pairs selected from Stage-6 and Stage-7 together.
Stage-9 (Spatial same-different task): All possible image pairs from 20 new images were introduced in this level, along with learned pairs (the ratio of new pairs to learned pairs was 1:1, with new pairs shown with the choice button overlay). In the next level, the overlay was removed, and in subsequent levels the proportion of new image pairs was increased (this was done in two levels: 75:25 and 100:0). We tested generalization by introducing two new sets of images (number of images in these sets: 20 and 100) in the next two levels.
Stage-10 (Temporal same-different task): The task was switched from the spatial to the temporal SD task. In the first level, we retained the sample and test image locations but turned off the sample image before presenting the test image. There was no delay between sample and test. In the next level, the sample and test were spatially overlapping and the delay between sample and test was zero. In the subsequent levels, the delay between sample and test was increased in steps (50 ms, 100 ms, 200 ms).