ManyDogs 1: A Multi-Lab Replication Study of Dogs’ Pointing Comprehension

To promote collaboration across canine science, address reproducibility issues, and advance open science practices within animal cognition, we have launched the ManyDogs consortium, modeled on similar ManyX projects in other fields. We aimed to create a collaborative network that (a) uses large, diverse samples to investigate and replicate findings, (b) promotes open science practices of preregistering hypotheses, methods, and analysis plans, (c) investigates the influence of differences across populations and breeds, and (d) examines how different research methods and testing environments influence the robustness of results. Our first study combines a phenomenon that appears to be highly robust (dogs’ ability to follow human pointing) with a question that remains controversial: do dogs interpret pointing as a social communicative gesture or as a simple associative cue? We collected preliminary data (N = 61) from a single laboratory on two conditions of a two-alternative object choice task: (1) Ostensive (the experimenter pointed to a baited cup after making eye contact and saying the dog’s name) and (2) Non-ostensive (the experimenter pointed to a baited cup without making eye contact or saying the dog’s name). Dogs followed the ostensive point, but not the non-ostensive point, significantly more often than expected by chance. Preliminary results also provided suggestive evidence of variability in point-following across dog breeds. The next phase is the global participation stage of the project. We propose to replicate this protocol in a large and diverse sample of research sites, simultaneously assessing replicability between labs and further investigating dogs’ comprehension of pointing.

To achieve these goals, we will use a "single study" approach, in which we design one specific study for all participating labs to conduct in parallel. This approach was modeled after the ManyBabies project; because many of the logistical concerns of infant research are similar to those found in canine research, it provided an appropriate structure for our first study. First, as with any research with non-verbal individuals (e.g., infants, non-human animals), research with dogs is typically more time-intensive than adult human psychology research, as each dog has to be tested individually, with extensive training phases and longer behavioral measures. Second, it can be difficult to determine the cause of contradictory findings given the vast individual, cultural, training-related, and breed-related differences among canine populations. Because these differences intersect, it is very difficult to determine the reason behind failed replications across labs: do they reflect meaningful individual differences across populations, or different methodological approaches across labs? Implementing a single, methodologically uniform study across labs will allow us to directly investigate some of these sources of variability.
For our first study, we have chosen to investigate dogs' interpretation of human pointing gestures. Dogs' ability to follow human pointing is a highly robust finding in canine science (e.g., […]). To study this ability further, and to assess the feasibility of the ManyDogs approach, we have chosen a simple choice task that can be standardized across dog labs, addressing a question that is theoretically interesting to many researchers in the field: how do dogs understand and act on human pointing? Do they perceive it as a social communicative gesture (whether informative or imperative) or as a simple associative cue?
Social communicative gestures, such as pointing, convey information from the signaler to the observer, and are frequently enhanced by ostensive cues (such as eye contact, gaze alternation to a target, or vocal signals) that make the intentionally informative nature of the gesture understood (Csibra, 2010). Another way to interpret an intentional pointing gesture is that the signaler is providing an imperative that requires a particular response from the observer (e.g., Kirchhofer et al., 2012). While these two accounts lead to differences in how the cue is received and understood, both involve social signals. However, it has also been proposed that point-following in dogs is based on associative learning mechanisms without any specific, 'infant-like' understanding of the human's communicative-referential intention (e.g., Wynne et al., 2008). Thus, point-following in dogs could be the result of learning to associate a reward such as food with either the specific gesture or human hands more generally. We outline our hypotheses for these various explanations below.
With a single experiment that can be carried out at most canine research sites and is intended for widespread global participation, we intend to explore dogs' responses in two different pointing conditions: an ostensive condition (pointing with eye contact and dog-directed speech) and a non-ostensive condition (pointing without accompanying eye contact or speech). By investigating dogs' responses to these two contrasting pointing styles with a large and diverse sample, we aim to shed light on dogs' understanding of human pointing gestures and, more importantly, to establish a foundation for multi-lab open science collaborations in canine science.
One of the earliest findings in canine science that catalyzed the growth of the field is that dogs follow pointing gestures more accurately, spontaneously, and flexibly than other species, such as great apes (Bräuer et al., 2006). It is now well-replicated that dogs follow human pointing (Miklósi et al., 1998; Soproni et […] (Kaminski et al., 2012). Dogs were more likely to follow the pointing gesture if the experimenter was making eye contact than if she was not. In fact, dogs in the condition without ostensive eye contact did not follow the pointing gesture above chance levels, while dogs in the condition with ostensive eye contact did. This suggests that ostensive cues may be necessary for dogs to follow pointing. Crucially, however, although eye contact is a sufficient ostensive cue, it is not a necessary one, as dogs follow pointing gestures even when a person's back is turned, as long as the person uses high-pitched speech (Kaminski et al., 2012). In another study, an experimenter pointed with ostensive cues (i.e., eye contact and calling the dog's name) either preceding or following the gesture (Tauzin et al., 2015a). Dogs were more likely to follow pointing gestures if the ostensive cues preceded the pointing than if they came after, and only performed above chance levels when the ostensive cues preceded the gesture. Together, these two studies provide promising initial evidence that dogs may require ostensive cues to follow pointing gestures. However, in some instances neutral cues performed before the pointing gesture, such as hand clapping (e.g., clapping control condition, Tauzin […] as higher-level theories such as Natural Pedagogy theory propose (Csibra, 2010). However, assessing this will require further experiments, with proper control conditions and clear, contrasting predictions.
The latter is especially important given that higher-level theories incorporate attentional mechanisms in their explanations; however, this is beyond the scope of the current replication study.
In this study, we aim to test if ostensive cueing has a facilitating effect on dogs' ability to […] act as imperatives for the dog, inducing a 'ready-to-obey' attitude that may result from the domestication of dogs and/or from their extensive experience with humans. This claim is supported by evidence that dogs prefer following a human's gesture even if it is against their better knowledge (Scheider et al., 2013; Szetei et al., 2003), although this may also be analogous to human infants, as explained by the Natural Pedagogy account (Csibra & Gergely, 2009). Unlike the informative account, the imperative account makes no clear prediction about dogs' point-following behavior in the Non-ostensive condition: if dogs view pointing as an imperative, they might follow pointing equally in both conditions, or the ostensive cues might still signal intentionality and result in higher levels of point-following in the Ostensive condition. Thus, our planned experimental contrast will not definitively answer this question. However, we expect that if dogs view pointing cues as imperative, training history and trainability would be significant predictors of their performance in both conditions.
Our third and final prediction for the study is that, as has previously been demonstrated in similar paradigms (Bray et al., 2020a, 2021), dogs are not using olfactory cues to find hidden food in this task, and thus we will not see group-level performance that is significantly above chance in the Odor Control condition.
In this registered report, we first present the results of preliminary data collection on ostensive versus non-ostensive point-following (validating our pre-registered protocol within a single lab), and then outline the proposed expansion of the study, which will follow identical procedures but include data from multiple labs. The labs will be recruited through an open call to […]
Here, we present a proposed study design to address our research questions. In addition, we include preliminary data from an initial pre-registered study from a single laboratory. […] touching the cup with their snout or a front paw (not an ear, back leg, or tail). If the subject does not make a choice within 25 s, a "no-choice" will be recorded and the trial repeated. If the subject has two no-choice responses in a row, they will undergo refamiliarization prior to reattempting to complete the warm-up phase or test trials (see refamiliarization procedure below).
Throughout the study, the handler will sit in a chair behind the dog, holding the dog stationary and facing toward the experimenter while the baiting is carried out. The experimenter will be a trained researcher and will maintain a seated position during trials, looking at the floor during the entirety of each choice period to avoid cueing the subject (Figure 1). The handler may be either a trained researcher or the dog's guardian, as appropriate for a given lab. In cases where […] remain in the room, seated behind the handler. To minimize the potential for unintentional cueing, trained handlers will close their eyes during baiting and cueing (opening them only once the dog has been released), while guardian handlers will close their eyes for the entire trial.
We believe that this protocol will sufficiently ensure that dogs are not cued to choose a particular location by the handler, especially given that previous empirical work aimed at assessing the Clever Hans effect in point-following tasks in dogs suggests that the effects of any unintentional cueing may be less robust than is often suggested (Schmidjell et al., 2012; Hegedüs et al., 2013).
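The no-choice rule described above behaves like a small state machine. The Python sketch below is illustrative only (the registered analyses use R, and the function names are hypothetical); it shows one way a lab's trial-tracking script could label a response using the 25 s limit and decide whether to repeat a trial or refamiliarize the dog. It assumes the caller clears the outcome history after each refamiliarization.

```python
# Hypothetical helpers mirroring the no-choice rule; not part of the ManyDogs materials.
CHOICE_TIMEOUT_S = 25.0  # seconds allowed for a choice on each trial


def classify_response(latency_s):
    """Label one trial from the time (s) until the dog touched a cup.

    latency_s is None when the dog never touched a cup.
    """
    if latency_s is None or latency_s > CHOICE_TIMEOUT_S:
        return "no-choice"
    return "choice"


def next_action(history):
    """Decide what happens after the most recent trial.

    history: outcomes ("choice" / "no-choice") since the last refamiliarization,
    oldest first. Returns "continue", "repeat", or "refamiliarize".
    """
    if not history or history[-1] == "choice":
        return "continue"
    if len(history) >= 2 and history[-2] == "no-choice":
        # Two no-choices in a row: refamiliarize before reattempting the phase.
        return "refamiliarize"
    # A single no-choice: record it and repeat the same trial.
    return "repeat"


outcomes = [classify_response(t) for t in (4.2, None, 31.0)]
print(outcomes, next_action(outcomes))  # ['choice', 'no-choice', 'no-choice'] refamiliarize
```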

Warm-ups
To familiarize subjects with the testing space, the experimenter, and finding food under cups, a series of warm-up exercises will be conducted. These warm-ups are not intended to be predictive of test performance, but simply to build an association between cups and rewards and to gauge the subject's willingness to participate in the task and indicate a choice (in a similar paradigm, Bray et al. (2021) found that performance on warm-ups was not predictive of performance on a pointing task). Throughout the warm-up phases, dogs will be spoken to in a high-pitched voice using pet-directed speech; additionally, experimenters will attempt to make eye contact with subjects at the beginning of each trial when showing them the food reward. All cups used for warm-ups will be false-baited to ensure that the cups smell like food and to minimize dogs' ability to choose cups based on their odor. Subjects will proceed to test trials after completing all phases of the warm-ups, or after 15 minutes have elapsed from the beginning of Phase 1. If, during warm-ups, subjects do not respond on two consecutive trials they will undergo […]
First, there will be at least two repetitions of visible treat placement on the floor in front of the experimenter to ensure the subject is willing to approach the experimenter and eat off the floor in the testing area. Additional trials may be used as necessary. After the subject retrieves the treat successfully from each visible placement, the experimenter will play a free-form cup game to familiarize the subject with finding treats under cups and to encourage them to indicate a choice by touching the cup. In the free-form cup game, the experimenter will show a single treat before placing it on the floor and covering it with a cup. The experimenter will vocally encourage the subject to approach and touch the cup, rewarding them with the treat underneath.
This hiding process will be repeated at least three times or until the subject readily touches the cup. On every trial (true of all trial types throughout the study), subjects are allowed to make only one choice and will be rewarded on trials where they touch the baited cup first. Once the subject has chosen, the experimenter will lift the cup, exposing the treat for the subject to eat.

Phase 2: One-cup Alternating
The second phase familiarizes the subject with the setup and general trial procedure and ensures they are willing to approach the cup locations to the right and left of the experimenter (Figure 1).
In this phase, only one cup will be presented in each trial, placed to the right or left of the experimenter in one of the two designated cup positions, which are 1 m apart from each other along a line 1.35 m in front of the dog's starting box (see Figure 1 and Figure 2). At the start of each trial, the reward will be visibly placed under the cup; the experimenter will attempt to make eye contact with the dog as they bait the cup. The subject will then be required to indicate a choice by physically touching the cup on four trials within a maximum of seven trials. After each successful trial, the cup will be presented on the opposite side to ensure the subject receives two rewards in each location. Subjects that do not complete four touches within seven trials will be excluded (see refamiliarization and abort criteria below).
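As a sketch of the Phase 2 bookkeeping, the Python function below replays a dog's trial outcomes and reports whether the four-touch criterion was met within seven trials. The function name is hypothetical, and two details the protocol leaves open are assumptions made for illustration: the first cup goes on the left, and after a trial without a touch the cup stays on the same side.

```python
def run_phase2(touches, max_trials=7, required=4, start_side="left"):
    """Replay Phase 2 (one-cup alternating) from per-trial outcomes.

    touches: iterable of booleans, True if the dog touched the cup on that trial.
    Returns (passed, sides), where sides lists the cup side used on each trial run.
    Assumed: the cup starts on start_side and stays put after a trial without a touch.
    """
    side = start_side
    sides = []
    n_touches = 0
    for trial, touched in enumerate(touches, start=1):
        if trial > max_trials:
            break  # seven-trial maximum reached without four touches
        sides.append(side)
        if touched:
            n_touches += 1
            if n_touches == required:
                return True, sides
            # After each successful trial the cup moves to the opposite side.
            side = "right" if side == "left" else "left"
    return False, sides


print(run_phase2([True, True, True, True]))
```

With four consecutive touches the cup alternates left, right, left, right, which yields the two rewards per location described above.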

Phase 3: Two-cup Alternating
The third phase ensures that the subject attends to the experimenter's actions, is willing to approach both cup locations when a cup is present at each location simultaneously (i.e., is not side-biased), and is not choosing randomly. These trials will be identical to those in the previous phase, except that two identical cups will be used, such that the subject must attend while one cup is baited by the experimenter in order to choose correctly. The experimenter will attempt to make eye contact with the dog as they visibly bait the cup. Several predetermined sequences of baiting locations (four pseudo-random orders, with no more than two trials in a row on the same side) will be counterbalanced across the conditions within a lab (each sequence used four times within the minimum sample of subjects). Subjects will be required to choose correctly on the first presentation of four of the most recent six trials (sliding window) to advance to the test trials; trials in which the dog does not choose correctly will be immediately repeated to minimize side biases. Subjects that do not meet this criterion within 20 total trials (including repeated trials) will be excluded. The experimental setup is shown in Figure 1.
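The Phase 3 rules (baiting sequences with no more than two consecutive trials on one side, cycling of the four sequences across dogs, and the four-of-six sliding-window criterion capped at 20 total trials) can be sketched in Python as follows. All names are hypothetical, and rejection sampling is only one way to produce the predetermined sequences. Two readings are assumptions: repeated (correction) trials count toward the 20-trial cap but not toward the window, and a dog that is correct on its first four first presentations has met the criterion before six trials exist.

```python
import random

MAX_RUN = 2  # no more than two consecutive trials on the same side


def is_valid_sequence(seq, max_run=MAX_RUN):
    """True if no side ('L' or 'R') occurs more than max_run times in a row."""
    run, prev = 0, None
    for side in seq:
        run = run + 1 if side == prev else 1
        if run > max_run:
            return False
        prev = side
    return True


def make_sequence(length, rng, max_run=MAX_RUN):
    """Draw random L/R sequences until one satisfies the run constraint."""
    while True:
        seq = [rng.choice("LR") for _ in range(length)]
        if is_valid_sequence(seq, max_run):
            return seq


def assign_sequence(subject_index, n_sequences=4):
    """Cycle dogs through the sequences: 16 dogs use each of 4 sequences 4 times."""
    return subject_index % n_sequences


def phase3_outcome(trials, window=6, needed=4, max_total=20):
    """Apply the sliding-window criterion to trials in running order.

    trials: list of (is_first_presentation, correct) tuples; repeated trials
    have is_first_presentation=False. Returns the 1-based trial count at which
    the dog passed, or None if it did not pass within max_total trials.
    """
    firsts = []  # outcomes of first presentations only
    for total, (is_first, correct) in enumerate(trials, start=1):
        if total > max_total:
            return None
        if is_first:
            firsts.append(correct)
            if sum(firsts[-window:]) >= needed:
                return total
    return None


print(make_sequence(12, random.Random(1)))
```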

Test Trials
The test trials will include two blocks of eight trials each, one block for each of the two conditions (ostensive vs. non-ostensive), with the order counterbalanced across individuals. The two blocks will be separated by a 1-min play break and a refamiliarization (two trials of the two-cup alternating procedure from warm-up Phase 3).
In both conditions, occluded baiting will be used: each trial begins with the experimenter placing the occluder in front of themselves, hiding the two cups from the subject's view. Both cups will be false-baited to minimize the dogs' ability to use odor cues. The experimenter will first visually show the subject the food reward, and then place the reward underneath one of the two cups, both of which will be hidden behind the occluder (standardized size across labs: 30 cm tall × 58 cm wide). The experimenter will then remove the occluder and place it behind them, simultaneously slide the two cups outward from their central position until they are 1 m apart, and then provide one of the pointing cues (described below). Across conditions, experimenters will use a contralateral momentary point, holding the point stationary for 2 s before returning to a resting position and maintaining a downward gaze. Although there will be variation across labs and experimenters, the experimenter's finger will be approximately 30 cm from the cup during the pointing cue. Once in the resting position, and after waiting for 1 s, experimenters will cue the handler to release the subject using a neutral word ("now") and a neutral tone to avoid additional social cueing from the experimenter. The handler will release the subject by dropping the leash and saying "okay!" or any similar release command on which the subject was previously trained.
The dog may only choose one cup per trial and will be prevented from making a second choice by removal of the cups or by blocking the dog's access. If they choose the baited cup, they are allowed to eat the food; if they choose the unbaited cup, they are shown the empty space under the cup and no reward is given. On test trials, no praise is given for choosing the baited cup. Except for the gesturing components, detailed below, all other aspects of the test trials will be identical in both conditions.
The primary dependent measure for all test trials will be the proportion of trials in which the subject chooses the baited cup. Subjects have 25 s to make a choice on each trial, and they must complete all test trials of both pointing conditions to be included in registered analyses. […] and say "[dog name], look!" in high-pitched pet-directed speech, while visibly presenting the treat. After treat placement, cup movement, and occluder removal, the experimenter repeats "[dog name], look!" in pet-directed speech and makes eye contact before presenting the pointing gesture (see Figure 3). While giving the neutral release signal and while the subject approaches, the experimenter will look down at the floor directly in front of them.

Non-ostensive Condition
At the start of each non-ostensive trial, the experimenter will look down and clear their throat to get the subject's attention while presenting the treat. Before pointing, the experimenter will clear their throat again to attract the subject's attention and will continue to avert their gaze by looking at the ground in front of them while they present the momentary pointing gesture and while the subject approaches and indicates a choice. Throat clearing was chosen as an easy-to-produce cue that is familiar to dogs and not generally associated with ostensive cues or intentional communication, but that still attracts the dog's attention, thus balancing auditory cues across pointing conditions. The experimenter will not speak to the dog during the non-ostensive trials, speaking only the neutral "now" as a cue for the handler to release the dog.

Odor Control Condition
After both blocks of test trials, another 1-min play break will take place. Finally, in the four odor control trials, the cups will be baited identically to the test trials, except that (1) clean, unbaited cups will be used, without a treat taped into the cup (thus making it easier for subjects to use scent cues if they are using an olfactory search strategy), (2) only one verbal cue will be given when presenting the treat, "[dog name], look," and (3) no pointing gesture will be provided before the subject is released to search. Based on previous results with similar paradigms (Bray et al., 2020a), we expect most subjects to perform at chance levels on these trials. We will therefore use a reduced number of odor control trials to avoid dogs becoming discouraged and refusing to participate. These data will not be used at the individual level to exclude subjects, but rather in post-hoc analyses investigating dogs' ability to use olfactory information, or other unintentional cues, at the level of lab, breed, or training background.
[…] preceded by ostensive cues is nevertheless informative for determining point-following behavior […]
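For session planning, the ordered structure of the test phase can be written down explicitly. This Python sketch uses hypothetical names, and counterbalancing by alternating consecutive subjects is an assumption (labs could use any balanced scheme). It lists the blocks for one dog and counts the scored trials: 8 Ostensive, 8 Non-ostensive, and 4 odor control trials.

```python
def session_plan(subject_index):
    """Ordered test-phase blocks for one dog as (kind, label, n_trials) tuples.

    Assumption: block order alternates between consecutive subjects
    (even index -> Ostensive first) to counterbalance it across dogs.
    """
    first, second = "ostensive", "non-ostensive"
    if subject_index % 2 == 1:
        first, second = second, first
    return [
        ("test", first, 8),
        ("break", "1-min play", 0),
        ("refamiliarization", "two-cup alternating", 2),
        ("test", second, 8),
        ("break", "1-min play", 0),
        ("odor-control", "no pointing", 4),
    ]


def scored_trials(plan):
    """Trials that enter the analyses: both test blocks plus the odor controls."""
    return sum(n for kind, _, n in plan if kind in ("test", "odor-control"))


print(scored_trials(session_plan(0)))  # 20
```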

Coding and Reliability
Choices will be coded live by the experimenter. Additionally, videos will be recorded when possible to enable reliability coding, as well as coding of additional exploratory measures. For each participating lab, a subset of the data will be re-coded for reliability: at least 8 subjects for data submissions with ≤ 40 subjects (see sampling plan below), or 20% of subjects for data submissions containing > 40 subjects. Recoding should contain equal numbers of subjects from each pointing condition. When possible, reliability coding will be done from video by a coder who is blind to the hypothesis of the project; otherwise, a secondary live coder will be used (only in the event that video data collection is impossible). Labs whose data do not meet the inter-rater reliability threshold of ≥ 0.9 will be excluded.
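The size of the reliability subset follows directly from the rule above. A minimal Python sketch, assuming that 20% is rounded up to a whole subject (the text does not say how fractions are handled); the function name is hypothetical:

```python
def n_recode(n_subjects):
    """Minimum number of subjects to re-code for reliability in one lab's submission.

    <= 40 subjects: at least 8; > 40 subjects: 20%, rounded up (an assumption).
    """
    if n_subjects <= 40:
        return min(8, n_subjects)  # labs submit >= 16 dogs, so this is 8 in practice
    return (n_subjects + 4) // 5  # integer ceiling of n / 5, i.e. 20% rounded up


for n in (16, 40, 41, 60):
    print(n, n_recode(n))
```

Integer arithmetic is used for the ceiling so that no floating-point rounding can shift the result by one subject.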

Survey Data
Prior to participation in the behavior study, dog owners and guardians will complete a survey on their dog's background, including breed, training history, and other demographics. Dog owners and guardians will also complete the Canine Behavioral Assessment and Research Questionnaire (C-BARQ©, www.cbarq.org) (Serpell & Hsu, 2001; Hsu & Serpell, 2003). See supplementary materials on OSF for the complete text of our in-house surveys. We included the C-BARQ trainability score as a covariate in our confirmatory analysis to account for the potential impact of varying individual training histories on the dogs' task performance.

Sampling Plan
This experiment will be conducted at labs around the world. In addition to current consortium labs committed to collecting data (Table 1), we will recruit additional canine science labs and research centers through relevant listservs, conferences, and social media channels. Labs […] setting up and running the study, (2) obtain ethics approval from their institution, and (3) collect data from at least 16 subjects that meet submission requirements. Because of the nature of this project, the exact number of participating labs/collaborators cannot be specified ahead of time. Our plan is to fix a data collection end date; any labs/collaborators who collect data from the minimum of 16 dogs by the end date will be included in the analysis. A minimum number of dogs per lab is set to allow for an assessment of between-lab variation in performance.
For similar reasons, the number of subjects cannot be specified ahead of time. Each lab/collaborator that submits data for this project is required to collect behavioral data, and strongly encouraged to submit video data, from a minimum of 16 dogs in order to be included in final analyses.

Data Analysis
Data will be analyzed in R Statistical Software (R Core Team, 2021). As an inference criterion, we will use p-values below .05. Where possible, we will supplement the frequentist statistics with Bayes factors.

Performance Relative to Chance
We will conduct one-sample (two-tailed) t-tests to compare the subjects' aggregated performance across trials to the chance level (0.5) separately for each condition (Ostensive, Non-ostensive, and Odor Control). We will also conduct these analyses separately for each lab.
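A minimal Python sketch of the chance-level comparison, for illustration only (the registered analysis runs in R). It aggregates trial-level records into one proportion correct per dog and condition, then computes the one-sample t statistic against 0.5 and its degrees of freedom. The record layout, function names, and example proportions are hypothetical; the two-tailed p-value would come from a t distribution with those degrees of freedom, for example via R's t.test(x, mu = 0.5) or scipy.stats.ttest_1samp(x, 0.5). Running the same functions on one lab's records gives the per-lab tests.

```python
import math
import statistics
from collections import defaultdict


def proportion_correct(trials):
    """Aggregate trial records into one proportion correct per (dog, condition).

    trials: iterable of (dog_id, condition, correct) with correct coded 0 or 1.
    """
    tally = defaultdict(lambda: [0, 0])  # (dog, condition) -> [n_correct, n_trials]
    for dog, condition, correct in trials:
        tally[(dog, condition)][0] += correct
        tally[(dog, condition)][1] += 1
    return {key: k / n for key, (k, n) in tally.items()}


def one_sample_t(values, mu=0.5):
    """t statistic and degrees of freedom for H0: mean(values) == mu.

    Raises ZeroDivisionError if all values are identical (zero standard deviation).
    """
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / se, n - 1


# Hypothetical per-dog proportions correct in one condition (8 trials each).
t, df = one_sample_t([0.75, 0.625, 0.5, 0.875])
print(round(t, 3), df)
```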

Condition Comparison
For our main analysis, we plan to fit a Generalized Linear Mixed Model (GLMM) with a binomial error distribution and logit link function using the glmer() function from the lme4 package (Bates et al., 2015). This model will include condition (Ostensive and Non-ostensive only), order of condition (Ostensive first, Non-ostensive first), trial number within condition, dog sex, dog neuter status, dog age (in years), and dogs' trainability score based on the C-BARQ questionnaire (Hsu & Serpell, 2003) as fixed effects, and subject and lab as random intercepts. The full model, including fixed effects, random intercepts, and random slopes, is defined by:
Correct choice ~ condition + order_condition + trial_within_condition + sex*desexed + age + C-BARQ_trainability_score + (condition + trial_within_condition | Subject ID) + (condition + order_condition + trial_within_condition + sex*desexed + age + C-BARQ_trainability_score | Lab ID)
In a second model, we will repeat this analysis with only purebred and known crossbred dogs, excluding mixes of unknown breeds or of more than two breeds (only breeds/crossbreeds with at least 8 individuals will be included), and include the random effect of breed in this model:
Correct choice ~ condition + order_condition + trial_within_condition + sex*desexed + age + C-BARQ_trainability_score + (condition + trial_within_condition | Subject ID) + (condition + order_condition + trial_within_condition + sex*desexed + age + C-BARQ_trainability_score | Lab ID) + (condition + order_condition + trial_within_condition + sex*desexed + age + C-BARQ_trainability_score | Breed ID)
We will only include random slopes if the corresponding predictor variable varies in at least 50% of the levels of the random intercept. We will only include the random slope of the interaction if there is sufficient variation in both of its terms in at least 50% of the levels of the random intercept.
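The random-slope inclusion rule can be checked mechanically before fitting. The Python sketch below uses hypothetical names and toy rows (one dict per trial); for a grouping factor such as subject or lab it computes the share of levels in which a predictor takes more than one value. For an interaction it requires both terms to vary within the same level, which is one reading of the rule above. Because every dog completes both conditions while its sex is constant, condition passes the check within subjects and sex does not, consistent with the subject-level slopes in the model formula.

```python
from collections import defaultdict


def share_of_levels_varying(rows, group, terms):
    """Share of grouping-factor levels in which every listed term takes >1 value.

    rows: one dict per trial; group: column of the random intercept
    (e.g. "subject"); terms: one predictor, or two for an interaction.
    """
    seen = defaultdict(lambda: defaultdict(set))  # level -> term -> distinct values
    for row in rows:
        for term in terms:
            seen[row[group]][term].add(row[term])
    varying = sum(
        all(len(by_term[term]) > 1 for term in terms) for by_term in seen.values()
    )
    return varying / len(seen)


def include_slope(rows, group, terms, threshold=0.5):
    """Keep the random slope only if the term(s) vary in >= threshold of levels."""
    return share_of_levels_varying(rows, group, terms) >= threshold


# Hypothetical toy data: three dogs at two labs, two trials each.
rows = [
    {"subject": "s1", "lab": "A", "condition": "ost", "sex": "F"},
    {"subject": "s1", "lab": "A", "condition": "non", "sex": "F"},
    {"subject": "s2", "lab": "A", "condition": "ost", "sex": "M"},
    {"subject": "s2", "lab": "A", "condition": "non", "sex": "M"},
    {"subject": "s3", "lab": "B", "condition": "ost", "sex": "M"},
    {"subject": "s3", "lab": "B", "condition": "non", "sex": "M"},
]
print(include_slope(rows, "subject", ["condition"]), include_slope(rows, "subject", ["sex"]))
```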
We will only include the correlations between random intercepts and random slopes if including them results in a model with better fit (i.e., a larger log-likelihood).
All covariates will be centered and scaled to a standard deviation of 1. The random slope components of the factors will be centered to ensure that the results are not conditional on the choice of the reference category.
If the models do not converge, we will follow the steps reported by Bolker (2014). If these procedures do not fix convergence issues, we will remove correlations between random effects and then, if needed, remove random slopes in the following order: Lab ID, Subject ID, Breed ID.
For the GLMM, we will calculate likelihood ratio tests using the drop1() function from lme4 (using a chi-square test; Barr et al., 2013), with p-values below .05 as the criterion to make inferences about fixed effects.
In addition to the frequentist GLMM, we will calculate Bayes factors from Bayesian models fitted using the brm() function from the brms package (Bürkner, 2017, 2018) with default, non-informative priors. We will then use the bayes_factor() function to compare models, using bridge sampling with repetitions (Gronau et al., 2020). The Bayes factors will represent the evidence for the full model relative to the full model without the fixed effect under investigation. The Bayesian analysis will be supplemental, and inferences will be drawn from the frequentist statistics.
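The covariate preparation and the test statistics described above reduce to short formulas. This Python sketch is illustrative only (the registered analysis uses R; function names and example numbers are hypothetical). z_score centers a covariate and scales it by the sample standard deviation, as R's scale() does; centered_dummy codes a two-level factor as 0/1 and subtracts its mean; lrt_p_df1 returns the likelihood-ratio statistic 2(logLik_full - logLik_reduced) and its chi-square p-value for a single-df term such as condition; and bayes_factor converts two log marginal likelihoods (e.g., bridge-sampling estimates) into a Bayes factor for the full model.

```python
import math
import statistics


def z_score(values):
    """Center a covariate and scale it to SD 1 (sample SD, as in R's scale())."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def centered_dummy(levels, reference):
    """0/1 dummy code for a two-level factor (reference = 0), then mean-centered."""
    dummy = [0.0 if level == reference else 1.0 for level in levels]
    mean = statistics.mean(dummy)
    return [d - mean for d in dummy]


def lrt_p_df1(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p-value for dropping a single-df term.

    Under H0 the statistic is chi-square with 1 df, whose upper tail
    probability equals erfc(sqrt(stat / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def bayes_factor(log_ml_full, log_ml_reduced):
    """Bayes factor for the full model from two log marginal likelihoods."""
    return math.exp(log_ml_full - log_ml_reduced)


stat, p = lrt_p_df1(-200.0, -203.0)  # hypothetical log-likelihoods
print(stat, round(p, 4))
```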

Genetic Analysis of Among-breed Heritability
To assess among-breed heritability (MacLean et al., 2019), we will fit an animal model (Wilson et al., 2010), which incorporates a genetic effect with a known covariance structure to estimate the proportion of phenotypic variance attributable to additive genetic effects. Genetic analyses will take a breed-average approach, integrating publicly available genetic data on the breeds in our dataset, rather than genotyping the individuals in the cognitive experiment.
Breed-average genetic similarity will be represented by an identity-by-state (IBS) matrix calculated from publicly available genetic data collected using the Illumina CanineHD bead array (Parker et al., 2017). The proportion of single-nucleotide polymorphisms (SNPs) identical by state between pairs of individual dogs will be calculated using PLINK (Chang et al., 2015). These values will then be averaged for every pair of breeds to generate a breed-average IBS matrix. This breed-average IBS matrix will be extrapolated to an individual-level IBS matrix for the purposes of our analysis. For individuals of different breeds, the IBS value will be set to the average similarity between those breeds in the genetic dataset. For individuals of the same breed, the IBS value will be set to the average IBS value among members of that breed in the genetic dataset. The purpose of this approach is to simultaneously incorporate a measure of between- and within-breed genetic similarity, while retaining the ability to model phenotypes at the individual, rather than breed-average, level. Only breeds represented by N ≥ 8 individuals will be included in these analyses.
Heritability models will be fit using the brm() function from the brms package (Bürkner, 2017, 2018) with weakly informative priors.
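The breed-average extrapolation can be sketched in a few lines of Python. Function names and toy numbers are hypothetical: the input is a pairwise IBS matrix for a reference panel (in practice derived with PLINK from the public CanineHD data), and every breed needs at least two reference dogs so that a within-breed average exists. Setting the diagonal (a dog compared with itself) to 1.0 is an assumption, since the text does not specify it.

```python
from itertools import combinations


def breed_average_ibs(ref_ibs, ref_breeds):
    """Average reference-panel IBS values into one value per breed pair.

    ref_ibs: square list of lists of pairwise IBS proportions between reference dogs.
    ref_breeds: breed label of each reference dog, in the same order as ref_ibs.
    Returns {(breed_a, breed_b): mean IBS} keyed by the sorted breed pair;
    same-breed entries average over distinct dogs only (no self-pairs).
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(ref_breeds)), 2):
        key = tuple(sorted((ref_breeds[i], ref_breeds[j])))
        sums[key] = sums.get(key, 0.0) + ref_ibs[i][j]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def individual_ibs(subject_breeds, breed_avg, self_value=1.0):
    """Expand breed-pair averages into a dog-by-dog matrix for the study subjects.

    Dogs of different breeds get the between-breed average; dogs of the same
    breed get the within-breed average. The diagonal uses self_value (assumed).
    """
    n = len(subject_breeds)
    matrix = [[self_value] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        key = tuple(sorted((subject_breeds[i], subject_breeds[j])))
        matrix[i][j] = matrix[j][i] = breed_avg[key]
    return matrix


# Toy reference panel: two dogs of breed "A", two of breed "B".
ref = [
    [1.00, 0.80, 0.70, 0.72],
    [0.80, 1.00, 0.68, 0.70],
    [0.70, 0.68, 1.00, 0.85],
    [0.72, 0.70, 0.85, 1.00],
]
avg = breed_average_ibs(ref, ["A", "A", "B", "B"])
subjects = individual_ibs(["A", "B", "A"], avg)
```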
We will use 12,000 iterations per chain, with the first 2,000 iterations used as a warm-up, and a subsequent thinning interval of 10 iterations for retention of samples for the posterior distributions. We will report the mean and 90% credible interval of the posterior distribution of heritability estimates for this analysis.
Heritability models will include breed-mean body mass, sex, and age as covariates. We will fit three separate models using the following dependent measures: (1) proportion correct in the Ostensive condition, (2) proportion correct in the Non-ostensive control condition, and (3) a difference score between these conditions, in which performance in the Non-ostensive condition is subtracted from performance in the Ostensive condition.
Model performance will be assessed by visualizing fitted values vs. residuals and quantile-quantile plots. If problems are detected at this stage, models will be refit using an appropriate statistical transformation of the dependent measure.
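The sampling settings and posterior summaries above involve some simple arithmetic, sketched here in Python under stated assumptions. With 12,000 iterations, 2,000 warm-up iterations, and a thinning interval of 10, each chain retains (12,000 - 2,000) / 10 = 1,000 draws; the number of chains is not specified. The heritability function assumes a model whose only variance components are the additive genetic and residual variances, h² = σ²_A / (σ²_A + σ²_e); any additional random effects would enter the denominator. The 90% interval is taken as the equal-tailed 5th to 95th percentiles of the draws. Function names are hypothetical.

```python
import statistics


def retained_draws(iterations, warmup, thin):
    """Posterior draws kept per chain after discarding warm-up and thinning."""
    return (iterations - warmup) // thin


def heritability_draws(var_genetic, var_residual):
    """Per-draw h2 = var_A / (var_A + var_e), assuming only these two components."""
    return [a / (a + e) for a, e in zip(var_genetic, var_residual)]


def summarize_90(draws):
    """Posterior mean and equal-tailed 90% credible interval (5th, 95th percentiles)."""
    cuts = statistics.quantiles(draws, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return statistics.mean(draws), cuts[0], cuts[-1]


def difference_score(p_ostensive, p_non_ostensive):
    """Dependent measure (3): Ostensive minus Non-ostensive proportion correct."""
    return p_ostensive - p_non_ostensive


print(retained_draws(12_000, 2_000, 10))  # 1000
```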

Preliminary Data
In order to validate our study design and analysis plan, we collected preliminary data from a pilot experiment at the Clever Dog Lab at the University of Veterinary Medicine in Vienna, Austria. We pre-registered the study design, procedure, predictions, and confirmatory analysis at the Open Science Framework (https://osf.io/gz5pj/) prior to data collection. The data and analysis script are available online at ManyDogs OSF.

Methods
Ninety-one dogs (males = 38, mean age = 5.13 years, SD = 3.31) across a variety of breeds participated in the pilot experiment. Of these, a subset of 61 dogs (males = 26, mean age = 4.74 years, SD = 3.25) were tested after our pre-registration was submitted; all statistical models are limited to these individuals. An additional 12 dogs started but did not complete the experiment due to lack of motivation (n = 10) or fear/anxiety (n = 2). The study was discussed and approved by the institutional ethics and animal welfare committee in accordance with Good Scientific Practice guidelines and national legislation (ETK-081/05/2020). pre-registration. A meat-based sausage treat was used, and odor cues were controlled by rubbing the interior of the cups with sausage prior to warm-ups and test trials. Except for four subjects, who were handled by a female research assistant, all subjects were handled by their guardians throughout the study. Data were live-coded by the experimenter; in addition, a second rater, naive to the hypotheses and theoretical background of the study, scored the video data of 18 randomly selected dogs (ca. 30% of the pre-registered sample). We used Cohen's kappa to assess the interobserver reliability of the binary response variable "correct choice." The two raters were in complete agreement (κ = 1, N = 360).
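Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal proportions, κ = (p_o - p_e) / (1 - p_e). The minimal Python sketch below computes it for two raters' binary "correct choice" codes; the function name and input lists are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items.

    Works for any categorical codes, e.g. 1 = correct choice, 0 = incorrect.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    categories = set(rater_a) | set(rater_b)
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single code")
    return (p_observed - p_expected) / (1 - p_expected)
```

Identical codes that include both categories give κ = 1, the value reported for the 360 double-coded choices. If every choice had received the same code from both raters, kappa would be undefined, which is why the sketch raises an error in that case.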

Data Analysis
To evaluate whether dogs' performance in choosing the baited cup deviated significantly from the chance level of 0.5 in the Ostensive, Non-ostensive, and Odor Control conditions, we first aggregated the data across trials for each individual and condition. We then conducted one-sample t-tests to compare performance against chance.

To compare performance between the test conditions, we fitted a GLMM with a binomial error distribution and logit link function. We included the predictor variables condition, order of condition, trial number within condition, sex, age, and dogs' trainability score from the C-BARQ questionnaire. Additionally, we included a random intercept for subject ID and random slopes of condition and trial number within subject ID. Note that, unlike the proposed study, this analysis did not include dog neuter status or lab ID in the model. (Figure 4).
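The chance-level comparison can be sketched in Python: trial-level choices are first aggregated into a proportion correct per dog and condition, and the one-sample t statistic is then computed against 0.5 with n - 1 degrees of freedom. This is an illustration under assumptions: the input format of (dog_id, condition, correct) tuples and the function names are hypothetical. The p-value would come from the t distribution, for instance via scipy.stats.ttest_1samp, which is not reproduced here.

```python
import math
import statistics
from collections import defaultdict


def proportion_correct_by_dog(trials):
    """Aggregate trials into proportion correct per (dog, condition).

    trials: iterable of (dog_id, condition, correct) with correct in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])  # (dog, condition) -> [hits, trials]
    for dog, condition, correct in trials:
        totals[(dog, condition)][0] += correct
        totals[(dog, condition)][1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}


def one_sample_t(values, mu=0.5):
    """t statistic and degrees of freedom for H0: mean == mu (chance = 0.5)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1
```

Aggregating first means each dog contributes one value per condition, so the test's degrees of freedom reflect the number of dogs rather than the number of trials.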

Condition Comparison
The dogs were significantly more likely to choose the baited cup in the Ostensive condition than in the Non-ostensive condition (χ²(1) = 5.11, p = .024, 10 = 3.88) (Figure 4A). None of the control predictors (order of condition, trial number within condition, sex, age, C-BARQ trainability score) had any effect on dogs' choices (Table 2).
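For a statistic with one degree of freedom, the χ² upper-tail probability has a closed form, P(χ²₁ > x) = erfc(√(x/2)), because a χ²₁ variable is the square of a standard normal variable. The Python sketch below uses it to check the reported p-value. It assumes, as is common for GLMMs but not stated in this section, that the χ² value comes from a likelihood ratio test comparing models with and without condition; the function names are hypothetical, and any log-likelihoods passed in would be illustrative.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


p_condition = chi2_sf_df1(5.11)  # reported chi-square for condition
print(f"p = {p_condition:.3f}")  # prints "p = 0.024"
```

The printed value matches the reported p = .024, which is a quick consistency check on the χ²(1) statistic.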

Discussion
Our preliminary results suggest that ostensive cueing enhances dogs' ability to follow human pointing gestures: dogs followed pointing gestures at above-chance levels in the Ostensive condition but not in the Non-ostensive or Odor Control conditions, and they followed ostensively cued points significantly more often than non-ostensively cued ones. These results suggest that ostensive cues may be sufficient for dogs to successfully interpret and follow social gestures given by human informants. Conversely, dogs did not reliably follow every pointing gesture, suggesting that they do not interpret the mere presence of a human point as an imperative command or as a sufficient associative cue for point following. This preliminary result is in line with previous research suggesting that Delay (2016). Using an eye tracker to record dogs' looking behavior during the human pointing gesture, she found that dogs readily followed the movement of the pointing arm but very rarely extended the signal further to the cups. In general, dogs looked at the experimenter's head area the most. These results are therefore in line with Tauzin et al. (2015), suggesting that dogs perceive pointing as a spatial signal (where to go) rather than as a signal that refers to an object (e.g., Kaminski et al., 2012). More work will be needed to distinguish between the informative and imperative accounts. Some of our proposed and further exploratory analyses may begin to address these questions by examining individual-level variation and the importance of training and trainability.

It is worth noting, however, that although the difference in dogs' performance across conditions in the preliminary data was statistically significant, this difference is subtle. This slight difference should also warn us against overestimating the role of ostensive signals.
Contrary to the assumptions of the theory of (Human) Natural Pedagogy (Csibra & Gergely, 2006 But staring at the eyes of another is a strong attention-getter for adult individuals in almost all social species (Emery, 2000). Our final, larger sample will allow for greater statistical power and, as a result, increased confidence in our conclusions. This greater confidence will come not only from the larger sample size but also from greater heterogeneity in the sample, with different experimenters and dog populations across many labs. An additional benefit of the multi-lab approach proposed by ManyDogs is the potential to age on dogs' ability to follow pointing gestures. Such analyses are difficult (if not impossible) to conduct in single-lab studies because of limited statistical power and the potential homogeneity of training history among dogs recruited from the same geographic area. The multi-lab approach allows for the sampling of dogs with a variety of training backgrounds and breeds. In addition to enabling these analyses of individual differences, sampling a more diverse population of dogs will likely yield more generalizable data and, thus, more externally valid conclusions.

centered and scaled to a standard deviation of 1. The standard deviations for the contribution of the random effects were 0.099 for the random intercept of subject, 0.159 for the random slope of condition within subject, and 0.063 for the random slope of trial number within subject.
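Centering covariates and scaling them to a standard deviation of 1 is a z-transformation. A common reason for it is that coefficients for covariates such as age then sit on comparable scales. A minimal Python sketch follows; the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the note does not state which convention was used.

```python
import statistics


def z_transform(values):
    """Center values on their mean and scale them to a standard deviation of 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return [(v - mean) / sd for v in values]
```

For example, ages of 3, 5, and 7 years become -1.0, 0.0, and 1.0.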