Functionally analogous body- and animacy-responsive areas are present in the dog (Canis familiaris) and human occipito-temporal lobe

Comparing the neural correlates of socio-cognitive skills across species provides insights into the evolution of the social brain and has revealed face- and body-sensitive regions in the primate temporal lobe. Although from a different lineage, dogs share convergent visuo-cognitive skills with humans and a temporal lobe which evolved independently in carnivorans. We investigated the neural correlates of face and body perception in dogs (N = 15) and humans (N = 40) using functional MRI. Combining univariate and multivariate analysis approaches, we found functionally analogous occipito-temporal regions involved in the perception of animate entities and bodies in both species and face-sensitive regions in humans. Though unpredicted, we also observed neural representations of faces compared to inanimate objects, and dog compared to human bodies in dog olfactory regions. These findings shed light on the evolutionary foundations of human and dog social cognition and the predominant role of the temporal lobe.

I would ask the authors to engage in deeper reflection on how the results are discussed. In the abstract, the authors state that "…only humans had regions specialized for face perception". This is something one is unable to say on the basis of this kind of research; one can only state that the equivalent region was not found in dogs in the current study. It is always the uncomfortable way of the research that if you do not find something does not mean it is not there, only that it was not found. This is especially true for the paradigms used for the first time as in the current study; only through accumulating knowledge and similar findings elsewhere we can begin to collectively come to a conclusion.
More important is the reflection on the work in the field so far: how do the new findings fit into the picture so far, what are the possible reasons the face-selective region was not found in dogs? The authors carefully consider their stimulus low-level composition and sensitivity of the methodology, and note the fMRI result difference to the single-cell studies as conducted in sheep, but one could also consider e.g. the spatial differentiation in dog brains compared to those of humans, the effects of variation in the dog brain morphology, and perhaps the effects of dog breeds to the possible underlying functionality. This increases the usability of the current work and gives tools for the field to explore the issue further.
Also, for the discussion, instead of "faces are not specific for dogs" viewpoint, I suggest emphasizing "bodies may be equally specific as faces for dogs" viewpoint. This may better fit the literature and could be further studied in the future. Perhaps dogs can detect emotion, identity and such from faces, but maybe they are equally good with bodies. And while the faces are processed in a holistic manner in dogs as in humans, maybe bodies are also processed similarly. Or, as behavioral results of face processing are dense in many species (e.g. newly hatched chicks turn toward a face, Rosa-Salva et al. Dev Sci 2010) -do these face-specific processes already take place at subcortical level, perhaps also with dogs? Should this be targeted for further studies in the fMRI research?
A few minor comments are listed below:
line 66 ref. #20: Apart from other references given here, this is a review instead of original research, so could be omitted here (or mentioned as such).
lines 81-82: "prior work did not find greater activation for faces compared to scrambled images" -> strictly speaking, this is untrue, as Kujala et al (Sci Rep 2020) did find a difference between scrambled and intact dog/human facial images in dog brain responses. However, this study also lacked bodies as a control stimulus, and the information retrieved with a neurophysiological method taps partially different neural mechanisms from the fMRI, in which the quicker responses may be difficult to detect.
lines 107-111: Pls add that you conducted the fROI analysis in the individual level. This is a very important specification as ROIs are determined in a number of different ways to overcome the multiple comparison problem, and it already answers some further questions.
lines 116-135: This paragraph already describes the main findings. As it is highly unusual for the introduction, I suggest omitting unless the journal instructions specifically instructed the authors to include findings in the introduction.
Figure 6: The text in the figure is very small (actually, the same applies to other figures as well) and colors remain very difficult to distinguish from another, which is crucial for this figure. Please change one of the blues and the darker green color for the figure to be more readable (also in the CMYK print form, which people still do use). In the dog brain image, the color actually appears similar as the faces-only color in the label -better distinction in the colors would be beneficial here. There are not many studies so far about this particular issue, but the reason this kind of difference is not often found may be in the differences of the stimulus salience; if emotional stimuli are too mild, differences are not found. In the current data, the emotionality appeared very low; perhaps the emotionality of the stimuli adds to the species-specific processing via innate & learned processing.
lines 410-412: Pls add the analysis which gave this result. With different analyses in the same context, it is difficult to follow which analysis is discussed.
lines 429-430: "Note also that" -> pls consider "Notably" instead
lines 433-434: "It is important though that we did not predict these findings and should therefore be regarded as preliminary." -> I do not see this necessary in this context. If the multiple comparisons problem is properly dealt with, a result is a result -it is not dependent on the psychic abilities of the researchers. Of course, it is appropriate to point out the need to examine this further and to replicate elsewhere.

Reviewer #1 (Remarks to the Author):
This study investigated the neural correlates of face and body stimuli presented to dogs and humans using functional MRI. Six image categories were presented (headless dog bodies, headless human bodies, inanimate objects, dog faces, human faces, scrambled images) in a block design. Several analyses were conducted based on regions of interest and whole-brain data. The contrasts were appropriate. In brief, analyses revealed functional similarity in occipito-temporal cortex for passively viewing body stimuli across species. The findings involving dog olfactory areas are intriguing. Importantly, hue and saturation were controlled as potential causes of contrast differences. The introduction succinctly summarized the appropriate literature pointing out important limitations. The present study improves upon limitations in prior published findings. The findings are novel and will be important to readers interested in cognitive evolution and comparative neuroimaging, as they help reveal information about the evolution of social perception. Overall, the paper was well written. There are several things that need to be explained better (or perhaps I missed the detail).

Author response:
We thank the reviewer for the very positive evaluation of our work, the constructive comments towards improving it further, and for confirming our view that this is novel and important work.
1. I do not understand the logic of removing the heads from the body images. Could the authors please elaborate on why this is necessary to do? The headless bodies look strange and unfamiliar. Wouldn't this be a potential confound when comparing faces vs bodies in that dogs/humans are likely to have very little experience with headless bodies.
We understand the reviewer's concern and would like to explain our rationale. Our goal was to disentangle category sensitivity for faces vs. bodies. By removing the heads from the body images, we attempted to achieve a balance between the naturalistic depiction of bodies and experimental control. We agree with the reviewer that dogs have very little exposure to headless bodies. However, dogs typically also do not see isolated faces (without bodies). Several behavioural studies have nevertheless shown that isolated faces can be used to investigate familiarity, emotion perception, or emotion discrimination in dogs 1-3. Additionally, similar stimuli (isolated faces, isolated bodies) have been used in studies with humans and non-human primates (see e.g., 4-7), successfully showing differential neural representations for faces vs. bodies.
We now explain this in more detail in our updated manuscript (see below for relevant text passages).
Material and methods, lines 544-549: "In line with prior human and non-human primate neuroimaging studies (see e.g., 4-7), we edited out the heads, as well as objects (e.g., a coffee cup, a soccer ball) from the body images to disentangle body from face and object perception. To increase ecological validity the face and body images showed a variety of postures (e.g., jumping, looking up), neutral and positive emotional displays (e.g., sleeping, smiling), and viewing perspectives (e.g., from above, from a side angle)."
2. I think using a 10% arbitrary threshold is a weird choice. The authors argue why they made this choice in relation to other thresholds, but what is the necessity for making these arbitrary choices is not clearly explained. They mention the following in the legend of the supplementary figure where this choice is justified: "Smaller percentages would have been less sensitive for dog fROIs and increased fROI sizes would have been less sensitive in the human fusiform and occipital face area as illustrated by the overlapping 95% confidence intervals". This is similar to p-hacking and this type of logic is not acceptable in statistics. One should choose thresholds based on an underlying principle which you think is correct and you then interpret the results. Seeing the results and choosing the threshold is going the other way round. The authors also go on to show that whole brain analyses kind of supports the results they got from ROI analysis. Then why do the complicated ROI analyses with arbitrary thresholds is not clear. Is that a way to circumvent multiple comparisons corrections?
We thank the reviewer for pointing out that further clarification is needed. We structured our reply in several parts.
(1) Rationale for conducting the functional region-of-interest (fROI) and exploratory whole-brain analyses
The main aim of the univariate analyses was to investigate whether dogs and humans have comparable face- and body-sensitive brain regions. We applied a functional region-of-interest (fROI) analysis approach to address this research question (see e.g., 8 for a recent application to investigate category-sensitivity in human infants) and split the data into two independent data sets: (a) a localizer data set (first task run) to define individual potential face- or body-sensitive areas in visual-responsive brain regions and (b) a test data set (second task run) to extract activation levels from these regions. We chose this approach for two main advantages. First, defining individual fROIs within constrained search spaces accounts for slight variations of activation peaks between participants, as reported in past dog neuroimaging studies 35,37. Second, this approach allowed us to directly test the category-sensitivity of the localized category-sensitive regions using the left-out data set. This analysis approach has also been used in a recent human infant fMRI study to investigate face and body-sensitivity 8.
While we considered this the ideal approach, fROI analyses have their known limitations, such as being "blind" to areas outside the fROIs tested. This is why we performed a complementary and exploratory whole-brain analysis. The results largely confirmed the fROI findings, but as expected showed lower sensitivity and detected only one animate-sensitive area, in the mid suprasylvian gyrus. Importantly, it revealed no areas additional to those focused on in the fROI analyses. Taken together, this complementary approach bolsters our interpretations regarding category-sensitive areas in the dog or human brain.
Regarding the reviewer's concern that this analysis approach may have been "a way to circumvent multiple comparisons corrections," we emphasize that all analyses, i.e., both the fROI analyses and the whole-brain analyses, were corrected for multiple comparisons, as explicitly stated in the manuscript, on page 8 (lines 125-126) and throughout the methods and results sections.
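For readers who want to see the split-half logic of part (1) in concrete terms, the following is a minimal sketch, not the analysis code used in the study. It assumes each run yields a 3D contrast map as a NumPy array and that the anatomical search space is a boolean mask of the same shape; the function names, the simulated data, and the handling of the cut-off are illustrative assumptions.

```python
import numpy as np

def top_percent_froi(localizer_map, search_mask, percent=10.0):
    """Boolean fROI: the top `percent` of search-space voxels by localizer value."""
    values = localizer_map[search_mask]
    n_keep = max(1, int(round(values.size * percent / 100.0)))
    # Cut-off is the n_keep-th largest localizer value inside the search space.
    # With exact ties at the cut-off, slightly more than n_keep voxels are kept.
    cutoff = np.sort(values)[-n_keep]
    froi = np.zeros(search_mask.shape, dtype=bool)
    froi[search_mask] = values >= cutoff
    return froi

def mean_activation(test_map, froi):
    """Mean parameter estimate of the independent test run inside the fROI."""
    return float(test_map[froi].mean())

# Simulated data for one participant (hypothetical, for illustration only).
rng = np.random.default_rng(0)
shape = (8, 8, 8)
search_mask = np.zeros(shape, dtype=bool)
search_mask[2:6, 2:7, 2:7] = True          # 4 * 5 * 5 = 100 voxels
run1_contrast = rng.normal(size=shape)     # localizer run, e.g. faces > objects
run2_faces = rng.normal(size=shape)        # test run, e.g. faces vs. baseline

froi = top_percent_froi(run1_contrast, search_mask, percent=10)
print(int(froi.sum()), mean_activation(run2_faces, froi))
```

Because the voxels are selected on run 1 only and the activation levels are read out from run 2 only, the extracted estimates are independent of the selection step.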

(2) Rationale for choosing the top-10% most active voxels as a threshold to form functional regions-of-interest (fROIs)
This comment seems to suggest that we engaged in questionable research practices. This is not the case: we did not conduct p-hacking or any related practices, and we fully agree with the reviewer that "peeking" at data and then selecting a threshold is highly problematic. This is why choosing the top-10% most active voxels was an a priori analytical decision that we made solely based on the size of the resulting fROIs, before we extracted any parameter estimates and analysed them. This approach aimed to strike a balance between fROIs that would contain a sufficient number of voxels to be analyzed, while still being able to detect potentially small category-selective regions in the dog brain.
After conducting the main fROI analysis using this threshold, however, we received feedback on this analysis approach at conferences and lab talks that suggested evaluating our choice of the 10% threshold with a supplementary exploratory analysis (see Supplementary Figure S1, next page). Note that this is also evident from the fact that the first version of this study's preprint did not contain the supplementary analysis (see https://www.biorxiv.org/content/10.1101/2021.08.17.456623v1?versioned=true).
The rationale behind the supplementary analysis was to collect additional information on how much different thresholds may affect the results. This should also provide information for future dog neuroimaging studies. We, therefore, extracted parameter estimates for percentage cutoffs ranging from 1% to 100%. The results showed that smaller fROI sizes were less category-sensitive in all search spaces in the dog brain (see Supplementary Figure S1, next page). In contrast, in the human fusiform face area and occipital face area, thresholds above 60% resulted in fROIs that could not detect differences in activation levels between faces and bodies. Thus, the human fROIs likely also contained voxels that are not face- or body-sensitive at this size. For the dogs, larger fROI sizes resulted in comparable outcomes in the mid and caudal suprasylvian gyrus but had less sensitivity in the ectomarginal gyrus. Taken together, the supplementary analysis confirmed, in a qualitative fashion, that the top-10% voxel threshold indeed strikes an optimal balance in the definition of category-sensitive fROIs.
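The threshold comparison described above can be illustrated with a short sketch. It is not the code behind Supplementary Figure S1: the cut-off grid (1%, then 5% to 100% in steps of 5%), the simulated maps, and the normal-approximation 95% interval (mean ± 1.96 standard errors) are all assumptions made for illustration.

```python
import numpy as np

def froi_mean(localizer, test, mask, percent):
    """Mean test-run value in the top `percent` of mask voxels ranked on the localizer."""
    values = localizer[mask]
    n_keep = max(1, int(round(values.size * percent / 100.0)))
    keep = np.argsort(values)[-n_keep:]       # indices of the n_keep largest values
    return float(test[mask][keep].mean())

def sweep(localizers, tests, mask, cutoffs):
    """Return {cutoff: (mean, ci_low, ci_high)} across participants."""
    out = {}
    for pct in cutoffs:
        est = np.array([froi_mean(l, t, mask, pct)
                        for l, t in zip(localizers, tests)])
        sem = est.std(ddof=1) / np.sqrt(est.size)
        # Normal approximation; a t-based interval would be wider for small samples.
        out[pct] = (est.mean(), est.mean() - 1.96 * sem, est.mean() + 1.96 * sem)
    return out

# Assumed cut-off grid and simulated data for 15 participants (hypothetical).
cutoffs = [1] + list(range(5, 101, 5))
rng = np.random.default_rng(1)
mask = np.zeros((6, 6, 6), dtype=bool)
mask[1:5, 1:5, 1:5] = True                    # 64-voxel search space
localizers = [rng.normal(size=mask.shape) for _ in range(15)]
tests = [rng.normal(size=mask.shape) for _ in range(15)]

results = sweep(localizers, tests, mask, cutoffs)
for pct in (1, 10, 60, 100):
    mean, lo, hi = results[pct]
    print(f"{pct:3d}%  mean={mean:+.3f}  95% CI=[{lo:+.3f}, {hi:+.3f}]")
```

Overlapping intervals between stimulus categories at a given cut-off would correspond to the loss of sensitivity described for very small or very large fROIs.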
The reviewer's comment made it evident that our rationale and approach require a more elaborate explanation. We thus carefully revised all relevant sections in the manuscript as follows:
Results, lines 135-139: "We chose this approach for two main advantages. First, defining individual fROIs accounted for differences in the location of activation peaks between participants (as reported in previous studies 35,37). Second, this allowed us to not only localize potential face- or body-sensitive regions but also to directly evaluate their category-sensitivity using the left-out data."
Results, lines 168-175: "Choosing the top 10% voxels was an a priori analytical decision we made based on the size of the resulting individual fROIs before any activation levels were extracted (see Materials and Methods: Functional region-of-interest approach for details). However, after we conducted the main analysis, we also extracted parameter estimates for a range of different percentage cut-offs between 1% and 100% to validate the results using this threshold, altogether confirming that the 10% threshold was an appropriate fROI size for detecting relevant activation levels (see Supplementary Note 1 and Supplementary Figure S1)."
Results, lines 213-215: "Next, we conducted an exploratory whole-brain analysis to complement the fROI analysis and investigate if we can detect the category-sensitive areas using whole-brain group comparisons."

Materials and Methods, Functional region-of-interest approach, lines 680-687: "Choosing the top 10% voxels to define individual functional regions-of-interest (fROIs) was an a priori analytical decision we made based on the size of the fROIs before any activation levels were extracted. The aim was to create functional fROIs with a sufficient amount of voxels to be analyzed while still being able to detect potentially small category-sensitive regions in the dog brain. The chosen threshold resulted in mean fROI sizes ranging from 4.6 voxels (left occipital face fROIs) to 14.27 voxels (left splenial face fROI; see Supplementary Table S2 for all average fROI sizes and section Alternative top-% voxels threshold to define functional fROIs below)."
Materials and Methods, Alternative top-% voxels threshold to define functional fROIs, lines 731-740: "We decided to use top-10% voxels as a threshold to create fROIs with sufficient data points that would still be able to detect potentially small category-selective regions in the dog brain. Thus, we decided based on the dog fROI sizes before any parameter estimates were extracted (see Supplementary Table S2 for mean fROI sizes). While choosing the top-10% was an a priori but analytical decision, we also performed validation analyses after completing the main analysis, using fROIs for the dog and human sample for percentage cut-offs ranging from 1% to 100% of the top active voxels in steps of 5%, report parameter estimates for these percentages, and compare them to the main analysis with top-10% activated voxels."

Supplementary Note 1, Validation of threshold to define individual functional regions-of-interest:
"Individual functional regions-of-interest (fROIs) were determined based on the top-10% most active voxels within each anatomical search space. To validate the chosen threshold and investigate, for example, if smaller fROI sizes would have been able to detect face-sensitive areas in the dog brain, we extracted parameter estimates for percentage cut-offs ranging from 1% to 100%. The results show that smaller fROI sizes were less sensitive in all anatomical search spaces in the dog brain (see Supplementary Figure S1). In the human fusiform face area and occipital face area, thresholds above 60% resulted in fROIs that could not detect differences in activation levels between faces and bodies. Thus, the fROIs likely also contained voxels that are not face- or body-sensitive at this size. For the dogs, larger fROI sizes revealed comparable outcomes in the mid and caudal suprasylvian gyrus but had less sensitivity in the ectomarginal gyrus than lower thresholds. Thus, comparing fROI-defining thresholds confirmed 10% as a sufficient threshold to define category-sensitive fROIs in the dog and human brain."
Supplementary Figure S1. Exploratory analysis of parameter estimates for faces, bodies, and inanimate objects retrieved from individual functional regions of interest (fROIs) defined based on top-% most active voxels for faces or bodies > inanimate objects (run 1) ranging from 1% to 100% in steps of 5%. For dogs, fROIs defined based on thresholds below 10% were less sensitive in all anatomical search spaces. Large fROI-defining thresholds were less sensitive in the human fusiform and occipital face area and dog ectomarginal gyrus as illustrated by the overlapping 95% confidence intervals (CIs). Points represent the mean. a.u., arbitrary units. The dashed line represents the a priori selected 10% threshold used for the main analysis that we set out to validate.
3. Fig S1 also shows that the problem with dog results is lower t-values and larger inter-individual variability. The human sub-sample analyses (Fig S2-S3) also have a relatively low t-value. So, lower sample size probably explains the lower t-value. But even with lower t-value, human subsample analysis has significantly smaller variance than dog results for a similar-sized sample. I don't know whether this variability is due to noise (movement), differences in acquisition parameters between dogs and humans, issues with dog compliance, or because in-plane resolution of dogs and humans were almost same, but the dog brain is much smaller (this may lead to blurring of effects in the relatively larger voxels of the dog). For these reasons, quantitative comparisons of results between the species and strong conclusions about them (in a biological sense) are problematic.
We fully agree with the reviewer and therefore discuss the possibility of face-sensitive areas in the dog brain that were simply not detectable with our setup in our manuscript. We now also explicitly mention the difference in brain sizes between the two species but the same image resolution as a methodological limitation. Furthermore, we carefully revised the interpretation of our results in the manuscript. We now refrain from strong conclusions and emphasize that more studies and meta-analyses are needed to further elucidate our understanding of face and body perception in the dog brain. For example, we exchanged phrasings like "only humans have a specialized region for face perception" with "we only detected face-sensitive regions in humans".
Abstract, lines 6-9: "Combining univariate and multivariate analysis approaches, we found functionally analogous occipito-temporal regions involved in the perception of animate entities and bodies in both species and face-sensitive regions in humans."
Introduction, lines 99-104: "The analysis also revealed analogous occipito-temporal brain areas sensitive for animate entities (i.e., faces or bodies) compared to inanimate objects and low-level visual controls in dogs and humans indicating a convergent evolution of these neural bases. However, we only detected face-sensitive areas in humans. This suggests that previously identified face-responsive areas in the dog brain may respond more generally to animate compared to inanimate stimuli."
Discussion, lines 311-314: "By adding bodies as stimuli, and thus controlling for animacy, our findings crucially expand those from earlier investigations on face perception in dogs 9-14 and suggest that previously identified face-sensitive areas may respond more generally to animate entities."
Discussion, lines 322-324: "Hence, our findings suggest a convergent evolution 15 of the neural bases of animate vs. inanimate stimuli perception but potential divergence regarding face and body perception in dogs and humans."
Discussion, lines 352-355: "In the present study, we also localized several occipito-temporal regions that responded to animate stimuli more generally; this might further indicate that dogs, in comparison to humans, focus more on whole-body social cues rather than on specific body parts."
Discussion, lines 471-478: "Nevertheless, we were still able to detect face and body-preferences in humans when we conducted the analysis again in 1000 randomly drawn human sub-samples (i.e., resampling analysis approach) with the identical fROI approach (i.e., anatomical mask) and sample size as for the dogs, indicating that the observed results were not driven by these methodological differences. However, considering that human and dog functional scans had the same image resolution, but the size of their brains significantly differs, it is, as mentioned earlier, possible that dogs do have small face-sensitive patches that were not detectable with the present setup."
Discussion, lines 487-490: "Overall, the present study marks the first step towards comparing body perception in the dog and human brain. However, more research is needed to elucidate further and compare the neural mechanisms underlying face and body perception in the dog and human brain."

Reviewer #2 (Remarks to the Author):
The manuscript "Functionally analogous body- and animacy-responsive areas in the dog (Canis familiaris) and human occipito-temporal lobe" (by Boch, Wagner, Karl, Huber and Lamm) represents a well-written, thoughtful, and methodologically advanced piece of research, with decent sample sizes and intriguing findings that are important for the field. They show, for the first time, how human & dog brains differentiate faces and bodies from objects and low-level scrambled controls.

Author response:
We thank the reviewer for this overall very positive evaluation of our work and the constructive suggestions to improve the interpretation and reproducibility of our findings.
1. The methodology is well explained and carefully constructed. However, while the human brain imaging data often have specific ethical restrictions, the dog raw neuroimaging data could be openly shared to allow for reproducibility and should be considered by the authors. The authors share the end-result data maps, but this is not usable for the replication of the analysis but only for checking if the results were reported reasonably.
We thank the reviewer for this great suggestion. The dog neuroimaging data is now publicly available at zenodo.org 16 . Due to ethical reasons, we are indeed not able to share the raw human neuroimaging data but we will share them with researchers upon request. We now explicitly mention this option in our updated manuscript.
To improve the reproducibility of our analysis, we carefully explained all analysis steps and the rationale behind them in the manuscript and made the analysis scripts and raw functional region-of-interest (fROI) data publicly available. We also share group beta maps to enable future meta-analyses.

Revised sections in the manuscript:
Data availability, lines 825-827: "Univariate and multivariate beta maps, individual raw functional region-of-interest (fROI) data, motion parameters, low-level visual properties descriptives of the stimulus material and further sample descriptives have been deposited at the Open Science Framework (OSF) and are publicly available at https://osf.io/kzcs2/. Raw human neuroimaging data is made available upon reasonable request and raw dog neuroimaging data is publicly available at zenodo.org 16."
2. I would ask the authors to engage in deeper reflection on how the results are discussed. In the abstract, the authors state that "…only humans had regions specialized for face perception". This is something one is unable to say on the basis of this kind of research; one can only state that the equivalent region was not found in dogs in the current study. It is always the uncomfortable way of the research that if you do not find something does not mean it is not there, only that it was not found. This is especially true for the paradigms used for the first time as in the current study; only through accumulating knowledge and similar findings elsewhere we can begin to collectively come to a conclusion.
We fully agree with the reviewer and have carefully revised the interpretation of our results in the manuscript. We now refrain from strong conclusions and emphasize that more studies and meta-analyses are needed to further elucidate our understanding of face and body perception in the dog brain.

Introduction, lines 99-104: "The analysis also revealed analogous occipito-temporal brain areas sensitive for animate entities (i.e., faces or bodies) compared to inanimate objects and low-level visual controls in dogs and humans indicating a convergent evolution of these neural bases. However, we only detected face-sensitive areas in humans. This suggests that previously identified face-responsive areas in the dog brain may respond more generally to animate compared to inanimate stimuli."
Discussion, lines 311-314: "By adding bodies as stimuli, and thus controlling for animacy, our findings crucially expand those from earlier investigations on face perception in dogs 9-14, and suggest that previously identified face-sensitive areas may respond more generally to animate entities."
Discussion, lines 322-324: "Hence, our findings suggest a convergent evolution 15 of the neural bases of animate vs. inanimate stimuli perception but potential divergence regarding face and body perception in dogs and humans."
Discussion, lines 352-355: "In the present study, we also localized several occipito-temporal regions that responded to animate stimuli more generally; this might further indicate that dogs, in comparison to humans, focus more on whole-body social cues rather than on specific body parts."
Discussion, lines 487-490: "Overall, the present study marks the first step towards comparing body perception in the dog and human brain. However, more research is needed to elucidate further and compare the neural mechanisms underlying face and body perception in the dog and human brain."
3. More important is the reflection on the work in the field so far: how do the new findings fit into the picture so far, what are the possible reasons the face-selective region was not found in dogs? The authors carefully consider their stimulus low-level composition and sensitivity of the methodology, and note the fMRI result difference to the single-cell studies as conducted in sheep, but one could also consider e.g. the spatial differentiation in dog brains compared to those of humans, the effects of variation in the dog brain morphology, and perhaps the effects of dog breeds to the possible underlying functionality. This increases the usability of the current work and gives tools for the field to explore the issue further.
We thank the reviewer for raising these important points. We structured our reply into several parts.

Spatial resolution
As mentioned by the reviewer, we already critically discussed methodological differences that might explain why we did not find a face-sensitive region in the dog brain. In terms of spatial resolution, it is of course also important to note that the brain sizes of dogs and humans significantly differ. Thus, considering that functional scans of both species had the same image resolution, sensitivity to detect small face-sensitive regions was lower for dogs compared to humans. We have now added this limitation to the discussion of our manuscript.
Discussion, lines 471-478: "Nevertheless, we were still able to detect face and body-preferences in humans when we conducted the analysis again in 1000 randomly drawn human sub-samples (i.e., resampling analysis approach) with the identical fROI approach (i.e., anatomical mask) and sample size as for the dogs, indicating that the observed results were not driven by these methodological differences. However, considering that human and dog functional scans had the same image resolution, but the size of their brains significantly differs, it is, as mentioned earlier, possible that dogs do have small face-sensitive patches that were not detectable with the present setup."
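The resampling logic in the quoted passage can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the per-participant face-minus-body differences are simulated, sub-samples are drawn without replacement, and each sub-sample is evaluated with a one-sample t statistic against the two-sided 5% critical value for 14 degrees of freedom (about 2.145). How the sub-samples were actually evaluated is not specified in the text above.

```python
import numpy as np

def subsample_t_values(diffs, n_sub, n_iter, rng):
    """One-sample t statistic of `diffs` in each of `n_iter` random sub-samples."""
    t_vals = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(diffs.size, size=n_sub, replace=False)
        d = diffs[idx]
        t_vals[i] = d.mean() / (d.std(ddof=1) / np.sqrt(n_sub))
    return t_vals

rng = np.random.default_rng(2)
# Simulated face-minus-body activation differences for 40 humans (hypothetical).
human_diffs = rng.normal(loc=1.0, scale=1.0, size=40)

# 1000 sub-samples matched to the dog sample size (N = 15).
t_vals = subsample_t_values(human_diffs, n_sub=15, n_iter=1000, rng=rng)

# Share of sub-samples in which the preference would still be detected.
share_detected = float(np.mean(t_vals > 2.145))
print(f"detected in {share_detected:.1%} of sub-samples")
```

A high share of sub-samples that still show the effect would indicate that the human result does not depend on the larger human sample size.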

Variation of brain morphology and breed differences
Brain morphology systematically varies across dog breeds 17 . Our sample of pet dogs was rather homogenous in terms of dog breeds (80% Border collies) and behavioural specializations, and all dogs had mesocephalic skull shapes. We further chose an analysis approach for the main univariate analysis that accounts for slight deviations of activation peaks, e.g., due to neuroanatomical variations (i.e., functional region-of-interest analysis) and used a breed-averaged template 18 to improve image registration for whole-brain analyses.
Our findings indicate body- and animate-sensitive regions in the dog suprasylvian and ectomarginal gyrus. Both regions have been identified as part of a network that systematically covaries in size (i.e., grey matter volume) across dog breeds 17 and that is positively associated with behavioural specializations related to vision (herding, sight hunting) but also with explicit companionship, which further emphasizes the role of these areas for social cognition. Considering reported breed-specific differences of grey matter volume in olfactory and gustatory regions 17 , the observed involvement of dog olfactory cortices in face and body perception in our study might be more pronounced in breeds selectively bred for olfactory skills such as scent hunting or detection. In the present study, we did not have enough variance to test for potential differences between breeds; future studies and cumulative meta-analyses should, however, consider further investigating the link between behavioural specializations and the neural bases of face and body perception in the dog brain.
We have now added this important discussion to our manuscript and hope that it will inspire exciting new research into the effects of dog breeds and behavioural specializations.
"[...] 18 for whole-brain analyses. We have localized body- and animate-sensitive regions in the ectomarginal and suprasylvian gyrus. These regions have been identified as part of a network that systematically covaries in size (i.e., grey matter volume) across dog breeds and that is positively associated with behavioural specializations related to vision (e.g., herding or hunting), but also with explicit companionship 17 , which further emphasizes the areas' role for social cognition. However, considering breed-specific neuroanatomical differences in olfactory and gustatory brain regions 17 , the findings in dog olfactory cortices might be even more pronounced in dogs selectively bred for scent detection. In the present study, we did not have enough variance to test for potential differences between breeds; future studies and cumulative meta-analyses should, thus, consider further investigating the link between behavioural specializations and the neural bases of face and body perception in the dog brain."
4. Also, for the discussion, instead of "faces are not specific for dogs" viewpoint, I suggest emphasizing "bodies may be equally specific as faces for dogs" viewpoint. This may better fit the literature and could be further studied in the future. Perhaps dogs can detect emotion, identity and such from faces, but maybe they are equally good with bodies. And while the faces are processed in a holistic manner in dogs as in humans, maybe bodies are also processed similarly. Or, as behavioral results of face processing are dense in many species (e.g. newly hatched chicks turn toward a face, Rosa-Salva et al. Dev Sci 2010) - do these face-specific processes already take place at subcortical level, perhaps also with dogs? Should this be targeted for further studies in the fMRI research?

Face and body perception
We agree with the viewpoint that bodies may be equally important or specific as faces for dogs. We have therefore argued that our findings might indicate that dogs focus more on whole-body social cues rather than specific body parts and that our results do not contradict previous behavioural findings of dogs perceiving facial cues of conspecifics and humans [19][20][21][22][23][24] but might suggest that the majority of brain regions involved in the perception of faces are also involved in the perception of bodies. We have now carefully revised the discussion of our findings to emphasize this interpretation further and added suggestions for future studies to investigate, for example, if dogs are able to detect identity or emotional expressions equally well from bodily stimuli as they do from faces 1,2,25 .

Sub-cortical involvement
The exploratory whole-brain univariate analysis did not indicate any involvement of subcortical brain areas in dogs during face or body perception. However, representational similarity analysis revealed increased pattern similarity for conspecific compared to human bodies in the amygdala and insula cortex. These findings further emphasize the need for future investigations of emotion and species perception in dogs using whole-body or bodily stimuli. We have added this important insight to the discussion of our manuscript.
"[...] [26][27][28][29] and dog neuroimaging studies so far have overlooked body perception entirely. We thus hope that localizing a novel region that preferentially processes non-facial bodily cues will inspire more research on how dogs perceive bodily social cues and if, for example, they are able to detect identity or emotional expressions equally well from bodily stimuli as they do from faces 1,2,25 . In the present study, we also localized several occipito-temporal regions that responded to animate stimuli more generally, this might further indicate that dogs, in comparison to humans, focus more on whole-body social cues rather than on specific body parts. This interpretation is in line with a recent comparative eye-tracking study showing that dogs equally attend to a whole-body social cue (i.e., face and rest of the body), whereas humans spend significantly more time looking at the face 30 . Thus, our results do not contradict previous behavioural and imaging findings of dogs perceiving facial and bodily cues of dogs and humans [19][20][21][22][23][24] but might suggest that the majority of brain regions involved in the perception of faces are also involved in the perception of bodies."
Discussion, lines 396-401: "Dog cortical body- and animate-sensitive areas might therefore be tuned to respond equally to human and dog stimuli. However, we did find neural representations of conspecific compared to human bodies in dogs' limbic regions (i.e., amygdala, insula), which further emphasizes the need for behavioural research investigating the perception of emotional body cues in dogs."

A few minor comments are listed below:
5. line 66 ref. #20: Apart from other references given here, this is a review instead of original research, so could be omitted here (or mentioned as such).
We removed the reference.
6. lines 81-82: "prior work did not find greater activation for faces compared to scrambled images" -> strictly speaking, this is untrue, as Kujala et al (Sci Rep 2020) did find a difference between scrambled and intact dog/human facial images in dog brain responses. However, this study also lacked bodies as a control stimulus, and the information retrieved with a neurophysiological method taps partially different neural mechanisms from the fMRI, in which the quicker responses may be difficult to detect.
The focus of this paragraph was on fMRI studies, but we agree with the reviewer that the EEG study by Kujala et al. (2020) should also be mentioned to give a better overview of non-invasive neuroimaging research investigating face perception in dogs. We have therefore added the study to the introduction.
Introduction, lines 60-67: "Apart from one electroencephalography (EEG) study 31 , prior neuroimaging studies did not find greater activation for faces compared to scrambled images 10,13 , but compared to scenes 10 or objects 9,10,14 , or didn't have any non-facial controls 11 , questioning if face-sensitivity rather reflects differences in low-level visual properties. Further, almost all prior studies lacked animate stimuli other than faces [9][10][11]13,14,31 and the only study 12 with another animate stimulus category (i.e., the back of the head) had no inanimate control condition."
7. lines 107-111: Pls add that you conducted the fROI analysis in the individual level. This is a very important specification as ROIs are determined in a number of different ways to overcome the multiple comparison problem, and it already answers some further questions.
We added this information to the introduction.
Introduction, lines 88-92: "For our main analysis, we employed a functional region-of-interest (fROI) analysis on the individual level to investigate face- and body-sensitivity in the occipito-temporal cortex of dogs and humans and whether these regions responded differently to conspecific vs. heterospecific stimuli, as indicated by differences in activation levels."
8. lines 116-135: This paragraph already describes the main findings. As it is highly unusual for the introduction, I suggest omitting unless the journal instructions specifically instructed the authors to include findings in the introduction.
This is a journal-specific requirement. As stated in the style and formatting guide of Communications Biology (https://www.nature.com/documents/commsj-life-style-formattingguide-accept.pdf), the final paragraph of the introduction should be a brief summary of the major results and conclusions.
9. Figure 6: The text in the figure is very small (actually, the same applies to other figures as well) and colors remain very difficult to distinguish from one another, which is crucial for this figure. Please change one of the blues and the darker green color for the figure to be more readable (also in the CMYK print form, which people still do use). In the dog brain image, the color actually appears similar to the faces-only color in the label - better distinction in the colors would be beneficial here.
We changed colours and made sure they are distinguishable in CMYK and RGB print form and increased the font size (see below). We also carefully revised the font size of all other figures to improve readability.

Figure 6. Graphical summary of the main study findings illustrating brain regions with analogous and divergent functions between both species.
The schematic brain figures show results from the functional regions-of-interest (fROIs; univariate activation levels) and representational similarity analyses (RSA; multivariate activation patterns). For visual guidance, we also labelled some anatomical landmarks, such as the cruciate (dog) and central sulcus (human), the parahippocampal and cingulate gyrus, as well as the (pseudo-)sylvian fissure. For visual comparisons of the results, it is important to note that the last common ancestor of dogs and humans most likely had a smooth brain consisting mainly of primary and secondary sensory regions 32 ; dog and human temporal lobes thus evolved independently and differ significantly in overall morphology 33,34 . The most significant landmark, the (pseudo-)sylvian fissure, is at the centre of the dog temporal lobe with the gyri wrapped around but constitutes the border to the frontal- and parietal lobe in humans (see lateral views). To reduce complexity, observed results are always summarized on one hemisphere and they do not mark the exact but the approximate anatomical location. Also, increased pattern similarity for bodies compared to faces in the human mid cingulate gyrus and insula are not depicted. Example category images are license-free stock photos derived from www.pexels.com and were modified for the study purpose (i.e., head or body cut out).
We thank the reviewer for pointing out the missing discussion of this finding. Own-species (i.e., conspecific) preference appears to be more pronounced in non-human primates compared to humans 7,35 . This might also explain the mixed results in previous human fMRI studies, reporting either greater 5,12,36,37 or comparable 7,38,39 activation levels for human compared to dog, macaque or other non-human animal faces. In regard to conspecific preference in the extrastriate body area, we replicated previous results 40 . We added this information and relevant references to the discussion of our manuscript.
1 show species-specific differences for the perception of negative facial expressions. This might result in differential activation levels for dog compared to human angry emotional expressions. Future neuroimaging studies investigating conspecific preferences or species perception should therefore consider using both positive and negative emotional displays.
For the present study, we only selected images with neutral or positive emotional displays (as described in 4.2.1 Stimulus material) and did not find any differences in activation for human or dog stimuli in dog body- or animate-sensitive areas, which is also well in line with the findings from Somppi et al. 41 .
We added this information and relevant references to the discussion of our results.
12. lines 410-412: Pls add the analysis which gave this result. With different analyses in the same context, it is difficult to follow which analysis is discussed.
We were referring to the whole-brain representational similarity analysis. We added this information to the text.
Discussion, lines 409-413: "In this context, it is important to note that in accordance with a potential divergent evolution of face, body, and species representations in dogs and humans we also observed increased pattern similarity for faces (regardless of species) and conspecific (dog) bodies in dog higher-order olfactory association cortices using whole-brain representational similarity analysis."
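To make the notion of "pattern similarity" concrete, here is a minimal sketch. It assumes that similarity is quantified as the Pearson correlation between two multivoxel activation patterns, which is a common choice in representational similarity analysis but not necessarily the exact measure used in the manuscript. All patterns below are simulated, and the variable names are hypothetical.

```python
import numpy as np

def pattern_similarity(a, b):
    """Pearson correlation between two multivoxel activation patterns."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n_voxels = 50

# Hypothetical category-specific signal shared across two runs
shared = rng.normal(size=n_voxels)
dog_body_run1 = shared + 0.5 * rng.normal(size=n_voxels)
dog_body_run2 = shared + 0.5 * rng.normal(size=n_voxels)

# Unrelated pattern standing in for a different stimulus category
human_body_run2 = rng.normal(size=n_voxels)

within = pattern_similarity(dog_body_run1, dog_body_run2)
between = pattern_similarity(dog_body_run1, human_body_run2)
print(f"within-category r = {within:.2f}, between-category r = {between:.2f}")
```

In this toy setting, "increased pattern similarity" for a category corresponds to a within-category correlation that is reliably higher than the between-category correlation.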
13. lines 429-430: "Note also that" -> pls consider "Notably" instead
We rephrased this section based on the reviewer's comment #3 and the word "notably" was therefore no longer fitting.
Discussion, lines 444-446: "However, considering breed-specific neuroanatomical differences in olfactory and gustatory brain regions 17 , the present findings in dog olfactory cortices might be even more pronounced in dogs selectively bred for scent detection."
14. lines 433-434: "It is important though that we did not predict these findings and should therefore be regarded as preliminary." -> I do not see this necessary in this context. If the multiple comparisons problem is properly dealt with, a result is a result - it is not dependent on the psychic abilities of the researchers. Of course, it is appropriate to point out the need to examine this further and to replicate elsewhere.