Pitch Change in Dog-Directed Speech

Humans cannot help changing their speech in different situations. This kind of ‘intraspeaker variation’ happens every day when news reporters turn from talking to their colleagues to talking to the camera, or when people suddenly start speaking in a strong dialect when talking with their family. People not only change the way they speak to different people, they also change the way they speak when not talking to people at all. This study examines one speaker and her dog, the goal being to find out why humans change the way they speak when talking to dogs. Dog-Directed Speech (DDS) is characterised by a high fundamental frequency (pitch), as well as large numbers of questions and imperative utterances (Hirsh-Pasek and Treiman 1982, Burnham et al. 2002, Mitchell 2001). In order to take both of these features into account, we measured and compared the mean pitch of different speech acts which we expected to be common in DDS: disapproving utterances (“Stay”), approving utterances (“Good boy!”), and questions (“Who’s a good boy?”). We wanted to see whether the frequency of certain speech acts in DDS has anything to do with its high pitch and wide pitch range. Although we did find variation in the pitch of different speech acts, we do not think that this explains the high pitch and pitch range of DDS. Rather, we conclude that DDS is motivated by social norms associated with talking to dogs. Our findings can give us insight into sociolinguistics in general because they show that we respond to social norms that come with the speech situation, as well as to the way our audience talks.


Introduction
Animals cannot talk as humans do. Nevertheless we persist in using human language to talk to animals. What is so interesting about this from the perspective of sociolinguistics is that it illustrates how style-shifting can also be directed towards a non-linguistic audience. Dog-Directed Speech (DDS) is difficult to analyse according to Speech Accommodation Theory (Giles et al. 1991) or Audience Design Theory (Bell 1984), because both theories show that the speaker often adjusts his or her speech to the way the audience speaks. In this article I set out to find some of the factors that motivate style-shifting to a non-linguistic audience, using the example of one human speaker and her dog.
Previous studies have already confirmed high pitch to be a feature of DDS (Hirsh-Pasek and Treiman 1982, Burnham et al. 2002, Mitchell 2001). We wanted to shed some new light on the discussion about pitch in DDS by bringing in other findings which show that DDS is also characterised by having many imperatives, questions, and other utterances, such as "come" or the dog's name (Hirsh-Pasek and Treiman 1982, Burnham et al. 2002, Mitchell 2001, Mitchell and Edmonson 1999). When we coded our DDS, we divided it into four categories based on this previous literature: questions, disapprovals, approvals, and ambivalent. Each of these we termed a "performative" speech act (Austin 1962:4), that is, an "action performed with words" (Stapleton 2004:9-10).
According to Syrdal and Kim (2008), different utterances tend to be uttered with different fundamental frequencies in Human-Directed Speech (HDS). For example, positive remarks ("That's good!") have on average a higher F0 than negative exclamations ("Oh no!"). This led to our first hypothesis, which is that approving utterances will have a higher pitch than disapproving utterances. We also measured the mean F0 of some of the speech that was directed at the human audience. Our second hypothesis was that the DDS would have an overall higher pitch than the HDS. Finally, we hypothesised that the high occurrence of approvals, disapprovals and questions is one of the reasons why DDS has such a high pitch and wide pitch range in the first place. If DDS is dominated by speech acts which even in HDS are uttered with a higher mean F0, this would influence the overall pitch of DDS. If, however, there is little contrast in the pitch of different speech acts, yet a difference between the pitch of questions directed to dogs and questions directed to humans, then it is more likely that the high pitch of DDS happens only because of a change in audience.

Literature Review
One of the most interesting and perhaps the most amusing observations made about DDS in the literature is its similarity to Child-Directed Speech (CDS). CDS is another case of intraspeaker variation where adults accommodate their speech to an audience that is not fully linguistic. Hirsh-Pasek and Treiman's 1982 article Doggerel: Motherese in a New Context is an excellent comparison of 'motherese' (CDS) and 'doggerel' (DDS). They found both registers to have a high pitch, many repetitions, and more grammatically correct structures than Adult-Directed Speech (ADS). However, there were also some differences: motherese was more understandable, with even shorter sentences, fewer declaratives, more questions, and an overall higher pitch and pitch range than doggerel. The authors relate this to the adult's awareness of "the linguistic level of the listener": adults know that young children will learn to speak fluently in the future, although they cannot speak at that stage, so they take care to make their speech understandable (Hirsh-Pasek and Treiman 1982:235-6). Burnham et al. (2002) compared pet-directed speech and CDS and discovered that adults hyper-articulated vowels to a greater extent when talking to infants than when talking to pets. Their explanation is the same as that of Hirsh-Pasek and Treiman (1982): adults have an interest in helping their children learn to speak, but less interest in helping their pets learn to do so.
Other studies looking at DDS not only discuss the way people speak to dogs, but also what they say to dogs. Mitchell (2001) found a clear preference for certain kinds of utterances in DDS, particularly questions, imperatives, and short utterances, such as "come", "ball" or the dog's name. Rogers et al. (1993) observed that elderly people walking their dogs in the park mainly used imperatives with their dogs, while the dog's name constituted about one fourth of the total utterances. Yet another study by Mitchell and Edmonson (1999) found that seven words (of DDS), including "come", the dog's name, and "ball", accounted for about 50% of the words, while the other 50% were commands.
DDS is not only characterised by a perceived high pitch, but also by very limited linguistic content. This qualitative feature of DDS cannot be ignored in an analysis of pitch change in DDS. This is why we chose to analyse the data in terms of utterance. In order to do this, we needed a way of analysing these utterances, something which Speech Act Theory (Austin 1962) can provide for us.
Speech acts describe utterances in terms of their communicative function in a dialogue (Syrdal and Kim 2008). Speech Act Theory was first developed by Austin in 1955 in his lectures on How to Do Things with Words (see Austin 1962). According to Austin, words not only have inherent meaning, but also gain meaning from the situation in which they are uttered. For example, the warning "don't jump in the water" has a different meaning when uttered sitting on the living room sofa than when standing on the edge of a lake. Such utterances are "part of doing an action" rather than "just saying something" (Austin 1962:5). This theory of meaning is more applicable to DDS, given that dogs no doubt understand actions more than they understand words.
Syrdal and Kim's 2008 study shows how pitch can help convey this kind of meaning in a dialogue. While the purpose of their study was to aid natural language dialogue systems and text-to-speech synthesis in expressing human intention (asking a question, expressing pleasure, etc.), their findings are relevant to our study of pitch in DDS. Syrdal and Kim grouped all the speech acts into four main groups (imperative, interrogative, assertive, and affective) and measured maximum F0, minimum F0 and pitch range. Questions and positive remarks had on average a higher F0 than declarative sentences and negative exclamations. This led to our first hypothesis, which was that approving utterances would have a higher pitch than disapproving utterances. Syrdal and Kim's research was on HDS, but we assume that these tendencies will also apply when humans talk to dogs.
Summing up, we know from the literature that DDS generally has a high overall pitch compared with HDS. DDS is also dominated by certain speech acts: imperatives, questions, and other recurrent words such as the dog's name. If the speech acts that have been observed to dominate DDS also happen to come with high F0 values, then we can expect that spending most of our time telling dogs off, praising them, and asking them questions contributes to the high pitch of DDS. We now move on to explain how we made our own findings.

Methods
Our data comes from one two-hour session with our speaker, Amelia, a 21-year-old female student, and Floyd, a seven-year-old male Straffie greyhound (see Figure 1 below). Amelia was not the owner of the dog and had only known him for a few months. She was born and brought up in Liverpool and her English is close to the General Northern variety of English (Wales 2006). The session was carried out inside Amelia's flat as well as outside, when we took Floyd for a walk. Both video and sound were recorded. Eliciting speech from a speaker meant that we had to avoid the observer's paradox. We made up a cover story which suggested that our research was about dog communication skills. This helped focus the session on the dog and take our speaker out of the spotlight. A low-key and relaxed session gave us natural HDS. In order to elicit DDS, we encouraged Amelia to guide Floyd through different tasks. For instance, we carried out an object permanence test (Wynne 2001), where a treat is placed behind an obstacle for the dog to find. A successfully completed task provided instances of approvals ("well done!" and "good boy"), while misbehaviour, such as when Floyd would not sit down or when he ate grass, which Amelia did not want him to do, provided disapprovals ("no", "sit", and "don't eat the grass").
At first we coded Amelia's DDS into three speech acts: command, praise, and question. While 'command' covered the category of imperatives (which Mitchell [2001] found to be particularly frequent in DDS), we needed another category to cover positive exclamations, which we called 'praise'. But we thought that this would still not be enough, so we added 'approval' and 'disapproval' to account for more utterances (for example "no", which is disapproving, but not really a command). For the sake of clarity we decided to use only approval and disapproval and not command and praise, as using all four terms would be redundant. All but two commands were disapproving, and all praises were approving. The exceptions were disapprovals that were not commands and commands that were approving. For example, Amelia said "Floyd" only to display dissatisfaction, not to order Floyd to do something. Similarly, "go on" and "get your nose in" were commands, but they were encouraging. Any utterances that were neither disapproving nor approving were coded as ambivalent. These included "come on", "shake", "Floyd", "go on", and "hey".
Another difficulty in coding our data was how to determine the speech act. It is possible to show disapproval in other ways than through words, in the same way that we ask each other questions by raising our eyebrows, or shrug our shoulders to say "I don't know". This led to some questions: should we code based on the meaning of an utterance (e.g., "don't eat the grass!"), or on the context (Amelia says "Floyd" as he is running too far away from her, so this must be a command), or based on what Floyd did (Amelia says "sit" and Floyd sits)?
A combination of visual and audio analysis (using both the video and the sound recording) seemed the best solution. Amelia's facial expression, for example a frown, could be an indication of disapproval. Or we could understand that Amelia was disapproving of Floyd because of what she was doing, e.g., pulling at the lead because Floyd was running away. In such cases we coded "Floyd" as a disapproving utterance, even though the lexical content alone did not support this analysis.
Another advantage of an analysis which brings in paralinguistic cues is that it became easier to avoid the interference of our own perception of pitch. We did not want to code an utterance as disapproving only because we perceived a low pitch, as this would make this a study of what we think about DDS rather than of what DDS really is.
We measured the minimum, maximum and mean F0 of all voiced utterances in the speech analysis software Praat (Boersma 2001). The only exception is that we measured just one speech act in the HDS, namely questions. We wanted to compare the two different styles in terms of pitch. We thought that one speech act would be sufficient in order to compare the two styles, but it turned out that there was only one speech act that existed in both the DDS and the HDS, and that was questions. When we measured the F0 of questions, we purposefully omitted the rising intonation at the end, as it would alter the mean F0 value of the total utterance. We did not code any of Amelia's speech that was interrupted by other sounds in the recording, whether by wind noise when we were outside, or by people speaking over her voice.
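The measurement step described above can be illustrated with a minimal sketch. It is not the Praat procedure we used: it assumes an F0 track has already been exported as one value per analysis frame, with 0 marking unvoiced frames, and the function name, the `drop_final_frames` parameter and all the numbers are hypothetical. It only shows why leaving out a final rise lowers the maximum and mean F0 of a question.

```python
from statistics import mean


def summarise_f0(frames_hz, drop_final_frames=0):
    """Return (min, max, mean) F0 in Hz over the voiced frames of one utterance.

    frames_hz: one F0 estimate per analysis frame; 0 or None marks an unvoiced frame.
    drop_final_frames: number of frames to ignore at the end of the utterance,
        e.g. to leave out the final rise of a question (hypothetical parameter).
    """
    kept = frames_hz[: len(frames_hz) - drop_final_frames]
    voiced = [f for f in kept if f]  # discard unvoiced frames
    if not voiced:
        raise ValueError("no voiced frames left to measure")
    return min(voiced), max(voiced), mean(voiced)


# Invented F0 track for a question with a rise over the last three frames.
question = [0, 210, 220, 230, 0, 240, 250, 300, 340, 380]

with_rise = summarise_f0(question)
without_rise = summarise_f0(question, drop_final_frames=3)
print(with_rise)
print(without_rise)
```

With the rise included, the three high final frames pull the mean well above the level of the rest of the utterance, which is the distortion the omission was meant to avoid.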
With pitch as our dependent variable, approval and disapproval became our independent variables. In the case of our control, one speech act, questions, remained constant, and the audience became the independent variable.

Results
We managed to collect a total of 108 dog-directed utterances (from 33 minutes of recordings). Floyd was not a very responsive dog, so there was not always much for Amelia to approve or disapprove of. There were only 16 approvals and 29 disapprovals. Examples of disapprovals are "stay", "wait", and "no", while some examples of approvals are "good boy!", "well done", and "oh, you're so close!". The plurality of utterances (33%) were questions. Altogether, we collected 36, examples of which are "where are you going?", "are you ticklish?", and "where's the next treat?". The remaining 26 tokens of DDS were labelled "ambivalent". These included utterances such as the dog's name "Floyd" and several imperatives ("come on", "move"). "Floyd" was the most common utterance overall, said 20 times. "Come on" was said eight times. This confirms something else that was noted in previous studies: that repetitions of words, and especially repetitions of the dog's name, are typical of DDS (Mitchell 2001, Mitchell and Edmonson 1999).
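As a check on the proportions, the shares can be recomputed from the counts reported above. The sketch below uses only those counts and the stated total of 108; the variable names are ours. Note that the four categories as listed sum to 107, one short of the stated total, and the sketch does not guess at how the remaining token was coded.

```python
# Counts of dog-directed utterances per category, as reported in the text.
counts = {"questions": 36, "disapprovals": 29, "ambivalent": 26, "approvals": 16}
reported_total = 108  # total number of dog-directed utterances stated in the text

# Share of each category as a percentage of the reported total.
shares = {label: round(100 * n / reported_total, 1) for label, n in counts.items()}

# Sum of the four listed categories, for comparison with the reported total.
listed_total = sum(counts.values())

for label, share in shares.items():
    print(f"{label}: {share}%")
print("listed:", listed_total, "reported:", reported_total)
```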
As can be seen in Figure 2, the mean as well as the range of Dog-Directed Questions (DDQs) are notably higher than those of Human-Directed Questions (HDQs). Unfortunately, there were only 11 HDQs to compare with the 36 we found in the DDS.
Table 1 compares DDQs with HDQs in terms of mean F0, and maximum and minimum F0. The standard deviation of the mean F0 values (33.5 Hz for HDQs and 46.4 Hz for DDQs) tells us that the pitch of DDQs varied within a larger spectrum than that of HDQs. Figure 3 (below) shows the mean F0 of the different speech acts. The x-axis shows ambivalent speech acts in red, approvals in green, and disapprovals in blue. The y-axis shows the mean and range of F0. Ambivalent utterances have the widest range. The ranges of approving and disapproving utterances overlap slightly, but the mean F0 of approvals is higher than that of disapprovals. In Figure 4 (below) the Dog-Directed Questions are set against DDS approvals and disapprovals. The y-axis shows the mean and range of F0. Approvals have a higher F0 than disapprovals, which confirms our first hypothesis. Interestingly, DDQs had a higher mean F0 than both approvals and disapprovals.
Table 2: Average mean F0 of disapprovals, approvals, ambivalent utterances, Dog-Directed Questions (DDQs) and Human-Directed Questions (HDQs).
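The kind of per-group comparison reported in this section can be sketched as follows. This is an illustration only: the tokens and their F0 values are invented, not taken from our recordings, the names are hypothetical, and the sample standard deviation is an assumption, since the text does not say which formula was used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical coded tokens: (audience, speech act, mean F0 in Hz). Values are invented.
tokens = [
    ("dog", "question", 310.0),
    ("dog", "question", 250.0),
    ("dog", "approval", 280.0),
    ("dog", "approval", 300.0),
    ("dog", "disapproval", 200.0),
    ("dog", "disapproval", 220.0),
    ("human", "question", 190.0),
    ("human", "question", 210.0),
]


def summarise(tokens):
    """Group tokens by (audience, speech act) and return (mean, sample SD) per group."""
    groups = defaultdict(list)
    for audience, act, f0 in tokens:
        groups[(audience, act)].append(f0)
    return {key: (mean(values), stdev(values)) for key, values in groups.items()}


summary = summarise(tokens)
for (audience, act), (avg, sd) in summary.items():
    print(f"{audience}-directed {act}: mean {avg:.1f} Hz, SD {sd:.1f} Hz")
```

Holding the speech act constant (questions) and varying the audience corresponds to the control comparison, while holding the audience constant (the dog) and varying the speech act corresponds to the test of the first hypothesis.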

Analysis
DDS is both a register that has a higher pitch than HDS and a register with a large pitch range. Our detailed analysis of pitch shows that, at least for this one speaker, DDS is different from HDS, not only because it has an overall higher pitch, but because of how much the pitch varies. This variation depends upon the utterance spoken. Yet the clear differences in pitch between DDS and HDS still overwhelm this variation.
Our results from the comparison of questions in both styles strongly indicate that DDS has a higher overall mean F0 than HDS. However, the only utterance type of HDS we considered was questions. Future studies would benefit from comparing all kinds of HDS utterances before drawing this conclusion. The strongest indication that DDS has a higher pitch is the fact that HDQs had a lower mean pitch than disapprovals, even though interrogatives tend to have a high F0 relative to other speech acts (Syrdal and Kim 2008).
We did not focus our study only on differences in pitch between DDS and HDS, because this is the one difference that has been studied most in previous literature; the goal of the present study was to gain insight into the way that pitch varies within DDS and, in particular, into how speech acts determine this variation. The standard deviation of all dog-directed utterances was 62.96 Hz, which was quite high given that the normal female voice lies between 150 and 300 Hz (Aronson and Bless 2009:17). A good example of pitch range is the utterance of the dog's name, "Floyd". Its lowest F0 value was 144 Hz (disapproval) and its highest 396 Hz (approval). What is interesting about this is that it really shows the communicative function of pitch. Amelia used pitch, not a different word, to indicate approval or disapproval.
We also discovered things about DDS that have nothing to do with pitch. The questions that Amelia asked Floyd were short and practical, such as "what's this?" and "what's in there, Floyd?". What she said to us was completely different: we talked about life, family, and dogs, with remarks such as "my mum's dog replaced me when I moved out". We found only two declaratives spoken to Floyd: "that's not your toy, that's your blanket" and "you're embarrassing yourself". We also found one dog-directed "hello". Unlike Mitchell (2001:195), who found 80.1% of DDS to be imperatives, we found most of our speech to be questions, which might be explained simply by Floyd being a better behaved dog.
This absence of declaratives might help explain why DDS has a higher pitch than that of HDS and a wider pitch range. According to Syrdal and Kim (2008), declarative sentences have neither especially high nor especially low F0 values in relation to all the other speech acts in their study. Interrogatives, on the other hand, have a high mean F0, second only to positive exclamations and repetitions. We think the absence of declaratives and the dominance of questions is one of the reasons why DDS is perceived to have such a high pitch.
In the next section I move on to discuss why the results are the way they are. Why should there be differences between the mean F0 of DDS and HDS? Why should the pitch be different in different speech acts (e.g., approval and disapproval)?
One possible interpretation is that an intraspeaker pitch difference between DDS and HDS reflects the role of the audience (Bell 1984). Variation in the pitch of speech acts, on the other hand, would indicate that it is not only the audience causing the high pitch and pitch range of DDS, but also the use of speech acts which are characterised by a high pitch. The analysis reveals that both reasons for pitch variation (a change in audience and a change in speech act) are tied to an understanding of how a dog audience affects a human speaker, a key point being that a change in audience is also what calls for the use of certain speech acts. Exactly how and why this effect happens is what I will discuss in the section below.

Discussion
Allan Bell asked the following question in his article on audience design: "What is it in the addressee that the speaker is responding to?" (1984:167). I ask the same question about Amelia and Floyd. Bell discusses how human audiences influence intraspeaker variation; for example, speakers "accommodate their speech style to their addressee" (1984:162). But speech is not the only thing that speakers respond to; Bell also says that speakers respond to "the personal characteristics of the addressee" (1984:162). Amelia may see Floyd as a friendly personality that she can joke around with, e.g., "are you ticklish?" and "get your nose in!". But more than anything else, Floyd is a pet to Amelia. Floyd is playful and sometimes misbehaves. Amelia is the adult who is there to direct his misbehaviour and encourage his good behaviour.
Because the relationship between pet and human quite resembles that between child and parent, I turn to the realm of speech directed to children for more understanding of speech directed to dogs. Adults change their language to accommodate to children's cognitive level, even when that audience is pre-linguistic. However, this does not happen in all cultures. According to Shneidman and Goldin-Meadow (2012:672), in some cultures children receive "little input in directed speech early in life". They compared the input that children get growing up in the United States to that of children growing up in a Mayan village: "Unlike young children in the United States who receive the majority of their linguistic input in child-directed speech, Mayan children receive most of their input in overheard speech" (Shneidman and Goldin-Meadow 2012:670). They relate this to the fact that adults do not see children as "valid conversation partners" and "rarely address children directly" (Shneidman and Goldin-Meadow 2012:660).
It is also questionable whether all cultures keep pets in the same way that western (wealthy) cultures do (see, for example, Francis 2009). All of the secondary literature I found on pet-directed speech is written by American or Canadian authors. Tannen, who did her research in the United States, suggests that the baby-like talk used to talk to dogs comes from the baby-like function they have in the home, and that "[e]ighty-three per cent [of pet owners] refer to themselves as their pet's mom or dad" (2004:401). Our study is also from a western country, Britain. Our speaker was brought up in Northern England. Not only does she come from a country where pets are part of the family, but her own family has kept a dog since she was two years old. This means that she would be more used to talking to dogs than someone with less experience.
Although we have established that DDS is a culturally constructed phenomenon, it remains to be explained why it has become the way it is. Where does the high pitch come from? Why are disapproving utterances uttered with a low pitch? I offer two explanations:
(1) The use of pitch in DDS comes from beliefs that dogs are capable of associating meaning with extreme sounds like high pitch and low pitch, despite not being able to speak or understand human language.
(2) Pitch is manipulated in DDS to achieve different personas which the owner embodies, and especially to exercise power over the dog.
In order to expand on the first point, I bring in prescriptivist literature which insists that dogs have a unique ear for pitch. In The Dog Rules: 14 Secrets to Developing the Dog YOU Want, Sundance (2009:191-2) encourages dog-owners to use a high-pitched voice ("happy voice") when "they are pleased with the dog" and to growl like a dog when they really want to enforce a command. According to Sundance, dogs associate high pitch with "reward and excitement" because dogs themselves make high-pitched sounds when they are "nonthreatening, peaceful or empathetic" (2009:191); low sounds, on the other hand, convey authority. Mitchell (2001:183) describes talking to dogs as "both reasonable and unreasonable". One of the "unreasonable" social norms is that dogs should be addressed in a way that invites a response. Owners ask dogs questions not because they expect an answer, but in order to "produce a conversation-like engagement" (Mitchell 2001:4). Amelia does this. She asks Floyd: "are you ticklish?", "do you wanna take the blanket off?", and "are we going somewhere?". She knows that Floyd cannot answer: "yes please, I'd love to go to the park". Nonetheless, it does give the feeling of being with a conversation partner.
While some consider dog owners' expectations of dogs' cognitive abilities to be "unreasonable" (Mitchell 2001), others such as Sundance (2009:191) believe that vocal strategies are "intuitively understood" by the dog. Even if Amelia misjudges Floyd's cognitive abilities, her choice to vary her pitch to show approval and disapproval could be motivated by a belief she has about Floyd's ability to understand her. Floyd's responses to Amelia's commands (such as when he eventually sat down after Amelia said "sit") make it look like he really understands some of her speech. Taking a closer look at the situation, it is unlikely that he had much understanding. Amelia had to say many words ("Floyd, what's this, sit, sit, come on, no, sit, sit down, sit, sit") before he finally sat down. When Floyd did not react, Amelia started using words in different pitches as if he would understand this better (and maybe he did, because he eventually sat down).
Now we come to the second point, which is that pitch could be used as a tool to help embody different personas. Lachixío Zapotec, a language spoken in Mexico, is a very good example of how this can be done. Lachixío Zapotec uses pitch as well as voice quality (e.g., breathy voice, creaky voice, and whisper) to indicate specific social roles or social actions. For example, a high-pitched voice indicates respect: "It is used when addressing God in prayer, […] when addressing deceased relatives", and "when speaking to elders" (Sicoli 2010:523). A low-pitched creaky voice, on the other hand, marks authority, and is part of a spectrum from low pitch to high pitch, where "relatively higher pitch marks higher-ranked social relations and increasingly lower pitches convey greater authority or influence over another speaker" (Sicoli 2010:530).
English also uses a low-pitched voice to show authority. According to Talbot (2010), the male voice has an average fundamental frequency of somewhere around 100-150 Hz, which is lower than that of women (200-250 Hz) because of different sizes of the resonating chamber in the body. Gaudio (1994) relates the association between the deep male voice and authority to the powerful position that men have held in our society for hundreds of years. Lachixío Zapotec has an intrinsic use of pitch and voice quality quite different from English, but when it comes to low pitch and authority, the two languages operate in the same way.
Power is an important player in the relationship between pet and owner. What was interesting about our case is that Amelia was not, in fact, the owner of Floyd. She mentioned that she was not very comfortable with telling the dog off or letting him off the lead, because she had not known him for long. This might make Amelia feel the need to perform a powerful personality, as she does not already embody this identity. She can do this by using a deep voice when disapproving.
Whether Amelia is adapting to Floyd's cognitive level, or whether she is adapting her speech because she is projecting a powerful personality, she is in both cases doing what other humans do when they talk to dogs. Coupland (1985:154) claims that speakers are motivated by adherence to social norms even if they are not "conscious" of them. When I asked Amelia to think about why she talked this way to Floyd after I had revealed to her the purpose of our study, she said exactly this: that she spoke this way to dogs because she had observed other people doing so. Coupland (1985:155) also states that "particular codes and/or styles are perceived to be appropriate to, even required by, particular situations". Amelia has learnt the code which is appropriate for speaking to a dog in a country where owners rule over their pets, but where they also see pets as companions that you form a relationship with.
Last but not least, we should not forget that we three students carrying out the research were present during the session. Not only were we overhearing Amelia, we were watching the dog do tricks, which put Amelia in the spotlight as the person handling the dog. We cannot know how Amelia would have spoken to Floyd had we not been there.

Conclusion
Dog-Directed Speech (DDS) is an interesting case of human behaviour that is completely common, yet completely absurd. As predicted, speakers style-shift when addressing their dogs by switching to using a higher-pitched voice. This was seen in a comparison of the pitch of questions in both DDS and HDS. The results also showed considerable variation within DDS with respect to speech act. As hypothesised, approvals were higher-pitched than disapprovals. All results were analysed with respect to two suggestions. First, the existence of this variation is grounded in cultural assumptions about what dogs can understand of human language. These include the idea that dogs understand high-pitched sounds to be positive, and low-pitched sounds to be negative. Secondly, human speakers associate a low-pitched voice with having authority, and so use a low-pitched voice when disapproving of the dog. This second explanation can help us disentangle some of the differences between DDS and HDS: to the human audience, Amelia did not feel the need to perform an authoritative persona, and to do so would be highly socially inappropriate.
If we had had more time, we would have liked to compare disapproving and approving utterances in DDS and HDS, and to use a larger data set (we only used 11 human-directed utterances). Then we could assess whether speakers exaggerate the low pitch to disapprove of dogs, or whether this low pitch only happens because of the reprimanding act. In general, future studies on DDS would benefit from having more participants, both human and canine, given that every speaker has a different background and a particular relationship with a particular dog. As Amelia was not Floyd's owner, she did not feel as comfortable talking to him and might have needed to perform an authoritative persona to a greater extent than someone who has been talking to their dog for years. More quantitative research on DDS will tell us more about how speakers accommodate to social norms concerning how to speak to dogs.

Figure 4: Mean fundamental frequency of approvals, disapprovals and questions in DDS.

Finally, Table 2 (below) compares the mean F0 of all the groups that we tested.