Vocal and locomotor coordination develops in association with the autonomic nervous system

In adult animals, movement and vocalizations are coordinated, sometimes facilitating, and at other times inhibiting, each other. What is missing is how these different domains of motor control become coordinated over the course of development. We investigated how postural-locomotor behaviors may influence vocal development, and the role played by physiological arousal during their interactions. Using infant marmoset monkeys, we densely sampled vocal, postural and locomotor behaviors and estimated arousal fluctuations from electrocardiographic measures of heart rate. We found that vocalizations matured sooner than postural and locomotor skills, and that vocal-locomotor coordination improved with age and during elevated arousal levels. These results suggest that postural-locomotor maturity is not required for vocal development to occur, and that infants gradually improve coordination between vocalizations and body movement through a process that may be facilitated by arousal level changes.


Introduction
Vocal development is typically thought of as the adaptive coordination of the vocal apparatus (i.e., the lungs, larynx and the mouth) and the associated muscles and neural systems that influence its activity. In adult animals, however, vocal behavior does not occur in isolation of the operations of other motor systems. Studies investigating the real-time coordination between vocal and locomotor outputs show that some vocalizations can be produced concurrently with locomotor activity and/or postural changes, while others cannot (Suthers et al., 1972;Blumberg, 1992;Fusani et al., 1996;Williams, 2001;Wong and Waters, 2001;Holderied and von Helversen, 2003;Branchi et al., 2004;Cooper and Goller, 2004;Berg et al., 2013;Dalziell et al., 2013;Hoepfner and Goller, 2013;Ota et al., 2015;Alves et al., 2016;Laplagne and Elías Costa, 2016;Ullrich et al., 2016). For example, the 'A', 'B' and 'D' song types of the lyrebird (Menura novaehollandiae) co-occur with courtship dance wing flaps, while the 'C' type occurs when wings are still (Dalziell et al., 2013). In rats (Rattus norvegicus), 50 kHz ultrasonic vocalizations are produced during locomotor activity, while 20 kHz vocalizations occur when rats are immobile (Laplagne and Elías Costa, 2016). Similarly, bats in flight coordinate the production of echolocation sounds with particular phases of their wing beats (Suthers et al., 1972;Wong and Waters, 2001;Holderied and von Helversen, 2003). Thus, the adaptive coordination achieved during vocal development must also include other developing motor systems as important factors, notably those related to posture and locomotion. How this 'vocal-locomotor' coordination is accomplished over the course of development is not well understood.
In humans, current theoretical frameworks posit that the different motor systems are interactive over the course of development; one system can influence another in ways that change as a function of time (Thelen, 1991;Adolph, 2008;Iverson, 2010;Adolph and Robinson, 2015;Libertus and Hauf, 2017). In support of this, there is evidence that locomotor skills at one time point predict speech ability months or even years later (LeBarton and Iverson, 2013;Walle and Campos, 2014;Wang et al., 2014;He et al., 2015;LeBarton and Iverson, 2016;Libertus and Violi, 2016;Walle, 2016;Garrido et al., 2017;Libertus and Hauf, 2017;Salavati et al., 2017;West et al., 2019). However, only a handful of empirical studies have investigated human infant vocal-locomotor coordination (Ejiri and Masataka, 2001;Fagan and Iverson, 2007;Abney et al., 2014;Berger et al., 2017). These studies showed that infant production of pre-linguistic vocalization is highly sensitive to body movement. For example, infants who are beginning to traverse their environment (i.e., crawling) are less likely to vocalize during locomotion than while they are sitting (Ejiri and Masataka, 2001;Fagan and Iverson, 2007;Abney et al., 2014;Berger et al., 2017). What is missing is an understanding of how vocal and locomotor outputs are coordinated in real-time in infants and how this coordination may change over the course of development.
Key to understanding these developmental dynamics is to also identify physiological conditions that may promote (or potentially inhibit) coordination between different motor outputs. One candidate is the state of arousal, a product of the autonomic nervous system and relevant for a range of behaviors (Pfaff, 2006). An animal would be said to exhibit a high arousal state if it is more alert to sensory stimuli, more motorically active and more reactive (Pfaff, 2006). The role of arousal is essentially to allocate metabolic energy (i.e., to prepare the body for action). As it relates to vocal production, arousal modulates respiration, which in turn provides the power for vocal output. Humans, for example, exhibit an increase in arousal (as measured by heart rate) prior to speaking (Lynch et al., 1980;Linden, 1987). In developing individuals, variable and spontaneous behaviors are ubiquitous, providing the scaffolding for more complex and organized behaviors later in life (Blumberg et al., 2013). These early behaviors, including vocal output and other bodily movements, primarily reflect the interplay between the infants' arousal states, sensorimotor coordination and biomechanical conditions (Robinson et al., 2000). Thus, investigating the relationship between arousal fluctuations and the development of vocal and locomotor behaviors may prove to be illuminating.
Using marmoset monkeys as a model, here we investigate the relationship between vocal and locomotor systems and arousal levels during postnatal development. In the vocal domain, infant marmosets spontaneously produce sequences of immature and mature vocalizations, and these are linked to real-time changes in arousal levels (Zhang and Ghazanfar, 2016). Over the course of approximately two months, infants exhibit changes in the acoustic properties of their vocalizations that reflect a transition from producing mostly immature-sounding contact calls (e.g., cries) to mature-sounding contact calls (e.g., phees) (Takahashi et al., 2015;Zhang and Ghazanfar, 2016;Teramoto et al., 2017;Zhang and Ghazanfar, 2018). As in humans (Goldstein and Ja, 2008), this transition is facilitated by, and dependent upon, social reinforcement from parents (Takahashi et al., 2015;Gultekin and Hage, 2017;Takahashi et al., 2017;Gultekin and Hage, 2018). Moreover, these parallels with human prelinguistic development occur in the same life history stage (early infancy) (Ghazanfar and Liao, 2018). In the postural and locomotor domains, marmoset monkey development transitions from immature to mature forms in a pattern that is also similar to human development (e.g., righting reflex before sitting, and crawling before walking) (Wang et al., 2014;Braun et al., 2015;Schultz-Darken et al., 2016).
We address three fundamental questions: (1) Does one motor system (vocal or postural-locomotor) mature first, or do they follow an overlapping trajectory? (2) How are vocal-postural-locomotor systems coordinated, and do these coordination dynamics shift across development? (3) How do real-time fluctuations in arousal relate to vocal production, locomotion and their coordination?

Results
During their first two months of postnatal life, we measured infant marmoset behavior in a controlled context for 10 minutes approximately every 2 days. In each session, individuals were placed in a testing box in an experiment room that was outside visual and auditory range of their family groups. This brief 'isolation' context is a standard testing paradigm used to elicit vocalizations (Takahashi et al., 2015;Zhang and Ghazanfar, 2016) and to study the postures and locomotion of marmoset infants (Wang et al., 2014;Braun et al., 2015). The subjects were seven marmosets (three females) from three different parental pairs (two sets of twins, one set of triplets). We recorded the behaviors of each subject across ~30 sessions (28-33 per subject, for a total of 220 sessions).
For vocal behaviors, we focused on two types of contact calls: cries and phees (Figure 1A). As described previously (Takahashi et al., 2015), cries are immature contact calls that have a short duration and noisy spectral properties (i.e., high Wiener entropy); phees are mature-sounding contact calls that have a longer duration and tonal spectral properties (i.e., low Wiener entropy). Cries transform into phees over the course of development (Takahashi et al., 2015;Zhang and Ghazanfar, 2016;Takahashi et al., 2017).
We measured five types of postural behaviors: righting reflex, head raising, forelimb support, hindlimb support, and hanging (Wang et al., 2014;Braun et al., 2015) (Figure 1B). The righting reflex is when infants re-establish their body orientation so that their hands and feet are on the ground; head raising is when infants lift their head off the ground and look forward or up; forelimb support is when infants sit on the ground with their hands touching the ground; hindlimb support is when infants sit on the ground with their hands off the ground; hanging is when infants grasp elements in their environment (e.g., bars of the testing box) so that their hands and feet do not touch the ground.
We measured five types of locomotor behaviors: crawling, digging, jumping, climbing, and walking (Wang et al., 2014;Braun et al., 2015) (Figure 1C). Crawling is when infants move forward on the ground with their stomach touching the ground; digging is when infants move their hands back and forth across the ground; jumping is when infants push themselves off the ground or cage to move from one location to another; climbing is when infants traverse across the cage; walking is when infants traverse across the ground in a standing orientation.
Finally, for all seven infants, we concurrently measured arousal levels by acquiring heart rates during the sessions using non-invasive surface electrocardiography (Borjon et al., 2016;Zhang and Ghazanfar, 2016).

The vocal system matures before postural and locomotor systems
By measuring both vocal and postural-locomotor behaviors longitudinally in developing marmosets, we determined how these motor systems changed relative to one another. We first classified each behavior as immature or mature by measuring how their use shifted across development. In the vocal domain, the proportion of time producing cries decreased across development, while the proportion of time producing phees increased (Figure 2A; Table 1; Appendix 1.1) (Takahashi et al., 2015;Zhang and Ghazanfar, 2016). As indicated by physiology and biomechanics (Takahashi et al., 2015;Zhang and Ghazanfar, 2016;Zhang and Ghazanfar, 2018), cries were categorized as an immature contact call, and phees were categorized as a mature version of the contact call. The proportion of postural time engaged in righting reflexes decreased across development, while the proportion of postural time engaged in hindlimb support increased (Figure 2A; Table 1; Appendix 1.2). Thus, righting reflexes were categorized as an immature posture, and hindlimb support was categorized as a mature posture. The proportion of locomotor time engaged in crawling decreased across development, while the proportion of locomotor time engaged in walking increased (Figure 2A; Table 1; Appendix 1.3). Thus, crawling was categorized as an immature locomotor behavior, while walking was categorized as mature. These postural-locomotor classifications for developing marmosets are consistent with previous findings (Wang et al., 2014). We used the immature-mature classifications to summarize developmental changes using a 'maturity index' (see Materials and methods), a value that represents the proportion of mature behaviors observed relative to all immature and mature behaviors across postnatal days. Values below 0.5 indicate that immature behavior is more common, and values above 0.5 indicate that mature behavior is more common.
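The maturity index described above can be illustrated with a short sketch. This is not the authors' analysis code: the function name, the use of time in seconds as the unit of "amount", and the example values are all assumptions made for illustration, and the exact definition is the one given in Materials and methods.

```python
def maturity_index(mature_amount, immature_amount):
    """Proportion of mature behavior relative to all immature + mature behavior.

    Returns None when neither behavior was observed, because the index
    is undefined in that case.
    """
    total = mature_amount + immature_amount
    if total == 0:
        return None
    return mature_amount / total


# Hypothetical sessions: (postnatal day, seconds of cries, seconds of phees).
# These numbers are invented for illustration only.
sessions = [(3, 40.0, 10.0), (11, 20.0, 30.0), (40, 0.0, 25.0)]

# Cries are the immature call type and phees the mature call type.
indices = {day: maturity_index(phee, cry) for day, cry, phee in sessions}

for day, index in indices.items():
    label = "mostly mature" if index > 0.5 else "mostly immature"
    print(day, round(index, 2), label)
```

With these invented values, the index moves from below 0.5 (immature behavior more common) to above 0.5 (mature behavior more common) across postnatal days, which is the kind of transition the index is meant to summarize.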
The population-level developmental trajectories of all three motor systems (vocal = 192 observation days, postural = 201 observation days, locomotor = 191 observation days) followed s-shaped patterns that ended in maturity indices around one, which represents adult-like motor outputs (Figure 2B). We tested whether the developmental time courses of the vocal system and postural-locomotor systems overlap (Figure 2C). The null hypothesis is that vocal and postural-locomotor systems transition from immature to mature forms around the same time. This would suggest that the development of the different motor systems reflects a global process of neural or physiological maturation (Gesell, 1929;McGraw, 1943). The alternative hypothesis is that the postural-locomotor system develops either before or after the vocal system. If postural-locomotor behaviors were to mature first, it would suggest that a developed post-cranial body is a prerequisite for producing mature contact calls.
Supporting the 'vocal behavior develops first' hypothesis, contact calls transitioned to a more mature form around 10 postnatal days, while postural and locomotor behavior transitioned around postnatal days 19 and 21, respectively (Figure 2D). A generalized linear mixed model (GLMM) showed that the relationship between maturity indices and postnatal day fits a logistic regression, with maturity indices increasing with age (n = 519 observation days, b ±SE = 0.21±0.03, z = 6.33, p<0.0001; Appendix 1.4). The same model showed that the vocal maturity indices were larger than the maturity indices of postural (b ±SE = −2.03±0.58, z = 3.48, p=0.0005; Appendix 1.4) and locomotor behavior (b ±SE = −2.86±0.98, z = 2.93, p=0.0034; Appendix 1.4) (Figure 2E). In other words, vocal development occurred earlier than postural and locomotor development. However, simply because vocal-postural-locomotor systems follow different trajectories does not mean that they do not interact in real-time. The question of whether vocal-locomotor coordination changes over the course of development is addressed next.
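To make the logistic relationship concrete, the following sketch shows how a transition day (the day at which the maturity index crosses 0.5) follows from a logistic curve, and how a negative offset on the logit scale delays that crossing. It is only an illustration under stated assumptions: the slope (0.21) and the postural coefficient (−2.03) are taken from the text, but the intercept is a hypothetical value chosen so that the reference curve crosses 0.5 at day 10, and the model is assumed to be additive on the logit scale with no interaction or random effects.

```python
import math


def logistic_index(day, intercept, slope, offset=0.0):
    """Maturity index predicted by a logistic curve on postnatal day."""
    return 1.0 / (1.0 + math.exp(-(intercept + offset + slope * day)))


def crossing_day(intercept, slope, offset=0.0):
    """Day at which the predicted index equals 0.5 (linear predictor = 0)."""
    return -(intercept + offset) / slope


SLOPE = 0.21             # reported slope for postnatal day
INTERCEPT = -2.1         # HYPOTHETICAL: places the reference crossing at day 10
POSTURAL_OFFSET = -2.03  # reported coefficient, ASSUMED additive on the logit scale

vocal_day = crossing_day(INTERCEPT, SLOPE)
postural_day = crossing_day(INTERCEPT, SLOPE, POSTURAL_OFFSET)

print(round(vocal_day, 1), round(postural_day, 1))
```

Under these assumptions, a negative offset of size 2.03 with a slope of 0.21 shifts the 0.5 crossing later by 2.03/0.21, or roughly 9.7 days. The fitted model in Appendix 1.4 remains the authoritative source for the actual transition estimates.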
Table 1. Results of linear mixed models (LMMs) to test whether proportion of time spent in vocal-postural-locomotor behaviors changes with postnatal day. For each model, the proportion of vocal, postural, or locomotor time (per postnatal day) spent engaged in a behavior is the dependent variable, postnatal day is the fixed effect, and infant identity is the random effect. For each behavior category, a Bonferroni-Holm correction was applied to adjust p-values.

Mature contact call production and locomotor activity become increasingly coordinated during development
Infant marmosets require considerable muscular effort to produce mature contact calls (phees) (Takahashi et al., 2015;Zhang and Ghazanfar, 2016;Zhang and Ghazanfar, 2018), and adult marmosets tend to produce mature contact calls during periods of reduced locomotor activity (Borjon et al., 2016). Therefore, our overarching hypothesis was that locomotor activity inhibits mature contact call production. Using linear mixed effect models (LMMs), we found initial support for this hypothesis when examining the relationships between vocal acoustic parameters, locomotor activity and postnatal day (Figure 3A,B). Call duration was negatively associated with locomotor activity (n = 9609 calls, b ±SE = −0.56±0.13, t = 4.25, Bonferroni-Holm adjusted p=0.0001; Appendix 1.5) while Wiener entropy (higher entropy means noisier) was positively associated with locomotor activity (n = 9606 calls, b ±SE = 0.93±0.12, t = 7.68, Bonferroni-Holm adjusted p<0.0001; Appendix 1.6). In other words, on average, infant marmosets produced shorter, noisier calls when locomotor activity was high, and longer, more tonal adult-like calls when locomotor activity was reduced. Next, we tested two hypotheses about the developmental dynamics of vocal-locomotor coordination. We know that by 1-2 months of age, marmosets only produce mature sounding contact calls (Takahashi et al., 2015;Zhang and Ghazanfar, 2016;Zhang and Ghazanfar, 2018).
One hypothesis is that they simply learn to stop moving when they need to produce a contact call. Doing so would eschew any potential physiological constraints on the production of mature sounding contact calls. In such a scenario, we would predict the number of contact calls produced during movement would decrease, while those produced during periods of immobility would increase. An alternative hypothesis is that as marmosets grow bigger, they become more capable of producing mature sounding contact calls while moving. This outcome would suggest that infants overcome any potential physiological constraints on vocal-locomotor coordination. In such a scenario, we would predict that the number of contact calls produced during movement would increase, while those produced during periods of immobility would decrease.
To test the above hypotheses, we first needed to estimate locomotor activity on a continuous scale. These estimates were extracted from frame-by-frame pixel differences in the video footage of the sessions (Figure 3-figure supplement 1). We first summarized the real-time dynamics of locomotor activity surrounding vocal production events (Figure 3C). We found that locomotor activity started to increase above the 95% threshold of the bootstrap significance test ~5 s before immature call onsets and then decreased back to inside the 95% threshold ~10 s after call offsets. In contrast, locomotor activity started to decrease below the 95% threshold of the bootstrap significance test ~8 s before mature contact call onsets and then increased back to inside the 95% threshold ~10 s after call offsets. An LMM confirmed that the average locomotor activity was higher during immature contact calls as compared to mature contact calls (n = 9609 calls, b ±SE = −0.07±0.01, t = 5.19, p<0.0001; Appendix 1.7). When these temporal dynamics were mapped across postnatal days, we found that locomotor activity during immature contact call production remained around or above the randomized expected levels of locomotor activity. In contrast, early in postnatal life, mature contact call production occurred when locomotor activity was below the 95% threshold of the bootstrap significance test; however, their production gradually became coordinated with increased levels of locomotor activity (Figure 3D). Locomotor activity during mature contact calls was higher during late development (PND 52-61) as compared to early development (PND 1-10) (Early = 914 calls; Late = 932 calls; b ±SE = 0.04±0.01, t = 3.05, p=0.0056; Appendix 1.8).
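The idea of estimating locomotor activity from frame-by-frame pixel differences can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes grayscale frames stored as nested lists of intensities and uses the mean absolute difference between consecutive frames as the activity measure. The function names and the tiny example frames are hypothetical, and any preprocessing or normalization used in the study is not represented.

```python
def frame_difference(prev_frame, next_frame):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for prev_row, next_row in zip(prev_frame, next_frame):
        for prev_pixel, next_pixel in zip(prev_row, next_row):
            total += abs(next_pixel - prev_pixel)
            count += 1
    return total / count


def activity_trace(frames):
    """One activity value per pair of consecutive frames."""
    return [frame_difference(a, b) for a, b in zip(frames, frames[1:])]


# Hypothetical 2x2 frames: the scene is still, then a bright pixel appears,
# then it moves to another position.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 255], [0, 0]],
    [[0, 0], [255, 0]],
]

trace = activity_trace(frames)
print(trace)
```

A still scene yields zero activity, while movement changes pixel values between frames and raises the trace, giving a continuous measure that can then be aligned to call onsets and offsets.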
These data therefore support the alternative hypothesis, that the potential constraints of producing mature contact calls during movements are mitigated as the infants get older. Thus, even though postural-locomotor maturity does not appear to be a prerequisite for vocal development (Figure 2), vocal-locomotor coordination is still an important component of marmoset monkey motor development. This finding then raises the question of how real-time fluctuations in physiological condition may predict call production and locomotor activity in developing infants. Presumably, infants in elevated states of arousal are more likely to engage in these motor behaviors. The question of whether temporal associations between motor output and arousal levels change over the course of development is addressed next.

Mature call production and locomotion occur during elevated arousal levels
We first tested hypotheses about the developmental dynamics of arousal state during marmoset contact call production. One very basic hypothesis is that the production of mature contact calls requires elevated arousal levels more than does the production of immature contact calls (Teramoto et al., 2017). Being in an elevated arousal state also means that individuals may have more of the respiratory power needed to generate mature sounding calls (Borjon et al., 2016). Another hypothesis (that does not preclude the first one) is that mature contact calls are more likely to occur during elevated arousal levels earlier in development. This would be consistent with computational models that suggest that younger infants are unable to effectively coordinate the elements of their vocal apparatus to produce mature contact calls and may require enhanced respiratory power to do so (Takahashi et al., 2015;Teramoto et al., 2017); indeed, there is now empirical support for the link between respiratory power and mature contact call production (Zhang and Ghazanfar, 2018). Conversely, a third hypothesis is that mature contact calls are more likely to occur during elevated arousal states later in development than earlier. This could occur if, as the infants get older, their overall motivation changes (i.e., less stressed by isolation) and so higher arousal levels are needed to motivate vocal production. An elevated arousal state may also enable respiratory power and laryngeal tension (Zhang and Ghazanfar, 2016); this could lead to better coordination between vocal production and locomotion (Figure 3C,D). We found that mature contact calls occurred during elevated levels of arousal, a pattern that became more pronounced as development proceeded.
A summary of the real-time dynamics of heart rate fluctuations in the 10 s before to 10 s after vocal production revealed that marmosets produced immature contact calls when heart rate percentiles dropped below the 95% threshold of the bootstrap significance test (Figure 4A). Production of mature contact calls, on the other hand, occurred when heart rate percentiles went above the 95% threshold of the bootstrap significance test. In both cases, changes in arousal levels were 'global', meaning that the change in arousal occurred well before (at least 10 s) the start of the call. An LMM confirmed that heart rate percentiles were higher during mature contact calls than during immature contact calls (n = 6215 calls, b ±SE = 3.38±1.45, t = 2.32, p=0.0276; Appendix 1.9). The developmental dynamics of arousal levels during vocal production further supported this main effect (Figure 4B). Early in postnatal life, heart rate percentiles during immature contact calls were inside the 95% threshold of the bootstrap significance test, but at around two weeks, heart rates during immature contact calls started to decrease below the 95% threshold. In contrast, mature contact calls were produced when heart rate percentiles were at, or above, the 95% threshold of the bootstrap significance test during the first two months of postnatal life. There was a marginal increase in heart rate percentiles during mature contact calls during late development (PND 52-61) as compared to early development (PND 1-10) (Early = 490 calls; Late = 780 calls; b ±SE = 6.28±3.48, t = 1.81, p=0.0795; Appendix 1.10).
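One way to read the "95% threshold of the bootstrap significance test" is as a band of values expected if events occurred at random times. The sketch below illustrates that logic only; it is an assumption about the general approach, not a reproduction of the procedure in Materials and methods. The resampling scheme, the number of resamples, the synthetic heart rate percentile trace, and the event times are all hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Value at percentile q (0-100) of an already sorted list."""
    index = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def null_band(signal, n_events, n_resamples=2000, seed=0):
    """95% band of the mean signal value at randomly chosen time points."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        picks = [signal[rng.randrange(len(signal))] for _ in range(n_events)]
        means.append(sum(picks) / n_events)
    means.sort()
    return percentile(means, 2.5), percentile(means, 97.5)


# Synthetic heart rate percentile trace (one value per second), invented
# for illustration: it sweeps through 0..99 repeatedly.
signal = [t % 100 for t in range(1000)]

# Hypothetical event times that all fall where the trace is high.
event_times = [90 + 101 * k for k in range(10)]

observed = sum(signal[t] for t in event_times) / len(event_times)
lower, upper = null_band(signal, n_events=len(event_times))

print(observed, lower, upper, observed > upper)
```

In this toy example the event-aligned mean lies above the upper bound of the random-time band, which corresponds to the description of heart rate percentiles going "above the 95% threshold" around mature contact calls.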
We next tested hypotheses about the developmental dynamics of arousal during locomotor activity. One basic hypothesis is that locomotion occurs during elevated arousal states, which is a pattern well supported by studies of heart rate during physically demanding tasks (Rotstein et al., 2004;Baker et al., 2008). Another hypothesis (that is not mutually exclusive of the first one) is that locomotor activity is more likely to occur during elevated arousal levels earlier in development. One interpretation of this result is that, as with vocal production, younger infants are less able to coordinate their body movements and require enhanced respiratory power to do so. Conversely, a third hypothesis is that locomotor activity is more likely to occur during elevated arousal levels later in development than earlier. As with vocal production, this could occur if, as the infants get older, their overall motivation changes (i.e., less stressed by isolation) and so higher arousal levels are needed to motivate movement. An elevated arousal state may also enable respiratory power and laryngeal tension to enable better coordination between vocal production and locomotion (Figure 3C,D).
As with vocal production, we found that locomotor activity occurred during elevated arousal levels, a pattern that appeared to become more pronounced later in development. A summary of the real-time dynamics of heart rate fluctuations in the 10 s before to 10 s after locomotion events revealed that these events occurred when heart rate percentiles were elevated above the 95% threshold of the bootstrap significance test (Figure 4C). As with vocal production, this change in arousal was 'global', meaning that the change in arousal happened well before (at least 10 s) the start of movement. The developmental dynamics of arousal levels during locomotor activity further supported this main effect (Figure 4D). Early in postnatal life, heart rate percentiles during locomotor activity were within the 95% threshold of the bootstrap significance test, but continued to increase throughout the first two months of postnatal life. There was a marginal increase in heart rate percentiles during locomotor events during late development (PND 52-61) as compared to early development (PND 1-10) (Early = 752 calls; Late = 566 calls; b ±SE = 8.12±4.04, t = 2.01, p=0.0580; Appendix 1.11). Our data suggest that mature contact call production and locomotor activity are both associated with elevated levels of arousal, an association that becomes more pronounced with age. As such, arousal state may be an important predictor of whether infant marmosets coordinate these different motor outputs. We tested this idea next.

Coordination of mature contact call production with locomotor activity occurs during elevated levels of arousal
Over the course of infant marmoset development, both locomotor activity and arousal levels predicted whether an infant produces a mature contact call over an immature call. Low levels of locomotor activity support mature contact call production early in development, and elevated arousal levels support mature contact call production later in development. The missing piece of the puzzle is whether there is a connection between vocal-locomotor coordination and arousal state. Here, we test, and find support for, the hypothesis that arousal levels during mature contact call production are elevated when marmosets are moving around (Figure 5A). Locomotor activity during mature contact call production was positively associated with heart rate percentiles (n = 3966 calls, b ±SE = 9.66±4.05, t = 2.39, p=0.0208; Appendix 1.12; Figure 5B). In other words, this result suggests that individuals in an elevated arousal state are better able to coordinate mature vocal production with locomotion. Specifically, this positive association characterized infants that were one month old (1-30 days; n = 1705 calls, b ±SE = 16.32±6.60, t = 2.47, Bonferroni-Holm adjusted p=0.0382; Appendix 1.12), but not two months old (31-61 days; n = 2261 calls, b ±SE = 2.63±5.24, t = 0.50, Bonferroni-Holm adjusted p=0.6220; Appendix 1.12). This means that a positive association between locomotor activity and heart rate percentiles was not simply a by-product of age. Instead, there may be a particularly robust relationship between locomotor activity and heart rate during early infanthood when individuals are transitioning from producing immature to mature contact calls.
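Several of the p-values above are reported as Bonferroni-Holm adjusted. For readers unfamiliar with the procedure, the following sketch implements the standard Holm step-down adjustment. The function name is ours, and the raw p-values in the example are hypothetical: they were chosen so that the adjusted values match the two age-group results reported above, and are not values reported by the authors.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        scaled = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, scaled)
        adjusted[i] = running_max
    return adjusted


# HYPOTHETICAL raw p-values for the two age groups (first and second month).
raw = [0.0191, 0.6220]
adjusted = holm_adjust(raw)
print([round(p, 4) for p in adjusted])
```

With two tests, the smaller raw p-value is doubled and the larger one is left unchanged, so the adjustment is less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.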

Discussion
Vocal development is a dynamic process that involves the interaction of multiple systems, from the biomechanics and muscles of the vocal apparatus to the nervous system and the social environment (Thelen, 1991;Teramoto et al., 2017). Using developing marmoset monkeys, we sought to understand how these processes related to vocal output are influenced by other systems of motor behavior, specifically posture and locomotion. First, we examined when the transition from immature to mature calls occurs relative to the transition from immature to mature body postures and locomotion. Second, we examined the putative temporal coordination of vocal production with locomotor activity, and whether such coordination changes over the course of development. Finally, we investigated whether fluctuating arousal levels (estimated from heart rate) predict vocal and locomotor output. We found that marmoset monkey vocalizations develop sooner than postural-locomotor skills, that locomotor activity gradually becomes coordinated with the production of mature-sounding contact calls, and that this vocal-locomotor coordination occurs during elevated levels of arousal.

Head-to-tail sequence of development
The development of the vocal and postural-locomotor systems in marmoset monkeys, at a first approximation, seems to follow similar trajectories (Wang et al., 2014;Takahashi et al., 2015). By investigating both motor systems in the same individuals longitudinally, however, we showed that vocal behavior matures sooner than either posture or locomotion. Marmosets transitioned to producing a higher proportion of mature-sounding contact calls (phees) than immature-sounding contact calls (cries) around postnatal day 10, whereas posture and locomotion transitioned to more mature forms around days 18 and 21, respectively. This finding implies that the ability to produce adult-like sounds is not contingent upon advanced postural and locomotor control. These findings also parallel the developmental sequence of human prelinguistic vocal output, posture, and locomotion. Human infants begin producing babbling sounds (consonant-vowel combinations) at around 2 to 4 months (Vihman, 2014), unsupported upright sitting around 4 to 5 months (Adolph, 2008), and walking without support around 11 months (Adolph, 2008). Our data suggest that, like humans and other animals (Starck and Ricklefs, 1998;Muir, 2000;Adolph, 2008), marmoset monkey motor development takes the form of a head-to-tail sequence.
Why do marmoset monkey and human infants both transition to producing more adult-like sounds before they transition to sitting in upright postures and walking? One explanation is that an initial investment in the vocal system allows infants to elicit caregiver attention (e.g., carrying and food sharing) more effectively, thereby prolonging the amount of time they need to develop their locomotor autonomy. This explanation makes sense given the developmental strategy (i.e., altricial) and social system (i.e., cooperative breeding) of humans and marmosets (Snowdon, 1996;Hrdy, 2007;Solomon and French, 2007;Burkart et al., 2009;Lukas and Clutton-Brock, 2012). For altricial species like marmoset monkeys (relative to other nonhuman primates), early development is an energetically costly period because individuals are unable to fully regulate their body temperature, and yet, altricial infants must invest energy to grow at a high rate and refine their motor skills (Rosenblatt, 1976;Case, 1978;Derrickson, 1992;Blumberg and Sokoloff, 1998;Starck and Ricklefs, 1998;Muir, 2000;Blumberg, 2001;Schilling, 2005;Price and Dzialowski, 2018). In marmosets and humans, locomotor and physiological constraints (such as control of arousal levels) may be overcome by receiving care and physical contact from both maternal and non-maternal adults (Case, 1978;Snowdon, 1996;Hrdy, 2007); infants elicit such contact and care by producing vocalizations (Locke, 2006;Zuberbühler, 2012;Ghazanfar and Takahashi, 2017). And yet, not all vocalizations are created equal. Previous work in marmosets and humans suggests that more adult-like sounds elicit caregiver attention better than do immature sounds (Gros-Louis et al., 2006;Takahashi et al., 2016). By investing first in developing mature-sounding contact calls, infants may be 'buying the time' they need to learn how to move about independently in their environment.

Locomotion as a potential constraint on vocal development
The ability to coordinate biomechanical features of the vocal system (breathing, thoracic pressure and vocal fold tension) is critical for producing species-typical vocalizations (MacLarnon and Hewitt, 1999;Maclarnon and Hewitt, 2004;Takahashi et al., 2015;Zhang and Ghazanfar, 2016;Teramoto et al., 2017). In developing marmosets, a mismatch between biomechanical dynamics of the vocal system is thought to generate immature cries instead of mature contact calls (Teramoto et al., 2017). Our study indicates that locomotor activity is one force that can potentially disrupt vocal production. Higher levels of body movements co-occurred with the production of immature-sounding contact calls (i.e., those with a short duration and high entropy), while lower levels of movement co-occurred with mature-sounding contact calls with long durations and lower entropy. These results are consistent with what is known about how movement affects respiration. Vigorous motor activities, like running, speed up breathing cycles (Wasserman et al., 1973;Bramble and Carrier, 1983), resulting in articulation deficits (Sundberg et al., 1991;Price et al., 2006;Baker et al., 2008;Orlikoff, 2008). The finding that calls produced during movement were shorter and noisier indicates that very young marmosets lacked adequate respiratory power (Zhang and Ghazanfar, 2018).

Arousal levels during vocal-locomotor coordination
A unique aspect of our study design is that we could test how real-time fluctuations in arousal related to vocal production, locomotion, and the coordination of these different motor outputs. Infant marmosets produced mature-sounding contact calls during elevated arousal levels and immature ones during low arousal levels, a finding that is consistent with previous work on infant and adult marmosets (Borjon et al., 2016;Zhang and Ghazanfar, 2016). As with mature-sounding contact calls, infant marmosets tended to engage in locomotion during elevated arousal states. Moreover, real-time variability in arousal state indicated whether mature-sounding contact calls co-occurred with locomotor activity. Infant marmosets exhibited elevated arousal levels when mature-sounding contact calls were produced during movement, and decreased arousal levels when such calls were produced during periods of immobility. These results are consistent with the hypothesis that elevated arousal may help to overcome physiological demands (e.g., respiratory power) of producing vocalizations while moving. Similar associations have been observed in humans engaged in physically demanding tasks (Rotstein et al., 2004;Baker et al., 2008). For example, heart rate increases at a faster rate during sustained exercise in adults who are engaged in a speech task than in adults engaged in a non-speech task (Baker et al., 2008). Our study design precludes testing causality but does suggest that arousal state is a key player in the coordination between vocal and locomotor systems during development.
The inhibition of motor activity during mature contact call production shifted over the course of development. That is, by the time that infant marmoset monkeys stopped producing immature-sounding calls, they no longer showed decreased movement during mature contact call production. This developmental shift suggests that marmosets gradually improved their ability to coordinate locomotor behaviors with vocal production. From a biomechanical perspective, this improvement suggests that marmosets acquired the ability to better control the vocal apparatus during movement. From a neural perspective, this improvement suggests that more experience with vocal behaviors and/or locomotion leads to better coordination between these motor systems. In either case, arousal state appears to be the common currency by which vocal-motor coordination emerges. One intriguing possibility is that these shifting dynamics of the autonomic nervous system create the scaffolding by which mature social behavior can emerge (Porges and Furman, 2011).
Despite being identified as a critical line of inquiry over 30 years ago (Tipps et al., 1981;Yingling, 1981), to date, only a handful of empirical studies have investigated infant vocal-locomotor coordination (Ejiri and Masataka, 2001;Fagan and Iverson, 2007;Abney et al., 2014;Berger et al., 2017). Our study using marmoset monkeys as a model for developmental processes represents one of the first to integrate longitudinal and second-by-second timescales to investigate vocal development from a 'whole-body' perspective. We believe that this timescale integration is key to characterizing the fine-grained dynamics that dictate how mature vocalizations emerge. Trade-offs in vocal-locomotor coordination are a potential dynamic that needs to be reckoned with as individuals grow up. By concurrently measuring heart rate, we show that processes related to autonomic arousal may enable individuals to cope with this trade-off.

Subjects and housing
The subjects used in this study were seven (three females) infant common marmosets (Callithrix jacchus) from three different parental pairs (two sets of twins and one set of triplets, <2 months old). Subjects were born in captivity and lived with their family groups (mother, father and siblings). The colony room was maintained at a temperature of approximately 27 °C with 50-60% relative humidity, and a 12:12 hr light-dark cycle. All subjects had ad libitum access to water and were supplied daily with standard commercial chow supplemented with fruit and vegetables.

Experimental design
The experimental protocol followed methods previously described (Borjon et al., 2016;Zhang and Ghazanfar, 2016). Infant marmosets were separated from their parents and placed in a testing box in an experiment room. The triangular, prism-shaped testing boxes were made of Plexiglas and wire (0.30 m × 0.30 m × 0.35 m). All observation sessions were conducted during daylight hours between postnatal days 1 and 61, and each observation session lasted 10 minutes. Subjects participated in a total of 220 observation sessions across the first two months of life (Subjects 1-7: 29, 29, 34, 34, 31, 31, 32). All experimental procedures were performed in compliance with the guidelines of the Princeton University Institutional Animal Care and Use Committee.

Vocal behavior data collection
Undirected vocalizations (i.e., produced in social isolation) were recorded using a Sennheiser MKH416-P48 microphone suspended 0.9 m above the testing box. The microphone signal was sent to a Mackie 402-VLZ3 line mixer whose output was relayed to a Plexon Omniplex and a PC. We used the same custom MATLAB software established in previous research for computationally defining and segmenting infant marmoset vocalizations (Takahashi et al., 2015;Zhang and Ghazanfar, 2016). A researcher manually verified which calls were one of two types of contact calls: cries and phees (Figure 1A). As described previously (Takahashi et al., 2015), cries are contact calls that have a short duration and noisy spectral properties (i.e., high Wiener entropy); phees are contact calls that have a longer duration and tonal spectral properties (i.e., low Wiener entropy). We recorded audio for 192 of the 220 observation sessions (Subjects 1-7: 28, 29, 21, 20, 31, 31, 32) for a total of 10,956 calls (Subjects 1-7: 1,100, 801, 1,604, 1,021, 1,657, 2,199, 2,574). A custom-made MATLAB routine calculated two main acoustic properties for each call: duration and Wiener entropy. Wiener entropy is a non-positive number that is calculated by taking the logarithm of the ratio between the geometric and arithmetic means of the values of the power spectrum for different frequencies (Tchernichovski et al., 2000). Wiener entropy represents the broadband properties of a signal's power spectrum: the closer the signal is to white noise, the higher (closer to zero) the entropy value.
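As an illustration of the Wiener entropy definition above, the following is a minimal Python sketch and not the MATLAB routine used in the study. It assumes a natural logarithm and a power spectrum whose values are all strictly positive; the function name and the two example spectra are hypothetical.

```python
import math

def wiener_entropy(power_spectrum):
    """Log of the ratio between the geometric and arithmetic means of the
    power values. Assumes all values are > 0 and uses the natural log
    (both are assumptions of this sketch)."""
    n = len(power_spectrum)
    # The log of a geometric mean is the mean of the logs.
    log_geometric_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return log_geometric_mean - math.log(arithmetic_mean)

# Hypothetical spectra: a flat (noise-like) one and a peaked (tonal) one.
flat_spectrum = [1.0] * 8
peaked_spectrum = [100.0] + [0.01] * 7

noisy_value = wiener_entropy(flat_spectrum)    # at the upper bound of zero
tonal_value = wiener_entropy(peaked_spectrum)  # strongly negative
```

In this sketch, a flat spectrum sits at the upper bound of zero, while a spectrum dominated by one frequency yields a strongly negative value, matching the cry versus phee contrast described above.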

Postural and locomotor behavior data collection
Postural-locomotor behavior was video recorded at 30 fps with a Plexon Cineplex. We recorded video for 215 of the 220 observation sessions and identified a total of 3195 instances of specific postural-locomotor behaviors (…, 635, 414, 574, 207, 427, 276). We manually scored the recorded videos to identify the onset and offset of behaviors using BORIS, an open-source event-logging software for video coding (Friard and Gamba, 2016). As the video frame rate was 30 Hz, the onset and offset of behaviors had a maximum resolution of 1/30 s. Our definitions of postural-locomotor behaviors were based on prior literature (Wang et al., 2014). During video coding, frames without a clear posture or locomotion described in our ethogram were not assigned a behavior.
Postural behaviors were defined as instances in which infants repositioned themselves: forelimb support, hanging, hindlimb support, raising head, and righting reflex (Figure 1B). The righting reflex is when infants re-establish their body orientation so that their hands and feet are on the ground; head raising is when infants lift their head off the ground and look forward or up; forelimb support is when infants sit on the ground with their hands touching the ground (or cage); hindlimb support is when infants sit on the ground with their hands off the ground (or cage); hanging is when infants grasp the cage so that their hands and feet do not touch the ground. Locomotor behaviors were defined as instances when infants traversed the testing box: crawling, digging, jumping, climbing, and walking (Figure 1C). Crawling is when infants move forward on the ground with their stomach touching the ground; digging is when infants move their hands back and forth across the ground; jumping is when infants push themselves off the ground or cage to move from one location to another; climbing is when infants traverse across the cage; walking is when infants traverse across the ground in a standing orientation.
Continuous variability in locomotor activity (i.e., body movement) was assessed by investigating pixel differences in the video data (Figure 3-figure supplement 1), following methods used in earlier research (Borjon et al., 2016). Each video recording was split into segments of 30 frames (each 640 × 400 pixels). We took the absolute difference of RGB values between the first and last frame of every second and divided by the total number of pixels. This value corresponded to the average difference in luminescence per pixel per second. A higher value indicates more pixel difference, signifying movement. Because absolute levels of locomotion differ across individuals and ages (e.g., movement of larger individuals causes larger pixel differences), it was necessary to re-scale locomotor activity levels. To do this, we converted all 1 s movement values to binary values with a 90th percentile threshold. Then, we used csaps in MATLAB to fit a cubic spline (smoothing parameter of 0.10) to the binary values in each 10 min observation session. In other words, locomotor activity ranged between zero (immobile) and one (mobile) so that comparisons could be made across individuals and ages.
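The pixel-difference measure and the percentile thresholding described above can be sketched as follows. This is a simplified Python illustration and not the original MATLAB pipeline. Frames are represented as nested lists of (R, G, B) tuples, the percentile uses a nearest-rank rule, and values strictly above the threshold count as movement; all three choices, along with the function names, are assumptions of this sketch. The spline-smoothing step is omitted.

```python
def frame_difference(first_frame, last_frame):
    """Mean absolute RGB difference per pixel between two frames.
    Each frame is a list of rows; each row is a list of (R, G, B) tuples."""
    total = 0
    n_pixels = 0
    for row_a, row_b in zip(first_frame, last_frame):
        for px_a, px_b in zip(row_a, row_b):
            total += sum(abs(a - b) for a, b in zip(px_a, px_b))
            n_pixels += 1
    return total / n_pixels

def binarize_at_percentile(values, pct=90):
    """Map each value to 1 if it exceeds the pct-th percentile, else 0.
    pct is an integer; the percentile uses the nearest-rank rule (assumed)."""
    ranked = sorted(values)
    # Integer ceiling of pct/100 * n, converted to a zero-based index.
    k = max(0, (pct * len(ranked) + 99) // 100 - 1)
    threshold = ranked[k]
    return [1 if v > threshold else 0 for v in values]

# Hypothetical 1 x 2 pixel frames from the start and end of one second.
frame_start = [[(0, 0, 0), (10, 10, 10)]]
frame_end = [[(3, 0, 0), (10, 10, 4)]]
movement = frame_difference(frame_start, frame_end)

# Hypothetical per-second movement values for a short session.
binary = binarize_at_percentile(list(range(1, 11)))
```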

Electrocardiography data collection
To quantify arousal fluctuations, we recorded heart rate for 149 of the 220 observation sessions (…, 14, 21, 19, 26, 25, 28). To record the electrocardiographic (ECG) signal, we used two pairs of Ag-AgCl surface electrodes (Grass Technology) (Figure 4-figure supplement 1). Tethered electrodes were sewn into a soft elastic band, which was clasped around the animal's thorax. One pair of electrodes was positioned on the dorsal thorax, and the other pair was positioned on the ventral thorax. We applied ECL gel on the surface of each electrode to improve the signal-to-noise ratio. Infants were shaved around the thorax if needed. Each pair of electrodes was differentially amplified (x250) with the resulting signal sent to the Plexon Omniplex, where it was digitized at 40 kHz and sent to a personal computer for data acquisition. The strength of heart rate signals varied throughout observation sessions as animal movement altered the positioning of surface electrodes. As such, we chose the channel with the largest signal-to-noise ratio on a session-by-session basis. We manually identified and isolated motion artifacts or signal cutoffs. To minimize bias, we did the following for each observation session: signals from both ECG channels were divided into 10 s segments and signal pairs were presented in random order for visual inspection. Regions exhibiting signal loss were replaced with NaNs.
Following the visual screening, we down-sampled data from 40 kHz to 1,500 Hz to extract the cardiac signals. These signals were high-pass filtered at 15 Hz to preserve the rapid waveform of the heartbeat. The resulting signal was notch filtered at 60 Hz. Heart beats were detected using an adaptive threshold of 1 s duration to find cardiac spikes greater than the 95th percentile of the amplitude at each second of the signal. Occasionally, an artificial spike close to the actual heartbeat would be detected or a heartbeat would be missed. As such, we set inter-spike interval thresholds of 100 ms (600 beats/min) to 400 ms (150 beats/min), which are thresholds used in a previous study of marmoset autonomic activity (Borjon et al., 2016). If an inter-spike interval was less than 100 ms, the two spikes were substituted with a single spike located at the midpoint. If an inter-spike interval exceeded 400 ms, we replaced the signal with a NaN. To calculate heart rate, we constructed a binary series of heartbeat counts and convolved the resulting series with a 1 s Gaussian window. We only used heart rate data from observation sessions during which heart rate could be detected at least 50% of the time. Because heart rate can differ across individuals and ages, it was necessary to re-scale heart rate fluctuations so that comparisons could be made across individuals and ages. To do so, we converted all 1 s heart rate values to percentiles for each 10 min observation session (i.e., heart beat fluctuations were centered around the 50th percentile). In other words, heart rate percentile values ranged from zero (lowest heart rate level) to 100 (highest heart rate level) within each 10 min observation session.
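The inter-spike interval rules and the within-session percentile re-scaling can be sketched in Python as below. This is an illustrative simplification and not the original analysis code. Over-long intervals are returned as a list of gaps instead of being written as NaNs into a signal, and percentiles are computed as the share of strictly smaller values, scaled so that the session minimum maps to 0 and a unique maximum maps to 100. Both conventions and all function names are assumptions of this sketch.

```python
import bisect

def clean_beats(beat_times, min_isi=0.100, max_isi=0.400):
    """Apply the inter-spike interval thresholds to beat times (seconds).
    Spikes closer than min_isi are merged at their midpoint; intervals
    longer than max_isi are reported as gaps (treated as missing data)."""
    cleaned = []
    for t in sorted(beat_times):
        if cleaned and t - cleaned[-1] < min_isi:
            cleaned[-1] = (cleaned[-1] + t) / 2  # midpoint replaces the pair
        else:
            cleaned.append(t)
    gaps = [(a, b) for a, b in zip(cleaned, cleaned[1:]) if b - a > max_isi]
    return cleaned, gaps

def to_percentiles(values):
    """Re-scale values to 0-100 within one session (needs >= 2 values).
    Uses the count of strictly smaller values (an assumed convention)."""
    ranked = sorted(values)
    n = len(values)
    return [100.0 * bisect.bisect_left(ranked, v) / (n - 1) for v in values]

# Hypothetical beat times with one doubled spike and one missed beat.
beats, missing = clean_beats([0.0, 0.25, 0.30, 0.55, 1.20])

# Hypothetical 1 s heart rate values (beats/min) from one session.
percentiles = to_percentiles([300, 500, 400])
```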

Data analysis
All analyses were carried out in MATLAB (version R2019a) and R (version 3.5.3). To determine which motor behaviors were categorized as 'immature' and 'mature', we calculated the proportion of vocal, postural, or locomotor time engaged in specific behaviors. We used a series of linear mixed effect models (LMM, 'lmer' of the R package 'lme4'; Bates et al., 2015) to test whether the proportion of time engaged in these behaviors changed with postnatal day. We used the 'lmerTest' R package to determine the significance of the coefficients (Kuznetsova et al., 2017). In the LMMs to examine how vocal behavior changes across development, the dependent variable was the proportion of total vocal time (per daily observation session) engaged in a specific vocalization (cry or phee), the fixed effect was postnatal day, and the random effect was the infant subject (LMM equation: proportion of time ~ day + (day | subject)). In the LMMs to examine how postural behavior changes across development, the dependent variable was the proportion of total postural time (per daily observation session) engaged in a specific posture (forelimb support, hanging, hindlimb support, raising head, or righting reflex), the fixed effect was postnatal day, and the random effect was infant subject (LMM equation: proportion of time ~ day + (day | subject)). In the LMMs to examine how locomotor behavior changes across development, the dependent variable was the proportion of total locomotor time (per daily observation session) engaged in a specific locomotor behavior (climbing, crawling, digging, jumping, or walking), the fixed effect was postnatal day, and the random effect was infant subject (LMM equation: proportion of time ~ day + (day | subject)). We applied the Bonferroni-Holm method to correct for issues of multiplicity within each behavior type (vocal, postural, or locomotor behaviors), resulting in adjusted p-values with an alpha threshold level of 0.05.
We report detailed outcomes of these regression models (e.g., model formulas, random effect variance, regression coefficients, standard errors, t-values, and p-values) in Appendix 1.1-1.3. Immature behaviors were those whose use decreased across development, and mature behaviors were those whose use increased across development. We used these immature-mature categories to calculate 'maturity indices' per session. This index ranged from 0 to 1 and was calculated as follows:
$$\text{maturity index} = \frac{m}{m + \mathit{im}}$$
where m represents the percent of time spent engaged in mature behavior and im represents the percent of time engaged in immature behavior. A maturity index value less than 0.5 means that an individual produced more immature behavior, and a value greater than 0.5 means that an individual produced more mature behavior. Cubic splines (MATLAB csaps function) were fit to individual data (smoothing parameter of 0.03) and population data (smoothing parameter of 0.01) to determine the developmental trajectories of vocal, postural and locomotor maturity indices. Then, we used a logistic generalized linear mixed effect model (GLMM, 'glmer' of the R package 'lme4'; Bates et al., 2015) to test whether maturation time courses differed between vocal and postural-locomotor behaviors. We used the 'lmerTest' R package to determine the significance of the coefficients (Kuznetsova et al., 2017). In this GLMM, the dependent variable was maturity index (per daily observation session, per behavior type: vocal, postural, locomotor), the fixed effects were behavior type and postnatal day, and the random effect was infant subject (GLMM equation: maturity index ~ behavior type + day + (behavior type | subject) + (day | subject)). In Appendix 1.4, we report detailed outcomes of this regression model (e.g., model formulas, random effect variance, regression coefficients, standard errors, t-values, and p-values), as well as the outcomes of this model per infant subject.
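For concreteness, a per-session maturity index consistent with the description above (bounded between 0 and 1, with 0.5 marking equal time in mature and immature behavior) can be sketched in Python as follows. The ratio form m / (m + im) is inferred from those stated properties. The function name and the handling of sessions with no scored behavior are assumptions of this sketch.

```python
import math

def maturity_index(mature_pct, immature_pct):
    """Share of scored time spent in mature behavior: m / (m + im).
    Returns NaN when no behavior was scored (assumed handling)."""
    total = mature_pct + immature_pct
    if total == 0:
        return float("nan")
    return mature_pct / total

# Hypothetical sessions: mostly mature, then mostly immature behavior.
late_session = maturity_index(75, 25)   # above 0.5
early_session = maturity_index(10, 30)  # below 0.5
```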
We sought to understand the real-time and developmental dynamics between vocal production and locomotor activity, as well as between arousal fluctuations and vocal-locomotor behavior. To visualize the real-time dynamics of locomotion during call production, we extracted locomotor activity from −20 to 5 s surrounding call onsets and −5 to 20 s surrounding call offsets. To visualize the real-time dynamics of arousal fluctuations surrounding vocal-locomotor events, we extracted heart rate percentiles from −10 to 5 s surrounding call (or locomotor activity) onsets and −5 to 10 s surrounding call (or locomotor activity) offsets. To summarize the real-time dynamics for individual sessions, we fit a cubic spline to the session data (MATLAB csaps, smoothing parameter of 0.1), and then we fit a population spline to all session splines (smoothing parameter of 0.3). To visualize developmental dynamics, we used the average locomotor activity (or heart rate percentile) during each call (or locomotor activity event) to calculate the mean locomotor activity (or heart rate percentile) per individual session. Then, we fit cubic splines (smoothing parameter of 0.0001) to individual sessions to model the change in the population data across postnatal days.
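The event-aligned extraction described above can be illustrated with a short Python sketch. It assumes a signal sampled at a fixed rate starting at time zero (1 Hz here, matching the 1 s movement and heart rate values) and fills samples that fall outside the recording with None. The function name, the inclusive window ends, and the None convention are assumptions of this sketch, not details from the original analysis.

```python
def extract_window(signal, event_time, start=-20, stop=5, fs=1):
    """Return samples from event_time + start to event_time + stop
    (inclusive, in seconds). Out-of-range samples are returned as None."""
    first = round((event_time + start) * fs)
    last = round((event_time + stop) * fs)
    return [signal[i] if 0 <= i < len(signal) else None
            for i in range(first, last + 1)]

# Hypothetical 100 s locomotor activity trace and a call onset at 30 s.
activity = list(range(100))
around_onset = extract_window(activity, 30)  # samples for 10 s .. 35 s

# A call near the start of the session leaves part of the window empty.
early_call = extract_window(activity, 5)
```

Windows extracted this way for every call in a session could then be averaged or smoothed before fitting session and population splines.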
To test the statistical significance of real-time and developmental dynamics, confidence intervals for the population splines were generated by randomly resampling (10 samples per session) from the signals used to generate individual session splines (real-time dynamics) or session averages (developmental dynamics). We fit a cubic spline to the resampled data to generate a population spline, and repeated the process 1000 times. The 95% confidence interval corresponds to the 2.5th and 97.5th percentiles of the resampled population splines. To determine whether the real-time or developmental dynamics of locomotor activity and arousal fluctuations were significantly different from null expectations, we performed bootstrapped significance tests. For each session, we scrambled the order of call durations and inter-call interval durations. This allowed us to choose random segments equivalent in number and length to the calls produced in that session while maintaining naturalistic spacing between the calls. We fit a cubic spline to session splines (for temporal analysis) or session averages (for developmental analysis) to generate bootstrapped population splines. We repeated this process 1000 times. The 95% threshold for significance corresponds to the 2.5th and 97.5th percentiles of the bootstrapped population splines. This bootstrap procedure accounts for variability due to day, subject and call timing in the statistical estimates.
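The scrambling step used to build the null distribution can be sketched in Python as follows. This is a minimal illustration that assumes the surrogate timeline starts at the original first call onset. That starting point, the function name, and the example times are assumptions of this sketch, not details from the original analysis.

```python
import random

def shuffled_call_times(onsets, offsets, rng):
    """Shuffle call durations and inter-call intervals independently,
    then rebuild surrogate onsets and offsets from the first onset."""
    durations = [off - on for on, off in zip(onsets, offsets)]
    # Interval i runs from the offset of call i to the onset of call i + 1.
    intervals = [on - off for on, off in zip(onsets[1:], offsets)]
    rng.shuffle(durations)
    rng.shuffle(intervals)
    new_onsets, new_offsets = [], []
    t = onsets[0]
    for i, d in enumerate(durations):
        new_onsets.append(t)
        new_offsets.append(t + d)
        if i < len(intervals):
            t = t + d + intervals[i]
    return new_onsets, new_offsets

# Hypothetical call times (seconds) for one session.
rng = random.Random(0)
surrogate_on, surrogate_off = shuffled_call_times([1, 5, 12], [2, 8, 13], rng)
```

Each surrogate keeps the number of calls, the set of call durations and the set of inter-call intervals, so only the alignment between calls and the locomotor or heart rate signal is randomized.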
To complement our bootstrap procedure, we used a series of linear mixed effect models (LMM, 'lmer' of the R package 'lme4'; Bates et al., 2015) to investigate associations between call production, locomotor activity, and arousal fluctuations. We used the 'lmerTest' R package to determine the significance of the coefficients (Kuznetsova et al., 2017). We used LMMs to investigate how average locomotor activity during a call predicted call duration and Wiener entropy. In these LMMs, the dependent variable was contact call acoustic parameter (call duration or Wiener entropy), the fixed effect was average locomotor activity (per call), and the random effect was infant subject nested within postnatal day (LMM equation: acoustic parameter ~ locomotor activity + (locomotor activity | day/subject)). We applied the Bonferroni-Holm method to correct for multiplicity issues associated with testing two acoustic parameters, resulting in adjusted p-values with an alpha threshold level of 0.05. In Appendix 1.5-1.6, we report detailed outcomes of these regression models (e.g., model formulas, random effect variance, regression coefficients, standard errors, t-values, and p-values), as well as the outcomes of these models per postnatal day and infant subject. To complement the real-time dynamic analyses, we used LMMs to compare locomotor activity and ANS activity between contact call types. For the LMM examining locomotor activity, the dependent variable was the average locomotor activity (per contact call), the fixed effect was contact call type (immature vs. mature), and the random effect was infant subject nested within postnatal day (LMM equation: locomotor activity ~ call type + (call type | day/subject)). For the LMM examining ANS activity, the dependent variable was the average heart rate percentile (per contact call), the fixed effect was contact call type (immature vs. mature), and the random effect was infant subject nested within postnatal day (LMM equation: heart rate ~ call type + (call type | day/subject)). In Appendices 1.7 and 1.9, we report detailed outcomes of these regression models (e.g., model formulas, random effect variance, regression coefficients, standard errors, t-values, and p-values), as well as outcomes of the models per postnatal day and infant subject.
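The Bonferroni-Holm adjustment applied to the two acoustic-parameter models can be illustrated with a short Python sketch of the standard step-down procedure. The function name and the example p-values are hypothetical and are not results from this study.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values may not decrease as raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for two tests (e.g., duration and entropy).
adjusted_two = holm_adjust([0.01, 0.04])

# Hypothetical raw p-values for three tests, given out of order.
adjusted_three = holm_adjust([0.04, 0.01, 0.03])
```

Each adjusted value would then be compared against the alpha threshold of 0.05.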
We also used LMMs to complement the developmental dynamics analyses. We ran an LMM to examine developmental changes in locomotor activity during mature contact calls, in which the dependent variable was the average locomotor activity (per contact call), the fixed effect was age group, and the random effect was infant subject nested within postnatal day (LMM equation: locomotor activity ~ age group + (age group | day/subject)). Age group was split into early development (PND 1-10; infant contact calls transition to sounding more adult-like around PND 10) and late development (PND 52-61). In Appendix 1.8, we report detailed outcomes of this regression model, as well as outcomes for each infant subject separately. We ran an LMM to examine developmental changes in ANS activity during mature contact calls, in which the dependent variable was the average heart rate percentile (per contact call), the fixed effect was age group, and the random effect was infant subject nested within postnatal day (LMM equation: heart rate ~ age group + (age group | day/subject)). In Appendix 1.10, we report detailed outcomes of this regression model, as well as outcomes for each infant subject separately. We also ran an LMM to examine developmental changes in ANS activity during locomotor activity events, in which the dependent variable was the average heart rate percentile (per locomotor event), the fixed effect was age group, and the random effect was infant subject nested within postnatal day (LMM equation: heart rate ~ age group + (age group | day/subject)). In Appendix 1.11, we report detailed outcomes of this regression model, as well as outcomes for each infant subject separately.
Finally, we used an LMM to investigate how ANS activity during mature contact calls is associated with vocal-locomotor coordination. The dependent variable of this LMM was average heart rate percentile (per mature contact call), the fixed effect was average locomotor activity (per mature contact call), and the random effect was infant subject nested within postnatal day (LMM equation: heart rate ~ locomotor activity + (locomotor activity | day/subject)). We also ran this model on two separate subsets of the data, the first month of development (postnatal days 1-30) and the second month of development (postnatal days 31-61). We applied the Bonferroni-Holm method to correct for issues of multiplicity due to testing different time frames separately, resulting in adjusted p-values with an alpha threshold level of 0.05. In Appendix 1.12, we report detailed outcomes of these regression models (e.g., model formula, random effect variance, regression coefficients, standard errors, t-values, and p-values), as well as the outcomes for each postnatal day and subject separately.
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Linear Regression Models
We report below the model formulas, random effect variance, estimated regression coefficients, standard errors, test statistics and p values of the models reported in the main text.
1.1 Linear mixed models to predict the proportion of vocal time spent producing specific contact calls (cries or phees) from postnatal day. Results are associated with Figure 2A.
1.5 Linear regression models to predict contact call duration from locomotor activity. Results are associated with Figure 3B.
1.8 Linear regression models to predict locomotor activity during mature contact calls from age group (early vs late development). Results are associated with Figure 3D.
1.9 Linear mixed models to predict heart rate percentile during contact calls from contact call type. Results are associated with Figure 4A.
1.9.1 Population-level linear mixed model results
1.11 Linear regression models to predict heart rate percentile during locomotor events from age group (early vs late development). Results are associated with Figure 4D.
1.11.1 Population-level linear mixed model results