The State of Assessment in Human-Animal Interaction Research

1 Center for the Human-Animal Bond, Department of Comparative Pathobiology, College of Veterinary Medicine, Purdue University, IN, USA, 2 Departments of Psychiatry and Pediatrics, School of Medicine, University of Colorado Anschutz Medical Campus, CO, USA, 3 Center for the Interaction of Animals and Society, Department of Clinical Sciences & Advanced Medicine, University of Pennsylvania School of Veterinary Medicine, PA, USA, & 4 Center to Study Human-Animal Relationships and Environments, School of Public Health, University of Minnesota, MN, USA

In this paper, we refer to assessment tools as methods of gathering and interpreting data aiming to quantify the relationship between one or more constructs of interest. Examples of constructs examined in HAI can be individual attributes in humans (e.g., extraversion), attributes in animals (e.g., affiliative behaviors), or attributes of the relationship between a human and an animal (e.g., attitude towards pets). Constructs themselves are often not directly observable, but they can be quantified through the measurement of a set of behaviors or variables (Cronbach & Meehl, 1955). In turn, theories are refined by testing hypotheses about the relationship between various constructs of interest. In order to draw scientifically valid conclusions, constructs must be adequately measured by the assessment tool with regard to validity (i.e., the extent to which a tool assesses what it intends) and reliability (i.e., the ability of the tool to produce consistent results across raters and time; Cronbach & Meehl, 1955; Kimberlin & Winterstein, 2008).
The scarcity of valid and reliable assessment tools is repeatedly discussed in critical evaluations of HAI research (Esposito, McCune, Griffin, & Maholmes, 2011; Herzog, 2015; McCune et al., 2014). In addition, concerns regarding the practice of using inconsistent assessment and its implications for slowing the advancement of the field of HAI are often pointed out by systematic literature reviews on animal-assisted interventions (AAI) for various populations, including patients with intellectual disabilities (Maber-Aleksandrowicz, Avent, & Hassiotis, 2016), dementia (Bernabei et al., 2013), and autism spectrum disorder (Davis et al., 2015; O'Haire, 2016). Specifically, there is a large amount of heterogeneity in how outcomes are assessed. These variations are so large that findings from systematic reviews are often limited to speculative interpretation, with results that are not comparable across studies. In fact, a systematic review of randomized controlled trials on the effects of animal-assisted therapy (AAT) could not perform a meta-analysis partly due to this problem (Kamioka et al., 2014). Reliable and consistent assessment is necessary to build an empirical evidence base so that studies can be compared and replicated (Asendorpf et al., 2013).
The goals of this paper are to provide an overview of the importance of assessment tools both broadly and in the context of HAI research and to provide HAI researchers with information necessary to select appropriate assessment methods for their research. Specifically, this paper provides an overview of the state of assessment in HAI research while describing (1) current common assessment techniques in HAI, (2) potential measurement issues in their reliability and validity, and (3) future directions towards refinement and selection of measurement in HAI. This discussion will focus on three frequently used broad categories of assessment: (I) questionnaires, (II) physiological measures, and (III) behavioral observation.

Questionnaires
Questionnaires can include self-report or report by others (e.g., caregivers, teachers, clinicians), and can be used to collect information about a wide variety of subjective experiences such as attitudes, emotions, thoughts, beliefs, and perceived mental and physical health. As many psychological constructs of interest to HAI researchers are true measures of self-experience (e.g., life satisfaction, pain, or emotional affect), self-report methods in the form of both qualitative and quantitative assessment are essential to the field. With relatively low costs and resources needed to carry out successful data collection, questionnaires are the most frequently used form of outcome assessment in HAI (O'Haire et al., 2018).

Developing and adapting questionnaires for use in HAI
A questionnaire is considered standardized when all questionnaire items are asked in the same way and order to all study participants, allowing for replication and consistency across studies. In the early years of HAI research, standardized questionnaires for constructs specific to HAI were scarce. Therefore, researchers commonly established new questionnaires or adapted previously existing questionnaires for the needs of their individual studies. This resulted in a large pool of heterogeneous tools to study the human-animal relationship (McCune et al., 2014). A 2012 inventory of questionnaires used in HAI research identified 140 scales in total, including an array of measures of attitudes towards animals, attachment to animals, and bonding scales (Wilson & Netting, 2012). While these questionnaires may be pertinent to HAI-specific constructs, a large majority of these measures lack evaluations of construct validity, or the extent to which a questionnaire is able to accurately measure the construct that it is designed to measure (Cronbach & Meehl, 1955; May, Seivert, Cano, Casey, & Johnson, 2016).
To aid in establishing construct validity in a new tool, researchers developing or adapting questionnaires for use in HAI research should keep the intended theoretical framework in mind. For example, one of the most popular tools for assessing the human-canine relationship is the Monash Dog-Owner Relationship Scale (MDORS; Dwyer, Bennett, & Coleman, 2006). The development of the MDORS was based on Social Exchange Theory, which posits that a relationship will continue to persist if the benefits of the interactions outweigh the costs (Blau, 1964; Netting, Wilson, & New, 1987). Thus, the measure aims to quantify both the perceived costs of the relationship and the emotional closeness of the relationship to assess both the positive and negative aspects of dog ownership (Dwyer et al., 2006). In addition, it incorporates a third subscale to examine the moderating effects of dog-human interactions occurring in the relationship. Thus, the MDORS is an example of how the development of questionnaires for specific use in HAI may be guided by a theoretical basis.
When a new questionnaire is developed for use, it is important to assess the validity metrics to prevent a misalignment of questionnaires with the construct intended to be measured. There are two general types of construct validity that can be measured in order to evaluate a new assessment tool: convergent validity and discriminant validity (Campbell & Fiske, 1959). Convergent validity represents the extent to which a questionnaire correlates or agrees with other assessment tools intending to measure the same construct of interest. In HAI questionnaire assessment, tests of convergent validity are common. For example, the recently developed Pet Attachment and Life Scale (PALS; Cromer & Barlow, 2013) was tested for convergent validity with the Lexington Attachment to Pets Scale (LAPS; Johnson, Garrity, & Stallones, 1992) and the Companion Animal Bonding Scale (Poresky, Hendrix, Mosier, & Samuelson, 1987) via factor analysis and intercorrelations, verifying that PALS was measuring a construct similar to that of these measures. On the other hand, discriminant validity (or the extent to which a questionnaire does not correlate with assessment tools that do not measure the construct of interest) is not as often assessed in the field of HAI, but is additionally valuable to provide a balanced perspective of a questionnaire's construct validity (Campbell & Fiske, 1959). With the continuous and rapid development of questionnaire measures in HAI (Wilson & Netting, 2012), it is important that researchers verify these aspects of validity (e.g., The Human-Animal Interaction Scale; Fournier, Berry, Letson, & Chanen, 2016; Animal Attitude Scale; Herzog, Grayson, & McCord, 2015).
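As a minimal sketch of the correlational logic behind convergent and discriminant validity, the Python example below computes Pearson correlations between total scores on a new scale, an established scale intended to measure the same construct, and a scale measuring an unrelated construct. All scores and scale names are hypothetical and invented for illustration; they are not data from PALS, LAPS, or any other instrument named above, and a full validation would typically also involve factor analysis and much larger samples.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical total scores for six respondents (made-up numbers).
new_attachment_scale = [12, 18, 25, 31, 36, 40]
established_attachment_scale = [15, 20, 24, 33, 35, 42]  # same construct
unrelated_scale = [7, 3, 9, 4, 8, 5]  # a construct not expected to relate

# Convergent validity: expect a strong correlation with the established scale.
convergent_r = pearson_r(new_attachment_scale, established_attachment_scale)

# Discriminant validity: expect a weak correlation with the unrelated scale.
discriminant_r = pearson_r(new_attachment_scale, unrelated_scale)

print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

In this framing, a high convergent correlation paired with a near-zero discriminant correlation would support the claim that the new scale measures the intended construct rather than something broader.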

Using existing questionnaires for HAI
For constructs related to human health and wellbeing, existing questionnaires with well-established psychometric properties, including construct validity, exist in the fields of psychology and medicine. Standardized clinical questionnaires have been used to assess the effects of AAIs on a range of populations, comparing outcomes to other types of interventions or treatments in meta-analyses (e.g., Virués-Ortega, Pastor-Barriuso, Castellote, Población, & de Pedro-Cuesta, 2012). For instance, the Aberrant Behavior Checklist-Community (ABC-C; Aman, Burrow, & Wolford, 1995) has been well-researched in the field of psychopharmacology for individuals with autism spectrum disorder (ASD) and has been demonstrated to translate well to research on animal-assisted intervention in this population (e.g., Gabriels et al., 2012; Gabriels et al., 2015). In this case, the significant reductions in ABC-C irritability symptoms resulting from a 10-week randomized trial of therapeutic horseback riding (Gabriels et al., 2015) were almost half the size of the irritability reductions in an 8-week medication trial using this same measurement tool with a same-age ASD population (R. Owen et al., 2009). This enabled inferences to be made that therapeutic horseback riding has the potential to be a complementary or alternative intervention to medications for ASD (Arnold, 2015).
Therefore, the use of existing standardized questionnaires is crucial to move the HAI field forward toward understanding the efficacy of HAI in relation to other mainstream or established interventions by generating results that are comparable across research disciplines.

Learning about animals through questionnaires
One use of questionnaires specific to HAI research is to collect information about companion and service animals from humans who interact with them often, such as their handlers, trainers, or owners. These questionnaires can be used to assess an animal's health or its behavioral/temperament traits. Information about the animal is obtained indirectly through the reporting of a third party in these questionnaires, creating a risk that the collected data may be biased by the subjective perceptions of the raters. For example, an owner may have an inaccurate representation of their animal's behavior if the owner is inexperienced at recognizing certain stress or fear behaviors in their animal. A comparable situation in human research is the use of parent- or teacher-reports to assess traits in children. Research shows that parent- or teacher-reports about a child are not always well-correlated with the direct report from a child (Phares, Compas, & Howell, 1989), indicating that researchers should rigorously assess the risk of bias when using data provided by a third person.
To parse out subjective bias, investigators developing questionnaires that capture animal behaviors should prioritize psychometric evaluation of their instrument. For example, the Canine Behavioral Assessment and Research Questionnaire (C-BARQ; Hsu & Serpell, 2003) is designed to collect owner-reports of dog behavior. The C-BARQ demonstrates good convergence with the clinical assessments of veterinary behaviorists (Hsu & Serpell, 2003), predictive validity of the success of assistance dogs in training (Duffy & Serpell, 2012), and inter-rater reliability between multiple members of the same household assessing the behavior of the same dog (Jakuba et al., 2013). In this regard, the C-BARQ is a good example of a questionnaire with demonstrated validity and reliability that can be used to collect information regarding canine behavior. When psychometric properties are properly evaluated, questionnaires can be important sources of information about animal behavior and health. When evaluating constructs that may be more subjective and thus more prone to bias, such as animal welfare, we recommend that investigators endeavor to combine questionnaires with the use of direct physiological and/or behavioral assessment.
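To illustrate what an inter-rater reliability check can look like in practice, the sketch below computes Cohen's kappa, a chance-corrected agreement statistic, for two household members rating the same dog on the same items. This is one possible statistic chosen for illustration; it is not claimed to be the method used in the C-BARQ studies cited above, and the ratings are hypothetical. For ordinal or continuous scores, an intraclass correlation or weighted kappa may be more appropriate.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Proportion of items on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical item ratings (0 = never ... 4 = always) from two members of
# the same household describing the same dog.
owner_1 = [0, 1, 1, 3, 4, 2, 0, 1]
owner_2 = [0, 1, 2, 3, 4, 2, 0, 0]

print(f"kappa = {cohens_kappa(owner_1, owner_2):.2f}")
```

Values near 1 indicate agreement well beyond chance, whereas values near 0 suggest that the apparent agreement could arise from the raters' category frequencies alone.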

New technology in questionnaires
Recent progress in new technologies has greatly improved questionnaire assessments by increasing their ecological validity (i.e., generalizability to real-world settings), shortening the amount of time necessary for research participants to complete questionnaires, and reducing data entry errors (Gibbons, 2016). One example of an emerging technology in questionnaire assessment is electronic data collection, which can help avoid skipped responses, prevent missing data or out-of-range responses, and provide standardized scoring in real time. A 2008 meta-analysis comparing self-reported treatment efficacy outcomes from 46 clinical trials found a high correlation (0.90) between paper and computer instruments, providing evidence for their equivalency (Gwaltney, Shields, & Shiffman, 2008). HAI researchers aiming to capture self-report before, during, or after an intervention may benefit particularly from including technology-based assessment.
One development in questionnaire technology that has been advantageous in numerous fields is crowdsourcing technology such as Amazon Mechanical Turk (MTurk). MTurk allows researchers to administer a questionnaire to over 500,000 individuals across 190 countries who are paid small monetary amounts for completion of tasks, such as surveys. This method can provide researchers with large and diverse samples (Casler, Bickel, & Hackett, 2013) as well as fast and inexpensive data collection (Buhrmester, Kwang, & Gosling, 2011). A 2015 analysis estimated that a typical research group can reach a subject pool of 7,300 workers per quarter year (Stewart et al., 2015). Thus, MTurk can be advantageous to HAI researchers aiming to quantify perceptions, opinions, and beliefs (e.g., Mills, Robbins, & von Keyserlingk, 2016; Rabbitt, Kazdin, & Hong, 2014) or to make cross-cultural comparisons (e.g., Ruby, Heine, Kamble, Cheng, & Waddar, 2013). While using online crowdsourcing can sometimes pose threats to the validity of the data (e.g., respondents who provide false demographics to fit inclusion criteria, or participants who rapidly advance through surveys without full attention), several guides exist to provide researchers with best practices to aid in minimizing these issues (e.g., Sheehan, 2017).
Ecological momentary assessment (EMA) is another recently developed form of questionnaire that attempts to minimize the risk of recall bias present in traditional questionnaires collecting retrospective data (Shiffman, Stone, & Hufford, 2008). Instead of completing assessment tools in one session and answering questions about actions or feelings over the past few weeks, research participants are prompted to respond to questions of interest during their day-to-day lives. This could take multiple forms, from a diary updated at regular time intervals to the use of smartphone applications and reminders to respond to research questions throughout the day. The use of EMA is particularly relevant to HAI research studies that are interested in daily life experiences, such as interactions with companion or service animals. For example, one 2017 study used EMA via a mobile phone app to assess the physical activity and emotional affect of 71 dog owners (Liao, Solomon, & Dunton, 2017). Participants were prompted randomly eight times per day for 12 days to report on their current activities ("what were you doing before the beep went off?"), their current emotional states, and if they were with their pet dog. By asking the participant to remember actions from a recent and short period, this technique reduces recall bias and enhances the accuracy of the collected data. Liao et al.'s (2017) study, among others, demonstrates the feasibility and potential of this promising approach to HAI research.
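A random-prompt EMA schedule of the kind described above (eight prompts per day for 12 days) can be sketched in a few lines of Python. The waking window of 08:00 to 22:00, the function name, and the uniform random sampling are assumptions made for illustration and are not details reported for the cited study; applied protocols often add constraints such as a minimum gap between prompts or stratification into time blocks.

```python
import random
from datetime import date, datetime, time, timedelta


def ema_schedule(start_day, n_days=12, prompts_per_day=8,
                 wake=time(8, 0), sleep=time(22, 0), seed=None):
    """Return a sorted list of random prompt times within each day's window.

    The waking window (wake to sleep) is an illustrative assumption.
    """
    rng = random.Random(seed)
    window_minutes = (sleep.hour * 60 + sleep.minute) - (wake.hour * 60 + wake.minute)
    if prompts_per_day > window_minutes:
        raise ValueError("more prompts requested than minutes in the window")

    schedule = []
    for day_index in range(n_days):
        day_start = datetime.combine(start_day + timedelta(days=day_index), wake)
        # Draw distinct minute offsets so no two prompts coincide.
        offsets = sorted(rng.sample(range(window_minutes), prompts_per_day))
        schedule.extend(day_start + timedelta(minutes=m) for m in offsets)
    return schedule


# Example: a 12-day schedule with a fixed seed so the draw is reproducible.
schedule = ema_schedule(date(2024, 1, 1), seed=1)
for prompt in schedule[:8]:
    print(prompt.strftime("%Y-%m-%d %H:%M"))
```

Fixing the seed per participant makes each schedule reproducible for auditing while still being unpredictable to the participant.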

Physiological Measures
Physiological assessment can provide an objective measure of how situations differentially affect the internal state of human and animal research participants. A range of techniques exist to help investigators explore indicators of sympathetic and hypothalamic-pituitary-adrenal (HPA) axis activity through endocrine, cardiovascular, and skin conductance measurements. In addition, the evaluation of both daytime and nighttime activity via actigraphy offers a unique objective assessment through which to better understand how animals may impact human health. In this section, we review and offer suggestions for use of common physiological measures assessed in HAI research, including assessment of cardiovascular, physical, electrodermal, and endocrine activity.

Cardiovascular activity
Measures of cardiovascular health and activity have been widely used in the field of HAI to explore a range of both theoretical and applied research questions. Examples of these measures include blood pressure, heart rate, and heart rate variability; these signs of arousal and sympathetic-to-parasympathetic nervous system balance can be measured in both humans and most animals included in HAI research (Schreiner, 2016). By quantifying these physiological metrics, researchers in HAI have begun to examine the mechanistic bio-behavioral underpinnings of human-animal interaction (e.g., Barker, Knisely, McCain, Schubert, & Pandurangi, 2010; Polheber & Matchock, 2013). For example, laboratory studies have quantified the associations of positive HAI with cardiovascular activity in humans (e.g., Polheber & Matchock, 2013), animals (e.g., McGreevy, Righetti, & Thomson, 2005), and both humans and animals (e.g., in humans stroking horses; Hama, Yogo, & Matsuyama, 1996). Objective measures such as heart rate have also been used to quantify the temporal calming aspects of AAI in populations with high arousal levels (e.g., Krause-Parello & Friedmann, 2014). In studies outside of the laboratory, measures such as systolic and diastolic blood pressure can be analyzed cross-sectionally in order to examine the impact of pet ownership on overall cardiovascular health (e.g., Parslow & Jorm, 2003).
Although valuable, these physiological assessment methods are not without their limitations. Measurement often requires complementary self-report or behavioral assessment to confidently interpret directionality, which can require a more complex study design. For example, without instantaneous self-report or behavioral data, it is unknown if a higher heart rate indicates a state of excitement (positive relation) or a state of fear (negative relation). In addition, measurement of the autonomic nervous system, especially heart rate variability, can be complicated by large inter-individual variation and by a range of external factors, such as time of day, use of nicotine and caffeine, and physical activity levels, as well as by measurement error in the device itself, all of which need to be adequately controlled for (Quintana & Heathers, 2014). Reviews of the literature have found that many studies examining the relationship between cardiovascular health and pet ownership lack the methodological rigor necessary to draw valid conclusions, including failure to control for confounding variables and inadequate sampling techniques (Cutt, Giles-Corti, Knuiman, & Burke, 2007). Future HAI research incorporating assessment of cardiovascular activity will benefit from aiming to address as many of these methodological and confounding limitations as possible to generate valid and conclusive data.
Reliable measures of cardiovascular activity have become easier and less expensive to obtain in recent decades. While equipment size has previously made long-term studies of cardiovascular health costly and cumbersome, new technologies are making the recording of cardiovascular activity more portable and affordable. During the past decade, wearable wristband and chest monitors have been developed to collect moment-to-moment heart rate, heart rate variability, and blood pressure. With the latest design improvements, these wearable monitors are unobtrusive, lightweight, and can be worn as comfortably as a watch. However, researchers should be cautious of devices that have not been independently validated; while a device may be cheaper, it may also be less accurate, resulting in invalid study conclusions (O'Brien et al., 2000). This is especially the case for commercially available devices that may be cost-effective and familiar to participants, but not well-validated for use in research (e.g., Fitbit HR, Apple Watch; Wang, Blackburn, Desai, et al., 2017).
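For readers less familiar with how heart rate variability is quantified, the sketch below computes mean heart rate and RMSSD (the root mean square of successive differences between inter-beat intervals), one widely used time-domain index. The choice of RMSSD and the example intervals are ours for illustration and are not drawn from the studies cited above; the sketch also assumes the intervals have already been cleaned of artifacts and ectopic beats, which in practice is a substantial part of the work.

```python
from math import sqrt


def mean_heart_rate(rr_ms):
    """Mean heart rate in beats per minute from inter-beat intervals in ms."""
    if not rr_ms:
        raise ValueError("need at least one inter-beat interval")
    return 60000.0 / (sum(rr_ms) / len(rr_ms))


def rmssd(rr_ms):
    """Root mean square of successive differences between intervals, in ms."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical, artifact-free inter-beat intervals in milliseconds.
rr_intervals = [800, 810, 790, 800]

print(f"mean HR = {mean_heart_rate(rr_intervals):.1f} bpm")
print(f"RMSSD = {rmssd(rr_intervals):.1f} ms")
```

Two recordings can share the same mean heart rate yet differ markedly in RMSSD, which is one reason the two measures are reported separately.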

Physical activity
Physical activity has long been an important area of assessment for understanding the relationship between humans and their companion animals. Specifically, the greater physical activity associated with dog walking has been a popular research topic in recent decades (for review see Christian et al., 2013). With emerging actigraphy technology to measure activity, researchers can now objectively quantify physical activity rather than relying solely on self-report. For example, Owen et al. (2010) used wearable activity monitors to measure the physical activity (mean daily activity, mean daily steps, and activity per minute) of over 2,000 children in three different countries as part of the Child Heart and Health Study in England project. By examining daytime physical activity of children in families with and without a pet dog, researchers could determine the specific areas of activity affected by dog ownership in a large and diverse sample (C. G. Owen et al., 2010). In addition to daily activity, recent research has also used actigraphy to examine the effect of animal-assisted therapy on nighttime activity and sleep quality (Swall, Fagerberg, Ebbeskog, & Hagelin, 2014).
Activity trackers have become lighter, cheaper, and more easily used for large-scale research than ever before (Shih, Han, Poole, Rosson, & Carroll, 2015). However, as is the case with wearable technology measuring cardiovascular activity, HAI researchers using activity monitors need to be cautious of devices without established validity (Tryon, 2004). While commercially available trackers may be well-validated for step counts (Kooiman et al., 2015), they may produce unreliable data for metrics such as energy expenditure and sleep efficiency (Evenson, Goto, & Furberg, 2015). Further, care must be taken to ensure that changes in activity are directly related to the variable of interest (e.g., dog ownership) rather than confounding factors (e.g., health-related conditions motivating activity changes). For researchers interested in sleep actigraphy, it is vital that devices be carefully chosen based on their convergent validity with laboratory-measured sleep metrics (e.g., Jean-Louis, Kripke, Cole, Assmus, & Langer, 2001). Researchers must also be aware that participants must keep a sleep log (e.g., time got into bed, time fell asleep, times woken up) to enable researchers to accurately interpret the objective actigraphy data (Littner et al., 2003).
Concomitant with improvements in technology and application of ambulatory devices, researchers in both HAI as well as in veterinary and animal behavior fields have begun to non-invasively and reliably measure actigraphy in animals. For example, several small and unobtrusive devices have been validated to reliably measure physical activity in dogs via collar and harness attachments (e.g., Westgarth & Ladha, 2017; Yam et al., 2011). Measuring animal activity along with human activity may lead to a new understanding of bidirectional health associations in HAI research. In a recent study, Patel et al. (2017) placed wearable devices on 40 adults and 40 of their pet dogs to objectively examine the impact that sharing a bedroom or a bed has on both human and dog sleep; findings indicated that canine sleep outcomes were unaffected by position, but human sleep efficiency was significantly lower when the dog was in the bed (Patel et al., 2017). Future studies may also benefit from using actigraphy to study the activity, sleep, and welfare of animals used in AAI such as service dogs (Burrows, Adams, & Millman, 2008).
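The role of the sleep log can be made concrete with sleep efficiency, conventionally defined as the percentage of time in bed that is spent asleep. In the sketch below, time in bed comes from the participant's log and minutes asleep come from the device's epoch scoring; the function name and the example night are hypothetical, and the scoring algorithm that classifies each epoch as sleep or wake is assumed to have been applied upstream.

```python
from datetime import datetime


def sleep_efficiency(got_into_bed, got_out_of_bed, minutes_scored_asleep):
    """Percentage of logged time in bed that the device scored as sleep.

    got_into_bed / got_out_of_bed: datetimes taken from the sleep log.
    minutes_scored_asleep: minutes the actigraph scored as sleep in between.
    """
    time_in_bed_min = (got_out_of_bed - got_into_bed).total_seconds() / 60
    if time_in_bed_min <= 0:
        raise ValueError("time out of bed must be later than time into bed")
    if not 0 <= minutes_scored_asleep <= time_in_bed_min:
        raise ValueError("minutes asleep must fall within the time in bed")
    return 100.0 * minutes_scored_asleep / time_in_bed_min


# Hypothetical night: in bed 23:00 to 07:00 (480 min), 432 min scored as sleep.
efficiency = sleep_efficiency(
    datetime(2024, 1, 1, 23, 0),
    datetime(2024, 1, 2, 7, 0),
    minutes_scored_asleep=432,
)
print(f"sleep efficiency = {efficiency:.1f}%")
```

Without the logged bed and rise times, quiet wakefulness before sleep onset cannot be separated from time outside the sleep period, so the denominator of this ratio would be undefined.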

Electrodermal activity
Electrodermal activity (EDA), also referred to as skin conductance or galvanic skin response, has long been a frequently used psychophysiological measure in the field of psychology (Boucsein, 2012). While some sweat glands mostly respond to thermal stimuli (i.e., increases in temperature), the activity of eccrine sweat glands is typically associated with emotional arousal (van Dooren, de Vries, & Janssen, 2012). Measuring the electrodermal activity in areas of the body close to eccrine sweat glands is thus an effective way to measure emotional arousal in humans. In HAI research, electrodermal activity has been used in the laboratory to measure the arousal of humans when viewing videos of animals in distressing situations as an indicator of empathy (Westbury & Neumann, 2008), and to evaluate the arousal of children with autism spectrum disorder during animal-assisted intervention (O'Haire, McKenzie, Beck, & Slaughter, 2015). As with cardiovascular activity, technological advances have replaced cumbersome monitors with lighter, wearable devices that can measure electrodermal activity with good precision and minimal burden for the participant (e.g., Garbarino, Lai, Bender, Picard, & Tognetti, 2014). However, while these devices enable researchers to collect real-time physiological activity in a noninvasive and unobtrusive manner, most devices are still relatively expensive, which may limit their widespread use.
Although electrodermal activity is a common measure of arousal in humans, it remains challenging to measure in animals and its validity varies highly with the species studied. In particular, the function and process of sweating vary widely between mammal species commonly included in HAI (Allen & Bligh, 1969). One study found that when sheep are petted by a familiar human, they show a decrease in skin humidity correlating with behavioral signs of wellbeing, implying that sheep in a more positive emotional state may show decreased electrodermal activity (Reefmann, Wechsler, & Gygax, 2009). This study suggests that the use of skin humidity or conductance has potential to serve as a non-invasive measure of emotional arousal in some species. However, in other species in which electrodermal activity is not a validated measure of arousal state, other physiological assessment tools such as heart rate variability may be more appropriate for use (e.g., dogs, Katayama et al., 2016; farm animals, Von Borell et al., 2007).

Endocrine activity
Endocrine activity can complement other physiological activity measures by quantifying the levels of diverse hormones in the human body. Cortisol, an indicator of arousal and stress, and oxytocin, a hormone involved in affection and bonding, are two particular hormones popularly discussed with respect to HAI research (Beetz, Uvnäs-Moberg, Julius, & Kotrschal, 2012; Uvnäs-Moberg et al., 2011).
The hormone cortisol is produced in the adrenal cortex as part of the stress response system regulated by the hypothalamic-pituitary-adrenal (HPA) axis (Stratakis & Chrousos, 1995). As a measure of both chronic and acute stress, cortisol is widely used in HAI research as a psychophysiological biomarker of human-animal interaction (Beetz et al., 2012). Cortisol has been used to measure the stress-attenuating effect of companion animals on humans (e.g., Kertes et al., 2016) and in studies of the welfare of animals included in animal-assisted intervention (e.g., Koda, Watanabe, Miyaji, Ishida, & Miyaji, 2015; A. McCullough et al., 2017). While in the past the inclusion of cortisol in research has required invasive procedures such as blood collection, breakthroughs in salivary bioscience have made measurement noninvasive and less costly (Dreschel & Granger, 2016). Samples can even be taken in the home environment, as cortisol in saliva is stable and can be transported and mailed at room temperature (Clements & Parker, 1998). However, obtaining, assaying, and analyzing salivary cortisol data requires careful consideration of a number of factors (Levine, Zagoory-Sharon, Feldman, Lewis, & Weller, 2007), especially in studies involving children (Harmon, Hibel, Rumyantseva, & Granger, 2007). Cortisol concentrations can be significantly influenced by a large number of state and trait factors that need to be addressed for their confounding potential in statistical analyses (Hellhammer, Wüst, & Kudielka, 2009). To address this, HAI researchers are encouraged to follow the methodological standards suggested by expert guidelines (e.g., Stalder et al., 2016).
Alongside cortisol, oxytocin is growing in its inclusion in HAI research to explore the human-animal bond due to its key role in affection and bonding in mammalian species. Oxytocin has historically been measured from blood samples, but protocols have recently been developed to measure oxytocin in saliva and urine in both humans (e.g., Francis, Kirkpatrick, de Wit, & Jacob, 2016; Grewen, Davenport, & Light, 2010) and dogs (MacLean et al., 2018). While the validity of the interpretation of salivary oxytocin can be controversial (e.g., Horvat-Gordon, Granger, Schwartz, Nelson, & Kivlighan, 2005; M. E. McCullough, Churchland, & Mendez, 2013), the development of salivary sampling methods has a strong potential to reduce the stress of intravenous oxytocin sampling procedures and to positively impact animal welfare. The use of oxytocin in particular may be a good indicator of bonding between humans and dogs (Beetz et al., 2012), as the level of this hormone has been shown to increase in both owners and dogs during mutual eye contact (Nagasawa et al., 2015), and after a few minutes of talking to and stroking a dog (Handlin et al., 2011). However, consideration needs to be given to other bonds and mood-altering lifestyle factors (e.g., pregnancy) when determining the HAI association.

Behavioral observation
Because of its focus on inter-species interactions, HAI research is inherently behavioral. Behavioral observation is uniquely complex in the field of HAI as it often involves the measurement of both human and animal behavior to quantify and describe the interactions taking place. While behavioral observation has been used less widely in HAI research compared to questionnaires (O'Haire et al., 2018), it is an important assessment that provides critical information to complement questionnaire and physiological assessments (Baumeister, Vohs, & Funder, 2007).

Like questionnaires, several standardized behavioral paradigms already used in other fields can be adapted and reinterpreted for HAI research. The use of standardized behavioral testing may contribute to the theoretical understanding of the relationships between humans and animals. Although most standardized behavior tests must be done in a laboratory or experimental setting, this offers a controlled environment to test hypotheses and paradigms while minimizing confounding factors that may occur in the home environment. For example, in the field of psychology, the Strange Situation Procedure (Ainsworth, 1979) is a standard measure of infantile attachment; in a series of separations and reunions, behaviors of a child and its caregiver are evaluated. The Strange Situation has similarly been used in HAI research to examine a dog's behavior when separated from, and reunited with, its owner to examine the correlates of attachment in human-canine relationships (Konok, Dóka, & Miklósi, 2011; Palmer & Custance, 2008; Rehn & Keeling, 2016; Topál, Miklósi, Csányi, & Dóka, 1998).
In addition to the field of psychology, many behavioral tasks in the fields of animal cognition and behavior may be reinterpreted to evaluate the human-animal bond. Specifically, performance in a task may inform investigators about the efficiency of the communication between an individual and the animal rather than about the cognitive capacities of the animal. For example, Hall and colleagues (2016) examined the potential relationship between a dog's point-following behavior and a child's level of attachment to their dog. Results indicated that children who reported greater feelings of attachment to their pet dogs had dogs that were more likely to successfully follow their pointing gesture to locate hidden food (Hall, Liu, Kertes, & Wynne, 2016). Utilizing these types of behavioral paradigms may help HAI researchers further understand the complexities of the human-animal bond.

Human behavior in HAI
HAI investigators may be interested in collecting behavioral data from a range of situations in which animals and humans interact. Depending on the research question, human behavioral data can be used in a variety of ways in HAI research. As an outcome measure, it may provide detailed information about the specific effects that human-animal interaction may have (e.g., how animal-assisted activities impact the social behaviors of children with autism; O'Haire, McKenzie, Beck, & Slaughter, 2013). In addition, HAI researchers may choose to quantify behavior when self-report is not as practical for the research question of interest (e.g., the effect that a therapy dog's presence has on the speed and accuracy of children's gross motor skills; Gee, Harris, & Johnson, 2007). As a mediator or moderator of outcomes, human behavior may have a significant effect on the construct being measured (e.g., the effect of a handler's affiliative and disciplinary behaviors on the cortisol output of working dogs; Horváth, Dóka, & Miklósi, 2008). Finally, human behavioral data can also be used as a measure of treatment fidelity in interventions, in which audio or video recordings of interactions allow researchers to verify that intervention providers are adhering to specific protocols while monitoring deviations from treatment content (Bellg et al., 2004).

Animal behavior in HAI
In HAI research, an animal's behavior can be used both to quantify the dynamics of human-animal interaction and to acquire information about the animal itself, such as monitoring animal welfare. In the context of human-animal interactions, an animal's specific directed behaviors towards a human can often provide key information necessary for interpreting results from animal-assisted interventions. For example, the display of affiliative behaviors such as physical contact-seeking and eye contact, and the absence of aggressive behaviors such as growling and teeth-baring, may be of critical importance to the success of an intervention. At the same time, animal welfare may be monitored by coding both negative (e.g., stress) and positive (e.g., play) behaviors during an interaction (Ng, Albright, Fine, & Peralta, 2015).

Coding behavior
Similar to other fields studying behavior, the utility of behavioral data is dependent on how it is coded (Chorney, McMurtry, Chambers, & Bakeman, 2014). Standardized coding tools attempt to systematically classify and quantify behaviors with a coding manual that is tested for reliability across multiple raters (Cone, 1999). The choice of coding tool depends on the research questions and behaviors of interest. If the behaviors of interest only include outcomes related to human behavior and ignore interactions with animals, the investigators may choose to use a standardized behavior coding tool developed in psychology or medicine (e.g., a standardized pain behavior scale; Vagnoli et al., 2015). If behaviors of interest are animal-specific, HAI researchers will benefit from using ethograms developed in the field of animal behavior (e.g., an equid ethogram; McDonnell, 2003). Because behaviors of interest in HAI research often include interactions between humans and animals, existing single-species behavior coding tools are often not relevant for these HAI-specific behaviors.
As a response to the lack of standardized coding tools specific to HAI research, many studies have developed their own independent coding tools (e.g., Grandgeorge et al., 2014; Jenkins & Reed, 2013; Kotrschal & Ortbauer, 2003). One example of a coding tool specifically developed for HAI studies is the Observation of Human-Animal Interaction for Research (OHAIRE) coding system, which was developed to cover a range of human behaviors of interest in HAI research (e.g., emotional display, social communication), as well as behaviors specific to interacting with animals (O'Haire et al., 2013). The use of this coding tool in a number of different studies has demonstrated its ease of use, and its comparison with standardized questionnaires has demonstrated strong convergent validity (Guérin, 2017). In addition, the OHAIRE has established inter-rater reliability, which tests for agreement across two or more independent raters who are blinded to the study's conditions and hypotheses (Guérin, 2017). The use of standardized behavior assessment tools with established protocols, validity, and reliability is essential to create comparable results across studies in the field of HAI.
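Inter-rater agreement of the kind described here is commonly summarized with a chance-corrected statistic such as Cohen's kappa. The following is a minimal Python sketch, assuming two raters who each assign exactly one categorical code per observation interval. The behavior labels and ratings are hypothetical illustrations, and kappa is offered only as one common choice, not necessarily the statistic used in the studies cited.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same intervals.

    Each argument is a sequence of categorical codes, one per interval.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of intervals")

    n = len(rater_a)

    # Observed agreement: proportion of intervals given the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each code, the product of the two raters'
    # marginal proportions, summed over codes.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one and the same code throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for eight observation intervals (illustrative only).
codes_rater_1 = ["affiliative", "affiliative", "play", "stress",
                 "play", "affiliative", "stress", "play"]
codes_rater_2 = ["affiliative", "play", "play", "stress",
                 "play", "affiliative", "stress", "affiliative"]

kappa = cohens_kappa(codes_rater_1, codes_rater_2)
print(round(kappa, 3))  # prints 0.619
```

In this example the raters agree on 6 of 8 intervals (0.75), but 0.344 of that agreement would be expected by chance given how often each rater used each code, which is why kappa (0.619) is lower than raw percent agreement.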
In addition to standardized behavior coding tools, a range of technologies are available for quantifying and interpreting the behavior of both humans and animals in HAI research. Behavioral coding software such as the Noldus Observer allows researchers to synchronize and easily code video data. HAI researchers have successfully used Noldus Observer for a range of topics, including examining the behaviors of pet dogs and their owners during an operational task (Kotrschal, Schöberl, Bauer, Thibeaut, & Wedl, 2009) and quantifying the effect of positive human interaction on cow behavior during milking (Ivemeyer, Knierim, & Waiblinger, 2011). New developments in Observer software have also recently allowed for the integration of physiological data with behavioral data (Observer XT; Zimmerman, Bolhuis, Willemsen, Meyer, & Noldus, 2009). Eye-tracking analysis, another type of behavioral coding applicable to HAI, has similarly experienced technological advances. Borrowing from infant eye-tracking technology, HAI researchers have used increasingly reliable and non-invasive eye-tracking methods with dogs to begin to further interpret the intricacies of the dog-human relationship (e.g., Jakovcevic, Mustaca, & Bentosela, 2012; Téglás, Gergely, Kupán, Miklósi, & Topál, 2012).

Conclusion
In this paper, we have summarized the current state of assessment in human-animal interaction research in three key areas: questionnaires, physiological measures, and behavioral observation. For each assessment area, we have identified concerns with its use, presented examples of its successful application, and provided recommendations for its use in HAI research.
Questionnaires provide invaluable sources of information about the personal experiences of research participants and the animals in their lives. When evaluating constructs that are not specific to HAI research, investigators should first turn to the appropriate standardized assessment methods developed in other research fields. HAI research will continue to benefit from adopting well-validated questionnaires for constructs related to clinical outcomes and human health and wellbeing; doing so will increase generalizability and clinical significance of outcomes. When the unique characteristics of HAI research require investigators to adapt or develop new assessment tools, these tools should be subject to rigorous psychometric validation, including construct validity in the forms of convergent and discriminant validity. In addition, the development of new questionnaires in the field of HAI needs to be driven by a strong theoretical background and clear research questions. Finally, HAI researchers should take advantage of new and rapidly evolving technologies in questionnaire administration to further refine and develop their assessments. While questionnaires provide critical information from the subjective point of view of research participants in HAI, physiological measures and behavioral observation are important, objective complements to questionnaires.
With physiological assessment, HAI researchers have begun to investigate how human-animal interactions influence biological responses. Cardiovascular activity, physical activity, electrodermal activity, and endocrine activity have all been measured by HAI researchers aiming to quantify the physiological effects of human-animal interaction. However, these objective assessments are currently not as widely used in HAI as questionnaires (O'Haire et al., 2018), in part because of the additional resources and expenses necessary. As the field of HAI grows, however, so will its need to incorporate methodologically rigorous designs that combine both subjective and objective outcome measures. Through noninvasive saliva collection, HAI researchers can analyze a variety of biomarkers using emerging salivary bioscience technology. In addition, for measures with strong biological similarities between humans and their animal partners (e.g., heart rate, cortisol, actigraphy), future HAI research may benefit from using dyadic analysis to examine the potential bidirectional effects of human-animal interaction while exploring the possibility of physiological synchronicity between humans and animals who share a bond. However, when using physiological methods, researchers must carefully consider the validity, accuracy, and precision of their measurement methods as well as potential confounding variables.
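As one illustration of what a simple dyadic analysis of physiological synchronicity might look like, the Python sketch below correlates a human's heart-rate series with an animal's at a chosen time lag. It is a deliberately simplified assumption-laden example: the function names, the use of a lagged Pearson correlation, and the heart-rate values are all hypothetical, and real dyadic analyses typically require more careful modeling (e.g., of autocorrelation and confounding variables).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(human, animal, lag):
    """Correlate human[t] with animal[t + lag].

    A positive lag asks whether the animal's signal follows the human's;
    a negative lag asks the reverse.
    """
    if len(human) != len(animal):
        raise ValueError("Series must be sampled at the same time points")
    if lag >= 0:
        h, a = human[:len(human) - lag], animal[lag:]
    else:
        h, a = human[-lag:], animal[:len(animal) + lag]
    return pearson_r(h, a)


# Hypothetical heart rates (beats per minute) at matched time points.
# The dog series was constructed to trail the human series by one step.
human_hr = [72, 74, 78, 80, 77, 73, 71, 70]
dog_hr = [90, 92, 94, 98, 100, 97, 93, 91]

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(human_hr, dog_hr, lag), 3))
```

Comparing the correlation across lags gives a rough indication of whether one partner's signal tends to lead the other's, which is the bidirectional question a dyadic design is meant to address.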
Direct observation of behavior provides valuable complementary data to self-reported measures and physiological assessment. Recording how humans and animals interact with each other is important to understand the observable effects of human-animal interaction. When quantifying animal or human behavior, researchers should consider standardized coding tools readily available for the construct of interest. By utilizing existing behavior paradigms from other fields, adopting standardized coding tools, and employing new technology for behavior coding, HAI investigators can obtain a better understanding of HAI by capturing the unique and subtle behavioral aspects of dynamic interaction. Similar to physiological assessment, future HAI research may benefit specifically from using dyadic behavioral analysis to examine how human and animal behaviors reflect the strength of a bond or relationship. Finally, HAI researchers should explore emerging technologies in behavior coding to obtain more precise and valuable information.
Regardless of the primary assessment method chosen, HAI researchers should aim to incorporate multiple forms of assessment in their studies to gain the most comprehensive picture of the assessed constructs. Using multiple forms of assessment in a single study increases the validity of results by minimizing the bias of any single method and by developing a more comprehensive, multimodal picture of the construct of interest.
Historically, multimodal assessment has not been a prominent method in HAI research. A recent systematic literature review of animal-assisted therapy for youth found that only 24% of 45 studies used more than one method of assessment to measure outcomes. Further, the tendency to use multiple methods was not significantly related to year of publication, indicating that researchers have not adopted this practice more frequently over time (May et al., 2016). While limited funding and resources to adopt multiple assessment methods may be an issue, this practice will further aid in establishing new standards of methodological rigor in the study of human-animal interaction (Esposito et al., 2011; Herzog, 2015; McCune et al., 2014).
Whether collecting subjective or objective data, the consistent use of standardized, well-validated assessment tools is critical in furthering the credibility and strength of the field of HAI. While assessment in HAI research can also be strengthened by such strategies as using large, diverse sample sizes and decreasing sources of error, we argue that the widespread use of multiple valid, reliable, and standardized assessment tools and methods is crucial for creating a rigorous empirical evidence base for the future of the field.