Review
The development and assessment of temperament tests for adult companion dogs
Introduction
The ability to select a dog for a particular role, particularly from a very young age, is an attractive idea for breeders and trainers. What might make this a feasible endeavor is the idea that individuals possess stable behavioral tendencies, i.e., they have what has been called “temperament.” Temperament is defined as differences in behavior between individuals that are relatively consistently displayed when tested under similar situations (Diederich and Giffroy, 2006). Using this definition, these differences are considered to be the product of both genetically determined and acquired behavioral traits (Stur, 1987), and therefore the age at which they can be considered to be stable is still debatable. Terms such as “personality” (Gosling and John 1999, Svartberg 2005), “character” (Ruefenacht et al., 2002) and “emotional predispositions” (Sheppard and Mills, 2002) have also been used in the same context. Temperament is made up of traits that are “correlations of internal factors that cause consistent individual differences in behavior” (Eysenck, 1994). In an attempt to identify these traits, interested parties have developed behavioral tests that take multiple measures of the dog’s behavior during a series of shorter tests, or subtests (Ledger, 1997). Often these measures are subjected to factor or principal component analysis, which are data reduction techniques that statistically identify consistently correlated measures within a data set and place them into factors (Goodloe and Borchelt, 1998). The composition of these factors can be used to describe the various behavioral traits exposed by the test and to predict the dog’s behavior in another, similar situation.
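The data-reduction step described above can be illustrated with a minimal sketch. It is not the procedure of any study cited here: the four measure names and the scores for six dogs are hypothetical, and power iteration is used as a simple stand-in that extracts only the first principal component of the correlation matrix, whereas a full analysis would extract and usually rotate several factors.

```python
from math import sqrt
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlations between measures (one list of scores per measure)."""
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])  # standardize each measure
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_component(corr, iterations=200):
    """Power iteration: dominant eigenvalue and unit-length loadings of corr."""
    vec = [1.0] * len(corr)
    for _ in range(iterations):
        nxt = [sum(r * v for r, v in zip(row, vec)) for row in corr]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    projected = [sum(r * v for r, v in zip(row, vec)) for row in corr]
    eigenvalue = sum(v * p for v, p in zip(vec, projected))
    return eigenvalue, vec

# Hypothetical scores for six dogs on four invented measures.
measures = [
    [2, 4, 6, 8, 10, 12],  # approach latency (s)
    [1, 2, 2, 4, 5, 5],    # cowering score
    [0, 1, 1, 2, 3, 3],    # low tail carriage score
    [5, 3, 4, 2, 4, 3],    # play bouts
]
eigenvalue, loadings = first_component(correlation_matrix(measures))
print(round(eigenvalue, 2), [round(x, 2) for x in loadings])
```

In this invented data set the three fear-related measures load on the component with the same sign and play loads with the opposite sign, which is the kind of pattern a developer might label "fearfulness." The label itself remains an interpretation that still has to be validated.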
Tests to identify particular characteristics of interest, such as “sharpness” and “courage” have been a common feature of working dog associations and breed groups (Willis 1995, Wilsson and Sundgren 1997, Brenoe et al 2002, Ruefenacht et al 2002, Svartberg and Forkman 2002, Courreau and Langlois 2005, Fuchs et al 2005). These might include assessment of the dog’s hunting, tracking, or aggressive ability. Tests have also been developed for the assessment of the suitability of dogs as police (Slabbert and Odendaal, 1999), guide (Pfaffenberger et al 1976, Goddard and Beilharz 1983, Goddard and Beilharz 1984, Goddard and Beilharz 1985, Knol et al 1988, Murphy 1995), therapy (Fredrickson 1993, Schaffer and Phillips 1994), or assistance dogs (Weiss and Greenberg 1997, Weiss 2002, Lucidi et al 2005). Over the past 15 years, interest has increased in the development of tests to specifically determine the suitability of dogs as pets. Many of these tests have focused on the assessment of problem behaviors, particularly those involving aggression, which may be associated with an increasing trend toward legislation to ban supposedly dangerous breeds. The possibility of assessing both undesirable or negative and desirable or positive behavioral traits has also been of particular interest to rescue and re-homing groups (Sternberg, 2002). It is hoped that behavioral assessments conducted in the shelter may then help staff match dogs to potential owners (Ledger, 1997) and/or predict behavior that might be problematic in the new home. The results of such assessments have the potential to directly affect the welfare of the dog, because problem behaviors can result in punishment (Hsu and Serpell, 2003), euthanasia, or (repeated) relinquishment to shelters (Arkow 1994, Miller et al 1996, Salman et al 1998). Similarly, the welfare implications of an inaccurate assessment of potential aggressiveness can be disastrous for humans who encounter the dog.
For these reasons, if not for reasons of academic integrity, it is important that published temperament tests be accompanied by appropriate statistical evidence to support their specific claims, something highlighted by Goodloe (1996). Martin and Bateson (1993) have identified 3 specific measures (reliability, validity, and feasibility) that determine the quality of a behavior test. These measures determine whether a test is a good measure, the right measure, and a useful measure (Appendix A).
Reliability concerns the degree to which the test scores are free from errors of measurement (APA, 1985). To determine reliability, one must identify the consistency of the results across subtests, tests, observers, assessment centers, etc. Measures of reliability include consistency within the observer of the test (intraobserver), between observers (interobserver), within the dog (test-retest), and within components of measures designed to assess the same behavior (internal consistency). Evidence of the consistency, and hence the predictability, of the dog’s behavior is what differentiates a temperament assessment from a behavioral one, although this fact is not always explicitly stated (Hsu and Serpell, 2003). Demonstration of test-retest reliability is therefore key for a temperament test (Marston and Bennett, 2003). Additionally, if tests are not reliable, they will not be valid (Diederich and Giffroy, 2006).
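As a rough illustration of how these forms of reliability are commonly quantified, the sketch below computes Cohen's kappa (agreement between two observers, or between two scoring sessions of one observer), a Pearson correlation between test and retest scores, and Cronbach's alpha for internal consistency. The choice of coefficients is an assumption, since the text above does not prescribe particular statistics, and all scores and category labels are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two lists of categorical scores."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

def pearson_r(x, y):
    """Correlation between two score lists, e.g. test and retest."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def cronbach_alpha(items):
    """items: one list of scores per subtest, each with one score per dog."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))

# Hypothetical data: two observers categorizing the same six dogs.
observer_1 = ["calm", "fearful", "calm", "aggressive", "calm", "fearful"]
observer_2 = ["calm", "fearful", "calm", "calm", "calm", "fearful"]
kappa = cohen_kappa(observer_1, observer_2)

# Hypothetical data: five dogs scored on three subtests meant to tap one trait;
# the first two lists double as test and retest scores for the correlation.
subtests = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
retest_r = pearson_r(subtests[0], subtests[1])
alpha = cronbach_alpha(subtests)
print(round(kappa, 2), round(retest_r, 2), round(alpha, 2))
```

Kappa is used here rather than raw percentage agreement because two observers who both score most dogs as "calm" will agree often by chance alone. For ordinal or continuous scores, a weighted kappa or an intraclass correlation may be more appropriate.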
Validity concerns the appropriateness, meaningfulness, and usefulness of the specific inferences made from the test results (APA, 1985). Temperament tests need to ensure that they are actually assessing the trait(s) of interest (e.g., fearfulness) if they are to be valid. Validity assessments for temperament tests are fraught with difficulty, because it is unlikely that any test will be wholly predictive of a dog’s behavioral reaction in any given circumstance. The aim of a temperament test is therefore to improve our knowledge of the dog and its likely future behavior above that of chance alone. The probability of this goal being achieved increases with limited context.
Finally, the quality of temperament tests must also address issues of practicality and appropriateness for widespread or commercial use, whether this use is in rescue shelters or in breeding and training establishments. Tests that are impractical, overly long, and difficult to assess are unlikely to be performed accurately or reliably, if at all. Accordingly, a scientifically developed test will often require refinement for practical use.
For test developers, two additional considerations need to be made in order to ensure that a test is reliable, valid, and feasible: consideration of the purpose of the test and standardization of the test procedure. If the goals of the temperament test are not clearly identified (i.e., the aspects of temperament that the testers wish to identify are not explicitly stated), then it is unlikely that the test will be valid. The next step in the development process is the selection of appropriate tests and corresponding scores for the dog’s behavior. If this stage is not standardized and formalized, it is unlikely that the test will be reliable. It is important that these two additional prior requirements be fulfilled before the test developers can proceed to assessment of reliability and validity.
Jones and Gosling (2005) and Diederich and Giffroy (2006) have both recently reviewed temperament assessments in dogs. Jones and Gosling (2005) broadly considered the issues of reliability and validity for all forms of temperament assessment, including those derived from individual-based and general questionnaires, but left open the question of the quality of specific temperament tests used in practice. Nonetheless, they found evidence that the issue of reliability, in particular, had been poorly addressed, and evidence for validity was low for tests conducted on young dogs. Diederich and Giffroy (2006) specifically highlighted the lack of standardization of temperament tests for a range of dog roles.
The initial aim of our paper was to review in detail the extent to which temperament tests specifically for adult pet dogs had demonstrated reliability, validity, and feasibility. We searched PubMed and ScienceDirect using the terms “dog” or “canine,” “temperament” or “behavio(u)r,” and “test.” This search of the peer-reviewed literature revealed only six papers relating to primary research. Van der Borg et al. (1991) and Hennessy et al. (2001) described tests to predict a range of problem behaviors in rescue dogs. De Palma et al. (2005) described a test to assess general temperament and re-homing suitability of rescue dogs. Netto and Planta (1997), Van den Berg et al. (2003), and Kroll et al. (2004) described tests specifically to assess aggression in pet dogs. A number of other tests have been reported in conference proceedings (particularly those of the International Veterinary Behavior Meetings and the Companion Animal Behaviour Therapy Study Group) but have not been reported formally in the literature. This number includes tests for specific problem behaviors (McPherson and Bradshaw 1998, Notari et al 2005) and general temperament in rescue dogs (Heidenberger 1993, Ledger and Baxter 1997, Marder et al 2003, Mondelli et al 2003). The lack of publication is disappointing, because it is well known that many shelter organizations have also devised their own temperament tests (Sternberg, 2002). Given the lack of relevant, data-based scientific publications and the problems identified by other reviewers of this procedure, it is appropriate to review and reiterate the process of valid test development in order to provide a benchmark for future test developers. This paper reviews the range of evaluations required before claims can be made about either reliability or validity, with the intent to guide future research and test developers.
This process is broken down into identification of the purpose and content of the test, standardization, assessment of reliability and validity, and refinement for practical use, or feasibility (see Appendix A for definitions and Appendix B for key points for each of these).
Section snippets
Purpose and content of the temperament test
The first step in creating a valid and useful temperament test is careful consideration of its purpose (Appendix B). Test developers need to first consider why they want a temperament test. What behaviors or traits should the test reveal (e.g., fearlessness), and what behaviors and traits should it avoid revealing (e.g., aggressiveness, stress-related responses)? Determination of the purpose of a test is key to determining the method to be used to reveal the properties under investigation
Standardization
For a test to have any chance of being reliable and valid, standardization of the test procedure is a minimum requirement. Standardization relates to the extent to which a protocol for carrying out the test is provided and consideration for minimization of variability between tests has been made. In standardization, all potential sources of variability need to be identified and controlled for so that the only variable is the dog’s response (Diederich and Giffroy, 2006). Considerations for
Intra-observer reliability
Intra-observer reliability measures the consistency of the reports of a single observer (Martin and Bateson, 1993). In theory, the observer’s assessments should report similarly when the same dog is tested using the same test on another occasion. However, in order to control for behavioral changes on the part of the dog, rather than by the observer, it is recommended that intra-observer reliability be assessed by the use of video recordings (Martin and Bateson, 1993). In this way, the observer
Content validity
Content validity evaluates whether the test measures what it should and whether the components of the measure cover all aspects of the behavior in question. Face validity is one aspect of content validity and refers to the subjective assessment of whether the item appears to be measuring the variable it claims to “on the face of it” (Eiser and Morse, 2001). For example, van den Berg et al. (2003) performed a principal components analysis on all the behaviors shown by the dogs in their
Feasibility for practical use
The ultimate aim of many temperament tests for pet dogs is that interested groups can perform the test themselves and make use of the results (Ledger and Baxter, 1997). Accordingly, the test needs to be standardized and short, easy to perform, and amenable to easily recording the dog’s response. Many of the tests reviewed here may be prohibitively long for practical use in a working environment like a shelter (Hsu and Serpell, 2003), taking one hour per dog (Planta et al 1991, Ledger and Baxter
Conclusion
Fewer than ten reports of temperament tests specifically for the selection of suitable adult dogs as pets could be found in the peer-reviewed scientific literature. Even among these, the reports of reliability, validity, and feasibility are not complete, with authors typically reporting on one, but not all, aspects. The absence of reports of the methodology, reliability, and validity of temperament tests for dogs in general has been noted by a number of authors (Hsu and Serpell 2003, Marston
Acknowledgments
This paper formed part of a wider review of approaches to the assessment of temperament, welfare, and quality of life in kenneled dogs commissioned by Dogs Trust, U.K., and we are indebted to this organization for its support of this work. The first author was supported by this charity to undertake these reviews. We would also like to thank members of the Dogs Trust “quality of life working party” for their support and comments: Jon Bowen, John Bradshaw, Keith Butt, Rachel Casey, Philip
References (83)
et al. Chronic stress in dogs subjected to social and spatial restriction. I. Behavioural responses
Physiol. Behav. (1999)
et al. Estimates of genetic parameters for hunting performance traits in three breeds of gun hunting dogs in Norway
Appl. Anim. Behav. Sci. (2002)
et al. Genetic parameters and environmental effects which characterise the defence ability of the Belgian shepherd dog
Appl. Anim. Behav. Sci. (2005)
et al. Behavioural testing in dogs: A review of methodology in search for standardization
Appl. Anim. Behav. Sci. (2006)
et al. A method for rating the individual distinctiveness of domestic cats
Anim. Behav. (1986)
Temperament testing procedures for animals involved in nursing home, school and hospital visiting programs through Delta Society Pet Partners
Appl. Anim. Behav. Sci. (1993)
et al. Genetics of traits which determine the suitability of dogs as guide dogs for the blind
Appl. Anim. Ethol. (1983)
et al. The relationship of fearfulness, sex, age and experience on exploration and activity in dogs
Appl. Anim. Behav. Sci. (1984)
et al. Individual variation in agonistic behaviour in dogs
Anim. Behav. (1985)
et al. Plasma cortisol levels of dogs at a county animal shelter
Physiol. Behav. (1997)
Behaviour and cortisol levels of dogs in a public shelter, and an exploration of the ability of these measures to predict problem behaviour after adoption
Appl. Anim. Behav. Sci.
Influence of male and female petters on plasma cortisol and behaviour: can human interaction reduce the stress of dogs in a public animal shelter?
Appl. Anim. Behav. Sci.
Temperament and personality in dogs (Canis familiaris): a review and evaluation of past research
Appl. Anim. Behav. Sci.
Fear of novel and startling stimuli in domestic dogs
Appl. Anim. Behav. Sci.
Avoidance reactions of domestic dogs to unfamiliar male and female humans in a kennel setting
Appl. Anim. Behav. Sci.
Ethotest: A new model to identify (shelter) dogs’ skills as service animals or adoptable pets
Appl. Anim. Behav. Sci.
Behaviour patterns and time course of activity in dogs with separation problems
Appl. Anim. Behav. Sci.
Re-forging the bond-towards successful canine adoption
Appl. Anim. Behav. Sci.
Olfactory and visual cues in the interaction systems between dogs and children
Behav. Proc.
Describing categories of temperament in potential guide dogs for the blind
Appl. Anim. Behav. Sci.
Behavioural testing for aggression in the domestic dog
Appl. Anim. Behav. Sci.
Proceedings of the Dogs Trust Meeting on Advances in Veterinary Behavioural Medicine London, 4th–7th November 2004: Veterinary behavioural medicine: a roadmap for the 21st century
Vet. J.
A comparison of dog-dog and dog-human play behaviour
Appl. Anim. Behav. Sci.
A behaviour test on German Shepherd dogs: heritability of seven different traits
Appl. Anim. Behav. Sci.
The Tuskegee behaviour test for selecting therapy dogs
Appl. Anim. Behav. Sci.
Development and validation of a novel method for evaluating behaviour and temperament in guide dogs
Appl. Anim. Behav. Sci.
Early prediction of adult police dog efficiency—a longitudinal study
Appl. Anim. Behav. Sci.
Shyness-boldness predicts performance in working dogs
Appl. Anim. Behav. Sci.
A comparison of behaviour in test and in everyday life: evidence of three consistent boldness-related personality traits in dogs
Appl. Anim. Behav. Sci.
Personality traits in the domestic dog (Canis familiaris)
Appl. Anim. Behav. Sci.
Consistency of personality traits in dogs
Anim. Behav.
Behavioural testing dogs in animal shelters to predict problem behaviour
Appl. Anim. Behav. Sci.
A friend or an enemy? Dogs’ reaction to an unfamiliar person showing behavioural cues of threat and friendliness at different times
Appl. Anim. Behav. Sci.
Service dog selection tests: effectiveness for dogs from animal shelters
Appl. Anim. Behav. Sci.
Male and female dogs respond differently to men and women
Appl. Anim. Behav. Sci.
The use of a behaviour test for the selection of dogs for service and breeding I: Method of testing and evaluating test results in the adult dog, demands on different kinds of service dogs, sex and breed differences
Appl. Anim. Behav. Sci.
Behaviour test for eight-week old puppies – heritabilities of tested behaviour traits and its correspondence to later behaviour
Appl. Anim. Behav. Sci.
A new look at pet “over-population”
Anthrozoös
Measuring Health: a review of quality of life measurement scales, 2nd ed
Constructing validity: Basic issues in objective scale development
Psychol. Assess.