Reducing consistency in human realism increases the uncanny valley effect; increasing category uncertainty does not

Human replicas may elicit unintended cold, eerie feelings in viewers, an effect known as the uncanny valley.

Theories to explain the uncanny valley are not lacking. They range from the biological to the cultural (reviewed in MacDorman, Green, et al., 2009; MacDorman & Ishiguro, 2006a, 2006b; MacDorman, Vasudevan, & Ho, 2009; Pollick, 2010). Rather, evidence is insufficient to decide among them. For example, Mori (1970/2012) proposed that the uncanny valley effect is a survival instinct, an aversive response to proximal threats like dead or diseased bodies and dangerous species of animals. We have further developed Mori's proposal, relating the uncanny valley to the detection and avoidance of potential vectors of infection or of infertile or less fit mates (e.g., Neanderthals; MacDorman, Green, et al., 2009; MacDorman & Ishiguro, 2006a, 2006b; MacDorman, Vasudevan, et al., 2009; Moosa & Ud-Dean, 2010). We have also proposed that the uncanny valley may result from inconsistency in the realism of an anthropomorphic entity's features (MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009). Perhaps the earliest explanation of uncanniness in human-looking entities is category uncertainty, in particular uncertainty about whether an entity is real or human (Jentsch, 1906/1997). This study experimentally examines whether the uncanny valley effect is increased by category uncertainty (Burleigh & Schoenherr, 2014; Burleigh et al., 2013; Green, MacDorman, Ho, & Vasudevan, 2008; Jentsch, 1906/1997; Kang, 2009; MacDorman, Vasudevan, et al., 2009; Yamada, Kawabe, & Ihaya, 2013) or, as an alternative theory, by realism inconsistency (MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009; Meah & Moore, 2014; Mitchell et al., 2011; Moore, 2012). Category uncertainty denotes an inability to determine the category to which an entity belongs, such as whether a face is that of a real human being or a 3D computer model.
Realism inconsistency denotes a mismatch in the realism of an entity's features, such as some facial features appearing human and others nonhuman (e.g., computer-animated skin paired with real eyes and mouth). We chose to evaluate category uncertainty and realism inconsistency theories because of their prominence in the literature on the uncanny valley.

Categorical perception
Although Mori proposed the uncanny valley in 1970, Jentsch, as early as 1906, developed a theory identifying category uncertainty as the cause of uncanniness (Jentsch, 1906/1997; MacDorman & Ishiguro, 2006a). He asserts that eerie feelings are most reliably elicited by uncertainty about whether an entity is inanimate or animate, or whether it is nonhuman or human. Category uncertainty occurs whenever an entity transitions from one category to another, qualitatively distinct category by a quantitative metric (for example, a fertilized ovum transitioning to a person by the metric developmental chronology; Ramey, 2005). Mori's (1970/2012) graph depicts industrial robot transitioning to healthy person by the metric human likeness. Owing to categorical perception, small changes along the continuum between two categories should appear much larger than equal-sized changes within either category (Beale & Keil, 1995; Campbell, Pascalis, Coleman, Wallace, & Benson, 1997; Etcoff & Magee, 1992; Harnad, 1987; Iverson & Kuhl, 1995). This phenomenon is also known as the perceptual magnet effect (Feldman, Griffiths, & Morgan, 2009). Near the category boundary, the increased salience of these changes could make them jarring. Categorical perception has been found on a continuum from 3D computer models to photographs of real people (Cheetham, Pavlovic, Jordan, Suter, & Jäncke, 2013; Cheetham, Suter, & Jäncke, 2011; Looser & Wheatley, 2010). Beyond the effects of categorical perception, transitions along nonhuman-human continua could be disturbing because they undermine the separation between what we identify as us (e.g., human, person) and what we identify as not us (e.g., 3D model, robot, ovum; MacDorman & Entezari, 2015; MacDorman, Vasudevan, et al., 2009; Ramey, 2005).

Cognitive dissonance
The negative emotional appraisal of the uncanny valley has been identified with psychological discomfort caused by a conflict between the belief that an entity is human and the belief that the same entity is not human (Hanson et al., 2005; MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009; Tondu & Bardou, 2011). If nonhuman and human are conceived as distinct and mutually exclusive categories, entities whose appearance gradually transitions from nonhuman to human, as in Mori's (1970/2012) graph, must cross a category boundary. An entity crossing the boundary could at once elicit two mutually exclusive concepts (Moore, 2012), or even oscillate between them as its appearance changes. Repeated nonconscious elicitation and conscious suppression of the concept human could interfere with empathy (Misselhorn, 2009).

Categorization difficulty
Another explanation of the uncanny valley effect is that difficulty in categorizing ambiguous entities results in the formation of negative impressions (Yamada et al., 2013). Thus, categorization difficulty predicts that the most ambiguous representations are perceived as the least likeable. Categorization difficulty (i.e., low processing fluency) is operationalized as longer response times during a categorization task.

Limited investigation of category uncertainty
Although the categorical perception of entities lying on a human likeness or animacy continuum has been established (e.g., Cheetham et al., 2011;Looser & Wheatley, 2010), the effect of categorical perception on the viewer's emotional appraisal of an entity was examined only recently (Yamada et al., 2013). In Yamada and colleagues' study, intermediate morphs between a real, hand-drawn, and stuffed-toy human face elicited the longest categorization latency and the lowest ratings of likeability.
The study, however, is not without limitations. First, it does not directly examine the uncanny valley in the domains where it is typically identified: humanoid robotics and 3D computer animation. Second, the study does not rule out a potential extraneous cause of negative emotional appraisals of ambiguous representations: morphing artifacts from feature misalignment (cf. in Fig. 2 of Yamada et al., 2013, one face has two noses). Third, the findings do not indicate whether the faces were uncanny, because the only dependent variable measured was likeability. Fourth, operationalizing the y-axis of Mori's graph as likeability could have confounded it with the x-axis of human likeness because these measures are highly correlated (e.g., r = .73, p < .001, in Ho & MacDorman, 2010). Cheetham et al. (2014) were unable to find support for the categorization difficulty theory. Unfortunately, the only dependent variable measured was subjective familiarity.

Realism inconsistency theory
To explain the uncanny valley effect, we have developed an alternative theory to category uncertainty: realism inconsistency. Realism inconsistency theory predicts that features at inconsistent levels of realism in an anthropomorphic entity cause perceptual processes in viewers to make conflicting inferences regarding whether the entity is real. Such inconsistency could violate neurocognitive expectancies, resulting in large feedback error signals (Friston, 2010; Rao & Ballard, 1999; Saygin, Chaminade, Ishiguro, Driver, & Frith, 2012). Prediction error could lead to a negative emotional appraisal and avoidance behavior (Cheetham et al., 2011; MacDorman & Ishiguro, 2006a, 2006b). Prior research has found that inconsistent realism in an entity's features increases reported eeriness (e.g., in eyes and skin, MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009; or in voice and appearance, Mitchell et al., 2011).

Potential causes
The realism inconsistency theory predicts viewers will experience cold, eerie feelings when perceiving anthropomorphic entities that have features at different levels of realism. An artifact that is designed to appear human but fails to be indistinguishable from human in every feature is likely to have features that are inconsistent in their level of realism, because any discrepancy from human was unintended and thus beyond the designer's control. Therefore, computer-animated characters, or android robots, that are recognizable as such are inherently realism inconsistent.
A potential source of cold, eerie feelings in perceiving an entity that possesses both human and nonhuman features is category prediction error, which could have several potential causes: (a) human morphological features elicit neurocognitive expectancies of behavioral responses that align with human norms; these expectancies are then violated (MacDorman & Ishiguro, 2006a, 2006b); (b) the brain's categorizations of the entity's features conflict when they are integrated during the perception and recognition of the entity as a whole (for example, human appearance coupled with nonbiological movement in an android, or living appearance coupled with coldness or stiffness in an embalmed body); (c) the human features may be processed by brain areas that are rapid, efficient, and specialized, such as the fusiform face area or the extrastriate body area, while the nonhuman features may be processed by brain areas that are slower and more general; this results in competition among brain areas (cf. Hanson et al., 2005; James et al., 2015); (d) if some features are processed more rapidly than others, information flows that are typically integrated simultaneously in the perception of the whole entity could lag; and (e) the 'overtraining' of neural networks for human face and body recognition through a lifetime of exposure to other people could sensitize them to even small deviations from human norms. Thus, based on (a) to (e), nonhuman features in a humanlike entity could elicit large feedback error signals, measurable as increased activity in the cortical hierarchy (Saygin et al., 2012).

Differences from Jentsch's theory
Jentsch attributes uncanniness to a person's general doubt about what something is (e.g., is it human or nonhuman?). The source of doubt could simply be missing information, such as when darkness obscures an object. Realism inconsistency theory instead attributes uncanniness to conflicting, not missing, perceptual cues, which cause prediction error in or between brain areas engaged in automatic, stimulus-driven perceptual processing (Moore, 2012; Pollick, 2010; Saygin et al., 2012). Thus, the origin of the eerie feeling could be preconscious and subpersonal.

Differences from cognitive dissonance
Given that perceptual prediction error can occur without engaging the perceiver's reflection and deliberation, the realism inconsistency theory differs from cognitive dissonance, in which conflicting beliefs are aligned by deliberately adjusting them (Bell, 1967; Gawronski & Strack, 2004). Moreover, the uncanny valley has by definition the experiential qualities of the uncanny; it involves a characteristic range of affective and other responses, from uneasiness, strangeness, and creepiness to revulsion, fright, and terror (Mori, 1970/2012). It is associated with, but not equivalent to, the basic emotions of fear, anxiety, and disgust and may have eeriness and spine-tingling subfactors (Ho & MacDorman, 2010). By contrast, cognitive dissonance is psychological discomfort (operationalized as uncomfortable, uneasy, and bothered, Elliot & Devine, 1994). It does not necessarily lead to the cold, eerie feeling associated with the uncanny valley.

Not a general phenomenon
Contrary to competing theories, the uncanny valley is not a general phenomenon like category uncertainty (Burleigh et al., 2013; Yamada et al., 2013) or cognitive dissonance (Tondu & Bardou, 2011). These phenomena apply to all categories. However, in computer animation, the uncanny is identified with inadequately lifelike human features (e.g., 'dead eyes') and human characters (e.g., the children, not the train, in The Polar Express, 2004, and the baby, not the toys, in Tin Toy, 1988; Butler & Joschko, 2009; Freedman, 2012). Likewise, in everyday life people are more disturbed by lesions and deformities in human beings and their animal cousins than in less anthropomorphic animals and in plants (e.g., compare a leprous man with a rusty old train car or a blighted chestnut tree; MacDorman, Green, et al., 2009; MacDorman & Ishiguro, 2006a, 2006b; MacDorman, Vasudevan, et al., 2009). Category uncertainty and cognitive dissonance can only address the uncanny valley if buttressed by additional constructs (e.g., self-identification). Nevertheless, the above observations are anecdotal and require systematic examination.
This study focuses on refuting category uncertainty as an explanation of the uncanny valley and offers realism inconsistency as an alternative. However, realism inconsistency is not meant to offer a complete explanation. For example, elsewhere we have proposed that a human replica could inadvertently elicit a threat-avoidance mechanism, such as a mechanism to prevent intimate contact with infertile mates or dead and diseased bodies (MacDorman & Ishiguro, 2006a, 2006b). Such a mechanism could, for example, activate the fear or disgust system (Curtis, Aunger, & Rabie, 2004; MacDorman & Entezari, 2015).

Hypotheses
Our hypotheses are based on the realism inconsistency theory, which predicts that the uncanny valley effect is caused by an entity possessing features, not all of which are perceived as belonging to a real living anthropomorphic being. The uncanny valley effect is not predicted to be caused by uncertainty about the category to which an entity belongs. We operationalize the uncanny valley effect as higher eeriness ratings and lower warmth (vs. coldness) ratings (Ho & MacDorman, 2010;MacDorman & Entezari, 2015).
From the standpoint of experimental control, it is difficult to investigate transitions along a human similarity dimension. For example, there is no particular way that something half camera and half man should appear, or half bird and half woman. Therefore, this study investigates animacy (i.e., living vs. inanimate) for human beings and nonhuman animals and realism (i.e., real vs. computer-animated) for human beings, nonhuman animals, and nonanimal objects (hereafter, humans, animals, and objects). The selection of these two dimensions, animacy and realism, enables a transition without image artifacts between photographs of real people, animals, and objects and their 3D computer models.
H1. Category uncertainty does not cause the uncanny valley effect.
H1 is operationalized as follows: The most ambiguous entities elicit neither the most eeriness nor the least warmth. This is the negation of the prediction of Yamada et al. (2013), except they only measured likeability. Both eeriness and warmth are measured because analogous concepts are stressed in Mori (1970/2012; i.e., bukimi and shinwakan).
Although not a hypothesis in this study, human and animal models are expected to be eerier and colder than their real counterparts (MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009). There are at least two explanations of this: First, the computer modeling process itself introduces extraneous realism inconsistency, because the modeler has the goal of making facial features appear completely real but achieves it to varying degrees depending on the particular feature (e.g., because eyes are harder to model than skin). Second, we have not ruled out other explanations of why 3D human and animal models elicit unintended cold, eerie feelings (e.g., by activating a threat-avoidance mechanism, MacDorman & Entezari, 2015).
H2. Reducing consistency in the realism of features increases the uncanny valley effect, but only in anthropomorphic entities.
H2 is operationalized as follows: If two sets of features are manipulated to reduce feature consistency, the entity elicits higher eeriness and lower warmth than without the manipulation. The effect is only predicted for high and intermediate levels of anthropomorphism (high: humans, intermediate: animals, and low: objects). H2 has already been supported by experiments showing that inconsistency in the human realism of an entity's features increases its eeriness (e.g., skin texture and eye texture or size, MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009; Seyama & Nagayama, 2007; face and voice, Meah & Moore, 2014; Mitchell et al., 2011). Nevertheless, H2 has not been explored in the context of category uncertainty: It is important to determine whether uncanny valley effects caused by reduced consistency in realism can be separated from category uncertainty.
The experiment used in this study presented representations lying on three transitions from each entity's 3D computer-modeled replica (computer animated) to the original (real). For the control transition, all features were varied uniformly. For this transition, H2 predicts elicited eeriness to decrease, and warmth to increase, as representations approach fully real. Eeriness is lowest, and warmth highest, for fully real representations because they are also fully consistent in their realism. As shown in Fig. 1, for the control transition, eeriness is predicted to follow an inverse logistic function, based on a Bayesian model of the perceptual magnet effect (Feldman et al., 2009; Moore, 2012). The logistic function and its inverse have been found to fit eeriness and other evaluative ratings of humanlike faces along transitions from computer animated to real (Looser & Wheatley, 2010; Fig. 11 in MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009). For the two consistency-reduced transitions, the realism of one set of features was increased while the realism of a complementary set of features was decreased. This manipulation is predicted to increase eeriness and decrease warmth proportionally to the reduction in consistency. The increase in eeriness caused by consistency reduction is predicted to follow a Gaussian function, based on a Bayesian model of perceptual tension caused by conflicting cues (Moore, 2012). The Gaussian function has been found to fit eeriness ratings along other consistency-reduced transitions (Fig. 11, MacDorman, Green, et al., 2009; MacDorman, Vasudevan, et al., 2009). For the consistency-reduced transition, H2 predicts eeriness to follow a polynomial function: an additive combination of (1) the logistic function representing eeriness along the control transition and (2) a Gaussian function representing the increase in eeriness caused by consistency reduction.
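The additive prediction for the consistency-reduced transitions can be illustrated with a minimal Python sketch. It assumes that the "inverse logistic" is a decreasing logistic curve over fraction of real and that the Gaussian is centered at the midpoint of the transition; the function names and every parameter value (slope, midpoint, height, width) are hypothetical placeholders chosen for illustration, not fitted values from this study.

```python
import math

def control_eeriness(x, low=0.0, high=1.0, slope=10.0, midpoint=0.5):
    """Decreasing logistic: eeriness falls from `high` toward `low`
    as the fraction of real `x` goes from 0 to 1 (assumed shape)."""
    return low + (high - low) / (1.0 + math.exp(slope * (x - midpoint)))

def inconsistency_increase(x, height=0.5, center=0.5, width=0.15):
    """Gaussian increase in eeriness, largest where the two feature
    sets differ most in realism (assumed to be the midpoint)."""
    return height * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def reduced_consistency_eeriness(x):
    """Additive combination of the control curve and the Gaussian increase."""
    return control_eeriness(x) + inconsistency_increase(x)

if __name__ == "__main__":
    # Print both predicted curves at the seven levels used in the design.
    for k in range(7):
        x = k / 6
        print(f"{x:.2f}  control={control_eeriness(x):.3f}  "
              f"reduced={reduced_consistency_eeriness(x):.3f}")
```

With these placeholder parameters the two curves nearly coincide at the shared endpoints and differ most at the midpoint, which is the qualitative pattern H2 describes.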
H3. The more anthropomorphic the entity, the more reduced consistency in feature realism increases the uncanny valley effect.
H3 is operationalized similarly to H2. To investigate whether the uncanny valley effect is a general phenomenon or specific to anthropomorphic entities, we compared the effects of reduced consistency in feature realism at three levels of anthropomorphism on eeriness and warmth with the expectation of higher eeriness and lower warmth at higher levels of anthropomorphism. H3 is suggested by the anecdotal observation that physical defects in human beings are more disturbing than physical defects in other species or objects (MacDorman & Ishiguro, 2006a, 2006b).

Design
The experiment follows a within-group design, with unique participants observing one set of entities (3-5) in each of the three rounds of experiments. The experiment consists of a two-alternative forced choice categorization task and a survey in which the task stimuli are rated.

Participants
Participants were recruited by electronic mail from a randomized exhaustive list of undergraduates attending a Midwestern public university system. Based on the inclusion criteria, participants self-reported as fluent or native English speakers, aged 18 or older, with 20/40 vision or better with correction. Participants received no compensation. This study was approved by the Indiana University Office of Research Administration (IRB Study No. 1210009909).
A total of 548 participants were recruited. The task stimuli were distributed among unique participants in three rounds of experiments: in the first round, Zlatko, Ingrid, dog, and parrot were presented to 224 participants (Mdn age = 21, IQR age = 3, 67% female); in the second round, Clint, Emelie, Juliana, Simona, and Ferrari were presented to 181 participants (Mdn age = 21, IQR age = 4, 62% female); and in the third round, camera, washer, and water lily were presented to 143 participants (Mdn age = 22, IQR age = 4, 56% female).

Stimuli
The task stimuli were 600 by 600 pixel images derived from photographs of 12 entities: six human beings (high anthropomorphism group), two nonhuman animals (intermediate anthropomorphism group), and four nonanimal objects (low anthropomorphism group). Fig. 2 shows the right half of the real photographs and the left half of the 3D computer models developed from those photographs. The two men are Clint (30s) and Zlatko (80s). The four women are Emelie (30s), Ingrid (20s), Simona (20s), and Juliana (20s). The dog is a Saint Bernard, and the parrot is a lovebird. The car is a Ferrari 599 GTB, the camera is a Nikon D3100, the washing machine is a Servis W712F4W, and the flower is a water lily. Apart from Zlatko, young white people were selected for the human group to reduce outgroup effects, because the sample frame was mainly young and white. Simona and Juliana were added to the study to determine whether professional character modeling and cosmetic artistry could reduce the eeriness of the human computer models relative to their originals. Ingrid was rendered with a shiny doll-like texture for the same reason. Zlatko was used to determine whether a very old man could conversely increase the eeriness of the original relative to the computer model. The heterogeneity of the animal and object groups relative to the human group was intended to represent the diversity of species and objects in those groups and, thus, enhance the generalizability of the findings.
The photographs of the people were cropped to show only the central region of the face, from the eyebrows to the mouth, and a roughly equivalent region of the animals. This region was selected to exhibit in detail the most salient features of the face.
The photographs were sourced from 3D.sk, 123rf.com, and a private collection. The first four computer models of humans were created by Singular Inversions FaceGen. The remaining models were created by hand using Autodesk Maya, Pixologic ZBrush, and Adobe Photoshop.
For each of the 12 entities a 3D replica was modeled. Fig. 3 illustrates for humans how a total of 17 representations were derived from each entity and its computer model: a transition by sixths from computer model to the original; a transition by thirds from computer model to computer model with original eyes, eyelashes, and mouth (upper left) to original; and a transition by thirds from computer model to original with computer-modeled eyes, eyelashes, and mouth (lower right) to original. The eyes, eyelashes, and mouth were selected as the foreground feature set because of their importance in judgments of animacy (Looser & Wheatley, 2010). The foreground feature sets for nonhuman entities are as follows: for the dog, the eyes, mouth, and tongue; for the parrot, the eyes, eyelids, beak, cere, and left nostril; for the camera, the lens mount, reflex mirror, and other parts visible behind the lens; for the car, the headlights and front grilles; for the washing machine, the door, door handle, and window; for the flower, the stamen, stigma, and style.
All representations were created by placing a photograph of a real human, animal, or object on a layer above a render of a precisely aligned 3D computer model of the same entity and then varying within the photograph the opacity of regions corresponding to the foreground and background feature set by thirds or sixths. Because the features of the computer model were aligned with those of the original, transitions were made without morphing, solely by changing image opacity.
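The opacity-based construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: images are represented as small 2D lists of grayscale values, the foreground feature set is given as a hard-edged boolean mask, and the function name `composite` is hypothetical. The original stimuli were prepared with layered image editing, and nothing here reproduces that workflow.

```python
def composite(photo, render, foreground_mask, fg_opacity, bg_opacity):
    """Blend a photograph over an aligned render, pixel by pixel.

    photo, render    -- 2D lists of grayscale values with the same shape
    foreground_mask  -- 2D list of booleans, True where a pixel belongs
                        to the foreground feature set (e.g., eyes, mouth)
    fg_opacity       -- photo opacity in the foreground region (0 to 1)
    bg_opacity       -- photo opacity in the background region (0 to 1)
    """
    blended = []
    for photo_row, render_row, mask_row in zip(photo, render, foreground_mask):
        row = []
        for p, r, is_foreground in zip(photo_row, render_row, mask_row):
            alpha = fg_opacity if is_foreground else bg_opacity
            # Standard alpha blend: the photo layer sits above the render.
            row.append(alpha * p + (1.0 - alpha) * r)
        blended.append(row)
    return blended

if __name__ == "__main__":
    photo = [[100, 200]]          # hypothetical two-pixel "photograph"
    render = [[0, 0]]             # hypothetical two-pixel "3D render"
    mask = [[True, False]]        # first pixel is foreground
    # Real foreground on a computer-modeled background:
    print(composite(photo, render, mask, fg_opacity=1.0, bg_opacity=0.0))
```

Because only opacity changes and the two layers are aligned, no pixel is displaced, which is why this approach avoids the feature-misalignment artifacts that morphing can introduce.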

Procedure
Participants completed each round of the experiment through an access-controlled website. Participants' activities were ordered as follows: (1)  (3) the categorization tasks; (4) a demographics survey; (5) task stimuli ratings (see Dependent Variables); and (6) a debriefing, which explained the purpose and methods of the study. Based on six test participants, each round was estimated to require approximately 15-30 min to complete. The categorization task divides into two blocks, each of which corresponds to a single dimension: either realism or animacy. For realism, the task is to categorize the face as either computer animated or real; and for animacy, as either inanimate or living. Each of these pairs of anchors was taken from the first item of the corresponding index of the face survey, described below. The first two trials of each block are practice trials. There are 136 actual trials per block (4 entities × 17 representations × 2 repetitions). For each participant, block order, trial order, and left-right anchor order are randomized. Given this study's lengthy and repetitive procedures, each participant was exposed to a subset of 3, 4, or 5 of the 12 entities to prevent fatigue effects and attrition.
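The block structure above (two blocks, 136 actual trials each, with block order, trial order, and anchor side randomized) can be sketched as follows. This is a hypothetical reconstruction, not the software used in the study: the names `build_block` and `build_session` are invented, the anchor side is assumed to be drawn once per block, and practice trials and re-inserted paused trials are left out.

```python
import random

# Anchor pairs for the two categorization dimensions named in the text.
ANCHORS = {
    "realism": ("computer animated", "real"),
    "animacy": ("inanimate", "living"),
}

def build_block(entities, rng, n_representations=17, repetitions=2):
    """Return a shuffled list of (entity, representation index) trials."""
    trials = [
        (entity, representation)
        for entity in entities
        for representation in range(n_representations)
        for _ in range(repetitions)
    ]
    rng.shuffle(trials)
    return trials

def build_session(entities, seed=None):
    """Randomize block order, anchor side, and trial order for one participant."""
    rng = random.Random(seed)
    dimensions = list(ANCHORS)
    rng.shuffle(dimensions)                      # random block order
    session = []
    for dimension in dimensions:
        left, right = ANCHORS[dimension]
        if rng.random() < 0.5:                   # random left-right anchor order
            left, right = right, left
        session.append({
            "dimension": dimension,
            "left": left,
            "right": right,
            "trials": build_block(entities, rng),
        })
    return session

if __name__ == "__main__":
    session = build_session(["Zlatko", "Ingrid", "dog", "parrot"], seed=1)
    for block in session:
        print(block["dimension"], block["left"], block["right"], len(block["trials"]))
```

For the four first-round entities, each block contains 4 × 17 × 2 = 136 trials, matching the count given in the text.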
Potential confounds caused by the presentation of only a subset of entities were tested and ruled out using a linear mixed-effects model with entity as a random factor (Supplementary Material, Section 1). Potential confounds caused by the categorization task preceding the task stimuli ratings were also tested and ruled out (Supplementary Material, Section 2). A rationale for the tradeoffs made in the research design is provided (Supplementary Material, Section 3).
Each trial begins with the presentation of a task stimulus. To the left of the stimulus appears an anchor (e.g., real) with the instruction "Press e" and to the right the opposing anchor (e.g., computer animated) with the instruction "Press i." Pressing e indicates the stimulus belongs to the category signified by the anchor on the left and i for the right. The participant is instructed to categorize the face as quickly and as accurately as possible. The stimulus remains visible until one of the following events occurs: The participant categorizes the stimulus by pressing e for left or i for right, pauses the task by pressing the spacebar, or allows 3000 ms to elapse. (A 3000 ms cutoff is used to discourage deliberation.) Next, three masks are presented in sequence for 100 ms each to suppress the afterimage of the stimulus: a 19-by-19 checkerboard, a 38-by-38 checkerboard, and a 50% gray panel. If an e or i key press is registered, the masks serve as a 300-ms intertrial interval, and the next trial begins. Otherwise, the participant resumes the task by pressing the spacebar, the stimulus is randomly inserted into the remaining trials, and the next trial begins. For each trial, response times (RTs) are recorded as the elapsed time in milliseconds from the onset of the face stimulus until an e or i key press is registered.
Fig. 3. The diagonal depicts a consistent change in the objective realism (fraction of real) of all features of an entity, from the 3D model to the original. The lower-right and upper-left paths depict an inconsistent change in which the objective realism of one feature set (e.g., eyes, eyelashes, and mouth) changed first and then the other (e.g., skin, nose, and eyebrows).
The task stimuli ratings require each participant to rate the representations on four dimensions: realism, animacy, warmth, and eeriness (see Dependent Variables). There were 68 stimuli (4 entities × 17 representations) in the first round of the experiment, 85 stimuli (5 entities × 17 representations) in the second, and 51 (3 entities × 17 representations) in the third.

Independent variables
The independent variables are anthropomorphism and fraction of real. Anthropomorphism has three levels: high for humans, intermediate for animals, and low for objects. Fraction of real operationalizes the concept of transition in human realism, a form of human similarity. Two kinds of transitions were tested: the control transition (the diagonal in Fig. 3) and the consistency-reduced transitions. Two kinds of consistency-reduced transitions were tested: with real foreground feature set at the midpoint (upper left) and with computer-modeled foreground feature set at the midpoint (lower right).
Each of these three transitions has seven levels, which are its fraction of real. These levels enable the objective realism of an entity to be determined. For the control transition, the levels are 0, ⅙, ⅓, ½, ⅔, ⅚, 1. The fraction indicates the proportion of the entity that is original (e.g., ⅙ means ⅙ original, ⅚ computer modeled). For the lower-right consistency-reduced transition, the foreground-background levels are 0-0, 0-⅓, 0-⅔, 0-1, ⅓-1, ⅔-1, and 1-1, and for the upper-left consistency-reduced transition, 0-0, ⅓-0, ⅔-0, 1-0, 1-⅓, 1-⅔, and 1-1. The pairing of control representations on the diagonal with corresponding consistency-reduced representations on the upper-left and lower-right transitions is justified because each control representation is identical to the average of its paired consistency-reduced representations.
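The level structure can be checked with a short Python sketch that uses exact fractions. Each representation is written as a (foreground, background) pair of fractions of real; the variable names are illustrative only, with `upper_left` denoting the transition whose midpoint has the real foreground feature set and `lower_right` the one whose midpoint has the computer-modeled foreground feature set.

```python
from fractions import Fraction as F

thirds = [F(k, 3) for k in range(4)]          # 0, 1/3, 2/3, 1

# Control (diagonal): foreground and background change together by sixths.
control = [(F(k, 6), F(k, 6)) for k in range(7)]

# Lower right: background becomes real first, then the foreground.
lower_right = [(F(0), b) for b in thirds] + [(f, F(1)) for f in thirds[1:]]

# Upper left: foreground becomes real first, then the background.
upper_left = [(f, F(0)) for f in thirds] + [(F(1), b) for b in thirds[1:]]

# The three transitions share only their two endpoints.
unique = set(control) | set(lower_right) | set(upper_left)

def average(a, b):
    """Element-wise mean of two (foreground, background) pairs."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Each control level equals the mean of its paired consistency-reduced levels.
pairing_holds = all(
    average(lr, ul) == c for c, lr, ul in zip(control, lower_right, upper_left)
)

if __name__ == "__main__":
    print(len(unique), pairing_holds)
```

The set of unique pairs has 17 members (5 exclusive to each transition plus the 2 shared endpoints), and the averaging property holds at all seven levels.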
Because the three transitions shared endpoints, 17 representations of each entity (5 exclusively belonging to each of the three transitions and 2 common across all three transitions) were presented for both the categorization task and the task stimuli ratings. This ensured that the shared endpoints were not repeatedly presented. To minimize habituation, the three different transitions were tested in one block instead of in three isolated blocks.
Fig. 4. For humans, animals, and objects, the percentage of times the stimulus was categorized as real (vs. computer animated) is plotted against its fraction of real for the control (diagonal) and reduced-consistency transitions (lower right, upper left). The stimulus whose categorization is closest to 50% is defined as most ambiguous (unfilled circle) and whose categorization is closest to 0% or 100% most certain (diamond). The error bars in Figs. 4-8 indicate the 95% confidence interval of the true mean.
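The definitions of most ambiguous and most certain given in the Fig. 4 caption translate directly into code. The sketch below assumes a mapping from stimulus label to percentage categorized as real; the function names and the example percentages are hypothetical and are not data from this study.

```python
def most_ambiguous(pct_real):
    """Stimulus whose percentage categorized as real is closest to 50%."""
    return min(pct_real, key=lambda s: abs(pct_real[s] - 50.0))

def most_certain(pct_real):
    """Stimulus whose percentage is closest to 0% or 100%,
    i.e., farthest from 50%."""
    return max(pct_real, key=lambda s: abs(pct_real[s] - 50.0))

if __name__ == "__main__":
    # Hypothetical percentages for three levels of fraction of real.
    example = {"0": 4.0, "1/2": 47.0, "1": 91.0}
    print(most_ambiguous(example), most_certain(example))
```

If two stimuli are equally far from 50%, `min` and `max` return the first one encountered, so a real analysis would need an explicit tie-breaking rule.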

Dependent variables
For the real vs. computer animated categorization, the dependent variables for each stimulus are percentage categorized as real and response time. For the task stimuli ratings, the dependent variables are subjective ratings on indices of realism, warmth, and eeriness. For the living vs. inanimate categorization, which was used only in the first round of experiments, the dependent variables for each stimulus are percentage categorized as living and response time. In addition to realism, warmth, and eeriness, in the first round, task stimuli were also rated on an animacy index. Each index used three 7-point semantic differential items (i.e., left anchor term, very, moderately, slightly, neutral, slightly, moderately, very, right anchor term): For realism, computer animated-real, replica-original, and digitally copied-authentic; for animacy, inanimate-living, inert-alive, and without definite lifespan-mortal; for warmth, cold-hearted-warm-hearted, hostile-friendly, and grumpy-cheerful; for eeriness, ordinary-creepy, plain-weird, and predictable-eerie. Only three items with large factor loadings were selected from each of these previously developed indices (e.g., Ho & MacDorman, 2010), because this study required many stimuli to be rated. The categorization tasks preceded the task stimuli ratings to reduce bias from exposure to the warmth and eeriness indices.

Plots of percentage categorized as real, eeriness, warmth, and response times
For each group and transition, Fig. 4 shows the percentage of stimuli categorized as real at a given fraction of real for the foreground and background feature sets. For humans and animals, the percentage increased with fraction of real, approximating a logistic function. For humans, a lack of realism was more noticeable for the consistency-reduced transitions than for the control, as indicated by the rightward shift in the curve. Distinguishing real from computer animated was easier for humans and animals than for objects.
Figs. 5 and 6 show that for humans and animals on the control transition, the 3D computer models were always the eeriest and coldest; perceived eeriness fell markedly, and warmth rose, as fraction of real increased. For humans on the consistency-reduced transitions, less realism in the eyes, eyelashes, and mouth increased eeriness and coldness disproportionately compared with the skin, nose, and eyebrows (as indicated by the area below the solid line and above the dashed line in the lower-right and the upper-left transitions). For all groups, the eeriest and coldest stimulus was the most certain (diamond), not the most ambiguous (unfilled circle). Reducing consistency in realism increased eeriness, and tended to decrease warmth, for humans and animals but not for objects. Fig. 7 shows that response times were slower for more ambiguous stimuli. Response times were significantly faster for 3D computer models of humans and animals than of objects. Thus, in the high and intermediate anthropomorphism groups, a lack of realism was more noticeable than in the low anthropomorphism group.

Data analysis preliminaries
Test statistics were interpreted with a significance threshold of α = .05. Response times were log10-transformed to remove their positive skew before analysis. Linear Q-Q plots confirmed that the collected data were normally distributed. Effect sizes for manipulations were calculated using partial eta-squared (η²p) and interpreted according to the following thresholds: small = .01, medium = .06, and large = .14; effect sizes for differences between means were calculated using Cohen's d and interpreted according to the following thresholds: small = .20, medium = .50, and large = .80; effect sizes for chi-squared tests were calculated using φ and interpreted according to the following thresholds: small = .10, medium = .30, and large = .50 (Cohen, 1992). All reported pairwise comparisons reflect Bonferroni-Holm correction. When Mauchly's Test of Sphericity indicated that the assumption of sphericity was not met, we reported adjusted degrees of freedom using the Greenhouse-Geisser correction.
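Two of the preliminaries above, the log10 transform of response times and the Bonferroni-Holm step-down procedure, can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code; the function names and the example p-values are hypothetical.

```python
import math


def log10_transform(response_times):
    """Log10-transform positive response times to reduce positive skew."""
    return [math.log10(rt) for rt in response_times]


def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down procedure.

    The p-values are visited from smallest to largest. The k-th smallest
    (k = 0, 1, ...) is compared with alpha / (m - k). Testing stops at the
    first p-value that fails, and all remaining hypotheses are retained.
    Returns a list of booleans in the original order (True = reject).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values for four pairwise comparisons.
decisions = holm_reject([0.01, 0.04, 0.03, 0.005])
```

Here the two smallest p-values (.005 and .01) pass their thresholds (.0125 and about .0167), the third (.03) fails against .025, and the procedure stops, so the fourth is retained as well.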
For data analysis we collapsed across each entity and investigated the three transitions: diagonal, lower right, and upper left. For testing H1 we conducted within-group, repeated-measures ANOVAs with fraction of real as the only factor. In a preliminary analysis, we modeled the data in each transition using a linear mixed-effects model with fraction of real as the fixed factor and entity as the random factor. The model showed entity had little effect (Supplementary Material, Section 1). Thus, we tested H1 and H2 with only one fixed factor, fraction of real. H3 was tested with a mixed-design ANOVA using two fixed factors: consistency in realism (reduced or control, the within-group factor) and level of anthropomorphism (low, medium, or high, the between-group factor). Because our preliminary analysis found similar responses on the animacy (living vs. inanimate) and the realism (computer animated vs. real) transitions, we only report responses for realism (see Appendix A.1). Correlations among the four indices (subjective eeriness, warmth, animacy, and realism) are reported in Appendix A.2, and a manipulation check of the effect of fraction of real on subjective realism ratings in Appendix A.3.

Human models
For the six human models in the high anthropomorphism group (n = 1172), three within-group, repeated-measures ANOVAs

Object models
For the four object models (n = 610), fraction of real significantly affected eeriness ratings for all three transitions with a small effect size: diagonal, F(5.04, 3040) = 14.14, MSE = 0.87, p < .001,
These results support H1. For all three groups (humans, animals, and objects), the most ambiguous stimulus was neither the eeriest (Fig. 5) nor the coldest (Fig. 6). This pattern held for all three levels of anthropomorphism and for all three transitions. Instead, the 0% real stimulus, which rated lowest in realism, was always the eeriest and coldest. It also had the fastest response time (Fig. 7). Appendix B reports how fraction of real affected response times.

Testing H2
Consistency reduced is defined as two (or more) sets of features that differ in their level of realism, operationalized in this experiment as fraction of real. Stimuli on the diagonal transition are control stimuli, while paired stimuli on the lower-right and upper-left transitions are consistency reduced relative to their controls. For each level of anthropomorphism (humans, animals, and objects), the main effect of reduced consistency in realism was tested. The effects of reduced consistency were also tested for paired stimuli on the diagonal and lower-right transition and on the diagonal and upper-left transition.
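The operational definition above can be sketched as a small classifier. The mapping of "foreground less real than background" to the lower-right transition, and the reverse to the upper-left transition, is inferred from the foreground-background examples quoted in the results (for instance 0-⅓ for lower right and ⅔-0 for upper left); it is an assumption of this sketch, as are the function names.

```python
from fractions import Fraction


def is_consistency_reduced(fg, bg):
    """A stimulus is consistency reduced when its two feature sets differ in fraction of real."""
    return fg != bg


def transition(fg, bg):
    """Name the transition for a foreground/background pair of fractions of real.

    Assumed mapping (inferred from the examples in the text):
    fg == bg -> diagonal (control), fg < bg -> lower right, fg > bg -> upper left.
    """
    if fg == bg:
        return "diagonal"
    return "lower right" if fg < bg else "upper left"


# Examples taken from the foreground-background pairs mentioned in the results.
pairs = [(Fraction(0), Fraction(1, 3)), (Fraction(2, 3), Fraction(0)), (Fraction(1), Fraction(1))]
labels = [transition(fg, bg) for fg, bg in pairs]
```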

Human models
For the six human models, reduced consistency in realism significantly affected eeriness ratings for both the lower-right, t(5841) = 22.19, p < .001, d = 0.26, and upper-left transition, t(5846) = 11.22, p < .001, d = 0.13, compared with the diagonal transition with a small effect size. Pairwise comparisons showed that reduced consistency significantly increased eeriness for all paired stimuli in the lower-right transition and for three-fifths of the stimuli in the upper-left transition (Appendix C, Tables C1 and C2). Reduced consistency in realism also significantly affected warmth ratings for both transitions: lower right, t(5840) = 18.38, p < .001, d = 0.20; and upper left, t(5846) = 4.64, p < .001, d = 0.05. Pairwise comparisons showed that reduced consistency significantly decreased warmth for four-fifths of the stimuli in the lower-right transition, and three-fifths of the stimuli in the upper-left transition (Appendix C, Tables C3 and C4). In sum, reduced consistency in realism increased the uncanny valley effect for all consistency-reduced stimuli, except one in the lower-right transition (0-⅓ foreground-background fraction of real) and two in the upper-left transition (⅔-0 and ⅓-0).

Animal models
For the two animal models, reduced consistency in realism also significantly affected eeriness ratings for both the lower-right, t(2229) = 7.72, p < .001, d = 0.14, and upper-left transition, t(2224) = 11.42, p < .001, d = 0.24, compared with the diagonal transition with a small effect size. Pairwise comparisons showed that reduced consistency significantly increased eeriness in three-fifths of the stimuli in the lower-right transition and four-fifths of the stimuli in the upper-left transition (Appendix C, Tables C5 and C6). Reduced consistency in realism also significantly affected warmth ratings in both the lower-right, t(2234) = 2.62, p = .009, d = 0.04, and upper-left transitions, t(2229) = 10.79, p < .001, d = 0.21. Pairwise comparisons showed that reduced consistency in realism significantly decreased warmth in one-fifth of the stimuli in the lower-right transition and four-fifths of the stimuli in the upper-left transition (Appendix C, Tables C7 and C8). In sum, reduced consistency in realism increased the uncanny valley effect in only one consistency-reduced stimulus in the lower-right transition (0-1 foreground-background fraction of real) and in all but two consistency-reduced stimuli in the upper-left transition (1-⅔ and ⅓-0).

Object models
By contrast, for the object models, reduced consistency in realism did not significantly affect eeriness ratings in the lower-right transition, t(3043) = 0.37, p = .711, d = 0.01, but did affect them in the upper-left transition, t(3044) = 2.21, p = .027, d = 0.03, compared with the diagonal transition. Reduced consistency significantly affected warmth ratings in the lower-right transition, t(3043) = 2.55, p = .011, d = 0.04, but did not in the upper-left transition, t(3044) = 1.95, p = .051, d = 0.03. However, for all these conditions, the effect size was negligible. Furthermore, no pairwise comparisons showed significant effects of reduced consistency on eeriness or warmth in either the lower-right or upper-left transition.
These results support H2. Reduced consistency in realism caused a small uncanny valley effect in high and intermediate anthropomorphism groups (humans, animals); however, the effect in the low anthropomorphism group (objects) was negligible.

Testing H3
For testing H3 we analyzed outcomes of two entities (dog and parrot, n = 2240) from round 1, four entities (Clint, Emelie, Juliana, and Simona, n = 3620) from round 2, and three entities (camera, washer, and water lily, n = 2145) from round 3. Reduced consistency in realism was treated as a within-group factor, and level of anthropomorphism as a between-group factor. A mixed-design ANOVA (anthropomorphism: low, medium, or high × realism consistency: control or reduced) confirmed that the level of anthropomorphism significantly affected the difference in eeriness ratings between control stimuli (diagonal transition) and consistency-reduced stimuli for both the lower-right, F(2, 7983) = 51.76, MSE = 1.11, p < .001, η² = .03, and the upper-left transition, F(2, 7982) = 32.11, MSE = 1.12, p < .001, η² = .02, with a small effect size. These results indicate that reduced consistency in realism affected eeriness ratings of humans, animals, and objects differently (Fig. 8).
For the lower-right transition, a post hoc Tukey's honest significant difference (HSD) test found that reduced consistency in realism increased eeriness significantly more for humans than for animals and significantly more for animals than for objects, p < .001. For the upper-left transition, a post hoc Tukey's HSD found that reduced consistency increased eeriness significantly more for humans than for objects and significantly more for animals than for objects, p < .001. However, reduced consistency increased eeriness significantly less for humans than for animals, p < .001.
A mixed-design ANOVA also confirmed that the level of anthropomorphism significantly affected the difference in warmth ratings between control stimuli and consistency-reduced stimuli for both the lower-right, F(2, 7982) = 46.62, MSE = 0.70, p < .001, η² = .01, and the upper-left transition, F(2, 7982) = 56.87, MSE = 0.73, p < .001, η² = .01, with a small effect size. For the lower-right transition, a post hoc Tukey's HSD found that reduced consistency decreased warmth significantly more for humans than for animals and objects, p < .001. However, reduced consistency did not decrease warmth significantly more for animals than for objects, p = .483. For the upper-left transition, a post hoc Tukey's HSD found that reduced consistency decreased warmth significantly more for animals than for objects, p < .001. However, reduced consistency decreased warmth significantly less for humans than for animals, p < .001. Reduced consistency did not decrease warmth significantly more for humans than for objects, p = .067.
H3 states that the more anthropomorphic the entity, the more reduced consistency in feature realism increases the uncanny valley effect. H3 was supported for three of six comparisons (three levels of anthropomorphism × two transitions). H3 was not supported in comparing humans and animals in the upper-left transition, humans and objects in the upper-left transition, and animals and objects in the lower-right transition.

Support for realism inconsistency, not category uncertainty
Several authors have proposed that the uncanny valley effect is caused by category uncertainty (Burleigh et al., 2013;Green et al., 2008;Jentsch, 1906;Kang, 2009;MacDorman, Vasudevan, et al., 2009;Yamada et al., 2013). However, this study could find no evidence to support that. In a categorization task on animacy (living vs. inanimate) and realism (computer animated vs. real), the eeriest and coldest stimuli were those categorized with the most certainty: the 3D computer models of humans and animals. They were also the stimuli the participants categorized the fastest (Fig. 7). The high processing fluency of uncanny stimuli is significant given that Yamada et al. (2013) attributed the uncanny valley effect to low processing fluency.
If category uncertainty were the cause of the uncanny valley effect, as objective realism increased, eeriness should have peaked at the most ambiguous stimulus, reaching a slope of zero, and then declined. On the contrary, at the most ambiguous stimulus, the slope was negative and relatively steep (Fig. 5). As the most ambiguous stimulus was neither the eeriest nor the coldest for all groups and transitions, H1 is supported.
This study next tested the predictions of an alternative theory, that realism inconsistency causes the uncanny valley effect (MacDorman, Green, et al., 2009;MacDorman, Vasudevan, et al., 2009;Mitchell et al., 2011;Moore, 2012). Specifically, we proposed that, in anthropomorphic entities, features with inconsistent levels of objective realism cause brain processes to make conflicting inferences about the entity, resulting in large feedback error signals. Reduced consistency in feature realism increased eeriness and decreased warmth for humans and animals (Fig. 8). These effects were negligible for objects. Thus, H2 is supported.
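The category-uncertainty prediction discussed above (eeriness peaking, with zero slope, at the most ambiguous stimulus) can be checked numerically with a finite-difference slope. The sketch below is illustrative only: the function names are hypothetical and the eeriness values are made up to mimic a monotonically falling curve, not taken from Fig. 5.

```python
def slope_at(xs, ys, i):
    """Central-difference slope of ys over xs at index i (one-sided at the ends)."""
    lo = max(i - 1, 0)
    hi = min(i + 1, len(xs) - 1)
    return (ys[hi] - ys[lo]) / (xs[hi] - xs[lo])


def peak_index(ys):
    """Index of the largest value, i.e. the eeriest stimulus."""
    return max(range(len(ys)), key=lambda i: ys[i])


# Hypothetical mean eeriness at increasing fractions of real.
fraction_of_real = [0.0, 0.25, 0.5, 0.75, 1.0]
eeriness = [1.2, 0.8, 0.1, -0.4, -0.6]
ambiguous_idx = 2  # suppose the stimulus at 0.5 was categorized closest to 50%

slope = slope_at(fraction_of_real, eeriness, ambiguous_idx)
peak_is_ambiguous = peak_index(eeriness) == ambiguous_idx
```

Under category uncertainty one would expect `peak_is_ambiguous` to be true and `slope` to be near zero. In this made-up series the peak sits at 0% real and the slope at the ambiguous stimulus is clearly negative, which is the qualitative pattern the study reports.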
We also proposed that the more anthropomorphic the entity the more reduced consistency in feature realism increases the uncanny valley effect. This hypothesis was supported for three of six comparisons. Thus, H3 is partially supported.

Features contributing to the uncanny valley effect
The eyes, eyelashes, and mouth increase eeriness and decrease warmth more than the rest of the face. This is a compelling result given that they cover a much smaller area. Participants may be more sensitive to the eyes, eyelashes, and mouth because of their role in (a) primate reproduction, (b) social communication, including joint attention, (c) the encoding of faces in memory, and (d) visual processing in the brain, and because (e) interlocutors spend more time looking at the eyes than at other facial features (Bateson, Nettle, & Roberts, 2006;Emery, 2000;Fox & Damjanovic, 2006;Looser & Wheatley, 2010;McKelvie, 1976;Puce, Allison, Bentin, Gore, & McCarthy, 1998;Quinsey, Ketsetzis, Earls, & Karamanoukian, 1996).

Why the 3D models were the eeriest
As expected, the 3D computer models were universally eerier and colder than photographs of their real counterparts. This pattern held even when the 3D computer models were made more attractive with makeup or when the actual person (81-year-old Zlatko) had more wrinkles and blemishes than the model (Fink, Grammer, & Thornhill, 2001;Fink et al., 2008;Jones, Little, Burt, & Perrett, 2004). As also expected, for humans and animals, eeriness followed an inverse logistic function for the realism-consistent transition (MacDorman, Green, et al., 2009;MacDorman, Vasudevan, et al., 2009). Our assumption is that the process of computer sculpting, texturing, and rendering 3D models introduces extraneous realism inconsistency. Although we were unable to eliminate this confound with professional artistry and digital makeup, the same holds for large film studios (Butler & Joschko, 2009;Freedman, 2012).
In addition, entities in the uncanny valley may appropriate an avoidance mechanism targeted at human and other animal threats (MacDorman & Ishiguro, 2006a; Mori, 1970/2012); for example, the 'dead eyes' of a 3D computer model, by resembling those of a corpse, could elicit aversive responses from a fear or disgust system to prevent contact with potential vectors of infection. MacDorman and Entezari (2015) found correlational support for the threat-avoidance theory in a study on individual differences. Sensitivity to reminders of mortality predicted significantly higher eeriness and lower warmth ratings of realism-inconsistent android robots. In other words, the uncanny valley effect was stronger in individuals who felt more disturbed by such events as touching dead bodies, walking through graveyards, and sleeping in a dead man's room.
The uncanny valley hypothesis predicts a negative emotional appraisal of entities that appear and behave not quite human. The effect was in fact strongest when observing the least realistic faces. We do not interpret this as refuting the hypothesis because Mori positioned the uncanny valley to the left (the less human side) of entities that are clearly not human (e.g., a doll, okina mask, and bunraku puppet in Fig. 3 of Mori, 1970/2012). Mori never argued that to be uncanny an entity must appear ambiguously human. This view should instead be attributed to other authors (Burleigh et al., 2013;Green et al., 2008;Kang, 2009;MacDorman, Vasudevan, et al., 2009;Yamada et al., 2013).
Our findings indicate a need to explore broader continua of stimuli, from abstract human representations, to 3D computer models, to real people. There is also a need to investigate the interplay between category uncertainty, realism inconsistency, and animation. When the faces are animated, the bottom of the valley could shift rightward, toward stimuli that are physically more similar to human beings. But for still faces at least, the findings indicate human character modelers should strive for consistently realistic designs.
We are grateful to Ryan Sukale for developing and maintaining this study's website and experimental applications and to Tyler Burleigh, Marcus Cheetham, Ronald E. Day, Kurt Gray, Stevan Harnad, Wade Mitchell, Himalaya Patel, Roger K. Moore, Steven Sloman, Angela Tinwell, and several anonymous reviewers for their advice on revising the manuscript. This research is supported by the US National Institutes of Health (P20 GM066402).

A.1. Animacy vs. realism
In the first round of experiments, animacy (living vs. inanimate) and realism (computer animated vs. real) categorizations and response times followed a similar pattern. Plots of percentage categorized as real (vs. fraction of real) appeared nearly identical in shape to plots of percentage categorized as living. The same pattern appeared for the response times. Although a chi-square test of independence found the difference between percentage categorized as real and percentage categorized as living to be significant owing to the large number of trials, the effect size was negligible, χ²(1, n = 28,832) = 12.69, p < .001, φ = .02. Thus, to avoid redundancy, animacy categorizations are not reported.
Table A1 reports the correlations among the four indices: subjective eeriness, warmth, animacy, and realism. Animacy ratings were only available from the first round of the experiments.
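The negligible effect size reported in A.1 for the chi-square test can be reproduced from the test statistic and the number of trials, assuming the standard relation φ = √(χ²/n) for a test with one degree of freedom. The source does not show this calculation, so the sketch below is only a consistency check; the function name is hypothetical.

```python
import math


def phi_from_chi2(chi2, n):
    """Phi coefficient for a chi-square test with one degree of freedom."""
    return math.sqrt(chi2 / n)


# Values reported in A.1: chi-square = 12.69 with n = 28,832 trials.
phi = phi_from_chi2(12.69, 28832)
```

The result is about 0.021, which rounds to the reported φ = .02 and falls well below the small-effect threshold of .10. This shows how a very large number of trials can yield p < .001 with a negligible effect.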

A.3. Realism manipulation
As a manipulation check, we conducted three within-group, repeated-measures ANOVAs to investigate the effect of fraction of real on subjective realism ratings for each of the three transitions: diagonal, lower right, and upper left. (Participants rated stimuli on realism in the first and the third round of the experiment.) Subjective realism monotonically increased with fraction of real, approximating the shape of a logistic function. For the two human models, fraction of real significantly affected realism ratings for all three transitions with a large effect size: diagonal,
For Tables C1-C8, per-test α was set using the Bonferroni-Holm correction. Boldface indicates the consistency-reduced stimulus was significantly eerier or colder for the compared foreground-background fractions of real.