
ORIGINAL RESEARCH article

Front. Psychol., 08 September 2015
Sec. Cognition

Dubious decision evidence and criterion flexibility in recognition memory

  • U.S. Army Research Laboratory, Aberdeen, MD, USA
  • University of California, Santa Barbara, Santa Barbara, CA, USA

When old–new recognition judgments must be based on ambiguous memory evidence, a proper criterion for responding “old” can substantially improve accuracy, but participants are typically suboptimal in their placement of decision criteria. Various accounts of suboptimal criterion placement have been proposed. The most parsimonious, however, is that subjects simply over-rely on memory evidence – however faulty – as a basis for decisions. We tested this account with a novel recognition paradigm in which old–new discrimination was minimal and critical errors were avoided by adopting highly liberal or conservative biases. In Experiment 1, criterion shifts were necessary to adapt to changing target probabilities or, in a “security patrol” scenario, to avoid either letting dangerous people go free (misses) or harming innocent people (false alarms). Experiment 2 added a condition in which financial incentives drove criterion shifts. Critical errors were frequent, similar across sources of motivation, and only moderately reduced by feedback. In Experiment 3, critical errors were only modestly reduced in a version of the security patrol with no study phase. These findings indicate that participants use even transparently non-probative information as an alternative to heavy reliance on a decision rule, a strategy that precludes optimal criterion placement.

Introduction

Decision making is often guided by bias, and bias is often adaptive. Indeed, the ability to take the same action freely in some situations and cautiously in others is essential to decision making in everyday life. For example, an individual must shift from a more accepting to a more skeptical stance in evaluating information from more or less reputable sources; a student readily answers questions while among friends, but is extremely cautious in a classroom setting; a basketball player passes on a shot while protecting a lead, but not when trying to catch up. Such criterion shifts – between liberal and conservative standards of evidence for a decision – tailor decision strategy to the situation and can be essential to avoiding errors, especially when decision evidence is ambiguous.

Criterion shifting has been of substantial interest in recognition memory (Hockley, 2011), a task domain well suited to studying the interaction of memory and decision processes. According to most models of recognition, judgments as to whether a person, place, or object has been encountered previously are based on whether its appearance elicits a criterial level of memory evidence (Parks, 1966). One can use (1) a liberal criterion, accepting items as old on the basis of relatively little memory evidence, (2) a conservative criterion, requiring substantial evidence before making an “old” judgment, or (3) a relatively neutral criterion, favoring neither response. When memory evidence is ambiguous, context-appropriate criterion shifting can produce accurate decisions. Consider the case of meeting a person who strikes us as vaguely familiar: if the encounter occurs during a vacation, where we would not expect to see many people we know, we should use a conservative decision criterion and conclude that the individual is new. If it occurs in our neighborhood, by contrast, we should be more likely to conclude that we know the individual.

Despite the adaptive value of criterion shifting for decision making, recognition memory studies demonstrate that individuals are limited in their ability (or inclination) to make such shifts. There is little doubt that participants can make appropriate shifts under some conditions: provided with corrective feedback and/or explicit instructions, participants can adapt bias to within-list changes in the prior probability of an old item (e.g., Rhodes and Jacoby, 2007; Aminoff et al., 2012) and the memory strength of old items (Verde and Rotello, 2007; Singer, 2009). They can also utilize a more liberal criterion for items tested after a long delay than after a short delay (Singer and Wixted, 2006), a more conservative criterion when distractors are highly similar to targets than when they are dissimilar (e.g., Benjamin and Bawa, 2004), and a more conservative criterion for recognizing familiar than unfamiliar scenes (Dobbins and Kroll, 2005). In the absence of feedback or highly salient differences between two item classes, however, criterion shifts are generally not observed (e.g., Rhodes and Jacoby, 2007; Verde and Rotello, 2007). Thus, research has investigated the circumstances that do versus do not elicit criterion shifts in order to determine the flexibility of the recognition system to change decision rules.

Although much research has examined the question of when criterion shifts occur, less is known about why they are often found to be inadequate to maximize accuracy or expected payoffs (e.g., Ulehla, 1966; Benjamin et al., 2009; Aminoff et al., 2012). A criterion that maximizes a recognizer’s proportion of correct responses must be calibrated both to the conditions of the task and to the individual’s ability to discriminate old items from new (see Green and Swets, 1966). If participants are told that 70% of test items are old, for example, an “old” response should be given whenever the recognizer is unsure; individuals with lower recognition sensitivity are unsure on a higher proportion of trials than those with greater sensitivity, and should set more liberal criteria to improve their percentage of correct responses [for further discussion of this point see Lynn and Barrett (2014)]. If old–new discrimination is at chance, one should respond “old” on every trial, a strategy that would yield 70% accuracy. Aminoff et al. (2012) tested shifting between blocks containing 70 and 30% old items and used each participant’s sensitivity (d’) score to calculate the criterion with which s/he would maximize proportion correct. Though they found substantial individual differences in the magnitude of shifts, no participant shifted widely enough to maximize accuracy.
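
To make the arithmetic concrete, the sketch below (an illustration under the standard equal-variance signal detection model, not an analysis from this study; the function names are ours) computes the accuracy-maximizing criterion for a given target base rate and sensitivity. With 70% targets and near-chance sensitivity, the optimum is so liberal that it amounts to responding “old” on essentially every trial.

# Illustrative sketch (not the authors' analysis code): the criterion that
# maximizes proportion correct under the equal-variance Gaussian model,
# given the target base rate p and sensitivity d'.
import math
from scipy.stats import norm

def optimal_c(p_old, d_prime):
    # c is measured from the midpoint of the old and new distributions,
    # as in c = -[z(H) + z(FA)]/2; the optimum is c = ln((1 - p)/p) / d'.
    return math.log((1 - p_old) / p_old) / d_prime

def expected_accuracy(p_old, d_prime, c):
    k = c + d_prime / 2                 # criterion location on the evidence axis
    hit_rate = norm.sf(k - d_prime)     # P(respond "old" | old item)
    fa_rate = norm.sf(k)                # P(respond "old" | new item)
    return p_old * hit_rate + (1 - p_old) * (1 - fa_rate)

# With 70% targets and near-chance sensitivity (d' = 0.13, roughly the level
# observed in these experiments), the optimum is extremely liberal:
print(optimal_c(0.70, 0.13))                # about -6.5 (respond "old" on nearly every trial)
print(expected_accuracy(0.70, 0.13, -6.5))  # about 0.70
print(expected_accuracy(0.70, 0.13, 0.0))   # about 0.53 with a neutral criterion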

Researchers have advanced several hypotheses regarding the basis of inadequate criterion shifting (also called “conservatism”). Ulehla (1966, p. 569) noted that “subjective probability lags behind objective probability” in some choice domains and proposed that participants in signal detection tasks over/underestimate target base rates. Parks (1966) noted that a strategy of distributing “old” and “new” responses according to the base rates (i.e., “probability matching”) produced insufficient criterion placement, and Thomas and Legge (1970) reported data suggesting that participants do use such a strategy (though subsequent work disfavored probability matching as a full account of criterion placement; Thomas, 1975; Benjamin et al., 2009). Kubovy (1977) suggested that suboptimal criterion placement is driven by incorrect intuition as to the shape of the target and lure evidence distributions. Benjamin et al. (2009) presented evidence that trial-by-trial noise in the decision criterion – a consequence of the effort required to maintain and shift the criterion – can produce conservatism in criterion shifting when base rates are manipulated.

Also relevant to the question of inadequate criterion shifting are general models of criterion placement. According to the means model (Hintzman, 1994), participants estimate the mean of the “old” item distribution, perhaps by learning the average increment in memory strength afforded by study list presentation, and establish a criterion at a point between this “old” mean and the mean of a pre-experimentally familiar new-item distribution. According to the range model (Parducci, 1984), participants estimate the highest and lowest memory strength values to be encountered at test, perhaps by assessing the memory strength of easily recalled old items and new items, respectively, and establish their criterion between these endpoint values. Hirshman (1995) tested these two models in addition to one based on the probability matching strategy described above; his analyses weighed in favor of the range model. Recently, Lynn and Barrett (2014) outlined a “utilized” signal detection framework for modeling optimal criterion placement according to three environmental factors: target base rates, the costs and benefits (financial or otherwise) associated with decision outcomes, and the similarity of targets to lures. According to this model, suboptimal criterion placement results from a failure to properly estimate one or more of these variables.
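
In conventional signal detection terms (Green and Swets, 1966), these three factors jointly determine the optimal likelihood-ratio criterion; the standard textbook expression (not necessarily Lynn and Barrett’s exact formalization) is

β_opt = [P(lure)/P(target)] × [(V(correct rejection) + C(false alarm))/(V(hit) + C(miss))], with c_opt = ln(β_opt)/d′,

where V(·) is the value of a correct outcome, C(·) is the cost (a positive magnitude) of an error, and target–lure similarity enters through d′. Because d′ appears in the denominator of c_opt, lower sensitivity requires a more extreme criterion to implement the same β_opt, and misestimating any of these quantities pulls the criterion away from its true optimum.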

While any of the above mechanisms may help explain criterion placement for a given participant and/or a given criterion manipulation, an additional hypothesis with considerable explanatory reach has received relatively little attention. In their seminal treatment of signal detection theory, Green and Swets (1966) proposed that participants fail to place sufficiently extreme criteria because they are unwilling to abandon the use of decision evidence, even when that evidence leads to uncertainty. Green and Swets (1966) described this phenomenon as follows:

“The observer tends to avoid extreme criteria: when the optimal β is relatively large, his actual criterion is not so high as the optimal criterion, and when the optimal β is relatively small, his criterion is not so low as the optimal criterion. Although this pattern is consistent with studies of decision making under uncertainty which do not involve ambiguous sensory information, the significance of its appearance here is not totally clear. It may be suspected that the subject’s natural disinclination to make the same response on all trials is strengthened by his awareness that the experimenter’s principal interest is in a sensory process. He probably finds it difficult to believe that he would be performing responsibly if the sensory distinctions he makes are exactly those that he could make by removing the earphones in an auditory experiment or by turning his back on a visual signal. (p. 91)”

As Green and Swets noted, participants are aware that the experimenter is interested in a sensory process, a fact that may limit their willingness to abandon sensory/memory evidence in making judgments. To the extent that participants prefer to base recognition decisions on their own resolution of an ambiguous signal rather than defer to a decision rule (e.g., “When most of the items are old and I am unsure, I should respond ‘old’”), their criterion will be suboptimal. This account is a powerful one in that it can be applied to any of the commonly used criterion manipulations (i.e., base rates, payoffs, instructional motivation). In addition, it does not assume potentially taxing mental computations – of target probabilities, response frequencies, or features of the evidence distributions – that subjects may be unable or disinclined to perform.

Despite its generality and relative simplicity, the hypothesis that inadequate criterion shifting is driven by an overreliance on memory evidence (and a consequent under-reliance on probative non-memory forms of evidence, such as payoffs or base rates) has gone largely unexamined. The present experiments were designed to test this hypothesis by assessing the extent of participants’ reliance on decision evidence in recognition memory. In a departure from standard recognition procedures, we tested the effects of three criterion manipulations using an extremely homogeneous stimulus set that yielded near-zero old–new discrimination, such that participants could easily perceive the ambiguity of the memory evidence. The magnitude of criterion shifts served as an inverse index of reliance on this dubious memory evidence: to the extent that participants resist basing decisions on memory evidence, they should follow the decision rule mandated by the criterion manipulation and adopt extreme conservative/liberal criteria. To the extent that they avoid setting extreme criteria, we infer that they are using transparently non-probative evidence to make recognition decisions. Across three experiments and three types of criterion manipulation, our results suggest that such evidence is used frequently and exerts a major influence on recognition decisions, whether memory evidence is entirely ambiguous (Experiments 1 and 2) or altogether absent (Experiment 3).

Experiment 1

The design of Experiment 1 varied from that of standard recognition memory paradigms in two respects. First, recognition experiments are typically designed such that participants have at least moderate old-new discrimination at test. The more participants are able to use memory evidence to make decisions, the less they benefit from criterion shifting. Additionally, even modest levels of discrimination may be sufficient for participants to believe that they can rely on memory evidence and eschew large shifts (Aminoff et al., 2012). To create a strong test of participants’ use of memory evidence, we used an extremely homogeneous stimulus set (described below) in which old and new items are so similar that discrimination is near chance. Such a transparently difficult recognition task should lead subjects to place minimal weight on memory evidence and yield to a decision rule as a basis for memory decisions.

A second feature of most recognition experiments is that the impetus for making accurate judgments is simply accuracy itself, except when financial incentives are used (e.g., Healy and Kubovy, 1978). Perhaps individuals can readily be induced to limit their use of memory evidence when the recognition task provides a compelling subjective reason to bias responses. We created such a task by converting a typical recognition paradigm into a “security patrol for suspicious persons.” Participants studied a list of “suspicious” individuals and, at test, were told to respond “suspicious” to anyone recognized from the study list and “innocent” to anyone not on the study list. In liberal blocks, participants were informed that calling an individual “suspicious” meant pulling that individual aside for questioning and search, and participants should respond “suspicious” whenever in doubt. In conservative blocks, calling an individual “suspicious” meant that they would be subject to “aggressive pursuit, probable injury, and capture,” and participants should respond “innocent” whenever unsure. To ensure that criterion shifts were motivated solely by the shifts in the nature of the patrols, the base rate of targets was fixed at 0.30 across blocks (a proportion that produced a minority of suspicious individuals but left enough target trials to allow reliable estimates of sensitivity and response criterion). Coupled with the extremely poor discrimination of suspicious and innocent probes, the importance placed on avoiding critical errors in the two types of patrols (see Method for full instructions) should provide an abundance of incentive to adopt extreme response criteria in both liberal and conservative blocks, shifting widely between the two scenarios.

As a comparison condition, we also tested shifting in a standard target probability manipulation. In the liberal (conservative) test blocks, 70% (30%) of test items were old; the stimuli were those used in the security patrol condition, but no mention of a patrol scenario was made. Each participant completed both tasks (Percent and Patrol) to enable within-subjects comparisons of criterion shifting, critical misses, and critical false alarms across different sources of motivation.

Finally, to further motivate criterion shifting, half of the participants in each group received trial-by-trial feedback at test. Feedback was tailored to each task (see Method) and not only conveyed the accuracy of responses but, in the case of errors, provided persistent reminders of the appropriate bias in a given test block.

Method

Participants

One hundred twelve undergraduates at the University of California, Santa Barbara participated for course credit. The feedback and no feedback conditions included 57 and 55 participants, respectively. All experiments reported in this article were approved by the Human Subjects Committee at the University of California, Santa Barbara.

Materials

Stimuli were 324 full-body human models created with 3ds Max software (Autodesk, Inc.). Each item combined a unique head model with one of six hairstyles (created with FaceGen, Singular Inversions Ltd.) and one of eight body models (ES3DStudios) wearing one of six clothing styles in one of 13 clothing colors. Thus, each individual had a unique face, but non-face features overlapped. Half of the models were male.

Each model appeared against a white background for presentation during study and embedded in a realistic desert environment (ES3DStudios) for presentation at test. Sixteen scenes were used; half depicted the center of a city and half depicted the outskirts. Individual models were centered in each scene and faced forward. Test probes were presented against city backgrounds during one type of patrol and against outskirts backgrounds during the other (counterbalanced across participants). Thus, the backgrounds provided a visual context consistent with a liberal “city patrol” and a conservative “outskirts patrol,” or vice versa. Examples of the stimuli appear in Figure 1.

FIGURE 1. Examples of stimuli used in the current experiments: female with outskirts background (left) and male with city background (right).

Study and test lists for each of two recognition study-test cycles were created via random selection from the 324-item pool for each participant. The study list for each cycle was composed of 70 randomly selected individuals appearing against white backgrounds. The test lists for each cycle included a randomly ordered intermix of studied and non-studied individuals, each appearing against a city or outskirts background. No items were repeated between the two study-test cycles. The experiment was conducted with E-Prime software (Psychology Software Tools, Inc., Sharpsburg, PA, USA).

Procedure

Each participant completed two recognition study-test cycles, one a Percent task (target probability manipulation) and the other a Patrol task. Half of the participants received trial-by-trial feedback and half received no feedback. In each task, 70 items were studied for 2 s each with a 1-s inter-stimulus interval. Tests were divided into four 35-item blocks that interleaved liberal and conservative conditions. The assignment of tasks and feedback conditions to participants, task order, test block order, assignment of city/outskirts backgrounds to liberal/conservative patrols, and ordering of both the study and test lists were randomized anew for each participant.

Patrol Task

Participants were informed that they would be taking part in a simulated security patrol for suspicious persons and that they would begin by studying suspicious individuals to be recognized later. Test instructions informed participants that they would be presented with a mixture of suspicious people from the study list and innocent people they had never seen and that their task was to respond “suspicious” (by pressing the “1” key) to the former and “innocent” (by pressing “0”) to the latter. Participants were told that the nature of the patrol would vary according to the location of the individuals. For liberal blocks, participants received these instructions:

“While you are on patrol in the outskirts, identifying an individual as SUSPICIOUS means that individual will be pulled aside for questioning and search. It is VERY IMPORTANT not to miss any of the suspicious people you saw earlier. Those people are potentially dangerous and need to be questioned and searched. Remember, in the outskirts, you definitely do not want to miss any of the suspicious people you saw earlier. It’s fine if you mistakenly pull aside some people for questioning who turn out to be innocent (not seen before). This is a minor inconvenience for them. But make sure you don’t miss any of the suspicious people you saw earlier!”

For conservative blocks:

“While you are on patrol in the city, identifying an individual as SUSPICIOUS means that they will be hunted down like a dangerous criminal. This will include aggressive pursuit, probable injury, and capture to the people that you identify. It is VERY IMPORTANT not to mistakenly identify an innocent person as suspicious. It would be an injustice to subject an innocent person to this treatment. Remember, in the city, you definitely do not want to identify innocent people as suspicious. It’s fine to miss some people who turn out to be suspicious (that you saw before). It is expected that some suspicious people will escape capture on this patrol. But make sure you don’t mistakenly hunt innocent people!”

The eight city and eight outskirts locations were introduced for 3 s each, accompanied by the type of error to be avoided (misses in liberal blocks and false alarms in conservative blocks). Each test block was characterized as a city patrol or an outskirts patrol. The base rate of old items was 0.30 in each block. Instructions preceding each block reminded participants of the crucial importance of avoiding critical misses or false alarms. Test responses were non-speeded. A blank 2-s interval separated each trial.

Percent Task

Participants were informed that they would be studying a list of individuals to memorize. Test instructions explained that the city or outskirts location was diagnostic of the probability that a test item was old. The locations were introduced alongside the corresponding base rate of old items. Test blocks were defined by the prior probability of an old item (70 or 30%). Instructions appeared before each test block reminding participants of the percentage of old items in that block and explicitly advising that because most of the individuals in the block were (were not) on the study list, one should respond “old” (“new”) whenever unsure.

Feedback

Except for the presence of trial-by-trial feedback at test and related instructions, the feedback and no feedback versions of the tasks were identical. Instructions stated that feedback would be presented after each response and encouraged participants to use it to improve their decisions. In the Percent task, correct answers were followed with the phrase “Correct! That individual was/was not studied!” in a blue font. In the Patrol task, feedback read “Correct! That individual was suspicious/innocent!” in a blue font.

Feedback to incorrect responses varied between liberal and conservative blocks. In liberal blocks, Percent feedback was “Okay, but that individual was not studied” in a black font following a false alarm and “Wrong! That individual was studied! Remember, 70% are OLD!” in a red font following a miss. Patrol feedback read “Okay, but that individual was innocent” following a false alarm and “Wrong! That individual was suspicious! Remember, don’t miss anyone SUSPICIOUS!” following a miss. Analogous feedback was given in the conservative blocks. Feedback was presented for 2 s.

Results and Discussion

Recognition sensitivity (d′) was calculated as z(H) – z(FA), where H and FA are the hit and false alarm rates, respectively. Response bias was measured with c, equal to –[z(H) + z(FA)]/2. Hit and false alarm rate values of 0 and 1 were adjusted via Macmillan and Kaplan’s (1985) method to enable calculation of d′ and c: rates of 0 were adjusted upward to 0.5/N and rates of 1 were adjusted downward to 1-(0.5/N), where N is the number of signal trials (for hit rates) or the number of noise trials (for false alarm rates).
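
For concreteness, the following sketch (illustrative code, not the analysis scripts used for these experiments) implements the calculations just described, including the 0.5/N adjustment; the trial counts in the example are hypothetical.

# Illustrative implementation (not the authors' analysis code) of d' and c
# with the Macmillan and Kaplan (1985) adjustment for rates of 0 or 1.
from scipy.stats import norm

def corrected_rate(count, n_trials):
    rate = count / n_trials
    if rate == 0.0:
        rate = 0.5 / n_trials          # adjust a rate of 0 upward
    elif rate == 1.0:
        rate = 1 - 0.5 / n_trials      # adjust a rate of 1 downward
    return rate

def d_prime_and_c(hits, n_old, false_alarms, n_new):
    h = corrected_rate(hits, n_old)            # hit rate
    fa = corrected_rate(false_alarms, n_new)   # false alarm rate
    d_prime = norm.ppf(h) - norm.ppf(fa)       # z(H) - z(FA)
    c = -(norm.ppf(h) + norm.ppf(fa)) / 2      # -[z(H) + z(FA)] / 2
    return d_prime, c

# Hypothetical counts; the criterion shift is c in conservative blocks
# minus c in liberal blocks, so larger values indicate wider shifting.
_, c_liberal = d_prime_and_c(hits=18, n_old=21, false_alarms=38, n_new=49)
_, c_conservative = d_prime_and_c(hits=5, n_old=21, false_alarms=10, n_new=49)
print(c_conservative - c_liberal)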

The mean d′ and c-values in each condition appear in Table 1. As expected, old–new discrimination was very poor. Across the patrol and percent tasks, the mean d′ for all participants was 0.132 (corresponding to a mean hit rate of 0.51 and a mean FA rate of 0.46). This value was significantly greater than zero, t(111) = 7.925, p < 0.001, indicating minimal but statistically above-chance discrimination. d′ data were analyzed with a 2 (Task: Percent vs. Patrol) × 2 (Bias: Liberal vs. Conservative) × 2 (Feedback: Present vs. Absent) mixed factor ANOVA with Task and Bias as within-subjects factors and Feedback as a between-subjects factor. d′ scores were modestly but significantly higher during liberal test blocks (M = 0.177) than during conservative test blocks (M = 0.096), F(1,110) = 8.026, p < 0.01, ηp2 = 0.068, but did not vary as a function of task (p = 0.62) or feedback (p = 0.26). There were no significant interactions (all ps > 0.62).

TABLE 1. Mean d′ and criterion values in liberal and conservative blocks, Experiment 1.

Before comparing criterion shifting between the two tasks in an ANOVA, we sought evidence of the efficacy of our shifting manipulations, i.e., that shifts were significantly greater than zero in both the Percent and Patrol tasks. We submitted values of c in liberal and conservative test blocks to planned paired samples t-tests for each of the four Task × Feedback pairings (Percent and Patrol, with and without feedback). Differences in c between liberal and conservative blocks were highly significant in each case (all ts > 7, all ps < 0.001).

Mean criterion shifts (c in conservative blocks minus c in liberal blocks) are displayed as a function of task and feedback condition in Figure 2. As is evident from the figure, criterion shifting was robust, and increased with feedback. A Task × Feedback ANOVA revealed a main effect of task, F(1,110) = 5.56, p < 0.05, ηp2 = 0.048, and feedback, F(1,110) = 14.7, p < 0.001, ηp2 = 0.118, indicating that participants shifted significantly more in the Percent task than in the Patrol task, and that feedback increased shifting across both tasks. The main effect of task, however, was driven by the feedback condition, reflected by a significant Task × Feedback interaction, F(1,110) = 15.9, p < 0.001, ηp2 = 0.126. Without feedback, shifting was directionally (but non-significantly) greater in the Patrol task, t(54) = 1.42, p = 0.16, but participants receiving feedback shifted far more in the Percent task, t(56) = 3.91, p < 0.001.

FIGURE 2. Mean criterion shifts in the feedback and no feedback conditions of the Percent and Patrol tasks in Experiment 1. Error bars represent the SEM.

Criterion shifts, while substantial, were not sufficient to minimize critical errors in the Patrol task (defined as misses in the liberal condition and false alarms in the conservative condition; see Figure 3). Without feedback, the critical false alarm and miss rates were 0.39 and 0.31, respectively, despite the strong instructional motivation to consider such errors unacceptable. These error rates were statistically equivalent to those in the Percent condition (both ps > 0.18). Feedback moderately reduced critical errors in both tasks, with one exception: misses in the liberal condition were approximately 0.10 higher with feedback than without. Thus, liberal misses in the feedback condition were much more frequent in the Patrol than in the Percent task, t(56) = 6.681, p < 0.001. Conservative false alarms in the feedback condition were roughly equivalent across the two tasks, t(56) = 1.606, p = 0.11.

FIGURE 3. Critical errors in liberal test blocks (misses, top) and conservative test blocks (false alarms, bottom) across recognition tasks in Experiment 1. Error bars represent the SEM.

In order to confirm that greater criterion shifts are associated with reduced critical errors, we calculated the correlation of shift amount and critical error rates across all Task × Feedback conditions. Strong relationships were observed between shifting and critical false alarms, r(223) = -0.65, p < 0.001 and between shifting and critical misses, r(223) = -0.62, p < 0.001. The negative direction of these relationships indicates that participants shifting more between liberal and conservative blocks committed fewer critical errors.

The results of Experiment 1 suggest that insufficient criterion placement in recognition memory is not limited to manipulations of target probability: in the security patrol scenario, the avoidance of critical misses and false alarms was described as imperative, yet both types of errors were committed at high rates. Indeed, critical error rates in the Patrol task did not differ from those of the Percent condition, which specified no critical errors to be avoided and involved no justification for criterion shifting beyond the target base rates. As expected, trial-by-trial feedback was generally associated with larger criterion shifts and a reduction in critical errors. However, participants receiving feedback still committed such errors on approximately 25% of trials, and in the liberal condition of the Patrol task, feedback apparently made participants more conservative. Even with little or no diagnostic memory evidence to rely upon, a compelling subjective motivation to adopt a simple decision rule that would avoid critical errors, and pointed negative feedback each time such mistakes were made, participants’ rates of such errors were well above floor levels.

Why didn’t participants execute large enough criterion shifts to prevent critical errors? The ironic effect of feedback in the liberal blocks of the Patrol task may be revealing. While the base rate of old items in the Percent task shifted between 70 and 30%, the probability of a suspicious individual in the Patrol task was 30% throughout the test. As noted above, we chose this base rate in order to hold suspicious items to a minority of trials, intended as a realistic feature of a security patrol. However, feedback likely allowed participants to learn that targets were relatively uncommon (e.g., Kantner and Lindsay, 2010), countervailing the instruction to avoid misses in liberal test blocks. That participants were induced to adopt a more conservative criterion with feedback suggests that they adapted their criteria to the low probability of a target, an appropriate strategy for increasing the overall proportion of correct responses but one contrary to the central goal of minimizing misses. This possibility is consistent with work by Maddox and Bohil (1998, 2005) demonstrating that participants in a perceptual categorization task set suboptimal criteria when optimality (in maximizing financial rewards) is at cross-purposes with the maximization of accuracy.

Thus, participants appear to have prioritized attempts at accuracy over the consistent application of a task-relevant decision rule. An emphasis on accuracy in the liberal Patrol task would be consistent with the hypothesis that participants fail to discount memory evidence: regardless of the extremely poor quality of the memory evidence for discriminating old and new items, participants may have persisted in using such evidence as a basis for judgments in the hope of discerning the correct response. As a result, easily avoidable critical errors were prevalent.

The Patrol task in Experiment 1 was designed to test the limits of participants’ reliance on memory evidence by producing both near-zero old–new discrimination and a clear and compelling subjective valuation on the avoidance of misses or false alarms. The finding that criterion shifts in that task were no greater than in a traditional target probability manipulation suggests a limit on criterion flexibility common to both tasks. A limitation of the Patrol task, however, is the lack of a personal consequence for critical errors. While participants clearly understood the task and shifted adaptively, they might have been induced to further constrain the use of memory evidence if they themselves were affected by their decisions. We tested this possibility by placing a financial consequence on critical errors in Experiment 2.

Experiment 2

The simulated nature of the Patrol condition in Experiment 1 left open the possibility that participants overused memory evidence because the consequences of critical errors were fictional. We addressed this possibility in Experiment 2 by using asymmetric payoff schedules to induce criterion shifts (e.g., Healy and Kubovy, 1978). Participants studied and were tested on the same materials as in the Percent and Patrol tasks, but with no cover story or shifts in target probability. Critically, money was awarded for each correct response, while the penalty for errors varied to drive either liberal or conservative responding. Experiment 2 also included the Percent and Patrol tasks from Experiment 1. Due to concerns about the length of the experiment (especially given the extremely difficult nature of the recognition tests), each participant completed only two of the three tasks. The question of interest was whether a personal financial incentive would drive criterion shifts exceeding those of the Percent and Patrol tasks.

The Payoff condition was also valuable in assessing one potential explanation for the poor criterion placement observed in Experiment 1: perhaps participants realized that they should disregard memory evidence in making their recognition decisions and that they should instead rely on the most probable outcome/patrol directive, but chose not to because such extreme criteria entail a monotonous response pattern (i.e., nearly all “old” responses in liberal blocks and nearly all “new” responses in conservative blocks). While the desire to intermix responses alone would likely not account for the high rates of critical errors observed in Experiment 1, it is possible that some participants deliberately avoided extreme criteria for that reason, demonstrating a limit on criterion flexibility specific to very low d′ situations (when d′ is higher, by contrast, optimality does not require extreme criteria). If so, the Payoff condition should mitigate this strategy: we expect that few participants would knowingly sacrifice bonus money in exchange for a more varied response pattern.

Experiment 2 also included a slight modification to the Patrol task, designed to address the unexpected tendency for feedback to increase the miss rate in the liberal condition. As discussed above, feedback likely drove this increase by conveying the low base rate of targets, drawing participants into a conservative guessing strategy. Eliminating the diagnostic value of the base rate for guessing correctly, then, should restrict the influence of feedback to dissuading misses, reversing its effect in liberal Patrol blocks. To test this hypothesis, we set the target base rate to 0.50 in each block of the Patrol task in Experiment 2.

Method

Participants

Two hundred thirty-four undergraduates at the University of California, Santa Barbara participated for course credit. The number of subjects in each of the six groups appears in Table 2.

TABLE 2. Mean d′ and criterion in conservative and liberal blocks, criterion shifting, critical false alarms, and misses in Experiment 2.

Materials

Materials were identical to those of Experiment 1.

Procedure

Each participant completed two recognition study-test cycles corresponding to two of three tasks: Percent, Patrol, and Payoff. The two tasks to be completed and their order were determined randomly. The Percent task was identical to that of Experiment 1. The Patrol task was identical to that of Experiment 1 except that the base rate of old items was 0.50 (as opposed to 0.30 in Experiment 1). In the Payoff task, participants were informed that each correct response they gave would be worth 10 cents, while the penalty for errors would vary according to the location of the individuals: in liberal blocks, participants lost 20 cents for a miss and nothing for a false alarm; in conservative blocks, participants lost nothing for a miss and 20 cents for a false alarm. The base rate of old items was 0.50 in each test block. Feedback in the payoff task was identical to that of the Percent task except for the addition of the words “+10 cents” following correct responses, “-20 cents” following critical errors, and “±0 cents” following non-critical errors.
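
To make the incentive structure explicit, the sketch below (hypothetical code, not part of the experimental materials or analyses) computes the expected earnings per trial in a liberal block under the stated payoffs, assuming the 0.50 base rate and a response probability that does not differ between targets and lures (i.e., near-zero discrimination).

# Hypothetical illustration of expected earnings per trial in a liberal block
# of the Payoff task: +10 cents per correct response, -20 cents per miss,
# 0 cents per false alarm, with a 0.50 target base rate. With d' near zero,
# the probability of responding "old" is assumed equal on target and lure trials.
def expected_cents(p_respond_old, p_old=0.50,
                   reward=10, miss_penalty=20, fa_penalty=0):
    hit, miss = p_respond_old, 1 - p_respond_old
    fa, correct_rejection = p_respond_old, 1 - p_respond_old
    return (p_old * (hit * reward - miss * miss_penalty)
            + (1 - p_old) * (correct_rejection * reward - fa * fa_penalty))

print(expected_cents(1.0))   # always "old":       +5.0 cents per trial
print(expected_cents(0.5))   # unbiased guessing:   0.0 cents per trial
print(expected_cents(0.0))   # always "new":       -5.0 cents per trial

In conservative blocks the penalties reverse, so under these assumptions the expected-value-maximizing strategy flips to responding “new” on every trial, and a participant at chance discrimination gives up roughly 5 cents per trial, relative to the optimum, by holding a neutral criterion.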

Results and Discussion

Old–new discrimination was nearly identical to that of Experiment 1. Across all participants, the mean d′ was 0.146 (corresponding to a mean hit rate of 0.55 and a mean FA rate of 0.49), a value significantly greater than zero, t(233) = 13.772, p < 0.001. d′ scores did not vary for any of the groups as a function of task (all ps > 0.17) or across liberal and conservative test blocks (all ps > 0.32). Feedback did not affect d′ scores in any of the task pairings (all ps > 0.42).

The means of interest for each group are displayed in Table 2. Differences in c-values between liberal and conservative test blocks were again highly significant in each task across all groups (all ts > 4, all ps < 0.001). One-way ANOVAs revealed no significant differences in criterion values as a function of task pairing (e.g., values of c were the same in the Percent task across the Percent vs. Patrol and Percent vs. Payoff groups), all ps > 0.09. Therefore, to facilitate visual comparisons across tasks, we pooled the results for each task across groups, creating an overall mean for each task. Mean criterion shifts are displayed in Figure 4, and mean critical miss and false alarm rates are displayed in Figure 5. As is clear from these figures, criterion shifts were insufficient to eliminate critical errors, regardless of the task. Trial-by-trial feedback appeared to reduce critical errors, but not below rates of approximately 20% across tasks.

FIGURE 4. Mean criterion shifts in the feedback and no feedback conditions of the Percent, Patrol, and Payoff tasks in Experiment 2. Error bars represent the SEM.

FIGURE 5. Critical misses (top) and false alarms (bottom) across recognition tasks in Experiment 2. Error bars represent the SEM.

Unfortunately, because each participant completed two of the three tasks, these data do not lend themselves to an omnibus ANOVA. Therefore, we submitted the results from each of the six within-subjects task pairings (Patrol vs. Payoff, Percent vs. Payoff, and Percent vs. Patrol, each with and without feedback) to paired-samples t-tests with a Bonferroni-corrected alpha level of 0.0167. This alpha level is equal to 0.05/3 and reflects the fact that three t-tests (comparing criterion shifts, critical misses, and critical false alarms) were conducted for each of the six groups. We note that this adjusted alpha level did not affect the significance of any t-test result relative to the conventional 0.05 level.

In the absence of feedback, shifting was approximately equivalent in the Percent, Patrol, and Payoff conditions, yielding average critical miss and false alarm rates above 30% across tasks. There were no significant differences in shifting, critical misses, or critical false alarms in any of the cross-task comparisons. In the Percent vs. Patrol group, there was a marginal tendency for lower critical false alarms in the Percent task, t(34) = 1.871, p = 0.07. Otherwise, all cross-task comparisons in the Percent vs. Patrol, Patrol vs. Payoff, and Percent vs. Payoff groups yielded p-values greater than 0.16 for all dependent measures.

Participants receiving feedback tended to shift criterion less in the Patrol condition than in the Percent and Payoff conditions. In the Patrol vs. Payoff group, shifting was significantly greater in the Payoff task, t(37) = 3.337, p < 0.01. In the Percent vs. Patrol group, shifting was directionally but not significantly greater in the Percent task, t(39) = 1.490, p = 0.14. Shift magnitudes were roughly equivalent across the Percent and Payoff tasks, evidenced by a non-significant difference in the Percent vs. Payoff group, p = 0.73.

The decreased shifting in the Patrol task was accompanied by a corresponding increase in critical errors in that task relative to the Percent and Payoff tasks. In the Patrol vs. Payoff group, critical false alarms were significantly lower in the Payoff task, t(37) = 3.581, p < 0.001, while critical misses were directionally but non-significantly lower, t(37) = 1.549, p = 0.13. In the Percent vs. Patrol group, critical false alarms were significantly lower in the Percent task, t(39) = 4.072, p < 0.001, but critical misses did not differ (p = 0.35). Neither critical false alarms nor critical misses differed across tasks in the Percent vs. Payoff group, both ps > 0.27.

The results of Experiment 2 were straightforward: whether the motivation for criterion shifts was the avoidance of critical errors in a security patrol scenario, accumulation of money, or knowledge of uneven base rates, criterion placement was far less extreme than participants’ floor-level discrimination warranted and was insufficient to prevent high rates of costly errors. Moreover, shifting magnitudes were generally equivalent across the three tasks, resonant with the results of Experiment 1 in suggesting a limit to criterion flexibility that is robust across different contextual manipulations of bias. In particular, the fact that response criteria were generally no more extreme when money was at stake than when it was not suggests that inadequate criterion setting is not simply a result of participants wishing to avoid making the same response on most trials. It seems unlikely that a substantial number would have given up money to do so, especially to the extent evidenced by the critical error rates. Finally, changing the proportion of suspicious items in the Patrol task to 0.50 reversed the negative effect of feedback on critical misses observed in Experiment 1, when the prior probability was 0.30. This result supports the proposal that participants will seek out and use any information they believe can help them discern old–new status, even when doing so runs counter to the primary objective of the task.

Experiment 3

The results of Experiments 1 and 2 suggest that limited criterion flexibility in recognition memory stems from a failure to discount memory evidence as a basis for decisions, even when memory evidence is completely ambiguous. In Experiment 3, we tested a scenario in which memory evidence was completely absent. Each participant completed two security patrol tasks, one of which did not include a study phase. In order to provide a reasonable context for a study-free recognition test, participants were told that while they would normally be studying pictures of suspicious individuals en route to the security patrol, their equipment malfunctioned, and they would have to complete the patrol without having studied the suspicious individuals (see Method for full instructions). In this security patrol, then, participants were fully aware that memory evidence was non-existent. If dependence on memory evidence restricts appropriate criterion shifting, then shifting in the study-free patrol should be greater than previously observed.

Our study phase-free recognition procedure is very similar to one developed by Healy and Kubovy (1978), who sought to test the hypothesis that suboptimal criterion placement arises from the memory demands of a recognition study/test procedure limiting participants’ ability to track the number of old items presented at test. In their “no memory” condition, participants received lines of dashes in place of to-be-recognized items in the study phase, followed by a standard recognition test. Participants were told that there would be correct and incorrect answers on the test, but that they would have no memory information on which to base their judgments. As in our Percent condition, participants were informed of the base rates of old items, which shifted across blocks. Healy and Kubovy (1978) compared criterion shifts in this no-memory condition with shifts in a condition containing a standard study phase and found that shifting was directionally but not significantly greater in the no-memory condition. As criterion shifts were still suboptimal in the absence of a study list, the authors interpreted these results as evidence against both the memory load hypothesis (because no items were memorized) and the hypothesis that under-shifting is driven by a failure to appreciate the distributions of old and new items (because there were no truly “old” items).

Although we used a study list-free procedure to assess a different hypothesis (i.e., that overreliance on memory evidence restricts criterion placement), Healy and Kubovy’s (1978) results suggest that participants are hard-pressed to use extreme criteria even in the absence of memory evidence. Participants may, then, embrace some other form of non-probative, internally generated decision evidence when no memory evidence is available. Although it is unclear what form this alternative evidence would take (and it may indeed be somewhat idiosyncratic), Healy and Kubovy’s (1978) results indicate that it may prevail over the consistent use of an adaptive criterion. The comparison of shifting in the study phase-free and standard Patrol tasks in Experiment 3 served as a test of this possibility. The use of more extreme criteria in the study phase-free patrol than the standard patrol would suggest the limiting influence of memory evidence overuse; if the two conditions elicit similar criteria (as in Healy and Kubovy’s data), other forms of dubious decision evidence would be implicated in poor criterion placement.

Although the lack of an encoding phase eliminates any possibility of basing decisions on memory evidence, participants might base suspicious/innocent judgments on physical characteristics of the stimuli. While no such stimulus features were in fact predictive of suspicious/innocent status, participants lacking an alternative basis for judgments might infer that certain items “look” suspicious (e.g., those wearing dark clothing), essentially treating the study-free patrol as a categorization task. To help ensure that the nature of the patrol (i.e., avoid misses or avoid false alarms) was considered the only potential basis for judgments, instructions informed participants that characteristics such as gender, skin color, clothing color, body posture, and facial expression “do not tell you anything about whether an individual is more likely to be suspicious” and that the nature of the patrol was the only basis for making judgments.

Method

Participants

Ninety-one undergraduates at the University of California, Santa Barbara participated for course credit. Forty-nine completed the study-test cycle first, followed by the test with no prior study phase; the remaining 42 completed the tasks in the reverse order.

Materials

Materials were identical to those of Experiments 1 and 2. The assignment of suspicious vs. innocent status to test probes in the study-free patrol was functionally random for each participant.

Procedure

Participants completed two Patrol tests, one with a preceding study phase and one without, in a random order. Instructions were similar to those used in the Patrol task of the two previous experiments, but were adapted to provide context for the study-free patrol:

“You are currently riding in a car on the way to the security patrol. While you are on the way, you are supposed to be presented with pictures of suspicious individuals, because when you reach the security patrol, you are tested on your memory for these suspicious individuals. But there has been a problem on the way to the patrol. You were supposed to study a list of the suspicious people that you will encounter on the patrol, but your equipment malfunctioned, and you were not able to study any of these individuals before getting to the patrol.

However, you must still complete the patrol.

You will encounter individuals that are either suspicious or innocent, and you must decide whether each individual is suspicious or innocent, even though you did not get the chance to study the suspicious ones.”

Prior to the start of the test, participants received the following information (not included in Experiments 1 and 2):

“One piece of advice for these patrols: you will not be able to tell whether an individual is suspicious or innocent based on their gender, skin color, clothing color, body posture, or facial expression. These features do not tell you anything about whether an individual is more likely to be suspicious or more likely to be innocent.

And because of your equipment malfunction, you did not have a chance to study the suspicious individuals earlier. The only reliable information you have in making your judgments is the knowledge of what kind of patrol you are on (city or outskirts).”

Instructions for the standard security patrol differed from the above in only two respects. First, no equipment failure was mentioned and the study phase proceeded as normal. Second, the advice given to participants prior to the test concluded with the statement “The only reliable information you have in making your judgments is whether or not you studied the individual on the way to the patrol, and the knowledge of what kind of patrol you are on (city or outskirts).”

In order to reduce the amount of instructional material to be read by participants, descriptions of the city and outskirts patrols appeared only before each type of patrol was to begin; an earlier overview included in Experiments 1 and 2 was omitted. The viewing of the city and outskirts backgrounds prior to the test was also omitted. The test procedure was identical to that of Experiments 1 and 2, except that no participants were given feedback.

Results and Discussion

The mean d′ and c-values in each condition appear in Table 3. As would be expected given the lack of an encoding phase, discrimination of suspicious and innocent items was approximately zero in the study-free patrol (mean d′ = 0.013) and not significantly greater than chance, p = 0.59. d′ scores in the standard patrol were comparable to those of Experiments 1 and 2 (M = 0.179), and were again significantly greater than zero, t(90) = 7.215, p < 0.001. A 2 (Patrol Type: Standard vs. Study-Free) × 2 (Bias: Liberal vs. Conservative) × 2 (Patrol Order: Standard First vs. Study-Free First) ANOVA on d′-values indicated significantly better discrimination in the standard than in the study-free patrol, F(1,89) = 27.8, p < 0.001, ηp2 = 0.238, and better discrimination among participants receiving the standard patrol first (M = 0.135) than among those receiving the study-free patrol first (M = 0.065), F(1,89) = 5.944, p < 0.05, ηp2 = 0.063. No other significant trends were observed.

TABLE 3. Mean d′ and criterion values in liberal and conservative blocks, Experiment 3.

Mean criterion shifts are presented as a function of patrol type and patrol order in Figure 6. As predicted, shifts were much wider in the study-free patrol than in the standard patrol, though the order of the two patrols was highly influential. When the study-free recognition task came first, shifting was similar across the two patrol types; when the standard patrol came first, shifting was approximately three times greater in the study-free patrol. A Type × Order ANOVA confirmed these trends, revealing a significant main effect of type, F(1,89) = 19.3, p < 0.001, ηp2 = 0.178, and a significant Type × Order interaction, F(1,89) = 14.2, p < 0.001, ηp2 = 0.137. Because patterns of criterion placement depended critically on the order of the tasks, we report critical error data separately for the study-free-first and standard-first groups.

FIGURE 6. Mean criterion shifts in the standard and study-free patrol tasks in Experiment 3. Error bars represent the SEM.

Critical misses and false alarms are depicted in Figure 7. When participants completed the study-free patrol first, critical false alarms were significantly lower in the study-free patrol than in the standard patrol, t(41) = 5.221, p < 0.001, but critical misses were significantly higher, t(41) = 4.514, p < 0.001. When the standard patrol came first, both critical misses and critical false alarms were reduced in the study-free patrol. The difference in false alarms was significant, t(48) = 3.158, p < 0.01, while the difference in misses approached but did not reach significance, t(48) = 1.844, p = 0.07.

FIGURE 7. Critical misses (top) and false alarms (bottom) in the standard and study-free patrol tasks in Experiment 3. Error bars represent the SEM.

Participants receiving the study-free patrol first were apparently reluctant to call individuals suspicious in that patrol, resulting in a high critical miss rate that rendered the mean criterion shift roughly equal to that of the standard patrol. In all other respects, the results of Experiment 3 reflect reduced critical error rates in the study-free patrol. These results support the hypothesis that criterion flexibility in recognition memory tasks is constrained by overreliance on memory evidence; when the possibility of drawing on such evidence was removed, participants increased the magnitude of shifts.

Even in the study-free patrol, however, knowledge of the proper bias was by no means the only basis for judgments: critical miss and false alarm rates were no lower than 0.32 and 0.24, respectively. These results are consonant with those of the no-memory condition of Healy and Kubovy (1978), and indicate that some other form(s) of information were frequently used to determine suspicious-innocent status. Given that memory evidence was unavailable and that participants were explicitly told to avoid basing responses on perceptual attributes of the stimuli, it is unclear how decisions against the prevailing bias were made in such cases. Informal conversations with participants at debriefing, however, suggested that many participants based judgments on perceptual stimulus attributes despite the instructions telling them that these attributes were uninformative. These participants sometimes reported needing some basis for their decisions, even one they had been told would not lead to correct responses, because they had “nothing else to go on.” Based on the results of Experiment 3, an interim conclusion is that participants in a recognition task rely on a combination of mnemonic and non-mnemonic sources of decision evidence when memory evidence is at least ostensibly available. When no form of memory evidence is available, as in the study-free patrol, participants continue to rely on non-mnemonic varieties of “evidence,” even when such evidence contains no definable signal.

Removal of the study phase in the study-free patrol did not influence all participants equally, however. Figure 8 displays criterion shifting for each participant in the standard and study-free patrols, ordered from the smallest shift for a given patrol on the left to the largest shift on the right. For example, the first pair of bars on the left represents the smallest shift by a participant in the standard patrol followed by the smallest shift by a participant in the study-free patrol; the nth pair of bars represents the nth largest shift in each task (note that shifts are ordered independently for the standard and study-free patrols, such that a given pair of bars represents the same shifting rank for each patrol type, but not necessarily the same participant). This plot illustrates a critical distinction between performance on the two patrol types: while broad individual differences characterized shifting on both tasks, many more individuals shifted at or near the maximum level (corresponding to a value of 4.91 in the present experiment) in the study-free patrol than in the standard patrol. Strikingly, 16 participants minimized critical misses, critical false alarms, or both in the study-free patrol; only two did so in the standard patrol. Removing memory evidence, then, induced many more participants to adopt an extreme criterion. Of the 16 participants eliminating critical errors in the study-free patrol, 13 had received the standard patrol first, suggesting that having experience with a standard study-test cycle first made the loss of the study phase in the study-free patrol more salient, provoking participants to abandon any attempt to utilize decision evidence. These results demonstrate that a strategy of disregarding faulty decision evidence is not beyond the means of individuals; rather, some individuals do indeed execute such shifts, while most do not.

FIGURE 8. Criterion shifts for each participant in the standard and study-free patrol tasks in Experiment 3. In each task, the maximum shift possible was 4.91.
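
For readers who wish to reproduce the shift measure plotted in Figure 8, the sketch below computes the criterion c and the shift Δc from hit and false-alarm rates under the standard equal-variance Gaussian model, using a 1/(2N) correction to keep extreme rates finite. It is a minimal illustration with assumed rates and trial counts, not the analysis code used for the experiments, and the exact correction applied to the reported data may differ.

# Minimal sketch (not the authors' analysis script): criterion c and shift
# delta-c from hit and false-alarm rates, with a 1/(2N) correction so that
# rates of 0 or 1 remain finite after the z-transform.
from statistics import NormalDist

def criterion(hit_rate, fa_rate, n_targets, n_lures):
    """Gaussian-model criterion c = -0.5 * (z(H) + z(F))."""
    z = NormalDist().inv_cdf
    # Clamp extreme rates (assumed correction; the paper's exact rule may differ).
    h = min(max(hit_rate, 1 / (2 * n_targets)), 1 - 1 / (2 * n_targets))
    f = min(max(fa_rate, 1 / (2 * n_lures)), 1 - 1 / (2 * n_lures))
    return -0.5 * (z(h) + z(f))

# Hypothetical liberal- and conservative-block rates for one participant
# (hit and false-alarm rates are equal because discrimination is at chance).
c_liberal = criterion(hit_rate=0.92, fa_rate=0.92, n_targets=40, n_lures=40)
c_conservative = criterion(hit_rate=0.07, fa_rate=0.07, n_targets=40, n_lures=40)
print(round(c_conservative - c_liberal, 2))  # delta-c, approximately 2.9 for these assumed rates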

General Discussion

Criterion shifting in recognition memory tasks is generally inadequate, resulting in lost potential for correct responses. We tested the hypothesis, originally advanced by Green and Swets (1966), that this conservatism in criterion placement is driven in part by an overreliance on decision evidence. To gauge the potential for participants to disregard ambiguous decision evidence and defer to the simple decision rules following from a criterion manipulation, we created a recognition paradigm in which participants had little or no recourse to memory and faced compelling consequences for critical errors. Criterion shifting under these conditions was insufficient to avoid striking rates of critical errors. In Experiment 1, shifting was no greater in the patrol task than in a standard target probability manipulation, and mean critical error rates were no lower than 25% even with trial-by-trial feedback admonishing participants for such errors. In Experiment 2, a payoff task with a personal financial consequence for critical errors yielded performance similar to that of the patrol and percent tasks. Even the removal of the study phase from the patrol task in Experiment 3 did not dissuade most participants from relying, in part, on information they knew was not diagnostic of target status. These findings suggest that most participants will use any information at hand to try to arrive at the correct response before they will resign themselves to heavy reliance on a clear and adaptive decision rule, even when the evidence is transparently devoid of value.

Researchers have advanced several hypotheses regarding the basis of this limitation. These include the over/underestimation of target base rates (Ulehla, 1966), a strategy of distributing “old” and “new” responses according to the base rates (i.e., “probability matching”; Thomas and Legge, 1970), incorrect intuition as to the shape of the target and lure evidence distributions (Kubovy, 1977), and a noisy, rather than static, decision criterion (Benjamin et al., 2009). Although any of these factors may have influenced performance in the present experiments, we do not believe that they provide a compelling account of the results. First, target base rates were not manipulated in the Patrol (Experiments 1–3) and Payoff (Experiment 2) tasks, yet critical error rates were as high in these conditions as in the Percent task. These findings suggest that limits on criterion shifting are not driven by a failure to estimate or an attempt to match base rates (and estimation of the base rates was not necessary even in the Percent task). Participants could have detected and tried to match the base rates in the Patrol and Payoff tasks, but these base rates were even (except in the Experiment 1 Patrol task) and thus had no diagnostic value. Second, participants’ ability to estimate the shape of the evidence distributions would not have been a factor in the study-free Patrol task, given that there was no memory evidence in that task, yet most participants committed high rates of critical errors. Third, trial-by-trial variability in criterion placement has been shown to predict suboptimal shifting (Benjamin et al., 2009), but it seems unlikely that such noise would have been so great as to account for the frequency of critical errors observed in the present experiments. In addition, a criterion noise account would not explain why some participants did completely avoid critical errors, particularly in the study-free patrol (see Figure 8). Finally, in the “utilized” signal detection model of optimal criterion placement advanced by Lynn and Barrett (2014), suboptimality is driven by inaccurate estimates of base rates, payoffs, and/or target-lure similarity. However, our data indicate that, even jointly, these factors cannot completely account for participants’ criterion placement: base rates and payoffs did not need to be estimated in the present experiments, and target-lure similarity was irrelevant in Experiment 3, yet most participants’ criteria came nowhere near minimizing critical errors or maximizing payoffs.
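
To make concrete why a near-zero d′ demands an extreme criterion, the sketch below evaluates the classic optimal-criterion result for the equal-variance Gaussian model (Green and Swets, 1966): the optimal likelihood-ratio criterion β depends on base rates and payoffs, and its location on the evidence axis, c = ln(β)/d′, grows without bound as d′ approaches zero. The base rate, payoff values, and d′ values below are illustrative assumptions, not the parameters of the present experiments.

# Illustrative sketch of the standard optimal-criterion result (equal-variance
# Gaussian model). The payoff scheme below is hypothetical: correct responses
# are worth 1 unit, misses cost 1 unit, and critical false alarms cost 5 units.
import math

def optimal_c(p_target, value_cr, cost_fa, value_hit, cost_miss, d_prime):
    beta_opt = ((1 - p_target) / p_target) * ((value_cr + cost_fa) / (value_hit + cost_miss))
    return math.log(beta_opt) / d_prime  # criterion location relative to the distribution midpoint

for d in (1.0, 0.5, 0.1):
    c = optimal_c(p_target=0.5, value_cr=1.0, cost_fa=5.0, value_hit=1.0, cost_miss=1.0, d_prime=d)
    print(d, round(c, 2))  # the optimal criterion becomes more extreme as d' shrinks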

Instead, our results support the hypothesis that most participants under-shift because they over-interpret decision evidence: apparently, people would rather attempt to be correct than be correctly biased. The present results demonstrate that even when participants are aware that the signal is effectively non-existent, context-based criterion shifts are under-utilized. We speculate that people would rather try their best to generate responses using their own sense of signal strength than simply adopt a decision rule that, in the present experiments, could have ensured their total avoidance of critical errors. With an encoding phase to draw upon (as in nearly all recognition memory experiments), participants tend to over-rely on memory evidence as the source of that signal, but even in the absence of memory evidence participants will pursue some endogenously derived “signal” as a basis for judgments. Thus, as Green and Swets (1966) described, participants refuse to “turn their backs” on the visual signal and defer to context-appropriate bias. Such an account would explain why shifting in the Percent task was at least equal to that in the Payoff and Patrol tasks despite lacking the motivators inherent to those tasks. In the Percent condition, the base rates provided a basis for guessing the correct answer; in this condition only, being correct and being correctly biased were one and the same. In the Patrol condition, by contrast, even a rebuke following a critical false alarm did not provide diagnostic information that the participant could use to discern probe status on subsequent trials.

Because most participants in Experiment 3 failed to use extreme response criteria whether or not they participated in a study session prior to test, we proposed that they based a substantial portion of their judgments on forms of evidence orthogonal to the task, such as perceptual stimulus characteristics like skin color, clothing, and gender (even though they were informed that these features were balanced across suspicious and innocent individuals). This notion may seem at odds with a signal detection theory characterization of recognition memory, which includes a unidimensional (mnemonic) evidence axis. Indeed, most models of recognition memory incorporate this unidimensional evidence assumption and are not equipped to describe performance in the near-zero d′ scenarios tested here. However, the dynamics of decision making under these circumstances might be described by such models with an extension to a multidimensional signal detection framework (Ashby and Soto, 2015). Multidimensional signal detection theory can be applied to situations in which stimuli may be judged according to more than one dimension, and it is often used with perceptual classification tasks in which two stimulus features jointly determine category membership.

We illustrate how this framework might be applied to recognition decisions under ambiguous (or absent) memory evidence in Figure 9, using the conservative, study-free security patrol paradigm from Experiment 3 as an example. In this patrol, subjects were instructed to avoid false alarms at all costs; given these instructions and the absence of memory evidence, the most appropriate response was to say “innocent” on every trial, yet most subjects’ false alarm rates were still quite high. Figure 9 depicts three possible explanations for this behavior within a multidimensional signal detection framework. For simplicity, each scenario assumes that recognition decisions can be made according to a test probe’s position on a memory (familiarity) evidence axis or a perceptual (skin color) evidence axis; in practice, however, any number of dimensions may become relevant for classifying a probe as recognized. Panel A depicts the use of familiarity as a basis for judgments in the study-free patrol. Without a study session, there is clearly no memory or familiarity signal relevant to the recognition decision. However, even novel items will carry some familiarity based on extra-experimental associations and encounters with previous test items. Given that subjects were instructed to make a memory judgment even with no prior study session, they may have felt compelled to consult the familiarity dimension nonetheless. Alternatively, subjects may have disregarded the instruction not to base judgments on perceptual characteristics (Panel B). The use of skin color as a basis for determining whether or not a test item was a suspicious individual could also lead to a high proportion of false alarms. Finally, subjects may have used more than one dimension to decide (Panel C). For example, subjects may have responded “suspicious” if the test figure seemed familiar despite the lack of a study session or if the skin color was dark enough to surpass a criterion along that dimension. As noted above, any number of dimensions not depicted here may also have been used. The use of any such dimensions as an alternative to an appropriate decision criterion will increase critical errors.

FIGURE 9. Illustration of decision making in Experiment 3, study-free security patrol, from a multidimensional signal detection theory perspective. Dashed lines represent noise distributions; solid lines represent signal distributions. (A) Test probes are called “suspicious” if their familiarity exceeds a criterial level. (B) Test probes are called “suspicious” if the darkness of their skin exceeds a criterial level. (C) Either familiarity or skin color may be used to make judgments.
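
As a concrete illustration of the disjunctive rule in Panel C, the simulation below assumes that both the familiarity and the perceptual dimension are pure noise (no study phase; features balanced across classes) and that a probe is called “suspicious” whenever either dimension exceeds its criterion. The criterion values are arbitrary assumptions; the point is simply that consulting either non-diagnostic axis produces critical false alarms that an unconditional “innocent” response would avoid.

# Sketch of the Panel C "OR" rule: respond "suspicious" if either familiarity
# or perceived skin darkness exceeds its criterion. Both dimensions are pure
# noise here, so every "suspicious" response to an innocent probe is a
# critical false alarm.
import random

random.seed(1)
N_INNOCENT_PROBES = 10_000
C_FAMILIARITY = 1.5  # assumed criterion on the familiarity axis (in z units)
C_SKIN = 1.0         # assumed criterion on the perceptual axis (in z units)

false_alarms = 0
for _ in range(N_INNOCENT_PROBES):
    familiarity = random.gauss(0.0, 1.0)  # no memory signal: noise only
    skin = random.gauss(0.0, 1.0)         # feature balanced across classes: noise only
    if familiarity > C_FAMILIARITY or skin > C_SKIN:
        false_alarms += 1

print(false_alarms / N_INNOCENT_PROBES)  # roughly 0.21, vs. 0 for an always-"innocent" policy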

While the group average criterion shifts in the present experiments indicate a general limit on criterion flexibility, shifting was marked by broad inter-individual variability. Although we displayed this variability only for Experiment 3 (see Figure 8), it was robust in each experiment: across the 468 tests completed by participants in Experiment 2, for example, shifting ranged from maximal (Δc > 3.5, N = 22) to minimal (Δc < 0.10, N = 46). Such variability raises intriguing questions as to the factors that led some participants to shift widely and others minimally. In Experiment 3, for example, 18 participants achieved either critical miss or critical false alarm rates close to 0 – what led this subset of participants to set aside decision evidence as a basis for judgments while most relied on it? The results of Experiment 3 indicate that two aspects of the task drove many of these participants to use maximal criteria. First, 16 of the 18 cases occurred in the study-free recognition test, suggesting that removing the possibility of memory evidence is enough to convince some participants that there is no basis for judgment beyond context. Second, 13 of the 16 study-free cases occurred when a complete study-test patrol cycle preceded the study-free patrol. This finding indicates that individuals are more likely to respond appropriately in a zero-evidence scenario if they have had experience with a version of the task in which decision evidence (however faulty) was available.

The question remains, however, as to why these elements of the task induced some participants, and not others, to adopt extreme criteria. One possibility is that stable individual differences underlie shifting tendencies. Aminoff et al. (2012) demonstrated wide variability in criterion shifting and considered willingness to shift criterion as a latent variable (see also Kantner and Lindsay, 2012, 2014). These authors examined a number of personality characteristics that appeared to mediate such willingness, including scores on the BAS Fun Seeking subscale (Carver and White, 1994). Future efforts to identify the cognitive and personality correlates of criterion flexibility could help reveal the basis of individual differences in adaptive shifting behavior.

Our initial hypothesis was that the security patrol scenario would in effect restrict the range of individual differences observed in shifting by compelling participants to adopt extreme response criteria. This hypothesis was disconfirmed, suggesting a deep resistance to using extreme criteria. However, these results do not rule out the possibility of a more effective security patrol that could overcome this resistance. For example, participants in the present experiments were not informed of the reason studied individuals were considered suspicious. While the instructions in the liberal and conservative conditions implied serious consequences for critical errors, reference to a specific crime (e.g., murder) might invest participants more fully in avoiding those errors. Perhaps more significantly, the security patrol was framed instructionally as a test of memory (this was less the case in the study-free patrol, but instructions in that task mentioned that memory would have been part of the decision process had the study phase been possible). People might simply be unwilling or unable to limit their use of memory under task demands that emphasize recognition. Indeed, such a predisposition would be adaptive as long as memory evidence carried some diagnostic value, and may be difficult to “turn off” in the relatively rare cases where it does not. A security patrol paradigm that emphasizes decision making at test (rather than recognition per se) and that characterizes memory as one source of information toward that end (rather than an end goal in itself) might engender more effective criterion placement.

Conclusion

We note that while participants often appear to utilize highly dubious forms of decision evidence in service of recognition judgments, adaptive criterion shifting was by no means absent in the present experiments. Shifting was, in fact, substantial in every condition of the present experiments, reducing critical errors in the Patrol and Payoff tasks below chance levels even though old–new discrimination was at chance. In Experiment 2, the Percent task with feedback yielded an overall percent correct of 64%; based on discrimination alone, it would have been close to 50%. Thus, these results point to the value of criterion shifting in improving decision making, and of further investigation into the locus of shifting tendencies, especially as a cognitive aid for memory-impaired populations (e.g., Beth et al., 2009). At the same time, they demonstrate a psychological basis for a ceiling on human flexibility in decision strategies: the desire to arrive at one’s own conclusion, however inconclusive the evidence.
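
A brief worked example shows how bias alone can lift accuracy above chance when discrimination is nil. Suppose, purely for illustration (this value is not taken from the present design), that 70% of probes in a block are targets. Percent correct is p·H + (1 − p)·(1 − F), and with d′ = 0 the hit and false-alarm rates are equal, so a neutral criterion yields 50% correct while a maximally liberal criterion yields 70%. A minimal check of that arithmetic:

# Worked check (hypothetical 70% target base rate, zero discrimination):
# percent correct = p*H + (1 - p)*(1 - F), and with d' = 0 we have H = F.
P_TARGET = 0.70  # assumed base rate, for illustration only

def percent_correct(p_respond_old):
    hit_rate = false_alarm_rate = p_respond_old  # d' = 0 implies H = F
    return P_TARGET * hit_rate + (1 - P_TARGET) * (1 - false_alarm_rate)

print(percent_correct(0.5))  # neutral criterion: prints 0.5
print(percent_correct(1.0))  # maximally liberal criterion: prints 0.7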

Author Contributions

All authors developed the study concept and design and contributed to the interpretation of the results. JK performed data collection and data analysis, and drafted the paper. JV and MM provided critical manuscript revisions. All authors approved the final version of the paper for submission.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Research was sponsored by the U.S. Army Research Laboratory and U.S. Army Research Office and supported by the Institute for Collaborative Biotechnologies under Cooperative Agreement Numbers W911NF-12-2-0019, W911NF-10-2-0022, and W911NF-09-D-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. We thank Deanna Falge, Amber Miller, and Devon Sandel for their assistance with data collection, and Matthew Jaswa for generating the stimuli.

References

Aminoff, E. M., Clewett, D., Freeman, S., Frithsen, A., Tipper, C., Johnson, A., et al. (2012). Individual differences in shifting decision criterion: a recognition memory study. Mem. Cogn. 40, 1016–1030. doi: 10.3758/s13421-012-0204-6

Ashby, F. G., and Soto, F. (2015). “Multidimensional signal detection theory,” in The Oxford Handbook of Computational and Mathematical Psychology, eds J. R. Busemeyer, Z. Wang, J. T. Townsend, and A. Eidels (Oxford: Oxford University Press).

Benjamin, A., and Bawa, S. (2004). Distractor plausibility and criterion placement in recognition. J. Mem. Lang. 51, 159–172. doi: 10.1016/j.jml.2004.04.001

Benjamin, A. S., Diaz, M., and Wee, S. (2009). Signal detection with criterion noise: applications to recognition memory. Psychol. Rev. 116, 84–115. doi: 10.1037/a0014351

Beth, E. H., Budson, A. E., Waring, J. D., and Ally, B. A. (2009). Response bias for picture recognition in patients with Alzheimer disease. Cogn. Behav. Neurol. 22, 229–235. doi: 10.1097/WNN.0b013e3181b7f3b1

Carver, C. S., and White, T. L. (1994). Behavioral inhibition, behavioral activation, and affective responses to impending reward and punishment: the BIS/BAS Scales. J. Pers. Soc. Psychol. 67, 319–333. doi: 10.1037/0022-3514.67.2.319

Dobbins, I. G., and Kroll, N. E. A. (2005). Distinctiveness and the recognition mirror effect: evidence for an item-based criterion placement heuristic. J. Exp. Psychol. 31, 1186–1198. doi: 10.1037/0278-7393.31.6.1186

Green, D. M., and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Oxford: John Wiley.

Healy, A. F., and Kubovy, M. (1978). The effects of payoffs and prior probabilities on indices of performance and cutoff location in recognition memory. Mem. Cogn. 6, 544–553. doi: 10.3758/BF03198243

Hintzman, D. L. (1994). On explaining the mirror effect. J. Exp. Psychol. 20, 201–205. doi: 10.1037/0278-7393.20.1.201

Hirshman, E. (1995). Decision processes in recognition memory: criterion shifts and the list-strength paradigm. J. Exp. Psychol. 21, 302–313. doi: 10.1037/0278-7393.21.2.302

Hockley, W. E. (2011). “Criterion changes: how flexible are recognition decision processes?,” in Constructions of Remembering and Metacognition: Essays in Honor of Bruce Whittlesea, eds P. Higham and J. Leboe (Houndmills: Palgrave Macmillan).

Kantner, J., and Lindsay, D. S. (2010). Can corrective feedback improve recognition memory? Mem. Cogn. 38, 389–406. doi: 10.3758/MC.38.4.389

Kantner, J., and Lindsay, D. S. (2012). Response bias in recognition memory as a cognitive trait. Mem. Cogn. 40, 1163–1177. doi: 10.3758/s13421-012-0226-0

Kantner, J., and Lindsay, D. S. (2014). Cross-situational consistency in recognition memory response bias. Psychon. Bull. Rev. 21, 1272–1280. doi: 10.3758/s13423-014-0608-3

Kubovy, M. (1977). A possible basis for conservatism in signal detection and probabilistic categorization tasks. Percept. Psychophys. 22, 277–281. doi: 10.3758/BF03199690

Lynn, S. K., and Barrett, L. F. (2014). “Utilizing” Signal Detection Theory. Psychol. Sci. 25, 1663–1673. doi: 10.1177/0956797614541991

Macmillan, N. A., and Kaplan, H. L. (1985). Detection theory analysis of group data: estimating sensitivity from average hit and false-alarm rates. Psychol. Bull. 98, 185–199. doi: 10.1037/0033-2909.98.1.185

Maddox, W. T., and Bohil, C. J. (1998). Base-rate and payoff effects in multidimensional perceptual categorization. J. Exp. Psychol. 24, 1459–1482. doi: 10.1037/0278-7393.24.6.1459

Maddox, W. T., and Bohil, C. J. (2005). Optimal classifier feedback improves cost-benefit but not base-rate decision criterion learning in perceptual categorization. Mem. Cogn. 33, 303–319. doi: 10.3758/BF03195319

Parducci, A. (1984). “Perceptual and judgmental relativity,” in Perspectives in Psychological Experimentation, eds V. Sarris and A. Parducci (Hillsdale, NJ: Erlbaum), 135–149.

Parks, T. E. (1966). Signal-detectability theory of recognition-memory performance. Psychol. Rev. 73, 44–58. doi: 10.1037/h0022662

Rhodes, M. G., and Jacoby, L. L. (2007). On the dynamic nature of response criterion in recognition memory: effects of base rate, awareness, and feedback. J. Exp. Psychol. 33, 305–320. doi: 10.1037/0278-7393.33.2.305

Singer, M. (2009). Strength-based criterion shifts in recognition memory. Mem. Cogn. 37, 976–984. doi: 10.3758/MC.37.7.976

Singer, M., and Wixted, J. T. (2006). Effect of delay on recognition decisions: evidence for a criterion shift. Mem. Cogn. 34, 125–137. doi: 10.3758/BF03193392

Thomas, E. A. (1975). Criterion adjustment and probability matching. Percept. Psychophys. 18, 158–162. doi: 10.3758/BF03204104

Thomas, E. A., and Legge, D. (1970). Probability matching as a basis for detection and recognition decisions. Psychol. Rev. 77, 65–72. doi: 10.1037/h0028579

Ulehla, Z. J. (1966). Optimality of perceptual decision criteria. J. Exp. Psychol. 71, 564–569. doi: 10.1037/h0023007

Verde, M. F., and Rotello, C. M. (2007). Memory strength and the decision process in recognition memory. Mem. Cogn. 35, 254–262. doi: 10.3758/BF03193446

Keywords: recognition, decision making, criterion shifting, response bias, feedback

Citation: Kantner J, Vettel JM and Miller MB (2015) Dubious decision evidence and criterion flexibility in recognition memory. Front. Psychol. 6:1320. doi: 10.3389/fpsyg.2015.01320

Received: 30 April 2015; Accepted: 17 August 2015;
Published: 08 September 2015.

Edited by:

Petko Kusev, Kingston University London, UK

Reviewed by:

Eddy J. Davelaar, Birkbeck, University of London, UK
Colleen Parks, University of Nevada, Las Vegas, USA

Copyright © 2015 Kantner, Vettel and Miller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Justin Kantner, Department of Psychology, Washington University in St. Louis, One Brookings Drive, Campus Box 1125, St. Louis, MO 63130, USA, jkantner@wustl.edu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.