Confidence of Older Eyewitnesses: Is It Diagnostic of Identification Accuracy?

Abstract

Since the late 1980s evidence has been accumulating that confidence recorded at the time of identification is a reliable postdictor of eyewitness identification accuracy. Nonetheless, there may be noteworthy exceptions. In a re-analysis of a field study by Sauerland and Sporer (2009; N = 720; n = 436 choosers between 15 and 83 years old) we show that the postdictive value of confidence was reduced for participants aged 40 years or older. Different calibration indices and Bayesian analyses demonstrate a progressive dissociation between identification performance and confidence across age groups. While the confidence expressed following an identification remained unchanged across the lifespan, identification accuracy decreased. Young, highly confident witnesses were much more likely to be accurate than less confident witnesses. With increasing age, witnesses were more likely to be overconfident, particularly at the medium and high levels of confidence, and the postdictive value of confidence and decision times decreased. We conclude that witness age may be an important moderator to take into account when evaluating identification evidence.

also noted a large heterogeneity across studies, ranging between r_pb = .10 and r_pb = .74. Subsequent studies found even higher confidence-accuracy correlations (e.g., Lindsay, Nilsen, & Read, 2000; Read, 1995). Therefore, it seems worth exploring under which conditions these relationships may be particularly high or not so high.
Recently, in several re-analyses of previous research, Wixted and colleagues concluded that confidence, if measured and recorded at the time of an initial eyewitness identification, was a reliable marker of identification accuracy (Wixted, Mickes, Clark, Gronlund, & Roediger, 2015; Wixted, Read, & Lindsay, 2016). However, this conclusion may only hold under "pristine" conditions (i.e., fair lineup instructions, fair lineups, etc.; Wixted & Wells, 2017). As noted by Palmer, Brewer, Weber, and Nagesh (2013), "it remains an empirical question as to whether there are conditions under which the confidence-accuracy relationship does collapse entirely" (p. 66). Here we draw attention to the possibility that even under "pristine" conditions moderators of the confidence-accuracy relationship do exist and may affect accuracy independently of confidence, thus yielding a dissociation that reduces the probative value of confidence. In particular, we focus on the age of eyewitnesses, from juveniles to the aged, as a potential moderator. While there is a robust effect of age on recognition performance (see the meta-analyses by Fitzgerald & Price, 2015; Kocab, Martschuk, & Sporer, 2018; see also the review by Bartlett & Memon, 2007), much less is known about the confidence of older witnesses, and in particular the confidence-accuracy relationship.

Metamemory Models of the Confidence-Accuracy Relationship
According to the direct-access view of metacognitive judgments, reported confidence can be understood as an indicator of the strength of a memory trace: The stronger the trace, the more likely a piece of information will be recalled or recognized. Accordingly, in an identification decision confidence is expected to be higher when the witness experiences a strong sense of familiarity that the suspect in the lineup matches her or his image of the perpetrator. Signal detection theory models applied to eyewitness identification decisions still rely on this assumption but acknowledge that other types of information (e.g., feedback provided by a police officer) may also affect confidence (for a recent review, see Wixted & Wells, 2017).
An alternative meta-memorial explanation prevalent in the basic literature on memory is cue utilization theory (Koriat, 1997), which suggests that people use different types of cues to assign confidence judgments through analytic heuristics. However, metamemory studies have usually been conducted with paradigms that are quite different from those employed in eyewitness identification studies. Many studies have used word list or paired associate paradigms whereby words are either known or have varying degrees of associations with the stimuli-to-be-recalled or recognized. Somewhat closer to eyewitness studies are name-learning paradigms, which demonstrated monitoring accuracy for nouns as targets to be higher than for faces or names (Watier & Collin, 2011). In contrast, in eyewitness identification studies, the target is an unfamiliar person/face that has to be recognized among a list of distractors (fillers) that are usually preselected on the basis of some similarity to the target (as in the study presented here).
In studies on ease of learning, judgments of learning, or feeling of knowing, participants are asked before a memory test whether particular items will be recalled or recognized (for a review, see Dunlosky & Tauber, 2014). In eyewitness identification studies, this would correspond to the police asking a witness, before an identification task, whether he or she expects to be able to recognize the target in a lineup (referred to as pre-decision confidence, as opposed to post-decision confidence, which is assessed after a lineup; e.g., Sporer, 1992). There is ample evidence that pre-decision confidence is unrelated to the accuracy of a later identification (see the meta-analysis by Cutler & Penrod, 1989). Hence, in more recent studies, researchers have no longer included pre-decision confidence as an assessment variable.
There is also a large body of research on metacognitive control and memory reporting (e.g., Goldsmith, Pansky, & Koriat, 2014; Koriat & Goldsmith, 1994), which has been fruitfully applied to eyewitness recall. The major difference from identification studies is that in Koriat and colleagues' studies the focus is on quality vs quantity of recall, not on the accuracy of recognition (but see Perfect & Weber, 2012, for a noteworthy exception).
Thus, while we do not question the importance of these metamemory studies for eyewitness recall, we are reluctant to apply the underlying theories to face recognition or person identification paradigms. In person identification studies, there is (usually) only one unfamiliar target-to-be recognized which has been seen once, usually without instructions to commit the face to memory (i.e., an incidental learning paradigm).
When eyewitness researchers applied Koriat's (1997) cue utilization approach to an identification paradigm (Perfect & Weber, 2012), important differences emerged that further support our claim that metamemory theories developed in verbal learning paradigms may not yield comparable results. Whereas the order for "forced reports" and "free reports" did not matter in Koriat and Goldsmith's (1996, Experiment 2) study, for eyewitness identification decisions the order was crucial (Perfect & Weber, 2012). Also, in an identification study by Weber and Perfect (2012) free-report decisions were significantly more diagnostic than forced-report decisions, with no significant reduction in the quantity of correct identifications obtained. That is, there was a benefit to memory accuracy that was accompanied by a negligible cost to memory quantity, in an apparent contradiction of the memory control framework. These data indicate that a simple version of Koriat and Goldsmith's (1996) control framework may not apply to identification decisions when a "Don't know" option is provided. Unfortunately, most identification studies do not offer a "Don't know" option, which complicates the comparison between studies in the basic metamemory literature and the eyewitness domain even further (see also the recent discussion on the occurrence of positive, null, and negative confidence-accuracy relationships by Roediger & DeSoto, 2015).
In an attempt to test these rival views within an identification paradigm, Busey, Tunnicliff, Loftus, and Loftus (2000) provided evidence for both the direct-access and cue utilization models. When they showed photographs with varying exposure times (Experiment 1) and different levels of illumination (Experiment 2), the findings were consistent with the direct-access (trace access) view. However, when illumination levels were varied between exposure and test (Experiment 3), the findings were consistent with the cue utilization approach. Generally, Busey et al. (2000) showed a high confidence-accuracy relationship, unless conditions at test were not optimal. This is an example where confidence and accuracy rely on mainly the same information under some circumstances but dissociate under other conditions.

The Confidence-Accuracy Relationship across the Lifespan
The confidence-accuracy relationship has been well researched for over 40 years, yet to our knowledge only three studies investigated the issue over the lifespan in an eyewitness identification paradigm in a field setting (Palmer et al., 2013, age range 14-87; Sauer, Brewer, Zweck, & Weber, 2010, age range 15-85; Sauerland & Sporer, 2009, age range 15-84). However, in all three studies the analyses were conducted without separating the age groups. In Sauer et al. (2010) only 6% of the participants (n = 64) were over 60 years of age (Sauer, 2016, personal correspondence).² Palmer et al. (2013) conducted additional analyses excluding participants younger than 18 years and older than 65 years. They found no differences in results, except for the influence of exposure time on resolution and of retention interval on calibration. When all participants were included, the resolution and calibration scores were significantly better following a longer exposure duration and a shorter retention interval. By contrast, when adolescents and older participants were excluded from the analyses, no effect of exposure time and retention interval was found. These analyses, however, did not specifically address age-related changes in the confidence-accuracy relationship. In Sauerland and Sporer (2009) more than 9% of the participants (10.8% of choosers) were over the age of 60.
Several laboratory studies that compared different age groups directly suggest age-related changes in the confidence-accuracy relationship: In a study with children, confidence was not diagnostic of identification accuracy (Keast, Brewer, & Wells, 2007), and in three studies the confidence of older adults was less diagnostic of accuracy than the confidence of young adults (Searcy, Bartlett, & Memon, 1999; Searcy, Bartlett, Memon, & Swanson, 2001; Wright & Stroud, 2002). These results provide some evidence that the confidence-accuracy relationship varies by age, at least in laboratory settings.
² We thank the authors for providing us with their age distribution data.
A recently published online experiment compared eyewitness identifications of 890 young (18-30 years), 890 middle-aged (31-59 years), and 890 older adults (60-95 years; Colloff, Wade, Wixted, & Maylor, 2017). Participants watched a 30-second video depicting a mock crime and were presented with a fair or unfair target-present or a target-absent lineup. Following their decision they rated their confidence. Analyses using signal detection theory modeling revealed an age-related decline in participants' ability to discriminate between guilty and innocent suspects, and to adjust their confidence accordingly. These results seem to suggest that the confidence-accuracy relationship is stable across the lifespan. However, in an online study the risk of a preselection bias is increased as participants are required to access and use a PC. Therefore, it is possible that a higher proportion of older people than of young people was unable to participate (i.e., those who are not fit enough or confident enough to use a computer). By contrast, participants in the present study were citizens who were around a downtown shopping area during the day, and did not need to have access to a PC (as was required in Palmer, David, & Fleming, 2014, and Colloff et al., 2017).
Further, the major focus of the Colloff et al. study was on how best to construct fair lineups by modifying all lineup faces when the suspect has a striking mark on his face (e.g., a scar or a large bruise around the eye), using one single male target. As is evident from the literature on face recognition, adding lines to a face or blocking parts of a face disrupts holistic processing (e.g., Ellis, Davies, & Shepherd, 1978; Hancock, Bruce, & Burton, 2000; Richler, Cheung, & Gauthier, 2011; Richler, Mack, Gauthier, & Palmeri, 2009). It is an open question whether the results obtained by Colloff et al. (2017) regarding age effects would replicate with targets without any highly distinguishing marks and without editing lineup fillers to match this mark.
Taken together, there seems to be ample evidence that identification accuracy may decline in old age, but we still do not know whether there is a parallel decline in the confidence-accuracy relationship.
Here we present a re-analysis of the confidence-accuracy relationship from adolescence to old age using an incidental encoding task in a field experiment, originally published by Sauerland and Sporer (2009) without any consideration of participants' age. While the original study investigated identification decision processes independently of age separately for choosers and non-choosers, the present study focused on the influence of age on the confidence-accuracy relationship in choosers only. We refer to results from the earlier study only to the extent that they are contrasted with the present analysis.

Metamemorial Deficits of Older Eyewitnesses
Metamemory, defined as "knowledge and awareness about one's own memory and how memory works more generally" (Castel, McGillivray, & Friedman, 2012, p. 246), refers to both prospective and retrospective metacognitive judgments. The present research focused on retrospective cognitive judgments, namely the expressed confidence that an identification judgment was correct. Two reciprocal processes of metamemory have been suggested: (1) metacognitive monitoring and (2) metacognitive control (Nelson & Narens, 1990;Price, Mueller, Wetmore, & Neuschatz, 2014). Metacognitive monitoring evaluates the current state of retrieval and provides information about the learning status. Thus, monitoring consists of subjective perceptions about own introspections or mnemonic strength. Metacognitive control is a type of behavior that is used to optimize performance at a meta-level without direct access to the current state, representing an analytic factor of metacognitive judgments. Both monitoring and control are important for metacognitive judgments, and dissociations between behavior and beliefs may become more marked across the lifespan (Palmer et al., 2014).
Previous research demonstrated that while monitoring of well-learned information remained unaffected by aging, metacognitive monitoring of recently encountered information was more likely to be impaired with increasing age (Dodson, Bawa, & Krueger, 2007; Dodson & Krueger, 2006). For instance, Dodson et al. (2007) showed that older participants recognized statements as well as younger participants but made more source monitoring errors and displayed more high-confidence errors (Experiment 1). Experiment 2, which used a cued-recall task with word lists, also showed more high-confidence errors in older compared to younger adults. Perhaps more relevant to the eyewitness literature, older eyewitnesses also showed more high-confidence suggestibility errors than younger participants after watching a video (Dodson & Krueger, 2006). However, more recent studies using word lists containing related lures suggest that positive, null, and negative confidence-accuracy relationships may be observed as a function of the stimulus material and paradigm. Applied to a lineup task, an innocent suspect strongly resembling the perpetrator may lead to high-confidence false identifications (Roediger & DeSoto, 2015, citing anecdotal case evidence from Buckhout, 1974). Whether this varies as a function of participant age remains an empirical question.
Metacognitive control, which is considered important in an identification task, is impaired in older age (Price et al., 2014). In other words, compared to young adults, older adults are less likely to effectively use available information when making decisions (Hertzog & Hultsch, 2000; Price et al., 2014) and have difficulties retrieving information (Shing, Werkle-Bergner, Li, & Lindenberger, 2009). Cross-sectional metacognitive studies showed that metacognitive efficiency improved during adolescence, reached a plateau during adulthood (age range 11-41 years; Weil et al., 2013), and declined in older adulthood (age range 18-84 years; Palmer et al., 2014). These studies together suggest that, over the lifespan, metacognitive efficiency resembles an inverted U: Adolescents and older adults have a stronger tendency towards overconfidence than young and middle-aged adults. However, most of this research was concerned with learning of word lists or semantic memory, not with face recognition or identification tasks. It has long been known in the memory literature that verbal and visual information, and faces in particular, are encoded and processed differently (e.g., Meissner, Sporer, & Schooler, 2007; Paivio, 1971; Tulving, 1985). Some of these differences may even be more pronounced in older adults (e.g., Jenkins, Myerson, Hale, & Fry, 1999). The present study extended these theoretical notions to the eyewitness identification task by analyzing the confidence-accuracy relationship as a function of age from adolescence to older adulthood.

Decision Time-Accuracy Relationship across the Lifespan
Theoretically, face recognition is considered a holistic process, which occurs rather automatically, even after very short exposures (Richler, Mack, Gauthier, & Palmeri, 2009). Self-reports of witnesses who claim that a face just "popped out" from a lineup and who are more likely to be correct than slow responders (e.g., Sporer, 1992) support this notion. On the other hand, witnesses deliberating and engaging in a relative decision strategy spend more time on making a decision and are more likely to be wrong (Sauerland & Sporer, 2007). Applied to lineup decisions of choosers, it can be predicted that decision times (response latencies) for hits are faster than for false alarms (Dunning & Stern, 1994; Sporer, 1994). By now, quite a few studies have shown that decision time, a more objective indicator than confidence, is associated with identification accuracy, that is, eyewitnesses made correct identifications faster than incorrect identifications (e.g., Dunning & Perretta, 2002; Sporer, 1992, 1993, 1994). This is also in line with the dual-process model, which proposes that accurate decisions are made rapidly and automatically, and require no cognitive effort, while inaccurate decisions are time-consuming, conscious, and deliberate (Dunning & Perretta, 2002; Dunning & Stern, 1994). Similar tendencies have been shown for self-reported estimated decision times, namely that correct identifications were estimated by witnesses to be faster than false identifications (Sauerland & Sporer, 2009).
The question arises whether measured decision times as well as self-reported decision times show the same stable relationship with identification accuracy across the lifespan. In other words, do older adults encode and retrieve faces differently and do they use different metacognitive strategies than younger adults? Sporer (2001) has proposed an in-group/out-group model which postulates that in-group faces, for example, of one's own ethnic group ("race"), gender, and age are processed relatively more holistically than faces of out-group members. Regarding age, a recent meta-analysis of face recognition studies comparing younger and older adults has found support for this prediction. Considering that all studies reported above used only young targets in their research, the findings about decision times and their association with identification accuracy may be limited to young witnesses observing young targets.
In light of recent demographic developments, which entail that more and more witnesses will be older, it is an important question whether older witnesses observing young targets show the same pattern. To the extent that older adults do not engage in the type of holistic processing mentioned, or employ different metacognitive strategies than younger adults, this association may be weakened in older age. Some insight into lineup decision processes of different age groups was provided by Colloff et al. (2017, supplemental material) in their additional analyses. In particular, older adults took longer to make a lineup decision (i.e., to identify or reject a lineup) and to render a confidence judgment than middle-aged and young adults. These analyses, however, were only conducted across choosers and non-choosers. It remains an open question whether these relationships will be stronger when focusing only on choosers, who are particularly important in criminal cases.

Aims and Hypotheses
The aim of this study was to assess the confidence-accuracy relationship for different age groups across the lifespan. We predicted that the confidence-accuracy relationship deteriorates with increasing age. More specifically, we expected accuracy and confidence to dissociate across the lifespan, with older participants being relatively more overconfident, leading to poorer calibration (Hypothesis 1). To investigate the postdictive value of confidence as a function of age at different prior base rates of target-presence (Sauerland, Sagana, Sporer, & Wixted, 2018; Wells, Yang, & Smalarz, 2015; Wixted & Wells, 2017), we also conducted Bayesian analyses. While most lab studies employ a base rate of 50% target-present lineups and 50% target-absent lineups, base rates presumably vary considerably across jurisdictions and police departments. Importantly, the postdictive value of confidence following a lineup decision varies as a function of this prior base rate (Wixted & Wells, 2017), with higher target-presence base rates being associated with higher postdictive value compared to lower target-presence base rates (Hypothesis 2). In addition to replicating this finding, we expected the postdictive value of confidence to decrease as age increases. Finally, as proposed by the dual-process model of the decision process, we expected that correct identifications would be made faster and would be estimated to be faster than false identifications. To the extent that older participants rely less on holistic processing or differ in their metacognitive strategies, they may show lower decision time-accuracy associations than younger adults (Hypothesis 3). Considering the age-related decline of metacognitive monitoring and control, we assumed the self-reported decision time-accuracy associations to be similarly or even more impaired in older age (Hypothesis 4).

Method

Participants
This study comprised a re-analysis of data collected by Sauerland and Sporer (2009) with 720 citizens (50% female, 50% male) who agreed to participate in the experiment. Four hundred and thirty-six participants made a positive identification decision (called "choosers"), whereas 284 rejected the lineup.

Design
The study was a 2 (target-present vs absent) x 6 (age groups 1 to 6) between-participants design. Dependent measures were identification accuracy and post-decision confidence. We also report some results on decision times (in s) and self-reported decision times.

Procedure
Ten different targets aged between 20 and 37 years were used to fulfil the requirement of stimulus sampling (Wells & Windschitl, 1999). Targets worked in pairs to recruit participants: one took the role of the target and the other the role of the research confederate. Each target was instructed to approach male and female participants from different age groups, including older people, so that targets would be distributed approximately equally across gender and age groups of participants. Chi-square analyses confirmed that targets were equally distributed among age groups (all participants: χ²(5, N = 720) = 52.58, p = .204, Cramer's V = .270; choosers: χ²(5, n = 436) = 50.31, p = .271, Cramer's V = .34).
Following a predetermined script, the target approached a pedestrian on the street and asked for directions to a certain location nearby. After the target walked away, a research confederate approached the pedestrian and explained the true purpose of the interaction (a face recognition study). Once participants agreed, they were randomly assigned to a target-present or a target-absent photo lineup. Unbiased lineup instructions were used, and the research confederate was blind to target presence. Immediately after the identification task participants provided their confidence on an 11-point rating scale (0% to 100%, at 10% intervals) and estimated the time they had spent on making their decision on a paper-and-pencil answer sheet. The confederate inconspicuously timed the decision, stopping the timer when the participant entered her or his response to the lineup task.

Lineups
Ten target-present and ten target-absent photo lineups were prepared for the study. Each lineup contained six frontal 9 x 13 cm photographs that were arranged in two rows of three pictures. The target or target replacement was always placed in Position 3. The lineups were constructed following a pilot study with 55 mock witnesses that determined which portrait photographs fit the general target description (Wells, Rydell, & Seelau, 1993). The photographs that received the most selections were used as foils or replacements.

Analyses
The present study used different types of calibration analyses following suggestions to apply alternative analyses, which are more diagnostic of the confidence-accuracy relationship than point-biserial correlations (e.g., Juslin, Olsson, & Winman, 1996; Olsson, 2000; Weber & Brewer, 2003). Calibration curves plot confidence level against the proportion of witnesses who made correct decisions. However, as in the present study sample sizes for the separate age groups were too small for traditional calibration curves, we relied on graphic comparisons of mean accuracy and confidence levels using "confidence-accuracy characteristic" (CAC) analysis (Mickes, 2015), along with 95% confidence intervals, and on point-biserial correlations between confidence and accuracy. To assess curvilinear relationships as a function of age, we calculated correlations for age and age-squared (age²).
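The correlational part of this analysis plan can be illustrated with a minimal sketch. It assumes that the point-biserial correlation is computed as an ordinary Pearson correlation with accuracy coded 0/1, and that the quadratic term is the raw (uncentered) square of age; the text does not specify either detail. The function name and all data values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this equals the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data, not values from the study.
ages = [16, 25, 45, 70]
correct = [0, 1, 1, 0]          # 1 = correct identification, 0 = false identification
confidence = [60, 90, 70, 80]   # percent

r_conf_acc = pearson_r(correct, confidence)              # confidence-accuracy r_pb
r_age_acc = pearson_r(correct, ages)                     # linear age term
r_age2_acc = pearson_r(correct, [a ** 2 for a in ages])  # squared age term (assumed raw)
```

The function is undefined when either variable has zero variance, for example when every witness in a subgroup is correct.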
Further statistical calibration analyses were included as follows. Calibration (C) is the extent to which an eyewitness's confidence or certainty judgment (measured in %) corresponds with the probability that the decision is correct (from 0 = perfect calibration to 1 = weak calibration). Over-/underconfidence (O/U) describes participants' tendency to be over- (positive score) or underconfident (negative score) in comparison to their accuracy. The adjusted normalized resolution index (ANRI) indicates the discrimination of confidence judgments (from 0 = no discrimination to 1 = perfect discrimination) and can be interpreted like the effect size η² (eta-squared). Hence, ANRI values corresponding to η² values of .010, .059, and .138 indicate small, moderate, and large effects (Cohen, 1988). It is important to consider different measures as good calibration does not imply good discrimination (Yaniv, Yates, & Smith, 1991). For a detailed description of the formulae used, see Brewer and Wells (2006), and Yaniv et al. (1991). Finally, the sensitivity measure d' was calculated from hits and false identifications, which describes discriminability between targets and non-targets (Macmillan & Creelman, 2005). To calculate d', the false identification rate was divided by lineup size. Note that the results for d' were derived from the entire data set, including non-choosers.
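As a rough illustration of these indices, the following sketch computes C, O/U, ANRI, and d' for a set of choosers. The formulae are the forms commonly given in the calibration literature cited above and are stated here as an assumption, since this text does not reproduce them: C is the weighted mean squared difference between confidence and accuracy per confidence level, O/U is mean confidence minus mean accuracy, and ANRI is the normalized resolution index adjusted for the number of confidence levels J as (N·NRI − J + 1)/(N − J + 1). The function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist


def calibration_indices(confidence, correct):
    """Return (C, O/U, ANRI) for confidence in [0, 1] and accuracy coded 0/1."""
    n = len(confidence)
    by_level = defaultdict(list)           # confidence level -> list of 0/1 outcomes
    for c, a in zip(confidence, correct):
        by_level[c].append(a)

    overall_acc = sum(correct) / n         # must not be exactly 0 or 1
    calib = over_under = resolution = 0.0
    for c, outcomes in by_level.items():
        n_j = len(outcomes)
        a_j = sum(outcomes) / n_j          # proportion correct at this level
        calib += n_j * (c - a_j) ** 2
        over_under += n_j * (c - a_j)
        resolution += n_j * (a_j - overall_acc) ** 2

    nri = (resolution / n) / (overall_acc * (1 - overall_acc))
    j = len(by_level)
    anri = (n * nri - j + 1) / (n - j + 1)  # adjustment for J levels (assumed form)
    return calib / n, over_under / n, anri


def d_prime(hit_rate, false_id_rate, lineup_size=6):
    """d' = z(hits) - z(false identification rate / lineup size)."""
    z = NormalDist().inv_cdf               # rates must lie strictly between 0 and 1
    return z(hit_rate) - z(false_id_rate / lineup_size)


# Hypothetical data: four witnesses at 90% and four at 50% confidence.
c_index, ou_index, anri_index = calibration_indices(
    [0.9] * 4 + [0.5] * 4, [1, 1, 1, 0, 1, 0, 0, 0]
)
```

In this toy example mean confidence (.70) exceeds mean accuracy (.50), so O/U is positive, which indicates overconfidence.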

Mean accuracy and mean confidence across the age groups
To test Hypothesis 1 that the confidence-accuracy relationship would deteriorate with increasing age, we calculated how confidence and accuracy relate to each other as a function of age group. Overall, 51.6% of choosers made a correct identification, whereas 48.8% made a false identification. The mean confidence was M = 63.07 (SD = 26.47), ranging between 0% and 100% (no participant reported 10% confidence). Chi-square analyses revealed that the effect of age group on identification accuracy was non-significant, χ²(5, n = 436) = 10.58, p = .060, Cramer's V = .16. Notably, the sensitivity measure d' showed a strong increase in discriminability from adolescence (d' = 1.48) to young adulthood (d' = 1.91), followed by a steady decrease until older adulthood (d' = 1.35), as shown in Table 1. Correlational analyses showed that age and confidence were not related, r(434) = -.01, p = .883; age²: r(434) = -.02, p = .748, whereas the relationship between age and identification accuracy was curvilinear, r_pb(434) = -.09, p = .071; age²: r_pb(434) = -.10, p = .039, showing an increase in identification accuracy from adolescence to young adulthood, followed by a decrease until older adulthood. Figure 1 displays the mean accuracies in the six age groups and the associated levels of confidence. Mean confidence varied little across the six age groups while identification accuracy showed a small increase followed by a decrease with increasing age. Across all age groups, confidence was significantly associated with identification accuracy, χ²(2, N = 436) = 49.96, p < .001, Cramer's V = .33. However, the relationships were not consistent across age groups. With increasing age, confidence was less indicative of identification accuracy: The older the participants, the smaller the difference in confidence judgments between accurate and inaccurate responses.
Figure 2 shows the regression lines of correct and incorrect responses and their associated 95% confidence bands, which crossed at the age of 55 years, indicating that adults were less likely to distinguish correct and incorrect identifications at the age of 55 years or more.

Confidence-accuracy characteristic analysis for each age group
To further explore the confidence-accuracy relationship across the lifespan (Hypothesis 1), we conducted a CAC analysis, a graphic presentation which resembles the calibration curve. Figure 3 displays the mean identification accuracies for each age group with their respective 95% CIs at the three levels of confidence (low, medium, high). Values along the diagonal indicate well-calibrated witnesses, values above the diagonal indicate underconfidence, and values below the diagonal indicate overconfidence. As shown in Figure 3, high confidence was diagnostic of accuracy for only the two youngest age groups. Although low confidence was predictive of poor identification accuracy across all groups, with increasing age medium and high confidence became less indicative of identification accuracy. In particular, high confidence was diagnostic of accuracy in the 21-30 years age group only, and to some extent in the 15-20 years age group.
Figure 3. Confidence-accuracy characteristic analysis for choosers as a function of age group for low (0-50%), medium (60-80%), and highly (90-100%) confident witnesses. Error bars are 95% confidence intervals. Labels on the X-axis are placed under the means of the respective confidence range.
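The binning behind such a CAC plot can be sketched as follows. The confidence ranges are those given in the Figure 3 caption; the interval method is not stated in the text, so the Wilson score interval used here is an assumption, and the function names and data are hypothetical.

```python
from math import sqrt


def confidence_bin(conf_pct):
    """Map a confidence rating (0-100%) to the three CAC levels."""
    if conf_pct <= 50:
        return "low"
    if conf_pct <= 80:
        return "medium"
    return "high"


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for k successes in n trials."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def cac(confidences, correct):
    """Return {level: (proportion correct, ci_low, ci_high)} for choosers."""
    counts = {"low": [0, 0], "medium": [0, 0], "high": [0, 0]}  # [correct, total]
    for c, a in zip(confidences, correct):
        level = confidence_bin(c)
        counts[level][0] += a
        counts[level][1] += 1
    return {
        level: (k / n, *wilson_interval(k, n))
        for level, (k, n) in counts.items()
        if n > 0                       # skip empty confidence levels
    }


# Hypothetical choosers from one age group.
result = cac([30, 50, 70, 80, 90, 100, 100, 90], [0, 1, 1, 0, 1, 1, 1, 0])
```

Running the function once per age group yields the points and error bars of one panel each; with cells as small as in this toy example the intervals are very wide.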

Calibration, over-/underconfidence and the discrimination of confidence judgments
To test Hypothesis 1 statistically, in addition to the graphic representation, we assessed the statistics Calibration, Over-/underconfidence, and Adjusted normalized resolution index (Brewer & Wells, 2006). These indices were calculated by applying a bootstrap method (Efron, 1981; Efron & Gong, 1983) with n = 1000 replications to correct for a potential bias and to assess standard errors (SE) for the calculation of inferential confidence intervals (ICI; Tryon, 2001). The bootstrap analyses were conducted for each age group separately to calculate separate SEs for them (Efron & Gong, 1983). The reduction factor E, defined as "the ratio of the SE of the difference between the two groups to the sum of the standard errors of both groups" (Tryon, 2001, p. 375), was averaged across all pairwise comparisons between the age groups. Non-overlapping ICIs showed significant differences between the age groups at the α = .05 level. A similar approach was applied by Palmer, Brewer, and Weber (2010), who tested the confidence-accuracy relationship of witnesses following post-identification feedback.
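A minimal sketch of this procedure for the O/U statistic is given below. It assumes a simple nonparametric bootstrap (resampling witnesses with replacement), independent groups so that the SE of the difference is the root sum of squared SEs, and a normal critical value rather than a t value; the bias correction mentioned above is omitted. The function names and data are hypothetical.

```python
import random
from statistics import NormalDist, stdev


def over_under(sample):
    """Mean confidence minus mean accuracy for (confidence, correct) pairs."""
    n = len(sample)
    return sum(c for c, _ in sample) / n - sum(a for _, a in sample) / n


def bootstrap_se(sample, statistic, replications=1000, seed=0):
    """Standard error of a statistic from resampling with replacement."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(sample, k=len(sample))) for _ in range(replications)
    ]
    return stdev(estimates)


def reduction_factor(se1, se2):
    """E: SE of the difference divided by the sum of the two SEs."""
    return (se1 ** 2 + se2 ** 2) ** 0.5 / (se1 + se2)


def inferential_ci(estimate, se, e, alpha=0.05):
    """Confidence interval shrunk by E so that non-overlap implies p < alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # assumption: z instead of t
    half = e * z * se
    return estimate - half, estimate + half


# Hypothetical (confidence, correct) pairs for two age groups.
young = [(0.9, 1), (0.8, 1), (0.5, 0), (0.7, 1), (0.6, 0), (1.0, 1)]
older = [(0.9, 0), (0.8, 1), (0.5, 0), (0.7, 0), (0.6, 0), (1.0, 1)]
se_young = bootstrap_se(young, over_under)
se_older = bootstrap_se(older, over_under)
e = reduction_factor(se_young, se_older)
ici_young = inferential_ci(over_under(young), se_young, e)
ici_older = inferential_ci(over_under(older), se_older, e)
```

With equal standard errors E is about .71, so two ICIs just touch when the estimates differ by exactly the critical value times the SE of the difference. With six age groups, E would be averaged over all pairwise comparisons, as described above.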
Results for calibration, over-/underconfidence and the ANRI, and their respective 5% ICIs are displayed in Table 1. These indices support the aforementioned findings: Although all age groups were overconfident for low and medium confidence (see Figure 2), calibration analyses revealed differences between the groups.
Young adults (21-30 years old) were well calibrated (C = .01, 5% ICI [.00, .03]); they showed, on average, very little overconfidence (O/U = .07, 5% ICI [.02, .12]), and their discrimination was above the average of the entire sample (ANRI = .23, 5% ICI [.15, .32]). By contrast, with increasing age, average confidence-accuracy correlations, calibration and discrimination decreased whereas overconfidence increased. In fact, adults older than 60 years had significantly weaker calibration and significantly higher overconfidence (C = . Interestingly, all age groups older than 31 years showed weak to no discrimination as indicated by the inferential confidence intervals that included zero (see Table 1). Furthermore, discrimination of young adults aged from 21 to 30 years was significantly higher than the discrimination of 31 to 40 year old and 51 to

Base Rates of Target-Presence
Next, we addressed the possible concern that the moderation of the confidence-accuracy relationship by age may hold only for certain base rates or prior probabilities that the suspect is the perpetrator (Hypothesis 2). Figure 4 maps the probability that a suspect identification was accurate (i.e., that the suspect is the perpetrator) across all possible base rate values from 0% (none of the lineups included a guilty suspect) to 100% (all lineups included a guilty suspect; see Wixted & Wells, 2017). One curve was created for each age group. The identity line shows where the data would fall if an identification was non-diagnostic.
Figure 4 shows that all curves are above the identity line, demonstrating that identifications were diagnostic of guilt. Generally, as expected, the curves for confident and younger participants lie far above those for less confident or older participants. Strikingly, less confident decisions made by 21-30 year olds were still more diagnostic of guilt than highly confident decisions made by 51-83 year olds. In all, the figure suggests that the moderating effect of age on the confidence-accuracy relationship prevails across different prior base rates of target-presence.
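The logic behind such base-rate curves follows from Bayes' theorem and can be sketched as below. The function and parameter names are hypothetical, and the hit and false identification rates used in any call are placeholders rather than values from the present data.

```python
def posterior_guilt(base_rate, hit_rate, false_id_rate):
    """P(suspect is the perpetrator | suspect was identified).

    base_rate:     prior probability that the lineup contains the perpetrator
    hit_rate:      P(suspect identified | target present)
    false_id_rate: P(innocent suspect identified | target absent)
    """
    numerator = base_rate * hit_rate
    denominator = numerator + (1 - base_rate) * false_id_rate
    if denominator == 0:
        raise ValueError("no suspect identifications are possible")
    return numerator / denominator


def base_rate_curve(hit_rate, false_id_rate, steps=11):
    """Posterior probability of guilt across base rates from 0 to 1."""
    return [(i / (steps - 1),
             posterior_guilt(i / (steps - 1), hit_rate, false_id_rate))
            for i in range(steps)]
```

When the hit rate equals the false identification rate, the posterior equals the base rate, which is the identity line; the more the hit rate exceeds the false identification rate, the further the curve bows above that line.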

Observed and Self-reported Decision Time and Accuracy across the Lifespan
Observed decision times and self-estimated decision times were positively skewed. Therefore, inferential analyses were conducted with log-transformed data (base 10) to approximate normal distributions. Means, SDs, and 95% CIs are reported for back-transformed values.
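The transformation and back-transformation can be illustrated with a short sketch. It assumes decision times in seconds and uses a normal approximation (z = 1.96) for the confidence interval; both the function name and the choice of critical value are our assumptions, and the original analyses may have used a t-based interval.

```python
import math
import statistics


def log10_summary(times):
    """Mean and 95% CI computed on log10 data, reported in original units.

    times: positive decision times (assumed to be in seconds).
    """
    logs = [math.log10(t) for t in times]
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    half_width = 1.96 * se_log  # normal approximation (assumption)
    return {
        # Back-transforming the mean of the logs gives the geometric mean.
        "mean": 10 ** mean_log,
        "ci": (10 ** (mean_log - half_width), 10 ** (mean_log + half_width)),
    }
```

Note that the back-transformed mean is a geometric mean and is therefore smaller than the arithmetic mean of positively skewed raw times.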

Combined Postdictive Value of Age, Confidence and Decision Time
Logistic regression analyses were conducted to combine the postdictors age, observed decision time, self-reported decision time, and post-decision confidence. The aim was to assess their respective associations with identification accuracy while controlling for the other postdictors. Adolescents were excluded from these analyses to control for curvilinearity, which is difficult to detect in regression analyses.
First, a logistic regression analysis was conducted with the postdictors age, observed decision time, and confidence. The full model was statistically significant, χ2(3, N = 392) = 64.80, p < .001, and explained 15.2% (Cox & Snell R2) and 20.3% (Nagelkerke R2) of the variance, respectively. Participants' age and decision time were negatively associated with identification accuracy while confidence was positively associated with it, as shown in Table 2a. The postdictive value was the strongest for decision time while controlling for age: Participants were almost four times more likely to correctly identify a person when the identification was fast (OR = 0.26, 95% CI [0.15, 0.46]; reciprocal OR = 3.85, 95% CI [2.16, 6.85], p < .001). Similarly, participants with higher confidence were almost twice as likely to correctly identify a person as those with lower confidence (OR = 1.93, 95% CI [1.46, 2.54], p < .001). Age remained a significant postdictor of identification accuracy, showing a small but significant effect: The older the participants, the less likely they were to correctly identify the target person (OR = 0.98, 95% CI [0.97, 0.99], p = .018). Table 2b. Notably, when observed and self-reported decision times were combined in the same logistic regression model, self-reported decision time did not improve on the model in Table 2a, that is, it did not explain any additional variance (adding 0.0% to the Cox & Snell R2 and 0.0% to the Nagelkerke R2, respectively).
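Two of the reported quantities can be retraced with simple arithmetic, sketched below with hypothetical function names. The reciprocal of an odds ratio is 1/OR, with the confidence bounds swapping places, and the Cox & Snell R2 can be obtained from the model chi-square and the sample size as 1 - exp(-χ2/N). Reciprocals computed from the rounded bounds (2.17 and 6.67) differ slightly from the reported interval [2.16, 6.85], presumably because the published values were derived from unrounded estimates.

```python
import math


def reciprocal_or(or_value, ci_low, ci_high):
    """Invert an odds ratio and its CI; the bounds swap places."""
    return 1 / or_value, 1 / ci_high, 1 / ci_low


def cox_snell_r2(model_chi_square, n):
    """Cox & Snell pseudo-R2 from the likelihood-ratio chi-square."""
    return 1 - math.exp(-model_chi_square / n)


# Values taken from the text: OR = 0.26, 95% CI [0.15, 0.46].
recip, recip_low, recip_high = reciprocal_or(0.26, 0.15, 0.46)

# Values taken from the text: chi-square = 64.80 with N = 392.
r2_cs = cox_snell_r2(64.80, 392)
```

With the values from the text, `recip` rounds to 3.85 and `r2_cs` to .152, matching the reported reciprocal OR and the 15.2% Cox & Snell estimate. The Nagelkerke R2 additionally requires the log-likelihood of the null model, which is not reported here.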

Discussion
For more than three decades, studies have assessed the postdictive value of eyewitness confidence following a suspect identification. While earlier meta-analyses concluded that confidence is an unreliable indicator of identification accuracy (Bothwell et al., 1987), a focus on choosers led to more optimistic conclusions (Sporer et al., 1995). More recent re-analyses of previously published data further emphasized the postdictive value of confidence, particularly for high confidence levels and under "pristine" identification conditions (Wixted et al., 2015; Wixted & Wells, 2017). These analyses, however, did not consider some potentially important moderators such as race of target and witness (Wright, Boyd, & Tredoux, 2001) or old age of a witness. The present study aimed to fill this gap by analyzing the confidence-accuracy relationship as a function of witness age.
Our re-analyses of a large-scale field study (Sauerland & Sporer, 2009) revealed that age was a potentially important moderator of the confidence-accuracy relationship. As expected, the confidence-accuracy relationship was strong for younger participants (15-20 years) and young adults (21-30 years) and declined as age increased, confirming Hypothesis 1. This difference prevailed across different prior base rates of target-presence (Hypothesis 2), although the differences between age groups were strongest for small to moderate prior base rates of target-presence. Further, correct identifications were observed to be faster regardless of participants' age. However, the relationship between self-reported decision time and accuracy varied by age, showing little discrimination between correct and false identifications in older age (Hypothesis 3).
Similar to the studies re-analyzed in Wixted et al. (2015), the present study was an experimentally controlled field study. The identification instructions were fair (i.e., the participants were informed that the person might or might not be present in the lineup, and the interviewer was blind to target-presence and position in the lineup). Thus, our study fulfilled what Wixted and Wells (2017) described as pristine conditions. Moreover, participants did not know that they were part of a study until after the target walked away (unlike, e.g., in Palmer et al., 2013). The latter factor is important for the legal system and enhances the ecological validity of our findings, because in many criminal cases eyewitnesses do not anticipate that they will have to identify the person while they witness an event, unless it is obvious from the beginning that they are witnessing a crime. In support of this point, prior research showed that identification performance was poorer under incidental rather than intentional encoding conditions (Read, Lindsay, & Nicholls, 1998; Sporer, 1991).

Figure 3 shows that only the youngest two age groups were reasonably well calibrated. In particular, the results of young adults (21-30 years old) are in line with the findings by Wixted et al. (2015), namely that high confidence is indicative of identification accuracy (93.2% correct identifications at the high confidence level). This age group performed well and above the average of the entire sample on measures of calibration, over-/underconfidence, and the discrimination indices ANRI and the SDT performance measure d' (see Table 1). By contrast, with increasing age, confidence-accuracy correlations and calibration decreased whereas overconfidence increased. Moreover, ANRI discrimination was nearly zero for participants who were older than 30 years, suggesting that the age-related decline may begin much earlier than at old age.
Discrimination between targets and non-targets (d') decreased with increasing age as has been shown across many studies (see the recent meta-analysis by . Taken together, age had a robust effect on every measure of identification performance while confidence remained unchanged across the lifespan. The analyses in the present study differed from previous studies in the following aspects. Instead of averaging across large data sets we separated the age groups and showed a curvi-linear relationship between age and the postdictive value of the confidence-accuracy relationship. Further, the present study focused on choosers -those who would end up testifying in court. Although Key et al. (2015) analyzed the confidence-accuracy relationship between young, middle-aged and older adults, their results were not directly comparable to ours because the point-biserial correlations they reported were not separately calculated for choosers and non-choosers. As already pointed out in the introduction, the study by Colloff et al. (2017) differed in many ways from the current work. In particular the online methodology may have led to a self-selection bias in the older participant groups. Further, many analyses were conducted across choosers and non-choosers in Colloff et al. (2017) thus not making them directly comparable.
Decision time has often been found to be negatively associated with confidence and identification accuracy among choosers (e.g., Sporer, 1992; Sauerland & Sporer, 2007). As postulated by the dual-process model, correct decisions are automatic and require little cognitive effort whereas incorrect decisions follow a conscious and time-consuming process (Dunning & Perretta, 2002; Dunning & Stern, 1994). The present findings confirmed that shorter decision times were associated with correct identifications, regardless of age. Moreover, the objectively measured decision time was a much stronger postdictor of identification accuracy (reciprocal OR = 3.85) than confidence (OR = 1.93). By contrast, self-reported decision time did not improve the postdictive value of observed decision time. It was, however, a reliable but less strong postdictor (reciprocal OR = 1.63) when it was substituted for measured decision time. In real cases, where decision times may not have been measured, self-reported decision time may be informative (but perhaps not with older witnesses over 50).
This further supports our theoretical assumptions that metacognitive processes vary by age. These findings demonstrate the importance of recording the entire identification procedure on video, not only to capture the expressed confidence but also to be able to assess decision time at the time of the original identification (Sporer, 1992, 1994). Confidence and self-reported decision times assessed retrospectively at later interviews or even in court may not only be affected by metacognitive deficits as a function of delay but, more troublingly, be distorted by feedback effects (Steblay, Wells, & Douglass, 2014).

Limitations and Implications
The present study faced the following limitations. The participants in this study were not witnesses of a crime; instead, they were instructed to identify a person who approached them to ask for directions. Hence, the participants did not experience the level of stress that people might experience when they witness a crime. Nevertheless, the present findings are comparable to many real-life criminal cases, as the participants did not know that the person who approached them was a target until after the person walked away. Witnesses of crimes frequently do not anticipate that they will have to identify a person. Further, the scenario in the present study resembles a police call-out for people who saw or spoke to a person at a particular time and place, with the difference of a much shorter retention interval than in most criminal cases.
A further limitation of our study was the relatively small sample size per age group. Calibration curves require at least 200 participants or multiple judgments by each participant per condition to produce reliable results (Weber & Brewer, 2003). However, averaging across large data sets can obscure the results when moderators are not considered. To overcome this problem and approximate the calibration curve, we plotted the relationship between confidence and accuracy at fewer levels of confidence (low, medium, high). In addition, we applied different types of statistical analyses (as proposed by Brewer & Wells, 2006). To obtain more reliable estimates of inferential confidence intervals, bootstrap analyses were conducted. This allowed the examination of the confidence-accuracy relationship from different perspectives. Our results (particularly those for the youngest age group) demonstrated the necessity of applying different types of analyses that yield coefficients that are not redundant. The calibration graph of adolescents showed robust calibration and a discrimination index that was the highest of all age groups. Nevertheless, their identification accuracy was as low (see Figure 3 above) and their overconfidence score as high as that of people aged 61 years and over, attributable to their high confidence level despite low identification accuracy.
Further research is needed to assess the postdictive value of confidence of other samples of older people as well as of children, teenagers, or vulnerable witnesses. In addition, procedures should be explored that can improve the metacognitive processes at retrieval. For instance, Thomas, Gordon, and Bulevich (2014) suggested that metacognitive processes could be improved with supportive instructions that guide conscious processing of retrieved information, which results in more stringent metamemorial processes.
Finally, the targets in the present study were young to middle-aged adults. Previous research repeatedly demonstrated a robust own-age bias in eyewitness identification and face recognition (Kocab et al., 2018; Rhodes & Anastasi, 2012). Thus, it is possible that the age effect would be smaller if older targets were also included. Future research should assess whether this effect holds for older targets.

Conclusion
The results demonstrated that objective decision time had a greater postdictive value for identification accuracy than confidence and that the confidence-accuracy relationship depended on witness age. Whereas the high confidence expressed by young adults was indeed a reliable indicator of identification accuracy, the increased overconfidence of adolescents, middle-aged, and older people implies that we should take age, and perhaps also the difference in age between the target and witness, into account when postdicting accuracy from confidence.
Using age as a moderator, this study showed that factors which negatively influence identification accuracy do not necessarily affect post-decision confidence in the same way, leading to a decrease in the confidence-accuracy relationship. Thus, when evaluating eyewitness identification and post-decision confidence, decision-makers should be aware of factors that moderate identification accuracy, such as the cross-race or the old-age effect. Other factors might yield similar patterns and should be explored further. Finally, to better assess witness responses and how their postdictive value is evaluated by factfinders (Kaminski & Sporer, 2017), a lineup decision should be videotaped, which also provides information on decision times (response latencies), as suggested by Sporer (1992; Sporer et al., 1995) and many other researchers thereafter (e.g., Wixted & Wells, 2017). If, however, objective decision time is not available, the self-reported decision time together with confidence can assist in the evaluation of the likelihood of a correct identification, at least for young adults.
Ethics statement: This research is a re-analysis of previously published data that did not require ethics approval.

Conflict of Interest:
The authors do not declare any conflict of interest.
Financial Support: The authors did not receive any funding for this study.