Effectiveness of immediate vs. delayed recall in detecting invalid performance in coached and uncoached simulators: Results of two experimental studies

Objective: Two experimental studies were conducted to compare the ability of immediate and delayed recall indicators to discriminate between performances of simulators and full-effort clinical and nonclinical participants. Methods: Three groups of simulators (uncoached, symptom-coached, and test-coached), one group of community controls, and one group of cognitively impaired patients were assessed with four experimental memory tests, in which the immediate and delayed recall tasks were separated by three other tasks. Results: Across both studies, delayed recall demonstrated higher accuracy than immediate recall in classifying simulated performances as invalid, as compared with performances of bona fide clinical participants. ROC curve results showed sensitivities below 50% for both indicators at specificities of ≥ 90%. Computing performance curves across recall trials revealed descending trends for all three simulator groups, indicating a suppressed learning effect as a marker of noncredible performance. Among types of coaching, test-coaching proved most effective at decreasing differences between simulators and patients. Discussion: The effectiveness of such indicators in clinical evaluations and their vulnerability to information about test-taking strategies are discussed.


Introduction
Decades of research show that standard performance validity tests (PVTs) assessing short-term memory (e.g., recall, recognition) are especially effective in detecting invalid performances (Bigler, 2014; Larrabee, 2003). One perspective on detecting noncredible performance examines failures on specific tasks (e.g., recognition vs. recall) that would confirm a diagnosis of invalid performance. For instance, failure in forced-choice tasks is known to indicate invalid performance even in impaired samples (Bigler, 2014; Larrabee, 2003), and worse performance in recognition than in recall indicates noncredible responding (Greiffenstein et al., 1996; Suhr & Gunstad, 2000). Forced-choice testing has also been deemed the best method for discriminating invalid responses from genuine performances by numerous empirical studies (Denning, 2012; Gunner et al., 2012; Inman & Berry, 2002; Strauss et al., 2002), reviews (Leighton et al., 2014), and meta-analyses (Sollman & Berry, 2011). Despite their effectiveness, forced-choice measures are not infallible, as some tests are vulnerable to coaching (i.e., coached simulators producing above-chance performances on recognition tests), and online information about them is available (Rüsseler et al., 2008; Strauss et al., 2002). In addition, a few studies support the idea that indicators based on recall might also be effective in detecting noncredible performance, while being less identifiable by feigners (Strauss et al., 2002; Suhr & Gunstad, 2000). Hence, we designed two experimental studies to test the effectiveness of recall tasks in discriminating simulated from genuine cognitive impairment.

Types of recall: Immediate vs. delayed recall
Results of a recent meta-analysis on types of detection strategies, moderated by types of stimuli and coaching, revealed instruments relying on recognition to be most accurate in classifying invalid performances across experimental and criterion-group designs. Concerning recall, both stand-alone and embedded indicators were found to generate modest effects in simulation designs compared with known-groups studies (Cohen's d = .66 vs. d = 1.05 for stand-alone indices; d = .65 vs. d = 1.21 for embedded indices). Another interesting finding for embedded measures based on recall was that test-coaching reduced differences between simulators and clinical participants, as opposed to methods relying on other strategies, for which symptom-coaching was superior in reducing differences between means.
In their review on methodological characteristics of PVTs, Leighton and colleagues (2014) argued that, from the multitude of methodological moderators that require more investigation (e.g., types of stimuli, number of learning trials, trials characteristics), the influence of learning trials on performance in tasks eliciting recall has received far less attention than recognition. The authors also noted a need for more research concerning delayed recall performances in noncredible groups.
Regarding learning characteristics of PVT items, although there is evidence suggesting that a single exposure to test material (i.e., one learning trial) may be enough to ensure recognition (Gunner et al., 2012; Denning, 2012), other findings show that multiple encoding in recall sessions further facilitates recognition and long-term retention in cognitively impaired samples, or what is known as the test-effect (Leighton et al., 2014). The suppression of this effect would therefore be an indicator of invalid performance. When multiple retention trials are used, the learning effect may be displayed as a performance curve, showing ascending trends in the case of full-effort participants and descending trends for noncredible respondents (Bender & Rogers, 2004; Suhr & Gunstad, 2000; Rose, Hall & Szalda-Petree, 1998; Wogar et al., 1998).

Types of coaching: symptom-coaching vs. test-coaching
Currently, it is well known that coaching has a moderating effect on simulated performance (Bender & Rogers, 2004; Gorny & Merten, 2006; Suhr & Gunstad, 2000; Brennan et al., 2009). Still, the influence of types of coaching on the classification accuracy of assessment measures remains an open issue in research, yielding some controversy. On the one hand, numerous empirical studies have found either test-coaching alone (i.e., instructing participants about detection strategies used by tests; DiCarlo et al., 2000; Bender & Rogers, 2004; Powell et al., 2004; Weinborn et al., 2012) or a mix of test-coaching and symptom-coaching (Rose, Hall & Szalda-Petree, 1998; Rüsseler et al., 2008; Lau et al., 2017) to be more effective than symptom-coaching alone (i.e., supplying information about symptoms of the condition to be feigned) in reducing differences between scores of simulators and bona fide patients. On the other hand, two meta-analyses on validity indicators revealed symptom-coaching to be superior to test-coaching in reducing differences between groups (Sollman & Berry, 2011). Therefore, more research on the differences between types of coaching is needed. In addition, as most studies used PVTs with good face validity and high classification accuracies (e.g., standard forced-choice tests), we propose investigating other experimental indicators' ability to classify performances moderated by coaching.
To conclude, we set the following research objectives for the present studies: (1) To investigate the accuracy of immediate vs. delayed recall indicators in detecting noncredible performance in simulators compared with clinical patients and community controls. In this regard, we hypothesized that delayed recall would be superior to immediate recall.
(2) To compare performances of uncoached, symptom-coached, and test-coached simulators with performances of full-effort patients on the delayed recall task. We hypothesized that test-coached participants would show scores closer to those of clinical patients than the other two groups.

Participants
The general sample was composed of 190 participants. Experimental participants were 90 psychology undergraduates (27 males and 63 females) who volunteered to take part in the study and received course credit for their involvement. Participants were randomized into three groups of simulators (uncoached, N = 23; symptom-coached, N = 22; test-coached, N = 22) and one full-effort group (N = 23). The undergraduate full-effort group was aggregated with 30 community volunteers (15 males and 15 females), recruited from the acquaintances of the researchers, to form the nonclinical control group, which had no reported history of mental illness or cognitive dysfunction and no current involvement in lawsuits. Clinical patients were 70 neurological outpatients (38 males and 32 females) with cognitive impairment of heterogeneous etiologies: traumatic brain injury (TBI) (N = 10), cerebrovascular accident (CVA) (N = 25), and dementia of various etiologies (N = 35). Patients were included in the study if they had intact perceptual functions and reading and writing abilities. Two female CVA patients had to be excluded because of severe dysgraphia, leaving 23 patients in this group. No patient was involved in litigation at the time of the assessment, and none expressed interest in external benefits. All clinical participants were treated at an outpatient clinic specialized in cognitive and motor dysfunctions and were assessed with the Mini-Mental State Examination (MMSE; Folstein et al., 1999). The minimum score for inclusion was 15. Scores in our sample ranged between 18 and 29, with an average of 25.26 ± 1.87. The only significant difference between the three clinical groups was related to the patients' age (F = 15.470, p = .001), with TBI patients being younger than dementia and CVA patients.
In the analysis, groups of simulators, full-effort nonclinical, and genuine clinical controls were aggregated into three groups that showed significant differences in age, gender, and education between them (see table below).

Procedure
The experimental procedure was described in full in a previous article. After being recruited and randomized into groups, experimental participants received the simulation instructions via email, adapted from previous studies (Brennan & Gouvier, 2006; Rüsseler et al., 2008; see Table 1 of the appendix). The participants in the full-effort group were asked to complete the tasks putting forth their best effort. All participants read and signed an informed consent form containing information about the study and instructions not to disclose the experimental manipulation. An extra incentive was provided: participants were told that an unspecified sum of money would be awarded for the most credible simulated performance or, in the case of the full-effort group, the best performance. After the data were collected, the equivalent of $25 was given to one randomly selected participant from each group.
Participants in each experimental group were assessed individually by a licensed clinical psychologist who was unaware of the feigning conditions. After completing the tests, all simulators had to fill out a post-test questionnaire with manipulation checks and items referring to malingered performance and the strategies employed. One male participant from the uncoached group was found noncompliant with the experimental instructions and had to be excluded from the study, leaving a total sample of 89 experimental participants (22 in each simulator group and 23 in the full-effort group).
All control and clinical participants were assessed by a licensed psychologist and were asked to put their best effort into their test performance. They were not monetarily rewarded.

Assessment instruments
All participants were individually assessed using a battery of five memory tasks, of which the present paper concerns only the indicators used to assess immediate and delayed recall performance. The analysis of the other indicators was presented in a different paper.
First, 12 pictures of common objects were shown to the participant, whose task was to name and memorize each picture. After being presented with all 12 objects, the participant was asked to recall the memorized objects in any order. For the second trial, the procedure was repeated with identical instructions. The mean of correctly recalled items across both trials was computed as the immediate recall indicator.
The next tasks consisted of two forced-choice trials, in which participants had to identify each of the 12 memorized items from pairs with similar foils, and a word completion task, in which participants had to complete word stems, first including and then excluding the 12 items. Next, the BVRT (Benton Visual Retention Test, set A) was used as a distractor (i.e., a task with different stimuli, inserted before the delayed recall phase to divert the participant's attention from the original set of 12 items and to provide the timeframe for the delay).
Finally, the participant had to recall the 12 items presented in the first recall phase. The total of correctly recalled items represented the delayed recall indicator.
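As an illustrative sketch of how the two stand-alone indicators described above are derived (function and variable names are ours, not part of the original protocol), each can be computed from per-trial counts of correctly recalled items:

```python
def recall_indicators(trial1_correct, trial2_correct, delayed_correct, n_items=12):
    """Compute the two stand-alone recall indicators.

    immediate: mean of correct recalls across the two learning trials
    delayed:   total of correct recalls after the intervening tasks
    """
    for score in (trial1_correct, trial2_correct, delayed_correct):
        if not 0 <= score <= n_items:
            raise ValueError("recall score out of range")
    immediate = (trial1_correct + trial2_correct) / 2
    delayed = delayed_correct
    return immediate, delayed
```

For example, a participant recalling 6 and 8 items on the two learning trials and 9 items at delay would receive an immediate recall score of 7.0 and a delayed recall score of 9.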
Measures for internal consistency were computed for each type of stand-alone indicator: Cronbach's alpha was .894 for the immediate and delayed recall tasks, .913 for recognition tasks, and .893 for the process dissociation indicator.
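For reference, Cronbach's alpha for a set of items can be computed as below. This is a generic sketch of the standard formula, not the authors' code; `items` holds one list of scores per item, aligned across participants:

```python
def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(totals)).

    items: list of k lists, each holding one item's scores for all participants.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    total_scores = [sum(item[p] for item in items) for p in range(n)]
    return (k / (k - 1)) * (1 - sum(var(item) for item in items) / var(total_scores))
```

Two perfectly correlated items yield an alpha of 1.0; as item intercorrelations drop, alpha falls accordingly.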

Data analysis
Overall data were analyzed using IBM SPSS Statistics 25. Descriptive statistics for the individual and aggregated groups on the recall indicators are displayed in Table 2.
Manipulation checks - differences between groups
As our participants came from three different populations, one-way ANCOVAs were conducted between scores of the three aggregated groups on the immediate and delayed recall indicators whilst adjusting for age, gender, and education, at a 99% confidence level. We found significant differences between the three groups on the delayed recall indicator [F(2, 181) = 53.448, p = .001, Eta² = .371] and on the immediate recall indicator [F(2, 181) = 38.547, p = .001, Eta² = .299]. LSD post hoc tests showed significant differences (p = .001) between simulators and controls and between controls and patients on both indicators, but significant differences between simulators and patients were only found in delayed recall performance (p = .004). Comparing the estimated marginal means further showed that the simulator group produced significantly lower scores on this indicator than the control group and the neurological patient group. The immediate recall indicator failed to produce significant differences between simulators and patients at a 99% confidence level. The observed statistical power of 1.000 for both indicators suggests that the study was adequately powered for the examination of our hypotheses.

Immediate vs. delayed recall
Next, we wanted to determine the ability of the immediate vs. delayed recall indicators to discriminate between scores of simulators and patients at cutoffs with specificities of ≥ 90%. Results are shown in Table 3.
Results showed marked differences between the two contrast categories concerning the classification accuracy of the two indicators. Both immediate and delayed recall demonstrated high to excellent abilities to discriminate between simulators and non-clinical controls, classifying noncredible performances with 66.7% sensitivities at cutoffs ≤ 8.00. However, in the simulator vs. patient contrast, only the delayed recall indicator demonstrated a significant AUC value and a moderate effect size, thus confirming our first hypothesis.
The indicator for immediate recall failed to significantly discriminate between simulators and patients (as previously indicated by the results of the ANCOVA). Still, both indicators' failure to produce acceptable sensitivities in the simulator vs. patient contrast limits their accuracy in this type of comparison.
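The cutoff logic behind these analyses can be sketched as follows. This is our own simplified implementation of a "flag if score ≤ cutoff" rule with hypothetical scores, not the actual ROC output: lower scores flag invalid performance, and the reported cutoff is the one maximizing sensitivity while keeping specificity in the genuine group at or above 90%.

```python
def best_cutoff(sim_scores, genuine_scores, min_specificity=0.90):
    """Scan 'score <= cutoff' decision rules and return the one with the
    highest sensitivity (simulators flagged) among those whose specificity
    (genuine performers NOT flagged) stays at or above min_specificity."""
    best = (None, 0.0, 1.0)  # (cutoff, sensitivity, specificity)
    for cutoff in sorted(set(sim_scores) | set(genuine_scores)):
        sensitivity = sum(s <= cutoff for s in sim_scores) / len(sim_scores)
        specificity = sum(g > cutoff for g in genuine_scores) / len(genuine_scores)
        if specificity >= min_specificity and sensitivity > best[1]:
            best = (cutoff, sensitivity, specificity)
    return best
```

With a genuinely impaired comparison group scoring low themselves, the qualifying cutoffs shift downward, which is what drives the sensitivity loss in the simulator vs. patient contrast.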

Differences between simulators: Performance curves of recall
To assess the influence of learning trials on recall performance and to see whether the suppression of the learning effect was characteristic of simulators' performances, we conducted independent-samples t-test comparisons between the means of correctly recalled items from the first two immediate recall trials and the correctly recalled items from the delayed recall phase. At this stage, we took each experimental feigning condition separately for comparison. Full-effort controls and neurological patients were again considered as aggregated groups. Results are shown in Table 4. We obtained significant differences between the performances of each feigning group contrasted with full-effort controls, as demonstrated by very large effect sizes (Cohen's d > 1.5) generated by both recall indicators. There were no significant differences between simulators and patients in terms of immediate recall performance. However, significant differences (p < .05) were found in delayed recall performance, with moderate effect sizes irrespective of the feigning condition. The smallest effect for delayed recall was observed between test-coached simulators and patients (Cohen's d = .53) and the largest between uncoached simulators and patients (Cohen's d = .819). We then computed individual differences between group means, reflecting the change in recall performance across the entire test battery.
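For the effect sizes reported here, Cohen's d with a pooled standard deviation can be computed as below (a generic sketch of the standard formula, not the authors' code):

```python
from math import sqrt

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    def mean(xs):
        return sum(xs) / len(xs)

    def ss(xs):  # sum of squared deviations from the group mean
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs)

    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = sqrt((ss(group_a) + ss(group_b)) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd
```

By the usual conventions, values around .5 (as for the test-coached vs. patient contrast) are moderate, while values above .8 (uncoached vs. patients) are large.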
Observing the individual differences between group means, we noted that full-effort participants, whether healthy or impaired, demonstrated an increase in cognitive performance across recall trials. In other words, the positive difference between the score of the delayed recall phase and the first two recall trials showed that full-effort respondents retained significantly more words across tasks, irrespective of the distraction provided by the BVRT, thus demonstrating a learning effect. Interestingly, this increment in performance seemed to be maintained in neurological patients despite their cognitive impairment (0.99), closely matching the improvement in performance of healthy controls (1.16). On the other hand, all groups of simulators demonstrated a negative difference between recall scores, attesting to a suppressed learning effect across test trials, thereby supporting a diagnosis of invalid performance. Of note, the largest decrease in performance was scored by the uncoached (-0.86) and symptom-coached groups (-0.81), while test-coached participants showed a moderate difference (-0.54). These results supported our second hypothesis, indicating that test-coaching decreased the differences between the delayed recall performances of simulators and clinical patients. Comparing the performance curves that accounted for the presence or absence of learning effects across groups showed descending trends for all three simulator groups and ascending trends for both full-effort groups, regardless of their clinical status.
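The performance-curve indicator described above reduces to a simple group-level difference; a minimal sketch (function and variable names are ours):

```python
from statistics import mean

def learning_effect(immediate_means, delayed_totals):
    """Change in recall across the battery for one group: delayed-recall mean
    minus immediate-recall mean. Positive values indicate a learning effect;
    negative values a suppressed one (the descending curve treated here as a
    marker of noncredible performance)."""
    return mean(delayed_totals) - mean(immediate_means)
```

For instance, a group averaging 5.5 items on immediate recall and 6.5 at delay shows a gain of 1.0 (an ascending curve, as in the full-effort groups), while a group dropping from 7.5 to 7.0 shows -0.5 (a descending curve, as in the simulator groups).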

Discussion
Results of the first study confirmed both of our hypotheses: delayed recall was found to be more effective than immediate recall in discriminating between simulators and patients, although both indicators' classification accuracies failed to reach acceptable sensitivities at ≥ 90% specificities. Comparing performances across recall trials in simulators and full-effort participants showed the suppression of the learning effect in all three simulator groups, displayed as a descending performance curve, which is a marker of invalid performance (Bender & Rogers, 2004; Wogar et al., 1998). Consistent with our second hypothesis, test-coached simulators demonstrated the smallest decrement in recall performance, supporting the notion that test-coaching increases the credibility of simulated performance. A second study was designed to verify our findings.

Participants
A total sample of 108 participants was used. Experimental participants were 48 psychology undergraduates (17 males and 31 females), randomly allocated to three groups of simulators of 16 participants each (uncoached, symptom-coached, and test-coached). All volunteered to take part in the study and received course credit for their participation. Control participants (12 males and 18 females) included ten undergraduates and 20 community volunteers recruited from the social networks of the researchers, with no reported history of mental conditions or cognitive impairment and no present involvement in any type of litigation.
The clinical group was composed of 30 outpatients (15 males and 15 females) with psychiatric diagnoses, such as major depressive disorder (MDD) (N = 2), panic disorder (N = 2), chronic alcoholism (N = 1), and delusional disorder (N = 1), and neurological diagnoses, such as polyneuropathy (N = 2), traumatic brain injury (TBI) (N = 2), cerebrovascular accident (CVA) (N = 7), Alzheimer's dementia (N = 5), and Parkinson's dementia (N = 8). All patients had intact perceptual functions and reading and writing abilities. All were psychiatrically and physically treated at a rehabilitation clinic specialized in cognitive impairment. They were tested as part of a standard neuropsychological assessment, which included the Mini-Mental State Examination (MMSE), with a minimum score of 15 required for inclusion. Scores ranged between 21 and 29, with an average of 25.80 ± 2.64. As in the first study, the only difference was related to the patients' age (F = 13.680, p = .001), with dementia and CVA patients being older than the other participants.
The demographic characteristics of the individual and aggregated groups are displayed in the table below. Significant differences in age and education were observed between the three aggregated groups, with simulators being younger than the other two groups and patients having lower average education.

Procedure
The same protocol as in the first study was used, but the assessment procedure was adapted for online testing (see Assessment instruments below).

Assessment instruments
The same memory tasks as in the first study were administered, but instead of the BVRT, we used a distractor task consisting of recalling and recognizing two rows of digits, as it was considered more suitable for online assessment and its duration was similar to that of the BVRT in the first study (i.e., approximately 5 minutes). As with the BVRT, participants' scores on these tasks were not included in the analysis. Measures of internal consistency were again computed for each type of stand-alone indicator and were found to closely match our first study: Cronbach's alpha was .890 for the immediate and delayed recall tasks, .897 for the recognition tasks, and .886 for the three process dissociation indicators.

Data analysis
Overall data were analyzed using IBM SPSS Statistics 25. Descriptive statistics for the individual and aggregated groups on the recall indicators are displayed in Table 5.

Manipulation checks - differences between groups
To test differences between participants, one-way ANCOVAs were conducted between scores of the three aggregated groups on the performance validity indicators whilst adjusting for age, gender, and education, at a 99% confidence level. Results yielded significant differences for both the immediate recall indicator [F(2, 102) = 46.904, p = .001, Eta² = .479] and the delayed recall indicator [F(2, 102) = 48.517, p = .001, Eta² = .488]. LSD post hoc tests largely confirmed our initial findings: both indicators yielded significant differences between simulators and controls (p = .001) and between controls and patients (p = .001), but while immediate recall failed to discriminate between simulators and patients at acceptable probabilities (p = .039), the delayed recall indicator produced significant differences between these groups (p = .001). In contrast to our first study, no significant differences were observed between the delayed recall performances of community vs. clinical controls (p = .068). Comparing the estimated marginal means further showed that the simulators produced significantly lower scores on these indicators than controls and neurological patients. Again, the observed statistical power was 1.000 for both indicators.

Immediate vs. delayed recall
Results of the ROC curve analyses matched our previous study, confirming our first hypothesis: while both indicators demonstrated high discrimination ability between simulators and community controls and very large effect sizes between group means, only the delayed recall indicator generated an AUC value in the fair range when discriminating simulators from patients. Still, in this contrast, both indicators failed to produce acceptable sensitivities at specificities of ≥ 90%.

Differences between simulators: Performance curves of recall
Independent samples t-test comparisons between scores of immediate recall and delayed recall across types of contrasts yielded no significant differences between simulators and patients in immediate recall performance. Congruent with initial findings, moderate to large effects for the delayed recall indicator showed noncredible performance in the case of all three simulator groups as compared to genuine patients.
Next, differences between performances in immediate vs. delayed recall were computed across contrasts between the three simulator groups vs. controls and patients, to generate curves of recall performance. The results matched our previous findings: while both full-effort groups (controls and patients) showed an increase in recall performance across the test (of approximately 1 point) and ascending curves, all three groups of simulators displayed a decrease in performance (indicating a suppressed learning effect) with descending curves. In contrast to the first study, symptom-coached feigners produced the largest difference (-1.31), while, consistent with initial findings, the smallest difference was observed for test-coached simulators (-0.46). Therefore, across both studies, participants receiving coaching about test-taking strategies demonstrated performances that were closer to those of genuine patients, and an effect of test-coaching on making performance more credible could be inferred.

General Discussion
The present studies compared the effectiveness of immediate vs. delayed recall indicators in discriminating between performances of simulators vs. full-effort clinical and non-clinical comparison groups. We used groups of uncoached, symptom-coached, and test-coached simulators to explore the impact of coaching on recall performance. We hypothesized that (1) delayed recall would be superior to immediate recall at detecting invalid performance in the general simulator sample and (2) test-coached simulators would display smaller differences in performance than the other two groups, as compared to genuine patients.
The results of both studies confirmed our first hypothesis. Both indicators yielded significant differences between simulators and non-clinical controls, whilst controlling for age, gender, and education, but delayed recall was more accurate than immediate recall in distinguishing between simulators and patients. As results of this type of contrast are more salient to clinical settings (Vickery et al., 2001), using indicators based on delayed rather than immediate recall would be more appropriate for discriminating noncredible from impaired performance in the assessment of clinical participants. Still, ROC curve analyses showed marked differences in the classification accuracy of both indicators across types of contrasts. While in simulators vs. controls both immediate and delayed recall showed high to excellent AUC values of similar ranges, their accuracy decreased in simulators vs. patients, and sensitivities for both indicators failed to reach the "Larrabee limit" at cutoffs of ≤ 4 (i.e., ≥ 50% sensitivity at ≥ 90% specificity). In both studies, modest AUC values that were not statistically significant for the immediate recall indicator showed that it was less reliable than delayed recall in distinguishing invalid from genuinely dysfunctional responses (see Suhr & Gunstad, 2000; Strauss et al., 2002). However, delayed recall's limited effectiveness in clinical comparisons warrants caution when interpreting failures on this measure as a sole indicator of invalid performance and recommends combining it with other types of indicators for a more rigorous assessment. Of note, although cutoffs set at ≤ 8 for both indicators discriminated between simulators and non-clinical controls with sensitivities between 66% and 83%, reaching up to 100% specificity in the second study, they had to be lowered considerably to achieve acceptable specificities (≥ 90%) in the simulator vs. patient contrast.
This finding was expected given the presence of cognitive impairment in the clinical group. At this point, we stress the importance of setting differential cut scores for indicators according to the level of impairment of the full-effort comparison groups. Besides including a community control group, the presence of a clinical group performing with full effort is therefore mandatory, as the scores of these patients set a threshold for real impairment below which invalid performance might be suspected (Bender & Rogers, 2004; DiCarlo et al., 2000; Kanser et al., 2018). Consequently, in comparisons with impaired populations, cutoffs must be lowered to keep false positives to a minimum (Green et al., 2011; Merten et al., 2007). Unfortunately, this was achieved at the expense of low sensitivities for both recall indicators. Of note, a cut score of ≤ 4 for the delayed recall indicator yielded the highest sensitivity in the second study (45.8%), pointing to limited effectiveness in the simulator vs. patient contrast, in which it nonetheless proved superior to immediate recall.
We analyzed differences between immediate recall and delayed recall scores across contrasts to verify the test-effect (i.e., how multiple learning trials and recall sessions of the test material influenced retention throughout the test). These differences were graphically displayed as performance curves. Results across both studies showed a decrement in recall performance for all simulator groups that varied across studies, while both patients and normal controls revealed a performance increment of approximately 1 point. These results suggested that in full-effort groups, regardless of the presence of cognitive impairment, a learning effect occurred, despite confrontation with a distractor involving different stimuli. In groups of feigners, on the other hand, the performance decrement suggested the intentional withholding of memorized items, thus indicating noncredible responding. Our findings thus offer input on how the difference between immediate and delayed recall in memory tasks might be a useful indicator in assessment, addressing the lack of evidence noted by Leighton et al. (2014). Our studies also provide a new method of computing performance curve indicators, thereby contributing to knowledge in this field (Rose, Hall & Szalda-Petree, 1998; Wogar et al., 1998; Suhr & Gunstad, 2000; Bender & Rogers, 2004).
All three groups of simulators were discriminated from both full-effort groups by demonstrating descending curves of recall performance, regardless of the coaching type. The fact that test-coached simulators showed a smaller decrement in performance than the other two groups suggests a moderating influence of test-coaching on performance, in the sense of bringing simulated test presentations closer to bona fide impairment. In this regard, our findings appear consistent with studies that support the superiority of test-coaching over other types of coaching (Bender & Rogers, 2004; Powell et al., 2004; Rüsseler et al., 2008; Weinborn et al., 2012). Our results also indicate the vulnerability of recall measures to test-coaching; their combination with more robust measures (e.g., forced-choice) would therefore be more suitable in the assessment of noncredible performance.

Limitations
Several limitations may be attributed to our studies. Firstly, the indicators employed are experimental and need to be further tested in research before any final statements about their classification accuracies can be made. Secondly, the small sample sizes in both studies and the fact that simulators were recruited among psychology undergraduates restrict the generalizability of the results to other populations. Thirdly, and most importantly, the absence of a standard PVT to determine whether the clinical and non-clinical comparison groups performed with full effort is another limitation. Without it, it could not be stated with certainty that the presentations of clinical patients and community controls were genuine. Therefore, the replication of these findings with a criterion PVT to account for response validity appears mandatory. Future studies should address these limitations by including simulator samples recruited from various populations and replicating the experimental results in comparisons of clinical criterion groups.

Conclusion
The present paper compared the accuracy of two experimental validity indicators based on recall memory in detecting noncredible performances of simulators compared with full-effort controls and clinical patients. Results of two experiments highlighted delayed recall as superior to immediate recall in distinguishing simulators from patients, yet low sensitivities pointed to this indicator's limited classification ability in clinical assessment. Comparing immediate vs. delayed recall performances of simulating vs. full-effort participants showed a suppression of the learning effect in all three simulator groups, with test-coached participants exhibiting the smallest difference from the scores of genuine patients. Observing the absence of a learning effect or a descending performance curve may provide some information about an examinee's response validity; however, caution is recommended when interpreting such results in cases where bona fide impairment is present.

Availability of data and material
Data are available upon request.