What does competently delivered neuropsychological assessment feedback look like? Development and validation of a competency evaluation tool

Abstract Objective: Neuropsychological assessment (NP) feedback helps patients and caregivers understand assessment results to maximise their utility and impact in everyday life. Yet feedback practices are inconsistent and there are no evidence-based guidelines for how feedback should be most effectively delivered. The aim of our study was to develop a psychometrically sound feedback competency checklist, the Psychology Competency Evaluation Tool – Feedback (PsyCET-F), for use in research, training, and clinical settings. Method: The Delphi method of expert consensus was used to establish checklist items that clearly described competencies important for NP feedback. To examine the inter-rater reliability of the checklist, two experienced neuropsychologists rated the competencies demonstrated by trainee neuropsychologists across four feedback sessions. Results: After two Delphi rounds, consensus was reached on the 20-item checklist. Consensus was defined as at least 80% agreement amongst the panel of 20 experts. Four item categories resulted from the Delphi: (a) Opening the Session; (b) Applying Specific Feedback Techniques; (c) Engagement, Collaboration, and Alliance; and (d) Structuring and Ending the Session. Inter-rater reliability was moderate (κW = 0.79, p < .001, 80.52% agreement) when competency level was coded using a simple coding system (Beginner, Intermediate, Competent, and Skilful), and strong (κW = 0.82, p < .001) when coded using an 8-point, detailed coding method. Conclusions: The PsyCET-F is psychometrically sound and fit-for-purpose for measuring competencies in giving NP feedback. It can be used in the training of clinicians to develop effective feedback skills. International benchmarking and usability testing will be conducted in a future study.

Neuropsychological assessment involves the evaluation of a person's cognitive, psychological, and behavioural functioning to aid in diagnosis, highlight cognitive strengths and weaknesses, and guide management, rehabilitation, and intervention (Gass & Brown, 1992). Neuropsychological assessment feedback (hereafter referred to as NP feedback) is crucial for translating the results of neuropsychological assessment into meaningful outcomes for patients and caregivers, but the skills required for effective delivery of feedback are complex and can be difficult for trainees to learn. There are several benefits of NP feedback, including helping patients and caregivers understand the implications of test results for daily functioning, facilitating discussion of effective symptom responses and management, clarifying diagnosis, supporting patient understanding of the presenting issues, and allowing clinicians the chance to address patient concerns and provide reassurance (Rosado et al., 2018). Despite these benefits, the practice of NP feedback remains inconsistent, and models of training are lacking (Bennett-Levy et al., 1994; Gruters et al., 2022; Smith et al., 2007).
NP feedback has been found to improve patient outcomes, although the quality of evidence is mixed (Gruters et al., 2022). Patients find NP feedback and the recommendations received during sessions to be helpful (Donofrio et al., 1999). NP feedback sessions can result in improvements in patients' quality of life, understanding of their condition, perceived cognitive functioning, mood, self-efficacy, and coping abilities, as well as decreased stress levels for both the patient and caregiver (Longley et al., 2022; Rosado et al., 2018). Conversely, receiving inadequate information about their condition and assessment performance can leave patients feeling distressed and concerned (Bennett-Levy et al., 1994). Distress may be further exacerbated by second-hand delivery of results, such as when the referral source summarises results to the patient but no direct feedback is provided by the neuropsychologist (Foran et al., 2016).
For many patients who do receive NP feedback, its delivery often fails to satisfy their desire for information regarding their condition (Bennett-Levy et al., 1994). This may be due to a lack of established practice standards or evidence-based models for effective NP feedback delivery. Gass and Brown (1992) recommend several NP feedback components, including reviewing the purpose of assessment and describing both strengths and weaknesses. Gorske and Smith's (2009) Collaborative Therapeutic Assessment model recommends eliciting patient concerns, discussing the implications of results for everyday life, and determining a personal skills profile. Postal and Armstrong (2013) also outline several helpful recommendations for explaining assessment results in a way that is meaningful for each individual client, based on informal interviews they conducted with a range of clinicians about their NP feedback practices. Other models of feedback provision have been proposed for neurological rehabilitation (Waldron-Perrine et al., 2021) and older adult populations (Wynn et al., 2022). However, these models of NP feedback practice are yet to be empirically evaluated. This stands in contrast to other forms of psychological intervention, such as cognitive behaviour therapy and motivational interviewing, where checklists of essential clinical competencies have been developed and comprehensively evaluated (Miller et al., 2004; Roth & Pilling, 2008). These checklists have been used as measures of treatment fidelity in clinical trials, allowing rigorous evaluation of interventions according to best-practice trial guidelines, and facilitating clinical implementation efforts by providing a method for evaluating whether clinicians demonstrate the competencies required for effective intervention delivery (Wong et al., 2020). The checklists are also used to guide practice across clinical settings and to evaluate trainee competencies during training programs (Miller et al., 2004).
Internationally, significant efforts have been made over the past 20 years to shift the focus of training programs away from specifying the content of training (or 'input') towards specifying the competencies (or 'output') considered essential for professional psychologists (Falender & Shafranske, 2007; Gonsalvez & Crowe, 2014; Roberts et al., 2005). The move to competency-based training models means there is a need for more valid, reliable tools to assess these competencies. Measurement of trainees' competency strengths and weaknesses allows skill gaps to be identified and addressed (Gonsalvez et al., 2013; Hatcher et al., 2013). In clinical neuropsychology, recent efforts have begun to implement this cultural shift through the identification and evaluation of key competencies necessary for effective neuropsychological practice (Carrier et al., 2022; Heffelfinger et al., 2022; Smith & CNS, 2019; Wong et al., 2019, 2020, 2021, 2023).
Currently, international recognition of clinical neuropsychology qualifications is limited due to differences in training requirements and role definitions (Ponsford, 2016). For example, in Australia, clinical neuropsychology training is, at minimum, a two-year postgraduate neuropsychology-specific Masters degree; whereas in the USA and much of Europe, clinical neuropsychologists must have doctoral-level training, though a significant portion of this is in clinical psychology rather than neuropsychology specifically. These differences in training structure and curriculum can limit the exchange of ideas to a national level (Hessen et al., 2018). International agreement on training a set of core competencies in neuropsychologists could promote the exchange of ideas, lead to greater international consistency of graduate skills regardless of training program structure, and thus enhance the development of clinical neuropsychology practice worldwide. Adherence to a set of standard competencies, in skill domains such as assessment, NP feedback, and report writing, could also enhance the quality of evidence-based practice.
The aim of this study was to develop a psychometrically sound checklist of competencies for giving effective NP feedback, to establish consistent standards for clinical practice, guide skill development in trainees, and measure fidelity of NP feedback delivery in research. Specifically, the study aimed to 1) validate a checklist of essential NP feedback competencies through consensus amongst experts, and 2) determine the checklist's inter-rater reliability when used to rate the competency of clinical neuropsychology trainees. We conducted the current study in the Australian context, with the aim of establishing its international relevance and usability in a future study.

Method
Ethical approval for this study was granted by the La Trobe University Human Research Ethics Committee (HEC18168). Data collection took place from May to September 2018.

Design
To address the first aim, the Delphi method was used to obtain expert consensus on the proposed items of a newly developed NP feedback competency checklist, the Psychology Competency Evaluation Tool – Feedback (PsyCET-F). Given the lack of existing tools or frameworks for NP feedback, validating a new tool requires establishing expert consensus on the competencies essential for effective NP feedback. The Delphi method is one approach to obtaining expert consensus and is useful for clarifying phenomena about which little is known (Skulmoski et al., 2007). It involves a multi-round series of iterative surveys that work towards consensus from a panel of experts on the importance and clarity of the checklist items and structure (Hsu & Sandford, 2007).
After content validity was established, the second aim of calculating inter-rater reliability was addressed by using the checklist to independently rate neuropsychology trainees' NP feedback competencies from video recordings and calculating inter-rater agreement.

Development of the PsyCET-F
The PsyCET-F is one of a suite of tools that members of the research team are developing to measure competencies in professional psychology practice, including tools focussing on assessment (the PsyCET-A; Carrier et al., 2022) and group-based rehabilitation interventions (Wong et al., 2019). The PsyCET-F was first developed by neuropsychologists DW and AM, based on (a) a review of the available literature on NP feedback practice; (b) their own experience in providing NP feedback; and (c) their experience in training, assessing, and supervising students in providing NP feedback. This 13-item scale was then refined based on responses from a focus group of 21 neuropsychologists who were attending a neuropsychology supervisor masterclass co-led by DW and AM. During the focus group session, participants were shown a video-recorded simulated NP feedback session conducted by DW. They used the PsyCET-F to rate the neuropsychologist's competencies in conducting NP feedback. The focus group participants were then asked to complete a questionnaire about the tool's usefulness, relevance, and ease of use, and whether any items should be removed or added. These responses were collated, and the PsyCET-F was updated to incorporate additional items (e.g., giving feedback in multidisciplinary teams, attending to family dynamics), features, and formatting components suggested by the focus group.
This updated version, a 20-item checklist of NP feedback competencies, was used in the Delphi survey for this study. Items were grouped into four themes under the following subheadings: (a) Opening the Session; (b) Applying Specific Feedback Techniques; (c) Engagement, Collaboration, and Alliance; and (d) Structuring and Ending the Session. Two items relating to introducing the session and goal setting were grouped into Opening the Session. Eight items fell under Applying Specific Feedback Techniques; these related to the ability to discuss results, implications, and strategies at the client's level of understanding, using assistive tools if appropriate, and ensuring that applicability to daily functioning is clearly discussed. Seven items were grouped into Engagement, Collaboration, and Alliance; these covered competencies such as engaging with and building rapport and a working alliance with the client and family members, as well as collaborating with other professionals. Structuring and Ending the Session comprised three items, relating to appropriately pacing the delivery of information throughout the session, maintaining a coherent structure, and acknowledging the end of the session.
Each item could be rated on a 9-point scale reflecting a standard of competence: ratings of 2-3 indicated a Beginner level, 4-6 an Intermediate level, and 7-9 a Competent level. Each competency level was defined using a brief descriptive vignette. Assessors could also give a rating of 1, indicating Not Observed, where the assessed individual had not demonstrated the competency despite opportunity, or a rating of 0, indicating Not Applicable, where the competency was not relevant for that NP feedback session. For example, the item relating to collaboration with other multidisciplinary professionals would not be relevant in all NP feedback settings. At the end of the checklist there was an Overall Remarks section, which included a 9-point scale for overall competency level and space for supervisors to note the trainee's strengths and areas for further development.

The Delphi survey
Inclusion criteria for expert panel members were, according to self-report: i) qualification and registration as a clinical neuropsychologist; ii) a minimum of two years of clinical practice as a neuropsychologist; iii) provision of NP feedback to over 50% of patients on average; and iv) at least moderate experience supervising neuropsychology trainees in giving NP feedback. We aimed to recruit at least 20 panellists, in line with recommendations for Delphi studies (Hsu & Sandford, 2007). Participants were recruited from the Neuropsychologists in Australia (NPinOz) email list. Interested participants completed a brief survey about their qualifications and experience to ensure eligibility for the study. Twenty-three eligible participants consented to be panellists, of whom 21 (91.3%) had 10 or more years of clinical experience. The characteristics of the panel are detailed in Table 1.
Panellists completed the survey via Qualtrics and were asked to rate the relevance of each PsyCET-F item for evaluating competency in NP feedback on a 5-point scale (1 = very irrelevant to 5 = very relevant). Panellists then rated the clarity of each item (1 = very unclear to 5 = very clear). All ratings from 1-3 prompted an additional request for a rationale and suggestions for improvement. Following this, panellists were asked to rate the characteristics of the PsyCET-F, including: the appropriateness of the Beginner, Intermediate, and Competent skill and error descriptions; the overall appropriateness of the competency levels; and the usefulness of the 9-point rating scale, the subheadings, the Overall Remarks section, and the PsyCET-F as an overall training and supervision tool. Panellists additionally evaluated the ease of use of the 9-point rating scale and of the PsyCET-F overall. At the conclusion of the survey, panellists could provide additional suggestions and comments. Response scales ranged from 1-5 (very inappropriate to very appropriate; not useful at all to very useful; very difficult to very easy). The survey was piloted by three experienced neuropsychologists, and all technical and phrasing issues were addressed prior to study commencement.
Item consensus was defined as >80% of experts endorsing one of the two highest ratings (4 or 5 on the 5-point scale) for both the relevance and the clarity of an item. Items that did not reach this consensus were either removed or revised.
New items were also added in response to suggestions made by the expert panel. Additional revisions were made to items that achieved consensus but also received suggestions to improve wording and clarity. Survey rounds continued until >80% consensus was achieved for every item. Following the initial survey round, new and revised items were sent to the expert panel for review, using the same rating scales. In these subsequent rounds, panellists were presented with an individualised survey that allowed them to view their previous response alongside panel feedback showing the percentage of panel members who had selected each response on the 5-point scale. In line with Delphi methodology recommendations (Rowe & Wright, 1999), panellists were provided with summaries of qualitative information containing panellists' rationales for their ratings and judgements (including the prevalence of these opinions), as well as the reasons for revisions to items and for new items. These steps aimed to enhance the considered exchange of information between panellists.
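As a concrete illustration, the consensus rule described above can be expressed in a few lines of code. This is a hypothetical sketch (the study used Qualtrics survey exports, not this function); the function name and data layout are our own.

```python
def reaches_consensus(ratings, threshold=0.80):
    """Check the Delphi consensus rule: more than `threshold` of panellists
    must endorse one of the two highest ratings (4 or 5 on the 5-point scale)."""
    endorsed = sum(1 for r in ratings if r >= 4)
    return endorsed / len(ratings) > threshold

# Hypothetical panel of 23: 19 endorse the item with a 4 or 5
ratings = [5] * 12 + [4] * 7 + [3, 3, 2, 2]
print(reaches_consensus(ratings))  # → True (19/23 ≈ 82.6% > 80%)
```

Note that the rule is strictly greater than 80%, so a panel split of exactly 80% endorsement would not count as consensus.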

Evaluation of inter-rater reliability
Following evaluation of the content validity of the PsyCET-F, its inter-rater reliability was analysed. Two clinical neuropsychologists (DW and SP) independently evaluated three first-year neuropsychology trainees across four separate video-recorded client NP feedback sessions using the finalised PsyCET-F (one trainee conducted two NP feedback sessions). Each neuropsychologist had extensive experience in clinical practice (>15 and >9 years' clinical experience for DW and SP, respectively) and in supervising neuropsychology trainees in NP feedback. Client and neuropsychology trainee consent was obtained. To calibrate their standards, the raters met after independently reviewing the first video-recorded session. They discussed the skills displayed and their justification for the given ratings, making alterations as necessary. Subsequent video-recorded sessions were reviewed without further consultation between the raters. Each rater completed four checklists, totalling 80 individual ratings per rater.
A third researcher (RP) then analysed the consistency of responses. Two different scoring methods were used (consistent with Wong et al., 2019): 1) simple coding, which records the level of competence at which a skill was displayed, coded as 0 (Not Observed despite opportunity), 1 (Beginner), 2 (Intermediate), 3 (Competent), or 4 (Skilful); and 2) detailed coding, which enables finer-grained tracking of NP feedback competencies over time by using the full 8-point rating scale, coded from 0 (Not Observed) to 8. Where both raters indicated Not Applicable for an item, it was not included in the analysis.
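The relationship between the two coding systems can be sketched as follows. The mapping assumes two detailed-scale points per competency descriptor, which is our inference from the paper's description of the revised 8-point scale rather than a published conversion rule.

```python
# Hypothetical collapse of detailed codes (0-8) into simple codes (0-4),
# assuming two detailed-scale points per descriptor (an inference from the
# paper's description of the revised scale, not a stated rule).
SIMPLE_LABELS = {0: "Not Observed", 1: "Beginner", 2: "Intermediate",
                 3: "Competent", 4: "Skilful"}

def detailed_to_simple(detailed):
    """Map a detailed code (0 = Not Observed, 1-8 = two points per
    descriptor) onto the corresponding simple code (0-4)."""
    if not 0 <= detailed <= 8:
        raise ValueError("detailed code must be in 0-8")
    return (detailed + 1) // 2

print([detailed_to_simple(d) for d in range(9)])
# → [0, 1, 1, 2, 2, 3, 3, 4, 4]
```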

Data analysis
For the Delphi study, descriptive statistics (i.e., frequency of responses at each level of the 5-point Likert scale) were calculated to determine whether expert consensus had been reached on each item and on the other PsyCET-F characteristics (appropriateness of the rating scale, item categories, etc.).
For the inter-rater reliability analyses, Cohen's weighted kappa (κW; Cohen, 1968) was calculated for both raters using simple and detailed coding for the pre-calibration ratings (i.e., the scores given by the two independent raters before meeting to discuss and resolve discrepancies), the post-calibration ratings (i.e., the scores given after meeting to discuss discrepancies), and the remaining PsyCET-F ratings. This was calculated using the irr package in R (Gamer et al., 2010). Where the variability of response categories selected by raters was low, Cohen's weighted kappa was supplemented by calculation of percent agreement (Gisev et al., 2013). Reduced response variability occurs when one score is frequently observed (e.g., in the simple coding system); the probability of chance agreement then becomes inflated, which decreases Cohen's kappa even when instances of agreement are high and purposeful (Zec et al., 2017). Therefore, percent agreement was considered a more appropriate index of inter-rater reliability in this case (Gisev et al., 2013).
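The two agreement indices can be sketched in plain Python as follows. The study itself used the irr package in R; this linearly weighted kappa and the rater scores below are illustrative only, and the sketch assumes each rater's scores show some variability so the expected-disagreement denominator is non-zero.

```python
from collections import Counter

def weighted_kappa(r1, r2, categories):
    """Cohen's linearly weighted kappa for two raters' ordinal scores.
    Disagreements are penalised in proportion to their distance on the scale."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = Counter(zip(r1, r2))           # observed joint counts
    m1, m2 = Counter(r1), Counter(r2)    # marginal counts per rater
    w = lambda a, b: abs(idx[a] - idx[b]) / (k - 1)   # linear weight
    observed = sum(w(a, b) * c for (a, b), c in obs.items())
    expected = sum(w(a, b) * m1[a] * m2[b] / n
                   for a in categories for b in categories)
    return 1 - observed / expected

def percent_agreement(r1, r2):
    """Proportion of items on which both raters gave the same score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

# Hypothetical simple-coding scores (0-4) from two raters on eight items
rater_a = [3, 3, 2, 3, 4, 3, 2, 3]
rater_b = [3, 3, 3, 3, 4, 2, 2, 3]
print(percent_agreement(rater_a, rater_b))  # → 0.75
print(round(weighted_kappa(rater_a, rater_b, [0, 1, 2, 3, 4]), 3))
```

With most scores clustered on one category, as here, chance agreement is high and kappa can sit well below the raw percent agreement — the motivation for reporting both indices.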
Internal consistency was also calculated using Cronbach's alpha on the detailed coding scores.
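Cronbach's alpha can likewise be computed directly from the item variances and the variance of the total scores. The data layout below (one row per checklist item, one column per rated session) and the values are hypothetical.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha. `item_scores` is a list of rows, one per checklist
    item, each holding that item's detailed-coding score for every session."""
    k = len(item_scores)                 # number of items
    n = len(item_scores[0])              # number of rated sessions
    def var(xs):                         # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    sum_item_vars = sum(var(row) for row in item_scores)
    totals = [sum(row[j] for row in item_scores) for j in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / var(totals))

# Two hypothetical items scored across three sessions
print(cronbach_alpha([[1, 2, 3], [2, 4, 6]]))  # → 0.888... (≈ 8/9)
```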

Results

Delphi survey
All 20 items met criteria for consensus after the first round, though some revisions were suggested. The final expert agreement on the relevance and clarity of the PsyCET-F items is summarised in Table 2. Minor modifications to the wording of 10 items were made to improve clarity, but these changes did not warrant further review in Round 2. Two items (3 and 4) were retained without modification or further panel review. Items 14 (showing positive regard) and 15 (maintaining a collaborative working alliance) were merged, as feedback indicated significant overlap between these competencies. The panel's feedback also led to the addition of a new item: "Briefly discusses how tests are scored and interpreted (i.e., compared to age-matched peers)".
Six items were substantially modified based on the panel's feedback (items 1, 2, 5, 10, 12, and 16). Elements of item 2 (agenda setting and client views) were integrated into item 1 (introducing the session), which was expanded to also cover reviewing the purpose of the assessment and the client's key concerns or issues. Hence, item 2 came to reflect the client's views on the assessment only: i.e., "Elicits client's view on how they performed during assessment". Examples of the biopsychosocial model were added to item 5 to support trainee understanding: i.e., "Discusses formulation appropriately, acknowledging relevant biological (e.g., brain changes), psychological (e.g., cognition, emotion), and social factors (e.g., social support, cultural context)". Item 10 was modified to reflect that NP feedback summaries may be provided in formats other than written (e.g., an audio summary). For item 12, panellists raised concerns about checking client understanding where language and comprehension are an issue; this was addressed by incorporating the caregiver's understanding into the item. Furthermore, "any issues raised" was added to ensure that client issues or concerns about test results raised at the beginning of the NP feedback session had been addressed. Item 16 was revised to better reflect the integration of information with the broader multidisciplinary team opinion, as opposed to consistency with the team opinion, acknowledging that feedback information may at times vary among team members. The final expert agreement on the appropriateness, usefulness, and ease of use of the PsyCET-F characteristics is detailed in Table 3. All four section subheadings (Opening the Session; Applying Specific Feedback Techniques; Engagement, Collaboration, and Alliance; and Structuring and Ending the Session) met consensus for usefulness and were retained without further review from the panel. The PsyCET-F overall met consensus for usefulness and ease of use, indicating strong agreement between panellists that it was a suitable tool for rating NP feedback competencies. Subsequent changes to the checklist were therefore based on qualitative suggestions about its individual components.
The three skill and error descriptors (i.e., Beginner, Intermediate, and Competent) did not meet consensus. However, the panellists did not provide qualitative feedback on the Beginner and Competent descriptors, so these were included unmodified in Round 2. For the Intermediate descriptor, feedback indicated that the wording of the error description "skill may be undermined by later errors" was unclear; this was removed from the Round 2 Intermediate error description. One suggestion recommended including an additional skill and error descriptor to reflect the skill level of early career/internship-level neuropsychologists. For Round 2, a new Expert descriptor was therefore included, indicating a level of skill reflective of a practicing clinician, with minor and very rare errors or gaps.
The 9-point rating scale did not meet criteria for consensus. Feedback indicated that it was difficult to distinguish between points within descriptors (e.g., between a 4 and a 6 within the Intermediate standard of competence); instead, it was suggested that two points per descriptor could overcome this. With the addition of the new Expert descriptor, a total of 8 points were included in the revised scale. Additional feedback indicated that certain items were not relevant for all NP feedback scenarios, and that the Not Applicable and Not Observed options were not presented clearly enough to reflect this. An additional explanation of when to use each response option was included in the Round 2 PsyCET-F to address this, and the Not Observed descriptor was assigned a score of 0. Finally, the Overall Remarks section did not meet consensus but was left unmodified due to a lack of feedback.

Round 2

Of the Round 1 panellists, 86.96% participated in Round 2. All of the new and revised items reviewed by the panel in Round 2 (n = 8) met the criterion for consensus. The final version of the checklist had 20 items. Additional minor adjustments were made to items 3, 11, 13, and 15 for succinctness and clarity of wording. All Round 2 checklist characteristics met consensus criteria for appropriateness, usefulness, and ease of use. Following panellist feedback, the competency descriptor Expert was changed to Skilful, as feedback suggested this better represented highly skilled registrars and acknowledged that all clinicians continue to develop with practice throughout their careers. Any further adjustments were minor and made only for clarification. Overall qualitative feedback on the PsyCET-F tool was positive. Some panellists indicated a keenness to begin using the PsyCET-F, commenting: "I feel very positive and excited about this new tool", "Great initiative", and "I think this is a very worthwhile endeavour".
The final version of the PsyCET-F can be found in Supplemental Material.

Inter-rater reliability
Inter-rater reliability statistics for both the simple and detailed scoring methods are presented in Table 4. For both coding systems, the pre-calibration kappas were non-significant, indicating no agreement between raters. Following the calibration meeting, ratings for the first NP feedback session showed a significant level of agreement between raters, in the weak range, for both coding systems. The scores for the remaining students then showed moderate and strong levels of agreement for the simple and detailed coding systems, respectively. As the variability of raters' scores was low for the simple coding system (thereby affecting Cohen's kappa calculations), percent agreement was also calculated: 57.89% pre-calibration, 73.68% post-calibration, and 80.52% for the remaining scores. Cronbach's alpha was 0.96, indicating very high internal consistency.

Discussion
Table 4. Inter-rater reliability of two raters using the PsyCET-F: final simple coding system and detailed coding system across the pre-calibration session for Student 1, the post-calibration session for Student 1, and the post-calibration sessions for all students. Note. Student 1 was rated across 20 items and on overall competency in both the pre- and post-calibration sessions. Two items were rated as Not Applicable by both raters and were not included in the inter-rater reliability analyses, leaving 19 ratings for each session.

Our goal was to develop a psychometrically sound checklist of NP feedback competencies to establish consistent standards for clinical practice, guide skill development in trainees, and measure NP feedback fidelity in research. Using the Delphi technique, expert consensus on the items, checklist structure, and rating scale was reached after two rounds, resulting in 20 items across four main checklist categories: (a) Opening the Session; (b) Applying Specific Feedback Techniques; (c) Engagement, Collaboration, and Alliance; and (d) Structuring and Ending the Session. Averaging across all PsyCET-F items, there was 97.57% agreement on the relevance of the tool to NP feedback skills and 93.80% agreement on the clarity of the items, indicating excellent content validity (Polit & Beck, 2006). Qualitative feedback on the tool was positive overall, and panellists indicated interest in using the tool in the future, particularly given the lack of pre-existing tools tailored specifically to NP feedback. The final inter-rater reliability of the checklist showed moderate and strong levels of agreement between raters using the simple and detailed coding systems, respectively, and internal consistency was very high. The PsyCET-F therefore appears fit for purpose as a measure of NP feedback competencies.
The inter-rater reliability of the PsyCET-F improved considerably following a calibration meeting between raters. Consequently, where more than one rater is using the PsyCET-F and consistency is important (e.g., practical examinations such as Objective Structured Clinical Examinations, or OSCEs; or ratings of treatment fidelity for research trials), benchmarking of rating standards should first occur on an initial feedback session. As recommended by Lombard et al. (2002) and Wong et al. (2019), a "criterion" rater, generally the more experienced rater, should be designated as the benchmark for the second rater. These recommendations have been included in the instructions on the first page of the PsyCET-F (see Supplemental Material).
There were 23 Delphi panellists in Round 1. This is a sufficient sample size (Hsu & Sandford, 2007) and a strength of our study. Participant attrition was low, with 20 (86.96%) of panellists participating in Round 2. As all items achieved consensus in Round 1, it is unlikely that participant attrition affected the final consensus outcomes.

Limitations
There were several limitations that should be considered. There was a high proportion of female panellists, with only 10% of the panel identifying as male. However, a gender imbalance was expected considering the high ratio of female-to-male psychologists registered with the Psychology Board of Australia (PsyBA, 2022; 80.39% identify as female, 19.58% as male, and 0.03% as not stated/intersex/indeterminate). Furthermore, there is no existing evidence of gender differences in feedback delivery, so it is unlikely that this was a significant source of bias.
A positive bias has been observed when supervisors rate student competencies (Dennhag et al., 2012; Wong et al., 2020), and this may have influenced the inter-rater reliability of the supervisors' ratings. In Wong et al. (2020), supervisors rated their supervisees' skills in cognitive behavioural therapy adapted for brain injury (CBT-ABI) higher than an independent, blinded rater did. Dennhag et al. (2012) similarly found that supervisors rated therapist adherence and competency in cognitive therapy and individual drug counselling more positively than an independent rater. Possible reasons for supervision bias include expectancy of positive supervision effects (Wong et al., 2020), feelings of positive affect or loyalty toward the trainee, and supervisors' greater contextual knowledge of the observed case leading to more accurate judgements of which skills are relevant to use with the client (Dennhag et al., 2012). Supervisors' prior knowledge of student competencies not directly observed in the rated session could also contribute to more positive ratings. A potential supervisor bias was observed in the pre-calibration session for Student 1: when using the detailed coding system, supervisor SP consistently gave Student 1 more positive ratings than independent rater DW. The discrepancy dropped substantially following the calibration session and was not observed when rating any other students' sessions. This highlights the importance of creating clear benchmarking standards and undertaking a calibration process. Additionally, one of the two raters (DW) co-developed the checklist and therefore may have had an internal representation of the checklist items that was not fully captured by the written descriptions. This points to the need for further evaluation of inter-rater reliability with new independent raters in future research.
There was somewhat limited variability in student competencies as all those rated were on their first clinical placement, and some of the students had not previously given NP feedback. This may have positively influenced inter-rater reliability. If there had been more rated sessions involving mid-range competency levels, inter-rater discrepancies may have been more frequent. Future studies should further assess the psychometric properties of the PsyCET-F with a wider range of competency levels.
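The weighted kappa statistic used to quantify inter-rater agreement penalises disagreements between ordinal codes in proportion to their distance, so a one-level discrepancy (e.g., Competent vs. Skilful) counts against agreement less than a three-level one. A minimal pure-Python sketch is below; the ratings are hypothetical (not data from this study), and linear weights are shown for illustration since the exact weighting scheme is not restated here.

```python
from collections import Counter

def weighted_kappa(r1, r2, k, weight="linear"):
    """Cohen's weighted kappa for two raters using ordinal codes 0..k-1."""
    n = len(r1)
    # Disagreement weight: |i-j| (linear) or (i-j)^2 (quadratic)
    w = (lambda i, j: abs(i - j)) if weight == "linear" else (lambda i, j: (i - j) ** 2)
    obs = Counter(zip(r1, r2))        # joint frequencies of rating pairs
    m1, m2 = Counter(r1), Counter(r2) # marginal frequencies per rater
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            num += w(i, j) * obs[(i, j)] / n               # observed weighted disagreement
            den += w(i, j) * (m1[i] / n) * (m2[j] / n)     # chance-expected disagreement
    return 1.0 - num / den

# Hypothetical ratings on the four-level simple coding scale:
# 0=Beginner, 1=Intermediate, 2=Competent, 3=Skilful
rater_a = [0, 1, 1, 2, 2, 3, 1, 2, 0, 3]
rater_b = [0, 1, 2, 2, 2, 3, 1, 1, 0, 3]
print(round(weighted_kappa(rater_a, rater_b, 4), 3))  # → 0.825
```

Identical rating vectors return 1.0; values near 0 indicate agreement no better than chance given each rater's marginal distribution.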

Future directions
While the PsyCET-F was developed in the Australian context, it is also intended to be used internationally. To facilitate this, we plan to conduct pilot or usability testing with clinical neuropsychologists and trainees in other countries, to establish the international relevance of the checklist. It is possible that usability testing could lead to edits and/or additions to the checklist; for example, it has been noted after completion of the Delphi study that the sequence in which information is presented can be important (Gorske & Smith, 2009), and this may not be explicit enough in the item pertaining to the session having a "coherent structure". Following usability testing, we then plan to i) evaluate inter-rater reliability on a larger international sample with independent raters, ii) conduct a factor analysis to establish factor structure and item/category validity, and iii) conduct a benchmarking exercise that would lead to internationally agreed vignettes or video snippets describing or demonstrating each competency at each level. These vignettes and videos could then be used to establish internationally consistent benchmarks of NP feedback competencies and included in a comprehensive PsyCET-F instruction manual and/or training program.
The checklist may also be relevant for other forms of psychological or cognitive assessment feedback; to assess this, it could be evaluated (and adapted if necessary) by a panel of psychologists from other areas of practice (e.g., clinical psychology; educational and developmental psychology). Additionally, it would be useful to determine if the PsyCET-F requires adaptation for feedback delivery with clients who may have conditions or needs that affect their response to feedback, such as clients who exhibit suboptimal effort (see Martin & Schroeder, 2021, 2022), impaired self-awareness or anosognosia, severe aphasia, or severe cognitive impairment affecting comprehension of verbal information. In these circumstances, it may be that the family or support network play a more central role as the recipients of feedback (Postal, 2019), similar to when the client is a child (Dolan, 2019; Kuentzel, 2022), and therefore that the emphasis of some PsyCET-F items should shift to focus more on the family's goals, views, and potential strategy implementation. Furthermore, the suitability of the PsyCET-F for feedback provision to clients from more collectivist cultural backgrounds, such as many First Nations communities, should be evaluated.
Given that one of the aims of the PsyCET-F is to guide training of feedback skills, it would be useful to evaluate whether educators, supervisors, trainees, and clinicians find that using the checklist facilitates acquisition of the competencies and/or confidence in delivering feedback. Anecdotally, we have found that neuropsychology trainees find it useful as a guide for skill development and to 'know what they are aiming for'; however, we have not yet conducted a formal evaluation of the impact of checklist use on competency development.
Another important step in this research will be to assess whether there is a relationship between NP feedback competencies and patient outcomes. Such studies could measure whether understanding of presenting concerns and how to manage them is more likely to improve when NP feedback is delivered with a higher level of competence. Other relevant NP feedback outcomes include patient and caregiver satisfaction, implementation of recommendations, mood, and quality of life (Donofrio et al., 1999; Longley et al., 2022; Westervelt et al., 2007). Future studies could also determine whether any specific NP feedback competencies have a greater effect on patient outcomes than others, or which competencies are more difficult to train. These findings could guide supervision practices by placing greater emphasis on skills which have more impact on patient outcomes and by guiding supervisors to provide more support when training skills that are more difficult to learn. The PsyCET-F could also be used to determine whether certain training methods are more effective for competency development. These studies could also include concurrent evaluation of neuropsychological assessment competencies using the PsyCET-A tool (Carrier et al., 2022).

Concluding remarks
We set out to develop an innovative and psychometrically sound tool for use in assessing competencies in giving NP feedback. Expert panellists considered the PsyCET-F to be a useful tool that they would be likely to use when supervising neuropsychology trainees. This tool can be used as a guide for training NP feedback competencies and to objectively measure competencies throughout and at the completion of training. This can help to identify competencies requiring additional attention and can ensure competence is met at the completion of training (Hatcher et al., 2013). The use of the PsyCET-F in this manner can improve clinician adherence to best practice procedures in NP feedback, which will ensure patients receive consistent, high-quality services regardless of their practitioner (Hessen et al., 2018). Following international benchmarking of the PsyCET-F competencies, the checklist can also be used to enhance international graduate consistency, allowing graduates to work and exchange ideas worldwide (Ponsford, 2016). It can also measure treatment fidelity in clinical trials evaluating feedback outcomes, which would enhance research rigour. Enhancing evidence-based practice in clinical neuropsychology requires clearly defined practice standards, which have the potential to improve the services we deliver for people living with neuropsychological conditions.

Table 1 .
Characteristics of the panel.

Table 2 .
Final expert agreement on the relevance and clarity of the PsyCET-F items.

Table 3 .
Final expert agreement on the appropriateness, usefulness, and ease of use of PsyCET-F characteristics. Consensus refers to expert agreement on either a 4 or 5 on the Likert scale.