Introducing an Interview-Based Cognitive Assessment Tool for People with Schizophrenia in Ethiopia

Assessment of cognitive impairment in people with schizophrenia (PWS) is limited in low- and middle-income countries due to a lack of context-appropriate measures. This study aimed to select, adapt, and evaluate an interview-based cognitive tool for PWS in Ethiopia. The study was carried out in three phases. In the first phase, we followed a rigorous instrument selection procedure to select a tool for adaptation. We then applied a rigorous instrument adaptation procedure, including interviews with 24 participants. Finally, we evaluated the psychometric properties of the adapted tool with 208 PWS and 208 matched controls. The Cognitive Assessment Interview was selected as the appropriate tool for adaptation. This tool is practical and tolerable, with a short administration time.


Introduction
Schizophrenia is a heterogeneous mental health condition, and cognitive impairment is one of its symptom dimensions, including problems with memory, attention, and executive function (Bowie and Harvey, 2005). It has been argued that schizophrenia could be primarily a cognitive illness, with psychotic symptoms being less fundamental (Kahn and Keefe, 2013). However, cognitive assessment is not conducted routinely, and cognitive difficulties are not often targeted by existing psychological and pharmacological interventions. Cognitive difficulties are an important phenomenon in people with schizophrenia (PWS), as they are associated with poor functional (Chang et al., 2016; Kalache et al., 2015; Kitchen et al., 2012; Tsoutsoulas et al., 2016) and recovery outcomes (de Bartolomeis et al., 2013; Frydecka et al., 2016).
Traditionally, cognitive impairment in PWS has been assessed with performance-based measures, but interview-based measures have recently been introduced (Chang et al., 2015; Keefe et al., 2006; Ventura et al., 2008). Each assessment method has its own advantages and disadvantages; studies have demonstrated that interview-based measures correlate with objective cognitive and functional measures (Keefe et al., 2006). They also offer additional benefits, such as a short duration of administration and requiring fewer resources (i.e., test materials and training). This makes them particularly suitable for resource-scarce settings.
The Measurement And Treatment Research to Improve Cognition in Schizophrenia (MATRICS) initiative (Marder and Fenton, 2004) recommended two interview-based measures as potential co-primary outcome measures for intervention studies of cognitive impairment in PWS (Green et al., 2008). These are the Schizophrenia Cognition Rating Scale (SCoRS) (Keefe et al., 2015; Keefe et al., 2006) and the Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS) (Ventura et al., 2008). Ventura et al. developed the Cognitive Assessment Interview (CAI) by combining these two tools and selecting the best-performing 10 items through a two-parameter logistic (2PL) Item Response Theory (IRT) model (Reise et al., 2011; Ventura et al., 2010). The CAI has been adapted and validated in Italy (Palumbo et al., 2019), Spain (Sánchez-Torres et al., 2016b), Indonesia (Ardiningrum et al., 2019), and Turkey (Bosgelmez et al., 2015), confirming its applicability in cross-cultural settings. However, this tool has not been adapted or validated in low-income settings.
Cognitive assessment in PWS is overlooked in Ethiopia. In routine clinical visits, clinicians predominantly assess positive symptoms. One of the reasons for this is the lack of an appropriate assessment tool. In low-income countries, such as Ethiopia, adapting an interview-based cognitive assessment measure would be beneficial because this method requires fewer resources than traditional performance-based cognitive tests. In addition, in low-income countries the patient-to-psychiatrist ratio tends to be very high, and a simple interview-based assessment tool could be administered by most clinicians with minimal training and a short administration time.
This study fills this gap by introducing an adapted interview-based tool for assessing cognition. The study has significance for both patients and clinicians, as it would help to characterize cognitive difficulties associated with PWS and to consider treatment options and management strategies. We used rigorous instrument selection and adaptation procedures involving experts, clinicians, and PWS. This study had three objectives: i) to select an appropriate interview-based tool for adaptation, ii) to adapt the selected tool, and iii) to conduct a preliminary psychometric evaluation of the adapted tool.

Phase I: Measure selection
In this study, we followed the recommendations by Prinsen et al. (Prinsen et al., 2016) to select the best test for adaptation.
1) Identifying domains of the construct to be measured. We conducted an umbrella review of the most affected domains of cognition in PWS (Gebreegziabhere et al., 2022) and identified five cognitive domains.
2) Exploring the existing instruments through literature or systematic review. We systematically reviewed cognitive measures validated in low- and middle-income countries (LMICs) (Haile et al., 2021). This process identified three interview-based tools, and using pre-determined criteria, the CAI (Ventura et al., 2010) ranked first, followed by the SCoRS (Keefe et al., 2006) and the Self-Assessment Scale of Cognitive Complaints in Schizophrenia (SASCCS) (Johnson et al., 2009).
3) Evaluating instrument quality. We evaluated the quality of each study using the Consensus-based Standards for the selection of health Measurement INstruments (COSMIN) criteria (Mokkink et al., 2018) and the quality of each measure using criteria for good measurement properties (Terwee et al., 2007) (Supplementary material 1).
4) Selecting the appropriate instrument, including recommendations of experts. We conducted two expert consensus meetings in February 2021 to select the best tool for adaptation, involving junior and senior researchers and clinicians with varying years of experience working with PWS in Ethiopia. We used the COSMIN feasibility checklist (Mokkink et al., 2018) in the expert meetings to rate each tool (Supplementary material 2).

Phase II: The adaptation process
Once the tool to be adapted was selected, a formal adaptation of the identified tool was conducted following standardized guidelines (Beaton et al., 2000;Sousa and Rojjanasrirat, 2011;Wild et al., 2005).

The translation process
Forward translation into Amharic was done independently by five translators (Amharic is the official and most widely spoken language in Ethiopia). Three of the translators were familiar with the concepts pertaining to mental health and cognitive status, and the others were language experts.
The translators discussed and agreed on one forward-translated version. Back translation was done independently by four other translators, who were blind to the English version. Two were familiar with the concept but had no previous experience with the selected tool, and the other two were language experts. This group of translators also discussed and agreed on one backward-translated version.
The forward-translated version, the backward-translated version, and the original tool were compared by a panel of nine experts from different disciplines. The panel agreed on one Amharic version of the tool.

Cognitive interview (pre-testing)
The objective of this phase was to check the extent to which the translated version was easy to understand for the target population. For this purpose, we interviewed purposively selected PWS (n=16) and their caregivers (n=8) from the outpatient department of Amanuel Mental Specialized Hospital (AMSH) in November and December 2021. We recruited participants with different characteristics to capture potential variations in sex, age, residence, duration of illness, and educational background.
The first author conducted the interviews in Amharic. A semi-structured interview guide was developed for this purpose. The topic guide assessed the participant's understanding of each item, the ease of the response categories, the number of times an item had to be repeated, and the structure of the items in general. Participants were also asked to rate each item as "clear" or "unclear." Respondents who rated an item as "unclear" were asked to suggest alternative ways to improve it, and their suggestions were used to modify the item. Items rated "unclear" by over 20% of participants were modified (Topf, 1986).
The interviews lasted between 27 and 57 minutes (average 39 minutes). All the interviews were audio recorded, and notes were taken. The recordings were transcribed, translated into English, coded, and analyzed in Microsoft Excel. Thematic analysis was used to highlight significant issues that emerged from the data.
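As an illustration (not part of the study's actual analysis), the 20% "unclear" modification rule described above can be operationalized in a few lines of Python. The item names and ratings below are hypothetical:

```python
# Flag items rated "unclear" by more than 20% of participants (Topf, 1986).
# Hypothetical data for illustration only.

def items_to_modify(ratings, threshold=0.20):
    """ratings: dict mapping item name -> list of 'clear'/'unclear' labels."""
    flagged = []
    for item, labels in ratings.items():
        unclear_share = labels.count("unclear") / len(labels)
        if unclear_share > threshold:
            flagged.append(item)
    return flagged

# 24 participants (16 PWS + 8 caregivers), hypothetical ratings
ratings = {
    "item_1": ["clear"] * 22 + ["unclear"] * 2,   # ~8% unclear -> keep
    "item_6": ["clear"] * 18 + ["unclear"] * 6,   # 25% unclear -> modify
}
print(items_to_modify(ratings))  # ['item_6']
```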

Experts' consensus meeting
An expert panel was convened to improve the tool further and resolve issues identified in the cognitive interviews. This panel included ten experts from different mental health disciplines (the mother tongue of all experts was Amharic, except one whose mother tongue was English). The panel reached a consensus and produced an Amharic version of the tool.

Study area, population, and period
Participants for this phase of the study were recruited from among people taking part in the Neuropsychiatric Genetics of African Populations - Psychosis (NeuroGAP-Psychosis) study at AMSH and Zewditu Memorial Hospital (ZMH) in Addis Ababa, Ethiopia. NeuroGAP-Psychosis is a multi-center study exploring genetic risk factors for schizophrenia and bipolar disorder (Stevenson et al., 2019). This study aimed to recruit at least 200 PWS and 200 matched controls.
We also planned to conduct repeated assessments in 50 PWS one month apart. The study was conducted from 29 March to 5 August 2022.

Eligibility criteria
The inclusion criterion for cases was a diagnosis of schizophrenia as ascertained by the NeuroGAP-Psychosis project. Those younger than 18 or older than 65, not fluent in Amharic, unable to read letters and/or numbers, or with communication difficulties or comorbidity were excluded. For controls, the absence of known past and current mental illness, as ascertained by the NeuroGAP-Psychosis project, was an additional requirement.

Measures
After collecting socio-demographic data and clinical characteristics, we administered the selected interview-based tool (i.e., CAI) followed by a performance-based cognitive battery.
The CAI is a ten-item semi-structured interview-based tool assessing six cognitive domains (Reise et al., 2011; Ventura et al., 2010). In addition, the tool has one item called the Clinical Global Impression of Cognition in Schizophrenia. In the CAI, information is collected from the patient, an informant, and the clinician. Each item is scored on a seven-point Likert scale, where a higher score reflects worse cognitive function. The CAI also has a Global Assessment of Functioning (GAF)-Cognition in Schizophrenia section to be scored from 0-100, parallel to the scoring of the GAF, where a higher score reflects better cognitive function.
We used the Ethiopian version of Cognitive Assessment battery in Schizophrenia (ECAS) as a performance-based cognitive tool to test concurrent validity (Gebreegziabhere et al., 2023).
ECAS has seven tests to assess six domains of cognition.The details of the tests have been reported elsewhere (Gebreegziabhere et al., 2023).
For tolerability (how easy the tool was for the participants), we used a five-point Likert scale item where "1" is "very boring" and "5" is "very engaging." For practicality (how easy the tool was to administer), we also used a five-point Likert scale where "1" is "very difficult to apply" and "5" is "very easy to apply." There was an open-ended follow-up question for suggestions for improvement.

Data Management and analysis
We checked each questionnaire for completeness and appropriateness of responses. We coded and double-entered the data into EpiData version 4.6.0.6. Stata version 17 and MedCalc statistical software were used for analysis.
We evaluated the normality of the data through visual examination of histograms and analysis of the kurtosis and skewness of the items (we considered normality to be satisfied when univariate skewness was between -2 and +2 and univariate kurtosis was between -10 and +10) (Collier, 2020). We used an independent-samples t-test to determine whether differences existed between PWS and controls for continuous variables and a chi-square test for categorical variables. The Mann-Whitney U-test was used for variables that did not fulfill the normality assumption.
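The skewness/kurtosis screening rule can be sketched as follows. This is a minimal Python illustration of the Collier (2020) thresholds, computing moment-based skewness and excess kurtosis directly; it is not the study's Stata code:

```python
import numpy as np

def passes_normality(x, skew_bound=2.0, kurt_bound=10.0):
    """Collier (2020) rule of thumb: univariate skewness within +/-2 and
    excess kurtosis within +/-10 treated as acceptably normal."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # standardize
    skewness = np.mean(z ** 3)            # third standardized moment
    excess_kurtosis = np.mean(z ** 4) - 3.0
    return bool(abs(skewness) <= skew_bound and abs(excess_kurtosis) <= kurt_bound)
```

For example, a symmetric score distribution passes, while a distribution with one extreme outlier among many identical values fails on skewness.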
We conducted descriptive statistics and item-level analysis, setting a priori criteria of item-total correlations r > 0.3 and item-item correlations r < 0.9. Internal consistency was calculated using Cronbach's alpha (α); an α value of 0.7 was considered acceptable (Streiner et al., 2015). A floor effect was considered present when over 15% of the participants achieved the minimum score. Similarly, a ceiling effect was considered present when over 15% of the participants achieved the maximum score.
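As a generic illustration of the internal-consistency and floor/ceiling criteria (not the study's analysis script), the following Python sketch computes Cronbach's α from an item-score matrix and applies the 15% floor/ceiling rule:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an (n_participants, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the total score
    return (k / (k - 1.0)) * (1.0 - item_vars.sum() / total_var)

def floor_ceiling(total_scores, min_score, max_score, threshold=0.15):
    """Flag floor/ceiling effects: > 15% of participants at an extreme score."""
    s = np.asarray(total_scores)
    return (s == min_score).mean() > threshold, (s == max_score).mean() > threshold
```

Perfectly inter-correlated items yield α = 1; a sample where most respondents sit at the minimum score triggers the floor flag.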
We used a two-way mixed effects Intra-Class Correlation Coefficient (ICC 3, k) to determine Inter-Rater Reliability (IRR) and test-retest reliability (Koo and Li, 2016). The outcomes were classified based on the recommendations of Koo and Li (Koo and Li, 2016).
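ICC(3,k) can be computed from a two-way ANOVA decomposition. The sketch below is a generic Python implementation of the average-measures, consistency form described by Koo and Li (2016); it is illustrative, not the software actually used in this study:

```python
import numpy as np

def icc_3k(data):
    """ICC(3,k): two-way mixed effects, average of k raters, consistency.
    data: (n_subjects, k_raters) array of ratings."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)                       # per-subject means
    col_means = data.mean(axis=0)                       # per-rater means
    ss_rows = k * ((row_means - grand) ** 2).sum()      # between-subjects SS
    ss_cols = n * ((col_means - grand) ** 2).sum()      # between-raters SS
    ss_error = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows
```

Raters who agree up to a constant offset still achieve ICC(3,k) = 1, because this form measures consistency rather than absolute agreement.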
We used the mean difference of dependent samples (paired-samples t-test) to quantify changes between the first and second assessments to examine practice effects. We used an independent-samples t-test to determine whether there was a significant difference between the clinical sample and the controls as a test of known-group validity. We also computed Cohen's d (effect size), and the outcomes were classified as small, medium, or large based on the recommendations of Cohen (Cohen, 2013).
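The two effect sizes can be sketched as follows (a minimal Python illustration; note that standardizing the paired effect by the SD of the difference scores is one common convention, and other conventions exist):

```python
import numpy as np

def cohens_d_independent(x, y):
    """Cohen's d with pooled SD (roughly: 0.2 small, 0.5 medium, 0.8 large)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def cohens_d_paired(first, second):
    """Standardized mean change between repeated assessments, using the SD
    of the difference scores (one convention for practice-effect analyses)."""
    diff = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    return diff.mean() / diff.std(ddof=1)
```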
We conducted Exploratory Factor Analysis (EFA) to examine the tool's structural validity among the clinical sample. We confirmed that the correlation matrix was factorable using Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (MSA) (Hair et al., 2014). We then fitted the EFA based on Costello and Osborne's recommendations (Costello and Osborne, 2005). We used common factor analysis with maximum likelihood extraction and oblique (promax) rotation. We decided the number of factors to extract using a combination of the latent root criterion, parallel analysis, and the percentage of variance explained by the factors. Furthermore, a factor was retained only if it had at least three non-cross-loading items with an acceptable loading score (Samuels, 2017). We established a priori criteria for determining factor loading adequacy, including a practical significance level of 0.3 and a statistical significance level dependent on the sample size (a factor loading of ≥ 0.4 for this study) (Hair et al., 2014). We also considered the following points in deciding the final pool of items per factor: each factor should have a minimum of three significant loadings, internal consistency reliability of ≥ 0.70, a communality of > 0.5 for each item, and theoretical meaningfulness. We used the Spearman correlation coefficient ρ (rho) to evaluate concurrent validity through correlations between the items of the selected tool and the performance-based battery tests.
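The two factorability checks have standard closed forms. The generic Python sketch below (not the study's code) computes Bartlett's sphericity statistic, to be compared against a χ² critical value with p(p-1)/2 degrees of freedom, and the overall KMO measure from the anti-image partial correlations:

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test statistic that the correlation matrix is an identity.
    Compare against a chi-square critical value with p(p-1)/2 df."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(corr))
    df = p * (p - 1) // 2
    return stat, df

def kmo(data):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (overall)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                 # anti-image partial correlations
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    return (corr ** 2).sum() / ((corr ** 2).sum() + (partial ** 2).sum())
```

With strongly inter-correlated simulated items, Bartlett's statistic far exceeds the χ² critical value and KMO approaches 0.9, mirroring the pattern reported in the results.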
We examined the ability of the items to discriminate PWS from controls using Receiver Operating Characteristic (ROC)-curve analysis (Akobeng, 2007). We used all PWS in the study as "cases" and controls as "non-cases." We used the Youden index (J) to set the cut-off point (Carter et al., 2016; Perkins and Schisterman, 2006). The overall ability of the tests to discriminate between the outcomes was evaluated through the Area Under the Curve (AUC) (Carter et al., 2016). An AUC of > 0.7 is considered "acceptable" (Carter et al., 2016).
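ROC analysis with a Youden-optimal cut-off can be sketched in a few lines. This generic Python illustration computes the AUC via the rank (Mann-Whitney) formulation and scans candidate cut-offs for the maximum J = sensitivity + specificity - 1:

```python
import numpy as np

def roc_youden(scores, labels):
    """ROC analysis sketch. scores: higher = more impaired;
    labels: 1 for cases (PWS), 0 for controls ('non-cases')."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    best_j, best_cut = -1.0, None
    for t in np.unique(scores):
        pred = scores > t                  # classified "case" if above cut-off
        sens = (pred & (labels == 1)).sum() / (labels == 1).sum()
        spec = (~pred & (labels == 0)).sum() / (labels == 0).sum()
        j = sens + spec - 1                # Youden index
        if j > best_j:
            best_j, best_cut = j, t
    # AUC via the Mann-Whitney formulation (ties count half)
    pos, neg = scores[labels == 1], scores[labels == 0]
    auc = ((pos[:, None] > neg[None, :]).sum()
           + 0.5 * (pos[:, None] == neg[None, :]).sum()) / (len(pos) * len(neg))
    return auc, best_cut, best_j
```

The "> cut-off" convention matches the "> 13" cut-off reported for the CAI-A total score in the results.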
Finally, we conducted an IRT analysis to determine where on the latent trait the tool functions best and to identify the difficulty and discrimination parameters. The three assumptions of IRT model analysis, i.e., unidimensionality, local independence, and monotonicity, were checked (Hambleton et al., 1991). We selected the Graded Response IRT Model (GRM) (Hays et al., 2000) and reported the item parameters (difficulty and discrimination), Boundary Characteristic Curves (BCC), Category Characteristic Curves (CCC), the Test Characteristic Curve (TCC), Item Information Functions (IIF), and the Test Information Function (TIF).
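Under the GRM, each item's category probabilities are differences of adjacent boundary ("X ≥ k") logistic curves. The following minimal Python sketch uses illustrative parameter values, not the fitted CAI-A estimates:

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Graded Response Model category probabilities for one item.
    a: discrimination; b: ascending difficulty thresholds (K-1 of them).
    Boundary curves: P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k)))."""
    b = np.asarray(b, dtype=float)
    boundary = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k), k = 1..K-1
    upper = np.concatenate(([1.0], boundary))           # P(X >= 0) = 1
    lower = np.concatenate((boundary, [0.0]))           # P(X >= K) = 0
    return upper - lower                                # per-category probabilities
```

With three response categories (as in the adapted CAI-A), two thresholds suffice; the middle category is most likely when the latent trait sits between them.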

Phase I: Measure selection
We conducted two expert meetings with a total of 18 experts, including psychiatrists (n=6), mental health professionals (n=5), clinical psychologists (n=2), public health professionals (n=2), a neurologist, a psychometrician, and a creative artist. The expert group recommended the CAI for adaptation. We then conducted in-depth interviews with purposively selected PWS (n=16) and their caregivers (n=8). A summary of the cognitive interview findings is presented in Supplementary Material 3.

Phase II: The adaptation process
Following the cognitive interviews, the experts made adaptations, i.e., having only one section to be scored by the rater based on all the available information. They also decided to recategorize the response categories from seven into three, where "1" is "no impairment," "2" is "mild to moderate impairment," and "3" is "severe impairment." Experts also suggested removing the GAF-Cognition in Schizophrenia section. Finally, changes to the probes and main statements of the items were also made. This procedure yielded the third draft of the CAI-A. A summary of the experts' meeting findings is presented in Supplementary Material 3.

Socio-demographic and clinical characteristics
We recruited 416 participants (208 PWS and 208 matched controls). Additionally, we randomly selected 48 PWS for a repeated assessment conducted four to eight weeks apart (average of six weeks). There was no significant difference between PWS and controls in most of the demographic and clinical characteristics, including age, sex, and years of education (p-value > 0.05) (Table 1).

Inter-Rater Reliability (IRR)
The ICC values ranged from 0.66 to 0.89 in group one and 0.69 to 0.88 in group two, suggesting good to excellent IRR for most items. However, in group one, for items 6, 10, and the global impression (item 11), the ICC coefficient was between 0.15 and 0.44; in group two, the ICC coefficient for item 9 was 0.52. These values are considered poor to moderate. Since the IRR of these items was below the accepted cut-offs, we provided additional training focusing on these items to interviewers (before the actual data collection started) (Table 2).

Test-retest reliability and practice effect
Items of the CAI-A and the total score at the first assessment were significantly correlated with the second assessment, except for item 6 (p-value > 0.05). The ICC of the CAI-A items was moderate, ranging from 0.49 (item 10) to 0.71 (item 11), except for items 6 and 7. The ICC of the CAI-A total score was high (ICC = 0.79) (Table 2).
There was no significant change in the mean score of most items between the two assessments, suggesting no practice effect, except for items 4, 6, and 7. However, the effect size for these items was medium (Cohen's d between 0.46 and 0.58). The mean total score for the tool dropped from 15.29 in the first assessment to 13.92 in the second assessment (p-value < 0.001, d = 0.46) (Table 2).

Item level analysis
Among PWS (n=208), all the items had an item-total correlation of ≥ 0.57 (Table 3). Similarly, the item-item correlation of all the items was ≤ 0.68, which is in the acceptable range. All ten items of the CAI-A were significantly but weakly to moderately correlated with one another (ρ = 0.16-0.58, p-value < 0.001). Only two pairs of item-item correlations were < 0.3 (item 2 and item 7 (ρ = 0.2), and item 2 and item 8 (ρ = 0.2)). The correlation between each item and the global impression section (i.e., item 11) was significant and moderate (ρ = 0.37-0.68, p-value < 0.001). Similarly, the correlation between each item and the total score was good, with significant and moderate to strong correlations (ρ = 0.55-0.86, p-value < 0.001) (Supplementary material 4).
There was no ceiling effect, but the minimum score was endorsed by over 15% of participants on all items (suggesting a floor effect). The highest floor effect was seen for item 6, for which 78.85% (n=164) of the participants endorsed the minimum score. The tool as a whole (a scale based on the sum of the 11 items) had a lower floor effect, with only 21.15% (n=44) endorsing the minimum score, and no ceiling effect. The mean score of most items was around the second response category (i.e., mild cognitive impairment) (Table 3).

Practicality and tolerability
CAI-A required a mean administration time of 10.84 ± 3.85 minutes in PWS and 9.68 ± 4.52 minutes in controls (p-value = 0.005) during the initial assessment, and 9.38 ± 4.47 minutes in PWS during the second assessment (p-value = 0.016).
Overall, the tool was rated as practical and tolerable. Of all the administrators (n=8), only one reported the tool as challenging to administer and score. Through open-ended questions, the administrators recommended changes on the following points. First, there is some similarity in the probing questions, for example, the probing questions for items 1, 2, 3, and 4. Second, the probing questions for items 6, 7, and 8 are so easy that they may not allow differentiation of participants with different severity levels. They suggested removing those probing questions and adding equivalent ones or modifying the existing ones.
There was a significant difference between PWS and controls in rating the test as tolerable.
While only 5.29% (n=11) of controls rated the tool below the "neither boring nor engaging" category, 18.75% (n=39) of PWS rated the tool below this category. However, 70.91% (n=295) of the total sample reported the tool as "interesting to respond to" (Figure 1).

Structural validity
Bartlett's test of sphericity and the KMO MSA confirmed the factorability of the tool (i.e., Bartlett's test of sphericity p < 0.01 and KMO MSA = 0.909). A common factor analysis with a maximum likelihood extraction method and oblique rotation yielded two factors with an eigenvalue greater than one, explaining 77.01% of the total variance (47.07% by the first factor and 29.94% by the second factor). Horn's parallel analysis also suggested a two-factor solution. However, theoretically, we expected one latent factor of cognition in PWS. Therefore, we examined the two- and one-factor solutions sequentially.
The two-factor solution was not found appropriate because items 4, 9, and 10 did not load onto any of the factors at the 0.4 statistical significance loading level. We changed the statistical significance level to the practical significance level (0.3), where item 4 showed cross-loading. Then, we calculated Cronbach's alpha for all the items and for the items in each factor separately. We found that α for all the items was higher than α for factors one and two (0.88 for the whole tool, 0.86 for factor one, and 0.72 for factor two). We also noticed that α for the second factor was near the lower bound of the acceptable α level (0.7). Therefore, we specified a one-factor structure and performed the analysis again.
In a one-factor solution, we found that each item significantly loaded at the theoretical and statistical significance levels, explaining 41.28% of the variance (suggesting a dominant factor).
Except for items 3, 5, and 11, all the items had a communality of less than 0.5. Changing the rotation and extraction methods did not increase the communalities; we included all items for theoretical reasons. The α for the one-factor solution was 0.88, and given these results, we decided that the one-factor solution is an adequate structure for this tool.

Concurrent validity
We found a significant but weak correlation between items of the CAI-A and tests in the performance-based battery designed to assess similar domains (p-value < 0.005). However, we did not find any significant correlation between the items designed to assess the attention and vigilance and reasoning and problem-solving domains and any of the performance-based tests or the composite score (p-value > 0.05) (Supplementary Material 5).
Overall, the clinical global impression item and the total score of the CAI-A had significant but weak correlations with the performance-based battery composite score (ρ = -0.23 and -0.25, respectively, p-value < 0.001). Furthermore, the clinical global impression item and the total score of the CAI-A were significantly and weakly correlated with each of the performance-based tests, except for the Corsi Block Tapping Task (CBTT) (a test of visual memory) and the Animal Naming Test (ANT) (a test of verbal fluency).

Known group and criterion validity
Compared to matched controls, PWS had significantly higher mean scores, with medium to large effect-size differences, on each item and the total score (d ≥ 0.48, p < 0.001), except for item 8 (d = 0.37). The global impression section and the total score showed a very large difference between the two groups (d = 1.06 and 1.00, respectively, p-value < 0.001).
Regarding domain-specific impairment, among PWS the highest impairment was observed in the domain of working memory (average mean score = 1.49), and the lowest impairment was in the domain of speed of processing (mean score = 1.29). In controls, the domains of working memory and attention were most impaired (average mean score = 1.14), and the social cognition domain was least impaired (mean score = 1.04) (Table 4).

Item Response Theory (IRT)
There was no item with a discrimination value of > 4, implying fulfillment of the local independence assumption. The discrimination ability of most items was within the recommended level of 0.8 to 2.5. Regarding the difficulty parameter, the point of 50% probability of endorsing higher than the second category ranged between 0.41 and 1.43 across items (Supplementary Material 6).
The CCC and TCC showed that the tool functions better in participants with higher latent trait levels (i.e., the severely impaired group) (Figure 3). The IIF and TIF were shifted to the right, suggesting that the tool captures more information from participants with higher latent trait levels (severely impaired group). The standard error was higher among participants with lower latent trait levels, suggesting that the tool does not function well in this group (Figure 4).
[Figures 3 and 4 here]
We checked that the model we chose (i.e., the GRM) fits the data better than a more restrictive model (i.e., the Partial Credit Model (PCM)) using a log-likelihood ratio test and comparison of Akaike's Information Criterion (AIC). We found that the chi-square test for the log-likelihood difference was significant (p-value = 0.003), and the AIC was lower for the GRM (i.e., 3926.82 for the PCM and 3919.72 for the GRM). Therefore, we rejected the null hypothesis and concluded that the GRM better fits the data. The detailed results of the IRT analysis are provided in Supplementary material 6.
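The model comparison combines a likelihood-ratio test with AIC. The generic Python sketch below uses illustrative numbers (not the study's log-likelihoods); the closed-form χ² survival function shown is valid only for an even number of degrees of freedom:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form for EVEN df only
    (equivalent to a Poisson upper tail; sufficient for this sketch)."""
    assert df % 2 == 0 and df > 0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

def lr_test(loglik_restricted, loglik_full, df_diff):
    """Likelihood-ratio test between nested models (e.g., PCM vs. GRM)."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf_even_df(stat, df_diff)

def aic(loglik, n_params):
    """Akaike's Information Criterion: lower is better."""
    return 2.0 * n_params - 2.0 * loglik
```

A significant LR statistic and a lower AIC for the less restrictive model jointly support choosing the GRM over the PCM, as in the results above.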
Following this phase, we made changes to items that showed poor properties; in particular, we made more changes to the probing examples of items 6, 7, and 8. This procedure yielded the fourth draft of the CAI-A, which will be used in the upcoming validation study. Details of the changes we made to each item following the pilot testing are presented in Supplementary material 7.

Discussion
Cognitive impairment is one of the core symptoms in PWS; however, its assessment and treatment are often unavailable. In LMICs, there are very few validated measures, partly due to the limited attention given to cognitive difficulties, a shortage of resources, and the training-intensive nature of cognitive testing. This study has tried to address this issue by selecting and adapting an interview-based cognitive measure in Ethiopia.
Following a rigorous procedure, we selected the CAI as an appropriate tool for adaptation in the Ethiopian context. The adaptation was primarily based on participants' suggestions to reduce the response categories, and we decided to have only one rating score based on all available sources. This was regarded as the most important feedback to consider in the adaptation process. The changes were supported by previous studies, which reported a low correlation between informants' and patients' ratings, no improvement in the rater score from having additional information from informants, and valid results from the patient score alone (Sánchez-Torres et al., 2016b; Ventura et al., 2010; Ventura et al., 2013; Ventura et al., 2016). Previous studies also suggested minor adaptations of the content and structure of the tool (Gonzalez et al., 2013).
The CAI-A was found to be practical and tolerable and has a short administration and scoring time. The CAI-A took an average of about 11 and 10 minutes to administer and score among PWS and controls, respectively, which is shorter than in the original study (Ventura et al., 2013) and the Turkish adaptation (Bosgelmez et al., 2015), which took about 16 and 19 minutes in patients, respectively. A possible reason may be that the changes made the CAI-A more tolerable and practical. This finding implies that the tool can easily be integrated into routine services, particularly in resource-limited settings. A short and acceptable tool is particularly important in the context of the current study.
The CAI-A is a reliable measure with good IRR for most items, except for items 6, 10, and the global impression section. As a result, we adapted the probes, especially for item 6. A comparable result was found in a previous adaptation study from Italy, where the IRR for item 6 was the second lowest of the ten items (Palumbo et al., 2019). Providing focused training on these items may be necessary to improve the IRR. The Cronbach's alpha coefficient of the CAI-A was 0.88, suggesting excellent internal consistency, in line with previous studies, which reported α values between 0.87 and 0.91 (Bosgelmez et al., 2015; Palumbo et al., 2019; Sánchez-Torres et al., 2016b; Ventura et al., 2010; Ventura et al., 2013).
Item-total and item-item correlations were acceptable for most items, and there was no ceiling effect. However, we found high floor effects for all items. Possible reasons might be that the CAI-A has only three response categories and that most of the participants in this study (70%) were in remission. Based on the sum of the 11 items, the tool has a much lower floor effect (21% endorsed the lowest category).
Test-retest reliability for most items was within the acceptable range, with an ICC value for the total score similar to the original CAI validation study (ICC for the total score = 0.79) (Ventura et al., 2013). This suggests that it would be appropriate to use this tool in interventional and/or longitudinal studies or in clinical practice where repeated assessment is necessary.
The total score of the CAI-A was found to have a weak but significant correlation with the composite score of the objective measure of cognition used (Spearman's rho = -0.25, p-value < 0.001). Similarly, previous studies reported significant but weak to moderate correlations with different objective measures of cognition (r = -0.28 to -0.54) (Sánchez-Torres et al., 2016a; Sánchez-Torres et al., 2016b; Ventura et al., 2010; Ventura et al., 2013; Ventura et al., 2016).
Discrepancies between objectively and subjectively measured cognitive function are widely reported, and differences are thought to be influenced by affective components and insight (Cella et al., 2014).It may be relevant to assess these aspects alongside cognition and consider how, for instance, low self-esteem and poor insight might influence cognitive difficulties, perception, and description.
We found that all items significantly differentiated PWS from matched controls, with the largest effect sizes for the global impression section and the total score (d = 1.06 and 1.00, respectively). The tool had moderate sensitivity and good specificity at a cut-off value of > 13 (0.62 and 0.82, respectively). Since the specificity (true negative rate) is higher than the sensitivity (true positive rate), the tool may be more appropriate for studies requiring high specificity. This also suggests that the tool may be helpful as a screening/preliminary assessment instrument in studies aiming to conduct more in-depth assessment of those likely to have the condition. Our findings align well with previous studies reporting that the CAI can reliably differentiate PWS from controls on cognitive impairment (Sánchez-Torres et al., 2016a; Sánchez-Torres et al., 2016b; Ventura et al., 2016).
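The operating characteristics at the reported cut-off follow directly from the classification rule "total score > 13 = case". A minimal sketch with illustrative labels and scores (not study data):

```python
import numpy as np

def sens_spec(scores, is_case, cutoff):
    """Sensitivity and specificity when scores strictly above `cutoff` are called cases."""
    scores = np.asarray(scores)
    is_case = np.asarray(is_case, dtype=bool)
    predicted = scores > cutoff                        # the paper's rule: total > 13
    sensitivity = float(np.mean(predicted[is_case]))   # true positives / all cases
    specificity = float(np.mean(~predicted[~is_case])) # true negatives / all controls
    return sensitivity, specificity

# Illustrative totals and case/control labels (not study data)
scores  = [20, 15, 12, 25, 13, 14, 11, 12, 16, 13]
is_case = [1,  1,  1,  1,  1,  0,  0,  0,  0,  0]
sens, spec = sens_spec(scores, is_case, cutoff=13)
```

Sweeping `cutoff` over the score range and plotting sensitivity against 1 - specificity yields the ROC curve shown in Figure 2.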
The IRT-based analyses showed results comparable to the CTT-based findings. According to the IRT analyses, the tool performs better among participants with moderate to severe impairment, unlike the original study, where the items were thought to give more information among mildly to moderately impaired participants (Reise et al., 2011). The original study involved more chronic participants, while about two-thirds of our sample were in remission. In addition, we collapsed the response categories from seven to three during adaptation. Even though IRT parameter estimates are, in principle, not item- or person-dependent, future studies may help clarify this difference.
Compared to previous studies, the current study has several strengths, including a larger sample and the use of advanced statistical methods. We also matched our controls to the clinical sample for age, sex, and years of education. Another strength is that we involved experts at different stages. Furthermore, we followed strict tool selection and adaptation procedures guided by standard guidelines (Beaton et al., 2000; Prinsen et al., 2016; Sousa and Rojjanasrirat, 2011; Wild et al., 2005).
The following limitations should be considered when interpreting the findings. This study recruited participants from hospital attendees, so the findings may not apply to all PWS in the community. We did not triangulate the information with attendants; as a result, the findings might be influenced by the participants' level of insight (since the tool is an interview-based measure). Language is crucial in cognitive assessment; participants in this study were only Amharic speakers (Amharic is the official language of Ethiopia). However, multiple languages are spoken in Ethiopia, so further translations may be needed for broader use. Lastly, the cut-off score we reported reflects the tool's ability to differentiate cases from controls based on diagnosis, not necessarily based on cognitive impairment.
Our results support using the CAI-A as a reliable, valid, and pragmatic tool to estimate cognitive difficulties in PWS. The tool has multiple advantages in LAMICs, including short administration time, low resource use, and limited reliance on participants' education. The tool's training scalability and implementation potential may make it a first choice for clinicians and researchers interested in evaluating cognitive difficulties in PWS in Ethiopia. Future studies might assess the relationship between the tool and clinical and functional measures. We converted all atypical antipsychotic doses into chlorpromazine-equivalent Defined Daily Doses (DDD) based on the recommendation of Woods (Woods, 2003). For the typical antipsychotics, we used the study by Davis (Davis, 1974).
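Dose standardization of this kind is a simple proportional conversion against a table of chlorpromazine (CPZ) equivalents. A minimal sketch; the equivalence values below are illustrative approximations in the spirit of Woods (2003) and should be checked against the source before use:

```python
# Approximate doses (mg/day) equivalent to 100 mg/day chlorpromazine.
# Illustrative values only; verify against Woods (2003) before use.
CPZ_100MG_EQUIV = {
    "risperidone": 2.0,
    "olanzapine": 5.0,
    "quetiapine": 75.0,
}

def cpz_equivalent_mg(drug: str, daily_dose_mg: float) -> float:
    """Convert a daily antipsychotic dose to its chlorpromazine-equivalent dose in mg."""
    return daily_dose_mg / CPZ_100MG_EQUIV[drug] * 100.0

dose = cpz_equivalent_mg("olanzapine", 10)  # 10 mg olanzapine -> CPZ-equivalent mg
```

Expressing all doses on one CPZ scale lets antipsychotic exposure be treated as a single covariate across participants on different medications.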

Authors' statement
All authors read the manuscript several times and gave their final approval for publication.

Forward translation and the subsequent reconciliation meeting yielded the first draft of the Amharic version of the Cognitive Assessment Interview (CAI-A). Backward translation and the subsequent reconciliation meeting yielded the first draft of the back-translated version. Finally, the harmonization meeting yielded the second draft of the CAI-A, which was then used in the cognitive interviewing.

Figure 2: ROC curve of the Amharic version of the Cognitive Assessment Interview (CAI-A) total score for its prediction probability of correctly identifying people with schizophrenia and controls.

Figure 3: Test Characteristic Curves (TCC) for the Amharic version of the Cognitive Assessment Interview (CAI-A) among the whole sample.
Figure 4:
Bold indicates a variable significantly associated at p < 0.05 between PWS and controls, or between PWS from the total sample and those involved in the test-retest. † 1 USD = 51.76 ETB during the study period. †† Assessed for the last two years.

Table 2: Inter-rater and test-retest reliability of the Amharic version of the Cognitive Assessment Interview (CAI-A) among clinical samples.

Table 3: Item-level analysis of the Amharic version of the Cognitive Assessment Interview (CAI-A) among

Table 4: Mean score, Area Under the Curve (AUC), sensitivity, and specificity of each item of the Amharic version of the Cognitive Assessment Interview (CAI-A), and of the total score, for the prediction probability of correctly identifying people with schizophrenia and matched controls.
*** p-value < 0.001
Figure 1: A clustered bar chart showing the tolerability of the Amharic version of the Cognitive Assessment Interview (CAI-A) for the study participants.