Chronic pain interference assessment tools for children and adults who are unable to self‐report: A systematic review of psychometric properties

To identify and evaluate psychometric properties of assessment tools for assessing pain interference in children, adolescents, and adults with chronic pain and the inability to self‐report.

et al. 4 found only 33 published articles specific to pain in people with intellectual or developmental disability over a 5-year period, whereas for pain in people without intellectual or developmental disability there were more than 134 000 publications. 4 Self-report is the criterion standard for assessing pain in children, adolescents, and adults. However, this can be challenging in populations where individuals may be unable to self-report owing to communication, motor, or cognitive limitations. 4,8 It is also well accepted that verbal reporting of pain is only one of many indicators of pain. 9 For this review, 'unable to self-report' is the inability to self-report by any method, including verbal responses, written responses, and the use of assistive and augmentative communication devices. Individuals who are capable of self-reporting using a method other than verbal response should be supported to self-report where possible and are not the focus of this review. There are currently no clear recommendations on which pain assessment tools to use in clinical practice for individuals who are unable to self-report. 4,9 Assessments covering functional dimensions of the pain experience are needed to capture information in a biopsychosocial model. 10 Core outcome sets have previously been developed for paediatric and adult chronic pain intervention trials, with 'pain interference with daily living' specified as a mandatory domain. 11,12 The recommended measures to complete within these outcome sets rely primarily on self-report versions of patient-reported outcome measures, with the occasional parent-report tool included for paediatric populations. 13 Unfortunately, these options are insufficient when trying to assess pain interference in individuals of varying ages who are unable to self-report.
This systematic review focuses on pain interference assessment tools currently available for children, adolescents, and adults with chronic pain and the inability to self-report in any setting. As there is limited previous research in this area for individuals who are unable to self-report, this review aims to be inclusive and provide assessment recommendations across all age groups. 4 The inclusion of the adult population recognizes that children and adolescents who are unable to self-report become adults who are unable to self-report, seeking to enhance continuity of care. Given the historical under-servicing of this population once they transition from paediatric services, it also seeks to provide greater access to pain assessment. 14 It is also possible that tools may be identified that have the potential to be used across age groups, or in future adapted for use in the adult population (and vice versa). This review will use the International Classification of Diseases, 11th Revision definition of chronic pain: 'persistent or recurrent pain lasting longer than 3 months'. 15 Pain interference is defined by Karayannis et al. as 'a measure of the extent to which pain hinders engagement with physical, cognitive, emotional, and recreational activities, as well as sleep and enjoyment in life'. 16 For this review, pain interference is defined as referring to functional status, not symptom status or general health perception of overall quality of life, and does not include measures of emotional functioning such as pain coping, pain anxiety, or pain catastrophizing. 17 Tools included in this review will not be limited by which level of the International Classification of Functioning, Disability and Health (ICF) model they ascribe to, as long as they meet the definition of pain interference. 
18 Several systematic reviews have previously been completed on observational pain assessment tools for those with cognitive impairment, [19][20][21][22] communication impairment, 23 intellectual/developmental disability, 10,24,25 and dementia. [26][27][28][29][30] However, these studies have primarily focused on acute and post-surgical pain, and on the domains of pain intensity, pain behaviour, and pain frequency. Pain intensity is an insufficient indicator of the impact of pain on an individual, limiting evaluation of management options. 31 Pain behaviour alone is also a challenging construct for individuals who are unable to self-report. Although pain behaviour may be appropriate for acute pain, it does not sufficiently capture chronic pain. 32 Many commonly observed pain behaviours also frequently occur for other reasons in individuals with disability, such as movements due to abnormal tone in those with cerebral palsy. 32 To our knowledge, there has not been a review completed on assessment tools for the domain of pain interference in individuals with chronic pain and the inability to self-report.
The objectives of this systematic review were the following: (1) identify the available tools for assessing pain interference in children, adolescents, and adults with chronic pain and the inability to self-report; (2) determine the psychometric properties of the retrieved tools including validity, reliability, and responsiveness; (3) determine the feasibility and clinical utility of the retrieved tools for use in clinical practice.

METHOD
The protocol for this systematic review was prospectively registered on PROSPERO (CRD42022310102) and the JBI systematic review database; it can be accessed online (at https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=310102). The review was designed in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement updated guidelines. 33 Two reviewers (MGS, LCF) completed title/abstract screening, full-text screening, data extraction, and quality assessment for all of the identified articles. They double-screened all the articles and both completed data extraction independently for all of the identified articles. Any disagreements between reviewers were resolved by discussion and a third author (ARH).

What this paper adds
• The Paediatric Pain Profile is the most appropriate tool for children and adolescents who cannot self-report.
• Limited tools are available for adults who cannot self-report.
• Three tools (PROMIS PPPIS, BAPQ-P, mBPI) show promise for children and adolescents who cannot self-report.
• Three tools (PROMIS PIS-proxy, Doloplus-2, BPI-proxy) show promise for adults who cannot self-report.

Search strategy
A comprehensive search strategy was designed by all authors and a medical librarian. An initial search was used to identify assessment tools that met the inclusion criteria, which also identified some articles on the psychometric properties of these tools for appraisal. A second search was then undertaken to ensure all articles assessing psychometric properties of the included tools were identified. Both searches included an adapted version of the translated COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) sensitive search filter for Ovid MEDLINE, developed to precisely retrieve articles on studies of psychometric properties. 34 The adapted filter was translated for Ovid Embase and PsycInfo. The first search was completed in MEDLINE, Embase, and PsycInfo from database inception to 15th February 2022. The second search was completed in the same databases, from database inception to 29th March 2022. Citation searching and targeted reference scanning were also used to minimize the likelihood of missing key articles. A 'late breaking' search was completed on 29th July 2022 to identify any recently published, relevant articles. The individual search strategies for each stage are provided in Appendix S1. The electronic databases were searched by one author (MGS) and exported to an online systematic review management software, Covidence (Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia; available at www.covidence.org).

Inclusion and exclusion criteria
The focus of this review is on individuals unable to self-report owing to limited expressive language, which may be the result of a communication, motor, or cognitive impairment. Emerson and Bursch suggest that the ability to self-report pain develops between 2 and 7 years of age. 35 The ability to self-report in this age range depends on the communication ability of the child, which can be variable for children of the same age, and the complexity of the assessment tool. 35 To ensure all potential tools were captured, tools were included if designed for individuals older than 2 years of age. Table 1 outlines detailed inclusion and exclusion criteria. Briefly, articles were included if they studied psychometric properties of an assessment tool that could be used to assess pain interference in children, adolescents, or adults with chronic pain and the inability to self-report. Exclusion criteria included assessment tools that (1) did not assess pain interference as a primary or secondary focus (one or fewer pain interference items, only assessed pain behaviour or assessed presence of pain during function rather than pain interference with function); (2) did not assess chronic pain (recall period <24 hours); (3) were administered through self-report; or (4) were developed for assessing pain specific to a condition other than generalized chronic pain (e.g. headaches). The psychometric properties of the tools were reported according to the criteria outlined in the COSMIN manual for evaluating the methodological quality of studies on measurement properties. [36][37][38][39] Studies that used the tool solely as an outcome measure in an intervention study were excluded.
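The four exclusion criteria above can be read as a simple screening checklist. The following is a minimal sketch only: the `CandidateTool` record, its field names, and the `exclusion_reasons` helper are hypothetical and were not part of the review workflow, which relied on two independent human reviewers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTool:
    """Hypothetical record describing one candidate assessment tool."""
    name: str
    interference_items: int      # number of pain interference items
    recall_period_hours: float   # recall period of the tool, in hours
    self_report_only: bool       # True if the tool can only be self-reported
    condition_specific: bool     # True if tied to one condition (e.g. headaches)


def exclusion_reasons(tool):
    """Return the exclusion criteria (1) to (4) that a tool triggers."""
    reasons = []
    if tool.interference_items <= 1:
        reasons.append("(1) pain interference is not a primary or secondary focus")
    if tool.recall_period_hours < 24:
        reasons.append("(2) does not assess chronic pain (recall period <24 hours)")
    if tool.self_report_only:
        reasons.append("(3) administered through self-report")
    if tool.condition_specific:
        reasons.append("(4) specific to a condition other than generalized chronic pain")
    return reasons
```

An empty list means that none of the four exclusion criteria applies; the tool would still need to meet the inclusion criteria in Table 1.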

Study selection
Titles and abstracts generated by both searches were screened against the eligibility criteria independently by two authors (MGS, LCF). The full texts of the articles for all included titles/abstracts were obtained and screened against the eligibility criteria again by the same two independent reviewers. Reference lists of selected articles were manually screened for additional articles meeting the eligibility criteria.

TABLE 1 Inclusion and exclusion criteria for the search.

Inclusion criteria
• Children, adolescents, and adults (>2 years)
• Experiencing chronic pain lasting longer than 3 months or longer than the expected time to heal
• Assessment tool can be used by individuals who are unable to self-report
• Pain interference assessment measures containing a subscale that includes pain interference with daily living as a primary or secondary focus
• Study designs that investigate psychometric properties (could include validity, reliability, responsiveness, and interpretability)

Exclusion criteria
• Children younger than 2 years where self-report is not developmentally appropriate
• Pain identified in the article does not align with chronic pain as defined in this study (e.g. acute pain, procedural pain)
• The tool is administered by self-report
• Assessment tool did not measure pain interference as a primary or secondary focus
• Psychometric properties were unreported for the pain assessment tool and/or the assessment tool was not validated
• Tool is used in an intervention study as an outcome measurement tool
• Systematic reviews on psychometric properties of assessment tools
• Other language translations of the included tools

Data extraction and quality assessment
Data extraction was conducted independently by two reviewers (MGS, LCF) to minimize errors. Once completed, both reviewers reviewed data extraction together using a customized template in Covidence. Data were extracted and recorded under the following two headings. (1) Characteristics of the observer-reported outcome measure (ObsROM) or clinician-reported outcome measure (ClinROM): construct(s), target population, mode of administration, scales/subscales, number of items, response options, range of scores, time taken to complete, and available translations. (2) Characteristics of the studies examining psychometric properties: title, name of tool, psychometric properties assessed, population, and results. As there was significant variation in the terminology for different psychometric properties used by authors, the COSMIN definitions were used to determine the psychometric property assessed. 36 Where studies aimed to validate a single tool, we considered the psychometric property evidence of that singular tool. In studies that compared agreement between multiple tools, psychometric evidence for each of the eligible tools in this review was extracted. For the property of construct validity, hypotheses to test were developed (MGS, LCF, ARH) as per the COSMIN recommendations. 36 These hypotheses are listed in Table 2 and are based on the generic hypotheses developed by de Vet et al. 40 Risk of bias was formally assessed using the 'COSMIN Risk of Bias tool to assess the methodological quality of studies on reliability or measurement error of outcome measurement instruments' and the 'COSMIN risk of bias checklist for Patient-Reported Outcome Measures (PROMs)'. 37,38 As the focus of our review was on assessment tools for individuals unable to self-report, tools identified were primarily ObsROMs or ClinROMs. 37 Each study was assessed independently and quality assessment completed using the COSMIN risk of bias checklist (MGS, LCF). 
In addition, regular discussion occurred for all ratings until consensus was reached. For tools that had evidence of content validity in the included studies, two review authors (MGS, LCF) also completed an author rating of the content validity (comprehensiveness and relevance) of the overall tool as per the COSMIN methodology for evaluating the content validity. 39 Study results for each tool and psychometric property were qualitatively pooled and each patient-reported outcome measure was summarized as sufficient (+), insufficient (−), inconsistent (+/−), or indeterminate (?) using the COSMIN criteria for good measurement properties. The overall content validity rating pooled results from the tool development study, content validity studies (if available), and the systematic review author rating. Briefly, each tool was assigned as follows: sufficient (at least 75% of results met the criteria); insufficient (at least 75% of results did not meet criteria); inconsistent (individual study results were unable to be pooled and the inconsistency was unexplained); indeterminate (all study results were indeterminate). The modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach was then used to rate the quality of the overall ratings as 'high', 'moderate', 'low', or 'very low'. Ratings were downgraded on the basis of indirectness, imprecision, risk of bias, and inconsistency, as detailed in the COSMIN guideline for systematic reviews. 36 Recommendations were made for the most suitable assessment tool(s) for measuring pain interference in individuals of varying ages who are unable to self-report. These recommendations were based on overall evidence quality. 36 In accordance with COSMIN, the evidence for content validity and internal consistency, along with feasibility, were prioritized when determining recommendations. 36
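The pooling and grading steps described above can be illustrated in code. This is a minimal sketch, not the COSMIN reference procedure: the function names `pool_rating` and `grade_quality`, the string labels, the use of all study results as the denominator, and the rule of one quality level per downgrade step are assumptions made for illustration. Only the 75% thresholds, the four rating categories, and the four downgrading factors come from the text.

```python
from collections import Counter

# Quality levels of the modified GRADE approach, from best to worst.
QUALITY_LEVELS = ["high", "moderate", "low", "very low"]


def pool_rating(study_ratings):
    """Pool per-study ratings ('+', '-', '?') into one overall rating.

    Returns '?' if every result is indeterminate, '+' if at least 75% of
    results met the criteria, '-' if at least 75% did not, and '+/-'
    otherwise (treated here as unexplained inconsistency).
    """
    if not study_ratings:
        raise ValueError("at least one study rating is required")
    if any(r not in {"+", "-", "?"} for r in study_ratings):
        raise ValueError("ratings must be '+', '-' or '?'")
    counts = Counter(study_ratings)
    total = len(study_ratings)  # assumption: all results form the denominator
    if counts["?"] == total:
        return "?"
    if counts["+"] / total >= 0.75:
        return "+"
    if counts["-"] / total >= 0.75:
        return "-"
    return "+/-"


def grade_quality(downgrades):
    """Start at 'high' and drop one level per downgrade step.

    `downgrades` maps a factor name (risk of bias, inconsistency,
    imprecision, indirectness) to the number of levels to remove.
    """
    steps = sum(downgrades.values())
    return QUALITY_LEVELS[min(steps, len(QUALITY_LEVELS) - 1)]
```

For example, under these assumptions a property supported in three of four studies pools to '+', and a single one-level downgrade for indirectness moves the evidence from 'high' to 'moderate'.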

Search results
The combined searches resulted in 2751 studies, of which 1976 were retrieved for title and abstract screening after duplicate removal. One hundred and sixty-six studies underwent full-text screening and 29 were deemed eligible for inclusion. A further four studies were identified and included through citation searching (Figure S1). The final number of included studies was 33. These 33 studies (Table S1) reported on the psychometric properties of 10 pain interference assessment tools (Table S2), including nine ObsROMs [41][42][43][44][45][46][47][48][49][50][51][52] and one ClinROM. 9 Of the 10 tools assessed, two had short versions (Child Activity Limitations Interview-9 item version 53 41 is a multidimensional tool, which means it contains a range of subscales each measuring a different domain of pain assessment, such as how an individual copes with pain or how pain interferes with their function. The BAPQ-P included three subscales relevant to this review: social functioning, physical functioning, and development. A summary of the excluded tools is provided in Table S3. Twelve of the studies excluded at full-text review reported on psychometric properties of translated versions of the included tools: Doloplus-2 (Japanese, 54 Chinese, 55,56 Norwegian, [57][58][59] French, 60,61 German, 62 Persian, 63 Dutch 64 ) and Paediatric Pain Profile (PPP) (Brazilian Portuguese 65 ). These articles are summarized in Table S4.

TABLE 2 Hypotheses to evaluate construct validity and responsiveness.
1 Correlations with changes in instruments measuring similar constructs (e.g. pain interference, functional disability) should be ≥0.50
2 Correlations with changes in instruments measuring related, but dissimilar constructs (e.g. pain intensity, pain catastrophizing) should be lower, i.e. 0.30-0.50
3 Correlations with changes in instruments measuring unrelated constructs should be <0.30
4 Correlations defined under 1, 2, and 3 should differ by a minimum of 0.10
5 Correlations between the self-report and proxy-report versions should be ≥0.70
6 Meaningful changes between relevant (sub)groups (e.g. patients with expected high vs low levels of the construct of interest)
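The correlation thresholds in hypotheses 1, 2, 3, and 5 of Table 2 lend themselves to a small worked check. The sketch below is illustrative only: the function names, the relationship labels, and the use of the absolute value of the correlation (so that instruments scored in opposite directions are handled) are assumptions, not part of the review's method.

```python
def hypothesis_supported(correlation, relationship):
    """Check one observed correlation against the Table 2 thresholds.

    relationship is one of:
      'similar'       -> hypothesis 1, expected >= 0.50
      'related'       -> hypothesis 2, expected 0.30 to 0.50
      'unrelated'     -> hypothesis 3, expected < 0.30
      'self_vs_proxy' -> hypothesis 5, expected >= 0.70
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    r = abs(correlation)  # assumption: only the magnitude matters
    if relationship == "similar":
        return r >= 0.50
    if relationship == "related":
        return 0.30 <= r <= 0.50
    if relationship == "unrelated":
        return r < 0.30
    if relationship == "self_vs_proxy":
        return r >= 0.70
    raise ValueError(f"unknown relationship: {relationship!r}")


def proportion_supported(results):
    """Proportion of (correlation, relationship) pairs that are supported."""
    if not results:
        raise ValueError("at least one result is required")
    supported = sum(hypothesis_supported(r, rel) for r, rel in results)
    return supported / len(results)
```

The proportion returned by `proportion_supported` is the quantity that would be compared with the 75% threshold for sufficient construct validity.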
Study populations were broad and included children, adolescents, adults, and older adults with a variety of conditions including cerebral palsy, 47,48,66,67 burns, 68,69 traumatic brain injury, 70 juvenile idiopathic arthritis, 43,71,72 juvenile myositis, 73,74 stroke, 75 musculoskeletal injury, 76,77 cancer, 42,50 cognitive impairment, 46,78 and idiopathic scoliosis. 77 Only five studies (three tools) specifically investigated psychometric properties in the population of individuals unable to self-report. [46][47][48]78,79 The remaining 28 studies reported on a pain interference assessment tool administered by a method that could be used with individuals unable to self-report; however, psychometric properties were assessed in populations that could self-report. A summary of the pooled results for each tool and psychometric property, including author rating of content validity, is provided in Table S5. The risk of bias score for the psychometric properties in each individual study is reported in Table S6.

BAPQ-P
The BAPQ-P was reported in three studies, with a pooled sample of 598 child/parent dyads as the same participant sample was used by Eccleston et al. and Cohen et al. 41,80,81 The BAPQ-P was developed for adolescents aged 11 to 18 years; however, the age range in the included studies was 5 to 18 years. The BAPQ-P content validity rating was assigned on the basis of a recent systematic review, in line with COSMIN recommendations to not re-rate content validity. 82 The BAPQ-P content validity study was tested in a population who could self-report. The other psychometric properties reported for this tool were internal consistency, reliability, and construct validity. Sufficient internal consistency was reported in two studies but rated as 'low' quality because there was no evidence of structural validity. 41,80 Both test-retest reliability ('low' quality) and construct validity ('moderate' quality) were sufficient but downgraded owing to indirectness. 41,80,81 Reliability was further downgraded as only one study of adequate quality was available. 80

Child Activity Limitations Interview-Parent
The Child Activity Limitations Interview-Parent (CALI-P) was developed for children aged 8 to 16 years. The psychometric properties of the CALI-P were reported in four studies, with an age range of 8 to 18 years. [43][44][45]53 The overall content validity rating was of very low quality owing to an inadequate development study and no further content validity studies. [43][44][45]53 Structural validity was reported for both the 21- and 9-item versions of the CALI-P; however, results were inconsistent. 44,53 Other psychometric properties reported for the CALI-P included internal consistency, reliability, construct validity, and responsiveness. The quality of all properties was downgraded owing to indirectness. Internal consistency was further downgraded to 'low' as it cannot be of higher quality than structural validity. [43][44][45]53 Reliability was assessed as 'very low' owing to 'doubtful' and 'inadequate' quality studies. [43][44][45] Responsiveness was only reported in one study, resulting in a low-quality rating. 43

PPP
The PPP is designed for children aged 1 to 18 years. Psychometric properties of the PPP were assessed in two studies of children aged 1 to 18 years. 48,79 Content validity was assessed in one measure development study and was rated as 'low' owing to a lack of content validity studies. 48 The development study was completed through 46 interviews and a survey of 121 parents and caregivers of children with severe and life-limiting conditions who were unable to self-report. Other psychometric properties reported were internal consistency, test-retest reliability, construct validity, and responsiveness. All psychometric properties were assessed in children who were unable to self-report. Internal consistency was rated as 'low' owing to inconsistency and no reported structural validity. 48,79 Two studies contributed to 'high' quality test-retest reliability and construct validity ratings. 48,79 One study demonstrated 'moderate' quality responsiveness to change. 79

PROMIS PPPIS
The PROMIS PPPIS was designed for children aged 5 to 17 years. It had the highest number of relevant studies (n = 16), with ages of participants ranging from 2 to 18 years. Three studies demonstrated its content validity. 51,66,83 This was, however, subsequently downgraded to 'moderate' quality as content validity was not specifically assessed for children unable to self-report. 51,66,83 Internal consistency was reported in one study but downgraded to 'low' owing to no reported structural validity. 77 As the PROMIS PPPIS is a direct translation of the self-report version, and the self-report version does have evidence for sufficient structural validity, it was not appropriate to further downgrade it to 'very low'. 84 Results were pooled for parent/child interrater reliability (Table S5); however, reliability was still rated as 'low' owing to inconsistent results and indirectness. 68,69,71,83,85 Construct validity was assessed in 11 studies, but downgraded to 'moderate' owing to indirectness. [67][68][69][70][71][72][73]77,[86][87][88] When results were pooled, 76% of the hypotheses assessed were supported (Table S5) which met the 75% threshold for sufficient construct validity. 'Moderate' quality evidence for measurement invariance was found in the study conducted by Varni et al. who determined no significant difference between scores of children aged 5 to 7 years and those aged 8 to 17 years. 83 The PROMIS PPPIS psychometric properties were also analysed individually for the fixed short form, computer adaptive test, item bank, and custom short form versions (Table S7). Subgroup analysis of reliability and construct validity found 'very low' quality evidence of insufficient reliability and 'low' quality evidence of insufficient construct validity for the custom short form. 
68,69 The fixed short form had the strongest evidence of the four versions with 'very low' quality internal consistency, 77 'low' quality reliability, 71,74,77,86,87 and 'low' quality construct validity. 71,73,77,86,87

Tools for use with children, adolescents, and adults

Modified Brief Pain Inventory
Only one study 47 investigated the psychometric properties of a proxy-report version of the modified Brief Pain Inventory (mBPI), described by Engel et al. 89 This study took place primarily in children with cerebral palsy who were unable to self-report (mean age 9 years); however, it included adults up to 34 years of age. 47 No content validity studies are available. The psychometric properties investigated were internal consistency, which had a 'low' quality rating owing to lack of reported structural validity, and construct validity, with 'high' quality evidence. 47 The mBPI is a promising tool that requires further investigation.

Pain Burden Inventory - Caregiver Report
The Pain Burden Inventory - Caregiver Report was designed for young people aged 7 to 21 years. It had no reported content validity studies. One study investigated its psychometric properties (mean age of participants was 14 years, age range not given), reporting internal consistency, child/parent interrater reliability, and construct validity. 49 This study was performed in a population of children who were able to self-report. 49 Internal consistency, reliability, and construct validity were all downgraded to 'low' owing to indirectness and imprecision. Internal consistency and reliability were further downgraded to 'very low' as the single study was of 'doubtful' quality. 49

Pain Interference Index-Parent
The Pain Interference Index-Parent was reported in one study of children and young people aged 6 to 25 years. 50 It had 'low' quality content validity owing to a lack of studies and indirectness. 50 It was investigated in a population of children and young people who were able to self-report. 50 Internal consistency and construct validity were sufficient and indeterminate respectively; however, both were of 'very low' quality evidence. This was due to indirectness, imprecision, and the quality of the included study ('doubtful'). 50

Brief Pain Inventory - Proxy Report
The Brief Pain Inventory was reported as a proxy-report tool in one study of adult patients with cancer, aged 31 to 88 years. 42 Participants were spouses, children of the patient, siblings, friends, or other relatives and were asked to 'answer as if they were the patient'. 42 The only psychometric property investigated was interrater reliability (patient vs proxy), which was found to be insufficient with 'low' quality evidence. 42

Doloplus-2
The Doloplus-2 was investigated in two studies, with participants aged 65 to 96 years. 46,78 The content validity rating was 'very low' owing to a lack of content validity studies and inadequate measure development. 46 Structural validity was also 'very low' as only one study of inadequate quality was available and therefore internal consistency was also rated 'very low'. 78 Inter- and intrarater reliability was pooled (Table S5) and deemed sufficient and 'high' quality in two studies. 46,78 Construct validity was downgraded to 'moderate' quality as only one study of 'adequate' quality was available. 78 Cross-cultural validity was rated as 'very low' as the only available study was 'inadequate'. 46

PROMIS PIS-proxy
The PROMIS PIS-proxy had no reported content validity as it was developed as a self-report measure only for adults aged 18 years and older. Two studies reporting psychometric properties used the self-report measure as a proxy-report tool, with participants older than 62 years. 75,76 'Moderate' quality evidence for insufficient structural validity was reported in one study. 75 Both internal consistency and reliability were reported as indeterminate and of 'low' quality owing to indirectness and risk of bias, as the two studies were of 'doubtful' quality. 75,76 Construct validity was reported; however, this was downgraded to 'low' owing to imprecision (small pooled sample size) and indirectness. 75,76 Responsiveness was also downgraded to 'very low' owing to imprecision, indirectness, and risk of bias as only one study of adequate quality was available. 75

Overall, 10 tools were identified (Table 3). Four were for use with children/adolescents, three with adults, and three that could be used across all populations. Only one tool, the PPP, had low-quality evidence for content validity and internal consistency in a population unable to self-report. It was designed for children and adolescents. None of the tools for the adult population had evidence for content validity and internal consistency. No tool had evidence for all nine psychometric properties.

DISCUSSION
This systematic review identified one tool, the PPP, with strong psychometric properties for assessing pain interference in children and adolescents (<18 years) who are unable to self-report. It is currently the only tool with sufficient quality evidence for content validity and internal consistency investigated in a population who cannot self-report. 48,79 These are the two psychometric properties that COSMIN require to recommend a tool for use. 36,90 The PROMIS PPPIS fixed short form is also recommended with a caution that psychometric testing has not specifically taken place in children older than 7 years who are unable to self-report. The mBPI is a promising tool that has established psychometric properties in children who are unable to self-report but requires further testing to provide evidence of content and structural validity.
There is lower-quality evidence for tools for use with the adult population. The tools available for the adult population should be used with caution as they require further investigation of their psychometric properties. The mBPI, although primarily tested in children, included an age range up to 34 years in one study, indicating this could also be an option for adults. The Doloplus-2 is promising for use in older adults but requires further evidence of content validity and internal consistency. Other tools that seem promising but require further content validity testing in populations that are unable to self-report are the BAPQ-P for adolescents (11-18 years) and the PROMIS PIS (as a proxy version) for adults.
All of the identified tools were feasible for use on the basis of availability, time required, and cost to complete (Table S2). Historically, pain intensity has been the outcome focus for pain management intervention. However, the way pain interferes with function is now understood as more important to individuals with chronic pain. 31 This shift in thinking probably explains why all of the tools identified for assessing pain interference were developed in the past 20 years.
In this review, we identified nine ObsROMs and one ClinROM for assessing pain interference in individuals who are unable to self-report. ObsROMs differ from ClinROMs as they are 'observations made, appraised and recorded by a person other than the patient who does not require specialized professional training, e.g. proxy measures'. 37 In line with COSMIN recommendations, the COSMIN risk of bias checklist for patient-reported outcome measures was applied to ObsROMs, and the risk of bias tool for ClinROMs was applied to ClinROMs. 37,38 The ClinROM (Doloplus-2) was administered by health professionals (nurses) through observation only, whereas the PPP was administered by a combination of direct observation and proxy-parent report, with sufficient interrater reliability in health professionals and parents regardless of familiarity with the child. 48,79 All of the other tools were administered by proxy-report only.
The accuracy of proxy-report was assessed in the included studies as either construct validity (correlations between proxy and patient scores) or interrater reliability (agreement between proxy and patient scores). These studies were conducted in populations of individuals who could self-report. Reliability results were, where possible, pooled for each tool (Table S5). COSMIN advise that, in the case of inconsistent results, studies only of 'adequate' and 'very good' quality can be used to determine the overall rating. As such, the pooled ranges for the PROMIS PPPIS parent/child interrater reliability did not include the results of Alcantara et al. as this study reported significantly different results and was rated as 'doubtful'. 36,86 Although most of the tools demonstrated sufficient agreement (intraclass correlation coefficient >0.70), it is unclear to what extent patient-proxy agreement is influenced by the ability to self-report pain. For example, an individual who can communicate with their proxy, either verbally or through assistive and augmentative communication devices, about how pain interferes with their function, may have a higher level of agreement than an individual who is completely reliant on their proxy to interpret the individual's pain. The PROMIS PPPIS reported higher agreement between parents and children when the parent or child themselves reported chronic pain. 88 This highlights the improved accuracy of the tool when chronic pain has already been identified and may be less useful in populations where pain is present but has not been identified.
The reliability of the proxy-report may be influenced by many factors. For example, higher levels of pain catastrophizing reported by caregivers using the PROMIS PPPIS resulted in overestimation of proxy-reported child pain interference. 85 The included studies most commonly used mothers or female caregivers as proxy reporters, but some also included fathers, grandparents, siblings, children, household members, and friends. Unfortunately, none of the studies compared the accuracy of different proxies. Further research to understand the influence of personal, social, and cultural factors on a proxy's ability to accurately report pain interference is required. 4 Despite these factors, this review identifies proxy-report as one of only two available and feasible methods of assessing pain interference in individuals who are unable to self-report. We consider this preferable to no assessment. It has been previously suggested that combining self-report, psychophysical methods of assessment (e.g. quantitative sensory testing), direct observation, and caregiver report may provide the most reliable identification of pain in people who are unable to self-report. 4 The PPP is the only tool identified in this review that uses a combination of approaches.

TABLE 3
Overall rating and quality of evidence for chronic pain interference tools for children and adults who are unable to self-report. Criteria for content validity rating.
a Overall content validity is sufficient (+), insufficient (−), inconsistent (±), or indeterminate (?; rating not possible owing to availability of reviewers' ratings).
+, The relevance rating is +, the comprehensiveness rating is +, and the comprehensibility rating is +.
−, The relevance rating is −, the comprehensiveness rating is −, and the comprehensibility rating is −.
±, At least one of the ratings is + and at least one of the ratings is −.
?, Two or more of the ratings are rated ?.
f +, No important differences found between group factors (such as age, sex, language) in multiple group factor analysis or no important differential item functioning for group factors (McFadden's R² < 0.02).
?, No multiple group factor analysis or differential item functioning analysis performed.
−, Important differences between group factors or differential item functioning were found.
g Overall rating criteria.
A, Recommended for use and results obtained can be trusted. Assessment tools with evidence for sufficient content validity (any level) and at least low-quality evidence for sufficient internal consistency.
B, Potential to be recommended for use, but they require further research to assess the quality. Assessment tools that do not fit criteria A or C.
C, Not recommended for use. Assessment tools with high-quality evidence for an insufficient measurement property.
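The content validity and overall rating criteria given in the Table 3 footnotes can be read as a small decision procedure. The sketch below is illustrative only and is not the review's own implementation: the function names, argument names, and quality labels ('very low', 'low', 'moderate', 'high') are assumptions, combinations not covered by the footnotes as stated return `None`, and checking category C before category A is an assumed order of precedence.

```python
from typing import Optional


def overall_content_validity(
    relevance: str, comprehensiveness: str, comprehensibility: str
) -> Optional[str]:
    """Combine three ratings ('+', '-', '?') using the Table 3 footnote rules."""
    ratings = [relevance, comprehensiveness, comprehensibility]
    if ratings.count("?") >= 2:
        return "?"  # two or more ratings indeterminate
    if "+" in ratings and "-" in ratings:
        return "±"  # at least one sufficient and one insufficient rating
    if all(r == "+" for r in ratings):
        return "+"
    if all(r == "-" for r in ratings):
        return "-"
    return None  # e.g. ('+', '+', '?'): not covered by the footnote as stated


def overall_rating(
    content_validity: str,
    internal_consistency: str,
    internal_consistency_quality: str,
    high_quality_insufficient_property: bool,
) -> str:
    """Return 'A', 'B', or 'C' following the overall rating criteria."""
    # Assumption: category C is checked first.
    if high_quality_insufficient_property:
        return "C"
    at_least_low = {"low", "moderate", "high"}  # assumed quality labels
    if (
        content_validity == "+"
        and internal_consistency == "+"
        and internal_consistency_quality in at_least_low
    ):
        return "A"
    return "B"  # everything that fits neither A nor C
```

For example, under these assumptions a tool with sufficient content validity and moderate-quality evidence for sufficient internal consistency would be rated A, whereas the same tool with only very low-quality evidence for internal consistency would be rated B.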
Although this review has focused on individuals who are unable to self-report by any method, it should be highlighted that, for individuals who do have some ability to self-report, self-reported communication should be supported and prioritized. Individuals should be supported to use their usual method of communication, whether this be 'high-tech' or 'low-tech'; however, a recent systematic review has suggested that lower-tech methods may be more feasible for use in real-world contexts. 91 This may involve adapting existing assessment tools to be more appropriate for use with augmentative and assistive communication devices; this can be done through strategies such as shortening the length of tools and adapting response options to be consistent with those available to the individual, either on their device or through other expression.
Only three of the included tools (PPP, Doloplus-2, and mBPI) were tested in populations of individuals who could not self-report. In addition, only two of these tools (PPP and Doloplus-2) had content validity in such populations. 46-48,78,79 This was a significant limitation in recommending tools for use in clinical practice. Content validity is considered the most important psychometric property by COSMIN, as it determines whether the tool measures the construct it claims to, and how relevant, comprehensive, and comprehensible the tool items are to the population of interest. 36 One parent-report tool (CALI-P) was developed without input from parents. 43 Two of the tools (mBPI and PBI-P) were direct translations of the self-report tools to proxy-report without re-testing content validity. A further two of the tools (PROMIS PIS and Brief Pain Inventory-Proxy) were not developed into a proxy-report form and instead asked the proxy to 'respond from the patient-proxy perspective'. 42,75 The mBPI was modified by Engel et al. to be used as an outcome measure in neuromuscular disease without any robust content validity investigation, and then further modified for proxy-report with other psychometric properties investigated. 47,89 COSMIN advises investigating the evidence for content validity of a tool first, followed by structural validity/internal consistency, then investigation of all other psychometric properties. 36,39 This aims to reduce wasting resources on investigating the psychometric properties of a tool without sufficient content validity.
Only one of the tools identified (PROMIS PPPIS) investigated measurement invariance by comparing the proxy reports of children aged 5 to 7 years with those aged 8 to 16 years. 83 Establishing content validity for people who are unable to self-report is important; however, it can be challenging if those individuals are unable to participate in qualitative research. This is particularly difficult in the adult population where individuals may not have a consistent caregiver. An alternative method of assessing the psychometric properties of translated proxy-report tools is to investigate measurement invariance in different populations. 40,92 This may be achieved by comparing groups of individuals with the same condition and severity level, but differing abilities to self-report.
The Doloplus-2 and the PPP were the only tools identified in this review with information on test-retest reliability (intra- and interrater reliability) and measurement error. 46,48,78,79 Reliability is considered an essential psychometric property by COSMIN, but should be investigated after sufficient content validity and internal structure of the tool are obtained. 36,40 The most important reliability measures for clinical practice are test-retest reliability and measurement error. The remaining studies assessed interrater reliability between the self-report and proxy-report versions only. This poses a potential limitation to interpretation of scores in clinical practice, as the smallest detectable change score is not available. Finally, none of the included articles assessed reliability or responsiveness to change of the tools over more than two time points. Further investigation of this would be beneficial in understanding the usefulness of these tools in longitudinal studies.
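To show why the smallest detectable change depends on test-retest reliability and measurement error, the sketch below uses two commonly cited formulas: the standard error of measurement, SEM = SD × √(1 − ICC), and the smallest detectable change for an individual, SDC = 1.96 × √2 × SEM. This is an illustration under stated assumptions, not a calculation reported in the included studies; the function names and the numerical values are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, test_retest_icc: float) -> float:
    """SEM from the score standard deviation and a test-retest ICC (0 to 1)."""
    return sd * math.sqrt(1.0 - test_retest_icc)


def smallest_detectable_change(sem: float) -> float:
    """Smallest change in an individual's score exceeding measurement error (95% level)."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical values: score SD of 10 points and a test-retest ICC of 0.84
sem = standard_error_of_measurement(sd=10.0, test_retest_icc=0.84)
sdc = smallest_detectable_change(sem)
```

Without a test-retest ICC or another estimate of measurement error, neither quantity can be computed, so a change in an individual's score cannot be distinguished from measurement noise.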
All of the tools identified had good face validity and feasibility. The utility of the tools, however, could be affected by the degree of other impairments an individual who is unable to self-report may also have. For an individual with reduced functional mobility as well as the inability to self-report, a tool such as the PROMIS PPPIS or PROMIS PIS-proxy will probably be difficult to use as many of the items relate to higher-level mobility tasks. Given the frequent co-occurrence of cognitive impairment and significant physical disability, clinicians and researchers need to consider the relevance of the tool items to the population they are assessing.
This review has highlighted the lack of chronic pain interference assessment tools for adults who are unable to self-report. Only 3 of the 10 tools were specifically for use in the adult population, with a further three having been designed for children/young people but also used in younger adults (up to 34 years of age). The only tool that was investigated specifically in an adult population who could not self-report was the Doloplus-2. The mBPI was also tested in a population who could not self-report, primarily in children, but included adults up to 34 years of age. This may be due to a lack of focus on pain management for older adults with cognitive impairment, or the historical under-servicing of those with neurodevelopmental disability once they transition to adult services. 14 Several of the excluded tools assessed pain behaviour or pain intensity in adults who were unable to self-report, but only in acute settings such as intensive or postoperative care. It is important that adults with the inability to self-report are given the same opportunity to express how pain interferes with their function and to access evidence-based chronic pain management. The lack of available tools may also highlight the difficulty obtaining proxy-report when many adults do not have consistent caregivers in the same way children do.
This study had several strengths and limitations. A strength of this review was the use of COSMIN, the recommended standard for assessing the measurement properties of outcome measurement instruments. 90 The review also focused on a specific area of chronic pain assessment which, to our knowledge, has not previously been reviewed, in a vulnerable group with historically under-reported pain. 4 A potential limitation was the decision to exclude studies reporting psychometric properties of other language versions of the included tools. It should be noted that no tool was identified that was solely in a language other than English; all the tools that were reported in other languages had a translated version in English (Tables S2 and S4). In addition, relevant studies in any language were included. This decision was made as COSMIN advises that new translations of an outcome measurement instrument should have psychometric properties investigated independently from other language versions, with content validity of primary importance when recommending tools for use. 40 The quality assessment process for COSMIN content validity involves a subjective rating (by the systematic review authors) of the relevance and comprehensiveness of the assessment tool itself, including rating item wording and response options (Table S5). Each different language version of an assessment tool contains linguistic or cultural nuances that cannot be accurately assessed by someone who is not fluent in that particular language. As none of the review authors were fluent in a language other than English, it was not deemed appropriate to provide a content validity rating of the other language translations of the included tools. This is a limitation; however, it is preferable to providing an inaccurate rating of content validity (and therefore inaccurate recommendations for use) of other language translations of the included tools. Overall, the available data and summarized results for each included tool were comprehensive and not limited by this exclusion.
Several commonly used tools were excluded from this review as they did not meet the eligibility criteria for 'pain interference'. Tools such as the Functional Disability Inventory 93 and the World Health Organization Disability Assessment Schedule, 94 which assess overall health/wellbeing or disability, were excluded as they were not specific to pain-related functional disability. 93,94 We determined a recall period of at least 24 hours was appropriate to assess the construct of chronic pain, acknowledging it is difficult for individuals to recall greater periods of time (e.g. >3 months as per the definition of chronic pain). For this reason, the Non-communicating Children's Pain Checklist-Revised, although previously used in chronic pain studies, was not included as its recall period is within the past 2 hours. 95 The exclusion of these tools was not considered to be a significant limitation. The included tools can be used to specifically assess the interference of chronic pain on an individual's life, which is an important and under-reported aspect of pain assessment. 31
Future research should focus on demonstrating further evidence for psychometric properties of pain interference assessment tools that could be used with people who are unable to self-report and have chronic pain, particularly the adult population. Evidence for content validity should be prioritized, which may require tools to be modified to better represent the specific population or condition of interest. Recommendations for a combination of pain interference assessment tools that use different methods such as direct observation, proxy-report, and physical testing could be valuable for people who are unable to self-report. Research using ethnographic approaches, or innovative technology, to investigate pain communication in people who are unable to self-report may provide greater insight into the individual's pain experience, making progress towards self-determination for this population. 96 Finally, examination of the strengths and limitations of ClinROMs compared with ObsROMs for people who are unable to self-report may also provide guidance for clinicians and researchers when selecting assessment tools.

CONCLUSION
This review has identified 10 tools for assessing chronic pain interference in children and adults who are unable to self-report. The PPP is recommended for use in children and adolescents (<18 years). No recommendation can be made for tools for use in adults without further psychometric testing.

SUPPORTING INFORMATION
The following additional material may be found online: Appendix S1: Search strategies. Figure S1: PRISMA flow diagram for studies identified via searches of databases and citation searching. Table S1: Characteristics of included studies. Table S2: Characteristics of pain interference assessment tools for individuals unable to self-report.