Measuring patient experience: a systematic review to evaluate psychometric properties of patient reported experience measures (PREMs) for emergency care service provision

Abstract
Purpose: Knowledge about patient experience within emergency departments (EDs) allows services to develop and improve in line with patient needs. There is no standardized instrument to measure patient experience. The aim of this study is to identify patient reported experience measures (PREMs) for EDs, and to examine the rigour with which they were developed and their psychometric properties when judged against standard criteria.
Data sources: Medline, Scopus, CINAHL, PsycINFO, PubMed and Web of Science were searched from inception to May 2015.
Study selection: Studies were identified using specific search terms and inclusion criteria. A total of eight articles, reporting on four PREMs, were included.
Data extraction: Data on the development and performance of the four PREMs were extracted from the articles. The measures were critiqued according to quality criteria previously described by Pesudovs K, Burr JM, Harley C, et al. (The development, assessment, and selection of questionnaires. Optom Vis Sci 2007;84:663–74.).
Results: There was significant variation in the quality of development and reporting of psychometric properties. For all four PREMs, initial development work included the ascertainment of patient experiences using qualitative interviews. However, instrument performance was poorly assessed. Validity and reliability were measured in some studies; however, responsiveness, an important aspect of survey development, was not measured in any of the included studies.
Conclusion: PREMs currently available for use in the ED have uncertain validity, reliability and responsiveness. Further validation work is required to assess their acceptability to patients and their usefulness in clinical practice.


Background
Hospital Emergency Departments (EDs) assume a central role in the urgent and emergency care systems of countries around the world. Each and every patient attending ED should receive the highest quality of care. Currently, this is not always the case [1][2][3][4]. In the United Kingdom, for example, the 2014 Care Quality Commission report identified substantial variation in the care provided by EDs.
Patient experience is one of the fundamental determinants of healthcare quality [5]. Studies have demonstrated its positive associations with health outcomes [6][7][8][9][10][11]. Opening up dialogue between patients and providers by giving patients a 'voice' has proved key to improving quality of clinical experience [9]. Accordingly, there have been efforts made around the world to improve patient experience. In the UK, delivering high quality, patient-centred care has been at the forefront of health policy since 2008 [12,13]. The UK government stated the importance of public involvement in prioritization of care needs and has recognized the significance of patient and public participation in the development of clinical services [14,15]. It will only be possible to know if interventions and changes in practice are successful if processes and outcomes are measured.
To be able to identify where improvements in patient experience are required and to judge how successful efforts to change have been, a meaningful way of capturing what happens during a care episode is required. Patient reported experience measures (PREMs) attempt to meet this need. A PREM is defined as 'a measure of a patient's perception of their personal experience of the healthcare they have received'. These questionnaire-based instruments ask patients to report on the extent to which certain predefined processes occurred during an episode of care [16]. For example, a PREM might ask whether or not a patient was offered pain relief during an episode of care, and about the meaning of this encounter.
PREMs are now in widespread use, with both generic and condition-specific measures having been developed. The Picker Institute developed the National Inpatient Survey for use in the UK National Health Service. This PREM, which has been used since 2002, is given annually to an eligible sample of 1250 adult inpatients who have had an overnight stay in a trust during a particular timeframe. The results are primarily intended for use by trusts to help improve performance and service provision, but are also used by NHS England and the Department of Health to measure progress and outcomes.
Such 'experience'-based measures differ from 'satisfaction'-type measures, which have previously been used in an effort to index how care has been received. For example, while a PREM might include a question asking the patient whether or not they were given discharge information, a patient satisfaction measure would ask the patient how satisfied they were with the information they received. Not only are PREMs therefore able to provide more tangible information on how a service can be improved, they may be less prone to the influence of patient expectation, which is known to be influenced by varying factors [17][18][19][20][21].
A number of PREMs have been developed for use within the ED. If the results from these PREMs are to be viewed with confidence, and used to make decisions about how to improve clinical services, it is important that they are valid and reliable. That is, they should give an accurate representation of patient experience within EDs (validity) and a consistent measure of this experience (reliability). If validity and reliability are not sound there is a risk of imprecise or biased results that may be misleading. Despite this, there has, to date, been no systematic attempt to identify and appraise those PREMs which are available for use in ED.
Beattie et al. systematically identified and assessed the quality of instruments designed to measure patient experience of general hospital care [22]. They did not include measures for use in ED. This is important as there is evidence that what constitutes high quality care from a patient's perspective can vary between specialties, and by the condition, or conditions, that the person is being treated for [22][23][24][25]. Stuart et al. [26] conducted a study in Australia where patients were interviewed about what aspects of care mattered most to them in the ED. Patients identified the interpersonal (relational) aspects of care as most important, such as communication, respect, non-discriminatory treatment and involvement in decision-making [26]. This differs from what matters most to inpatients, where a survey in South Australia revealed issues around food and accommodation to be the most common source of negative comments and dissatisfaction [27].
This review aims to systematically identify currently reported PREMs that measure patient experience in EDs, and to assess the quality by which they were developed against standard criteria.

Study objectives
The objectives of this review are as follows:
• To identify questionnaires currently available to measure patient experience in EDs.
• To identify studies which examine psychometric properties (validity and reliability) of PREMs for use in ED.
• To critique the quality of the methods and results of the measurement properties using defined criteria for each instrument.
Primarily, these objectives will lead to a clearer understanding of the validity and reliability of currently available instruments. This will support clinician and managerial decision-making when choosing a PREM to use in practice.

Eligibility criteria
Measure selection criteria were (i) description of the development and/or evaluation of a PREM for use with ED patients; (ii) instrument designed for self-completion by participant (or a close significant other, i.e. relative or friend); (iii) participants aged 16 years or older; (iv) study written in English. Exclusion criteria were (i) studies focusing on Patient Reported Outcome Measures or patient satisfaction; (ii) review articles and editorials.

Search strategy
Six bibliographic databases (MEDLINE, Scopus, CINAHL, PsycINFO, PubMed and Web of Science) were searched from inception up to December 2016. These searches included both free text words and Medical Subject Headings (MeSH) terms. The keywords used were 'patient experience' OR 'patient reported experience' OR 'patient reported experience measure'; 'emergency medical services' (MeSH); 'measure' OR 'tool' OR 'instrument' OR 'score' OR 'scale' OR 'survey' OR 'questionnaire'; and 'psychometrics' (MeSH) along with Boolean operators. Appendix 1 outlines the specific Medline search strategy used.
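As an illustration only, the keyword groups listed above can be assembled into a single Boolean search string. The sketch below assumes that terms within a group are joined with OR and that the groups are combined with AND; the text does not state how the groups were combined, so this is an assumption and not the strategy reported in Appendix 1. The function names are hypothetical, and database-specific field tags are omitted.

```python
# Keyword groups as listed in the search strategy. The two MeSH headings are
# kept as plain strings because field tags differ between databases.
EXPERIENCE_TERMS = [
    "patient experience",
    "patient reported experience",
    "patient reported experience measure",
]
SETTING_TERMS = ["emergency medical services"]  # MeSH heading
INSTRUMENT_TERMS = [
    "measure", "tool", "instrument", "score", "scale", "survey", "questionnaire",
]
PSYCHOMETRIC_TERMS = ["psychometrics"]  # MeSH heading


def or_group(terms):
    """Join the terms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(groups):
    """Combine concept groups with AND (an assumption, not stated in the text)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(
    [EXPERIENCE_TERMS, SETTING_TERMS, INSTRUMENT_TERMS, PSYCHOMETRIC_TERMS]
)
print(query)
```

Keeping each concept in its own list makes it straightforward to adapt the same terms to the syntax of each of the six databases.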
The Internet was used as another source of data: searches were conducted on the Picker Institute website, the NHS surveys website and the CQC website, and experts in the field, namely at the Picker Institute, were contacted.
Finally, the reference lists of studies identified by the online bibliographic search were examined.
The search methodology and reported findings comply with the relevant sections of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [28].

Study selection
Articles were screened first by title and abstract to eliminate articles not meeting inclusion criteria. This was completed by two reviewers. Where a decision could not be made on the basis of the title and abstract, full text articles were retrieved.

Data collection process
Using a standardized form, L.M. extracted the following information: name of instrument, aim, the target population, sample size, patient recruitment information, mode of administration, scoring scale, number of items/domains and the subscales used. This was also completed separately by J.A.

Quality assessment tool
A number of frameworks exist to evaluate the quality of patient-reported health questionnaires and determine usability within the target population. This study utilized the Quality Assessment Criteria framework developed by Pesudovs et al. which has been used in the assessment of a diverse range of patient questionnaires [29][30][31].
The framework includes a robust set of quality criteria to assess instrument development and psychometric performance. The former includes defining the purpose of the instrument and its target population, the steps taken in defining the content of the instrument, and the steps involved in developing an appropriate rating scale and scoring system. The latter focuses on validity and reliability, as well as responsiveness and interpretation of the results. Some aspects of the Quality Assessment Criteria framework were relevant to development of questionnaires in which the patient reports on health status only rather than care experience. These were not considered when evaluating the PREMs. Table 1 outlines the framework used to assess how the measure performs against each criterion. Within the study, each PREM was given either a positive (✓✓), acceptable (✓) or negative rating (X) against each criterion.
Each PREM was independently rated by two raters (L.M., J.A.) against the discussed criteria. Raters were graduates in health sciences who had experience in PREM development and use. They underwent training, which included coding practice, using sample articles. Once the PREMs had been rated, any disagreements were resolved through discussion.

Study selection
Study selection results are documented with the PRISMA flow diagram in Fig. 1. A total of 920 articles were identified, of which 891 were excluded. Full text articles were reviewed for the remaining 29 articles, after which a further 21 articles were excluded for the following reasons: duplication of same publication (n = 8), patient satisfaction measure rather than experience (n = 6), protocol only (n = 1), clinician experience measure (n = 3) and PREM not specific to ED (n = 3). A total of eight papers met the inclusion criteria representing four different PREMs.
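The counts reported above can be cross-checked with a short tally. This is a minimal sketch using only the figures given in the text; the variable names are illustrative.

```python
# Figures taken from the study selection paragraph.
identified = 920
excluded_on_title_abstract = 891

# Reasons for exclusion at the full-text stage.
full_text_exclusions = {
    "duplication of same publication": 8,
    "patient satisfaction measure rather than experience": 6,
    "protocol only": 1,
    "clinician experience measure": 3,
    "PREM not specific to ED": 3,
}

full_text_reviewed = identified - excluded_on_title_abstract
excluded_at_full_text = sum(full_text_exclusions.values())
included = full_text_reviewed - excluded_at_full_text

print(f"Full text reviewed: {full_text_reviewed}")        # 29
print(f"Excluded at full text: {excluded_at_full_text}")  # 21
print(f"Included: {included}")                            # 8
```

The listed reasons sum to the 21 full-text exclusions, which leaves the eight included papers.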

Characteristics of included studies
Study characteristics are summarized in Table 2. All eight studies were conducted after 2008 within Europe. Three studies described the development of a PREM using qualitative data to elicit concepts. The other five studies evaluated psychometric development of the PREMs. Four were original studies and one further evaluated and developed the psychometric testing of an original instrument [32]. Within these four original studies there was variety in the recruitment process. Two were multicentre studies in hospital trusts [33,34], one targeted a single specific hospital trust [35] and one recruited through general practice [36]. All five studies assessing the psychometric development of PREMs had over 300 participants with a mean age range of 51-56. Not all measures reported specific age ranges and one did not discuss participant demographics [37]. Two of the studies recruited using purposive sampling [33,34], one through a systematic random sample [35] and one used a geographically stratified sample combined with random digit dialling for telephone surveys [37].
All of the studies utilized postal self-completion questionnaires [33][34][35][37], with the Urgent Care System Questionnaire (UCSQ) also incorporating telephone surveys [37]. The length of the PREMs described within the studies varied from 17-84 items across 3-11 domains. Domain contents and names varied, as detailed in Appendix 3, but did cover characteristics identified by the Department of Health [5]. Half focused on the sequential stages of the hospital episode [34,35], whereas others focused on specific areas of care, such as patient participation [33] and convenience [37]. All instruments were administered following discharge from hospital but the time from discharge to completion varied between measures.

Instrument development and performance
A summary of the instrument development is presented in Table 3. All of the measures reported aspects of psychometric testing, with evidence that validity was tested more frequently than reliability. Content validity was reported on most often.
The key patient-reported concepts that were incorporated into the quantitative measures through item selection included waiting time, interpersonal aspects of care, tests and treatment, and the environment. Qualitative concept elicitation work revealed similar concepts that were most important to patients [36,39]. CQI-A&E also conducted an importance study to establish relative importance of items within the questionnaire to patients visiting the ED [34]. All measures addressed very similar themes under varying headings.
Item selection was generally well reported with adequate discussion of floor/ceiling effects. Likert scales were used in all bar one study [35], where choice of response scale was not discussed.
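To make the notion of floor and ceiling effects concrete, the sketch below computes the proportion of respondents choosing the lowest and highest response option for a single item. The responses are hypothetical and are not data from any included study. The 15% flagging threshold is a commonly cited convention in the measurement literature, not a criterion stated in this review.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return the proportions of respondents at the lowest and highest score."""
    n = len(scores)
    floor = sum(score == min_score for score in scores) / n
    ceiling = sum(score == max_score for score in scores) / n
    return floor, ceiling


# Hypothetical responses to one five-point Likert item (1 = worst, 5 = best).
responses = [5, 5, 4, 5, 3, 5, 2, 5, 4, 1]
THRESHOLD = 0.15  # assumed convention for flagging an effect

floor, ceiling = floor_ceiling(responses, min_score=1, max_score=5)
print(f"floor = {floor:.0%}, ceiling = {ceiling:.0%}")
print("ceiling effect flagged:", ceiling > THRESHOLD)
```

An item on which most respondents select the top option cannot register further improvement, which is why such items are often revised or removed during item selection.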
Quality appraisal of instrument performance demonstrated a limited level of information on construct validity, reliability and responsiveness throughout all four measures.
All instruments demonstrated the use of unidimensionality to determine homogeneity among items. Of the four measures identified, not one study assessed the responsiveness by measuring minimal clinically important difference.
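Homogeneity among items is often summarized with Cronbach's alpha, an index of internal consistency. The sketch below is an illustration under stated assumptions: the review does not specify which statistics each study used, the scores are hypothetical, and alpha alone does not establish unidimensionality, which is usually examined with factor or Rasch analysis.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores from five respondents on three items of one domain.
scores = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Higher values indicate that items within a domain vary together, but very small samples such as this one give unstable estimates.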

Methodological quality of the instruments
To our knowledge, this is the first systematic review to identify PREMs for use in the ED and evaluate their psychometric properties. Four PREMs were identified and subjected to an appraisal of their quality. While the developers of each measure reported them to be valid and reliable, the quality appraisals completed within this review do not fully support this position. Further primary studies examining their psychometric performance would be beneficial before the results obtained can be confidently used to inform practice.
Content validity and theoretical development have been well reported across all four PREMs. Item generation through patient participation is important to determine what quality of care means to local populations. It is imperative, however, to recognize that this may vary across populations. For example, work carried out to find out what matters to patients in the concept elicitation phase of UCSQ [39] was completed in the UK. If this instrument were to be used in another country, then studies of cross-country validity would have to be completed before using the questionnaire.
Validity and reliability are not inherent properties of an instrument and should be addressed in an iterative manner throughout development. Often, validity and reliability change over time, as refinements are made. Instrument validity and reliability should be reassessed throughout development to ensure the overall performance is not altered. For example, there are previous versions of an instrument [38] in which focus groups were used to discuss what is important to patients. It is important to keep up to date with changes, as relying on past data can render an instrument poor in terms of validity.
Table 1 (excerpt)
… ✓✓: Clear statement of aims and target population, as well as intended population being studied in adequate depth. ✓: Only one of the above or generic sample studied. X: Neither reported.
Actual content area (face validity). Extent to which the content meets the pre-study aims and population. ✓✓: Content appears relevant to the intended population. ✓: Some relevant content areas missing. X: Content area irrelevant to the intended population.
Item identification. Items selected are relevant to the target population. ✓✓: Evidence of consultation with patients, stakeholders and experts (through focus groups/one-to-one interview) and review of literature. ✓: Some evidence of consultation. X: Patients not involved in item identification.
Item selection. Determining of final items to include in the instrument. ✓✓: Rasch or factor analysis employed, missing items and floor/ceiling effects taken into consideration; statistical justification for removal of items. ✓: Some evidence of above analysis. X: Nil reported.
Unidimensionality. Demonstration that all items fit within an underlying construct.
Furthermore, issues around validity of the instrument can change dependent on the data collection process. For example, the UCSQ used both postal and telephone survey to collect data. However, there was no discussion of validation of the PREM for use in both methods.
Disappointingly, for none of the PREMs studied did we find evidence on responsiveness. Responsiveness refers to the ability of an instrument to detect change over time. This is a highly relevant factor if a PREM is to be used to assess how successful an intervention has been to enact change within a service [13]. This review highlights the current gap in studies assessing the responsiveness of PREMs, which should be addressed.
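One common way to quantify responsiveness is the standardized response mean, the mean change in score divided by the standard deviation of the change. The sketch below uses hypothetical paired scores collected before and after a service change. None of the included studies report such an analysis, so this is an illustration only and not a method drawn from the review.

```python
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """Mean of paired score changes divided by their sample standard deviation."""
    changes = [post - pre for pre, post in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical PREM domain scores for the same four respondents at two time points.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]

srm = standardized_response_mean(before, after)
print(f"Standardized response mean = {srm:.2f}")
```

Larger absolute values indicate that the instrument registers change more clearly relative to the variability of that change.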
Some instruments appear to have limited positive psychometric properties and caution should be taken when using such measures. This is not to say that these instruments do not have their uses but careful consideration should be taken when selecting an instrument.
Using the Pesudovs criteria for quality assessment [29] offered a rigorous and standardized critique of validity and reliability. At times it was difficult to fit particular psychometric results into the quality criteria used. For example, CQI-A&E used an importance study as part of content validity, which did not fall neatly into any particular quality criteria category. We used consensus discussion to reach agreement on anomalies within the data. The Pesudovs criteria prove to be a good starting point for assessing psychometric properties of PREM development.

Strengths and limitations of this review
Application of the search strategy identified four PREMs that fitted the inclusion criteria. This low number was expected considering the current advances in the importance of patient experience measures within healthcare and the specificity of the population of an ED. It may be that not all PREMs were identified in the search, but scoping searches and reference list searches attempted to address this issue. Poor reporting and inadequate abstracts may have led to PREMs being erroneously left out in some cases; however, a representative sample has been included.
Data extraction of papers not included in the study was completed by both the main author (L.M.) and supervisor (A.N.) to cross-check the data extraction and quality appraisal process. Papers containing PREMs not included in the study were selected to reduce bias in findings. This process allowed assessment of the rate of agreement prior to data extraction of the studies included in the review. Data extraction of studies included within the review was conducted by L.M. and J.A.
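The rate of agreement between two raters can be expressed as simple percentage agreement or as Cohen's kappa, which corrects for agreement expected by chance. The review does not report which statistic was used, so the sketch below is an assumption-laden illustration with hypothetical ratings on the three-level scale (✓✓, ✓, X).

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of criteria on which both raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal rating frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight criteria by two raters.
rater_1 = ["✓✓", "✓", "X", "✓✓", "✓", "X", "✓", "✓✓"]
rater_2 = ["✓✓", "✓", "X", "✓", "✓", "X", "✓", "✓✓"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement because some matching ratings would be expected even if both raters assigned ratings at random with the same frequencies.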

Interpretation of findings in relation to previously published work
There is little evidence of similar reviews evaluating the psychometric properties of PREMs for emergency care. Findings regarding the limited information about the reliability and validity of the measures within the general population are supported by outcomes of a recent evidence review conducted by The Health Foundation [40]. This research recognized that hospital surveys often have limited information about their validity and reliability as there is no standardized or commonly used instrument or protocol for sampling and administration [40]. Beattie et al.'s systematic review of general patient experience measures is a useful addition to research [22].

Implications of the review
Concerns are raised by the fact that multiple PREMs have been developed for the same patient population with little concern given to the validation of the measures. It is unknown why researchers continue to develop poorly validated PREMs for the same population. Future research should consider drawing on the most promising existing PREMs as a starting point for the development of new measures. Existing instruments which have not been tested on certain criteria are not necessarily flawed, just untested. Such instruments may give useful information, but should be used with caution. Improving validation will allow them to provide more credible findings for use in future service improvement.

Conclusion
Current PREMs for use within the ED were found to be adequately developed and offer promise for use within clinical settings. The review identified a limited number of PREMs for emergency care service provision, with a low quality rating in terms of instrument performance. Without further work on validation, it is difficult to make recommendations for their routine use, and difficult to draw credible findings from the results they produce. Further development and testing will make them more robust, allowing them to be better used within the population. Looking ahead, it would be of benefit to have a standardized sampling and administration protocol to allow easier development of PREMs specific to various areas and disease populations.
Table fragment: … individual interviews purposively selected from GP practices in one geographic area [38]. A literature review was also conducted as part of this process. ✓ Missing values for postal and telephone surveys ranged from 0 to 4%. This was much higher for satisfaction questions at 12-18%. Some respondents put 'N/A' against answers, demonstrating that a 'does not apply' option was necessary, as some questions were only relevant to some participants. Interpretation of ceiling effects identified a positive skew for telephone survey over postal survey. This may be due to social desirability bias. ✓✓

Funding
This work was supported by the National Institute for Health Research (NIHR), Collaboration for Leadership and Health Research and Care North West Coast and sponsorship from University of Liverpool. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.