Investigating the Bias in Orthopaedic Patient-reported Outcome Measures by Mode of Administration: A Meta-analysis

Background: Patient-reported outcome measures (PROMs) are critical and frequently used to assess clinical outcomes to support medical decision-making. Questions/Purpose: The purpose of this meta-analysis was to compare differences in the modes of administration of PROMs within the field of orthopaedics to determine their impact on clinical outcome assessment. Patients and Methods: The PubMed database was used to conduct a review of literature from 1990 to 2018 with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses protocol. All articles comparing PROMs for orthopaedic procedures were included and classified by the mode of administration. Each specific survey was standardized to a scale of 0 to 100, and a repeated random-effects model meta-analysis was conducted to determine the mean effect of each mode of survey. Results: Eighteen studies were initially included in the study, with 10 ultimately used in the meta-analysis, which encompassed 2384 separate patient survey encounters. Six of these studies demonstrated a statistically notable difference in PROM scores by mode of administration. The meta-analysis found that the standardized mean effect size for telephone-based surveys on a 100-point scale was 71.7 (SE 5.0), which was significantly higher (P < 0.0001) than survey scores obtained via online/technology-based surveys (65.3 [SE 0.70]) or self-administered/paper surveys (61.8 [SE 0.70]). Conclusions: Overall, this study demonstrated that a documented difference exists in PROM quality depending on the mode of administration. PROM scores obtained via telephone (71.7) are 8.9% higher than scores obtained online (65.3, P < 0.0001) and 13.8% higher than scores obtained via self-administered paper surveys (61.8, P < 0.0001). Few studies have quantified statistically notable differences between PROM scores based solely on the mode of acquisition in orthopaedic

physical status. They are frequently used by clinicians and researchers to assess clinical outcomes to support decision-making. 1,2 Data generated via PROMs influence future research and health policy to guide and improve healthcare delivery. 3,4 Collection of PROMs has become increasingly common as healthcare systems focus on value-based care, which affects reimbursement. 4,5 PROMs are able to capture data about the patient's mental, physical, and emotional status, including pain level, activity level, and functional status, at multiple time points along the patient's injury or disease episode. 2,6 Furthermore, PROMs may be obtained in a variety of modalities, including in-person surveys, phone calls, online/technology-based surveys, and self-administered/paper surveys. The specific mode of PROM collection may be a confounding variable and cause collection bias; however, few studies report their method of collection. 7-9 Researchers have previously investigated the effects of survey mode of administration on PROMs. These studies explored potential biases in fields such as oncology and addiction and have shown that having interviewers present, whether over the phone or in person, can artificially elevate PROM scores by up to 15% compared with PROM scores collected without an interviewer. 10-15 This is known as interview bias. The PROM administration mode bias is poorly understood within orthopaedic surgery, a field in which PROMs are essential. For instance, the American Board of Orthopaedic Surgery now collects PROM data to help inform decision-making for its board certification process. 16 The purpose of this meta-analysis was to compare differences in the modes of administration of PROMs within the field of orthopaedics to determine the impact on clinical outcome assessment.
We hypothesized that PROM scores obtained by telephone would be statistically notably higher than those obtained by other modes of administration.

Search Strategy
The PubMed database was used to conduct a review of the literature from 1990 to 2018 with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses protocol to identify studies that compared the mode of survey administration and patient-reported outcomes. Search terms used in the title, MeSH, and keywords included "Data Collection/Methods," "Survey and Questionnaires," "Health Care Survey," "Patient Reported Outcome Measures," "Patient Outcome Assessment," "Musculoskeletal System/surgery," "Musculoskeletal Disease/Surgery," "Orthopaedic Procedures," "Bias," "Interviews," "Telephone," "Postal Service," and "Electronic Mail". References from the included articles were also examined to identify studies that may have been missed by the initial literature search. The details of study identification, screening, inclusion, and exclusion can be found in Figure 1.

Study Selection and Criteria
Study selection for this meta-analysis was determined by two independent reviewers based on the defined selection criteria. Studies were selected for the meta-analysis if they were in the field of orthopaedic surgery, compared the results of PROMs as they pertained to the mode of administration, were published in a peer-reviewed journal, and were written in the English language or had a translation of the text readily available. Studies were excluded if they were not in the field of orthopaedic surgery, had no comparison between modes of PROM administration, were a review article, or were only an abstract.

Meta-analysis
In total, 18 studies were identified for this meta-analysis, comprising 4408 patient encounters in investigations of patient-reported outcome data across multiple modes of administration. Eight studies were excluded because of insufficient data. Ten studies (n = 2384) were ultimately included in the meta-analysis. The studies included in the meta-analysis are summarized in Table 1. All involved patients were receiving treatment of an orthopaedic condition. Each study used a validated patient survey specific to the condition being treated and compared the scores noted among the different modalities of collection. Such modalities include online/technology-based, telephone, in-person interview, in-office self-administered/paper survey, and postal survey.
PROMs are inherently scaled differently depending on the outcome being assessed. Using a linear approach to scale homogenization simplifies interpretation by designating higher scores as more positive clinical outcomes and lower scores as negative outcomes. 17 This approach assumes equal distance between values. Mean scores were transformed using the percentage scale maximum method, allowing for normalization of the data on a scale from 0 to 100. Heterogeneity was assessed using a general linear model, which hypothesized that the studies come from a homogeneous population, an asymptotic covariance matrix, and restricted maximum likelihood. A forest plot was created to visually assess the different studies stratified by the mode of survey. Covariance parameters and covariance ratios were analyzed and graphed to determine the effect of any outliers in the data when testing for heterogeneity. Restricted maximum likelihood estimation was conducted to account for the covariance between studies. A repeated random-effects model meta-analysis was conducted to determine the mean effect of each mode of survey. This model controlled for heterogeneity because parameters in the model and residuals were held to known values. SAS Enterprise Guide 7.15 HF3 (SAS Institute, Inc) was used to conduct the statistical analysis.
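The two steps described above, rescaling each PROM to a 0 to 100 scale and pooling study means under a random-effects model, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' method: the function names are hypothetical, the input values are invented for illustration, and the pooling uses the DerSimonian-Laird method-of-moments estimator as a simpler stand-in for the restricted maximum likelihood repeated random-effects model that the study fitted in SAS.

```python
import math


def percent_of_scale_max(score, scale_min, scale_max, higher_is_better=True):
    """Linearly rescale a raw PROM score to 0-100 (percentage of scale maximum)."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    pct = (score - scale_min) * 100.0 / (scale_max - scale_min)
    # Flip scales on which a higher raw score means a worse outcome,
    # so that 100 always denotes the best outcome.
    return pct if higher_is_better else 100.0 - pct


def pool_random_effects(means, std_errors):
    """Pool study means with the DerSimonian-Laird random-effects estimator.

    Assumption: this is a stand-in for the REML model used in the study.
    Returns (pooled mean, standard error, between-study variance tau^2).
    """
    k = len(means)
    if k == 0 or k != len(std_errors):
        raise ValueError("means and std_errors must be non-empty and equal length")
    variances = [se ** 2 for se in std_errors]
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, means)) / sum_w
    # Cochran's Q measures dispersion of study means around the fixed-effect mean.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, means))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, means)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Illustrative (invented) raw scores on three differently scaled instruments.
print(percent_of_scale_max(3, 0, 10, higher_is_better=False))   # 70.0
print(percent_of_scale_max(12, 0, 50, higher_is_better=False))  # 76.0
print(percent_of_scale_max(85, 0, 100))                         # 85.0

# Illustrative (invented) normalized study means and standard errors.
pooled, se, tau2 = pool_random_effects([60.0, 70.0, 80.0], [2.0, 2.0, 2.0])
print(pooled, se, tau2)
```

The `higher_is_better` flag reflects the convention stated above that higher normalized scores denote more positive outcomes; which instruments need reversing would have to be decided per PROM.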

Electronic-/Technology-Based Surveys
Two orthopaedic surgery-specific studies included in the meta-analysis showed no notable differences between tablet/computer and paper survey scores. 18,19 However, their data do show differences in the PROM-specific subscores when assessing the patient data. Regarding differences, Shah et al 20 demonstrated 5% higher scores on the EuroQol-5 Dimensions (EQ-5D) and 14% higher scores on the Visual Analog Scale (VAS) with paper surveys in nonsurgical orthopaedic patients, but 25% higher scores on the Neck Disability Index with tablet-based questionnaires. However, the Bojcic group compared traditional paper and pencil with e-mail surveys for patients who recently underwent ACL reconstruction and showed no differences in PROM scores between modes of administration. 21

Postal Surveys
Three articles identified by this meta-analysis examined postal mail's role in PROM acquisition. 21-23 All three studies demonstrated excellent agreement between postal mail and other modes including telephone, in-person interviews, and electronic surveys.

Telephone Surveys
Our meta-analysis examined four studies that compared scores obtained via phone with those of other methods of data collection. Of the four, three reported no notable differences in scores between the phone and other modalities, which included in-person interview, electronic, paper and pencil, and postal. 22

In-person Interview Surveys
The final mode of PROM acquisition is the direct, face-to-face patient interview. Höher et al 25 examined Lysholm scores at 1 year after ACL reconstruction obtained by self-administered surveys and direct patient interviews. They found that scores obtained via face-to-face interview were notably higher, by up to 3%, than the self-reported scores.
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram demonstrating orthopaedic patient-reported outcome measure comparison studies.

Meta-analysis
Mean scores and ranges are visually similar within the mode of surveys (Figure 2). The ranges have some overlap, implying that these data have similar characteristics and scores. The difference in scores can be attributed to the different surgeries, surveys, and sample sizes.
The residual maximum likelihood estimate has one observation that displaced the data points. When outliers are removed, a standard panel of influence is obtained when the mean score analysis is iterated using Cook's D and the covariance ratio statistic, which validates the data points used for this meta-analysis (Figure 3). Some mean scores still had considerable impact on the estimates and residuals.
The average/normalized mean effect sizes for telephone, postal, online/technology-based, and in-office self-administered/paper surveys were 71.7, 70.3 (P = 0.45), 65.3 (P < 0.0001), and 61.8 (P < 0.0001), respectively (Table 2). Postal surveys did not have a notable effect size (P = 0.45) in comparison to the effect sizes of the other modes of surveys, likely because of limited statistical power. Telephone surveys had a notably higher effect size, 71.7 (SE 5.0), P < 0.0001, compared with online/technology-based and in-office self-administered/paper survey methods. This indicates that after normalization of scores, PROMs obtained via telephone-based surveys had scores higher than those

Discussion
PROMs are a very important and useful tool in the field of orthopaedics.
They give providers the information necessary to evaluate treatment efficacy and fuel outcome-driven research that defines clinical and surgical decision-making by allowing comparison between studies. Within orthopaedics, PROMs are the main source for assessing patients' subjective outcomes in the setting of clinical research. However, we do not have an agreed-upon mode of PROM acquisition. Studies tend to publish data as a cumulative set, rather than properly defining collection methods. In addition, the main goal of researchers within this field is to obtain relevant, reliable patient data with high follow-up percentages. Thus, researchers use multiple modes of delivery to acquire PROM data. It is unclear from our review whether data gathered from differing modes of administration provided a more robust data set with less incomplete data. This meta-analysis identified several different studies within orthopaedics that examined results of PROMs based on the mode of acquisition. The four main groups examined were electronic-/technology-based surveys, postal surveys, telephone surveys, and in-person interviews. Differing modes of administration are used for several reasons. Researchers may use basic in-person or paper surveys that are easy to complete and tend to not overwhelm patients. However, other researchers use e-mail, phone calls, and other technology-based methods to administer these surveys, which can increase the speed of data acquisition, facilitate data integration, and minimize cost. Several studies suggested that notable differences were present in PROM values based on the mode of acquisition, but the delineation of the specific relationship has not yet been made clear. Interview bias has been described in the past, which, in theory, was thought to apply to in-person interviews and telephone-based encounters. This systematic review and meta-analysis is the first, to our knowledge, to directly examine the effects of mode of administration on PROM values within orthopaedics.
Figure 2. Forest plot demonstrating heterogeneity between studies. Plot is broken down by the mode of survey, and the author represents the article from which the mean scores are derived. The filled triangle represents the mean score of the patient-reported outcome stratified by mode of survey, and the line represents the range of scores reported in the studies.
For the 10 studies included in the meta-analysis, the PROMs were normalized to a scale of 0 to 100. This normalization allowed for the comparison of multiple survey types among these different studies. After the normalization process, it was shown that PROMs administered via telephone had notably higher scores compared with those obtained by both online/technology-based surveys (8.9% higher) and self-administered/paper surveys (13.8% higher).
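As a worked check of the percentages above, the short Python sketch below reproduces them from the reported effect sizes (71.7, 65.3, and 61.8). The reported 8.9% and 13.8% appear to match a calculation that uses the telephone score as the denominator; this is an inference from the numbers, not something the text states, and the function names are hypothetical. For comparison, the sketch also shows the same gaps expressed relative to the lower score.

```python
def percent_below_reference(reference, comparison):
    """Gap between the two scores as a percentage of the reference score."""
    return (reference - comparison) * 100.0 / reference


def percent_above_comparison(reference, comparison):
    """Gap between the two scores as a percentage of the comparison score."""
    return (reference - comparison) * 100.0 / comparison


# Normalized mean effect sizes reported in the meta-analysis (0-100 scale).
telephone, online, paper = 71.7, 65.3, 61.8

# Telephone score as denominator: matches the reported figures.
print(round(percent_below_reference(telephone, online), 1))  # 8.9
print(round(percent_below_reference(telephone, paper), 1))   # 13.8

# Lower score as denominator: the same gaps look larger.
print(round(percent_above_comparison(telephone, online), 1))  # 9.8
print(round(percent_above_comparison(telephone, paper), 1))   # 16.0
```

The choice of denominator matters when such percentages are used to adjust scores across modes, so any correction factor derived from them should state which baseline it assumes.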
Based on findings from this study, we recommend changes in the reporting and publication of orthopaedic studies that use PROMs as a primary outcome measure. Without a full understanding of the degree and magnitude of the effect of mode of administration on PROM scores, it is critically important for researchers to strive to use the same mode of administration within studies and to disclose which collection methods are being used for these specific studies. First, by using the same collection method, researchers can essentially eliminate this potential source of bias within their analyses and allow for comparison across studies without the introduction of a major known confounder. Second, disclosure of collection methodology should become standard practice so that readers and reviewers are aware of the potential introduction of bias. The overall goal is that, through further understanding of this collection method bias within orthopaedic surgery, we can use a "correction coefficient" that will allow for standardization of PROMs across different subspecialties and specific surveys.
Limitations to this study exist. First, this study examines multiple different orthopaedic patient populations undergoing different surgeries or clinical evaluation. Second, although every included study was orthopaedic related, the outcomes and PROMs used in each study differed from one another. Although hundreds, and potentially thousands, of studies within the field of orthopaedic surgery use PROMs, very few studies detail their collection methods and even fewer provide adequate statistical data to be included in such a meta-analysis, which underscores both the difficulty and importance of completing this study. In addition, this study did not account for socioeconomic or demographic data pertaining to the patients that may necessitate variation in the survey administration method. However, for our review, most studies did not disclose the socioeconomic and literacy levels of the patient cohort being studied. Previous studies have shown that these variables may impact access to technology or phones, altering the overall scores reported by the patient population. 15,26 The timing of survey administration in each study was not consistent, which may represent another confounding variable in the patient data. As with any meta-analysis, the disadvantage is heterogeneity between study designs, which is controlled for with the random-effects modeling. Data can be compared only through reported summary statistics, such as measures of central location, which in turn limits the number of studies included in the analysis. The inclusion and exclusion criteria for the meta-analysis are more stringent; therefore, less control exists on the study designs that are included.
Table 2. Telephone survey was the reference survey. P values were generated based on comparisons with the telephone effect size. Data demonstrate that telephone scores were notably higher than those obtained via online/technology-based or self-administered surveys.
The meta-analysis was also performed on a group of studies that used several different PROMs and focused on a wide range of orthopaedic surgeries. Although scores were standardized for comparability in our study, future studies should focus on these mode of acquisition effects for each specific surgery and its respective PROM. Finally, the study design of the articles selected was not homogeneous, and thus, the statistical meta-analysis of the data was not standardized to a specific PROM or surgical procedure. In the future, large prospective studies that control for survey timing, mode of administration, and survey type can help to mitigate data inconsistencies and improve accuracy. However, this cannot be studied until documentation and reporting of collection methods improve.
Ultimately, this meta-analysis demonstrated differences in PROMs based on the mode of questionnaire administration in the field of orthopaedics. PROM scores obtained via telephone (71.7) are 8.9% higher than scores obtained online (65.3, P < 0.0001) and 13.8% higher than scores obtained via self-administered paper surveys (61.8, P < 0.0001). This is the first study that has quantified statistically notable differences between PROM scores based solely on the mode of acquisition across orthopaedic surgery. As PROMs continually become more important to research, clinical and surgical decision-making, and reimbursement, this study can be used to help researchers better understand the confounding effect of mode of acquisition and how to correct for it.