Online Consumer Surveys as a Methodology for Assessing the Quality of the United States Health Care System

Background: Interest in monitoring the quality of health care in the United States has increased in recent years. However, the policy objectives associated with collecting this information are constrained by the limited availability of timely and relevant data at a reasonable cost. Online data-collection technologies hold the promise of gathering data directly and inexpensively from large, representative samples of patients and consumers. These new information technologies also permit efficient, real-time assessment in such areas as health status, access to care, and other aspects of the care experience that impact health outcomes.
Objective: This study investigates the feasibility, validity, and generalizability of consumer online surveys to measure key aspects of health care quality in the United States.
Methods: Surveys about the health and health care experiences of a general adult population and of adults with diabetes were administered online and by telephone. The online survey drew from a sample frame of nearly 1 million consumers and used a single e-mail notification. The random-digit-dial methodology included 6 follow-up calls. Results from the online sample were compared to the telephone sample and to national benchmark data.
Results: Survey responses about quality of care collected using online and telephone methods were commensurate once they were weighted to represent the demographic distribution of the 2000 United States Census. Expected variations in health and health care quality across demographic and socioeconomic groups were largely observed, as were hypothesized associations among quality indicators and other variables. Fewer individuals were required to be contacted to achieve target sample sizes using online versus telephone methods. Neither method yielded representative cohorts of nonwhite individuals.
Conclusions: Conclusions about the level of and variations in health care quality in the United States are similar whether they are based on data collected in this study or on data collected using other telephone-based survey methods. As is typical for national telephone surveys conducted by the National Center for Health Statistics, stratified sampling and weighting of survey responses are necessary for results to be generalizable. Online methods are more appropriate for understanding health care quality than for conducting epidemiologic assessments of health in the United States.


Introduction
Recent years have seen a marked increase of interest in monitoring the quality of health care in the United States. Congress has mandated the annual release of a National Healthcare Quality Report, which will include results from consumer-reported surveys on health care quality [1]. Congress, a presidential commission, and the National Quality Forum have all called for publication of consumer-centered quality performance information, and the administrator of the Medicare program has indicated the government's intention of releasing performance data for nursing homes, hospitals, and perhaps even physicians [2,3]. State Medicaid and State Children's Health Insurance Programs (SCHIP) are required to assess and report on quality of care provided to consumers enrolled in these programs [4,5].

Need for Timely and Efficient Collection of Quality Information
These policy objectives are constrained by the limited availability of timely and relevant data at a reasonable cost. Often, information strategies for health care quality must rely on datasets defined and populated for other reasons, such as documentation of financial transactions, public health surveillance, or contractual oversight and audit. Seldom do such assessment systems address the health care quality concerns of patients and consumers, and rarely do they capture their experiences or attitudes.
Consequently, a tension exists between the policy objective of evaluating our success at creating a more-responsive health system that achieves priority health goals and our dependence on an information infrastructure unable to capture the necessary data to determine whether these goals are being achieved. Two trends offer some hope of resolving this tension. First, scientists have developed and validated an extensive library of patient survey instruments over the past 20 years. Tools now permit us to measure the performance of the health system along the dimensions of health care outcomes and the provision of clinically-appropriate, patient-centered care [6][7][8][9][10][11][12]. Second, new information technologies hold the promise of gathering data directly and inexpensively from large, representative samples of patients and consumers. Online data-collection technologies also permit efficient, real-time assessment in such areas as health status, access to care, and other aspects of the care experience that affect health outcomes [13][14]. Given the potential efficiencies and expediency of collecting data online, as well as growing limitations in telephone-based and/or mail-based surveys, it is clearly worthwhile, perhaps vital, that we develop and test online methods for capturing consumer-reported information on quality of health care.

The Challenges of Web-Based Patient Surveys
All modes of consumer-survey administration entail challenges of measurement error, nonresponse error, and, particularly, coverage error. Online methods may be helpful in reducing some of these sources of error, but may also encounter new challenges in other sources of error.

Measurement Error
Web-based surveys introduce a new mode of interaction with respondents. The online experience involves both technical and contextual changes that may cause variations from how the same individuals would answer questions if presented in person, on the telephone, or by mail. Among technical differences are the presentation of questions and responses on computer screens, and variations in browser layouts, colors, text, and communication speeds. Contextual factors include users' ability to review and change prior answers, look ahead to other content, "multi-task," or start and stop during a session. Studies evaluating Web-based survey-mode effects have generally shown them to more closely resemble self-administered mail surveys than interviewer-administered telephone surveys, though with lower item nonresponse [13,15] and the potential for immediate data analysis and feedback to sponsors and respondents [16,17]. To ensure consistent user experiences and reduce measurement error, a consensus set of procedural recommendations analogous to those for mailed and telephone administered surveys is emerging for conducting Web-based surveys [18][19][20].

Non-Response Error
Continuing changes in consumer telephone behavior have increased and redefined nonresponse error in phone surveys [21,22]. The common use of answering machines and of technologies for caller identification and unknown-caller blocking all contribute to nonresponse bias for telephone surveys. Consumer resistance to receiving telephone calls by telemarketers is reflected in the Do-Not-Call registry recently required by Congress and implemented by the Federal Trade Commission [23]. While survey researchers conducting surveys for not-for-profit or public-interest purposes are not prevented from calling individuals in this registry, the overarching resistance and resentment expressed by consumers regarding calls made to their home during evenings and weekends could generalize to a resistance to respond to calls to conduct these types of surveys.
Although researchers have begun to study the extent and causes of nonresponse error to e-mail and Web-based surveys, it is not well-documented and remains an especially-serious concern of methodologists when considering the use of the Web to conduct population-based surveys intended for use in policy contexts [21,23,24]. Documented reasons for nonresponse range from traditional questions of content interest to respondents' use of multiple e-mail accounts and defunct or infrequently-accessed e-mail accounts [21,25,26]. The emerging consensus procedures focus on the importance of repeat contacts, tracking nondelivery to e-mail accounts, and incentives to maximize response rates. In this paper, we report the extent to which data derived from a national online sample of the general adult population and from a sample of adults with diabetes meet initial criteria for use in characterizing the performance of health care systems.
Four research questions are preliminary to the overall feasibility and validity of using Web-based surveys to estimate health care quality: • Are online survey response rates (derived from a sampling frame recruited using opt-in Internet methods) of sufficient size and representation to estimate health care indicators for the US population?

Online Surveys
A market research firm, Common Knowledge, Inc, recruited a panel of approximately 1 million individuals, using Internet advertisements intended to attract a group with diverse demographic and psychographic characteristics. Approximately 70% of the panel was recruited online, the remaining 30% through traditional direct-mail and telephone contact. Panelists were invited to participate in only one study per month to prevent "professional" survey takers from responding and to minimize respondent fatigue.
Two waves of sampling and data collection took place for the general-adult and adult-diabetes online surveys. In the first wave, separate stratified random samples were drawn, each representing the US population along the dimensions of age, sex, and education using 4 age groups (18-24, 25-44, 45-64, over 65) and 4 educational groups (less than high school, high school/GED [General Equivalency Diploma], some college, college or more). A standard self-reported screening tool was used to identify individuals age 18 and over and those with diabetes. For the general adult survey, 13400 invitations were sent in the first wave of data collection. Diabetes-qualified respondents were screened as part of a larger effort to identify several chronic illnesses. Once a person qualified for one condition, that person was routed to complete the survey for persons with that condition until target sample sizes for each condition were achieved. As such, no sample-wide qualification rate for adult diabetes is available. A second wave of 1400 invitations oversampled individuals who had Spanish surnames or who lived in zip-code areas with disproportionate numbers of African-Americans and/or Hispanics. An online survey research firm, E-valuations, Inc, sent invitations and collected data for both waves, using the sampling design and surveys developed by the Foundation for Accountability and the Robert Wood Johnson Foundation. Each respondent was given a unique 5-digit access code to ensure that the survey was taken only once. Those who completed it were entered into a drawing for a $250 cash prize. No reminder e-mails were sent, nor were nonworking or dormant e-mail addresses tracked.
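The stratified draw described above can be illustrated with a minimal sketch. It is not the vendors' actual procedure: the function name, the record fields (`age_group`, `sex`, `education`), and the per-stratum targets are all hypothetical, and the sketch simply draws a fixed number of panelists at random from each age-by-sex-by-education cell.

```python
import random
from collections import defaultdict


def stratified_sample(panel, targets, seed=0):
    """Draw a simple stratified random sample from a panel.

    panel   -- list of dicts with 'age_group', 'sex', 'education' keys
               (hypothetical field names, not taken from the study)
    targets -- dict mapping (age_group, sex, education) to the number
               of invitations wanted from that stratum
    """
    rng = random.Random(seed)

    # Group panelists by stratum.
    strata = defaultdict(list)
    for person in panel:
        key = (person["age_group"], person["sex"], person["education"])
        strata[key].append(person)

    # Draw without replacement inside each stratum; a stratum cannot
    # supply more people than it contains.
    sample = []
    for key, wanted in targets.items():
        members = strata.get(key, [])
        sample.extend(rng.sample(members, min(wanted, len(members))))
    return sample
```

In practice the targets would be set so that each stratum's share of invitations matches its share of the reference population; a stratum that is thin in the panel (as the `min` guard shows) is exactly where underrepresentation can enter.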

Telephone Surveys
Adults age 18 and over constituted the sampling frame for the 2 telephone surveys. Wirthlin Associates, Inc identified individuals by means of traditional random telephone-survey methods, and used the sampling design and surveys developed by the Foundation for Accountability and the Robert Wood Johnson Foundation to conduct the surveys. Candidate telephone numbers were randomly selected and call attempts made until the target completed sample size of 400 for each survey was reached.

Measures
This study evaluates the Internet methodology for both the general-adult and adult-diabetes samples, using demographic variables and the following topics. Sources of survey items for each topic are provided in the reference associated with each of these topics:  [34].
We selected these variables based on the availability of external benchmarks and representation of a range of health and health care quality topics.
The psychometric reliability of the following survey scales constructed using several survey items was also assessed (these are the multi-item survey scales referred to below in the "Data Analysis" part of "Methods"). A reference for each multi-item survey scale is provided in the reference associated with each of these scales:
1. getting medical care quickly [37]
2. getting dental care quickly [37]
3. shared decision making (diabetes only) [38]
4. self-care education and support (diabetes only) [39].

Data Analysis
We calculated response rates for the online general adult survey as the ratio of the completed sample size to the number of e-mail invitations needed to achieve this sample. The response rate for the online adult-diabetes survey was the proportion of the people completing the survey who were positively identified as having diabetes. Neither rate accounts for nonworking or dormant e-mail addresses. Telephone response rates were the ratio of completed sample size to the number of randomly-selected, working residential phone numbers that had to be called to achieve this sample size.
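As a worked illustration of these definitions, the sketch below recomputes response rates from the counts the study reports in its results (13400 invitations and 2324 completes online; about 4300 working residential numbers and 400 completes by telephone). The function name and the `undeliverable_share` parameter are hypothetical; the 10% to 15% range for nonworking or dormant e-mail addresses is the authors' stated industry-norm estimate, not a measured quantity.

```python
def response_rate(completes, contacts, undeliverable_share=0.0):
    """Completed surveys divided by contacts that could have been reached.

    undeliverable_share is the assumed fraction of contacts that never
    reached a person (for example, dormant e-mail addresses).
    """
    reachable = contacts * (1.0 - undeliverable_share)
    return completes / reachable


# Online general adult survey: raw rate, then adjusted for an assumed
# 10% and 15% of nonworking or dormant e-mail addresses.
online_raw = response_rate(2324, 13400)
online_adj_low = response_rate(2324, 13400, undeliverable_share=0.10)
online_adj_high = response_rate(2324, 13400, undeliverable_share=0.15)

# Telephone general adult survey: the denominator already excludes
# nonworking and nonresidential numbers.
phone = response_rate(400, 4300)

for label, rate in [
    ("online raw", online_raw),
    ("online adjusted (10%)", online_adj_low),
    ("online adjusted (15%)", online_adj_high),
    ("telephone", phone),
]:
    print(f"{label}: {rate:.1%}")
```

Under these assumptions the online rate moves from roughly 17% raw to roughly 19% to 20% adjusted, against roughly 9% for telephone.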
Survey responses for adults with diabetes were weighted using diabetes-specific age and sex distributions from the 1999 Behavioral Risk Factor Surveillance Survey [34]. General adult survey responses were weighted for age, sex, educational level, and presence of a chronic condition using distributions from the 2000 National Health Interview Survey (Robert Wood Johnson Foundation, oral communication, in person and by telephone, 2000). These distributions were used in lieu of those available from the US Bureau of the Census through the Current Population Survey (CPS) because chronic-condition status was not available from the Current Population Survey [40].
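The weighting step can be sketched as simple cell-based post-stratification, in which each respondent's weight is the cell's share of the reference population divided by its share of the completed sample. This is an assumption about the general technique, not a description of the exact weighting scheme used in the study; the function names and cell labels are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per cell: population share / sample share.

    sample_cells      -- list with one cell label per respondent
    population_shares -- dict mapping cell label to its share of the
                         reference population (shares sum to 1)
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    weights = {}
    for cell, pop_share in population_shares.items():
        if counts[cell] == 0:
            # A cell with no respondents cannot be weighted up.
            continue
        weights[cell] = pop_share / (counts[cell] / n)
    return weights


def weighted_proportion(values, cells, weights):
    """Weighted mean of 0/1 values, using each respondent's cell weight."""
    numerator = sum(weights[c] * v for v, c in zip(values, cells))
    denominator = sum(weights[c] for c in cells)
    return numerator / denominator
```

For example, if a cell makes up half of the population but only 20% of respondents, each of its respondents counts 2.5 times in weighted estimates.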
We compared weighted results from the online and telephone surveys for variables listed above to available benchmarks using either the 1999 Behavioral Risk Factor Surveillance Survey or the 1998 National Health Interview Survey. For online, telephone, and benchmarking dataset samples, we used regression analysis methods to evaluate patterns of variation across population subgroups for selected health and health care quality variables. Dependent variables for the general adult sample included health insurance status, having a regular doctor or nurse, physician counseling to quit smoking (for smokers), and poor health days in the last month. Dependent variables for the adult-diabetes sample included receipt of a routine retinal exam, use of health care services, smoking behavior, and poor health days in the last month. Independent variables were age, sex, race, education, and income, plus health insurance status and having a regular doctor or nurse, except where health insurance or regular doctor or nurse was used as a dependent variable. We compared results across samples in terms of the overall explanatory value of independent variables using the Cox and Snell generalized coefficient of determination [41]. The direction, general magnitude, and significance of the effect of each explanatory variable were also compared across samples for each dependent variable.
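For readers unfamiliar with the Cox and Snell coefficient, its standard definition is 1 - (L0 / LM)^(2/n), where L0 is the likelihood of the intercept-only model, LM the likelihood of the fitted model, and n the sample size. The sketch below computes it from log-likelihoods; the function name and the example values are hypothetical and are not taken from the study's models.

```python
import math


def cox_snell_r2(loglik_null, loglik_model, n):
    """Cox and Snell generalized coefficient of determination.

    loglik_null  -- log-likelihood of the intercept-only model
    loglik_model -- log-likelihood of the fitted model
    n            -- number of observations
    """
    # (L0 / LM) ** (2 / n) written in terms of log-likelihoods.
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)


# Hypothetical example: 100 observations, log-likelihood improves
# from -100 to -90 once the independent variables are added.
example = cox_snell_r2(-100.0, -90.0, 100)
```

Because the coefficient depends only on the two log-likelihoods and the sample size, it can be compared across the online, telephone, and benchmark samples even when their sizes differ, although its maximum is below 1 for binary outcomes.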
Each of the 4 multi-item survey scales (see Measures, in Methods, above) was evaluated for psychometric reliability using standardized estimates of Cronbach alpha [41]. SPSS version 9.0 was used to conduct data analysis [42].
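The standardized form of Cronbach alpha depends only on the number of items k and the mean inter-item correlation r: alpha = k r / (1 + (k - 1) r). The following sketch shows that calculation in plain Python rather than SPSS; the function names and the toy item scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_alpha(items):
    """Standardized Cronbach alpha for a multi-item scale.

    items -- list of k lists, one per survey item, each holding one
             score per respondent (same respondent order throughout)
    """
    k = len(items)
    pairs = list(combinations(items, 2))
    # Mean of the off-diagonal inter-item correlations.
    r_bar = sum(pearson(x, y) for x, y in pairs) / len(pairs)
    return k * r_bar / (1 + (k - 1) * r_bar)
```

With two items correlated at 0.8, for instance, the formula gives 1.6 / 1.8, or about .89; adding more items at the same average correlation raises alpha further.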

Response Rates, Response Bias, and Representativeness
Of the approximately 13400 e-mail invitations sent for the online general adult population survey, 2324 individuals responded and completed at least 80% of the survey, resulting in a 17.3% raw response rate. Based on industry norms, we estimate that at least 10% to 15% of e-mail addresses are nonworking or dormant. Assuming this, the true response rate for the online general adult survey is 19% to 20%. For the general adult population telephone survey, approximately 4300 working, residential phone numbers had to be dialed to achieve the target sample size of 400. This resulted in an estimated 9.3% response rate after adjusting for nonworking and nonresidential phone numbers. Completed survey samples for the online and telephone adult-diabetes surveys were 1048 and 397, respectively.
Table 1 and Table 2 summarize the demographic characteristics of the unweighted and weighted online and telephone survey samples. Overall, respondents to the online general adult survey match the distribution of the sampled population, with some underrepresentation of individuals age 18 to 24 and overrepresentation of individuals age 45 to 64 and individuals reporting more than a high school education. Both the unweighted online and telephone general-adult completed survey samples underrepresent nonwhite individuals, those with less than a high school education, and those with incomes over $75000. Compared to the Current Population Survey, both general adult samples overrepresent individuals with a college education (or more) and incomes of $15000 to $35000. The telephone general-adult sample was more likely to underrepresent those with less than a high school education and overrepresent those with a college education. Similar results were found in both adult-diabetes samples (Table 2).
However, while the telephone diabetes survey sample dramatically underrepresented individuals under age 44, and overrepresented those over age 65 and with incomes over $75000, this was not the case for the online adult-diabetes survey sample. Neither the online nor telephone methods resulted in samples properly representing racial groups with diabetes.
telephone data are both weighted to the same NHIS data, slight differences in the distributions occur because the cell for males 18-24 was 0 for the telephone sample, making it impossible to create a weight for that group.
† Weighted to US population; unweighted N = 9496.
‡ Some differences between characteristics of the population of people with diabetes using the BRFSS and both the online and telephone responding populations in this study were observed at the .05 level of significance.
Table 3 and Table 4 compare results from both the general-adult and adult-diabetes online surveys to those obtained from the telephone surveys and benchmark data reported in other national studies. For the general adult population, the weighted online-survey results are not significantly different from those derived from the Behavioral Risk Factor Surveillance Survey and the National Health Interview Survey on 7 of the 12 health status, access to care, utilization of care, and clinically-appropriate health and health care quality indicators, including: (1) presence of health insurance, (2) having a regular doctor or nurse, and (3) receipt of advice to quit smoking for smokers. For the sample of persons with diabetes, results from the online survey were not significantly different from the BRFSS or NHIS benchmarks on 7 of the 13 indicators used, including (1) self-assessed health status, (2) presence of health insurance, (3) having a routine checkup, (4) getting a retinal eye exam at least once in the last year, (5) receipt of advice to quit smoking for smokers, and (6) routine retinal exams for diabetics.

Comparison to Other National Studies
In the general-adult population survey, we observed higher proportions of individuals reporting 7 or more poor health days, fair or poor health status, and smoking.
In addition to comparing point estimates produced by this online survey to those produced by national benchmark datasets, we also evaluated how these datasets compare in terms of identifying variations and disparities in the health and health care quality across demographic subgroups as well as according to characteristics such as health insurance status and presence of a regular doctor. Table 5 and Table 6 present results from logistic regression analyses conducted to evaluate patterns of variation observed using data collected online versus data collected by telephone and versus telephone-based national benchmark datasets (BRFSS and NHIS).
The independent variables included in this analysis had similar explanatory power for dependent variables from the general adult survey whether data were collected using online or telephone methods. Specifically, at the low end, the demographic and health care related independent variables explained 5% to 13% of the variation observed in reports of days lost because of poor health for the national dataset sample (5%), the telephone sample (13%), and the general adult online sample (9%), respectively. At the high end, these variables explained 25% to 34% of variation observed in the presence of health insurance across all datasets. For the adult-diabetes samples, on the low end, the independent variables used here accounted for less than 5% of the variation observed in rates of high utilization of health care. On the high end, these variables accounted for 11% to 17% of variation observed in rates of smoking for all 3 adult-diabetes samples compared.
Along with the overall explanatory value of independent variables, we observed consistency across the general adult population datasets in terms of the approximate magnitude and significance of effect of specific independent variables. Having a regular doctor and income showed the most consistent and statistically significant effects (P < .05). Age and educational level, meanwhile, were the most consistently significant for the dependent variables evaluated for the adult-diabetes samples. No instances were found in which a variable was significant in one sample and also significant in the opposite direction in another. We did find cases of a variable being significant in one sample, but not in another. In most cases, this is attributed to chance or smaller sample size.
* BRFSS sample size is small because the question regarding having a regular doctor is asked only of a subset of subjects.
† Education was grouped into high school or less, versus some college or more.
‡ P < .05. || P < .01. § P < .001.

Scale Reliability
Cronbach alpha internal consistency scores were .72 or above for each of the 4 multi-item scales observed here (.72-.95), demonstrating their psychometric reliability when online administration is used (Table 7).

Discussion
This study found evidence that online health care surveys originally designed for mail or telephone administration maintained both psychometric reliability and concurrent validity in results across demographic and other subgroups. More specifically, estimates of access to care, utilization of care, application of clinically appropriate care, and consumer experiences of care were similar to those derived from more traditional methods of obtaining representative samples of the US population.
We were able to achieve a sample representative of the US population in terms of age, sex, and education using a readily-available, opt-in sampling frame that employs relatively low-cost recruitment methods. Basic statistical weighting methods further aligned the responding population sample on these variables. Prior information on the affiliation of individuals included in the Web panel prevented stratified sampling based on race or income. Consequently, we cannot determine whether differences between our completed survey sample and the US population in the proportion of persons representing each racial and income group are due to response biases or inadequate representation in the original sampling frame for this study.
Since a great deal of concern focuses on health care for lower-income individuals, it is important to keep in mind that this group was, in fact, overrepresented when compared to the US population.
Given the importance of equitably representing the range of racial and economic groups, Web-based panels used for public information about health care quality should strive to include these variables so that stratified sampling may occur and/or assessments of response bias can take place. Here, oversampling methods often used in other national studies were successful in attenuating potential biases in results caused by lower rates of representation among nonwhite racial groups.
Response rates for Internet-based, telephone, or mailed surveys must be calculated in comparable ways and take into account differences in follow-up steps with nonrespondents. In this study, while analogous administration steps were used for both the online and telephone surveys, a more-robust follow-up strategy was used for the telephone survey (6 follow-up calls for the telephone survey and no follow-up steps for the online survey). In spite of this, the online response rate was higher than for telephone when comparable calculations were used. This finding is true even when nonworking and nonresidential numbers are removed from the telephone sample and similarly nonworking or dormant e-mail addresses are not removed from the online sample. Given the unique sampling and administration processes employed for both surveys, these findings may not be observed in cases where relatively-simple online methods are compared to more-complex and more-costly sampling and administration methods typical of national studies such as the National Health Interview Survey and the National Medical Expenditures Panel Survey. An important question to examine further is whether such extensive follow-up methods are required to generate public information about health care quality and whether Internet-based methods outlined here may be suitable, especially as Web access continues to expand for all population groups.
Overall, findings from this study demonstrate that many of the sampling and survey administration challenges inherent in telephone and mail modes of data collection are also present for Internet-based methods. In turn, the survey administration, statistical sampling, and weighting approaches used to ensure that data collected via telephone or through mailed surveys yield adequate and representative samples, are also required for data collected via the Internet.
Internet-based data collection is appealing in its potential for allowing information to be collected in a timely and efficient manner. These efficiencies are eroded, however, if costly strategies are required to recruit panels from which sampling may occur and/or when the survey administration process includes extensive nonresponder follow-up and tracking steps.
The methods used in this study were selected to be low burden in terms of the sampling frame and administration. This was done in order to begin to explore whether the benefit of obtaining data in a timely and potentially-interactive manner using the Internet can be achieved without incurring costs that diminish the value of doing so when compared to traditional telephone methods used by most nationally-recognized studies.
As these and other issues regarding the use of the Internet to conduct health and health care quality surveys are evaluated, it is worth recalling that our comfort with telephone surveys dates only from the late 1970s, when relatively-sophisticated methodologies were established involving random-digit dialing and multiple contact strategies [24]. In fact, the rise of the telephone survey in the late 1960s and early 1970s was attended by methodological concerns similar to those now associated with Web surveys, and these took a decade of research and refinement to resolve. In recent years, the growing use of unlisted numbers, cell phones, call waiting, caller identification, and answering machines has induced a steady decline in response rates and growing disparities in the populations willing to be contacted by telephone. For example, Gallagher et al found that only an elaborate and expensive combination of mail, phone, and door-to-door solicitations produced a respondent pool fully representative of the low-income community [44]. As a result, Dillman has argued that only self-administered surveys, whether made available by mail, interactive voice response, or the Internet, are likely to be successful in the coming years [19].
Results of these analyses suggest that weighted online sampling offers an imperfect but promising avenue for collecting large-scale representative survey data. Overall, conclusions about the level and variations in health care quality in the United States are similar whether based on data collected online or data collected using more elaborate and costly survey methods.
All forms of survey-based data collection involve certain sampling and mode effect biases. Tradeoffs in the biases entailed in online versus telephone based surveys need to be carefully considered by policymakers. As Internet access increases along with the propensity for individuals to resist telephone solicitations, online survey methods may increasingly represent