A Comparison of Ordered Categorical versus Discrete Choices within a Stated Preference Survey of Whole-Blood Donors

There are different stated preference (SP) approaches, including discrete choice experiments (DCEs). DCEs are a popular SP approach, but in some settings, alternative ways of framing survey questions may be more appropriate. The Health Economic Modelling Of Alternative Blood Donation Strategies (HEMO) study required choice tasks to be framed so that the study could estimate the effect of attribute levels on the frequency of a behavior, in this case blood donation. SP questions were formulated to require ordered categorical responses from a single profile of attribute levels. However, it is unknown whether this way of framing SP questions leads to estimates of marginal rates of substitution (MRS) that are different from traditional DCE choices between 2 alternative profiles. The aim of this article is to compare estimates of relative preferences from SP questions requiring ordered categorical versus discrete choice responses. We compared relative preferences elicited from the 2 approaches for a common set of attributes and levels, formulated as choice tasks for 8,933 whole-blood donors. We found that the 2 forms of survey questions provided similar MRS estimates. For example, respondents were willing to trade off only a small increase in travel time to receive a health report, irrespective of whether the choice given was binary (DCE response; approximately 3 min) or from an ordered category (about 8 min). The finding that any differences in the estimated MRSs are not of substantive importance offers some reassurance for policy makers in that estimates of relative preference may be robust to alternative ways of framing the survey questions. These findings can encourage future studies to frame choice tasks that align with the study's objective.

Highlights
This article compares the relative preferences from stated preference (SP) questions requiring ordered categorical versus discrete choice responses.
The approaches were contrasted for blood donation service characteristics that offer opportunities to donate blood.
The estimates of relative preferences for alternative blood donation service characteristics were similar between the 2 forms of SP approach.
This study illustrates how SP survey questions can be formulated to provide responses on an ordered categorical scale and to estimate marginal rates of substitution between different attributes, which can be compared with those derived from discrete choice experiment (DCE) choices.
The article highlights the potential value of considering alternative choice framings rather than relying solely on DCEs.


Keywords: blood donation, discrete choice experiments, stated preferences
Date received: May 30, 2022; accepted: November 11, 2022

Stated preference (SP) studies are used to value nonmarket commodities and predict the effect of future policy changes. [1][2][3] In health economics, there has been a rapid uptake of discrete choice experiments (DCEs), which are a particular form of SP design. 4 In DCEs, respondents are required to state choices between 2 or more discrete alternatives in which at least 1 attribute level of each alternative is systematically varied across choice sets to provide the information required to infer the preference parameters of an indirect utility function. 5 While the popularity of DCEs in health continues to increase, they may not be the most appropriate SP method in all settings. Selecting an SP approach requires consideration of the study's objective and how that can be met by choosing an approach to framing the valuation task that responders find both realistic and understandable. 5

An example of a setting in which an alternative SP approach may be more appropriate than a DCE is when the study is required to estimate the effect of attribute levels on the intended frequency of a behavior. The context considered in this article is that of whole-blood donation. The objective was to estimate how frequently donors would donate according to attributes and levels defined to reflect alternative future changes to the blood donation service. 6 In this context, the frequency of donation could be included as its own attribute within a DCE. However, initial discussions with potential responders (existing blood donors) suggested that, as blood donation is voluntary, framing blood donation frequency as an attribute would lead to unrealistic choices, resulting in poor engagement with the task and a high risk of misleading responses.
Instead, the study developed SP choice tasks, which took the form of direct (matching) questions using the "payment card" approach, with the responder required to select their preferred "donation frequency" from a set of alternative ordered categories. 5 Here, a series of single profiles are created based on a set of attributes and levels, with donors asked how frequently (e.g., twice a year, once a year, probably would not donate) they would donate according to the opportunities to donate. Frequency of donation estimated using this ordered categorical approach accurately predicted the actual donation frequency of the same donors. 7 These predicted donation frequencies were combined with cost to evaluate the cost-effectiveness of options for changes to the blood service. 6

These alternative forms of SP questions warrant careful consideration, as in some health economic studies, such as those exploring the frequency of a behavior, this framing may be more suitable. However, there is little empirical work supporting the use of this form of SP question, in comparison with the considerable literature on DCEs. [8][9][10] It is unknown whether the ordered categorical approach would provide similar estimates of relative preference to a DCE: the ordered categorical approach asks responders to state a preferred frequency according to a single profile, whereas a DCE asks the responder to make a distinct choice between profiles. While there is a growing literature that suggests different SP methods can produce different results, 11,12 to our knowledge, no study has compared a DCE with an ordered categorical approach.
The aim of this article is to compare estimates of relative preferences from SP questions requiring ordered categorical versus discrete choice responses. To enable the comparison of preferences elicited between the 2 approaches, a common set of attributes and levels was formulated as choice tasks for the same sample (N = 8,933). The article proceeds as follows: the next section introduces the empirical example and provides a conceptual overview to distinguish the 2 ways of formulating the survey questions, the experimental design, sampling, and analytical methods; the following section presents the results; and the fourth section discusses the findings and outlines areas for further research.

The Study
Eligible participants from the INTERVAL study (14,725 males and 14,006 females), a multicenter randomized controlled trial of alternative interdonation intervals, 13 were invited via an email from NHS Blood and Transplant (NHSBT) to participate in a web survey. After consenting to participate, donors were asked to provide information about the travel time for their last visit to donate whole blood before being asked to complete the 2 sets of SP questions. Donors' baseline characteristics (gender, age, ethnicity, and blood type) and donation history (new donor or not, recruitment source, and the number of donations for the year prior to randomization) were extracted from the NHSBT donor database. Ethical approval was granted by NHS (reference 16/YH/0023) and LSHTM (reference 10384) Research Ethics Committees.
Attributes and levels were based on the previous study that used a literature review, input from policy makers at NHSBT, and qualitative research with blood donors. 6 Five attributes were identified as pertaining to policy-relevant strategies for NHSBT: donor's travel time, provision of a general health report (which might improve the experience of a donation visit), blood collection venue opening time, appointment availability, and the maximum number of annual donations (see Table 1). The maximum permitted donation frequency attribute is included to understand how donors might respond to increases in the maximum annual number of donations permitted. The annual limit may mean that some donors currently donate less frequently than they would like. The levels on this attribute differed for males (4 to 6) and females (3 to 4) because of gender differences in the minimum allowable time interval between donations (Table 1).
Female participants first answered ordered categorical questions with a single profile made up of 5 attributes and 5 response options (not donate to 4 times per year), and male participants first answered ordered categorical questions with 5 attributes and 7 response options (not donate to 6 times per year; see Figure 1a for an example ordered categorical question). Female and male participants then answered DCE questions with alternative profiles made up of 5 attributes, simply choosing which of the 2 profiles they preferred (see Figure 1b for an example DCE question).
An opt-out option was initially considered in the DCE but removed because the first Health Economics Modelling of Blood Donation Study (HEMO) survey reported that this option was chosen by less than 2% of respondents 6 and the policy maker, NHSBT, did not authorize its inclusion. The order of the 2 SP tasks was not randomized; the ordered categorical questions were presented first to reduce the risk that donors might not complete all of the categorical response questions required for the main policy evaluation. The survey questions were pretested and piloted prior to the main survey data collection.

Experimental Design
Both the DCE and ordered categorical response tasks were generated with a D-efficient design created using Ngene 1.1.2 (Choice Metrics Pty Ltd, Sydney, Australia). 14 The design included both main and interaction effects. D-efficiency is the most commonly used approach for generating efficient experimental designs for choice experiments. 15 The D-efficiency criterion satisfies the 4 principles of efficient design of a choice experiment: level balance, orthogonality, minimal overlap, and utility balance. A design that satisfies all of these principles leads to maximum D-efficiency. 16 DCE response choice tasks used a generic A versus B design, and the ordered categorical aspect of the design was generated by including a utility function devoid of any attributes. The extra level in one of the attributes (maximum number of donations) for males rather than females meant that the number of choice sets differed by gender (36 for females and 72 for males). The choice sets were randomly allocated to respondents using a blocked design in Ngene, with each respondent asked to answer 7 choice tasks.
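To make the D-efficiency criterion more concrete, the following is a minimal sketch of how a D-error could be computed for a small generic two-alternative design under an MNL model. It is not the Ngene algorithm and not the design used in the study: the function names, the effects-coded (+1/-1) attribute levels, the zero prior coefficients, and the two toy designs are all assumptions introduced purely for illustration. A lower D-error corresponds to a more D-efficient design; the second toy design has complete overlap on one attribute, so that attribute's coefficient cannot be identified.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities for one choice set."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def information_matrix(design, beta):
    """Fisher information of an MNL model for a design (list of choice sets)."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for choice_set in design:
        utils = [sum(b * x for b, x in zip(beta, alt)) for alt in choice_set]
        probs = mnl_probs(utils)
        mean = [sum(p * alt[i] for p, alt in zip(probs, choice_set)) for i in range(k)]
        for p, alt in zip(probs, choice_set):
            dev = [alt[i] - mean[i] for i in range(k)]
            for a in range(k):
                for b in range(k):
                    info[a][b] += p * dev[a] * dev[b]
    return info


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def d_error(design, beta):
    """D-error = det(information matrix)^(-1/K); infinite if not identified."""
    det = determinant(information_matrix(design, beta))
    if det <= 0:
        return math.inf
    return det ** (-1.0 / len(beta))


# Hypothetical designs: each choice set holds 2 alternatives with 2 attributes.
design_no_overlap = [((1, 1), (-1, -1)), ((1, -1), (-1, 1)),
                     ((-1, 1), (1, -1)), ((-1, -1), (1, 1))]
design_overlap = [((1, 1), (-1, 1)), ((-1, 1), (1, 1)),
                  ((1, -1), (-1, -1)), ((-1, -1), (1, -1))]

print(d_error(design_no_overlap, [0.0, 0.0]))  # 0.25
print(d_error(design_overlap, [0.0, 0.0]))     # inf
```

Design software searches over many candidate designs for one with a low D-error given prior coefficient values; this sketch only evaluates the criterion for two fixed designs.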

Analysis
Modeling framework. Blood donors expressing a preference for one opportunity to donate over another (the DCE task) or deciding on how often to donate (SP-ordered categorical task) are assumed to make utility-maximizing choices given their preferences and the constraints they face. Theoretical foundations supporting the modeling approaches for DCE and SP-ordered categorical responses are included in the supplement (Supplementary 1). One way of summarizing donors' preferences with respect to different attributes of the opportunities to donate is to estimate the marginal rate of substitution (MRS) between any pair of attributes. 17,18 For DCEs, this information is derived from the coefficients of the different attributes estimated by modeling the choice of one option over another using the differences in the levels of the attributes. In the SP-ordered categorical case, the coefficients used to estimate the MRS are obtained by modeling the preferred frequency of donation as a function of the attribute levels that characterize a particular opportunity to donate. As the 2 approaches, of setting a task to indicate preferences or requiring the respondent to state donation frequency, are drawing from the same utility function, it could be expected that they would provide similar estimates of relative preference, according to, for example, the willingness to travel longer to attend a donor session in which a health report is provided. Alternatively, as the questions are framed in 2 different ways, it is conceivable that even if drawing from the same utility function, they could provide substantively different estimates of relative preference.
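As a worked illustration of the MRS described above, the following sketch assumes a linear, additive indirect utility function. The notation (V for utility, T for travel time in minutes, R for an indicator of a health report, and the coefficients beta_T and beta_R) is introduced here for illustration only and is not taken from the article or its supplement.

```latex
% Assumed linear-additive utility (illustrative notation, not the article's):
V = \beta_T\, T + \beta_R\, R + \dots, \qquad \beta_T < 0,\quad \beta_R > 0

% Holding utility constant, \beta_T\, dT + \beta_R\, dR = 0, so
\mathrm{MRS}_{R,T} \;=\; \left.\frac{dT}{dR}\right|_{dV = 0} \;=\; -\frac{\beta_R}{\beta_T}

% Hypothetical values: \beta_T = -0.05 per minute and \beta_R = 0.25 give
\mathrm{MRS}_{R,T} = -\frac{0.25}{-0.05} = 5 \text{ minutes}
```

Under these assumptions, the ratio is read as the additional travel time a donor would accept in exchange for the health report. Because the MRS is a ratio of coefficients from the same model, any common scaling of the coefficients cancels, which is what allows estimates from the DCE and ordered categorical models to be placed on the same metric (minutes of travel time).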
Modeling approach. We considered several regression models recommended for the analysis of SP data. The first approach (model 1) specified a multinomial logit (MNL) for the DCE response and the generalized ordered-logit for the ordered categorical response. 20,22 The MNL model assumed error terms are independent and identically distributed, which implies homogeneous preferences across individuals. The generalized ordered-logit recognized the natural ordering in the donation frequency response variable (3 times per year, 2 times per year, etc.), to reflect the strength of preferences for a particular service configuration. The generalized ordered-logit model 23 allowed exogenous variables to affect the threshold parameters, thus relaxing the restrictive assumption of the traditional ordered discrete model, in which the effects of explanatory variables are restricted to be the same across dichotomized levels of the outcome variable.
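The distinction drawn above between the traditional and the generalized ordered-logit can be sketched numerically. The example below is illustrative only: the function names, cutpoints, and coefficient values are hypothetical, and the parameterization (cumulative probability as a logistic function of cutpoint minus linear index) is one common convention that may differ from the software implementation used in the study. In the traditional model a single coefficient vector applies at every threshold; in the generalized model each threshold may have its own coefficient vector.

```python
import math


def logistic(z):
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cumulative):
    """Turn cumulative probabilities P(y <= k) into per-category probabilities."""
    cum = list(cumulative) + [1.0]
    return [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]


def ordered_logit_probs(x, beta, cutpoints):
    """Traditional ordered logit: one coefficient vector for all thresholds."""
    xb = sum(b * v for b, v in zip(beta, x))
    return category_probs(logistic(c - xb) for c in cutpoints)


def generalized_ordered_logit_probs(x, betas_by_threshold, cutpoints):
    """Generalized ordered logit: a separate coefficient vector per threshold.

    With unrestricted coefficients the implied cumulative probabilities are
    not guaranteed to be increasing, so negative category probabilities are
    possible for some covariate values.
    """
    cumulative = []
    for beta_k, c in zip(betas_by_threshold, cutpoints):
        xb = sum(b * v for b, v in zip(beta_k, x))
        cumulative.append(logistic(c - xb))
    return category_probs(cumulative)


# Hypothetical profile: travel time (in units of 10 min) and a health report.
profile = [2.0, 1.0]
cutpoints = [-1.0, 0.5, 2.0]           # 3 thresholds -> 4 ordered categories
common_beta = [-0.3, 0.4]              # same effects at every threshold
varying_betas = [[-0.3, 0.4], [-0.2, 0.5], [-0.1, 0.6]]

print(ordered_logit_probs(profile, common_beta, cutpoints))
print(generalized_ordered_logit_probs(profile, varying_betas, cutpoints))
```

When every threshold is given the same coefficient vector, the generalized model reproduces the traditional ordered logit, which is why the latter can be viewed as a restricted special case.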
The second approach (model 2) applied the MNL model to the SP-categorical response data and assumed that the response variable was nominal (no ordering). This model allowed parameters to differ across the response categories without imposing any restriction and offered greater flexibility and explanatory power, albeit at the expense of requiring more parameters to be estimated. Models 1 and 2 reported robust standard errors to recognize the correlation of individual-level responses.
We then extended the MNL for both responses to include random intercepts that allowed the parameter estimation to recognize the correlation in each individual's responses across choice sets (model 3). To consider the potential impact of scale heterogeneity (model 4), we also considered a generalized multinomial logit (G-MNL), which allowed for the random error component to differ across individuals. 24 We applied the G-MNL to the DCE data. The G-MNL could not be applied to the ordered categorical response data, and so we applied an ordinal generalized linear model (GLM) that also allowed for scale heterogeneity. 22,25

We considered the implications of imposing a forced choice for the DCE responses versus including an opt-out option for the ordered categorical responses. To do so, we reapplied model 3, which was judged the most appropriate across the 2 forms of response, after excluding respondents who chose the "I would probably not donate" option for all choice scenarios requiring the ordered response. While the main analyses assumed that the survey respondents were a random sample of the target population, we undertook sensitivity analyses, whereby the responses to both forms of survey question were weighted according to the characteristics of the target population of interest, defined by those of blood donors from the NHSBT register who were eligible because they had donated within the previous 12 mo. All analyses were conducted using Stata SE version 16.0 (StataCorp LLC, College Station, TX, USA).
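The reweighting step in the sensitivity analysis can be illustrated with a simple cell-based post-stratification sketch. The article does not specify how the weights were constructed, so this is only one common approach, shown under stated assumptions: the function name, the single stratifying variable, and the population shares are hypothetical and are not values from the study.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight each respondent by (population share) / (sample share) of their stratum.

    sample_strata: list with one stratum label per respondent.
    population_shares: dict mapping stratum label to its target-population share.
    """
    counts = Counter(sample_strata)
    n = len(sample_strata)
    weights = []
    for stratum in sample_strata:
        sample_share = counts[stratum] / n
        weights.append(population_shares[stratum] / sample_share)
    return weights


# Hypothetical example: older donors are over-represented among respondents.
respondent_age_group = ["over_60", "over_60", "over_60", "60_or_under"]
target_shares = {"over_60": 0.5, "60_or_under": 0.5}

weights = poststratification_weights(respondent_age_group, target_shares)
print(weights)
```

Respondents from over-represented strata receive weights below 1 and those from under-represented strata receive weights above 1; the weights sum to the sample size, so the weighted sample keeps its original size while matching the target shares. Such weights would then be supplied to the regression models as sampling weights.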

Results
The SP survey was completed by 4,179 females (29.8% response rate) and 4,754 males (32.3% response rate). Table 2 reports the characteristics of the analysis sample, those invited to complete the survey, and of eligible donors from the NHSBT register. Overall, in the analysis sample, 41% of men and 39% of women were in the age group 41-60 y, 14% of men and women were categorized as having high demand blood types, >90% of participants were self-defined as of White ethnic origin, and 90% of men and 85% of women had donated more than once in the past 12 mo. The ethnicity and blood type of the study sample were similar to all donors invited to complete the survey and those of the target population. A higher proportion of those who completed the surveys were older than 60 y and donated more frequently in the previous 12 mo compared with all of those surveyed and the NHSBT eligible donors (Table 2).
Within the SP-ordered categorical response questions, a small proportion of participants (1.3% male and 1.1% female) always chose the opt-out, ''I would probably not donate,'' irrespective of the attribute levels. Within the DCE choice tasks requiring a binary response, there were no respondents whose choices were consistent with having lexicographic preferences. Lexicographic preferences were assessed by checking whether the level of a particular attribute always appeared to determine their choice. All of the SP questions were completed by 98.8% of male and 98.9% of female respondents for the questions requiring an ordered categorical response and 93.6% of male and 98.7% of female respondents for the DCE tasks.
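The check for lexicographic preferences described above can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the function name, the data layout (each task as a pair of profiles plus the chosen label), and the example levels are hypothetical. An attribute is flagged for a respondent if, in every task where the 2 profiles differ on that attribute, the chosen profile always has the higher level or always has the lower level.

```python
def lexicographic_candidates(tasks):
    """Return attributes whose level alone is consistent with every choice.

    tasks: list of (profile_a, profile_b, chosen) tuples for one respondent,
    where each profile maps attribute name to a numeric level and chosen is
    "A" or "B". Tasks in which the attribute has the same level in both
    profiles are uninformative for that attribute and are skipped.
    """
    flagged = []
    for attr in tasks[0][0]:
        directions = set()
        for profile_a, profile_b, chosen in tasks:
            a, b = profile_a[attr], profile_b[attr]
            if a == b:
                continue
            chosen_level, other_level = (a, b) if chosen == "A" else (b, a)
            directions.add("higher" if chosen_level > other_level else "lower")
        if len(directions) == 1:
            flagged.append(attr)
    return flagged


# Hypothetical respondent who always picks the profile with less travel time.
example_tasks = [
    ({"travel_min": 10, "health_report": 1}, {"travel_min": 30, "health_report": 0}, "A"),
    ({"travel_min": 40, "health_report": 1}, {"travel_min": 20, "health_report": 0}, "B"),
    ({"travel_min": 15, "health_report": 0}, {"travel_min": 15, "health_report": 1}, "B"),
]

print(lexicographic_candidates(example_tasks))  # ['travel_min']
```

With only a handful of tasks per respondent, a pattern like this can also arise by chance or from genuinely strong preferences, so a flag from such a check is best treated as a prompt for closer inspection rather than as proof of nontrading behavior.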
For the SP-ordered categorical responses, the MNL (model 2) fitted the data better than the ordered logit (model 1) for both genders (Supplementary Tables S2 and S4). For the ordered categorical responses, the inclusion of random effects was helpful in recognizing unexplained variation in responses across individuals, and therefore model 3 fitted best for both genders (Supplementary Tables S2 and S4). For the DCE response, there was unexplained variation at the level of the responder for females (r = 0.036) but not for males (r = 3.22 × 10^-6), and so the inclusion of random effects improved model fit for women but not for men. The consideration of scale heterogeneity led to an improvement in model fit only in the response to the DCE questions for males (model 4 v. 3, Supplementary Table S1). For the ordered categorical responses, the ordinal GLM that considered scale heterogeneity led to worse fit for both genders (model 4 v. other models, Supplementary Tables S1 and S2).
The estimated coefficients from all the regression models were in line with prior expectations (see Appendix Tables S5-S8 for those from the best-fitting models). The estimated coefficients represent the effect of attribute levels on annual donation frequency for the ordered categorical questions and the probability of selecting a donation service with these particular attribute levels for the DCE response questions. The estimated coefficients suggest that for both forms of survey question, donors were willing to trade off increased travel time for a service change anticipated to improve their utility (the introduction of a health report) and for changes that would reduce constraints, such as extending donor center opening hours or appointment availability or increasing the maximum number of donations permitted annually.

Figure 2a,b presents the MRS estimates from model 3 for travel time versus changes in each of the other blood service attributes, for both sets of survey questions (see also Tables S9 and S10). The differences between the SP-ordered categorical and DCE approaches in the estimates of the mean MRSs were statistically significant at the 5% level for some attributes and levels, for example, introduction of the health report (both genders), allowing appointment availability every weekday, changing opening time to 2 pm to 8 pm (males) or 9 am to 5 pm (females), or increasing the maximum number of donations permitted (both genders). For the other service attributes, the differences in the estimated mean MRSs were not statistically significant. However, across all attributes and levels, the mean differences in the estimated MRSs between the survey approaches were relatively small.
The maximum difference in the estimated MRSs between the different forms of survey questions was 10 min (for the attribute that increases the maximum donation frequency to 6 for males), but the mean difference was 5 min or less for all other attributes for males and 8 min or less for females. The estimated MRSs after reweighting the estimates according to the observed characteristics of the target population were similar to those from the unweighted survey population (Appendix Tables S11-S12).

Discussion
This article compares overall (full sample) preferences elicited from 2 different SP tasks. These approaches were contrasted within the same survey administered to a large sample of blood donors. It was recognized a priori that these 2 ways of formulating the survey questions could lead to different estimates of relative preference, as they ask different questions, albeit drawing from the same utility function. However, the 2 forms of questions provided similar estimates of MRSs between a "cost" (increased travel time) and factors anticipated to increase utility (e.g., introduction of a health report) or relax constraints (e.g., extended donor center opening hours).
The mean differences between the ordered-categorical and discrete-choice response approaches in the estimated MRSs for travel time versus all other attributes did not exceed 10 min for all models that provided reasonable fit to the data. It is unlikely that these small differences would imply differential policy recommendations according to the form of choice task within the SP survey. The estimated MRSs were similar across regression approaches.
This study exemplifies how SP survey questions can be formulated to provide responses on an ordered categorical scale required for the parameter of interest (donation frequency), as well as according to more traditional DCE tasks. The finding that any differences in the estimated MRSs are not of substantive importance offers some reassurance for policy makers in that estimates of relative preference may be robust to alternative ways of framing the survey questions. It should also be recognized that this robustness may reflect some of the study's strengths, in that the sample size was relatively large compared with previous SP surveys in health, differences attributable to unobserved heterogeneity were minimized by the same respondents completing both sets of survey questions, and the same analytical model was found to fit both forms of response data relatively well.
Considerable literature has found inconsistent results in comparing different types of SP tasks. [26][27][28][29][30][31] Such comparisons are fraught with challenges. In particular, previous attempts have faced major methodological concerns about unobserved differences (heterogeneity) between the comparison groups, inadequate sample sizes, and the absence of a metric for comparison across the alternative designs. 32 Moreover, most studies have not been able to assess the predictive accuracy of the SP estimates with revealed preference data, and so while differences between approaches might be identified, it is impossible to determine which approach is more accurate. The study by Ryan and Watson is one notable exception; they compared stated screening uptake from both a contingent valuation task and a DCE with actual uptake. 29 They found significant differences in the stated screening intention between both methods and with actual screening uptake. This contrasts with the present study, which finds similar results between approaches and with observed blood donation frequency.
This article has some limitations and provokes several areas of future research. First, although the MRS estimates from the SP-ordered categorical questions and DCE tasks were similar, they were not directly compared with revealed preference data. 33 However, the donation frequency predicted from the SP-ordered categorical survey questions was previously found to be similar to observed donation frequency. 7 Second, the article exemplifies the use of an SP-ordered categorical versus a DCE task, within a single application. It is unknown whether in other settings there are substantive differences across approaches and how these should be presented in policy recommendations. Third, the questions requiring an ordered categorical response included an opt-out option, whereas the DCE tasks did not. However, this is unlikely to have materially affected the findings, since only a small proportion (<2%) of donors indicated a preference to not donate across all the ordered categorical choice sets received, and the exclusion of these individuals from the analysis had a negligible impact on the results. Fourth, the SP-ordered categorical versus DCE tasks were not presented in random order. The SP-ordered categorical questions were always presented first, and this may have led to the lower completion rate for the DCE tasks for males. Nonrandomized ordering of choice tasks may lead to strategic behavior 34 and therefore may bias the MRS estimates. However, there was no evidence of strategic or nontrading behavior across the survey formats for either gender. Finally, the analysis models did not fully consider preference heterogeneity, as that was beyond the immediate scope of the article. While the G-MNL has the flexibility to simultaneously address individual-specific scale and preference heterogeneity for DCE response data, the application to ordered categorical responses requires extension. 24 For the DCE responses, we assessed preference heterogeneity using latent class analysis but did not find any evidence that preferences from DCE responses are obscured by preference heterogeneity (Supplement 2).
Future studies that contrast alternative ways of framing SP questions would be useful and should consider whether there is additional cognitive burden associated with requiring ordered categorical versus DCE choice tasks. If so, it would be useful to consider whether the parameter of prime policy interest could still be provided but with questions in a simpler form. 35 Future research could also extend existing choice models 24,25 to evaluate the impact of alternative forms of survey question for both overall (full sample) preferences and preferences across subgroups in the presence of preference and scale heterogeneity. [36][37][38]

In conclusion, this article highlights the potential value of alternative choice framings, rather than relying solely on DCEs, and encourages researchers to consider more carefully the most suitable framing of choices in SP studies. The decision of which choice task is most appropriate for a given study depends on many factors and can incur a tradeoff between theoretical underpinning and the best way to frame a task that responders find both realistic and understandable.