Introduction

Informal carers contribute substantially to the provision of long-term care (LTC) for older people in various OECD countries [1]. The projected growth in LTC needs in Europe raises the difficult question of how to allocate limited resources within LTC systems effectively to support people with LTC needs and their informal carers [2, 3]. On the supply side of informal care, caregiving can have unfavourable effects on carers' health, well-being, life satisfaction and overall quality of life (QoL). High-intensity caregiving is associated with worse mental health, increased emotional and physical strain, and loneliness or feelings of isolation [4,5,6,7]. It is also associated with decreased life satisfaction [8] and increased use of drugs and outpatient care [9].

Systematic reviews [10, 11] indicate that informal carers' well-being, stress or burden, mental health, needs and experience have been assessed with a range of measures, such as the Caregiver Burden Interview [12], the CES Depression Scale [13] and the Social Satisfaction Scale [14]. Because these measures focus on specific aspects of carers' well-being, they may omit outcomes or experiences that matter to carers. Appropriate measures and methods for assessing the costs and outcomes of informal care provision and carers' QoL have become particularly important in effectiveness and cost-effectiveness studies that include informal care [15].

Adult social care aims to promote the well-being and QoL of adults needing support with daily activities and of their informal carers (caregivers). The Adult Social Care Outcomes Toolkit for service users (ASCOT) was developed to measure adult care recipients' social care-related quality of life (SCRQoL) and the effectiveness of support and services [16]. As carers' outcomes and experiences differ from those of service users, the Adult Social Care Outcomes Toolkit for carers (ASCOT-Carer) was also developed [17, 18], and English preference weights for the original measure were recently derived [19]. The ASCOT-Carer can be used in effectiveness and cost-effectiveness evaluations of interventions focusing on social care and support for caregivers [18].

Like numerous generic preference-based measures [20], the English ASCOT-Carer preference weights [19] capture the general population's values for ASCOT-QoL states in England. This reflects the view that the general population's values should guide decisions about allocating limited resources in the health and social care sector, because the general population pays for services and their provision is tax-funded in many European countries [21]. Furthermore, comparative studies have indicated that the general population's preferences differ between countries according to culture and to health and social care delivery systems [20, 22, 23]. Therefore, caution is needed when valuing QoL states in one country with preference weights developed in the context of another country [22, 23]. In the field of health-related QoL measurement, the usual practice is to develop country-specific preference weights that better reflect each country's population's perceptions and values regarding various health states [24,25,26]. This approach has been taken for translated versions of the ASCOT [27,28,29] (in German, Japanese and Finnish) and ASCOT-Carer [30] (in German) measures.

To apply the ASCOT-Carer measure in Finland, we translated it from English into Finnish in 2015–2016, following international guidelines for the translation process [31] (Footnote 1). Because preference weights for the Finnish-translated measure had not yet been developed, the primary aim of this study was to estimate Finnish preference weights for the Finnish ASCOT-Carer measure. Following Netten et al. [16], we collected choice data through a web-based general population survey that included a best-worst scaling (BWS) experiment [32, 33]. Using the BWS data and multinomial logit models, we estimated preference weights for the attribute levels of the Finnish ASCOT-Carer.

The recent literature on choice experiments has indicated that sequential choice tasks can give rise to learning or fatigue [34,35,36], whereby respondents' choices become more consistent (learning) or less consistent (fatigue) over the course of the experiment. In our BWS experiment, each respondent completed eight sequential choice tasks and made four consecutive choices per task. Because these repeated tasks made it possible to explore fatigue and learning during the choice experiment, an auxiliary aim of the study was to investigate the effects of learning and fatigue on respondents' choices and preference estimates in the BWS experiment. This study contributes to expanding the number of valid measures available to evaluate capability-based QoL in a general population [37] and to a better understanding of how fatigue and learning affect respondents' choices in BWS experiments.

Methods

ASCOT-Carer, best-worst scaling (BWS) and BWS tasks

The ASCOT-Carer measure has seven four-level attributes concerning different aspects of informal carers' SCRQoL: (1) occupation; (2) control over daily life [control]; (3) looking after yourself [self-care]; (4) personal safety [safety]; (5) social participation and involvement [participation]; (6) space and time to be yourself [space-and-time]; and (7) feeling supported and encouraged [support] (Table 1). The attribute levels measure carers' need intensity: level_1 (top level) indicates the most favourable situation, the 'ideal state', and level_4 (bottom level) the least favourable situation, i.e. 'high needs', whereas level_2 and level_3 indicate intermediate situations ('no needs' and 'some needs', respectively).

Table 1 ASCOT-Carer attributes describing informal carers’ social care-related quality of life

Following the approach used in Netten et al. [16], we used the BWS method to collect data for deriving Finnish preference weights for the Finnish version of the ASCOT-Carer measure. The choice of method in [16] was informed by previous reviews [38, 39], which suggest that the BWS method can obtain more information within choice sets, with less cognitive burden, than the discrete choice experiment (DCE) method. In the profile case of BWS, one profile is presented at a time, and choices are made between the alternatives within the displayed profile [40]. To reduce the effects of lexicographic and non-trading behaviour in the BWS tasks [41] and to obtain a partial ranking of the attribute levels [39], respondents also selected the second-best and second-worst attribute levels from each profile (Fig. 1).

Fig. 1

An example of a BWS profile using different QoL states from the ASCOT-Carer measure. ©University of Kent: The ASCOT-Carer measure is reproduced with permission from the University of Kent. All rights reserved

The full factorial design comprised 4^7 = 16,384 possible profiles [38, 39]. To obtain a manageable number of profiles to present to respondents, a fractional factorial orthogonal main effects plan (OMEP) of 32 profiles was used [42, 43]. Each profile included seven attribute levels, one from each attribute of the ASCOT-Carer measure (Fig. 1). The profiles were randomly divided into four blocks of eight profiles, and each respondent randomly received one block. Respondents were first asked to imagine that they were caring for a person who needed help in daily life due to old age, disability or illness. They then evaluated the alternatives in the profile and sequentially selected the four alternatives with the highest and lowest relative utilities, which constituted one BWS choice task. The best, worst, second-best and second-worst choices were made in that order within each profile, so the number of available alternatives decreased after each choice, before the respondent moved on to the next profile and a new task (Fig. 1).
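
To make the design arithmetic and the sequential choice process concrete, the Python sketch below counts the full factorial and computes the probability of the four sequential choices made within one profile. It assumes the sequential best-worst logit formulation commonly used for profile-case BWS (best choices use the utilities, worst choices their negatives, and each chosen level is removed before the next choice); the level utilities are hypothetical illustration values, not estimates from this study, and the function names are our own.

```python
import math
from itertools import product

N_ATTRIBUTES = 7  # ASCOT-Carer attributes
N_LEVELS = 4      # levels per attribute

# Full factorial: one level per attribute, every combination (4**7 profiles).
full_factorial = list(product(range(1, N_LEVELS + 1), repeat=N_ATTRIBUTES))


def logit_prob(item, utilities, sign):
    """Logit probability of `item` among the remaining alternatives.

    sign=+1 models a 'best' choice, sign=-1 a 'worst' choice (negated utilities).
    """
    denom = sum(math.exp(sign * u) for u in utilities.values())
    return math.exp(sign * utilities[item]) / denom


def sequential_bws_prob(utilities, best, worst, second_best, second_worst):
    """Probability of the best, worst, second-best and second-worst choices.

    Each chosen alternative is removed before the next choice, so the choice
    set shrinks from 7 to 6, 5 and finally 4 alternatives.
    """
    remaining = dict(utilities)
    prob = 1.0
    for item, sign in ((best, 1), (worst, -1), (second_best, 1), (second_worst, -1)):
        prob *= logit_prob(item, remaining, sign)
        del remaining[item]
    return prob


# Hypothetical utilities for the seven levels shown in one profile.
profile = {"occu2": 3.3, "cont4": 0.0, "perc3": 1.5, "safe1": 2.9,
           "soci3": 1.4, "spac2": 2.8, "supp4": 0.8}
p = sequential_bws_prob(profile, best="occu2", worst="cont4",
                        second_best="safe1", second_worst="supp4")
print(len(full_factorial), round(p, 4))
```

With equal utilities, every sequence of four choices has probability 1/7 × 1/6 × 1/5 × 1/4 = 1/840, which is a useful sanity check for any implementation.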

A foldover design was applied to reduce the number of easy choices within each profile [44]. To reduce selection bias, the blocks of profiles were randomly assigned to respondents. The order of the attributes was randomised between (but not within) respondents to avoid ordering bias and to disentangle the effect of attribute choice from the position of that attribute within a choice task [16, 35, 45].

Survey design and sampling

An online survey that included the BWS experiment using the ASCOT-Carer measure was conducted between July and August 2016 and managed by Research Now. To achieve a representative sample of the Finnish general adult population for this survey, an online panel with quota sampling by age, gender and region was used. Besides the BWS choice data, we also collected information about respondents’ demographic and socioeconomic background (such as gender, age, region, household income, education, marital status, religion, employment status and tenure), well-being (self-assessed health (SAH) and overall QoL), information on experience in caring and need for social care as well as information about how well the respondents understood the given BWS tasks.

Respondents who spent less than 4.5 min completing the BWS task section were excluded during data collection, because testing showed that reading and completing the eight BWS tasks (32 choices) took at least that long. Based on power calculations, we continued sampling until the target of 1000 participants was reached; the final data collection yielded a slightly larger sample (n = 1009). After excluding respondents with no information on education (n = 4), the final sample consisted of 1005 respondents, and the long-format panel data contained 32,160 choices (1005 respondents × 8 tasks × 4 choices).
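
The count of 32,160 choices follows directly from the design. A minimal sketch of a long-format layout with one row per respondent, task and sequential choice is shown below; the column names, including the hypothetical `second_sequence` flag for tasks 5 to 8, are illustrative assumptions rather than the structure of the study's actual data file.

```python
from itertools import product

N_TASKS = 8  # BWS profiles (tasks) per respondent
N_ATTRIBUTES = 7
CHOICE_STEPS = ("best", "worst", "second_best", "second_worst")


def long_format_rows(n_respondents):
    """One row per respondent x task x sequential choice (hypothetical layout)."""
    rows = []
    for resp, task, (step, kind) in product(range(1, n_respondents + 1),
                                            range(1, N_TASKS + 1),
                                            enumerate(CHOICE_STEPS)):
        rows.append({
            "respondent": resp,
            "task": task,
            "choice": kind,
            "n_alternatives": N_ATTRIBUTES - step,  # 7, 6, 5, 4
            "second_sequence": task > 4,            # tasks 5-8
        })
    return rows


rows = long_format_rows(1005)
print(len(rows))  # 32160
```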

Modelling strategy

The BWS choices were analysed within the framework of random utility theory [33, 46]. The estimated preference parameters are a function of choice frequencies, and the choice of an attribute level reflects the importance of that attribute level relative to the other available attribute levels [40]. As a starting point for estimating the coefficients of the attribute levels, we applied a multinomial logit (MNL) model. Scale heterogeneity, i.e. differences in the variance of the error term of the random utility model, can distort preference estimates obtained from the MNL model [47]. To account for differences in error variance between subgroups and obtain more reliable and consistent preference estimates, we therefore used a scale MNL (S-MNL) model [16, 38, 48] (Table 2).
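
In an S-MNL model, each respondent's utilities are multiplied by a positive scale factor, so a larger scale corresponds to a smaller error variance and more deterministic, consistent choices. The sketch below assumes the common parametrisation λ_n = exp(γ′z_n) for observed scale covariates z_n; the covariates and γ values are hypothetical and are not the estimates reported in Table 5.

```python
import math


def scale_factor(gamma, z):
    """Scale lambda_n = exp(gamma' z_n); positive, and equal to 1 when z_n = 0."""
    return math.exp(sum(g * x for g, x in zip(gamma, z)))


def scaled_logit_probs(utilities, scale):
    """Logit choice probabilities after multiplying utilities by the scale factor."""
    top = max(utilities)
    # Subtracting the maximum before exponentiating avoids overflow; it cancels out.
    expo = [math.exp(scale * (u - top)) for u in utilities]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical binary scale covariates: good self-assessed health,
# higher education, more than 6.5 min spent on the BWS tasks.
gamma = [0.15, 0.10, 0.20]        # hypothetical scale coefficients
utilities = [3.4, 2.9, 1.5, 0.0]  # hypothetical attribute-level utilities

for z in ([0, 0, 0], [1, 1, 1]):
    lam = scale_factor(gamma, z)
    probs = scaled_logit_probs(utilities, lam)
    print(z, round(lam, 3), [round(p, 3) for p in probs])
```

The preference parameters are shared across groups; only the scale differs, which is what distinguishes scale heterogeneity from taste heterogeneity.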

Table 2 Model developing process and specifications

To select appropriate scale factors for the S-MNL model, we first estimated, in sequence, two specifications of the generalised MNL (G-MNL) model [48]. The first model used observed respondent characteristics to investigate taste heterogeneity (hereafter, the taste MNL model). It extended the MNL model with (i) attribute-specific constants (ASCs) for the worst or second-worst choices and (ii) interaction terms between attribute levels and observed respondent characteristics to capture variation in preferences for attribute levels between subgroups of respondents. The second model, the G-MNL, allowed for both taste heterogeneity and scale heterogeneity (hereafter, the taste-and-scale MNL model). Thus, after controlling for taste heterogeneity and minimising the unexplained variation of the model, we explored heterogeneity in the error variance and selected the significant scale factors for the S-MNL model. Finally, a taste-adjusted S-MNL model was used to estimate population-based preference weights (described below). Table 2 describes the five-step modelling approach, and Appendix 1 describes the model specifications.

The models were estimated by maximum likelihood in BIOGEME [49]. 'Apply runs' in ALOGIT [50] were used to detect significant variables capturing taste heterogeneity. In every model, level_4 of the CONT attribute, 'cont4' ('I have no control over my daily life'), served as the reference attribute level. The constant and position coefficients of the first attribute in the choice set for the best and worst choices were also fixed at zero to ensure identification (Footnote 2). We used sandwich estimators to obtain robust standard errors that account for the repeated choices [51].
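
Fixing a reference level such as cont4 at zero is necessary because logit probabilities depend only on utility differences: adding the same constant to every attribute-level coefficient leaves all best and worst choice probabilities unchanged. A small Python check with hypothetical coefficients illustrates this.

```python
import math


def logit_probs(utilities, sign=1):
    """Logit probabilities; sign=-1 gives 'worst'-choice probabilities."""
    expo = [math.exp(sign * u) for u in utilities]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical coefficients for four levels shown in a profile; the 0.0 plays
# the role of the reference level (cont4).
coefs = [3.3, 0.0, 1.5, 2.9]
shifted = [c + 5.0 for c in coefs]  # same constant added to every level

for sign in (1, -1):  # best and worst choices
    same = all(abs(a - b) < 1e-12
               for a, b in zip(logit_probs(coefs, sign), logit_probs(shifted, sign)))
    print("best" if sign == 1 else "worst", same)
```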

Scale factors and learning and fatigue effects

To investigate scale factors, we included age, gender, education, SAH, overall QoL, experience in care, residential area, housing tenure, time taken to complete the eight BWS tasks, and best and worst choices in the taste-and-scale MNL model (Table 2). Some of these factors were also tested in Netten et al. [16]. We conducted a series of scale heterogeneity analyses, using different subgroups of each variable within several sets of four or five potential scale variables, to compare scale parameters and select scale variables. The final scale factors, selected on the basis of statistical significance (p < 0.05), were used in the S-MNL and taste-adjusted S-MNL models (Table 2).

Repeated, sequential choice tasks in choice experiments can cause fatigue and learning, which affect respondents' choice behaviour [34,35,36]. We expected the position of a choice task in the sequence of eight BWS choice tasks to be a scale factor explaining the error variance of the model. Following Carlsson et al. [34], we defined two identical sequences of four choice tasks in the BWS experiment and tested for fatigue or learning in the second sequence relative to the first. Fatigue [learning] would mean that respondents' choice behaviour is less [more] consistent in the last four BWS tasks than in the first four. Correspondingly, if fatigue [learning] occurs, the variance of the error term of the S-MNL model is higher [lower] in the last four tasks than in the first four [34, 52].
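
Under the logit assumption, the error term follows a type I extreme value distribution with variance π²/(6λ²), so a relative scale λ for tasks 5 to 8 (with tasks 1 to 4 normalised to 1) implies an error variance ratio of 1/λ². The sketch below encodes this interpretation; the function names and the example value 1.2 are hypothetical, and in practice the label should only be applied when the scale parameter differs significantly from 1.

```python
import math


def error_variance(scale):
    """Variance of a type I extreme value error with scale lambda: pi^2 / (6 * lambda^2)."""
    return math.pi ** 2 / (6.0 * scale ** 2)


def interpret_sequence_scale(scale_second_half):
    """Compare tasks 5-8 with tasks 1-4, whose scale is normalised to 1.

    Returns a label and the ratio of error variances (second half / first half).
    """
    ratio = error_variance(scale_second_half) / error_variance(1.0)  # equals 1 / lambda^2
    if scale_second_half > 1.0:
        return "learning", ratio  # lower error variance: more consistent choices
    if scale_second_half < 1.0:
        return "fatigue", ratio   # higher error variance: less consistent choices
    return "no change", ratio


print(interpret_sequence_scale(1.2))  # hypothetical scale estimate
```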

Final preference estimates

The preference weights should reflect the values of the Finnish general adult population. However, some socioeconomic and demographic groups were over- or underrepresented in the analysis sample compared with the general adult population (difference > 10 percentage points or p < 0.05). This occurred in three subgroups: house/apartment renters (housing tenure), those with lower secondary education or below (education), and those without any religion (religion) (Table 3). We therefore estimated a taste-adjusted S-MNL model, i.e. an S-MNL model that included significant interaction terms between attribute levels and the subgroups above. Its attribute-level coefficients were then adjusted for significant taste differences between the sample and the general population using modified post-stratification [53] to derive the final preference weights. This correction method has also been applied in previous studies [16, 19, 30, 54, 55]. The standard errors of the adjusted, population-weighted preference weights were calculated using fixed population weights (Table 3) and the estimated variance-covariance matrix of the parameters of Model [III] provided by BIOGEME [49].
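
A minimal sketch of the adjustment follows. It assumes that modified post-stratification replaces each subgroup indicator in the interaction terms with that subgroup's share in the general population, and that the population shares are treated as fixed, so the standard error is the square root of the quadratic form a′Va with a = (1, share_1, ..., share_G). All coefficients, shares and covariances below are hypothetical.

```python
import math


def post_stratified_coef(base, deltas, pop_shares):
    """Population-weighted coefficient: base + sum over subgroups of delta_g * share_g."""
    return base + sum(d * s for d, s in zip(deltas, pop_shares))


def post_stratified_se(vcov, pop_shares):
    """Standard error of a'theta with a = (1, share_1, ..., share_G) and fixed shares.

    vcov: covariance matrix of (base, delta_1, ..., delta_G) as nested lists.
    """
    a = [1.0] + list(pop_shares)
    n = len(a)
    variance = sum(a[i] * vcov[i][j] * a[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical example: one attribute level with a single significant interaction
# (e.g. renters), whose population share is assumed to be 0.30.
base, deltas, shares = 0.70, [-0.20], [0.30]
vcov = [[0.0100, -0.0020],
        [-0.0020, 0.0400]]
print(round(post_stratified_coef(base, deltas, shares), 3),
      round(post_stratified_se(vcov, shares), 4))
```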

Table 3 Analysis data characteristics vs. general population characteristics

We normalised the attribute-level coefficients from the different estimated models by dividing them by the largest attribute-level coefficient. To ease interpretation of changes between ASCOT-QoL states, we linearly transformed the final 28 preference estimates into an index. We anchored the ASCOT-Carer index at one for the state defined by the seven highest attribute-level coefficients (one per attribute) and at zero for the state defined by the seven lowest attribute-level coefficients (one per attribute), keeping the relative differences between the attribute-level coefficients unchanged. Thus, the ASCOT-Carer index of carers' SCRQoL ranges between zero and one, where one represents the best SCRQoL (the seven best ASCOT-QoL states, one per attribute) and zero the worst SCRQoL (the seven worst ASCOT-QoL states, one per attribute). This transformation method has been used to anchor country-specific preference weights in earlier studies [19, 30, 42, 55, 56].
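
One formulation consistent with this anchoring is v_k = (β_k − m̄)/(Σ max − Σ min), where m̄ is the average of the seven lowest attribute-level coefficients: the best state then sums to one and the worst to zero, relative differences are preserved, and individual values can be negative. The Python sketch below implements that assumption with hypothetical coefficients; dividing all coefficients by the largest one beforehand does not change the result, because the transformation is invariant to rescaling.

```python
def rescale_to_index(coefs):
    """Map attribute-level coefficients to additive index values.

    coefs: dict mapping attribute -> [level_1, level_2, level_3, level_4] coefficients.
    Each value is (beta - mean of the attribute minima) / (sum of maxima - sum of minima),
    so the best state sums to 1 and the worst state to 0.
    """
    mins = [min(levels) for levels in coefs.values()]
    maxs = [max(levels) for levels in coefs.values()]
    span = sum(maxs) - sum(mins)
    offset = sum(mins) / len(mins)
    return {attr: [(b - offset) / span for b in levels] for attr, levels in coefs.items()}


def score_state(index, state):
    """Additive index for a state such as '3442434' (one digit per attribute, dict order)."""
    return sum(index[attr][int(level) - 1] for attr, level in zip(index, state))


# Hypothetical coefficients (cont4 fixed at 0 as the reference level).
coefs = {
    "occu": [3.3, 3.3, 1.8, 0.3],
    "cont": [3.4, 3.0, 1.6, 0.0],
    "perc": [2.0, 1.9, 1.2, 0.6],
    "safe": [2.9, 2.5, 1.6, 0.6],
    "soci": [2.5, 2.3, 1.4, 0.7],
    "spac": [3.3, 3.2, 1.8, 0.3],
    "supp": [2.8, 2.4, 1.5, 0.8],
}
index = rescale_to_index(coefs)
print(round(score_state(index, "3442434"), 3), round(score_state(index, "1231321"), 3))
```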

Results

Sample characteristics

Compared with the general adult population, the analysis sample included more people aged 55–64 years, fewer employed people, fewer people with the lowest educational level, a higher proportion of people with no religion (i.e. fewer people with some religion) and fewer homeowners (Table 3). In total, 36.8% of respondents reported that they had personally provided help or support to someone with long-term physical or mental ill-health or disability in the last month. When asked how often they were able to put themselves in the imaginary situations described in the BWS exercises, 57.9% reported being able to do so all the time and 38.7% some of the time. Almost all respondents (98.7%) reported that they had understood the situations in the best-worst exercises all or some of the time (Table 3).

The cont1, occu2, occu1 and spac1 attribute levels were most often selected as the best or second-best choices (hereafter 'best', for simplicity) (Table 4). The cont4, occu4, spac4 and safe4 attribute levels were most often chosen as the worst or second-worst choices (hereafter 'worst', for simplicity). The perc2 attribute level was preferred to the perc1 attribute level; perc2 was selected more often than perc1 as the best or worst choice and in total. For the best choices, the further an attribute level was from the first position in the profile, the less likely it was to be selected. For the worst choices, the likelihood of selecting an attribute level increased from the 1st to the 7th position, although respondents seemed indifferent between the 3rd and 4th positions (Table 4).

Table 4 Descriptive statistics of attribute, attribute levels and position variables in the BWS tasks (n = 32,160)

The preference estimates

Table 5 reports the results from the basic MNL (Model [I]), the S-MNL (Model [II]) and the taste-adjusted S-MNL (Model [III*]). In Model [III*], the coefficients of the occu3, safe4, soci1 and supp4 attribute levels were adjusted for significant taste differences between the sample and the general population; all other estimated coefficients are the same as in Model [III] (Supplemental Table S1). Because pseudo-R² values of 0.3–0.4 correspond to R² values of 0.6–0.8 in an equivalent linear regression [57], the pseudo-R² of 0.289 indicates a reasonable fit for Model [III*] (Footnote 3).
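
If the reported pseudo-R² is McFadden's ρ² = 1 − LL(model)/LL(0) with an equal-probability null model (our assumption; the definition is not stated in this section), LL(0) follows from the design alone: each of the 1005 × 8 profiles contributes ln(1/7 × 1/6 × 1/5 × 1/4) = ln(1/840). Using −38,469.54, the log-likelihood value that appears in the likelihood ratio test reported for Model [III*], this sketch gives ρ² ≈ 0.289. The function names are our own.

```python
import math

N_RESPONDENTS = 1005
N_TASKS = 8
ALTERNATIVES_PER_STEP = (7, 6, 5, 4)  # best, worst, second-best, second-worst


def null_log_likelihood(n_respondents=N_RESPONDENTS, n_tasks=N_TASKS):
    """LL(0): each remaining alternative equally likely at every sequential choice."""
    per_task = sum(math.log(1.0 / k) for k in ALTERNATIVES_PER_STEP)  # equals ln(1/840)
    return n_respondents * n_tasks * per_task


def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's rho-squared: 1 - LL(model) / LL(0)."""
    return 1.0 - ll_model / ll_null


ll0 = null_log_likelihood()
rho2 = mcfadden_pseudo_r2(-38469.54, ll0)
print(round(ll0, 1), round(rho2, 3))
```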

Table 5 Estimated preference parameters for the Finnish ASCOT for carers (n = 32,160)

Including four scale factors substantially improved the goodness of fit of the model: the log-likelihood increased from −38,685.26 (Model [I]) to −38,475.50 (Model [II]). This large increase indicates that accounting for scale heterogeneity is very important. Although the attribute-level coefficients from Models [II] and [III*] were quite similar, Model [III*] fitted significantly better than Model [II] according to the likelihood ratio test {LR statistic 11.92 = −2 × (−38,475.50 − (−38,469.54)); df = 47 − 43 = 4; p = 0.018} (Table 5). Below, we focus on the results from Model [III*] unless otherwise specified.
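
The likelihood ratio statistic for nested models is LR = −2 × (LL_restricted − LL_unrestricted), referred to a χ² distribution with degrees of freedom equal to the number of additional parameters. The stdlib-only Python sketch below reproduces the test reported above, using the Model [II] log-likelihood of −38,475.50 and −38,469.54 for Model [III*]; the helper names are our own, and `scipy.stats.chi2.sf` would be the usual library alternative to the hand-written survival function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Closed forms only: df = 1 via erfc, and any even df via a finite sum.
    """
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k  # (x/2)^k / k!
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("use scipy.stats.chi2.sf for odd df > 1")


def likelihood_ratio_test(ll_restricted, ll_unrestricted, df):
    """LR = -2 * (LL_restricted - LL_unrestricted), referred to chi-square(df)."""
    lr = -2.0 * (ll_restricted - ll_unrestricted)
    return lr, chi2_sf(lr, df)


# Model [II] (restricted) vs Model [III*] (unrestricted): df = 47 - 43 = 4.
lr, p = likelihood_ratio_test(-38475.50, -38469.54, df=4)
print(round(lr, 2), round(p, 3))  # 11.92 0.018
```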

Across all attributes, the estimated attribute-level coefficients, which indicate importance relative to cont4, were statistically significant. The three most-valued attribute levels were found within the control over daily life, occupation and space-and-time attributes (Fig. 2). The cont1 attribute level was the most-valued ASCOT-QoL state (coefficient 3.437), followed by the occu1 (3.343), occu2 (3.336) and spac1 (3.328) attribute levels (Table 5). Furthermore, within each attribute, the bottom level (level_4) was the least-valued state. The least-valued attribute level, cont4, was followed by the spac4 (coefficient 0.287) and occu4 (0.303) attribute levels. The next three lowest-valued states were the safe4 (coefficient 0.608), perc4 (0.635) and soci4 (0.674) attribute levels.

Fig. 2

The attribute-level coefficients and their 95% confidence intervals for the Finnish ASCOT for carers measure

Based on the magnitudes of the coefficients, the two top attribute levels were valued more highly than the two bottom attribute levels. Except for the SAFE attribute, the difference between attribute levels 1 and 2 was not significant at the 5% level. In addition, a higher value was placed on the difference between attribute levels 2 and 3 (i.e. moving up from level_3 [some needs] to level_2 [no needs]) than on the difference between attribute levels 1 and 2 (i.e. moving up from level_2 to level_1 [ideal state]), and a higher value was also placed on the difference between attribute levels 3 and 4 than on that between levels 1 and 2. Apart from the PERC attribute, the ordering of the attribute levels by the magnitude of their estimated coefficients followed the ordering of the four levels defined for each ASCOT-Carer attribute (Table 5, Fig. 2).

The finding that the coefficient of the perc2 attribute level was greater than that of the perc1 attribute level was unexpected. We therefore re-estimated the taste-adjusted S-MNL model with the restriction that these two coefficients be equal (Model [IV]). This restriction had little influence on the estimated coefficients of the other attribute levels, and the joint coefficient for perc1 and perc2 in Model [IV] was the exact average of the perc1 and perc2 coefficients in Model [III] (Supplemental Table S1). The likelihood ratio test did not reject the restricted Model [IV] against the unrestricted Model [III] (LR statistic = 0.60; df = 1; p = 0.436). However, to preserve the order of the ASCOT attribute levels, which indicates need intensity, and to ease the application of the preference weights, we used the preference estimates from Model [III*] (Table 5) and swapped the estimated coefficients of the perc1 and perc2 attribute levels in the final preference weights to be used in practice (Table 6).

Table 6 Values of the Finnish preference weights for the ASCOT for carers’ measure

Significant position effects were found for the best choices. The coefficient of the second position variable (pos2_B) did not differ significantly from that of the first (top) position (p > 0.05), but the coefficients of all other position variables were statistically significant (Table 5). Their negative signs indicate that respondents were less likely to choose an item that appeared below the second item from the top of the profile.

For the worst choices, the coefficients of the position variables were not statistically significant. Their negative signs suggest that, when making worst choices, respondents were less likely to choose items in the 5th and 6th rows of the profile than items in the top or bottom rows, although these differences were not significant. Furthermore, except for pos2_W and pos2_B, the position coefficients were smaller in magnitude for the worst choices than for the best choices. Other things being equal, the position effect was thus more strongly related to the best choices than to the worst choices, in agreement with the results of a discrete choice experiment [58].

The scale factors and learning effect

The estimated parameter of the learning scale factor exceeded 1 (Table 5), indicating a lower error variance for the second sequence of four tasks than for the first. Respondents' choices were thus more consistent in the last four tasks than in the first four, i.e. learning took place during the sequential BWS choice experiment. This finding is consistent with that of Carlsson et al. [34], who explored learning and fatigue effects in a choice experiment on food safety.

Regarding the other scale factors, respondents who had better SAH, a higher level of education or who spent more time (> 6.5 min) on the BWS tasks were more consistent in their choices than those who had worse (i.e. fair, bad or very bad) SAH, a lower level of education or who spent less time (≤ 6.5 min) on the BWS tasks (Table 5). The latter two scale factors are in line with the results of Batchelder et al. [19].

The final preference weights

Table 6 reports the normalised and rescaled values (i.e. preference-based index values) of the attribute-level coefficients for the Finnish ASCOT-Carer measure. Because the rescaled values are based on the differences between the attribute-level coefficients and the average of all lowest-rated attribute levels [42, 55], they can be negative. As discussed above, the originally estimated coefficients of the perc1 and perc2 attribute levels were swapped (Table 6).

Preference-based index values for the Finnish ASCOT-Carer measure can be used to measure changes in carers' SCRQoL, for instance changes due to targeted interventions aiming to improve carers' QoL (Table 6). Because the ASCOT-Carer index is additive, an improvement in an individual's ASCOT-QoL, for example from an inferior state of 3442434 to an improved state of 1231321, would correspond to a change in value from 0.204 [= 0.063 + (−0.027) + 0.009 + 0.069 + 0.011 + 0.069 + 0.018] to 0.808 [= 0.163 + 0.156 + 0.026 + 0.123 + 0.063 + 0.152 + 0.124] (Footnote 4). This gain in SCRQoL is illustrated in Fig. 3 as the difference between the areas enclosed by the two radar plots. A similar figure, on a different scale, can be drawn using the normalised values. Researchers who wish to use these preference weights can apply either the normalised or the rescaled values of the final preference weights in evaluations involving the Finnish ASCOT-Carer measure (Table 6).

Fig. 3

Changes in the Finnish preference-based index values for the ASCOT-Carer measure from a poorer state (3442434) to a better state (1231321). Preference-based index values for the Finnish version of the ASCOT-Carer measure were derived in this study (Table 6). The state 3442434 consisted of the occu3, cont4, perc4, safe2, soci4, spac3 and supp4 attribute levels, and the state 1231321 of the occu1, cont2, perc3, safe1, soci3, spac2 and supp1 attribute levels

Discussion

In this study, we derived population-based preference weights for the Finnish version of the ASCOT-Carer measure, which was translated from the English ASCOT-Carer measure [18] into Finnish in 2015–2016 [31]. The BWS choice data were analysed using an S-MNL model that accounted for significant taste differences between the sample and the general adult population. Moreover, we provided evidence of a learning effect in the BWS experiment.

Both the most- and least-valued attribute levels of the Finnish ASCOT-Carer measure were found in the occupation, control and space-and-time attributes. In comparison with the English preference weights, which were derived using a similar analytical framework [19], Finnish respondents valued most highly the attribute levels within the control, occupation and space-and-time attributes (Supplemental Figure O1). The most preferred attribute level was cont1 in Finland but occu1 in England. In both countries, the least preferred attribute level was cont4, with a negative preference-based index value: −0.027 (Finland) and −0.012 (England). Although the rank order of the derived preference weights is similar in the two countries, the magnitudes of the country-specific preference weights differ clearly, which could stem from differences in the two populations' preferences, values or norms. These differences underline the need for Finnish preference weights for the Finnish ASCOT-Carer measure.

The significant position effect we found for the best choices is in line with the English [19] and Austrian [30] studies. Position bias affects choice behaviour and decision rules and can result in invalid coefficient estimates [34]. To mitigate it, the positions of the attributes in the BWS profiles should be rotated so that each attribute appears an equal number of times in each position, as noted earlier by Campbell and Erdem [58]. In addition to randomisation at the experimental design stage, researchers can include position-specific constants in the model to account for the order in which attributes are presented within the profiles.

The significant scale factors found in this study suggest that researchers should account for scale heterogeneity, because error variance that varies across population groups can distort preference estimates [47]. This also calls for approaches that disentangle scale heterogeneity from taste heterogeneity in order to estimate people's preferences accurately [59]. This, in turn, supports our approach of first investigating taste heterogeneity (using the mixed logit with observed respondent characteristics) and then scale heterogeneity after controlling for taste heterogeneity (using the G-MNL), before obtaining the final preference estimates from the S-MNL model.

Education and health, as scale factors, are known to be related to cognitive functioning [42, 60]. Besides indicating the use of heuristics to make quick choices [61], short response times may reflect reduced effort to engage in the BWS tasks or to consider the available alternatives properly. The evidence of a learning effect in the sequential BWS choice experiment is consistent with previous choice experiment studies [52, 62]. Because the experiment comprised two identical sequences of four BWS tasks, the finding implies more consistent responses in the second half of the experiment than in the first. We also tested other sequential divisions of the BWS choice tasks as scale factors (1 vs. 7 tasks, 2 vs. 6 tasks and 3 vs. 5 tasks), but none was statistically significant. The learning effect implies that future studies collecting and using sequential choice data should adopt designs that reduce possible preference uncertainty at the beginning of the experiment and increase respondent engagement throughout. With respect to scale heterogeneity, researchers can account for the effect of learning and fatigue on preference estimates by explicitly modelling learning or fatigue as a scale factor based on the sequence of the BWS tasks.

There is evidence that some modes of survey administration, such as Internet-based surveys, may produce stronger fatigue effects and weaker learning effects [36]. Although our BWS experiment was Internet based, we found a learning effect. No marked differences in SCRQoL preferences were observed between two models estimated from online BWS data and from face-to-face interview data [63]. A full analysis of the patterns of learning and fatigue is beyond the scope of this study. However, it may be worth investigating in more detail how learning affects preference stability and how learning styles and preference uncertainty vary between individuals [34]. The learning effect found here suggests that such investigations could be extended to the BWS method under different survey administration modes.

This study contributes to expanding the number of valid measures that can be used to evaluate capability-based QoL in a general population [37] and to extending the evaluation of outcomes and interventions beyond health [64, 65]. Because the ASCOT [16] measures care recipients' SCRQoL and the ASCOT-Carer [19] measures caregivers' SCRQoL, both measures can be used to evaluate social care interventions. Finnish preference weights for the ASCOT measure have already been established [29].

Our study has some limitations. Despite the use of sampling quotas, the online panel was not fully representative with respect to housing tenure, education and religion. However, we adjusted the preference weights to better reflect the values of the Finnish general adult population, as was done in previous studies [19, 30], and in addition we computed the standard errors of the adjusted final preference estimates, which those studies did not. With the survey administration method used, we could not monitor external and internal influences during the BWS experiment, such as respondents' behaviour, engagement and burden, or changes in the task environment. Nevertheless, respondents who spent less than 4.5 min on the BWS tasks were excluded during the data collection phase.

We have derived Finnish preference weights for the Finnish ASCOT-Carer measure. These preference weights will, for the first time, enable researchers in Finland to assess the value of different social care interventions when evaluating support and services for informal carers. The learning effect, one of the significant scale factors, underlines the importance of accounting for scale heterogeneity in choice experiments. It also implies that future studies with sequential choice tasks should use designs that ensure equal consideration of all choice tasks (profiles), in order to reduce potential preference uncertainty at the beginning of the experiment and increase respondent engagement throughout.