Introduction

Complex decisions, such as selecting a job candidate or a vacation package, are among the most demanding and challenging human activities. A major cause of this challenge is the presence of trade-offs between attributes (e.g., intelligence vs. motivation for job candidates) that are difficult to compare. While a normative theory, based on weighted additive utility (WADD), was developed by early decision theorists (Keeney & Raiffa, 1976), a widely accepted view holds that the computations required for the normative WADD algorithm are too complex for online human decisions (i.e., decisions not assisted by offline calculations or external aids). Accordingly, it is often assumed that when faced with such decisions, humans typically resort to a number of simplifying non-compensatory heuristics, such as Take-the-Best (TTB), according to which one chooses on the basis of the most important attribute (in case of a tie, the second most important attribute is considered; Gigerenzer & Goldstein, 1996, 1999; Payne, Bettman, & Johnson, 1993; but see Newell, 2005, for a critique of this approach and suggestions of formal models of ecological rationality). Such heuristics simplify the decision algorithm by replacing compensatory processes – in which all the attributes are weighted into the decision – with non-compensatory ones, in which only a small subset of the attributes is taken into account (Dieckmann & Rieskamp, 2007; Gigerenzer & Goldstein, 1999; Tversky, 1969, 1972).
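
To make the contrast concrete, the following minimal sketch (in Python) implements both rules; the ratings and weights below are hypothetical, chosen so that the two rules disagree:

```python
import numpy as np

def wadd_choice(a, b, weights):
    """Compensatory WADD rule: pick the option with the higher weighted average."""
    w = np.asarray(weights, dtype=float)
    return "A" if np.dot(a, w) / w.sum() > np.dot(b, w) / w.sum() else "B"

def ttb_choice(a, b, weights):
    """Non-compensatory Take-the-Best: walk down the attributes in order of
    importance and decide on the first one that discriminates."""
    for i in np.argsort(weights)[::-1]:          # most important attribute first
        if a[i] != b[i]:
            return "A" if a[i] > b[i] else "B"
    return "A"                                    # complete tie: arbitrary choice

# Hypothetical trial on which the two rules disagree:
weights = [4, 3, 2, 1]
a, b = [5, 4, 7, 6], [5, 6, 2, 1]                # weighted averages: 5.2 vs. 4.3
print(wadd_choice(a, b, weights), ttb_choice(a, b, weights))   # -> A B
```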

Recent research has challenged the assumption that compensatory strategies are too complex and thus beyond everyday decision-making ability. First, numerous studies in the domain of probabilistic inference with binary cues have shown that even when environments are designed to promote the use of the TTB heuristic, a significant proportion of participants do not “take the best” (e.g., Bröder, 2000; Lee & Cummins, 2004; Newell & Shanks, 2003). Second, more recent experimental work has demonstrated that most participants make probabilistic inferences based on multiple cues in a compensatory yet rapid and automatic manner (Glöckner & Betsch, 2008, 2012; Glöckner, Hilbig, & Jekel, 2014). Other research has manipulated time pressure, confirming the presence of compensatory strategies under a 3-s response deadline and, for some participants, even under a strict deadline of 750 ms (Oh et al., 2016).

A mechanistic account of such an automatic yet compensatory decision process was proposed by Glöckner and colleagues in the form of the Parallel Constraint Satisfaction (PCS) model (Glöckner et al., 2014). PCS is a connectionist, accumulator-type model that integrates (using a parallel architecture) differences in weighted evidence (ΔWA) between the alternatives and predicts slower reaction times (RTs) for decisions with smaller ΔWA (Glöckner & Betsch, 2008, 2012; see also Roe, Busemeyer, & Townsend, 2001, for an accumulator-type Decision Field Theory model of multi-attribute decisions with similar RT predictions).
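
The shared RT prediction of accumulator-type models can be illustrated with a toy diffusion-style sketch (this is not the PCS network itself; the bound, noise level, and time step below are arbitrary illustrative values) in which the drift rate is proportional to ΔWA, so smaller differences yield slower decisions on average:

```python
import numpy as np

rng = np.random.default_rng(0)

def accumulate(delta_wa, bound=3.0, noise=1.0, dt=0.01):
    """Toy accumulator: evidence drifts at a rate proportional to ΔWA plus
    Gaussian noise; a response is made when either bound is crossed."""
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += delta_wa * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("A" if x > 0 else "B"), t

for dwa in (0.3, 0.9, 2.0):
    rts = [accumulate(dwa)[1] for _ in range(1000)]
    print(f"ΔWA = {dwa}: mean decision time ≈ {np.mean(rts):.2f} (a.u.)")
```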

In this paper we demonstrate an ability to make complex decisions using compensatory, rapid, and automatic mechanisms in a different domain: multi-attribute decision-making based on numerical (non-binary) attributes. Such decisions normatively require a weighted-averaging computation, traditionally associated with analytical processes. Moreover, multi-attribute decisions with non-binary attributes have received less attention in recent research (see Russo & Dosher, 1983, and Tversky, 1969, for some older studies), and they differ from binary-cue decisions in a number of important respects (the problem space is virtually infinite and precludes the use of simplifying strategies, such as memorizing or counting). Therefore, if rapid and compensatory (WADD) strategies can be deployed in this domain, this would provide support for the impressive power of the intuitive decision-maker.

Recent research has shown that an important precursor of WADD – numerical averaging – can be estimated in a relatively precise and yet automatic manner (Brezis, Bronfman, Jacoby, Lavidor, & Usher, 2016; Brezis, Bronfman, & Usher, 2015; Rusou, Zakay, & Usher, 2017). Here we set out to test whether this ability extends to weighted averaging, by employing a job-selection multi-attribute decision task.

Experiment

Participants were asked to take the role of a job interviewer who chooses one of two candidates based on the candidates’ abilities on several attributes and their relative importance. We varied (in blocks) the number of attributes (three/four/five), and we presented a large set of decision problems with randomized values (see Methods). This design allows us to contrast decision strategies within each participant using choices and RTs. While the TTB heuristic predicts slower decisions in cases where there is a tie on the most important attribute, PCS (or other accumulator models) predicts decision times that increase as ΔWA decreases. Another central question of interest is whether the deployment of compensatory strategies results in improved task performance.

Method

Participants

Twenty-six students from Tel-Aviv University (14 females, age: 19–31 years, M=24.7) participated in the experiment, in exchange for course credit and payment that depended on performance. On average, participants received 30 NIS (~7.5 USD). The sample size was set at 26, with each participant tested in three tasks, yielding 78 (26 × 3) strategy classifications in total (see Strategy Classifications section).

Materials

Each decision was presented in a table format (see Table 1). Three jobs were presented, with three, four, and five attributes, respectively. Each job specified the attributes' importance (i.e., weight; see Table 1). When the job had three attributes, the specified importance weights were 3, 2, and 1 (i.e., the most important attribute was three times more important than the least important one); for the four-attribute job they were 4, 3, 2, and 1, and for the five-attribute job 5, 4, 3, 2, and 1. The values the candidates received on each trial were generated as random integers between 1 (poor) and 9 (excellent), drawn from a uniform distribution; if the resulting weighted averages of the two candidates were equal, the ratings were generated anew.
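
A sketch of this generation procedure (assuming, as described, uniform integer ratings and rejection sampling of exact weighted-average ties; the function name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_trial(weights):
    """Draw the two candidates' ratings as uniform integers in 1..9 and
    resample whenever the weighted averages tie exactly."""
    w = np.asarray(weights, dtype=float)
    while True:
        a = rng.integers(1, 10, size=len(w))     # 1 (poor) .. 9 (excellent)
        b = rng.integers(1, 10, size=len(w))
        if a @ w != b @ w:                       # reject weighted-average ties
            return a, b

a, b = make_trial([3, 2, 1])                     # a three-attribute trial
```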

Table 1 Example of a trial in which the job had four attributes, with weights of 4, 3, 2, and 1. Here candidate A had the higher weighted average (5.2 vs. 4.3 for candidate B) and so she should be selected for the job, whereas candidate B should be selected according to the TTB heuristic

A time limit for providing an answer was imposed, in order to encourage participants to rely on their intuitive mind-set rather than explicitly compute weighted averages (Horstmann, Hausmann, & Ryf, 2010). The time limit increased with the number of attributes to ensure that all the information could be encoded: 3 s for the three-attribute, 4 s for the four-attribute, and 5 s for the five-attribute jobs. As we report below, however, the time constraints did not affect the actual decision times.

Procedure

Participants completed 600 trials overall, in three blocks of 200 trials, one for each job (three, four, or five attributes). On each trial, a choice problem (see Table 1) was presented until the participant entered a decision using the keyboard. Visual feedback (correct/incorrect) was given after each trial, based on the weighted averages. Feedback was also given on the number of correct trials the participant had accumulated, which was translated into a monetary reward at the end of the experiment. If the time limit expired, the trial ended and the visual feedback “too slow” was presented. The whole procedure took approximately 60 min (see Supplementary Material for details).

Results

Group analyses

Accuracy

To test the effects of difficulty and task practice (defined across four 50-trial sub-blocks) on task accuracy, a repeated-measures ANOVA was carried out, with number of attributes (three/four/five) and sub-block as within-subject factors. As shown in Fig. 1 (solid lines), there was a main effect of difficulty (F(2,50)=37.39, p<.001): as the number of attributes increased, accuracy dropped from 90% for three attributes to 86% for four and 84% for five. No main effect of sub-block emerged (F(3,75)=1.16, p=.329), indicating that extensive learning is not necessary. For each number of attributes, task accuracy was higher than the bound obtained from an error-less version of the TTB heuristic (dashed lines), F(1,25)=74.29, p<.001.
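
The TTB accuracy bound can be estimated by simulation: score an error-free TTB chooser against the weighted-average criterion over random trials generated as in the Methods. A sketch (the trial count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def ttb_accuracy_bound(weights, n_trials=100_000):
    """Monte-Carlo accuracy ceiling of an error-free TTB user, scored against
    the weighted-average (WADD) criterion used for feedback."""
    w = np.asarray(weights, dtype=float)
    order = np.argsort(w)[::-1]                  # most important attribute first
    hits, n = 0, 0
    while n < n_trials:
        a = rng.integers(1, 10, size=len(w))
        b = rng.integers(1, 10, size=len(w))
        if a @ w == b @ w:
            continue                             # the experiment resamples WA ties
        ttb_picks_a = next((a[i] > b[i] for i in order if a[i] != b[i]),
                           rng.random() < 0.5)   # guess on a complete tie
        hits += ttb_picks_a == (a @ w > b @ w)
        n += 1
    return hits / n

for w in ([3, 2, 1], [4, 3, 2, 1], [5, 4, 3, 2, 1]):
    print(w, round(ttb_accuracy_bound(w), 3))
```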

Fig. 1 Task accuracy. Solid lines: accuracy as a function of the number of attributes and of trial number (in 50-trial sub-blocks). Dashed lines: theoretical performance of the TTB heuristic. Error bars represent within-subject standard errors (Cousineau, 2005)

Reaction times

Mean RT and the average number of trials (out of 200) in which the time limit was missed are given in Table 2.

Table 2 Mean reaction time (RT) (standard deviations in parentheses) and average number of trials in which the time limit was missed (out of 200), for each number of attributes

A repeated-measures ANOVA, with number of attributes (three/four/five) as a within-subject factor, revealed no effect of number of attributes, F(1.25, 31.23)=0.056, p=.865. Thus, although the task's difficulty increased and more information had to be considered, participants did not require more time. The time limit was missed on only 0.6% of all trials, and the average decision time was around 1.5 s (see also Glöckner & Betsch, 2008, for similar results).

We also examined the correlation between RT and task accuracy. For each participant, we calculated the correlation between RT and accuracy across all trials, separately for each number of attributes. The resulting mean correlations were negative, as predicted by the automatic WADD mechanism: r = -0.199 for three attributes, r = -0.203 for four attributes, and r = -0.172 for five attributes. Interestingly, the negative correlations remained even after controlling for trial difficulty: the partial correlations between RT and accuracy were r = -0.102 for three attributes, r = -0.084 for four attributes, and r = -0.062 for five attributes. While these partial correlations are small, they are all significantly different from 0 (all p’s<.05, tested using a bootstrap procedure with 10,000 resamples) and in the same direction, suggesting that longer decisions tended to be less accurate.
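
The bootstrap is reported only at a high level; one plausible implementation (assumed here, with trial difficulty indexed by ΔWA) residualizes RT and accuracy on difficulty and resamples trials with replacement:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

def bootstrap_partial_corr(rt, acc, dwa, n_boot=10_000, seed=3):
    """Percentile bootstrap (resampling trials with replacement) for the
    RT-accuracy partial correlation, controlling for difficulty (ΔWA)."""
    rng = np.random.default_rng(seed)
    n = len(rt)
    rs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        rs[i] = partial_corr(rt[idx], acc[idx], dwa[idx])
    # H0 (zero partial correlation) is rejected at p < .05 if 0 falls
    # outside the 95% percentile interval
    return partial_corr(rt, acc, dwa), np.percentile(rs, [2.5, 97.5])
```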

Strategy classifications

We next examined individual differences in decision strategies. We tested three potential strategies that participants could use when performing the task: the weighted-average WADD strategy, TTB, and the Equal Weights rule combined with the TTB heuristic (EQW-TTB). According to the EQW-TTB strategy, one chooses the alternative with the highest non-weighted average (i.e., one averages the values but ignores the importance weights); in cases where the non-weighted averages of the two alternatives are equal, one chooses according to the TTB heuristic – hence the name EQW-TTB.
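
A minimal sketch of the EQW-TTB rule as just described:

```python
import numpy as np

def eqw_ttb_choice(a, b, weights):
    """EQW-TTB: choose by the unweighted mean of the ratings; fall back on
    Take-the-Best when the unweighted means tie."""
    a, b = np.asarray(a), np.asarray(b)
    if a.mean() != b.mean():
        return "A" if a.mean() > b.mean() else "B"
    for i in np.argsort(weights)[::-1]:          # TTB fallback, most important first
        if a[i] != b[i]:
            return "A" if a[i] > b[i] else "B"
    return "A"                                    # complete tie: arbitrary choice
```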

We start with a simplified “trembling-hand” approach (Bröder, 2010), according to which the participant has a probability p of mistakenly reporting an alternative not predicted by the choice strategy. We use this approach in order to obtain an upper bound on the proportion of TTB use (more refined classifications, allowing probabilistic errors and strategy mixtures, are deferred to the computational-modeling section).

The classifications were done separately for three/four/five attributes, to test whether increased difficulty leads to more reliance on non-normative strategies. To classify the participants with respect to the three strategies – WADD, TTB, and EQW-TTB – we computed the probability of the data (200 choices) under each strategy and selected the strategy with the highest probability (see Supplementary Material for details of the classification procedure). The classification results are shown in Table 3 (see Table S1 in the Supplementary Material for individual classifications and Tables S2, S3, and S4 for the normalized probabilities of the three strategies); 82% of the classifications (64 out of 78) are associated with a normalized probability larger than .99, and 88% (69 out of 78) with a probability larger than .90.
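
The full procedure is in the Supplementary Material; the general logic under the trembling-hand assumption is to set each strategy's error rate to its maximum-likelihood value and keep the strategy with the highest data likelihood. A sketch (the cap on p and the function names are illustrative):

```python
import numpy as np

def trembling_hand_loglik(predicted, observed, p_max=0.5):
    """Log-likelihood of the observed choices given a strategy's predictions,
    with each prediction flipped ('trembling hand') with probability p;
    p is set to its maximum-likelihood value k/n, capped at p_max."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    n, k = len(observed), int(np.sum(predicted != observed))
    p = min(k / n, p_max)
    eps = 1e-12                                  # guard log(0) for a perfect fit
    return k * np.log(p + eps) + (n - k) * np.log(1 - p + eps)

def classify(observed, predictions):
    """predictions: dict mapping strategy name -> array of predicted choices."""
    ll = {s: trembling_hand_loglik(pred, observed)
          for s, pred in predictions.items()}
    return max(ll, key=ll.get)
```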

Table 3 Number of participants classified as users of each one of the three strategies (WADD, TTB, EQW-TTB) as a function of the number of attributes

As shown in Table 3, the majority of classifications (46 out of 78, ~59%) correspond to the compensatory (normative) WADD strategy, and another 8% (six in total) to the less optimal but still compensatory EQW-TTB strategy. Only 29% of the classifications (23 in total) fell into the non-compensatory TTB category. The number of WADD classifications did not vary with the number of attributes. A summary of participants’ accuracy as a function of strategy is shown in Fig. S1 (see Supplementary Material). We find that users of the WADD strategy had higher accuracy than TTB users, t(67)=3.08, p=.003. As reported in the Supplementary Material, this is not due to a speed–accuracy trade-off.

Attributes’ weights

Using logistic regression, we computed the subjective weights each participant gave to each of the attributes. Figure 2 shows these subjective weights for the three strategy subgroups (see Fig. S2 in the Supplementary Material for the group weights). The weights indicate that WADD users are better calibrated with the objective weights, whereas the TTB users strongly overestimate the most important attribute, confirming their reliance on a single dimension. A repeated-measures ANOVA on the attribute weights of the TTB and WADD users revealed an interaction between strategy and attribute weight for every number of attributes – three attributes: F(2,42)=21.36, p<.001; four attributes: F(3,63)=13.93, p<.001; five attributes: F(4,92)=15.40, p<.001. The EQW-TTB users showed the flattest curves, consistent with the equality of weights characterizing this strategy.
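
The exact regression specification is not spelled out in the text; a common recipe, assumed here, predicts the choice of candidate A from the attribute-wise rating differences and normalizes the fitted coefficients to obtain relative subjective weights:

```python
import numpy as np
import statsmodels.api as sm

def subjective_weights(ratings_a, ratings_b, chose_a):
    """Estimate subjective attribute weights by logistic regression of the
    choice (A = 1, B = 0) on the attribute-wise rating differences."""
    X = sm.add_constant(np.asarray(ratings_a) - np.asarray(ratings_b))
    beta = sm.Logit(np.asarray(chose_a, dtype=int), X).fit(disp=0).params[1:]
    return beta / beta.sum()                     # normalize to sum to 1
```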

Fig. 2 Subjective weights for jobs with three (left), four (middle), and five (right) attributes, classified by strategy used. Error bars correspond to standard errors

Reaction times: strategies and correlation with accuracy

The WADD and TTB strategies differ in their predictions concerning RT (Glöckner & Betsch, 2008). According to TTB, RT should depend on whether there is a tie on the most important dimension; according to WADD, RT should depend on the absolute difference between the alternatives' weighted averages (ΔWA). To test this prediction, we fitted a multiple linear regression to each participant's log-RT data (log-transformed to normalize the otherwise skewed RT distribution, which may involve outliers; see also Glöckner & Betsch, 2008), with two predictors: (i) ΔWA, and (ii) a binary predictor coding a tie on the most important attribute (see Table 1 for an example in which the tie variable equals 1 and ΔWA=0.9). We compared the standardized regression coefficients of the participants classified as WADD users with those of the participants classified as TTB users. As predicted, the difficulty coefficient was stronger for the WADD users (M = -0.43, SD = 0.12) than for the TTB users (M = -0.32, SD = 0.12; t(67)=3.56, p<.001; Fig. 3, left), while the tie coefficient was higher in magnitude for the TTB users (M = 0.06, SD = 0.14) than for the WADD users (M = -0.02, SD = 0.09; t(31.8)=2.31, p=.027; Fig. 3, right). Unlike for the TTB users, for the WADD users the tie coefficient was not significantly different from zero.
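
A sketch of this per-participant regression (z-scoring the outcome and both predictors so that the fitted coefficients are the standardized betas compared above; the function name is illustrative):

```python
import numpy as np
import statsmodels.api as sm

def rt_regression(rt, delta_wa, tie_on_best):
    """Regress log-RT on |ΔWA| (difficulty) and on a binary tie-on-the-most-
    important-attribute predictor; z-scoring yields standardized betas."""
    z = lambda v: (v - np.mean(v)) / np.std(v)
    y = z(np.log(rt))
    X = sm.add_constant(np.column_stack(
        [z(delta_wa), z(np.asarray(tie_on_best, dtype=float))]))
    beta = sm.OLS(y, X).fit().params
    return beta[1], beta[2]                      # (difficulty beta, tie beta)
```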

Fig. 3 Averaged standardized regression coefficients for difficulty (ΔWA; left panel; for graphical purposes the negatives of the difficulty coefficients are plotted) and for the tie/no-tie predictor (right panel; whether there was a tie on the most important dimension), separately for participants classified as users of the WADD and TTB strategies. Error bars represent standard errors

Computational-models of strategy choice: beyond the trembling hand

While the trembling-hand classifications appear to have some validity, as they are supported by differences in subjective weights (Fig. 2) and in RTs (Fig. 3), there are a number of reasons to suspect that these classifications are a simplification and that participants vary in a more continuous, non-dichotomous manner. First, the weights show individual variability even within a strategy group, indicating continuity rather than dichotomous strategies. Second, even for participants classified as TTB, we obtain a WADD component in the RT regression (Fig. 3). Finally, as recently discussed by Hilbig and Moshagen (2014), the trembling-hand type of error is not well matched with the natural assumptions of a WADD model, according to which choice problems with lower ΔWA are expected to produce more errors. In order to extend the strategy classification to address these issues, we examined a number of computational models and carried out model comparison using the aggregate Akaike Information Criterion (AIC; Akaike, 1973) and Bayesian Information Criterion (BIC; Schwarz, 1978). As there are only six EQW classifications in our data (out of 78), we discard these and focus on contrasting WADD and TTB.

Three new models were examined: (i) a probabilistic model, which on each trial deploys WADD with probability p and TTB with probability (1-p) – while this model assumes a trembling-hand error (as before), p provides a continuous measure of the degree of WADD use; (ii) a model like (i), except that the WADD errors are not due to a trembling hand but are assumed to reflect Gaussian fluctuations in the WADD estimation (with zero mean and an SD that is a new model parameter), while a trembling-hand parameter is retained for errors of the TTB heuristic (see Lee & Newell, 2011; Scheibehenne, Rieskamp, & Wagenmakers, 2013, for similar approaches to mixture models in decision-making); and (iii) a fully compensatory model whose weights are characterized by a single parameter, α, based on the normalized weights W_i^α (where the W_i are the normative weights, e.g., 4, 3, 2, 1); note that α>1 results in over-weighting of the high weights and under-weighting of the low weights, as in some versions of PCS (Glöckner et al., 2014). For each model, the parameters were fitted by maximizing the probability of the data given the model (see Supplementary Material). The results are summarized at the group level in Table 4 (see Supplementary Material for individual participants’ classifications).
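
The exact likelihoods are given in the Supplementary Material; the sketch below shows one natural reading of model (ii), in which the Gaussian noise (SD σ) is placed directly on the weighted-average difference and TTB errors follow a trembling hand (rate ε), fitted by maximum likelihood with AIC = 2k + 2·NLL (starting values and bounds are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, dwa_signed, ttb_says_a, chose_a):
    """Mixture model (ii): each trial uses WADD with probability p (Gaussian
    noise on the signed weighted-average difference) and TTB with
    probability 1-p (trembling-hand error rate eps)."""
    p, sigma, eps = params
    p_a_wadd = norm.cdf(dwa_signed / sigma)          # P(choose A | WADD)
    p_a_ttb = np.where(ttb_says_a, 1 - eps, eps)     # P(choose A | TTB)
    p_a = p * p_a_wadd + (1 - p) * p_a_ttb
    lik = np.where(chose_a, p_a, 1 - p_a)
    return -np.sum(np.log(np.clip(lik, 1e-12, None)))

def fit_mixture(dwa_signed, ttb_says_a, chose_a):
    res = minimize(neg_loglik, x0=[0.7, 1.0, 0.1],
                   args=(dwa_signed, ttb_says_a, chose_a),
                   bounds=[(0.0, 1.0), (1e-3, 10.0), (1e-3, 0.5)])
    return res.x, 2 * len(res.x) + 2 * res.fun       # (parameters, AIC)
```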

Table 4 Akaike Information Criterion (AIC) values for models of strategy choice

We observe a clear picture. The single-strategy trembling-hand models provide the worst fit, followed by the compensatory α-weight model, and then by the probabilistic WADD/TTB strategy-mixture model. The best model by far is the mixture model with Gaussian WADD errors. Importantly, the proportion of WADD use in this model is highly consistent within participants across the number of attributes (see Supplementary Material). Furthermore, we also find high correlations between individual participants' proportion of WADD use in the probabilistic model and their subjective α-weights (all |r|s > .75, p<.001; see Supplementary Material).

Discussion

In this study, we asked whether participants can rapidly carry out a complex weighted-averaging task. We used a job-interview framing and provided the participants with accuracy feedback. The number of attributes varied from three to five, a range exceeding the capacity of online analytical computations for speedy decisions. The results were surprising. First, the decision times (mean RT ~1.5 s, including visual encoding and the motor response) were much faster than the maximum allotted time, indicating reliance on intuitive gist-perceptions or heuristic rules (Saunders & Buehner, 2013). Second, despite the short RTs, accuracy exceeded the bound that could be obtained on the basis of (error-less) non-compensatory strategies, such as TTB. Third, we found a negative correlation between accuracy and decision time, consistent with evidence-integration models, such as those based on Decision Field Theory (Roe et al., 2001), the drift-diffusion model (Krajbich, Armel, & Rangel, 2010), or PCS (Glöckner et al., 2014). These results are consistent with those obtained by Glöckner and Betsch (2008, 2012) in a multi-cue probabilistic inference task, and with their proposal of an automatic compensatory mechanism.

We examined two simplifying heuristics: the first one (TTB) is non-compensatory, while the second one (EQW-TTB) is compensatory but neglects the relative importance of the decision attributes. While the group-level performance exceeded the accuracy bound achievable by an error-free TTB heuristic, at the individual-participant level we found a certain amount of variability. The simplified (dichotomous) classification showed that, while most participants relied on compensatory strategies, about 30% relied on the TTB heuristic. These participants were characterized by a peaked decision-weight pattern that overestimates the most important attribute (Fig. 2) and by reduced task accuracy (without a speed–accuracy trade-off); they were also slower on trials with a tie on the most important dimension (Fig. 3, right panel). Unlike TTB users, most participants appeared to deploy a compensatory strategy that is likely to involve a noisy estimation of the weighted average (WADD; see also Glöckner & Betsch, 2008). The more refined (mixture) classifications indicate a continuum in participants' probability of deploying a compensatory WADD strategy on each trial, ranging from a minimum of .23 to a maximum of 1.

We suggest that the presence of variability in decision strategies across the group reflects two potential ways of dealing with time pressure and information overload in decision-making. The non-compensatory TTB heuristic is a lexicographic strategy that applies rules sequentially and neglects much of the information. Automatic and compensatory (WADD) strategies offer an alternative way to deal with information overload: instead of “calculating” the weighted average, these participants appear to carry out an “approximate” (noisy) but holistic estimation of it, consistent with an affective/intuitive decision mode (Kahneman, 2003). In particular, intuitive/holistic averaging is consistent with Kahneman's (2003) suggestion that intuitive processes are holistic in nature and operate at the interface of perception and cognition (see also Glöckner & Betsch, 2008). This suggestion is also supported by recent empirical results showing dissociations between intuitive and analytical averaging based on load manipulations (Rusou et al., 2017).

A potential mechanism for performing noisy weighted-averaging estimations is Glöckner and colleagues' PCS model (Glöckner, Hilbig, & Jekel, 2014). According to this model, the weighted average is computed in a neural network that multiplies a values vector with an importance-weights matrix. As our task involves some practice, the assumption that the decision mechanism includes learned weights (reflecting the attributes' importance) is not implausible. Alternatively, the mechanism of weighted averaging could be mediated by a population-code model (Brezis et al., 2015, 2016; Brezis, Bronfman, & Usher, 2017), which operates using numerosity detectors (Dehaene, Molko, Cohen, & Wilson, 2004; Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004). Future research is needed to probe the nature of the individual differences underlying the reliance on sequential versus holistic processing in multi-attribute decisions.