1 Introduction

Theoretical and empirical research underlines the complex nature of negotiation (Fisher et al. 2011; Lewicki et al. 2011; Starkey et al. 2005; Thompson 2015). Its complexity results from a number of factors that comprise the negotiation problem and context, including behavioural, demographic, sociological, economic, and political ones. These factors need to be considered by the parties when they prepare for and conduct negotiations.

To make negotiation easier, various support methods have been developed. Some methods are procedural in character and help to organize and coordinate pre-negotiation activities, the actual conduct of negotiation, and/or post-negotiation activities. These include a recommended pre-negotiation checklist (Simons and Tripp 2003) or a chess strategy for building the bargaining approach (Perrotin and Heusschen 2002). Others focus on the decision-making aspects of negotiation and provide the parties with tools that facilitate preference analysis and the search for satisfying agreements (Raiffa et al. 2002). From the economic viewpoint, the latter approach to negotiation support seems to be of crucial importance. It allows for the measurement of the value (utility) of the negotiation outcomes, evaluation of the parties’ performance, and analysis of the agreement’s efficiency. This support is the domain of negotiation analysis, a theoretical approach developed in the early 1980s (Raiffa 1982; Young 1991) that focuses on designing decision-analytic techniques.

The core of the negotiation support models provided by negotiation analysis is a preference elicitation procedure. The procedure is used to determine the rating (part-worth) of every negotiation issue and of every option, which allows a number to be assigned to every alternative, including offers and counteroffers. In effect, a scoring system defined by a series of marginal utility (value) functions is specified. Usually, multiple criteria decision aiding (MCDA) methods are used to elicit the preferences and build the scoring systems (Figueira et al. 2005; Wachowicz 2010). Additive models have been found to represent the decision-maker’s preferences adequately (Choo et al. 1999); their relative ease of use and the interpretability of their results led to the development of such methods as SMARTS (Edwards and Barron 1994), AHP (Saaty 1980), and hybrid conjoint analysis (Kersten and Noronha 1999). Some of these methods were implemented in teaching and training negotiation support systems (NSS), e.g., in Inspire (Kersten and Noronha 1999), Negoisst (Schoop et al. 2003), and NegoCalc (Wachowicz 2008), as well as in real-world e-commerce applications, e.g., OpenNexus (opennexus.pl) and SmartSettle (www.smartsettle.com) (Thiessen and Soberg 2003).

NSS must rely on methods that are cognitively undemanding and easy to use, and that can specify ratings relatively quickly and without tiresome interaction. This is important because ratings should be determined thoroughly and accurately, so that any misinterpretations of the negotiation progress, offers and concessions can be avoided (Jang et al. 2017). Diligence in the scoring system specification is even more important when agents conduct negotiations on behalf of principals (Eisenhardt 1989). If the agents’ ratings represent their principals’ preferences adequately, the agents may be sure that the decisions they make during the negotiation reflect the principals’ interests. If, however, the agents’ ratings poorly reflect the principals’ preferences, then the agents may achieve results that are unsatisfying or inefficient, or that even cause losses to the principals.

Agency theory is concerned with the specification of a reward structure that would motivate the agents to achieve the best possible outcomes for their principals (Bosse and Phillips 2016; Laffont and Martimort 2009). Behavioral studies show that agents have different social-psychological profiles which affect their intrinsic motivation (Bottom et al. 2006; Nilakant and Rao 1994). A problem that appears to be neglected in the literature is the agents’ ability to reconstruct the principals’ preferences from different representations of information. The purpose of this paper is to address the following two questions:

  1. Do different principal’s preference visualization schemes influence the accuracy of the agents’ ratings; and

  2. What is the impact of (in)accuracy of the representation of the principal’s preferences on the agreements negotiated by the agents and the agents’ perception of the negotiation process and these agreements.

There are a number of studies devoted to the problem of information representation and visualization in decision making (Korhonen and Wallenius 2008; Miettinen 2014). They indicate many ways in which the preferences may be presented by the principals and imparted to their agents. Some recent experiments show that the most accurate way of representing preferential information is to organize it in the form of a table (Roselli et al. 2018), which requires operating directly with quantitative information. However, if the principals were able to provide their agents with precise quantitative information, the latter would only have to copy this information into the scoring system, and any problems with mapping such information precisely into a scoring system would be related to the bounded awareness of the agents (Chugh and Bazerman 2007). These could be simple typos or errors related to change blindness or the need for closure.

If we assume that the preferences cannot be imparted by means of precise numbers due to the cognitive limitations of the principals, then visualizing them graphically can be considered an alternative way of describing their structures. The two most popular and simple forms of graphic visualization use bars and circles/pies. Historically, bars were considered more precise in presenting facts, and numbers in particular (Macdonald-Ross 1977). They are unidimensional (only the height matters) and do not cause the comparison problems that circles do (the latter are two-dimensional, and comparing areas or diameters may lead to different results). Yet recent experimental studies question their supremacy, showing that some interpretative problems occur also for bars and that this may affect the understanding of the problem and the quality of decisions in multiple criteria decision making (Kolodziej et al. 2016; Roselli et al. 2018). This shows the necessity of verifying how these two most popular ways of visualizing numbers perform in prenegotiation in providing the agents with accurate information about their principals’ priorities, how they affect the final ratings the agents use (the scoring systems they build), and how these scoring systems affect the proper understanding of the negotiation process and the quality of the outcomes the agents negotiate. The descriptive conclusions regarding the aforementioned issues comprise the research contribution of this manuscript and are novel for the theory of negotiation and negotiation analysis. Note that although some earlier research studied the issue of visualization in negotiation support (Gettinger et al. 2012; Weber et al. 2006), it focused on using different visualization techniques during the negotiation process. None of these studies focuses on the potential effects of misinterpreting the preferences visualized by the principals at a stage as early as prenegotiation, or on their impact on the preference elicitation results (the scoring systems) and the bargaining process.

To address the goal of this paper, electronic negotiation experiments were organized and conducted in the Inspire system with a predefined multi-issue bilateral business negotiation case. The datasets from the experiments were used to assess the agents’ ability to represent the principal’s preferential information accurately and to measure the degree of inaccuracy.

The paper consists of six more sections. In Sect. 2 we describe the variety of research devoted to the problem of information visualization in multiple criteria decision making and negotiation. Then, in Sect. 3, the issues related to negotiation support that are specific to the principal-agent context are discussed. We describe how negotiators build the scoring system and show how the scoring systems of agents are related to the principal’s preferences; the problem of imparting these preferences in two graphical ways is also presented. In Sect. 4 we discuss the issue of measuring the accuracy of the agent’s scoring systems with respect to the principal’s preferences. Then, in Sect. 5, the negotiation experiment is described. We briefly present two experimental setups that differ in the way the preferential information was imparted to the agents. Then we present the results on the scale of inaccuracy in these two setups and examine their influence on the negotiation process and results in Sect. 6. Discussion is provided in Sect. 7.

2 Data visualization in decision making and negotiation processes

Graphical visualization is widely used to represent information. The popular saying “A picture is worth a thousand words” refers to the fact that even a complex idea can be presented with just a single picture, which is more effective than a verbal description of this idea. Information visualization is also an important element of information presentation in multiple criteria decision making problems and decision support systems (Liu et al. 2014). Graphical representation of information helps the decision maker to capture similarities or differences more easily, as well as to understand the relationships between the alternatives. Several papers (Korhonen and Wallenius 2008; Miettinen 2014; Roselli et al. 2018) discuss the possible ways of visualizing a set of discrete alternatives graphically in the context of the multiple criteria decision making process.

Miettinen (2014) classified visualization tools into six classes: commonly known techniques involving bars, scatter plots or value paths; techniques using circles and polygons; icons; techniques based on hierarchical clustering; projection-based techniques; and others. She also analyzed the advantages and weaknesses of the presented tools and commented on their usability. She pointed out that “none of the graphical representations can be claimed to be better than the others but some fit certain problem settings better than the others. It is always good to leave the final decision to the decision maker who can select those visualizations (s)he is most comfortable with.”

DeSanctis (1984) and Vessey (1991) collected a number of studies from the literature showing that the effect of graphical and tabular information representation in decision making is inconsistent. Some results show that graphs performed better than tables, others show the converse, and still other studies showed no differences. Recently, Roselli et al. (2018) studied the potential of using graphical visualization in the FITradeoff Decision Support System (DSS) by undertaking an eye-tracking experiment and applying it to a particular decision problem. They used five types of visualization in the experiment: Bar Graph, Bubble Graph, Spider Graph, Table and Bar Graph with Table. The preliminary studies indicated that the use of tables led the participants to better answers than the other visuals.

However, Engin and Vetschera (2017) pointed out another issue that may play a role in understanding the visualization. They presented the results of an experiment in which the relationship between cognitive style and decision performance was analyzed when using tabular or graphical representations. They suggested that information representation in decision support needs to match not only the task characteristics but also the cognitive style of decision makers. The results “confirm that a mismatch between information representation and cognitive style indeed has effects that last beyond the solution of the current decision problem”.

In another paper (Gettinger et al. 2013), the impact of problem representation, its complexity, and user characteristics on a wide range of outcomes was studied. In a series of laboratory experiments, two visual representations, parallel coordinate plots and heatmaps, were compared to numerical tables using subjective as well as objective measures of the decision process and solution quality. The authors noticed that “different problem representations induce differences in the decision making process, these differences do not seem to have long term effects on either problem understanding or performance in an ex-post test”. They also found “considerable difference between objective characteristics of the decision process and its subjective evaluation by participants”.

The issues of information visualization were also studied in the negotiation context. Gettinger et al. (2012) presented the results of a laboratory experiment concerning the influence of information presentation in three alternative formats (table, history graph and dance graph) on the negotiators’ behavior and negotiation outcomes. The results show that graphical information presentation supports integrative behavior and the use of non-compensatory strategies. Again, the results are linked to some specific profiles that take into account the behavioral, negotiation and decision making characteristics of the negotiators. Weber et al. (2006) reported on the effects of using a graphical representation of the negotiation process in bilateral negotiations conducted via the Inspire system, compared to negotiations conducted using the same system without such a representation. It appeared that the negotiation history graph provides the negotiators with additional information, so that they need not engage in an extensive dialog with their counterparts. The authors pointed out that the history graph reduced the intensity of communication by 334 words. Finally, Kolodziej et al. (2016) examined in their experimental study the effects of priority awareness, i.e. the awareness of one negotiator about the priorities of their counterpart, on the negotiation outcomes. The priority awareness was created using bar charts. They noticed that although bar charts are quick to make and seem easy to understand, the ability to read and interpret even ordinary bar charts correctly is rather limited. It is not necessarily intuitive and has to be learned.

The aforementioned studies clearly show that information visualization, sometimes together with other factors, may play a role in understanding the decision making and negotiation process and the results the parties obtain. However, the sources of misinterpretation may lie not only in the visualization offered during these processes but also earlier, i.e. in prenegotiation, where the formal scoring systems are determined that are later used for negotiation support in the bargaining phase. Thus, the importance of this phase and the potential problems with adequate visualization of preferences in prenegotiation in the principal-agent context will be discussed in the next section.

3 Negotiation support in principal-agent context

3.1 Negotiation problem and the scoring systems

To support the decisions that negotiators make during the negotiation process, a thorough prenegotiation preparation is recommended (Stein 1989). In the prenegotiation phase the negotiation problem is identified and structured (Raiffa et al. 2002), which amounts to a precise definition of the negotiation issues and the possible solutions for these issues (options). Then a scoring system is built, which is a formal system of quantitative ratings describing the negotiator’s preferences for the set of negotiation issues and options, and which can be used to evaluate any negotiation offer in the forthcoming negotiation. To determine such a system, a preference elicitation process needs to be conducted according to a selected MCDA technique. Although the theory and practice of MCDA provide an extensive list of decision support methods (see Figueira et al. 2005; Tzeng and Huang 2011), their application to negotiation support is limited to only a few of them, such as AHP (Mustajoki and Hamalainen 2000), even swaps (Thiessen and Soberg 2003; Wachowicz 2008) or TOPSIS (Roszkowska and Wachowicz 2015a). However, the most popular approach to determining the negotiation offer scoring system derives from direct rating techniques such as SMARTS (Edwards and Barron 1994). In this paper we assume that the preferences are elicited by means of direct rating, that they are additive, and that all the issues (criteria) are preferentially independent. Consequently, the issue of offer incomparability does not occur.

To make the prenegotiation preference elicitation easier, negotiation problems are often discretized, which allows limited and countable sets of resolution levels to be defined for each negotiation issue. This seems feasible for qualitative issues but not for quantitative ones (such as price, time of delivery, percentage rates etc.), whose resolution levels are usually defined in the form of ranges. In such a situation, the negotiators are asked to specify only the salient options that represent the key reference values within each feasible range (Kersten and Noronha 1999). Hence, the structure of the negotiation problem \(P\) can be defined formally by means of the set \(F\) of \(n\) issues and the sets \(X_{j}\) of salient options for each negotiation issue \(j\)

$$P = \left\{ {F,\left\{ {X_{j} } \right\}_{j = 1, \ldots ,n} } \right\},$$
(1)

where \(F = \left\{ {F_{j} } \right\}_{j = 1, \ldots ,n}\), \(X_{j} = \{ {x_{k}^{j} }\}_{{k = 1, \ldots ,n_{j} }}\) and \(n_{j}\) is the number of options for issue \(j\).

The direct rating approach requires evaluating all the elements of the problem \(P\) at the disaggregated level, i.e. independently of the potential offers they may be part of. Thus, the negotiator assigns numerical scores to the issues to reflect their weights, and to each salient option \(x_{k}^{j}\) to describe its profitability. As a result, a scoring system is built that can be formally represented as

$$S = \left\{ {\left\{ {w_{j} } \right\}_{j = 1, \ldots ,n} ,\left\{ {V_{j} } \right\}_{j = 1, \ldots ,n} } \right\}$$
(2)

where \(w_{j}\) is the weight of issue \(j\) and \(V_{j} = \left\{ {v( {x_{k}^{j} } )} \right\}_{{k = 1, \ldots ,n_{j} }}\) is the set of option ratings \(v( {x_{k}^{j} } )\) for all \(n_{j}\) options of issue \(j\).

Using the problem structure \(P\) and the scoring system \(S\), the salient negotiation offers (alternatives) can be built as combinations (Cartesian products) of the resolution levels \(x_{k}^{j}\), one for each negotiation issue, and then evaluated. The details of the offer evaluation depend on the nuances of the rating algorithm used. If we assume that the system \(S\) is normalized to the [0, 100] range of values, with \(\sum_{j = 1}^{n} w_{j} = 100\) and \(\max_{k} v( {x_{k}^{j} }) = w_{j}\) for each \(j\), then the rating of any salient offer \(A\) is determined as the sum of the ratings assigned to the options that make up this offer

$$V\left( A \right) = \mathop \sum \limits_{j = 1}^{n} \mathop \sum \limits_{k = 1}^{{n_{j} }} z_{k}^{j} \left( A \right) \cdot v( {x_{k}^{j} } ),$$
(3)

where \(z_{k}^{j} \left( A \right)\) is a binary variable indicating whether option \(x_{k}^{j}\) is part of the offer \(A\).
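The additive evaluation in Eq. (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two issues, their options and all ratings are hypothetical and do not come from the experiment, and the helper names (`issue_weights`, `offer_rating`, `salient_offers`) are introduced here for the example.

```python
from itertools import product

# Hypothetical scoring system S for a two-issue problem (illustrative numbers only).
# Each issue maps its salient options to ratings v(x_k^j), normalized so that the
# best option of issue j is rated w_j and the weights sum to 100.
ratings = {
    "price": {"low": 60.0, "medium": 35.0, "high": 0.0},
    "delivery": {"fast": 40.0, "slow": 0.0},
}


def issue_weights(ratings):
    """Under the normalization above, w_j is the highest option rating of issue j."""
    return {issue: max(options.values()) for issue, options in ratings.items()}


def offer_rating(offer, ratings):
    """Eq. (3): sum the ratings of the options that make up the offer.

    Looking up one option per issue plays the role of the binary variables z_k^j(A).
    """
    return sum(ratings[issue][option] for issue, option in offer.items())


def salient_offers(ratings):
    """Enumerate all salient offers as the Cartesian product of the option sets."""
    issues = list(ratings)
    for combo in product(*(ratings[issue] for issue in issues)):
        yield dict(zip(issues, combo))


weights = issue_weights(ratings)
best = max(salient_offers(ratings), key=lambda offer: offer_rating(offer, ratings))
```

In this sketch an offer is a mapping from each issue to exactly one chosen option, so the double sum of Eq. (3) reduces to one lookup per issue.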

Note that the option ratings defined for the salient options allow the scoring function to be built for the entire feasible range of resolution levels within the quantitative issues. It is assumed that the ratings \(v( {x_{k}^{j} } )\) are the scores at the breakpoints of a piecewise linear scoring function. Hence, the rating of any intermediate resolution level may be obtained by linear interpolation between the ratings of the two neighboring salient options (Goodwin and Wright 2004).
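A minimal sketch of this interpolation is given below. The breakpoints are hypothetical (a price issue with three salient options), the function name `interpolate_rating` is an assumption made for the example, and the salient resolution levels are assumed to be distinct.

```python
def interpolate_rating(level, breakpoints):
    """Rating of an intermediate resolution level on a piecewise linear scoring function.

    breakpoints: (resolution level, rating) pairs for the salient options of one
    quantitative issue; the levels are assumed to be distinct.
    """
    points = sorted(breakpoints)
    if not points[0][0] <= level <= points[-1][0]:
        raise ValueError("level lies outside the feasible range")
    # Find the two neighboring salient options and interpolate linearly between them.
    for (x0, v0), (x1, v1) in zip(points, points[1:]):
        if x0 <= level <= x1:
            return v0 + (v1 - v0) * (level - x0) / (x1 - x0)
    return points[0][1]  # only one breakpoint was supplied


# Hypothetical salient options for price: 10, 20 and 30 units, rated 60, 35 and 0.
price_points = [(10.0, 60.0), (20.0, 35.0), (30.0, 0.0)]
mid_rating = interpolate_rating(15.0, price_points)
```

A price of 15, halfway between the salient options 10 and 20, receives the rating halfway between 60 and 35.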

The scoring system may be used during the whole negotiation process to support various activities of negotiators. In the prenegotiation preparation phase, after the scoring system is built, the negotiator may use it for planning the concession strategy, i.e. the series of subsequent offers they would submit to the negotiation table during the bargaining phase (Morge and Mancarella 2009). In the actual conduct of negotiation, the scoring system allows the negotiation progress to be visualized by means of the negotiation history graph or negotiation dance graph (Gettinger et al. 2012). Negotiation support systems (NSS) operate with scoring systems to help the negotiators construct offers in the actual conduct of negotiation or to generate the offers themselves (Kersten and Noronha 1999; Schoop et al. 2003; Wachowicz 2008). Finally, the scoring systems of both negotiators may be applied to conduct a mutual symmetric analysis in the postnegotiation phase to verify the efficiency of the negotiated agreement, to improve it, or to suggest a fair bargaining solution if the parties were unable to reach one themselves (Brams 2003; Raiffa 1953). Therefore it seems extremely important to ensure that the scoring system reflects the negotiator’s preferences accurately; otherwise the entire support offered to them will mislead them and result in wrong decisions rather than help them achieve the best possible agreement.

3.2 Scoring systems of principals and agents

Using scoring systems for negotiation support becomes less trivial if we consider the specific negotiation context in which an agent negotiates on behalf of a principal. In such a context the principal has his own preferences and goals, which form his own scoring system \(S^{P}\); this system is seldom defined explicitly because of the principal’s lack of formal decision-making abilities or his cognitive limitations. Yet the principal communicates his preferences to the agent, who builds her own scoring system \(S^{A}\) that will be used later to support her decisions during the negotiation process.

Communication may introduce errors, so that the system \(S^{P}\) of scores, which describes the principal’s preferences, and the description that the principal communicates to the agent are different. This communicated description is the set of sentences, terms, graphics, etc., which the principal uses to impart his preferences to the agent. The relationship between \(S^{P}\) and the communicated description allows us to distinguish two situations:

  1. \(S^{P}\) and the communicated description are identical, i.e., the principal communicates his preferences using scores. All the agent needs to do is to copy the communicated scores into \(S^{A}\). A difference between the preference systems \(S^{P}\) and \(S^{A}\) suggests that the agent is either sloppy or dishonest.

  2. \(S^{P}\) and the communicated description differ, i.e., the principal communicates his preferences using terms that do not include scores. This requires the agent to interpret the description in such a way that she can construct \(S^{A}\). The difference between \(S^{P}\) and \(S^{A}\) depends on the precision with which the description is constructed and on its interpretation. The fewer interpretations of the description are possible and the more accurately it describes \(S^{P}\), the closer \(S^{A}\) is to \(S^{P}\). Thus, the difference between \(S^{P}\) and \(S^{A}\) may be due to the difficulty in understanding and interpreting the description. As in situation 1, sloppiness and dishonesty may also play a role.

If we assume, following some recent work by Hendry (2002), that it is not uncommon for principals and agents to be honest but to have limited competence and knowledge, the latter factor that may influence the differences between \(S^{P}\) and \(S^{A}\) may be ignored. Hence, the comparison of \(S^{P}\) and \(S^{A}\) can be used to:

  1. Determine how different communication schemes influence the final concordance of \(S^{P}\) and \(S^{A}\);

  2. Determine whether the agents’ individual characteristics lead to \(S^{P} = S^{A}\) for some agents but not for others; and

  3. Assess the impact of the discrepancy between \(S^{P}\) and \(S^{A}\) on the negotiation process and outcomes.

It is worth noting that although the relationship between the principal’s preferences and the preferences that the agent uses in her negotiation is important, it has not been well examined analytically. Kersten et al. (2010a) used an earlier Inspire dataset to compare the impact of preference impartation modes and analytical support on the negotiation outcomes. There were two modes of representation of the principal’s preferences: (1) verbal and graphical; and (2) verbal, graphical and numerical. The experimental results show that participants who received the numerical representation and used analytical support aids achieved significantly better outcomes than participants who did not have access to analytical support aids. When numerical information about the principal’s preferences was not available, analytical support aids made no difference to the negotiation outcomes. The authors did not directly compare the principal’s preferences with the preferences used by the agent. Instead, they “observed [that] differences between the elicited values and the principals’ values of agreements reached by subjects in T1 were normally distributed with mean 0 and an estimated standard deviation of 15.45 score points.” In the T1 treatment, the agents were given both verbal and graphical information about their principal’s preferences.

There are also a few earlier works in which the second issue, the influence of agents’ characteristics on the scoring system accuracy, was examined. The agents’ differences in negotiation profiles, determined by means of the Thomas–Kilmann Conflict Mode Instrument (Kilmann and Thomas 1977), were studied, but no binding conclusions could be drawn (Kersten et al. 2016). The correlation between the intensity of various conflict modes (such as accommodating, avoiding, compromising, collaborating and competitive) and the extent of the scoring system’s inaccuracy appeared to be very weak. The only significant outcome of the analysis was that the negotiators with an intermediate level of compromising behavior appeared to build more accurate scoring systems than others. When linking the results with our other experiments, we also found that the differences in scoring system accuracy may be related to the role the agent plays (buyer or seller in contract negotiation) or may be linked to the heuristic-based way of thinking they displayed during the prenegotiation preference elicitation stage.

A literature search shows no studies that extensively and comprehensively examine the three issues related to the comparison of \(S^{P}\) and \(S^{A}\) mentioned above. Because our limited datasets do not allow us to build sufficiently numerous clusters of agents with homogeneous cultural or demographic characteristics that could be used to draw sound conclusions supported by statistical analysis, in this work we focus on issues 1 and 3 only. We use two different setups of mixed verbal and graphical communication schemes and analyze how accurate the scoring systems were that the agents were able to build for the same negotiation problem. Then we analyze the impact of the inaccuracy between \(S^{P}\) and \(S^{A}\) on the agent’s perception of the negotiation process (interpreting the concession paths and the counterparts’ moves) as well as on the outcomes they obtain.

3.3 Bars and circles in visualizing the preferences

The use of charts in preference elicitation makes the process simpler and less cognitively demanding, but it raises the question of which type of chart should be employed. Studies on graph comprehension indicate a relationship between the types of graphs used, the graph’s purpose, and the users’ characteristics (Clark et al. 2004; McCrudden and Rapp 2017). There are many ways in which preferences may be visualized (Miettinen 2014); however, two of them are considered the classic ones: bars and circles. In the literature one can find some advice regarding the suitability of graphical elements for the representation of numerical quantities, but there is no strict consensus on which way of visualization is the best (Croxton and Stein 1932; Macdonald-Ross 1977; Spence and Lewandowsky 1991).

Over one hundred years ago, Brinton (1917) showed that circles drawn on a linear (diameter) basis or on an area basis cause the reader to misperceive the relative importance of the data. Circles compared on a diameter basis mislead the decision maker by causing him to overestimate the ratios, while circles compared on an area basis cause him to underestimate them. In four other papers the authors concluded that the relative values of circles are misperceived and that these mistakes are systematic (see Macdonald-Ross 1977). Therefore they proposed a formula which allows “psychologically correct” circles to be constructed. The correction formula, \(\text{Subjective area} = \text{Area}^{a}\), posits a power (exponent) relation between the perceived areas of circles and their physical areas, indicating that people increasingly underestimate area as the size of the circles increases. The exponent \(a\) derived in the four studies varies from 0.8 to 0.91.
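To make the size of this effect concrete, the relation in which the perceived area equals the physical area raised to the exponent \(a\) can be evaluated numerically. The sketch below is illustrative only: the function names are introduced here, and the example of a circle with four times the area of another is hypothetical; only the exponent range 0.8 to 0.91 comes from the text.

```python
def perceived_ratio(area_ratio, a):
    """Perceived ratio of two circle areas if subjective area = area ** a."""
    return area_ratio ** a


def corrected_area_ratio(intended_ratio, a):
    """Physical area ratio needed so that the perceived ratio equals intended_ratio."""
    return intended_ratio ** (1.0 / a)


# A circle with 4 times the area of another is perceived as roughly 3.0 times
# larger for a = 0.8 and roughly 3.5 times larger for a = 0.91.
low = perceived_ratio(4.0, 0.8)
high = perceived_ratio(4.0, 0.91)

# A "psychologically correct" circle has to be drawn larger than 4 times the area.
needed = corrected_area_ratio(4.0, 0.8)
```

Under these assumptions, the underestimation grows with the true ratio, which is consistent with the observation that larger circles are underestimated more strongly.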

Croxton and Stein (1932), summing up the results of experiments concerning the graphic comparison of bars, squares, circles and cubes, advised using bars rather than other graphical elements. They showed that comparisons based on bar charts were more accurate than comparisons based on circles or squares, and that comparisons based on the latter were more accurate than comparisons based on cubes. The relationships shown by the squares and circles used in their study were represented by the respective areas of the diagrams, while the relationships shown by the cubes were represented by their volumes.

Meihoefer (1973) noticed that circles are useful map symbols even though people are unable to discern small variations in circle sizes and have difficulties in making quantitatively accurate comparisons of the areas of circles of different sizes. He also argued that differences between area-based circles were indeed underestimated, but not in a systematic way. He proposed employing range-graded circles, varying in size according to an experimentally derived key, and incorporating an appropriate legend.

Macdonald-Ross (1977) identified and described the strengths and weaknesses of a variety of graphic formats for the presentation of numerical data, such as graphs, tables, bar charts, pie charts and cartograms. He also recommended the use of bar charts rather than pie charts for representing preferences, arguing that the judgment of bars involves only an assessment of length, whereas estimating the size of a pie slice involves a combination of area, angle and arc length. On the other hand, Spence and Lewandowsky (1991) mentioned that the results of some experiments do not support the opinion that pie charts are inferior and “the analysis of psychophysical literature suggest that that the traditional prejudice against the pie chart is misguided”.

Hollands and Spence (1992) considered, in two experiments, the perception of change and proportion when viewing graphs. They showed that change was understood better by decision makers when line and bar graphs were offered to visualize the problem than when pie charts were used. The difference in the accuracy of judging the change was larger when the rate of change was smaller. Surprisingly, when analyzing proportions the pie charts again appeared to result in more errors than line or bar graphs. This was, however, the situation in which an additional scale was used to depict the bars and pies. When no such additional scale accompanied the graphics, pies appeared to be significantly better for judging proportions than the other visualization techniques. These conclusions were confirmed a decade later by Shah and Hoeffner (2002). They reviewed graph comprehension research and observed that, to reduce graph interpretation errors, visual features should evoke a particular fact or relationship. They also formulated specific suggestions, namely that bar graphs should be used for discrete data and pie charts for percent data, and that multiple formats should be used because different displays make different information salient. The latter observation raises an important issue of the impact of different visualizations on the user’s preferences.

As we see, there is no consensus on the superiority of bars or circles in the accurate representation of data. Thus, in our study both are used, accompanied by the same verbal description of preferences. The remainder of the negotiation setup is the same; therefore, the visualization technique can be considered the only factor influencing the accuracy of the agent’s scoring system \(S^{A}\) and the resulting correctness of understanding the negotiation process and outcomes. The only issue that still needs to be considered is how to measure the accuracy of the agent’s scoring system with respect to the principal’s.

4 Measuring the scoring system accuracy

To determine how the scoring systems of the principal and their agent differ from each other, a formal measure needs to be formulated. Such a measure may be defined at various degrees of precision. One may say that the agent’s scoring system is accurate (similar or concordant) with respect to the principal’s one if it correctly reflects the order of preferences in \(S^{P}\) defined by the principal. For more sophisticated analyses, a cardinal comparison of scores is required to measure whether the strength of the principal’s preferences is correctly reflected in \(S^{A}\). Two basic approaches of statistical multivariate analysis seem most appropriate for measuring the scoring system accuracy: one for the ordinal and one for the cardinal level. For analyzing the ordinal relationship between \(S^{P}\) and \(S^{A}\) the notion of a matching index can be used, while for cardinal comparisons the notion of distance may be applied. Some suggestions of such measures were described in detail in our earlier works (see Kersten et al. 2016; Roszkowska and Wachowicz 2015b).

4.1 Ordinal accuracy

The simplest test for checking the general accuracy of \(S^{A}\) with respect to \(S^{P}\) is to verify whether the former preserves the rank order defined by the principal in \(S^{P}\) for all issues and options from \(P\). Note that no comparison of corresponding ratings from \(S^{P}\) and \(S^{A}\) is conducted; instead, the relationship between the subsequent ratings in \(S^{P}\) is compared to that between the analogous ratings in \(S^{A}\), first for the issue weights \(\left\{ {w_{j} } \right\}\) and then separately for each of the \(n\) sets of option ratings \(V_{j}\). For instance, if there are four issues in \(P\): \(F_{1}\), \(F_{2}\), \(F_{3}\) and \(F_{4}\), and the principal specifies their weights in \(S^{P}\) as \(w_{1}^{P} > w_{2}^{P} = w_{3}^{P} > w_{4}^{P}\), it needs to be checked whether the agent’s individual scoring system \(S^{A}\) keeps this order, i.e. whether \(w_{1}^{A} > w_{2}^{A} = w_{3}^{A} > w_{4}^{A}\). If so, we call the issue weights in \(S^{A}\) ordinally accurate (concordant) with those in \(S^{P}\). Note that the structure of the entire ranking is compared here, and ordinal accuracy does not require the weights to be equal in both scoring systems, i.e. there is no requirement that \(w_{1}^{P} = w_{1}^{A} , w_{2}^{P} = w_{2}^{A}\), etc.

To verify the overall ordinal accuracy of the scoring system \(S^{A}\) with respect to \(S^{P}\) the ordinal accuracy index can be defined. It is represented by a ratio of the number of elements of \(P\) for which the rank orders in \(S^{P}\) and \(S^{A}\) are concordant (\(n^{\text{cor}}\)) to the number of all elements (\(n + 1\)) that are defined within the structure of the negotiation problem \(P\)

$$OA = \frac{{n^{cor} }}{n + 1}.$$
(4)

In each problem \(P\) the number of all elements is always equal to \(n + 1\), as there are \(n\) rankings that need to be built for the \(n\) sets of salient options \(\left\{ {X_{j} } \right\}\) (one for each issue) and one extra ranking for the issue weights. If the ordinal accuracy index for the agent’s scoring system \(S^{A}\) equals 1, the system is in total concordance with \(S^{P}\), i.e. the accuracy of \(S^{A}\) is full (100%) and it precisely represents the principal’s issue and option importance. Conversely, OA = 0 means that \(S^{A}\) does not represent the principal’s preferences correctly for any element in \(P\). An example of determining the ordinal accuracy for the entire scoring system in a negotiation problem with three issues is shown in Table 1. The shaded elements indicate the differences between \(S^{A}\) and \(S^{P}\) which make them discordant.

Table 1 Determining the OA index
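The computation of formula (4) can be sketched in a few lines of Python. This is only an illustration under our own assumptions: each scoring system is stored as a dictionary of rating lists, the helper names `rank_pattern` and `ordinal_accuracy` are hypothetical, and the ratings below are invented (they are not the values of Table 1). Two rankings are treated as concordant when their dense ranks, with ties preserved, are identical.

```python
def rank_pattern(ratings):
    """Dense ranks (0 = highest rating); equal ratings share the same rank."""
    levels = sorted(set(ratings), reverse=True)
    return [levels.index(r) for r in ratings]


def ordinal_accuracy(principal, agent):
    """OA = n_cor / (n + 1), as in formula (4).

    Both arguments map an element name (the issue weights plus one entry
    per issue holding its option ratings) to a list of ratings.
    """
    concordant = sum(
        rank_pattern(principal[name]) == rank_pattern(agent[name])
        for name in principal
    )
    return concordant / len(principal)


# Hypothetical three-issue problem (n + 1 = 4 elements); values are invented.
S_P = {
    "weights": [50, 30, 20],
    "issue_1": [50, 25, 0],
    "issue_2": [0, 30, 10],
    "issue_3": [20, 20, 0],
}
S_A = {
    "weights": [45, 35, 20],  # same order as in S_P: concordant
    "issue_1": [45, 20, 0],   # concordant
    "issue_2": [0, 20, 35],   # two options swapped: discordant
    "issue_3": [20, 15, 0],   # the agent breaks the principal's tie: discordant
}

oa = ordinal_accuracy(S_P, S_A)
print(oa)  # 2 concordant elements out of 4
```

Comparing dense ranks rather than raw ratings keeps the check purely ordinal, so the weights 45, 35, 20 count as concordant with 50, 30, 20 even though no single value matches.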

Note that measuring the ordinal accuracy by means of formula (4) resembles the general approach to verifying the responses to test questions proposed by Item Response Theory (Hambleton and Swaminathan 1985), and its use was studied more extensively in (Roszkowska and Wachowicz 2016). If a higher granularity is required to measure the ordinal accuracy of a scoring system, formula (4) may be replaced with some notion of distance. For instance, the Hamming or Jaccard distance can be used, which would allow the scoring system concordance to be checked using pairwise comparisons within each category of elements of the negotiation problem (Roszkowska et al. 2017).

It should be emphasized again that the OA index checks the ordinal concordance at the disaggregated level, i.e. for the series of rankings of elements defined in \(P\), not for the rankings of potential salient feasible offers that can be built out of \(P\). Consequently, an agent with a scoring system that is fully ordinally accurate (OA = 1) may evaluate two offers \(A_{1}\) and \(A_{2}\) differently from their principal. This is because OA does not take into account the accumulation of cardinal errors (differences in ratings between \(S^{A}\) and \(S^{P}\)), which plays a role when the global score of an offer is measured additively according to formula (3). To capture the scale of cardinal errors, cardinal accuracy should be introduced.
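A small numerical illustration may make this point concrete. The two-issue scoring systems below are invented for this purpose (they do not come from the experiments), and the `score` helper is a hypothetical stand-in for the additive scoring of formula (3). All rankings in both systems are concordant, so OA = 1, yet the two offers are ordered differently by the principal and the agent.

```python
# Hypothetical scoring systems: option ratings per issue, listed best to worst.
# Issue weights equal the best option rating (60/40 for S_P, 55/45 for S_A).
S_P = {"issue_1": [60, 30, 0], "issue_2": [40, 20, 0]}
S_A = {"issue_1": [55, 10, 0], "issue_2": [45, 40, 0]}


def score(system, offer):
    """Additive global score; `offer` maps each issue to an option index."""
    return sum(system[issue][k] for issue, k in offer.items())


A1 = {"issue_1": 1, "issue_2": 2}  # middle option of issue 1, worst of issue 2
A2 = {"issue_1": 2, "issue_2": 1}  # worst option of issue 1, middle of issue 2

print(score(S_P, A1), score(S_P, A2))  # principal's scores: 30 and 20
print(score(S_A, A1), score(S_A, A2))  # agent's scores: 10 and 40
```

The principal prefers \(A_{1}\) while the agent prefers \(A_{2}\), although every ranking in \(S^{A}\) follows the order given in \(S^{P}\).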

4.2 Cardinal accuracy

Cardinal accuracy focuses on measuring the discrepancy in strengths of preferences represented in \(S^{P}\) and \(S^{A}\) (Roszkowska and Wachowicz 2015b). It measures the differences in ratings assigned by the agent and the principal to all the elements of the negotiation problem \(P\). There are, however, some dependencies in issue and option ratings defined for the scoring system normalized in a way described in Sect. 3.1 (i.e. \(\mathop {\hbox{max} }\nolimits_{k} v ( {x_{k}^{j} } ) = w_{j}\)). Therefore, the cardinal accuracy measure should be designed in a way that allows avoiding double counting of errors made for issue weights that are copied as the ratings of best options.

The cardinal inaccuracy in defining the issue weights can be easily measured using a multi-dimensional distance formula that determines the differences between the issue ratings in \(S^{P}\) and \(S^{A}\):

$$II = \mathop \sum \limits_{j = 1}^{n} \left| {w_{j}^{P} - w_{j}^{A} } \right|.$$
(5)

Using a taxi-cab distance metric in formula (5) allows us to keep an intuitive interpretation of such a measure, which is a total sum of rating points distributed among the weights in \(S^{A}\) differently than in \(S^{P}\).

Measuring inaccuracy of option ratings requires some additional considerations. We cannot simply add the differences in option ratings to \(II\), since they would be biased by any potential mistake made by the negotiators while setting up the issue ratings. However, we may compare if the general shapes of value functions for options of issue \(j\) defined in \(S^{P}\) and \(S^{A}\) are similar using the normalized option ratings and then quantify their value using the weight of this issue \(w_{j}^{P}\).

Let us consider a simple example, in which three options of an exemplary issue of “time of delivery” (tof) are evaluated: 7 days, 21 days and 60 days. The importance of this issue was declared by an agent as \(w_{tof}^{A} = 80\), while the principal rated it 50. Let us assume further that the principal assigned the option ratings to this issue in the following way: 50, 25 and 0, respectively. The agent now defines her preferences using a different reference value, i.e. the weight she had assigned earlier, e.g.: 80, 40, 0. Having normalized the ratings to the (0; 1) range in both systems we obtain \(\bar{V}_{tof}^{P} = \left\{ {1;0.5;0} \right\}\) and \(\bar{V}_{tof}^{A} = \{ 1;0.5;0\}\). Thus, we have to conclude that the scoring systems \(S^{A}\) and \(S^{P}\) are the same with respect to the ratings of options of the issue of time of delivery, and no inaccuracy occurs. Note, however, that the issue weights were different in \(S^{P}\) and \(S^{A}\); this difference is taken into consideration in formula (5) when determining the \(II\) index.

However, if the agent’s subjective ratings were 80, 30, 0, and hence \(\bar{V}_{tof}^{A} = \{ 1;0.375;0\}\), the only inconsistency that should be taken into consideration when comparing it to \(S^{P}\) is the one that occurs for the second option. The difference between \(S^{P}\) and \(S^{A}\) for this option equals \(\bar{v}^{P} \left( {{\hbox{``}}21 days{\hbox{''}}} \right) - \bar{v}^{A} \left( {{\hbox{``}}21 days{\hbox{''}}} \right) = 0.5 - 0.375 = 0.125\). This is a normalized difference and should be rescaled using the weight of this issue \(w_{tof}^{P}\). Thus, the inaccuracy of evaluation of the option “21 days” is worth \(50 \cdot 0.125 = 6.25\) rating points (in \(S^{P}\) the option score is \(50 \cdot 0.5 = 25\), while in \(S^{A}\), rescaled to the principal’s weight, it is \(50 \cdot 0.375 = 18.75\)). This amount of 6.25 rating points should then be considered the inaccuracy of option ratings for the issue “time of delivery”.

Summarizing, the cardinal inaccuracy of option ratings for issue \(j\) will be determined as

$$OI_{j} = w_{j}^{P} \cdot \mathop \sum \limits_{k = 1}^{{n_{j} }} \left| {\bar{v}^{P} \left( {x_{k}^{j} } \right) - \bar{v}^{A} \left( {x_{k}^{j} } \right) } \right|.$$
(6)

Finally, the cardinal inaccuracy index may be represented as a sum of \(n + 1\) summands. The first summand is the cardinal inaccuracy of issue weights \(II\) given by formula (5), while the next \(n\) summands are the rescaled normalized distances between option ratings \(OI_{j}\) determined for each issue \(j = 1, \ldots ,n\) according to formula (6). The final form of the cardinal inaccuracy index is the following:

$$CI\left( {S^{P} , S^{A} } \right) = II + \mathop \sum \limits_{j = 1}^{n} OI_{j} .$$
(7)
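Formulas (5)–(7) can be combined in a short Python sketch. The function names are hypothetical, and min-max normalization is assumed for the option ratings (with the worst option rated 0 it reduces to dividing by the best rating). The “time of delivery” ratings are those of the example above; the second issue is invented only so that each set of weights sums to 100.

```python
def normalize(ratings):
    """Rescale option ratings to the 0..1 range (min-max); assumes max > min."""
    lo, hi = min(ratings), max(ratings)
    return [(r - lo) / (hi - lo) for r in ratings]


def issue_inaccuracy(w_p, w_a):
    """II, formula (5): taxicab distance between the issue weights."""
    return sum(abs(p - a) for p, a in zip(w_p, w_a))


def option_inaccuracy(w_p_j, v_p, v_a):
    """OI_j, formula (6): distance between normalized option ratings,
    rescaled by the principal's weight of issue j."""
    return w_p_j * sum(abs(p - a) for p, a in zip(normalize(v_p), normalize(v_a)))


def cardinal_inaccuracy(w_p, w_a, opts_p, opts_a):
    """CI, formula (7): II plus the sum of OI_j over all issues."""
    return issue_inaccuracy(w_p, w_a) + sum(
        option_inaccuracy(w, vp, va) for w, vp, va in zip(w_p, opts_p, opts_a)
    )


# Issue 1: "time of delivery" (ratings from the text).
# Issue 2: hypothetical filler issue with two options.
w_P, w_A = [50, 50], [80, 20]
opts_P = [[50, 25, 0], [50, 0]]
opts_A = [[80, 30, 0], [20, 0]]

oi_tof = option_inaccuracy(w_P[0], opts_P[0], opts_A[0])
ci = cardinal_inaccuracy(w_P, w_A, opts_P, opts_A)
print(oi_tof)  # 6.25 rating points, as in the example
print(ci)      # II = 30 + 30 = 60, plus 6.25 for time of delivery
```

Because the option ratings are normalized before comparison, the agent's error in the issue weight (80 instead of 50) enters only through \(II\) and is not counted again in \(OI_{j}\).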

5 Research approach

5.1 Research questions

As shown in the previous sections, comprehensive decision support in negotiation requires the prior design of a scoring system that represents the preferences of the negotiating parties. Given the support possibilities and their potential impact on the negotiator’s understanding of the negotiation process, evaluation of the counterpart’s behaviour and choice of own moves and actions, these scoring systems should reflect the parties’ preferences in the most adequate way to avoid mistakes, misinterpretations and the selection of inferior solutions. When agents negotiate on behalf of their principals, the problem of a reliable scoring system is even more crucial. The agents build scoring systems that should represent the preferences of their principals, and hence not only their diligence but also the way in which the preferences are communicated to them by their principals may play an important role in determining accurate scoring systems.

To explore the problem of building the scoring system in the principal-agent context of business negotiation processes, we will try to answer a few questions based on a series of web-based bilateral negotiation experiments. These questions are related to three issues raised in Sect. 3.2. The first question is general, and it is related to the negotiation approach that involves formal mechanisms of negotiation support, i.e. the implementation of a decision support algorithm to elicit the negotiator’s preferences. According to the theory of negotiation analysis, this should help negotiators to better understand the negotiation problem and to set their goals and priorities (Raiffa et al. 2002). Yet we also need to remember the potential role of the principal and the preferential information he imparts to the agent, especially the way in which he visualizes his preferences using different graphical techniques (Huber et al. 2002; Miettinen 2014). Hence, we ask:

Q1: What fraction of agents (who use a formal preference elicitation method) are able to determine the scoring systems \(S^{A}\) accurately, i.e. in concordance with the preference information communicated by their principals?

Q2: Does the way in which the principal communicates his preferences affect the accuracy of the scoring system \(S^{A}\) built by his agent?

To answer question Q1, we determine both the ordinal accuracy index (according to formula (4)) and the cardinal inaccuracy index (formula (7)) for all agents in our experiments. We also analyze the separate categories of the negotiation template to find whether the specificity of the preference structures could affect the results. To answer question Q2, we compare the average values of the accuracy measures of the scoring systems obtained by the agents in two experiments that differ in the way in which the principal’s preferential information was visualized.

Analyzing the reliability of the scoring systems determined by the agents, we would also like to discover what their capabilities are in processing the preferential information provided by the principal. Agents may be able to process the principal’s preferences on the ordinal level, i.e. try to keep the rank order of preferences communicated by means of , but have some problems with understanding the differences in strength of preferences, i.e. with assigning correct cardinal ratings. This issue may be related to the cognitive capabilities and thinking styles of the negotiators and is often raised when the use and usefulness of various MCDA techniques are considered, i.e. those that require simple ordinal information, such as the holistic ones, and those that operate at the disaggregated level (Kadziński et al. 2013; Kadziński and Tervonen 2013). Therefore, we ask:

Q3: What is the relationship between the ordinal and cardinal accuracy of scoring systems \(S^{A}\) built by the agents?

To find such a relationship, the correlation between ordinal accuracy indexes and cardinal inaccuracy indexes can be measured.
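As a minimal sketch of this step, the Pearson coefficient can be computed directly from its definition. The function name `pearson` and the five pairs of values below are illustrative assumptions only; they are not taken from the experimental dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical agents: ordinal accuracy (OA) and cardinal inaccuracy (CI).
oa_values = [1.0, 0.8, 0.6, 0.4, 0.2]
ci_values = [12.0, 25.0, 31.0, 48.0, 70.0]

r = pearson(oa_values, ci_values)
print(round(r, 3))
```

Since OA measures accuracy and CI measures inaccuracy, a negative coefficient indicates that the two measures agree.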

Further research questions refer to the potential impact of using the scoring systems \(S^{A}\) determined by agents in the actual negotiation phase. Visualisation techniques that depict the negotiation history by means of history graphs have been shown to help negotiators understand the process better and to reduce the additional information that needs to be exchanged by the parties to clarify the current negotiation status (Weber et al. 2006). We would therefore like to discover:

Q4: What is the impact of the level of accuracy of scoring systems \(S^{A}\) on the interpretation of the negotiation history?

We may presume that agents with highly inaccurate scoring systems interpret the negotiation history less correctly than agents with accurate scoring systems. Therefore we analyze the negotiation history graphs of the negotiating agents and classify the parties’ subsequent moves as concessions or reverse concessions from both the principal’s and the agent’s perspective (\(S^{P}\) vs. \(S^{A}\)). Then we compare how many of these moves the agents interpret in the same way as their principals would, and whether this number is correlated with the ordinal accuracy and cardinal inaccuracy indexes.
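The comparison of interpretations can be sketched as follows. The classification rule is our simplifying assumption for illustration: a move is labelled a concession when the score of a party’s own new offer is lower than that of its previous offer, and a reverse concession when it is higher. The scoring systems, option labels and offer sequence are invented, and all function names are hypothetical.

```python
def score(system, offer):
    """Additive score of an offer; `offer` maps each issue to an option label."""
    return sum(system[issue][option] for issue, option in offer.items())


def classify_moves(system, offers):
    """Label each move between consecutive own offers under one scoring system."""
    labels = []
    for previous, current in zip(offers, offers[1:]):
        delta = score(system, current) - score(system, previous)
        if delta < 0:
            labels.append("concession")
        elif delta > 0:
            labels.append("reverse concession")
        else:
            labels.append("no change")
    return labels


def agreement_rate(s_p, s_a, offers):
    """Share of moves labelled identically under S_P and S_A."""
    labels_p, labels_a = classify_moves(s_p, offers), classify_moves(s_a, offers)
    return sum(p == a for p, a in zip(labels_p, labels_a)) / len(labels_p)


# Hypothetical scoring systems with concordant rankings (OA = 1).
S_P = {"issue_1": {"a": 60, "b": 30, "c": 0}, "issue_2": {"x": 40, "y": 20, "z": 0}}
S_A = {"issue_1": {"a": 55, "b": 10, "c": 0}, "issue_2": {"x": 45, "y": 40, "z": 0}}

# Three consecutive offers submitted by the agent.
offers = [
    {"issue_1": "a", "issue_2": "x"},
    {"issue_1": "b", "issue_2": "z"},
    {"issue_1": "c", "issue_2": "y"},
]

print(classify_moves(S_P, offers))  # principal's view of the two moves
print(classify_moves(S_A, offers))  # agent's view of the same moves
print(agreement_rate(S_P, S_A, offers))
```

In this invented sequence the second move is a concession from the principal’s perspective but a reverse concession from the agent’s, so only half of the moves are interpreted consistently.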

Finally, we will analyze the results the agents obtain. Many earlier studies investigated the impact of using additional graphic information on the negotiation outcomes (Gettinger et al. 2012; Kolodziej et al. 2016; Swaab et al. 2002), but they ignored the potential mediating effect that the scoring system quality may have on these outcomes. Hence, we would like to discover:

Q5: What is the impact of the level of accuracy of scoring systems \(S^{A}\) on the negotiation outcome and its efficiency?

We will check which agreements are negotiated by the agents with more and less accurate scoring systems and how their ratings differ from the viewpoints of the agent and the principal. We will also verify whether these agreements are efficient solutions. For each cluster of scoring system accuracy level we will measure the percentage of efficient agreements negotiated by the agents. The fairness of such agreements will also be checked by means of the notion of the Nash bargaining solution, which allows us to draw conclusions on the impact of imprecise scoring systems on the social aspects of the negotiation process.
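For a discrete negotiation problem both checks can be performed by enumerating all offers. The sketch below is illustrative only: the two-issue template and both parties’ scoring systems are invented, the function names are hypothetical, and the Nash bargaining solution is computed as the offer maximizing the product of the two scores under the assumption that the disagreement point gives both parties a score of 0.

```python
from itertools import product


def all_offers(options):
    """Enumerate every complete offer of a discrete negotiation problem."""
    issues = list(options)
    return [dict(zip(issues, combo)) for combo in product(*(options[i] for i in issues))]


def score(system, offer):
    """Additive score of an offer under one party's scoring system."""
    return sum(system[issue][option] for issue, option in offer.items())


def is_efficient(offer, offers, s1, s2):
    """True if no other offer is at least as good for both parties
    and strictly better for at least one of them (Pareto efficiency)."""
    u1, u2 = score(s1, offer), score(s2, offer)
    return not any(
        score(s1, o) >= u1 and score(s2, o) >= u2
        and (score(s1, o) > u1 or score(s2, o) > u2)
        for o in offers
    )


def nash_solution(offers, s1, s2):
    """Offer maximizing the product of scores (disagreement point assumed (0, 0))."""
    return max(offers, key=lambda o: score(s1, o) * score(s2, o))


# Hypothetical template and scoring systems of the two parties.
options = {"issue_1": ["a", "b", "c"], "issue_2": ["x", "y", "z"]}
S_1 = {"issue_1": {"a": 60, "b": 30, "c": 0}, "issue_2": {"x": 40, "y": 20, "z": 0}}
S_2 = {"issue_1": {"a": 0, "b": 20, "c": 30}, "issue_2": {"x": 0, "y": 50, "z": 70}}

offers = all_offers(options)
efficient = [o for o in offers if is_efficient(o, offers, S_1, S_2)]
nash = nash_solution(offers, S_1, S_2)
print(len(offers), len(efficient))
print(nash)
```

Running the same checks once with the principal’s reference system and once with the agent’s own system would show whether an agreement that looks efficient to the agent is also efficient for the principal.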

5.2 Experimental setup

To answer the research questions, two negotiation experiments were organized in the Inspire negotiation system (Kersten and Noronha 1999), which is the first negotiation support system designed for conducting bilateral multi-issue negotiations via the Web. It has been widely used in many studies regarding, for instance, cross-cultural aspects of electronic negotiations (Koeszegi et al. 2004), the process of strategy formulation and communication (Wachowicz and Wu 2010), negotiators’ behavior and motivations (Kersten et al. 2010b), and decision aspects of negotiations (Vetschera 2007).

In two series of negotiation experiments organized in Inspire in 2015 and 2016 a bilateral negotiation case was implemented, in which a representative of a musician (Fado) and a representative of a broadcasting company (Mosico) discuss the terms of a potential contract. The participants were matched up in dyads and each of them played the role of the agent for one of the parties. The negotiation problem consisted of four issues, each with a predefined list of salient options (see Table 2), which makes the negotiation problem discrete. Given this discrete structure, 240 different offers may be built and exchanged by the negotiators during the negotiation.

Table 2 Mosico-Fado negotiation template

Before the negotiation started, the participants read the case description. In the Mosico-Fado case each agent, representing either the musician or the broadcasting company, was provided with public and private information regarding the case. The former introduced the negotiation situation, problem and context (i.e. the negotiation problem was defined). The latter contained a detailed description of the principals’ preferences ( ) that the agents should take into account while building their scoring systems. The preferential information was provided in two different ways: (1) as a verbal description of priorities, aspirations and reservation levels; (2) by means of a graphical representation of these priorities. In both series of experiments the verbal description (1) was the same but the visualization (2) differed. In Study 1 the preferential information was visualized by means of circles, the sizes of which represented both the issue importance and the option values. In Study 2 bars were used instead. Note that the verbal description was the same in both studies for the Fado party. Examples of the preferential information regarding the issue importance for the Fado party with the circle-based and bar-based visualization approaches are shown in Figs. 1 and 2, respectively. The full preference description and the bar-based visualization from Study 2 are shown in the “Appendix”. The principal’s preferences were described analogously for Study 1.

Fig. 1 Verbal and circle-based description of preferences in Inspire

Fig. 2 Verbal and bar-based description of preferences in Inspire

Having read the principal’s preference information, the agents determined their individual scoring systems \(S^{A}\) using the direct rating mechanism described in Sect. 3.1. This was a two-step procedure. In Step 1 the agents distributed 100 rating points among all the negotiation issues to define their weights. Then, in Step 2, they assigned ratings to the options within each issue in such a way that the best (most preferred) option received the maximum possible score, equal to the issue weight, while the worst (least preferred) received a score of 0. All the intermediate options obtained scores between 0 and the issue weight. Based on the scoring systems defined this way, negotiation support was offered by Inspire during the actual negotiation and post-negotiation phases.
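The constraints of this two-step procedure can be expressed as a simple consistency check. The sketch below is not Inspire’s implementation; the function name `check_direct_rating`, the issue and option labels and the ratings are hypothetical and serve only to illustrate the rules stated above.

```python
def check_direct_rating(weights, option_ratings, total=100):
    """Return the list of rules violated by a scoring system (empty if none)."""
    problems = []
    # Step 1: the issue weights must use up all rating points.
    if sum(weights.values()) != total:
        problems.append(f"issue weights sum to {sum(weights.values())}, not {total}")
    # Step 2: within each issue, best option = issue weight, worst option = 0.
    for issue, weight in weights.items():
        ratings = option_ratings[issue].values()
        if max(ratings) != weight:
            problems.append(f"{issue}: best option is not rated {weight}")
        if min(ratings) != 0:
            problems.append(f"{issue}: worst option is not rated 0")
    return problems


# Hypothetical two-issue scoring system that satisfies all rules.
weights_ok = {"issue_1": 60, "issue_2": 40}
options_ok = {
    "issue_1": {"a": 60, "b": 25, "c": 0},
    "issue_2": {"x": 0, "y": 40},
}

# The same option ratings with a mistyped weight for issue_2.
weights_bad = {"issue_1": 60, "issue_2": 30}

print(check_direct_rating(weights_ok, options_ok))   # no violations
print(check_direct_rating(weights_bad, options_ok))  # two violations
```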

The reference scoring systems representing the principal’s preferences \(S^{P}\) in both studies were determined based on the graphical preference information, i.e. the circles and bars were measured and the results were normalized and multiplied by \(P = 100\) to obtain a scoring system analogous to the one used in Inspire. For Study 1 two reference systems were used: one determined for the circles’ sizes (areas) and a second for the circles’ radii. These will play a role in analyzing the cardinal inaccuracy of scoring systems.
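The difference between the two reference systems for circles can be illustrated with a short sketch. We assume here that normalization means dividing each measurement by the sum of all measurements before multiplying by 100; the radii are invented and do not correspond to the circles used in the experiment, and the function name is hypothetical.

```python
from math import pi


def reference_weights(measurements, total=100):
    """Normalize graphical measurements so that they sum to `total` rating points."""
    s = sum(measurements)
    return [total * m / s for m in measurements]


# Hypothetical circle radii for four issues (first two equally important).
radii = [4.0, 4.0, 2.0, 1.0]
areas = [pi * r ** 2 for r in radii]

w_radius = reference_weights(radii)  # radius-based reference weights
w_area = reference_weights(areas)    # area-based reference weights

print([round(w, 1) for w in w_radius])
print([round(w, 1) for w in w_area])
```

Since the area grows with the square of the radius, the area-based reference assigns relatively more points to the large circles and fewer to the small ones, which is why the two references may lead to different cardinal inaccuracy values for the same agent.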

In both studies students from Austria, Canada, China, Great Britain, Holland, Poland, Taiwan and Ukraine took part. In Study 1 there were 189 active negotiating dyads, while in Study 2 as many as 173 negotiation instances were set up. After eliminating the incomplete records we were able to analyze the scoring systems of 176 representatives of the Mosico party and 174 representatives of the Fado party in Study 1, and 150 Mosicos and 161 Fados in Study 2.

The experiments were organized in the form of asynchronous negotiations (no real-time bargaining was required) and lasted for 5 days each. The agents were told to negotiate the best possible contract for their principals, yet the potential long-term benefits resulting from the future relationship needed to be taken into consideration. Therefore, apart from the outcome, the agent’s effort was also evaluated, i.e. the diligence in prenegotiation activities (including the process of determining the scoring system) and the activeness and engagement in the negotiation process. For all participants the negotiation results (the outcome and effort) determined the final grade, or part of the final grade, they obtained for the university course within which they registered for the Inspire experiment.

To analyze the dataset, classic statistical tools were used. For the whole dataset or the identified clusters of agents the fraction test was used to compare the proportions under consideration. When the results for clusters or studies were analyzed, the Mann–Whitney test was used to measure the significance of differences in the average values (distributions) of the variables. For within-study analyses (when the principal’s and agent’s values were compared) the Wilcoxon test for dependent samples was used. The significance threshold \(p\) = 0.05 was used consistently for all comparisons. For measuring differences in OA and CI values for the entire dataset within each study, the notions of first-order stochastic dominance (FSD) and almost stochastic dominance were used (Levy 2009).

6 Results

6.1 Ordinal perspective on scoring system accuracy

To eliminate the potential impact of differences in the verbal description of preferences on the scoring system accuracy and the agents’ results, we analyzed the data for Fado agents only, for whom the only difference in preference impartation between the studies was the visualization technique used (bars vs. circles). Analyzing the Inspire dataset for both studies, we first verified the ordinal accuracy of the agents’ scoring systems to answer questions Q1 and Q2. Determining a scoring system that is accurate on the ordinal level does not require any special decision making skills or high cognitive capabilities of the agents. No sophisticated number sense is required to recognize one circle (bar) to be bigger than another, and to assign a higher number to the item it represents. Therefore, we expected most of the agents to construct scoring systems with a high ordinal accuracy index. Surprisingly, the percentage of agents with fully accurate scoring systems (\(OA\) = 1) was very low no matter which visualization technique was used, and was equal to 21.8% and 22.4% in Study 1 and 2, respectively. Across both studies, on average 7.1% of negotiators had totally inaccurate scoring systems (\(OA\) = 0). The structures of ordinal accuracy are shown in Fig. 3.

Fig. 3 Histogram for indexes of ordinal accuracy in Study 1 and 2

The average \(OA\) indexes for Study 1 and 2 equal 0.638 and 0.652, respectively, and this difference is not significant (Mann–Whitney test, \(p\) = 0.578). The stochastic dominance analysis of the cumulative distribution functions of \(OA\) for Study 1 and 2 does not allow one distribution to be considered as outperforming the other at the level of FSD, since the circles assure a lower probability for \(OA\) ∈ 〈0; 0.2) (see Fig. 4). However, this is the single range for which circles are better, and if the notion of almost stochastic dominance (AFSD) is used (see Levy 2009) we find that bars dominate circles with respect to the quality of ordinal fit at \(\varepsilon\) = 0.16. This means that the scale of violation of FSD between \(OA\) determined for agents in Study 1 and 2 equals 16%. In other words, 84% of the area between these two cumulative distributions (which indicates the scale of outperformance of one variable over another) is in favor of bars. Since \(\varepsilon\) ≪ 0.5 we may conclude that the results obtained by agents in Study 2 almost dominate those from Study 1. What is more, if we removed the totally inaccurate agents with \(OA\) = 0 (as being outsiders, purposely misinterpreting the principal’s preferences) from our analysis, the cumulative distribution of \(OA\) values for bars would outperform the one for circles according to the pure FSD rule, and we could conclude that the bars help agents to map the principal’s preferences better at the ordinal level.

Fig. 4 Cumulative distribution of \(OA\) values in Study 1 and 2
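The \(\varepsilon\) reported above is the share of the area between two cumulative distributions that violates first-order dominance. A minimal sketch of this computation for empirical distributions is given below, assuming that higher values are preferred (as for \(OA\)). The function names are hypothetical and the two samples are invented; they are expressed as numbers of concordant rankings out of \(n + 1 = 5\), which does not affect \(\varepsilon\) because the ratio of areas is invariant to rescaling.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical cumulative distribution function of a sample."""
    data = sorted(sample)
    n = len(data)
    return lambda t: bisect_right(data, t) / n


def afsd_epsilon(better, worse):
    """Share of the area between the two empirical CDFs that violates
    first-order dominance of `better` over `worse` (higher values preferred)."""
    F, G = ecdf(better), ecdf(worse)
    grid = sorted(set(better) | set(worse))
    violation = total = 0.0
    for left, right in zip(grid, grid[1:]):
        diff = F(left) - G(left)  # both CDFs are constant on [left, right)
        area = abs(diff) * (right - left)
        total += area
        if diff > 0:  # F lies above G: `better` puts more mass on low values
            violation += area
    return violation / total if total else 0.0


# Hypothetical samples: the "bars" sample is better everywhere except for
# one totally inaccurate agent (0 concordant rankings).
bars = [0, 3, 4, 5]
circles = [1, 2, 3, 4]

eps = afsd_epsilon(bars, circles)
print(eps)
```

A value of \(\varepsilon\) = 0 corresponds to pure FSD, while values below 0.5 indicate that most of the area between the curves favors the first sample.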

This does not change the fact that the fraction of ordinally inaccurate agents is really high. On average, as many as 78% of negotiators in both studies determined more or less inaccurate scoring systems (\(OA\) ≠ 1). One may claim that the reason is case-specific, i.e. there are some nuances of the preference structures that are too complicated for an average agent to map into the cardinal system correctly. Some conclusions on the potential reasons for this inaccuracy may be drawn after the individual rankings for each element of the negotiation template have been analyzed. The percentages of the ordinally accurate rankings for all elements of the negotiation problem \(P\) built by the agents in both studies are shown in Fig. 5.

Fig. 5 Fractions of agents with correct rankings of the elements of negotiation problem

The data from Fig. 5 show that there are some elements of the negotiation template that, regardless of the preference impartation scheme, are ranked correctly by 70–80% of the experiment participants. Such high ordinal accuracy is recorded for the options of concerts, royalties and contract signing bonus for the Fado party. For two other elements, i.e. for the rankings of issue weights and of the options of songs, the accuracy rate is lower. The common feature of the categories with high accuracy fractions is that the preferences predefined for them are monotonically increasing or decreasing according to the order of appearance of the items in the negotiation problem description (see “Appendix”). For instance, the preferences for options of concerts decrease for Fado when moving from 5 to 8. For the categories with lower accuracy fractions the principal’s preferences are not monotonic, as is the case, for instance, for the options of ‘number of songs’. Here, the preferences increase when moving from option 11 to 14, but then decrease when moving further to option 15. However, most mistakes in the representation of the principal’s preferences are observed for the issue ratings. Significantly fewer agents assigned them fully accurately (34.5% and 28% in Study 1 and 2, respectively). Some difficulties may be related to the preference structure: the first two issues were declared by the principal to be equally important (having bars and circles of the same sizes). But there are also some technical issues related to the process of preference elicitation in both studies that could impact the scale of inaccuracy. While in the description of preference information the issues were listed and visualized from the most important to the least important one (see Figs. 1 and 2), the scoring mechanism operated with a changed order of the last two issues in the list.

6.2 Ordinal and cardinal accuracy of scoring systems

Although most of the scoring systems appeared to be ordinally inaccurate (\(OA\) ≠ 1), one may ask whether this inaccuracy is relevant when measured in terms of incorrectly allocated rating points (Q3). The scale of inaccuracy may be measured by means of the cardinal inaccuracy index (\(CI\)) given by formula (7). Thus, we determined the \(CI\) indexes for both studies, taking into account that in Study 1 the cardinal inaccuracy may be measured according to two alternative reference scoring systems, one that uses the circles’ radii as reference (\(CI_{R}\)) and a second that uses the circles’ areas (\(CI_{A}\)).

The Pearson correlation coefficients determined for \(CI\) and \(OA\) values within the studies confirm that there is a significant and rather strong relationship between these two measures (the relationship is negative since \(OA\) describes ordinal accuracy and \(CI\) describes cardinal inaccuracy); see Table 3.

Table 3 Pearson’s coefficients and average values for \(CI\) indexes

The comparison between the studies is now not straightforward. The cardinal accuracy in Study 1 may be determined with respect to two reference systems, one that measures the circles’ radii and a second focused on their areas. Comparing average values, we find that \(CI_{R}\) appears better than the index obtained for the bars in Study 2 (\(CI_{B}\)), yet the difference is not significant (\(p\) = 0.187). When \(CI_{B}\) is compared to the results measured for the circles’ areas (\(CI_{A}\)), the former appears to be significantly better (\(p\) < 0.001). The detailed analysis of cumulative distributions does not allow either of the visualization techniques to be considered better than the other using the notion of FSD. In Fig. 6 (cut at \(CI\) = 200 for better readability) all the series cross each other.

Fig. 6 Cumulative distributions for \(CI\) values in Study 1 and 2

However, if we analyze the curves we see that the bar-based preference visualization assures higher probabilities of obtaining better scoring systems than circles only up to \(CI\) ≈ 61 (when compared to the radius-based results). Note that this time the lower the \(CI\) the better, i.e. the higher the cumulative distribution lies, the better. Hence, the notion of AFSD may be used again to assess the scale of outperformance. For the comparison between \(CI_{R}\) and \(CI_{B}\) we find \(\varepsilon_{{CI_{R} }}\) = 0.64, and for \(CI_{A}\)-to-\(CI_{B}\): \(\varepsilon_{{CI_{A} }}\) = 0.24. The picture is thus inconclusive. When we assume that the agents operate with the circles’ radii, they can be considered as performing better than those operating with bars (\(\varepsilon_{{CI_{R} }}\) > 0.5, so the curve for \(CI_{R}\) almost dominates the one for \(CI_{B}\), but the scale of violation of FSD is equal to 0.46). However, if the agents operated with areas, the agents with bars would almost outperform those operating with circles in cardinal accuracy (and the FSD violation is only 24%).

Being unable to find a global outperformance of either visualization technique, we decided to conduct a cluster analysis. We first divided the agents into two groups: (1) those totally accurate in ordinal terms (\(OA\) = 1) (38 agents in Study 1 and 36 in Study 2); and (2) the ordinally inaccurate ones (\(OA\) ≠ 1) (136 and 125 agents in Study 1 and 2, respectively). The results are shown in Table 4.

Table 4 Agents’ average cardinal inaccuracy (\(CI\)) for Study 1 and Study 2

We found that the scoring systems of ordinally accurate agents are more concordant with the preferential information of their principals on the cardinal basis than the scoring systems of those who made at least one mistake in rankings. The \(CI\) indexes for agents with \(OA\) = 1 are significantly smaller than for those with \(OA\)≠1 (Mann–Whitney test with \(p\) < 0.001) in both studies (Table 4). Interestingly, the \(OA\) = 1 agents appeared to be cardinally more accurate in Study 2, i.e. when the bar-based visualization technique was used, than in Study 1. The differences between average \(CI_{R}\) and \(CI_{B}\), and between \(CI_{A}\) and \(CI_{B}\), are both significant (\(p\) < 0.001). For ordinally inaccurate agents the differences between the studies appear insignificant when the radius-based reference scoring system is used (\(p\) = 0.915); hence no supremacy of circles over bars in representing the principals may be concluded. However, when the results for Study 1 are measured using the area-based reference scoring system, the cardinal inaccuracy of these agents appears significantly better in Study 2 than in Study 1.

Having identified the significance of differences in the cardinal accuracy at the most aggregated level, we analyzed the trends in accuracy for subsequent \(OA\) values. The results are shown in Fig. 7.

Fig. 7 Average \(CI\) values for various classes of agents’ ordinal accuracy (\(OA\))

The trends we observe confirm our intuition about the relationship between the mistakes made by agents in rankings and those made in ratings. The more accurate the agents are on the ordinal basis (the higher the \(OA\) values), the more accurate they are in assigning ratings that follow the preferential information (the smaller the \(CI\) values). The results are the same no matter which reference scoring system is used, and they are consistent with the relatively high Pearson correlation coefficients (Table 3).

The differences in \(CI\) values along these trends are, however, not always significant (Mann–Whitney test). For instance, when the difference between \(CI_{B}\) and \(CI_{R}\) is analyzed in clusters, the former is significantly better for the cluster \(OA\) = 1 only (\(p\) < 0.001), but not for \(OA\) = 0.8 (\(p\) = 0.298). On the other hand, it is significantly worse for \(OA\) = 0.6 (\(p\) = 0.034) and \(OA\) = 0.2 (\(p\) = 0.015). When \(CI_{A}\) and \(CI_{B}\) are compared in clusters, the latter is significantly better for \(OA\) equal to 1 and 0.8 (\(p\) = 0.001). For all remaining clusters the differences are not significant (\(p\) > 0.089).

Since such a detailed analysis across the whole range of \(OA\) values may make it difficult to draw sound conclusions at the later stages of our study, we cluster the agents with respect to \(OA\) into three classes in which the differences appear evident (yet, in some situations, not significant). We thus obtain three groups of negotiators: (1) high ordinal accuracy (\(OA = \left\{ {1.0, 0.8} \right\}\)), for which the agents from Study 2 perform better than those from Study 1 (87 agents in Study 1 and 89 in Study 2); (2) medium ordinal accuracy (\(OA = \left\{ {0.6, 0.4} \right\}\)), where both visualization techniques result in similar accuracy (55 and 44 agents in Study 1 and 2 respectively); and (3) low ordinal accuracy (\(OA = \left\{ {0.2, 0.0} \right\}\)), for which circles appear to assure higher cardinal accuracy than bars (32 and 28 agents in Study 1 and 2 respectively). The comparison of the results for these three clusters is shown in Table 5.

Table 5 Average cardinal inaccuracy (\(CI\)) for three clusters of ordinally accurate agents

These clusters allow us to differentiate between agents whose scoring systems differ in cardinal inaccuracy with the required significance (\(p\) < 0.001 in Mann–Whitney test). We will use them in the further analysis of the consequences of inaccurate scoring systems for the negotiation process and outcomes. Yet, no matter whether the original range of \(OA\) indexes is used or the clusters are applied, our analyses show that there is a positive relationship between the negotiator’s ordinal accuracy and cardinal accuracy in building the negotiation scoring system (see Table 3).

It is worth noting that in Study 1 building the cardinal scoring systems based on the circles’ radiuses leads to better accuracy in all clusters of ordinally accurate agents than when the circles’ areas are compared. This may indicate that the agents have problems with analyzing circles on a two-dimensional basis.

6.3 Scoring systems accuracy and the interpretation of negotiation history

Since the agents’ scoring systems are, on average, quite inaccurate, it seems interesting to verify whether they allow the agents to interpret the negotiation process properly (Q4). The negotiation process is represented by means of the negotiation history graph, which consists of the concession paths of the negotiator and his counterpart. While analyzing the subsequent offers in these paths, three types of negotiation moves may be identified. If the agent’s (\(N\)) own concession path is analyzed, a move is considered to be a concession if the rating of the offer \(A_{t + 1}^{N}\) sent by him in round \(t + 1\) is lower than the rating of the offer \(A_{t}^{N}\) submitted by him in the previous round \(t\), i.e. \(V\left( {A_{t}^{N} } \right) > V\left( {A_{t + 1}^{N} } \right)\). We consider a move to be a reverse concession if the situation is the opposite, i.e. \(V\left( {A_{t}^{N} } \right) < V\left( {A_{t + 1}^{N} } \right)\). No concession occurs if \(V\left( {A_{t}^{N} } \right) = V\left( {A_{t + 1}^{N} } \right)\). If the counterpart’s (\(C\)) concession path is analyzed, the negotiator’s interpretation of moves is the opposite. The counterpart’s move is considered to be a concession if his offer \(A_{k}^{C}\) submitted in round \(k\) is rated by the negotiator lower than the offer \(A_{k + 1}^{C}\) sent by the counterpart in round \(k + 1\), i.e. \(V\left( {A_{k}^{C} } \right) < V\left( {A_{k + 1}^{C} } \right)\). The other two moves are defined analogously.
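The three move types defined above can be sketched in a few lines. The function names and the example ratings are hypothetical; the only assumption is that each concession path is given as a list of ratings \(V(\cdot)\) of consecutive offers, evaluated by the negotiator.

```python
def classify_move(prev_rating, next_rating, own=True):
    """Classify one move between two consecutive offers.

    own=True  -> the negotiator's own path: a drop in rating is a concession.
    own=False -> the counterpart's path: a rise in rating (as seen by the
                 negotiator) is a concession.
    """
    if prev_rating == next_rating:
        return "no concession"
    dropped = next_rating < prev_rating
    if own:
        return "concession" if dropped else "reverse concession"
    return "reverse concession" if dropped else "concession"


def classify_path(ratings, own=True):
    """Classify every move along a concession path."""
    return [classify_move(a, b, own) for a, b in zip(ratings, ratings[1:])]


# Hypothetical ratings of consecutive offers.
print(classify_path([100, 90, 90, 95]))
# ['concession', 'no concession', 'reverse concession']
print(classify_path([20, 35, 35, 30], own=False))
# ['concession', 'no concession', 'reverse concession']
```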

To analyze whether the agent interprets the negotiation process correctly, both concession paths need to be depicted in two ways: (1) using the agent’s own scoring system \(S^{A}\), and (2) using the principal’s one \(S^{P}\). An agent interprets a move correctly if his principal interprets it in the same way. For instance, if the agent’s move is considered by his principal as a concession according to \(S^{P}\) (i.e. \(V^{P} \left( {A_{t}^{N} } \right) > V^{P} \left( {A_{t + 1}^{N} } \right)\)), it should also be recognized as a concession by the agent in his \(S^{A}\) (i.e. \(V^{A} \left( {A_{t}^{N} } \right) > V^{A} \left( {A_{t + 1}^{N} } \right)\)). The general correctness of interpretation of the whole negotiation process by each agent is determined as the fraction of moves recognized by him (according to \(S^{A}\)) in the same way as by his principal (according to \(S^{P}\)). An example of such an analysis is shown in Fig. 8, where the agent’s concession path is depicted by means of his individual rating system (\(S^{A}\)) as well as the principal’s one (\(S^{P}\)).

Fig. 8 Concession paths interpreted using \(S^{P}\) and \(S^{A}\)

Five moves may be identified based on the six subsequent offers of the agent. According to the principal’s scoring system \(S^{P}\), we consider the first move (the difference between offers 2 and 1) as a concession. The second, fourth and fifth moves are considered concessions as well. Only the third move is considered a no-concession. However, if the concession process is analyzed from the viewpoint of the agent’s individual scoring system \(S^{A}\), it is interpreted differently. The first two moves are correctly read as concessions, but the third one is considered by the agent as a concession while it was a no-concession for his principal. Similarly, the fourth move is interpreted incorrectly: it is considered to be a reverse concession while a concession was made according to \(S^{P}\). Hence, the general correctness of interpreting this part of the negotiation history is equal to 60% (three moves out of five are read properly). To determine the global general correctness of the negotiators, the results of a similar analysis of the counterpart’s concession paths should be added.
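The 60% figure can be reproduced with a short sketch. The two rating lists below are hypothetical values chosen only to match the pattern of moves described above; they are not read off Fig. 8, and the function names are illustrative.

```python
def move_types(ratings):
    """Encode each move on the agent's own path:
    -1 = concession, 0 = no concession, +1 = reverse concession."""
    return [(b > a) - (b < a) for a, b in zip(ratings, ratings[1:])]


def interpretation_correctness(agent_ratings, principal_ratings):
    """Fraction of moves that S^A classifies in the same way as S^P."""
    agent = move_types(agent_ratings)
    principal = move_types(principal_ratings)
    matches = sum(a == p for a, p in zip(agent, principal))
    return matches / len(principal)


# Hypothetical ratings of the same six offers (assumed, not from Fig. 8).
s_p = [100, 90, 80, 80, 75, 65]  # principal: C, C, no concession, C, C
s_a = [100, 95, 90, 75, 80, 70]  # agent:     C, C, C, reverse, C

print(interpretation_correctness(s_a, s_p))  # 0.6
```

Moves 1, 2 and 5 agree under both systems, while moves 3 and 4 are misread, which gives three correct moves out of five.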

The results of the analysis of the correct interpretation of negotiation history by agents in our experiment are shown in Table 6. Note again that for Study 1 the right history interpretation was determined using two reference systems: (1) the one based on circles’ radiuses (\(S_{R}^{P}\)); and (2) the one based on circles’ areas (\(S_{A}^{P}\)).

Table 6 Average fraction of correct interpretation of negotiation moves

The results show that the more accurate the scoring systems \(S^{A}\) are, the more correctly the agents interpret the negotiation process. The average percentage of correctly interpreted moves made during the whole negotiation process, both their own and their counterparts’, is highest for the negotiators from the class of high ordinal accuracy (\(OA = \left\{ {0.8;\, 1} \right\}\)). The best result is observed in Study 1 when the reference scoring system based on circles’ areas is used. In this case, on average as much as 93% of all moves (concessions, reverse concessions or no-concessions) were interpreted properly by the most accurate group of negotiators, while those from the second class (i.e. \(OA = \left\{ {0.4;\, 0.6} \right\}\)) were able to interpret only 83% of the negotiation process correctly. Similarly, the negotiators from the second class of ordinal accuracy were always better at understanding concessions properly than those from the last class of \(OA = \left\{ {0.0; \,0.2} \right\}\). The differences between groups are also statistically significant (fraction test with \(p\) < 0.05) for \(S_{R}^{P}\) and Study 2, except for the difference between the average percentages for Fado agents of high and medium ordinal accuracy in Study 2, equal to 86.4% and 81.9% respectively (\(p\) = 0.2). The results do not differ significantly between the studies, except when the clusters of most accurate agents from Study 1 (with \(S_{A}^{P}\)) and Study 2 are compared. Here the fraction test confirms the significance of the difference (\(p\) < 0.05).

Apart from analyzing the moves as such, a detailed investigation of the scale of differences in perceiving the concession paths may be conducted. The most intuitive way of measuring such differences between the “objective” path determined according to \(S^{P}\) and the agent’s subjective one based on \(S^{A}\) is to compare the ratings of the offers that comprise both paths. The absolute differences between the ratings of each offer sent or received may be determined, and the average difference in offer evaluations may be used as a scalar measure of misinterpretation of the correct concession path. For the paths shown in Fig. 8 the subsequent differences in evaluation are: 0, 5, 10, 5, 5 and 5. Hence, the average path misinterpretation is equal to 5. The results of such a quantitative analysis of path misinterpretations in our experiments are shown in Table 7.
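This scalar measure amounts to a mean absolute difference between two rating sequences. A minimal sketch follows; the function name is illustrative, and the ratings are hypothetical values chosen only so that their differences equal the six values quoted above (0, 5, 10, 5, 5, 5).

```python
def path_misinterpretation(agent_ratings, principal_ratings):
    """Average absolute difference between the ratings of the same offers
    under the agent's system S^A and the principal's system S^P."""
    diffs = [abs(a - p) for a, p in zip(agent_ratings, principal_ratings)]
    return sum(diffs) / len(diffs)


# Hypothetical ratings (assumed for illustration, not taken from Fig. 8).
s_p = [100, 90, 80, 80, 75, 65]
s_a = [100, 95, 90, 75, 80, 70]

print(path_misinterpretation(s_a, s_p))  # 5.0
```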

Table 7 Average cardinal misinterpretation of concession paths

The average difference in the interpretation of offers’ ratings during the whole negotiation process increases for subsequent groups of ordinal accuracy in both studies. The differences between the groups are significant (\(p\) < 0.05 in Mann–Whitney test). More importantly, the scale of misinterpretation is considerable. The most precise agents in Study 2 are twice as accurate in the interpretation as the second group of students, those of medium ordinal accuracy. If we exclude from the comparison the extreme offer that results in the maximum possible rating of 100 (which is in the vast majority of cases evaluated correctly no matter how inaccurately the individual scoring system is built), and assume that the problem of interpretation starts after the first concession and applies to offers rated 90 or less, the scale of misinterpretation for less accurate negotiators (\(OA\) < 0.8) is on average equal to 10% of the offer’s value or more. Taking as an example the average rating of an offer in the negotiation process in Study 2 for the Fado group with \(OA = \left\{ {0.4; \,0.6} \right\}\), which was equal to 68.44, the scale of misinterpretation is on average 13% of the offer value. If we add to this the findings on the false interpretation of moves in the groups of lower ordinal accuracy, which amounts to 19% or more, the scale of misinterpretation of offers submitted during the bargaining phase is clearly not marginal.

6.4 Negotiation outcomes and their efficiency

The scale of inaccuracy in interpreting the negotiation process may suggest potential problems with evaluating the final negotiation agreement and, consequently, with the quality of the final compromise. Therefore we analyzed the outcomes the parties obtained to verify the potential differences depending on the classes of the agents’ ordinal accuracy and the preference visualization technique used. A compromise was reached by 143 Fado agents in Study 1 (out of 174 in the experiment) and 126 in Study 2 (out of 161). The results of the comparisons of the agreements evaluated according to the agents’ individual scoring systems \(S^{A}\) and the principals’ ones \(S^{P}\) for Studies 1 and 2 are presented in Table 8.

Table 8 Average ratings of negotiation agreements in Study 1 and 2

Bearing in mind the differences in the interpretation of the negotiation process discovered in the previous subsection, one may expect that the negotiation results will also differ significantly among the groups of negotiators, and that those with more accurate scoring systems will better interpret the value of the negotiated compromise. Analyzing the three classes of negotiators in our two studies, we can confirm the previous findings that highly accurate negotiators negotiate objectively better outcomes (i.e. when \(S^{P}\) is used to measure the ratings) and more accurately interpret the real value of these outcomes. For instance, those from class \(OA = \left\{ {0.8; \,1} \right\}\) are better than those from other classes regardless of the study and reference system. However, from the viewpoint of their principals the differences between the subsequent clusters are not significant (\(p\) > 0.060 in Mann–Whitney test), except the one between the medium and low classes when \(S_{A}^{P}\) is used in Study 1.

If we analyze the agents’ performance from their own viewpoints (i.e. using \(S^{A}\)), we find that those from Study 1 performed significantly better than those from Study 2 for the two most accurate classes (\(p\) < 0.009 in Mann–Whitney test). However, when the corresponding results are compared for \(S_{*}^{P}\) in Study 1 and 2, we find no significant differences between the results obtained by the agents for their principals (\(p\) > 0.170 in Mann–Whitney test). As a result, we find that agents in Study 1 consistently overestimate the results they obtain, and the differences between what they claim to have negotiated and what their principals see are significant for the accuracy clusters (\(p\) < 0.05 in Wilcoxon test), except for one case when \(S^{A}\)-to-\(S_{R}^{P}\) is compared in Study 1 (\(p\) = 0.138). For Study 2 all the over- or underestimations revealed are insignificant (\(p\) from 0.110 to 0.915).

Although the average values of the compromises obtained by agents do not differ significantly across the classes, the packages they negotiated may be different. Therefore we verified which offers were most often chosen as the negotiation agreements, whether their structures differed for various levels of scoring system accuracy, and how efficient and fair they were.

There were 39 different packages selected as the negotiation agreements in Study 1 and 31 in Study 2. There were, however, only 5 packages that were used in 50% of all agreements in Study 1 (72 instances), and the remaining 34 were utilized by another 71 negotiating dyads. The same five packages comprised 59.5% of agreements in Study 2 (75 instances). Naturally, the question of the efficiency of these agreements arises, as well as the issue of their balance and fairness. The percentage of efficient agreements within each class of negotiators’ accuracy is shown in Table 9.

Table 9 Average percentage rate of efficient agreements

Based on the results obtained, we can confirm a general tendency in the efficiency of negotiation agreements across the accuracy classes and the studies. In Study 1, when the radius-based principal’s scoring system \(S_{R}^{P}\) is used as the reference, we observe that for both Fado and Mosico agents the efficiency increases for higher accuracy classes. This tendency fails only for the high and medium accuracy classes when \(S_{A}^{P}\) is used as the reference scoring system. Here the agents from the second class seem to negotiate efficient outcomes more frequently than those of the highest accuracy. The results for Study 2, in turn, are similar to those for \(S_{R}^{P}\). We can also observe that for circle-based visualization the percentage of efficient agreements is, with the one exception mentioned above, noticeably higher than for agents with bar-based visualization. Unfortunately, due to small sample sizes none of these differences can be regarded as significant in the proportion test (the highest Chi square value for the compared fractions is equal to 0.381).
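For readers who wish to replicate such percentages, the share of efficient agreements can be sketched as follows. The sketch assumes that efficiency means Pareto efficiency within a finite set of feasible packages, each rated by the two parties; the function names, the package ratings and the agreements are hypothetical and do not come from the experiment.

```python
def is_efficient(agreement, packages):
    """True if no feasible package is at least as good for both parties
    and strictly better for at least one of them."""
    a1, a2 = agreement
    return not any(
        p1 >= a1 and p2 >= a2 and (p1 > a1 or p2 > a2)
        for p1, p2 in packages
    )


def efficient_share(agreements, packages):
    """Fraction of negotiated agreements that are Pareto-efficient."""
    return sum(is_efficient(a, packages) for a in agreements) / len(agreements)


# Hypothetical (rating for party 1, rating for party 2) of feasible packages.
packages = [(100, 20), (80, 60), (70, 75), (60, 55), (40, 90), (10, 100)]
agreements = [(70, 75), (60, 55), (80, 60)]

print(is_efficient((60, 55), packages))  # False: (70, 75) is better for both
print(efficient_share(agreements, packages))
```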

The final issue we investigated was whether the accuracy of the agents’ scoring systems may have an impact on the fairness of negotiated agreements. In our research we applied the notion of the Nash bargaining solution (Nash 1950). For each study and reference scoring system we determined the fair solution for the status quo point equal to (0, 0) and then measured its distances from the agreements achieved by agents from different classes of accuracy. The distance was measured by means of the \(L_{1}\) taxi-cab metric, which is easy to interpret as the sum of differences in ratings for both negotiators. Each class was then represented by the average distance of its agreements from the Nash solution. The results are shown in Table 10.
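The fairness measure described above can be sketched as follows. The sketch assumes that the Nash solution is sought among a finite set of feasible packages as the one maximizing the product of both parties’ gains over the status quo; all names and ratings are hypothetical.

```python
def nash_solution(packages, status_quo=(0.0, 0.0)):
    """Package maximizing the product of both parties' gains over the
    status quo (discrete version of the Nash bargaining solution)."""
    d1, d2 = status_quo
    feasible = [p for p in packages if p[0] >= d1 and p[1] >= d2]
    return max(feasible, key=lambda p: (p[0] - d1) * (p[1] - d2))


def l1_distance(a, b):
    """Taxi-cab distance: sum of rating differences for both negotiators."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_distance(agreements, reference):
    """Average L1 distance of a class of agreements from a reference point."""
    return sum(l1_distance(a, reference) for a in agreements) / len(agreements)


# Hypothetical (rating for party 1, rating for party 2) of feasible packages.
packages = [(100, 20), (80, 60), (70, 75), (40, 90), (10, 100)]
nash = nash_solution(packages)
print(nash)  # (70, 75), product 5250

# Hypothetical agreements reached by one accuracy class.
print(average_distance([(80, 60), (40, 90)], nash))  # 35.0
```

The smaller the average distance, the closer the agreements of a given class lie to the fair solution.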

Table 10 Average distances between the negotiated agreements and Nash fair solutions

No clear pattern of dependencies can be discovered in the results above. For both studies the average distance between the Nash solution and the agreements negotiated by agents from the first class of accuracy is smaller than within the last class of lowest accuracy. However, a similar relation cannot be observed for the most accurate and medium accurate agents in Study 1. Only one result differs significantly from the others: the average distance to the Nash fair solution for the most accurate agents in Study 2 (\(p\) < 0.001 in Mann–Whitney test). Once again, the performance of the most accurate agents who operate with the bar-based scoring system is better than the performance of others (see results from Tables 5 and 7).

7 Discussion

The results we obtained in our study are not unequivocal, but they allow us to answer the research questions asked and to formulate some prescriptive recommendations regarding the use of bars and circles in visualizing the principal’s preferences.

Our first observation, derived from Sect. 5.1, concerns the common problem with the adequate representation of the principal’s preferences even at the ordinal level. From the detailed analysis of errors made we found that the agents have the biggest problems with correctly scoring issues and options with non-monotonic preferences. In such situations, bar-based visualization seems to assure somewhat better results, yet the difference is not significant (as for the ratings of the issue of “songs”). However, when some additional problems or frictions occur, such as an unintended change in the order of issues to be scored in the decision support protocol compared to their order in the preferential information provided by the principals, the bar-based visualization does not perform better. The problems with adequately rating issues or options in such situations seem to be linked not to the visualization scheme, but rather to cognitive errors or heuristics that may occur. Here, bounded awareness probably played a role and, in particular, the agents were prone to change blindness, not noticing that the issues were presented to them in a different order (Chugh and Bazerman 2007).

Note that coping with these two issues could help to increase the ordinal accuracy significantly. For the other options, for which standard monotonic preferences were declared, on average 75% of agents were able to rate them in an ordinally correct way. Since non-monotonic preferences are quite common in many decision problems, some mechanism of additional support should be designed to increase the agents’ focus and initiate a more analytic way of information processing. Hybrid mechanisms that use pair-wise comparisons could perhaps be helpful in assuring such ordinal accuracy, or a visualization mechanism could be implemented that would depict graphically the numerical preferences defined individually by agents.

Summarizing the results obtained from the ordinal analyses, we need to conclude that the fraction of agents who were able to define the scoring systems in accordance with the principal’s preference information was surprisingly low (Q1): more than three quarters of agents made at least one mistake while mapping the ordinal structure of their principal’s preferences into the numerical scoring system. However, the results at the ordinal level partially confirm our concerns addressed in question Q2. The preference visualizations used in our two studies seem to affect the accuracy of the scoring systems built by agents differently. Generally, the visualization systems based on bars make it possible to obtain scoring systems with slightly better accuracy than those based on circles. This was confirmed by AFSD with a particularly low index of violation; only for the least inaccurate agents does this statement not hold in the strong sense (i.e. in the sense of FSD).

We learned a little more about the scale of the agents’ inaccuracy when the cardinal inaccuracy index was determined. This simultaneously shed light on the problem with the interpretation of circles by agents. When we analyse the cardinal performance of agents in Study 1, we need to refer to two different concepts that can be used in understanding the circles. Some agents may consider them as two-dimensional figures and focus on their areas when assigning the ratings, while others may simply compare their heights, i.e. the radiuses/diameters (Macdonald-Ross 1977). Therefore the comparison of bars and circles needs to take into account these two ways of interpretation. The simple correlation analysis shows that cardinal and ordinal accuracy are strongly and significantly dependent (Q3). The first recommendation that can be drawn from this finding is that, to assure good quality of the agents’ scoring systems at the cardinal level (a good representation of the strength of preferences), the ordinal accuracy needs to be preserved first, no matter which preference visualization technique is used. Again, some additional mechanisms should be implemented in the prenegotiation protocol to increase ordinal accuracy.

When the details of the cardinal performance are analysed, it seems that bars are a little better than circles at reflecting the nuances in the principal’s strength of preferences. The average cardinal accuracy for all agents in both studies shows that even if we instructed the agents (or were sure) that they look at the circles from a one-dimensional perspective (comparing the radiuses), their average cardinal accuracy would not be significantly better than that of the agents provided with bars. If we assume that the agents interpret the areas of circles, the supremacy of bars is evident. What is more, for those negotiators who are diligent enough to analyze the principals’ preferences correctly on the ordinal basis, the accuracy obtained with bars is nearly twice as good as when circle areas are interpreted. For those with mistakes (OA < 1) the situation is similar, although the relationship is not strict. If we assume that the agents in Study 1 were focused on the radiuses, the bars do not prove significantly worse; when the focus was put on the areas, bars are again significantly better.

These nuances in the results on the cardinal performance of circles and bars show that, when using circles to visualize their preferences, the principals should additionally explain what the grounds and assumptions for drawing such circles were. This would help avoid the misinterpretation of preferences by agents, the scale of which (differences in performance measured for radiuses and areas) can be considerable (10 rating points, on average, as shown in Table 5). On the other hand, it seems interesting that using circle radiuses/diameters may produce better accuracy than bars. We do not have enough data to discover the potential reasons, but it may be the positive effect of friction (Hu 2005), i.e. comparing two bars seems cognitively easy, so the agents pay no attention to measuring them precisely, while comparing two diameters requires some effort (the circles overlapped each other).

The second part of our study focused on analyzing the impact of the scoring system accuracy (and indirectly the preference visualization scheme) on the correct interpretation of the negotiation process and outcomes. We found that the qualitative (fractions) and quantitative (average misinterpretation) measures of interpretation correctness improve when the ordinal accuracy increases. This allows us to answer question Q4 unequivocally in the affirmative. Again, the two alternative ways of interpreting the circles cause some problems with the direct comparison of the results between the studies. If we compare both visualization techniques, the differences in the qualitative results in fractions should be considered insignificant if the circle radiuses are used. This is not confirmed for circle areas because of the higher fraction for the most accurate agents. However, if the quantitative cardinal measure is used, it is much in favour of bar-based scoring systems. This is an interesting result which, together with the earlier finding, yields a useful recommendation for the successful representation of the principal’s preferences by agents. To be sure that the agents represent their preferences and interpret the negotiation process most accurately, the principals need to visualize their preferences by means of bars and assure that no more than one ordinal mistake is made while determining the scoring system. Note that although such agents may misinterpret the subsequent moves along the concession paths a little more frequently, the scale of the misinterpretation would be significantly lower.

The differences in the misinterpretation of negotiation moves observed for subsequent classes of ordinally accurate agents are noticeable for the entire bargaining phase except the final agreement. It appeared that the final compromises negotiated by agents differ only slightly between the subsequent classes, though the most accurate agents obtain significantly better compromises than the least accurate ones. It is worth noting, however, that other factors may affect the final compromise, since it is always a solution elaborated by both parties, and even an agent with a perfect scoring system may obtain a poorer result when negotiating with a tough and aggressive counterpart. What is interesting is the influence of the visualization technique on the misperception of the agreements by agents. Looking at the results, one may claim that the agents who operated with circle-based scoring systems negotiated better compromises than their colleagues from Study 2, which may have made them feel more satisfied with their performance. But when their results are confronted with the true value of the compromise for their principals, the agents in Study 2 prove more accurate in the real evaluation of the agreements (even those from the medium and low ordinal accuracy classes). This does not match well the results obtained for the interpretation of the concession paths, where the misinterpretation of subsequent offers differs among the classes. But it may suggest that the inaccuracy of option evaluation differs depending on the options themselves. In other words, the agents may not pay attention to assigning adequate ratings to the options that they consider unrealistic as part of the final compromise, but put enough effort into accurately scoring those that they consider likely to comprise the agreement.

The efficiency of the agreements negotiated by the agents from different accuracy clusters seems rather average. In the most optimistic situation, agents operating with their own scoring systems negotiate an efficient agreement in at most two-thirds of negotiation processes. The agents using bars seem to perform worse here than their colleagues provided with circle-based visualization for all accuracy classes. This is, however, not a pessimistic result, since it shows that there is still room for improving the outcomes they negotiated in the post-negotiation phase. Consequently, if offered a post-negotiation improvement mechanism, they could end up with better results for their principals than they initially achieved. When fairness is considered, the agents seem to differ neither across clusters nor across studies. There is, however, one exception. Again, the agents from the highest accuracy class who were provided with bar-based scoring systems are able to negotiate significantly fairer solutions than others. If we add this to our previous findings that they were also able to construct scoring systems of significantly lower cardinal inaccuracy and to interpret the concession paths and negotiation history most accurately, we may formulate quite a sound prescription on how to assure adequate representation of principals by agents: bar-based visualization and additional facilitating algorithms that eliminate as many ordinal mistakes as possible would maximize the number of agents who find themselves in this most promising cluster.

Summarizing the results on the outcomes to answer our question Q5, we may say that the accuracy of the scoring systems determined by the agents affects the results they obtain, both in the quality perceived by themselves and in that perceived by their principals, yet significant differences are observed only between the extreme classes of the most and least accurate agents. There are, however, no clear and evident differences when the outcomes are analyzed in a symmetric way, i.e. when efficiency and fairness come into play.

8 Conclusions and future work

The main goal of our study was to check how the different ways in which the principals impart their preferences to their agents affect the accuracy of the scoring systems the agents build, and whether such systems can provide reliable support during the negotiation process, allowing the agents to interpret the negotiation moves and outcomes correctly. We aimed at investigating whether inaccurate scoring systems may misinform the agents about the negotiation process and the valuation of offers to such an extent that they negotiate worse, inefficient and unfair outcomes. While discussing the results in the previous section we were able to answer all our research questions and formulate final conclusions. Bars help the agents to capture the principals’ preferences slightly better than circles and to determine scoring systems that are more ordinally accurate. This in turn affects the cardinal accuracy and the interpretation of the negotiation process, but only slightly affects the negotiation outcomes. There is, however, one cluster of agents for which we noted significantly better results than for others, i.e. the cluster of the most accurate bar-supported agents. As noted previously in Sect. 7, this finding can be considered a pragmatic recommendation for designing efficient and reliable decision support mechanisms for representative negotiations, which should operate with bar-based visualization and implement mechanisms that increase the agents’ control over their ordinal accuracy.

However, the final conclusions we draw need to be considered within the specific experimental context we used. First, in the Inspire negotiation system, the decision support mechanism we used to determine the scoring systems implements the direct rating (SMARTS-like) approach. Consequently, the agents had to define their preferences directly, by means of quantitative ratings and at the disaggregated level of the negotiation problem. The question arises whether some problems with the accurate definition of the preferences are related to this particular way of defining them, to the cognitive capabilities of the agents and to the cognitive demand of the mechanism. It may be that, by implementing another preference elicitation protocol that operates with a different MCDA approach, the agents' accuracy would change. Our ongoing research, focused on using the holistic approach based on modified UTA and MARS methods, shows that such changes may occur (Roszkowska et al. 2017).
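The direct rating approach mentioned above can be pictured as an additive scoring system: the agent assigns a rating to every option of every issue, and the score of an offer is the sum of the ratings of the options it contains. The following minimal sketch illustrates the idea only; the issue names, option names and ratings are hypothetical and do not come from the Inspire system or from our experimental data.

```python
# Hypothetical ratings an agent might assign by direct rating.
# Issue names, option names and numbers are illustrative assumptions.
ratings = {
    "price": {"low": 0, "medium": 20, "high": 40},
    "delivery": {"slow": 0, "fast": 25},
    "warranty": {"1 year": 0, "2 years": 35},
}


def score(offer, ratings):
    """Additive score: sum of the ratings of the option chosen for each issue."""
    return sum(ratings[issue][option] for issue, option in offer.items())


# An offer selects exactly one option per issue.
offer = {"price": "medium", "delivery": "fast", "warranty": "2 years"}
print(score(offer, ratings))
```

In this sketch the best options sum to 100, so a score can be read as a share of the best achievable offer. An agent whose ratings deviate from the principal's preferences would assign different scores to the same offers, which is how inaccuracy can propagate to the interpretation of moves.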

Furthermore, it seems that the support approach we used did not eliminate the syndrome of fast thinking (Kahneman 2011), and some heuristics were applied by the agents while determining the scoring systems. As noted before, they simply did not notice that the order of the issues listed to be scored did not match the order used in the description of preferences provided by the principal in the prenegotiation preparation. This caused the fraction of accurate ratings to drop significantly (it was twice as low as for the case of non-monotonic options for the issue of songs). The problem may result from the negotiators' inability to follow the analytical scheme of problem decomposition and, again, they might prefer to use another mechanism based on preference disaggregation (see Bous et al. 2010; Górecka et al. 2014; Siskos et al. 2005).

Note further that in our analysis of the scoring systems' accuracy we did not verify how much the participants focused on the verbal and how much on the graphical preferential information. It may be that they focused on the verbal information and that their inaccuracy results from the individual interpretation not of the circle or bar sizes, but simply of such imprecise statements as “significantly more important”, “nearly as important as” or others. And even if they focused only on the graphics, we do not know how they interpreted the circles in Study 1 (no such data was collected in the negotiation transcripts and post-questionnaires). The results clearly show that using a radius- (diameter-) or an area-based perspective changes the principal's interpretation of the agents' accuracy.
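The difference between the two readings of circles can be illustrated with a short numerical sketch. It assumes, purely for illustration, three circles with hypothetical radii of 3, 2 and 1 units and normalizes the resulting weights to 100 points; neither the radii nor the function names come from our experiment.

```python
def normalized_weights(sizes, total=100.0):
    """Scale non-negative sizes so that they sum to `total`."""
    s = sum(sizes)
    return [total * x / s for x in sizes]


def radius_based(radii):
    """Weights proportional to the radius (equivalently, the diameter)."""
    return normalized_weights(radii)


def area_based(radii):
    """Weights proportional to the circle area; the factor pi cancels out."""
    return normalized_weights([r ** 2 for r in radii])


# Hypothetical radii of three circles representing three issues.
radii = [3.0, 2.0, 1.0]
print(radius_based(radii))
print(area_based(radii))
```

With these assumed radii the first issue receives 50 of 100 points under the radius-based reading but about 64 under the area-based one. The same picture can therefore yield noticeably different reference ratings, and an agent's scoring system may look accurate under one reading and inaccurate under the other.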

Finally, we observed that visualization impacts the quality of scoring systems and that this quality impacts the quality of interpretation of negotiation moves. However, when the data for the accuracy clusters and the interpretation quality of moves are compared between the studies, the relationship is not evident (Tables 6 and 7). This may imply that there are some indirect relationships among the factors that we analyzed in clusters, and that the quality of the scoring systems can be considered a mediator between the visualization schemes and the correctness of the interpretation of the negotiation process. To confirm this, more advanced structural equation models are required, as well as a larger dataset of Inspire negotiations.

Taking the above into account, our future research on the impact of the preference impartation scheme on scoring system quality and negotiation outcomes will be broadened to include measuring the cognitive, decision-making and negotiation skills of the negotiators, e.g. by means of such psychometric tests as REI (Handley et al. 2000) or TKI (Kilmann and Thomas 1977). We will verify how much these skills influence the perception of preferential information and the ability (and necessity) to use decision support tools in negotiation, which consequently may affect the willingness to use the slow thinking mode while building the scoring system and so increase its accuracy. This would allow us to identify the potential requirements that decision support tools should fulfill in their technical and graphical layers, as well as in their general approach to preference elicitation, to match the agents' cognitive capabilities, technical skills and number sense. Only by being intuitive, easy to use and technically interesting can negotiation support encourage negotiators to analyze the preferences thoroughly and map them adequately into a formal rating system. This is important since, as the results of our experiment show, only the most accurate negotiators perform significantly better than others and are able to track the negotiation process correctly and to interpret appropriately the negotiation moves of their counterparts as well as the real value of the offers and outcomes.