Incentive and nudge design for human behavioural change

This paper addresses the modelling and management of human behavioural change, with the particular aim of reducing congestion. First, behavioural change driven by incentives and informational nudges is modelled so as to reflect the tendency of a human group. Next, an incentive- and nudge-based optimal management problem is formulated that takes the model of behavioural change into account. Finally, the effectiveness of the incentive and nudge design is verified in a numerical experiment.


Introduction
Congestion is a major problem in commercial facilities since it causes economic losses and reduced user satisfaction. Nowadays, reducing congestion is also required to prevent the spread of infectious diseases. There have been various trials to reduce congestion by "indirect" management, which induces users to change their behaviour without any compulsory action. Typical examples of indirect management include incentive- and nudge-based management. An incentive is a monetary reward that encourages users to change their behaviour, as defined in [1], while a nudge is non-monetary information that encourages users to change their behaviour, as defined in [2]. By appropriately designing the incentive and nudge, users' behaviour is indirectly steered towards the desired behaviour.
Incentive-based management can be achieved by, e.g. dynamic pricing [3], which changes the prices of goods and services depending on user demand. The effectiveness of dynamic pricing has been shown in various applications such as dynamic rail fares [4], parking fees [5], electricity prices [6], and goods prices in retail [7]. In [4], rail fares are varied according to rail congestion to encourage users to change their behaviour and to smooth out the congestion rate at different times of the day. In [5], parking prices are varied depending on the distance from the parking areas to the users' common destination. In [6], the electricity price is varied based on the time zone: the price is increased during the daytime, aiming at shifting electricity demand from daytime to night-time. In [7], goods prices are varied depending on demand, aiming at reducing surplus inventory and disposal. As shown in [4][5][6][7], the appropriate design of dynamic pricing is beneficial for both providers and users.
Previous work on nudge-based management can be seen in [8], where informational nudging is utilized for managing traffic flows by sending manipulated, and possibly misleading, global information about the congestion level on the entire network to drivers in order to alter their route choices [9]. In [10], the nudge-based behavioural change of boundedly rational decision makers is modelled and its proportional-integral control is addressed.
This paper addresses a systematic design of general incentive-based management that simultaneously utilizes informational nudging. Incentive-based behavioural change is modelled with nudge effects in a different manner from [8][9][10]. Then, the model is utilized for the design of model predictive control (MPC, [11]).
The contribution of this paper is as follows: we propose a novel model of behavioural change driven not only by incentivizing but also by informational nudging. Furthermore, we propose a systematic design of incentive- and nudge-based management based on MPC.
The rest of this paper is organized as follows: Section 2 presents a static model of human behavioural change based on behavioural economics [12] and extends it to a dynamic model. Next, in Section 3, management system design is addressed based on the constructed dynamic model. Finally, Section 4 gives a summary of this paper.
Notation: Symbols 1 and 0 denote the all-ones and all-zeros vectors, respectively. Symbol diag(v) denotes the diagonal matrix formed from vector v.

Preliminary
To make clear the incentive- and nudge-based management system to be designed, let us consider congestion at a group of restaurants that are managed by a common owner. As shown in Figure 1, we suppose that users are concentrated in Restaurant 1, as illustrated in the bottom left figure. Aiming at an equal distribution over the three restaurants, the owner gives incentives to the users based on the current user distribution, as illustrated in the upper figure. Under these incentives, some of the users change their choice of restaurant. For example, setting a low incentive in Restaurant 1 reduces the number of users in that restaurant, as illustrated in the bottom right figure. In addition to the incentive, we apply an informational nudge to users, which in this example is information on the congestion situation at each restaurant, to further reduce the congestion.
We address the management system design for a human group that has n ∈ N choices. The overall structure of the management system is given in Figure 2. In the figure, symbol P represents the human group, composed of all users of the system. Symbol K represents the controller to be designed such that the user distribution is equalized over the n choices. Symbols π, π*, p and q represent the distribution of users, its desired distribution determined by the owner, the incentive, and the nudge, respectively. Furthermore, let π, p and q be n-dimensional vectors. Then, each element of π, denoted by π_i, represents the ratio of users who choose the ith choice, the element of p, denoted by p_i, represents the incentive given to users who choose i, and the element of q, denoted by q_i, represents the ratio of users informed of the state of the ith choice. The signal flow in the management system is as follows. The owner dynamically updates the incentive p and nudge q, and users individually make decisions based on the updated ones. By repeating this mechanism, the management system aims at achieving the desired user distribution, i.e. π → π*.

Static model of behavioural change
In this subsection, we review a static model, which has been presented in the research field of behavioural economics. We first present a full-information model, in which all system users receive the information on the congestion, i.e. the information on the user distribution over the discrete choices. Then, we generalize the full-information model to a partial-information one, in which only a part of the users receives information on the user distribution.

Full-information model
One can model the decision making of a human group by a congestion game [13], where a set of noncooperative selfish individuals share a finite resource.
In the congestion game model, a human group is composed of selfish individuals who have their own preferences. A drawback of congestion game modelling is the difficulty of estimating the preference of each individual. It is not always possible to model these preferences accurately, in particular when the control system targets a large number of individuals. In this sense, a congestion game formulation is not compatible with model-based control, which is the main aim of this paper. In this paper, we focus only on the averaged behaviour of a human group and assume that it is expressed by the individual model given in [10], under the assumption that the probability of each individual's decision equals the proportion of decisions in the human group. Then, recalling the incentive p and the user distribution π defined in the second paragraph of Section 2.1, the decision making of a human group is modelled as the optimization problem (1), in which the objective function f(π, p, ω) is minimized with respect to π subject to the equality constraint g(π) = 0. The functions f(π, p, ω) and g(π) are given in (2), and ω ∈ R^n is a weighting vector. In (2), h is a perturbation function that represents the tendency of the system users, i.e. it determines the user behaviour if no incentive is given. In this paper, we assume that h is given by a quadratic form in π whose coefficients are Q ≻ 0 ∈ R^{n×n}, R ∈ R^n and S ∈ R. Another example of a perturbation function is h(π) = Σ_{i=1}^{n} π_i log π_i, which is known as the Gibbs entropy [14,15]. The quadratic perturbation function considered in this paper is an approximation of the Gibbs entropy. This can be seen in Figure 3, which shows the function π_i log π_i, π_i ∈ (0, 1], and its approximation 1.23π_i^2 − 1.06π_i − 0.131. Although it is not shown mathematically, the approximation is valid for some cases, as demonstrated in the figure. One can choose the quadratic function h such that the decision-making model (1) expresses the behaviour of the group locally.
We assume that users make decisions based on the incentive p, but that its effect is distorted by the user distribution. As studied in e.g. [10,16], decision making is boundedly rational and humans cannot evaluate given incentives accurately. Such bounded rationality is modelled by the weighting vector ω, which depends on the user distribution π. In this paper, each element ω_i of ω is given by (3), in which α_i and β_i are constants. Supposing α_i > 0, i ∈ {1, 2, …, n}, we see that ω_i(π_i) in (3) is an increasing function of π_i, which models conformity bias in human decisions: humans tend to behave similarly to those around them. Conversely, supposing α_i < 0, ω_i(π_i) in (3) is a decreasing function of π_i, which models anticonformity in human decisions: humans tend to behave differently from those around them.
Example: Consider α_i = −1 with β_i chosen such that ω_i(1/n) = 1, which represents anticonformity in users' decisions. Suppose here that π_i > 1/n for some i ∈ {1, 2, …, n}, which implies that the popularity of the ith choice is above average. Then, ω_i(π_i) p_i < p_i holds, i.e. the distorted incentive ω_i(π_i) p_i is less than the undistorted incentive p_i. Therefore, users tend to avoid the majority. Conversely, supposing that π_i < 1/n for some i ∈ {1, 2, …, n}, we see that ω_i(π_i) p_i is greater than p_i.
Noting that 1_n^T π = Σ_{i=1}^{n} π_i, we see that the equality constraint in (1) implies that π is normalized, i.e. its elements sum to one.
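The quality of the quadratic approximation quoted above can be checked numerically. The following is a minimal sketch, assuming that log denotes the natural logarithm; the function names and the evaluation grid are illustrative choices and are not taken from the paper. It compares π_i log π_i with 1.23π_i^2 − 1.06π_i − 0.131 on a grid and reports the largest absolute gap on a given interval.

```python
import math


def gibbs_term(x):
    """Entropy summand x * ln(x) for x in (0, 1] (natural logarithm assumed)."""
    return x * math.log(x)


def quadratic_fit(x):
    """Quadratic approximation with the coefficients quoted in the text."""
    return 1.23 * x ** 2 - 1.06 * x - 0.131


def max_abs_error(lo, hi, steps=1000):
    """Largest |x ln x - quadratic| on an evenly spaced grid over [lo, hi]."""
    grid = [lo + (hi - lo) * j / steps for j in range(steps + 1)]
    return max(abs(gibbs_term(x) - quadratic_fit(x)) for x in grid)


if __name__ == "__main__":
    # Gap over the mid-range of shares versus the gap close to zero.
    print("max error on [0.1, 1]:   ", max_abs_error(0.1, 1.0))
    print("max error on [0.001, 0.1]:", max_abs_error(0.001, 0.1))
```

Under these assumptions the gap stays small over the mid-range of π_i and grows as π_i approaches zero, which is consistent with the remark that the approximation is valid only for some cases.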
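The example can be illustrated with a small numerical sketch. It assumes an affine weight ω_i(π_i) = α_i π_i + β_i with β_i = 1 + 1/n; this functional form, the unit incentive and all function names are hypothetical choices made for illustration and should not be read as the definition in (3). The distribution used is the uncontrolled one reported in the numerical experiment.

```python
def weight(pi_i, alpha_i, beta_i):
    """Assumed affine weight alpha_i * pi_i + beta_i (hypothetical stand-in for (3))."""
    return alpha_i * pi_i + beta_i


def distorted_incentive(pi, p, alpha, beta):
    """Element-wise distorted incentive omega_i(pi_i) * p_i."""
    return [weight(x, a, b) * inc for x, inc, a, b in zip(pi, p, alpha, beta)]


n = 5
alpha = [-1.0] * n            # anticonformity: the weight decreases with popularity
beta = [1.0 + 1.0 / n] * n    # assumed so that weight(1/n) equals 1
pi = [0.097, 0.133, 0.353, 0.250, 0.167]
p = [1.0] * n                 # hypothetical unit incentive for every choice

if __name__ == "__main__":
    for share, eff in zip(pi, distorted_incentive(pi, p, alpha, beta)):
        relation = "below" if eff < 1.0 else "above"
        print(f"share={share:.3f}  distorted incentive={eff:.3f} ({relation} p_i)")
```

With these assumed values, choices whose share exceeds 1/n receive a distorted incentive below p_i and the remaining choices receive one above p_i, which is the anticonformity effect described in the example.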

Partial-information model
We note that the full-information model, given in (1) and (3), is derived under the implicit assumption that all users have access to the information on the user distribution. Recall that the aim of this paper is the design of a management system in which the user distribution is managed by giving incentives with supplemental nudging. To this end, we assume that the ratio of users who access the information on the user distribution is manipulable, which is a realization of informational nudging. Then, we extend the full-information model to a partial-information one, in which only a part of the users accesses the information.
We let η ∈ R^n_+ be the vector whose elements η_i are defined in (4) in terms of π_i and q_i. Here, q_i ∈ [0, 1] represents the ratio of users who access the information on π_i, which is the proportion of users who choose i. For example, q_i = 0.3 implies that the management system discloses the value of π_i to 30% of all users. Then, the partial-information model is described by the optimization problem (5), which has the same form as (1) with f(π, p, ω(η)) as its objective function. Note that model (5) is derived by replacing the weighting vector ω in model (1) by ω(η), defined by (3) and (4). If we let q_i = 1, implying that the information on π_i is disclosed to all users, it holds that η_i = π_i from (4). This reduces the partial-information model, given in (5) with (3) and (4), to the full-information one, given in (1) and (3). In this sense, the partial-information model is an extension of the full-information one. Recall again Figure 2, in which the structure of the overall management system is illustrated. The partial-information model is driven by both the incentive and the informational nudge to change the user distribution. We can see that the incentive and informational nudge, denoted by p and q, are the control inputs for the human group.

Dynamic model
We note that the models given in Subsection 2.2 are static; in other words, they express only the steady-state responses from incentive and nudge to user distribution. In this paper, we aim at managing the transient responses in addition to the steady-state ones. To this end, the partial-information model is extended to a dynamic one as follows.
Consider the optimization algorithm (6), where k is the discrete time, λ ∈ R is the Lagrange multiplier, and τ_π ∈ R_+ and τ_λ ∈ R_+ are positive constants that determine the timescale of the human behaviour. Algorithm (6) is a primal-dual gradient algorithm, which has been studied in the literature [17][18][19][20]. Based on (6), we derive a dynamic model of decision making. Letting x = [π^T λ]^T ∈ R^{n+1} be the state vector, we obtain the state equation (7), whose right-hand side is written in terms of a projection operator and functions φ_i and ψ_i. In this paper, the dynamics of decision making is modelled in a different manner from other models such as population dynamics and best response dynamics [21,22]. The best response dynamics is given by π̇ = τ(T(p) − π), where τ > 0 is a parameter and T : R^n → S := {w ∈ [0, 1]^n | 1_n^T w = 1} is a solution to the optimization problem T(p) = argmin_{π̃ ∈ S} (−π̃^T diag(π) p + h(π̃)). Since the best response dynamics is described by an implicit equation in this manner, it is not tractable to handle in control system design, in particular MPC design. The dynamic model given in (7) overcomes this drawback: decision making is described by a state equation and can be utilized directly in the MPC formulation, as shown in Section 3. The validity of model (7) in comparison with the models given in [21,22] has not yet been studied well. This should be addressed in future work by applying the models to real-world data.
It should be emphasized that the dynamic model (7) is an extension of the static one (5). This is seen as follows. In (7), we suppose that π(k + 1) = π(k) and λ(k + 1) = λ(k) as k → ∞, which implies that human behaviour is at the steady state. Then, it follows that (8) and (9) hold. We see that Equations (8) and (9) are the KKT optimality conditions for the optimization problem (5).
In other words, the dynamic model ( 7) is reduced to the static one ( 5) at the steady state.The regularity condition always holds in the optimization problem (5) as it has only one constraint g(π ) = 0.
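The steady-state argument can be illustrated with a generic primal-dual gradient iteration. The following is a minimal sketch under stated assumptions, not the model (7) itself: it takes zero incentive, omits the projection, and assumes the perturbation function h(π) = ½ π^T Q π + R^T π with diagonal Q, so that the Lagrangian is h(π) + λ(1_n^T π − 1). The scaling of h and all function names are assumptions; Q, R, τ_π and τ_λ are the values used in the numerical experiment.

```python
def primal_dual_step(pi, lam, q_diag, r, tau_pi, tau_lam):
    """One primal-dual gradient step for min h(pi) subject to sum(pi) = 1.

    Assumed h(pi) = 0.5 * pi' Q pi + R' pi with diagonal Q (hypothetical scaling).
    """
    # Gradient of the Lagrangian with respect to pi: Q pi + R + lam * 1.
    grad = [q * x + ri + lam for q, x, ri in zip(q_diag, pi, r)]
    new_pi = [x - tau_pi * g for x, g in zip(pi, grad)]
    # Dual ascent on the constraint residual, evaluated at the current pi.
    new_lam = lam + tau_lam * (sum(pi) - 1.0)
    return new_pi, new_lam


def iterate(q_diag, r, tau_pi, tau_lam, steps=20000):
    """Run the iteration from the uniform distribution and lam = 0."""
    n = len(q_diag)
    pi, lam = [1.0 / n] * n, 0.0
    for _ in range(steps):
        pi, lam = primal_dual_step(pi, lam, q_diag, r, tau_pi, tau_lam)
    return pi, lam


def kkt_residuals(pi, lam, q_diag, r):
    """Stationarity and feasibility residuals of the assumed problem."""
    stationarity = max(abs(q * x + ri + lam) for q, x, ri in zip(q_diag, pi, r))
    feasibility = abs(sum(pi) - 1.0)
    return stationarity, feasibility


Q_DIAG = [2.0, 2.1, 3.0, 2.5, 2.2]
R = [-1.7, -1.8, -2.6, -2.2, -1.9]
TAU_PI, TAU_LAM = 0.005, 0.1

if __name__ == "__main__":
    pi, lam = iterate(Q_DIAG, R, TAU_PI, TAU_LAM)
    print("pi =", [round(x, 4) for x in pi], "lambda =", round(lam, 4))
    print("KKT residuals:", kkt_residuals(pi, lam, Q_DIAG, R))
```

Once the iterates stop changing, both residuals vanish, i.e. the fixed point of the iteration satisfies the stationarity and feasibility conditions of the assumed static problem. Because the form of h is assumed here, the resulting distribution need not coincide with the values reported in the numerical experiment.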

Management system design
In this section, we present the management system design, which utilizes the state space model given in (7). The objective of the management is to shift the user distribution to the desired one determined by the owner, in other words, to achieve π(k) → π*. To this end, an MPC law is designed. In the following, we first address the design of the MPC-based management system, and then we verify its effectiveness through numerical experiments.

Incentive- and nudge-based management
The control law in MPC determines the optimal incentive p and nudge q to achieve the desired user distribution π*. Letting C = [I_n 0_n] ∈ R_+^{n×(n+1)}, we define the control input and the controlled output as u = [p^T q^T]^T and z = Cx = π, respectively. Then, we define the cost function as in (10), where z* = π* is the desired user distribution and N is the prediction horizon. The cost function (10) evaluates the error between the user distribution z = π and the desired one z*. By minimizing (10), we aim at improving the tracking performance. Controller K is given by the following optimization problem, Problem 3.1, which is stated with shorthand notation for simplicity and in which p_min ∈ R_+^n and p_max ∈ R_+^n are constants.
The control horizon in MPC is 1: letting the minimizer of Problem 3.1 be denoted by {u*(k), u*(k + 1), …, u*(k + N − 1)}, controller K applies only u*(k) to plant P, which is the human group, as in the standard MPC setup. Due to the nonlinear equality constraint (11a), Problem 3.1 is a nonlinear optimization problem in the decision variables {u(k)} and {x(k)}, and it can be computationally demanding. In the numerical experiment given in the next subsection, we solve the problem by using the MATLAB function "fmincon" and show that it is solvable in a realistic computational time.
Note that the control law given in Problem 3.1 aims at improving the tracking performance on π(k). In addition to the tracking performance, one can improve the control efficiency by reducing the budget required of the system owner, which is estimated by the total incentive paid to users and is modelled by Σ_{τ=k}^{k+N−1} p(τ)^T z(τ + 1). Although the details are omitted in this paper, one can extend the control law by adding this term to the cost function in (10) and/or imposing an upper bound on it.
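The budget term above is a sum of inner products over the prediction horizon and can be evaluated directly from predicted sequences. The following is a minimal sketch; the function name, the list-based data layout and the sample numbers are illustrative assumptions.

```python
def predicted_budget(p_seq, z_seq):
    """Budget estimate: sum over the horizon of p(tau)' z(tau + 1).

    p_seq[j] holds the incentive vector p(k + j) and z_seq[j] holds the
    predicted distribution z(k + j + 1), for j = 0, ..., N - 1.
    """
    if len(p_seq) != len(z_seq):
        raise ValueError("p_seq and z_seq must cover the same horizon")
    return sum(
        sum(p_i * z_i for p_i, z_i in zip(p, z))
        for p, z in zip(p_seq, z_seq)
    )


if __name__ == "__main__":
    # Hypothetical two-step horizon with two choices.
    p_seq = [[1.0, 2.0], [3.0, 4.0]]
    z_seq = [[0.5, 0.5], [0.25, 0.75]]
    print(predicted_budget(p_seq, z_seq))
```

Such a function could be added to the tracking cost with a weighting factor, or compared against an upper bound inside the optimization; how the two objectives are balanced is a design choice that the text leaves open.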

Numerical experiment
In this subsection, we perform numerical experiments to verify the performance of the presented management system, given in Problem 3.1.First, we study the tracking performance achieved by the incentive-based management.Next, we apply informational nudge in addition to the incentive to show the performance improvement.
We consider n = 5, i.e. there are five choices in total. Let Q = diag(2, 2.1, 3, 2.5, 2.2), R = [−1.7 −1.8 −2.6 −2.2 −1.9]^T, τ_π = 0.005 and τ_λ = 0.1 in model (7). Furthermore, we let α_i = −1, together with the associated β_i, for i ∈ {1, 2, …, n}, implying that the target human group behaves with anticonformity. In the model, when (p, q) = (0, 0), the user distribution converges to π = [0.097 0.133 0.353 0.250 0.167]^T, which represents the tendency of the group without any external action. The aim of the numerical experiments is to dynamically manipulate the incentive and nudge, denoted by p and q, to shift the user distribution from this initial state to the desired one, π*.
Figure 4(a,b) shows the response of the user distribution under incentive-based management (Case 1) and incentive- and nudge-based management (Case 2), respectively. We see that the user distribution converges to the desired one, π*, in both cases. We evaluate the tracking performance based on the performance index F_1. The value of F_1 is 0.18 for Case 1 and 0.16 for Case 2, which shows that the supplemental nudge improves the tracking performance. Figures 5(a,b) and 6 show the changes in the incentives and nudges in the management system for Cases 1 and 2. We evaluate the incentive by F_2 = Σ_{τ=1}^{15} p(τ)^T p(τ). The value of F_2 is 30.9 for Case 1 and 29.7 for Case 2, implying that informational nudging reduces the incentive. Since a large fluctuation in the incentive can deteriorate users' trust in the management system, nudging contributes to improving the trust while reducing the financial burden.
In Figure 6, q_1 > 0.5 and q_3 < 0.5 generally hold during the transient period, e.g. k ∈ [0, 0.4]. Recall from Figure 4 that choice 3 is popular compared with choice 1. We therefore see that nudging of users who tend to pick a popular choice should be suppressed, while nudging of users who pick an unpopular one should be promoted.

Conclusion
In this paper, we proposed a dynamic model of the decision making of a human group. The novelty of the presented model lies in interpreting the informational nudge in a different manner from [8][9][10]: assuming that the proportion of users who access the state information is manipulable, the proportion- and incentive-driven human behavioural change is modelled by a state equation. The model was utilized for the optimal management system design based on model predictive control (MPC). The designed management system was verified in two numerical experiments: incentive-based management, and combined incentive- and nudge-based management. We demonstrated that the nudge improved the reliability of the management system and simultaneously reduced the incentive. The numerical experiment also showed that nudging of users who tend to pick a popular choice should be suppressed, while nudging of users who pick an unpopular one should be promoted.
There are two main issues to be addressed in the future. The first issue is that the physical meaning of the Lagrange multiplier λ appearing in (7) is unclear. In this paper, it is assumed that x = [π^T λ]^T is measurable, and the MPC-based management system is designed under this assumption. Future work includes a better understanding of λ in real-world settings, which can be used for the efficient design of a state observer that estimates x from the measured π and the control actions p and q.
The second issue is the estimation of the parameters Q, R, τ_π and τ_λ, which characterize the behaviour of the target human group. In particular, we need to develop a data-driven parameter estimation method and to apply it to modelling real-world applications.

Figure 1. Sketch of change in user distribution by using incentive and nudge given by an owner.

Figure 2. Block diagram of a human-in-the-loop system.

Figure 3. Signals of the Gibbs entropy and its approximation by a quadratic function.