Learning in two-dimensional beauty contest games: Theory and experimental evidence

We extend the beauty contest game to two dimensions: each player chooses two numbers to be as close as possible to certain target values, which are linear functions of the averages of the two number choices. One of the targets depends on the averages of both numbers, making the choices interrelated. We report on an experiment in which we vary the eigenvalues of the associated two-dimensional linear system and find that subjects can learn the Pareto-optimal Nash equilibrium of the system if both eigenvalues are stable, and cannot learn it if both eigenvalues are unstable. Interestingly, subjects can also learn it if the system has the saddle-path property (one stable and one unstable eigenvalue), but only if the unstable eigenvalue is negative. We show theoretically that our results cannot be explained by homogeneous level-k models, in which all agents apply the same depth k of reasoning to their choices, including the naïve learning model. However, our results can be explained by a mixed cognitive-levels model, including the adaptive learning model. We also run a horse race among many models used in the literature; the winner is a simple mixed model with levels 0, 1, and equilibrium reasoning.


Introduction
The process by which individuals learn an equilibrium has been the subject of a large theoretical and experimental literature (see Sargent, 1993; Fudenberg and Levine, 1998; Evans and Honkapohja, 2001; Camerer, 2003; Hommes, 2013). In this paper, we provide new experimental evidence on learning behavior in coupled, linear multivariate settings that are of interest to both game theorists and macroeconomists. The equilibrium and dynamic properties of such settings are often expressed in terms of the eigenvalues of the corresponding system. To fix ideas, consider a two-variable coupled linear (or linearized) economic model of the form

z_t = M z̄_t + d,    (1)

where z̄_t = (ā_t, b̄_t)′ collects the population averages of the expected time-t values of the state variables a_t and b_t, and where the 2 × 2 matrix M and the 2 × 1 vector d are exogenously given. Let I denote the identity matrix and assume that I − M is invertible. Let z_E = (I − M)^{−1} d and observe that this is the unique point where z_t = z̄_t. A number of economic settings can be mapped into this basic framework; Appendix A provides two examples, one from industrial organization and one from macroeconomics. [1] In the oligopoly market example, firms produce differentiated products for two interrelated markets and compete in prices. Eq. (1) describes, for each firm, the price-setting strategy in the game; the best response depends on the average prices in the two markets, ā_t and b̄_t. Assuming common knowledge of rationality, z_E is the symmetric Nash equilibrium profile. The properties of the learning dynamics, when firms best respond to past average prices, depend on the eigenvalues of M. The one-dimensional analogue of this model is the so-called "beauty contest" (BC) game and has been studied extensively.
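To make the eigenvalue conditions concrete, the fixed point and stability of a system of form (1) can be computed in a few lines. The sketch below is purely illustrative: the off-diagonal entry of M and the vector d are hypothetical values, not parameters from the paper; the stability test is the standard criterion that an eigenvalue is stable when its modulus is below 1.

```python
import numpy as np

# Illustrative system z_t = M * zbar_t + d; M's off-diagonal entry and d
# are hypothetical, chosen so that the fixed point is (90, 20).
M = np.array([[2/3, 0.0],
              [0.3, -0.5]])
d = np.array([30.0, 3.0])

# Unique fixed point z_E = (I - M)^{-1} d, i.e., the point where z_t = zbar_t
z_E = np.linalg.solve(np.eye(2) - M, d)

# An eigenvalue is stable iff its modulus is less than 1
eigvals = np.linalg.eigvals(M)
print(z_E)                      # fixed point of the system
print(np.abs(eigvals) < 1)      # stability of each eigenvalue
```

For a triangular M like this one, the eigenvalues are simply the diagonal entries, which is why the treatments below can be described by their diagonal elements alone.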
It has been shown that both the magnitude and the sign of the slope (that is, the eigenvalue) of the one-dimensional map matter for whether subjects play the Nash equilibrium from the outset and for whether and how fast they converge toward it. [2] In the macro example from the monetary policy literature, the state variables are inflation and the output gap, and Eq. (1) describes how these variables depend on the expectations formed by a "representative agent" (in the macro literature, instead of z̄_t one would typically write z^e_t). Point z_E is then the unique rational expectations equilibrium (REE). As coordination on this equilibrium requires strong assumptions, the learning literature in macroeconomics assumes that the representative agent predicts z_t using past values of the state variables. Convergence to the REE depends on this learning process. When expectations are naïve, that is, when z^e_t = z_{t−1}, convergence depends on the eigenvalues of the matrix M. Motivated by these considerations, the main question we ask in this paper is whether the eigenvalues of the reduced-form, two-dimensional linear system (1) matter for the ability of human subjects to learn the equilibrium.

[1] Generally, there are many examples, especially in the macroeconomics literature, that use linearized multivariate dynamical systems. The eigenvalues of such systems are important for deciding whether the solution is determinate (i.e., locally unique) or not; see Blanchard and Kahn (1980). The examples include optimal growth models (e.g., Alvarez-Cuadrado et al., 2004) and monetary policy models (see Benhabib et al., 2001, where the steady states of the system may be of the sink, saddle, or source varieties, the concepts that we use as the basis for the treatments in this paper).

[2] The importance of the magnitude of the slope has been shown already in Nagel (1995) and Ho et al. (1998). Sutan and Willinger (2009)
We address this question by developing an extended, multi-dimensional version of the BC game. Subjects are asked to guess two numbers, a_t and b_t, instead of the usual one, in a coupled system of the type given in (1). We refer to our game as the "Two-Dimensional Beauty Contest," or 2DBC, game. In contrast to the univariate version, in our coupled multivariate system, guesses about the realization of one variable can matter for realizations of the other variable, and vice versa, which makes the task of coordinating on equilibrium behavior even more complex than in the standard one-dimensional case. It is difficult to know, a priori, how subjects will react to the additional complexity of the 2DBC. On the one hand, complexity might slow or prevent learning of the Nash equilibrium, as subjects struggle to understand how their guesses for one variable affect the target for the other. On the other hand, the greater complexity of the 2DBC might promote greater introspection; subjects might respond by thinking harder about the problem at hand, in the extreme perhaps even applying "fixed-point reasoning" to directly solve for the steady state. [3] Our experiment involves four treatments that vary the eigenvalues of the matrix M. To make the treatments as comparable as possible, we change only one or two elements of this matrix between treatments, and we also adjust the vector d so that all treatments have the same equilibrium z_E. To address the question of learning, we ask subjects to play the 2DBC game repeatedly for 15 periods, and we examine the dynamics of both variables, a_t and b_t, over time. Consistent with the experimental BC literature, there are no explicit intertemporal dynamic linkages from one period to the next in (1). In such settings, learning agents will likely condition their guesses on the past history of play.
For instance, if agents' guesses for each variable equal the previous-period values (as under best response dynamics or, equivalently, naïve expectations), then model (1) is effectively a first-order dynamical system governed by the matrix M with steady state z_E. We label our treatments with reference to such naïve best response dynamics, using the standard terminology of dynamical systems. [4] In treatment Sink, both eigenvalues of M are stable, so that, in theory, all trajectories converge to the steady state. In treatment Source, both eigenvalues are unstable, and so all trajectories diverge. In treatments SaddleNeg and SaddlePos, one eigenvalue is stable and the other is unstable. Theoretically, in both cases, almost all trajectories diverge. The difference between these treatments is in the sign of the unstable eigenvalue, which is negative in SaddleNeg and positive in SaddlePos.
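Under naïve best responses, the averages follow z̄_t = M z̄_{t−1} + d, so the treatment labels can be checked by direct simulation. In the sketch below, the coupling entry m21 is a hypothetical stand-in (the paper's exact parameters are in its Table 1), and d is backed out so that every treatment shares the steady state (90, 20); guesses are unbounded here, unlike the [0, 100] restriction used in the experiment.

```python
import numpy as np

z_E = np.array([90.0, 20.0])            # common steady state across treatments
m21 = 0.3                               # hypothetical coupling term
treatments = {
    "Sink":      (2/3, -1/2),           # both eigenvalues stable
    "SaddleNeg": (2/3, -3/2),           # unstable eigenvalue negative
    "SaddlePos": (2/3,  3/2),           # unstable eigenvalue positive
    "Source":    (3/2, -3/2),           # both eigenvalues unstable
}

results = {}
for name, (m11, m22) in treatments.items():
    M = np.array([[m11, 0.0], [m21, m22]])
    d = (np.eye(2) - M) @ z_E           # enforce the same steady state
    z = np.array([50.0, 50.0])          # start at the focal midpoint
    for _ in range(15):
        z = M @ z + d                   # one round of naive best responses
    results[name] = z
    print(name, np.round(z, 1))
```

Only Sink ends near (90, 20); the other three trajectories diverge, consistent with the theory for naïve dynamics. This also previews why homogeneous naïve learning cannot account for the convergence we observe in SaddleNeg.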
We find that subjects are able to learn the equilibrium steady state of the system in the Sink treatment. Remarkably, they are also able to learn the equilibrium steady state in the SaddleNeg treatment. By contrast, in the other two cases, SaddlePos and Source, subjects are unable to learn the interior steady-state equilibrium. Our findings thus suggest that steady states exhibiting the saddle-path property can be learned by agents who do not begin a process of social interaction with rational expectations. This result is important since saddle-path steady states are commonly used in macroeconomic analysis, [5] but the empirical relevance of such systems can be questioned because they require substantial computational and coordination efforts among participants to be achieved. At the same time, we find that steady-state convergence is not a general property of saddle-path stable solutions. We also ask whether a single learning model can explain the different behavior we observe across treatments. We generalize the level-k model of Stahl and Wilson (1994) and the cognitive hierarchy model of Camerer et al. (2004) to our multivariate, dynamic setting. We express the convergence properties of those models in terms of the matrix M and compare the predictions with our experimental data. We also use individual-level data to estimate and compare the fit of these and several other models, including the structural level-k model of Gill and Prowse (2016) and the EWA model of Camerer and Ho (1999).

[3] The evidence on the relationship between task complexity and task performance is mixed. The majority of studies surveyed by Liu and Li (2011) indicate a negative relationship, but several studies show a positive relationship, and there is even some evidence for an inverted U-shaped relationship between task complexity and task performance. Oprea (2020) reports that complexity costs (i.e., differences in willingness to pay to avoid more complex tasks) in individual decision-making experiments are heterogeneous across individuals but do decrease with experience.

[4] See, e.g., Azariadis (1993). Linear systems in discrete time, such as (1), have a unique steady state whose stability depends on the eigenvalues. An eigenvalue is stable if its absolute value is less than 1, and unstable otherwise. The dynamics are globally stable, that is, all trajectories converge, if and only if all eigenvalues are stable. If all eigenvalues are unstable, then all trajectories diverge (except for the steady state itself). If there are both stable and unstable eigenvalues, a saddle-path is any trajectory starting in the linear space spanned by the eigenvector(s) corresponding to the stable eigenvalue(s). The saddle-paths converge; all other trajectories diverge.
Our paper contributes to a large literature on the Keynesian Beauty Contest game, first explored experimentally by Nagel (1995); see Mauersberger and Nagel (2018) for a recent overview. The classic BC game can be viewed as a univariate version of (1). In that game, each player simultaneously guesses a number in the interval [0, 100], and the average of all guesses, ā_t, determines the payoff-relevant target value, a_t = p ā_t + d. When p ∈ (0, 1) and d = 0, the unique, dominance-solvable Nash equilibrium is for all players to guess 0. However, this outcome is rarely observed in the first round of play, t = 1. Instead, subjects tend to apply only a limited number of steps of iterated elimination of dominated strategies. [6] The experimental evidence suggests that there are sizeable fractions of level-0, 1, and 2 types, as well as subjects playing the Nash equilibrium. [7] In repeated play of the one-dimensional BC game, players use the history of outcomes and continue to apply level-k reasoning to prior-period winning numbers. The dynamics converge to the Nash equilibrium for 0 < p < 1, both in the classic BC game where d = 0 and in the game with an "interior" equilibrium, where d ≠ 0, as in Güth et al. (2002). The dynamics diverge for p > 1.
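As a simple numerical illustration of the level-k recursion in footnote 6 (not taken from the paper's data), the level-k guesses p^k × â in the classic 2/3-of-the-average game with reference point â = 50 march geometrically toward the equilibrium of 0:

```python
# Level-k guesses p^k * a_hat in the classic BC game (p = 2/3, d = 0, a_hat = 50)
p, a_hat = 2/3, 50.0
guesses = [a_hat * p**k for k in range(5)]
print([round(g, 1) for g in guesses])   # levels 0 through 4
# -> [50.0, 33.3, 22.2, 14.8, 9.9]
```

With p > 1 the same recursion explodes instead of contracting, mirroring the divergence noted above.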
Convergence to the REE has also been studied in the related literature on learning-to-forecast (LtF) experiments (Hommes et al., 2005; Hommes, 2011; Anufriev and Hommes, 2012) that seek to understand how human subjects form expectations in self-referential systems. Subjects forecast an endogenous variable that evolves, given the average forecast ā^e_t, as a_t = p ā^e_t + d. In contrast to the BC experiments, the LtF literature typically uses a limited-information setup, where subjects do not know the target equation (Mirdamadi and Petersen, 2018, and Kryvtsov and Petersen, 2021, are exceptions) and are given only qualitative information about whether higher average forecasts result in higher or lower target values. Recent LtF experiments, e.g., Pfajfar and Žakelj, 2016; Assenza et al., 2021; Mauersberger, 2021; Levelt et al., 2021, study expectations in coupled systems arising out of various versions of the New Keynesian model of monetary policy.

[5] The saddle-path environment is particularly attractive since the reaction of the economy to shocks to fundamentals can be uniquely determined only in this environment. In contrast, when the steady state has the sink property, there are infinitely many paths by which the system can adjust following a perturbation.

[6] "Level-0" players are not strategic and play randomly. Other players begin with an initial reference point â, the average choice of "level-0" players. "Level-1" players presume that all other participants are "level-0" and choose the best response, p × â, as their guess. "Level-2" types best respond to the "level-1" choice and guess p² × â. Generalizing, a "level-k" type guesses p^k × â.

[7] Penta (2015, 2021) endogenize the depth of reasoning using a cost-benefit approach. The cost is the cognitive effort needed to employ greater depths of reasoning, while the benefit is the value of additional reasoning as reflected in the payoff incentives of the game.
The aim of these studies is to understand whether certain monetary policy rules or types of central bank communication can move the system to a more desirable equilibrium or reduce the volatility of forecasts. By contrast, in this paper, we are not interested in a specific model (e.g., the New Keynesian model) or in the role of monetary policy; rather, we are interested in the more fundamental question of whether and how subjects can learn the equilibrium of coupled, multivariate systems that are commonplace in economic modeling. Toward that end, we provide our subjects with full information about the data-generating process and employ the contemporaneous forecasting setup used in the experimental BC literature. Our paper is also related to theoretical work exploring whether saddle-path stable solutions can be learned under adaptive learning dynamics, e.g., Evans and Honkapohja (2001) and Ellison and Pearlman (2011). This work shows that steady-state solutions with the saddle-path property can be learned if agents have perceived laws of motion (PLMs) that are correctly specified. [8] By contrast, in this paper, we do not endow subjects with such specifications, though they do know the data-generating equations of the system. We show that simple adaptive learning dynamics, initialized according to an uninformative prior belief that the means of both variables begin at the midpoint of the guessing interval, closely track the behavior of the subjects.

Experimental design
In each experimental session, a fixed group of N individuals participates in T repetitions of the 2DBC game. In each period, each player i submits a pair of numbers, (a_{i,t}, b_{i,t}), with each number restricted to lie in the interval [0, 100]. [9] Based on these "guesses", the group averages ā_t and b̄_t are computed. The linear, multivariate model (1) then determines the "target" values, that is,

a*_t = m11 ā_t + m12 b̄_t + d1,
b*_t = m21 ā_t + m22 b̄_t + d2,    (2)

where the 2 × 2 matrix M = (m_ij) and the 2 × 1 vector d = (d1, d2)′ depend on the treatment and are known to all participants. The payoff to participant i in each period is determined by payoff function (3), measured in points, which decreases with the distance between the submitted guesses and the two target values. Thus, participants are motivated to submit guesses as close as possible to the two target values. Guessing both targets exactly would bring the maximum reward of 100 points. Deviations from the two target values decrease each participant's payoff by an equal amount. [10]

[8] For instance, if agents use a minimal state variable representation for the PLM, i.e., posit that the two target variables are constants so that E_t(a_t) = a and E_t(b_t) = b, then, as Evans and Honkapohja (2001) have shown, the system (1) is E-stable if the eigenvalues of the matrix M − I are negative.

[9] The bounded interval helps to minimize the effects of extreme guesses (mistakes or outliers) and facilitates comparisons with the one-dimensional BC game. From our analysis, we expect that our convergence results would continue to hold without the bounded interval, though learning might take longer and be noisier.

[10] We use a distance payoff function as opposed to the tournament structure of the classic BC; distance payoffs do not change behavior in the BC, see, e.g., Güth et al. (2002). The LtF experiments typically use an inverse quadratic distance function truncated at 0 to avoid negative payoffs. Our hyperbolic payoff function, involving the inverse of the distance errors, penalizes even the smallest deviations from the targets, giving participants robust incentives to guess the targets precisely. Truncation of payoffs is also not necessary, as the function asymptotically approaches 0. Similar functions have been used in Adam (2007) and Assenza et al. (2021).

We study the behavior of experimental subjects in four treatments that differ in the matrix M and the vector d. The treatments and the corresponding parameters are presented in the first two columns of Table 1. Fig. 1 illustrates the dynamics in the treatments under the simple baseline assumption that all subjects play a best response to the previous period's target. In all treatments, we use a lower triangular matrix by setting m12 = 0 in (2). This makes our system coupled but, at the same time, simple enough that subjects could possibly solve for the fixed point. The key difference across our four treatments is the location of the eigenvalues of M, i.e., its diagonal elements (these are shown in the last column of Table 1). In the first three treatments, we set m11 = 2/3, so that the first eigenvalue is stable (see footnote 4). With this specific choice of parameter, the first (uncoupled) equation of our two-dimensional system is a version of the classic, 2/3-of-the-average BC game (albeit with an interior solution).
Depending on the second eigenvalue, m22, we then have three treatments. In the Sink treatment, m22 = −1/2, so both eigenvalues are stable. In this treatment, every trajectory in Fig. 1 converges to the steady state, indicated by the large dot, and so we predict that the steady state will eventually be learned by agents. By contrast, in the two treatments with the "saddle-path property", one eigenvalue is stable while the other is not. The steady state is then generically unstable; the system converges only if it starts exactly on the saddle-path, which is the "stable" eigenvector depicted by the solid line in Fig. 1. In the SaddleNeg treatment, m22 = −3/2, while in the SaddlePos treatment, m22 = 3/2. Note that the only difference between these treatments is the sign of the unstable eigenvalue; in particular, the eigenvalues in the two saddle treatments have the same absolute values. The negative eigenvalue introduces "negative feedback" to the b-number strategy in the SaddleNeg treatment: higher guesses imply a lower target. By contrast, when the eigenvalue is positive, as in the SaddlePos treatment, the b-number guess generates "positive feedback," with higher guesses implying a larger target. [11] For the Source treatment, we set m11 = 3/2 and m22 = −3/2. As both eigenvalues are unstable, we do not predict that agents will converge to the steady state in this case, except in the knife-edge case where they all start out there. This treatment can be viewed as the multivariate analogue of the univariate BC game with p > 1; see Nagel (1995) and Ho et al. (1998).

[11] Positive or negative feedback is typically a property of univariate systems. We use this terminology to characterize the feedback for b-guesses only. While we could characterize the SaddlePos system as one of pure positive feedback (as both eigenvalues are positive), the other three systems have mixed feedback properties, as one eigenvalue is negative while the other is positive.
Given our parametrizations, the matrix I − M is invertible in all four treatments. It can be directly checked that if every participant submits the guesses given by

(a_{i,t}, b_{i,t}) = z_E = (90, 20),    (4)

then both targets a* and b* coincide with the corresponding guesses. Since there are no profitable deviations by any individual subject, (4) defines a Nash equilibrium. [12] Depending on the treatment, the restriction on the guesses may lead to other, non-interior (boundary) Nash equilibria of the game. These equilibria are shown in the fourth column of Table 1; see Appendix B for formal proofs. In any boundary equilibrium, the target value for at least one variable lies outside the [0, 100] range. This implies (much) lower payoffs in the boundary equilibria than in the interior equilibrium, and so we refer to z_E as the Pareto-optimal Nash equilibrium (PONE).

[12] The restriction of guesses to [0, 100] motivated us to select the particular Nash equilibrium a_E = 90 and b_E = 20. We wanted an interior equilibrium that is also far enough from the focal point of the admissible guessing plane, (50, 50), to observe some learning. At the same time, we wanted simple numbers so that participants could easily solve for the equilibrium.
The experimental sessions were conducted in the Experimental Social Science Laboratory at the University of California, Irvine (UCI) using undergraduate student subjects. We conducted four sessions of each of our four treatments. In each session, a group of 10 subjects played the same 2DBC game for 15 consecutive periods. In each period, participants were incentivized via payoff function (3) to independently choose their two numbers to be as close as possible to the target values. As our focus is on the convergence properties of the dynamic environment, the same 10 subjects played all 15 periods together. [13] Every subject participated in one session only, and thus we report data from 4 × 4 × 10 = 160 subjects.
At the start of each session, the Instructions (see Appendix C) were read aloud, and the procedure by which the target numbers were determined in each period was carefully explained to subjects. We also projected the equations determining the two target values on a screen for all subjects to see. After the instructions were read, subjects had to successfully answer several control questions before they were able to move on to the main experiment. At the end of the experiment, subjects completed a brief survey. The experiment was computerized using the z-Tree software (Fischbacher, 2007). In every period, the upper part of the main decision screen reminded subjects how the target values a* and b* were determined based on their choices (i.e., system (2) was presented, though in a simple, non-matrix way). In the middle part of the screen, subjects entered their pair of numbers for the given period, one "a-number" and one "b-number". Subjects could also click on an icon to access an online calculator. See the screenshots in Appendix F1, Supplementary material. After all 10 subjects submitted their guesses, the computer program determined the target values for the period and the subjects' payoffs in points according to the payoff function (3). Period t ended with a second results screen reminding subjects of their submitted pair of numbers and informing them of the group average values for the two numbers, ā_t and b̄_t, the two target numbers, a*_t and b*_t, and the points that the subject had earned for the period. Except for the very first period, subjects could see a history of all previous outcomes (their chosen numbers, the averages for both numbers, the target values, and the points they earned) in the lower part of the main decision screen for each period. Subjects' total point earnings from all 15 periods of a session were converted into dollars at the fixed and known rate of 100 points = $1.
Thus, subjects could earn a maximum of $15 from their guesses. In addition, subjects were given a $7 show-up payment, for a total maximum of $22. Actual total earnings (including the show-up payment) varied with the treatment but averaged $12 across all four treatments for an approximately 75-minute experiment.

Results
In this section, we provide an overview of the main results from the experiment, illustrating the dynamics of average guesses and individual choices. Further findings, including an analysis of first-period choices, can be found in Appendix F, Supplementary material.

Dynamics of averages.
The four panels of Fig. 2 show, for a representative session (session 1), the evolution of the averages, ā and b̄, across all four treatments. The time evolution of ā (the thick red line with circles) and of b̄ (the thin blue line with squares) is graphed in relation to dashed lines indicating the PONE levels, a_E = 90 and b_E = 20. We observe convergence of both the a- and b-numbers to those levels in the Sink and SaddleNeg treatments. By contrast, in the SaddlePos and Source treatments, the a- and b-numbers approach payoff-dominated Nash equilibria instead; see Table 2. Overall, the dynamics in the four sessions of each of the four treatments are very similar to those of session 1; see Figs. F.6 to F.9 in Appendix F, Supplementary material.

[13] Rematching subjects each period would make past information less relevant and thus interfere with learning. This approach is standard in this literature, e.g., Nagel (1995).
The most remarkable finding is the contrast in dynamics between the two saddle treatments. Within the time frame of our experiment, participants quickly learn a steady state with the saddle-path property under negative feedback. By contrast, participants are not able to learn a steady state with the saddle-path property under positive feedback. Table 2 shows the average guesses for the a- and b-numbers in the first and last periods of each session of the experiment (as well as the treatment mean, standard deviation, and median). In parentheses, we also report the deviations of the average guesses from the PONE. From this table we can make the following observations.
First and most importantly, there is a consistent pattern in the dynamics of all four sessions of each treatment, comparable with the representative session 1 shown in Fig. 2. Specifically, in the first period, the average a- and b-guesses in all treatments are significantly different from the PONE. [14] We observe convergence to the PONE (90, 20) in the Sink and SaddleNeg treatments, and we do not observe convergence to this equilibrium in the SaddlePos and Source treatments. [15] In the SaddlePos treatment, the b-guesses end up even further away from the equilibrium than they were in the initial period. Similarly, the a-guesses in the Source treatment move further away from the equilibrium. Finally, the b-guesses in the Source treatment move only very slightly towards the PONE. Second, the heterogeneity in guesses (as measured by the standard deviations) is similar across all treatments for first-period choices and typically decreases to a very small number in the two converging cases. Third, the differences in behavior between treatments translate into differences in subjects' payoffs. The last three columns of Table 2 show the average payoffs per experimental session (in points) for periods 1, 15, and over all 15 periods. [16] The largest total payoffs are consistently achieved in the Sink and SaddleNeg treatments, where the dynamics converged. The payoffs in the SaddlePos and Source treatments are much lower. When we look at how payoffs changed during the experiment, we observe that the initial payoffs did not improve in the SaddlePos and Source treatments. In the two convergent treatments, the payoffs improved. Interestingly, the initial payoffs in the Sink treatment were almost twice as large as in the SaddleNeg treatment, indicating that the one-shot game was much easier in the Sink treatment than in the SaddleNeg treatment (as well as in all other treatments). However, the convergence observed in the SaddleNeg treatment was apparently even quicker, which allowed subjects to earn the largest total payoff across all four treatments.

Table 2. Statistics on average guesses and deviations (in parentheses) from the PONE (90, 20) for both the a- and b-numbers, and subjects' payoffs for each session of the experiment.

[14] For each treatment, using the 4 × 10 = 40 individual a-guesses and a two-sided t-test, we reject the null hypothesis that the mean guess is initially 90. We also reject the null hypothesis that the mean initial b-guess is 20 for each treatment. All p-values are less than 0.0025.

[15] Using the four session-level observations (as these are independent) for the Sink and SaddleNeg treatments, we do not reject the null hypothesis that the average a- and b-guesses in period 15 equal 90 and 20, respectively. However, we reject this hypothesis for the a- or b-guesses of the SaddlePos and Source treatments. We use two-sided t-tests. The p-values for the Sink, SaddleNeg, SaddlePos, and Source treatments are 0.076, 0.488, 0.034, and 0.0001, respectively, for the a-numbers, and 0.106, 0.457, 0.001, and 0.0001, respectively, for the b-numbers.

Individual choices and levels.

We illustrate the dynamics of individual guesses in two ways. First, the eight panels of Fig. 3 show, for all four treatments, the cumulative frequencies of individual choices in periods t = 1 (magenta thick line), t = 5 (blue dotted line), t = 10 (green thin line), and t = 15 (black dashed line) for both guesses: the a-number (top panels) and the b-number (bottom panels). The PONE values are indicated by vertical (red) lines. Notice that the variance of individual choices decreases over time, with the greatest reduction occurring during the first 5 periods. In the Sink and SaddleNeg treatments, individual choices converge to the steady state and stay relatively close to each other. In the SaddlePos and Source treatments, subjects' choices are more dispersed even in the last period of the experiment.
Second, we classify subjects' play using level-k reasoning extended to the dynamic environment. For each session s, we define the level-0 choices for round t > 1 as the averages of the previous round, z^s_t(0) := z̄^s_{t−1}. The higher-level choices are best responses to the level k − 1 choices, where we assume that subjects use the same level in their best responses for both the a- and b-numbers. Recursively, the level-k choices are thus defined as z^s_t(k) = M z^s_t(k−1) + d. For every subject and period, we compute the distance between the subject's pair of guesses and each level's choices, and identify the k at which this distance is minimized among all levels 0 ≤ k ≤ 4 and k = E, where E denotes the equilibrium play z_E = (90, 20). We consider seven periods, from 2 to 8, and classify every subject as follows: [17]
• "Level-k" type (k = 0, 1, 2, 3, 4): level k was identified in at least 5 periods;
• "Equilibrium" type: k = E was identified in at least 5 periods;
• "Learning" type: none of the above, and the median k in periods 2-5 is smaller than the median k in periods 5-8 (E is excluded from the median calculations);
• "Mixing" type: none of the above, and level k = 0 or k = 1 was identified in at least 6 periods.
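The classification above can be sketched as follows. The coupling entry of M, and hence d, are hypothetical stand-ins for one treatment, and the distance measure (the sum of absolute deviations over the two numbers) is an assumption for illustration:

```python
import numpy as np

# Hypothetical treatment matrix (m21 illustrative) with equilibrium (90, 20)
M = np.array([[2/3, 0.0], [0.3, -0.5]])
d = (np.eye(2) - M) @ np.array([90.0, 20.0])

def level_k_predictions(z_prev_avg, k_max=4):
    """Level-0 is last round's average; level-k best responds to level k-1."""
    preds = [np.asarray(z_prev_avg, dtype=float)]
    for _ in range(k_max):
        preds.append(M @ preds[-1] + d)
    return preds

def classify(guess, z_prev_avg):
    """Return the level (0..4 or 'E') whose prediction is closest to the guess."""
    preds = level_k_predictions(z_prev_avg)
    labels = list(range(len(preds))) + ["E"]
    preds.append(np.array([90.0, 20.0]))      # equilibrium play
    dists = [np.sum(np.abs(guess - p)) for p in preds]
    return labels[int(np.argmin(dists))]

# A guess that repeats last round's average is classified as level-0
print(classify(np.array([50.0, 50.0]), np.array([50.0, 50.0])))  # -> 0
```

Running this period by period and tallying the identified levels over periods 2-8 yields the type assignments described in the bullet list.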
The distributions of types for each treatment are shown in Fig. 4. Remarkably, these distributions are similar across treatments. Almost half of the participants (20 out of 40 in the Sink and SaddlePos treatments, 17 in the SaddleNeg, and 14 in the Source) are classified as consistent level-0 or level-1 types, or as "mixing" types who go back and forth between these levels. About one quarter of all participants (between 10 in the SaddlePos and 13 in the Sink) are classified as "learning" types. With the exception of the SaddleNeg treatment, there are no "equilibrium" types.
We summarize our findings as follows:
1. First-period choices are heterogeneous with concentrations at the mid-point of the guessing interval, 50. In subsequent periods, heterogeneity is reduced, with the majority of subjects choosing numbers close to level-0 (the average choice of the previous period) or level-1. Some subjects increase their level over time.
2. The dynamics converge to the PONE only in the Sink and SaddleNeg treatments. Convergence is fastest in the SaddleNeg treatment.
3. The a-number converges to its PONE level of 90 almost monotonically in the Sink, SaddleNeg and SaddlePos treatments.
4. The b-number converges to its PONE level of 20 in the Sink and SaddleNeg treatments, with a pronounced oscillatory path in the latter treatment.
5. The b-number does not converge to its PONE level in the SaddlePos and Source treatments.

17 See Table 7 in Appendix D for individual levels during the first eight periods and the resulting classification. We look only at periods 2-8 because after convergence (or divergence) all the levels are close to each other, and we do not use period 1 in the classification, as behavior then is not dynamic.

Behavioral models
In this section, we analyze the dynamic properties of various learning models. 18 Our focus is on whether such models are consistent with the features of the data found in Section 3. Understanding the properties of the models is important, as it allows us to generalize beyond the specific parametrizations used in the experiment.

Homogeneous level-k learning model
In Section 3, we used a simple classification of the participants based on the idea of level-k thinking. Applying that same mechanism, we now define a homogeneous level-k learning model. Following Nagel (1995), we define level-0 players in period t > 1 as guessing the average of the a and b-numbers from the previous period. 19 Using vector notation and denoting the level-0 players' choices as z t (0), we have

z t (0) = z̄ t−1 . (6)

Agents who follow the level-0 choice in each time period can be labeled as "stubborn", since their guesses do not change throughout the experiment. In a homogeneous level-0 model, all agents are level-0 and so z̄ t = z̄ t−1 = · · · = z̄ 1 . Thus, given the first period average guesses for both numbers, under the homogeneous level-0 model all subsequent guesses would remain the same in all remaining periods of the experiment. 20 Obviously, such a homogeneous level-0 model generates trajectories that are very different from those observed in our experiment.
Next, we define the level-1 choice at time t as a best response to the level-0 choice at time t. It follows that level-1 players submit guesses that are equal to the previous period's targets, that is,

z t (1) = z* t−1 = M z̄ t−1 + d. (7)

The homogeneous level-1 model assumes that all agents make level-1 choices in each period.
Using (2), we write this model as a two-dimensional dynamical system,

z̄ t = M z̄ t−1 + d. (8)

18 Thus we study theoretical trajectories of the models. These trajectories depend only on the model's parameters and initial conditions. In every subsequent period, the trajectories are conditioned on the model's realization. In contrast, in the econometric analysis of Section 5, the models are conditioned, at each time step, on actual experimental data.

19 One may argue that our level-0 is too simplistic. We have chosen this definition over possible alternatives based on our experimental data as well as for the sake of exposition. Recall that the participants had access to the past averages of the a and b numbers, as well as to the past target numbers. The classification in Section 3 suggests that both pieces of information were used. The definition of level-0 as guessing the past averages allows us to define level-1 players as those whose guesses equal the past target values, following the idea that higher-level agents best respond to lower-level agents.

20 The average of the first period numbers plays a role in the initial conditions of the learning models. Motivated by our analysis of the first period choices (see Appendix F2.2, Supplementary material), we can further take (50, 50) as the first round level-0 choice. In the standard BC game experiment, Burchardi and Penczynski (2014) elicit level-0 beliefs of participants and find that 50 (the mid-point of the admissible range) is the modal level-0 belief.

Table 3 Properties of the homogeneous level-1 model for four experimental treatments. We report the eigensystem (two eigenvalues and the corresponding eigenvectors), the attractor of the model when guesses are truncated as in (10), and the type of dynamics (monotone or oscillatory). The last two columns verify whether the dynamics match the experimental data (✔) or not (✘).

[Table 3 columns: Treatment; Eigensystem; Attractor and dynamics type; Data. † depending on the initial conditions. ‡ assuming that the a-number dynamics converges to 0.]
In the interpretation of the experiment using expectations, this means that agents, in each period, expect the realized targets from the previous period. This homogeneous level-1 model is equivalent to the well-known naïve expectations model.
Proceeding further, we define for k > 1 the level-k choices for time t as guesses that best respond to the choices of the previous level, k − 1, that is,

z t (k) = M z t (k − 1) + d. (9)

In the homogeneous level-k model, all agents make level-k choices at each period.

What do the dynamics of these homogeneous learning models look like? We start with the case of k = 1, as described by system (8). This system has a unique steady state corresponding to the PONE, z E . The system is linear and its dynamic properties can be understood from the eigensystem of matrix M, see Table 3. Since for all four treatments matrix M is triangular, its eigenvalues, μ 1 and μ 2 , coincide with the diagonal elements. In fact, as explained in Section 2, we designed and named our treatments with reference to the best response dynamics, which is exactly the homogeneous level-1 model (7). In particular, the phase diagrams of Fig. 1 correspond to the dynamics (8) in our four treatments. 21 We observe that in the SaddleNeg treatment, almost all trajectories diverge, in contrast to the experimental results.
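The best-response dynamics of system (8), with guesses truncated to [0, 100], can be simulated in a few lines. The sketch below is illustrative only: the particular M and d are hypothetical stable (Sink-like) values chosen so that the fixed point is the PONE (90, 20), not the experimental parameters. For a triangular M, stability can be read off the diagonal.

```python
import numpy as np

def simulate_level1(M, d, z0, T=15, lo=0.0, hi=100.0):
    """Naive best-response dynamics z_t = M z_{t-1} + d, truncated to [lo, hi]."""
    path = [np.clip(np.asarray(z0, float), lo, hi)]
    for _ in range(T - 1):
        path.append(np.clip(M @ path[-1] + d, lo, hi))
    return np.array(path)

# Hypothetical stable treatment: M is lower triangular, so its eigenvalues
# are the diagonal entries 0.5 and -0.5 (both inside the unit circle).
M = np.array([[0.5, 0.0],
              [0.3, -0.5]])
d = np.array([45.0, 3.0])
z_eq = np.linalg.solve(np.eye(2) - M, d)     # fixed point (90, 20)
path = simulate_level1(M, d, z0=[50.0, 50.0], T=50)
```

With both eigenvalues inside the unit circle the truncated path settles at the fixed point; flipping the sign or magnitude of a diagonal entry reproduces the saddle and source cases.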
The level-k guesses in (9) can be written, in deviations from the equilibrium, as z t (k) − z E = M (z t (k − 1) − z E ). Applying this iteratively, we find that the dynamics of the homogeneous level-k model are given by

z̄ t − z E = M k (z̄ t−1 − z E ).

Obviously, the dynamics have a unique steady state corresponding to the PONE and are governed by the linear system with matrix M k . It follows that the homogeneous level-k model has the same conditions for convergence, in terms of the eigenvalues, as the homogeneous level-1 model. 22 Therefore, the discrepancy between the homogeneous level-k model and the experimental data in the SaddleNeg treatment will persist for any value of k.

Fig. 5, column 1, illustrates the dynamics of the homogeneous level-1 (naïve expectations) model and compares these with the experimental data in column 2. The panels in column 2 show the trajectories for average guesses in a representative experimental session (session 1) of each of the four treatments (from the top to the bottom row). The panels in column 1 show the simulated trajectories of the level-1 model in each treatment. The simulated trajectories start at the point of the first period averages of the a and b-numbers from the experimental session, but otherwise they are not informed by the data. To take into account the restriction of our experiment that guesses must be in the interval [0, 100], we truncate the choices in (7) as

z t (1) = min{max{z* t−1 , 0}, 100}, applied coordinate-wise, (10)

and simulate model (8) with these averages. Several observations can be made from a comparison of the simulations of the level-1 model and the experimental data. First, the dynamic paths of the a- and b-numbers in the simulations and in the data look very similar to one another in the Sink treatment (top row of Fig. 5). They both converge, and, remarkably, along the same direction, given by the eigenvector v 1 of the model. Second, the a-number converges both in the model and in the experiment to the PONE level of 90 in both saddle treatments (the two middle rows) and to 0 in the Source treatment (bottom row).
But the model does not match the data in the dynamics of the b-number in two treatments. In the SaddleNeg treatment, the second eigenvalue of the model is negative, causing the simulated path for the b-number to jump back and forth across v 1 . The dynamics in the experiment exhibit similar oscillations. However, in the experiment these oscillations converge to the PONE, while in the model they converge to a limiting 2-cycle between 0 and 50. In the Source treatment, the limiting b-number dynamics of the model oscillate between 0 and 95, whereas in the experiment the dynamics stay close to a boundary Nash equilibrium value of 38. Third, the model accurately characterizes the dynamics in the SaddlePos treatment, apart from small details such as the speed of divergence and the dynamics at the boundary.

Table 3 summarizes this discussion. The attractors of the simulated dynamics are the sets to which the simulated trajectories converge. We distinguish between two types of dynamics, monotone or oscillatory, depending on whether the eigenvalue is positive or negative. The last two columns indicate whether the dynamics of the simulated model agree with the experimental data.
As mentioned before, the dynamics of the homogeneous level-k model for larger values of k exhibit similar properties. Further, if k is equal to 2 or any other even number, all eigenvalues of the model are positive, which rules out the oscillatory behavior observed in several experimental treatments. We formalize this analysis as the following result.

Result 1. The homogeneous level-k learning model for any finite k cannot reproduce all the features of our experimental data. In particular, this model does not predict convergence to the PONE in the SaddleNeg treatment, and there are discrepancies in other treatments as well, including very different dynamics in the Source treatment.

Note that this result holds only if we restrict ourselves to the model with a homogeneous level of iterative thinking. Given the evidence for different levels of thinking in our experimental data, presented in Fig. 4, it is natural to relax the homogeneity assumption, which we do below.

Mixed levels 0-1 learning model
Let us assume that in each period t, a fraction 1 − λ of agents make the level-0 choice of the previous period average as in (6), and the remaining fraction λ of agents play the best response and guess the previous period targets as in (7). We call this model the mixed levels 0-1 learning model. The averages in this model evolve as

z̄ t = (1 − λ) z̄ t−1 + λ z* t−1 = (λM + (1 − λ)I) z̄ t−1 + λd, (11)

where we substituted the target values from (2) to get the last expression. Note that the dynamics (11) can be written as z̄ t = z̄ t−1 + λ(z* t−1 − z̄ t−1 ). When guesses are interpreted as expectations, this relation is identical to the model of adaptive expectations. 23 According to this model, a representative agent adapts their expectations in a constant proportion, λ, to the forecast error. This observation provides a novel and useful bridge between the level-k models of strategic thinking and the model of adaptive expectations.
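The equivalence between mixing levels 0 and 1 and adaptive expectations can be verified numerically. The parameter values in the check below are illustrative only (the matrix is a hypothetical lower-triangular example, not an experimental treatment).

```python
import numpy as np

def mixed01_step(zbar, M, d, lam):
    """One step of the mixed levels 0-1 model: a fraction (1 - lam) repeats the
    previous average zbar; a fraction lam best responds with the targets M zbar + d."""
    return (1 - lam) * zbar + lam * (M @ zbar + d)

def adaptive_step(zbar, M, d, lam):
    """Adaptive-expectations form: move toward last period's targets by fraction lam."""
    target = M @ zbar + d
    return zbar + lam * (target - zbar)
```

Both maps are algebraically identical, and the PONE is a fixed point of either one.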
The dynamics of the mixed levels 0-1 model (11) have a unique steady state, the PONE z E , as in the homogeneous level-k models. The linear dynamics are now governed by the matrix λM + (1 − λ)I. The eigenvalues of this matrix are easy to determine, as in all our treatments the matrix M is lower triangular. They are the weighted averages of 1 (with weight 1 − λ) and the eigenvalues of M (with weight λ). Furthermore, the matrix λM + (1 − λ)I has the same system of eigenvectors as the matrix M, see Appendix F3.1, Supplementary material. Table 4 summarizes the important convergence properties, with the last two columns comparing the model with the data. We find that to obtain convergence to the PONE in the SaddleNeg treatment, the fraction λ of level-1 agents in the learning model cannot be too large. Specifically, when λ < 4/5, the dynamics for the b-number will converge. The 'jumping' behavior along the b-dimension, visible in the phase diagrams and translating into the oscillatory convergence observed in the SaddleNeg treatment, is consistent with the model only when λ is not too small, i.e., for λ > 2/5. Also, the median numbers in the final experimental period (see Table 2) are the same as those predicted by the mixed levels 0-1 model for the diverging b-number in the SaddlePos treatment (100) and the diverging a-number in the Source treatment (0). The diverging b-number in the Source treatment is predicted to be 38 for this model, which is very close to the average value of 35 observed in the experiment. We summarize these findings as follows:

Result 2. The mixed levels 0-1 learning model with proportion of level-1 agents λ ∈ (2/5, 4/5) can reproduce all features of the experimental data.

23 The adaptive expectations model is quite popular in macroeconomics; see Nerlove (1958) and Hommes (1994) for theoretical treatment and Pfajfar and Žakelj (2016) and Bao and Duffy (2016) for recent experimental evidence.
Adaptive expectations are closely related to constant gain learning models that are increasingly used in contemporary macroeconomic modeling, see Section 3.3 in Evans and Honkapohja (2001). The naïve expectations model is a special case of adaptive expectations with λ = 1.

Table 4 Properties of the mixed levels 0-1 model. See the caption of Table 3 for details.
The mixed levels 0-1 model effectively averages the last-period information that was available to the participants in the experiment. In Appendix F3.3, Supplementary material, we discuss models that average the available information from all past periods.
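The window λ ∈ (2/5, 4/5) can be checked numerically. In the sketch below, the SaddleNeg diagonal entry m 22 = −3/2 is taken from the text (footnote 25); the other entries of M are illustrative placeholders.

```python
import numpy as np

def mixed01_eigs(M, lam):
    """Eigenvalues of lam*M + (1 - lam)*I: weighted averages of 1 and the
    eigenvalues of M, as in the mixed levels 0-1 model."""
    return np.sort(np.linalg.eigvals(lam * M + (1 - lam) * np.eye(len(M))))

# m22 = -3/2 as in the SaddleNeg treatment; m11 and the off-diagonal entry
# are illustrative only.
M = np.array([[0.5, 0.0],
              [0.3, -1.5]])

def b_eig(lam):
    """Eigenvalue governing the b-number: lam*(-3/2) + (1 - lam) = 1 - (5/2)*lam."""
    return lam * (-1.5) + (1 - lam)
```

Convergence of the b-number requires b_eig(λ) > −1, i.e. λ < 4/5, and oscillation requires b_eig(λ) < 0, i.e. λ > 2/5, matching Result 2.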

Higher order mixed level-k models
Generalizing the mixed levels 0-1 model, we assume that there is an exogenous distribution, F = {f k } k≥0 ∪ f E , of agents with different levels of reasoning in the population. Level-k agents are present in proportion f k ∈ [0, 1). We also include a proportion f E ≥ 0 of an 'equilibrium type' who always submits the fixed point, z E , as their guess. 24 Level-0 agents use the previous period averages as in (6). Depending on the assumptions made about higher level agents, we consider two families of models, mixed level-k and cognitive hierarchy models.

Mixed level-k model. In this model, level-k agents always assume that all of the other agents are of the next lower level k − 1, and thus they best respond to level k − 1 agents, as in (9). The dynamics of the model, in deviations from the equilibrium, are given by

z̄ t − z E = ( ∑ k f k M k ) (z̄ t−1 − z E ). (12)

The dynamics of the general mixed level-k model have a unique steady state (the PONE) and are linear. The eigenvalues, μ 1 and μ 2 , are the combinations of the k-th powers of the eigenvalues of matrix M, weighted by the corresponding fractions, f k . This property enables us to reproduce the features of the experimental data in all four treatments, similarly to the mixed levels 0-1 model of Section 4.2, which is a special case with f 0 = 1 − λ and f 1 = λ. Whereas the mixed level-k model provides more flexibility in general, there are some restrictions on the distribution of levels needed in order to match our experimental data. Consider, for example, the oscillating convergence in the SaddleNeg experimental treatment. To reproduce it, we should have both μ 2 > −1 and μ 2 < 0 in the model. The former is ruled out when all levels k, and hence all powers of matrix M, are odd; the latter is ruled out when all levels k are even. 25 Therefore, as a minimal requirement, both even and odd levels need to be present in the mixed model. Also note that the presence of the equilibrium type helps to stabilize the model dynamics in all treatments. Tables F.3 and F.4 of Appendix F3.2, Supplementary material, present the properties of the mixed levels 1-2 and 0-1-E models, respectively.

24 We observed the presence of this type in small measure in the experiment in the classification in Section 3. The equilibrium guesses, z E , were also occasionally submitted by participants who were not classified as being an equilibrium type, see Table 7 in Appendix D.
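The parity argument can be verified directly: with m 22 = −3/2 as quoted for the SaddleNeg treatment, μ 2 = ∑ k f k m 22^k is below −3/2 for purely odd distributions, positive for purely even ones, and can land in (−1, 0) only when both parities are present. The level distributions below are hypothetical examples.

```python
def mu2(f, m22=-1.5):
    """Second eigenvalue of the mixed level-k model: sum over levels k of f_k * m22**k.
    f is a dict mapping level k to its population fraction f_k."""
    return sum(fk * m22 ** k for k, fk in f.items())

odd_only  = {1: 0.7, 3: 0.3}   # only odd levels: mu2 <= -3/2, too unstable
even_only = {0: 0.6, 2: 0.4}   # only even levels: mu2 > 0, no oscillation
both      = {0: 0.5, 1: 0.5}   # both parities: mu2 can fall in (-1, 0)
```

This mirrors footnote 25: only a mix of even and odd levels can deliver the oscillatory convergence seen in the SaddleNeg data.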
As we will see in Section 5, the 0-1-E model with a small fraction of equilibrium types describes our data best quantitatively. Fig. 5, column 3, illustrates the trajectories of the mixed levels 0-1-E model with the parameters estimated from the data (see the caption). The trajectories start at the point of the first period averages of experimental session 1, so that the dynamics can be compared with those from the homogeneous level-1 model (column 1) and the experiment (column 2). The 0-1-E model reproduces the main features of the experimental dynamics much better than the homogeneous level-1 model. Most importantly, the dynamics of the mixed levels model converge to the steady state in the SaddleNeg treatment, as in the experimental data, and exhibit divergent dynamics in the Source treatment that are similar to those observed in the experimental data (not just for session 1, but for all four sessions of these treatments). Truncation according to (10) implies that, for the diverging treatments, simulations result in more regular dynamics than in the experiment. Similarly, in the converging treatments, simulations are smoother and closer to the eigenvector than in the experiment. This is because, aside from the initial condition, the trajectories of the learning model do not use the experimental data, and thus cannot account for discrepancies between the model's predictions and the noisier experimental dynamics at every time step.
We summarize the discussion of the mixed level-k models as the following result.

Result 3.
The mixed level-k model reproduces all features of our experimental data for many, but not all, distributions of levels. Oscillating convergence in the SaddleNeg treatment is reproduced if and only if both even and odd levels of thinking have non-zero proportions. The equilibrium type stabilizes the dynamics; its fraction should be small enough to be consistent with divergence in the SaddlePos and Source treatments.
Cognitive hierarchy model. The second family of higher order mixed level-k models, introduced in Camerer et al. (2004), builds a "cognitive hierarchy" (CH), where agents of level k best respond not simply to agents of level k − 1, but rather to a distribution over all lower levels. The equilibrium type continues to submit z E , but all other types assume that all other agents' levels are lower than theirs, with the different level types present in the proportions given by the suitably adjusted distribution F. Thus, every level-k agent first builds a perceived distribution of the levels of the other agents in the population, {g ℓ (k)}, ℓ = 0, . . . , k − 1, with the fraction of level-ℓ agents (with ℓ = 0, 1, . . . , k − 1) given by

g ℓ (k) = f ℓ / (f 0 + · · · + f k−1 ).

Then, the agent best responds to the perceived behavior of the others. Therefore, the guess of the level-k type, in deviations from the equilibrium, is given by

z t (k) − z E = M ∑ ℓ=0 k−1 g ℓ (k) (z t (ℓ) − z E ).

This guess is weighted with the actual fraction, f k , to represent the effect of all level-k agents on the total dynamics. When these effects are summed over all levels of thinking in the population, we obtain the dynamics of the CH model.

25 When all levels are odd, μ 2 = f 1 m 22 + f 3 m 22^3 + . . . Since m 22 = −3/2 in the SaddleNeg treatment, μ 2 ≤ −3/2. When all levels are even, μ 2 = f 0 + f 2 m 22^2 + f 4 m 22^4 + . . . and so μ 2 > 0. In this discussion, we define the equilibrium type as even.
With only two levels, 0 and 1, the CH model is identical to the mixed levels 0-1 model. 26 The next result shows that, in general, the CH model is a mixture of several mixed levels 0-1 models with appropriate coefficients. (The proof is in Appendix E.)

Proposition 4.1. The dynamics of the CH model with distribution F = {f k } k=0 K ∪ f E are given by

z̄ t − z E = (1 − f E ) [λ K M + (1 − λ K )I] · · · [λ 1 M + (1 − λ 1 )I] (z̄ t−1 − z E ), (13)

with λ K = f K , and the other weights defined as follows: for any ℓ < K, λ ℓ = f ℓ / ∑ j=0 ℓ f j .
Proposition 4.1 shows that the CH model generates linear dynamics with the steady state z E . Apart from the stabilizing effect of having some equilibrium types, its convergence properties are determined by the interaction of the underlying levels 0-1 models. The coefficients (λ's) of these models are f K , the fraction of the highest level of thinking in the population, and then, for each lower level, the relative fraction of this level in the subset of the population consisting of this level and the levels below. Since matrix M is always triangular, all factors on the right side of (13) are triangular matrices. Hence, the eigenvalues of the dynamics of (13) are the products of the eigenvalues of those matrices and 1 − f E . Assume that f E = 0. It follows from Proposition 4.1 that the dynamics of the CH model will converge if all levels 0-1 models on the right side of (13) converge, and they will diverge if all of these models diverge. Combining this with Result 2, we obtain the following result.

Result 4. Consider the CH model with F = {f k } k=0 K and λ i defined as in Proposition 4.1. If λ i ∈ (2/5, 4/5) for all i, the CH model is consistent with all of our experimental findings (except for oscillations in the SaddleNeg treatment if K is even). If λ i ∉ (2/5, 4/5) for all i, the CH model is not consistent with the experimental data.

If some λ i 's are in the set (2/5, 4/5), for which the mixed levels 0-1 model is consistent with the data, and some λ i 's are outside of this set, the properties of the model in relation to the experimental data depend on the exact distribution F.
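The CH weight constructions can be sketched directly from their definitions; the level fractions below are hypothetical examples, and f is a plain list of fractions f 0 , . . . , f K .

```python
def perceived(f, k):
    """Perceived distribution of a level-k CH agent over levels 0..k-1:
    g_l^(k) = f_l / (f_0 + ... + f_{k-1})."""
    total = sum(f[:k])
    return [f[l] / total for l in range(k)]

def lam(f, l):
    """Weight of the level-l factor in Proposition 4.1 (for l < K):
    the relative fraction of level l among levels 0..l."""
    return f[l] / sum(f[: l + 1])
```

Each perceived distribution sums to one, and for two levels the weight lam reduces to the mixed levels 0-1 proportion f 1 /(f 0 + f 1 ).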

Estimation
In this section, we compare models in terms of their quantitative fit to the experimental data at the individual subject level. For a given session and period, each model m generates an individual predicted guess by conditioning on some, if any, of the experimental data available to the participant in that session and period (we omit the session-specific index, and θ denotes any model parameters). Each model thus generates a one-period-ahead prediction, z m i,t = (a m i,t , b m i,t ), for each individual i at time t in a given treatment and session. We compare the following models:
• the Pareto-Optimal Nash Equilibrium (PONE) prediction;
• the homogeneous level-k models (9) with 0 ≤ k ≤ 4;
• the mixed level models (12) with various combinations of levels;
• the cognitive hierarchy (CH) model (13).
For completeness, we also include the following popular models from the literature, whose detailed specifications are reported in Appendix F4, Supplementary material:
• the quantal response equilibrium (QRE) model of McKelvey and Palfrey (1995);
• the noisy introspection (NI) model of Goeree and Holt (2004);
• the experience-weighted attraction (EWA) model of Camerer and Ho (1999);
• the structural mixed level-k learning model of Gill and Prowse (2016).

26 Camerer et al. (2004) focus on the special case where F is a Poisson distribution, truncated at the maximum level of thinking in the population. The only parameter of this distribution, τ, characterizes the average cognitive level of the population. The Poisson CH model truncated to the levels 0 and 1 is equivalent to the mixed levels 0-1 model with the fraction of the level-1 type given by λ = τ/(1 + τ), establishing a one-to-one correspondence between the parameters of the two models.
Similarly to the PONE, the QRE and NI are not learning models, as they do not condition on data from past rounds. We include them as common benchmarks to see whether learning can improve the fit. The EWA model is a popular learning model, which explains behavior in many games well and was often estimated for one-dimensional BC games. Finally, the recent Gill and Prowse (2016) model is particularly interesting for us, because it structurally specifies individual behavior and explicitly allows for "learning types" who increase their levels over time.
For the models that contain parameters, we estimate them in two ways. For the mixed level-k, EWMA, and CH models, we do not specify the error distribution explicitly and estimate the parameters by minimizing the sum of squared errors (SSE) between the model predictions and the actual data from the experiment. 27 For the QRE, NI, Gill-Prowse, and EWA models, where the distributional assumptions are explicit, we use maximum likelihood estimation. Since estimating most models requires conditioning on the lagged data, we estimate all models with the dependent variable starting from period t = 2. We assume that choices are independent between individuals, conditional on the common past information. The SSE is thus defined as

SSE(θ) = ∑ i ∑ t=2 15 [ (a i,t − a m i,t )^2 + (b i,t − b m i,t )^2 ],

where the first sum is over all individuals in the experiment. The log-likelihoods for the ML estimation depend on the specific models and are given in Appendix F4, Supplementary material.
We seek universal parameters for all treatments and sessions to ensure the models are parsimonious, yet sufficiently general. For this reason, in estimating the model parameters θ, we minimize the SSEs or maximize the likelihoods over all treatments and sessions. Parameter estimates resulting from this estimation for different models are shown in Table 6, with associated standard errors reported in parentheses. 28 After estimating each model's parameters, we compare the performance of the models on the data. To facilitate the comparison between the models that make no explicit distributional assumptions and the more detailed models that assume specific distributions, and to account for overfitting, we use the out-of-sample one-step-ahead prediction root mean squared error (RMSE) criterion. To compute this RMSE, we use the so-called leave-one-out cross-validation procedure of Stone (1974). 29 The RMSE for treatment T is the square root of the average, over all sessions s ∈ S T , individuals i ∈ I s , and periods, of the squared prediction errors

(a i,t − a m i,t )^2 + (b i,t − b m i,t )^2 evaluated at θ|−s ,

where S T is the set of all sessions in treatment T , I s is the set of all individuals in session s, and θ|−s denotes the parameters, used to generate a m i,t , b m i,t , that are estimated from all available experimental data except for the data from session s. 30 In particular, we estimate the parameters with one of the sessions left out, and then compute the MSE of the estimated model on the data of the left-out session. After this procedure is performed four times, leaving out each session of the treatment, we average the resulting MSEs and compute the RMSE for the treatment. Consistent with the estimation, we evaluate RMSEs starting from period 2 for all models.
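The leave-one-out procedure can be sketched generically. The fit/predict interface below is an assumption for illustration, not the paper's estimation code; sessions are (X, y) pairs, fit maps training sessions to parameters, and predict maps parameters and covariates to point forecasts.

```python
import numpy as np

def loo_rmse(sessions, fit, predict):
    """Leave-one-out cross-validation over sessions: estimate parameters on all
    sessions but one, score squared one-step-ahead errors on the left-out
    session, then pool the squared errors and take the root of their mean."""
    sq_errs = []
    for s, (X, y) in enumerate(sessions):
        train = [sessions[j] for j in range(len(sessions)) if j != s]
        theta = fit(train)                         # parameters theta|-s
        sq_errs.append((predict(theta, X) - y) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errs))))
```

Because each session is scored with parameters estimated from the other sessions only, over-parametrized models are implicitly penalized, as noted in footnote 29.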
The out-of-sample RMSEs for all models are reported in Table 5. In the last column of the table, we report the RMSEs over all treatments, that is, the square root of the average MSE over all 16 sessions. 31 The RMSE of the best model (i.e., the smallest RMSE) for each treatment and overall is shown in boldface.
Comparing the RMSEs across all considered models, we find, unsurprisingly, that the PONE prediction is the least accurate. The QRE and NI models also do not perform very well. This suggests that learning is important in our experimental setting. Moving to the learning models, we observe that the homogeneous level-k models perform rather poorly, which is consistent with Result 1 in Section 4. Among these models, the homogeneous level-0 model performs better overall due to better performance in the non-convergent SaddlePos and Source treatments, whereas the level-1 model performs better in the convergent Sink and SaddleNeg treatments. The average models generally perform better than the homogeneous level-k models. In particular, the model with a 4-lag moving window and the EWMA model are the best among all average models.

28 We compute standard errors using a nonparametric bootstrap method (Hansen, 2019, Chapter 10.9). To account for time and session dependence, we perform bootstrap random sampling with replacement at the level of one session in each treatment. In particular, each new bootstrap sample is generated by drawing randomly with replacement four sessions for each treatment from the four actual sessions corresponding to that treatment. We compute parameter estimates for each of the bootstrap samples and then take their standard deviation as standard errors. We use 1,000 bootstrap samples, which is sufficient for computing standard errors up to two decimal places.

29 This procedure is "out-of-sample" because the data of the session for which the MSE is computed are not used to estimate the model parameter(s). The procedure has an implicit penalty for the number of parameters: over-parametrized models perform better in-sample, but worse out-of-sample because of over-fitting. AIC and BIC are popular model selection criteria that have explicit penalties for the number of parameters. In contrast to these criteria, leave-one-out cross-validation does not require any distributional assumptions. Under certain conditions, it is asymptotically equivalent to AIC (Stone, 1977).

30 For the models that make explicit distributional assumptions and generate a whole predictive distribution rather than a point forecast, that is, QRE, NI, Gill-Prowse, and EWA, we use the conditional means of these predictive distributions as the point predictions a m i,t and b m i,t .

31 We also computed the RMSEs of the applicable models for period 1 choices, see Appendix F4.5, Supplementary material.

Table 5 Performance of different models using individual-level data for periods 2-15. The models are compared in terms of their out-of-sample, one-step-ahead RMSE using the leave-one-out procedure. The smallest RMSE for each treatment and overall is shown in boldface. Parameter estimates of the models with parameters are reported in Table 6. The mixed levels 0-1 and CH-Poisson 0-1 models have the same RMSEs as they are equivalent, see Section 4.3.
All of the remaining models have an even better overall performance, except for the EWA model, which performs poorly in the two non-convergent treatments. 32 The EWA model does, however, outperform all models in the Sink treatment. The reinforcement learning model (EWA with δ = 0) performs worse than the unrestricted EWA model, which is to be expected in this dynamic learning environment. By contrast, the belief-based learning model (EWA with δ = 1) performs well and attains the smallest RMSE for the Source treatment. 33 We observe that, in agreement with Result 3 in Section 4, the class of mixed level models generally performs better than the other models considered in Table 5. The Poisson CH model class, which is a single-parameter, disciplined version of the mixed models, performs slightly worse than the mixed models. Note that the mixed 0-1 model and the CH-Poisson 0-1 model are equivalent. The mixed 0-1-E model has the lowest RMSE in all treatments except for Sink and has the lowest RMSE overall. 34 From Table 6, this model is a mixture of 41% level-0s, 52% level-1s and 7% equilibrium types. In terms of the parametrization in Table F.4 in Appendix F3.2, Supplementary material, for this model λ = 0.56. This model satisfies the necessary and sufficient condition (F.3) to reproduce the data features. We illustrate the simulated path of this model (not correcting for any discrepancies with experimental data during the simulations) in the column 3 panels of Fig. 5. Interestingly, the same mixed 0-1-E model shows the lowest RMSE for period 1 (Table F.7 in Appendix F4.5, Supplementary material), but in that case the mixture of levels 0, 1 and the equilibrium type is 68%, 18% and 14%, respectively. This indicates that there is some individual learning over the levels, as there is a decrease in level-0s and an increase in level-1s moving from period 1 to the remaining periods.
The fraction of the equilibrium type also decreases, which may indicate that, although some participants can compute the PONE, they soon realize that it is not a best response to the behavior of others.
By contrast with the mixed levels models, the Gill-Prowse model is a structural model that aims to model individual behavior consistently and specifies both fixed level types as well as learning types who increase their levels over time. Since the Gill-Prowse model imposes structural restrictions, its fit is not as good as that of the mixed 0-1-E model, but it is very close. The best-performing version of the Gill-Prowse model includes levels 0, 1 and 2 and the learning types between them. Adding the equilibrium type to the Gill-Prowse model provides an even better fit to the data. The model estimates that 40% of subjects are fixed level types up to level-2, with the highest fraction being level-1 types, a small fraction of equilibrium types, and 60% who are learning types. These results are generally consistent with the individual data classification, see Fig. 4.
32 Wilcox (2006) shows how the parameters of the EWA model may be biased due to individual heterogeneity. In models that pool across all individuals, any idiosyncratic errors in individual choices will make them correlated over time. Further, if such models use individual past choices to predict individual current choices, there may be an upward bias in the parameter estimates on the individual's past choices. Following Wilcox (2006), we checked an alternative specification of the EWA model by allowing individual heterogeneity in the parameter λ of the model. The other parameters remained within their standard error bounds, indicating no substantial bias in our case. Importantly, none of the other models that we estimate are subject to this critique because the individual choices in these models do not condition on the individual's own past choices but rather on the past average choices (or functions of those averages).
33 Since δ is set to 1 and the κ estimate is 0, this belief-based model corresponds to weighted fictitious play, which is related to the adaptive expectations model; see footnote 9 in Camerer et al. (2002). The mixed levels 0-1 model, which also yields a good performance, is closely related to the adaptive expectations model as well; see Section 4.2.
34 Note that many models with more levels than the mixed 0-1-E model, e.g., the mixed 0-1-2-3-4-E model, have estimated proportions of these additional levels that are close to 0, indicating that these levels are "redundant". That is why the fit of these models is close to that of the mixed 0-1-E model, but not as good, because of the implicit penalty for over-parametrization.
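The aggregate dynamics implied by a mixed 0-1-E model of the kind estimated above can be sketched numerically. The sketch below uses the map implied by the Sink-treatment target equations reproduced in Appendix C; the type shares are hypothetical placeholders, not our estimates.

```python
import numpy as np

# Sink-treatment map from Appendix C: A* = 30 + (2/3) abar,
# B* = 75 - (1/2) abar - (1/2) bbar, i.e., z -> M @ z + d.
M = np.array([[2/3, 0.0], [-0.5, -0.5]])
d = np.array([30.0, 75.0])
z_eq = np.linalg.solve(np.eye(2) - M, d)   # the PONE, (I - M)^{-1} d

# Hypothetical shares of level-0, level-1, and equilibrium types.
f0, f1, fE = 0.3, 0.5, 0.2

zbar = np.array([50.0, 50.0])              # initial average choice
for _ in range(15):                        # 15 periods, as in the experiment
    level0 = zbar                          # level-0: repeat the past average
    level1 = M @ zbar + d                  # level-1: best respond to it
    zbar = f0 * level0 + f1 * level1 + fE * z_eq
print(np.round(zbar, 2))                   # close to z_eq = (90, 20)
```

With any positive share of level-1 or equilibrium types, the averages are pulled toward the PONE in this stable system.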

Conclusion
We have designed and reported on an experiment examining learning behavior in linear, coupled, multivariate systems, which are widely used in game theory, microeconomics, macroeconomics and other fields. Our experiment makes use of and generalizes the well-known Beauty Contest game to two dimensions, resulting in a richer and more complex, coupled multivariate beauty contest game.
We conducted four experimental treatments with different eigenvalues for this coupled multivariate system. We found convergence to the unique PONE in the Sink treatment and divergence in the Source treatment. More surprisingly, we found that subjects are able to learn a steady state with the saddlepath property when the sign of the unstable eigenvalue is negative, as in our SaddleNeg treatment, but not when it is positive, as in our SaddlePos treatment. This is an important finding, since steady states with the saddlepath property are common in theoretical models used across a variety of different fields in economics. We find that, as in univariate models such as the original beauty contest game, negative feedback seems to play a role in disciplining convergence to equilibrium in coupled multivariate linear systems.
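The role of the eigenvalues in these convergence results can be illustrated by iterating the naive best-response dynamics z̄_t = M z̄_{t−1} + d. The matrices below are illustrative stand-ins chosen only for their eigenvalue moduli; they are not the treatment parameters.

```python
import numpy as np

def iterate(M, d, z0, steps=50):
    """Iterate the naive best-response dynamics zbar_t = M @ zbar_{t-1} + d."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = M @ z + d
    return z

# Illustrative matrices (chosen for eigenvalue moduli only, not treatment values).
sink   = np.array([[0.5, 0.1], [0.1, 0.4]])   # both eigenvalues inside the unit circle
source = np.array([[1.5, 0.2], [0.2, 1.4]])   # both eigenvalues outside the unit circle
d = np.array([10.0, 10.0])

z_eq = np.linalg.solve(np.eye(2) - sink, d)   # fixed point of the sink system
print(iterate(sink, d, [90, 90]))             # converges to z_eq
print(iterate(source, d, [90, 90]))           # explodes
```

Under naive best responses, convergence is governed entirely by whether the eigenvalue moduli of M lie inside the unit circle; the saddle cases, where the experimental results are subtler, have one modulus on each side.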
Finally, we have compared the performance of a variety of different learning models found in the literatures on learning in games and macroeconomics in terms of their convergence dynamics as well as the fit of these estimated models to individual level data. Indeed, we provide a kind of "Rosetta stone" linking, e.g., level-k models from behavioral game theory with the adaptive learning approach used in the econometric learning literature in macroeconomics. Among the many learning models we consider, we find that a heterogeneous-agent, mixed level-k learning model (with some measure of levels 0, 1 and the equilibrium type), outperforms all other considered learning models in terms of the out-of-sample root mean squared error of its predictions relative to individual level data and we provide some intuition for this result.

Appendix A. Motivating examples
In this appendix, we provide two motivating examples of economic models that map into the basic framework described in the Introduction.

A.1. Oligopoly market example
The first example concerns price expectations in two interrelated oligopoly markets, one for product group A and the other for product group B. Assume that N firms compete in each of these two markets on price, but produce differentiated products. For simplicity, we assume that the cost of producing the products is the same across all firms; the parameter c denotes the marginal cost of production, and q_i^A and q_i^B are the quantities produced by firm i for markets A and B, respectively.
The demand function for the product of firm i in market A is given by (A.1), where p_i^A is the price of the product and p̄^A = (1/N) Σ_j p_j^A is the average price charged by all firms in market A. The parameter γ_AA specifies the relationship between the products of the various firms in this market: if γ_AA > 0 the goods are substitutes, whereas if γ_AA < 0 they are complements. The linear demand in (A.1) can be derived assuming that consumers have a linear-quadratic utility function.
The demand function for the product of firm i in market B is given by (A.2), where p_i^B is the price of the product in market B and p̄^B is the average price charged by all firms in this market. The parameter γ_BB specifies the relationship between the products of the various firms in market B in the same way that γ_AA does for market A: if γ_BB > 0 the goods are substitutes, whereas if γ_BB < 0 they are complements.
When γ_AB ≠ 0 in (A.2), markets A and B become interconnected in the sense that the demand in market B depends on the average price in market A. 35 When γ_AB > 0, a higher average price of products sold in market A leads to a higher demand for products in market B, so the markets are substitutes, whereas for γ_AB < 0 the markets are complements. Such between-market connections can occur when products traded in market A are intermediate goods used in the production of goods traded in market B. Examples include the markets for oil and gasoline, or the markets for milk and cheese.
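A minimal sketch of such interconnected demands, under a hypothetical linear functional form consistent with the description of (A.1)-(A.2) (the intercept and coefficient values below are placeholders, not the paper's):

```python
# Hypothetical linear demands consistent with the description of (A.1)-(A.2);
# the intercept alpha and the gamma coefficients are placeholders.
def demand_A(p_i, pbar_A, alpha=100.0, gamma_AA=0.5):
    return alpha - p_i + gamma_AA * pbar_A

def demand_B(p_i, pbar_A, pbar_B, alpha=100.0, gamma_BB=0.5, gamma_AB=0.3):
    return alpha - p_i + gamma_BB * pbar_B + gamma_AB * pbar_A

# With gamma_AB > 0, a higher average price in market A raises demand in market B.
print(demand_B(50, 60, 50) - demand_B(50, 40, 50))   # about 0.3 * 20 = 6
```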
Firm i maximizes its total profit, given in (A.3). Assuming that the number of firms is sufficiently large, so that firms can ignore their own impact on average prices, the first-order conditions yield the optimal prices of firm i for markets A and B, denoted p_i^{A*} and p_i^{B*}, as in (A.4). There exists a Nash equilibrium in which all firms set the same prices in each of the markets, given by the elements of (I − M)^{−1} d. Regardless of whether firms are in this equilibrium, given the average prices in both markets, each firm maximizes its own profit by setting prices as close as possible to the target prices, p_i^{A*} and p_i^{B*}, as defined in (A.4). Indeed, from (A.3), the firm's profit can be written as an expression in which 'Const' denotes the terms that do not depend on p_i^A and p_i^B. Thus, our 2DBC game provides subjects with the same incentives as they have in the price-setting game, namely to be as close to each target as possible. Our payoff function (3) penalizes them for deviations from the targets in both markets in the same way, which corresponds to β_A = β_B.
35 Another way to connect the markets would be by modeling economies/diseconomies of scope in the cost functions, say, as in Bulow et al. (1985).
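The incentive structure described above can be checked numerically. The sketch below assumes a quadratic-in-deviations profit of the form described in the text, with placeholder targets and coefficients; it confirms that profit is maximized exactly at the target prices.

```python
import numpy as np

# Placeholder quadratic profit in deviations from the target prices,
# of the form described in the text (all numbers are hypothetical).
beta_A, beta_B, const = 1.0, 1.0, 100.0
pA_star, pB_star = 60.0, 40.0               # hypothetical target prices

def profit(pA, pB):
    return const - beta_A * (pA - pA_star)**2 - beta_B * (pB - pB_star)**2

# A coarse grid search confirms that profit peaks exactly at the targets.
grid = np.linspace(0, 100, 201)             # step 0.5, includes 60.0 and 40.0
best = max((profit(a, b), a, b) for a in grid for b in grid)
print(best)
```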

A.2. New Keynesian model example
For another motivating example of a 2D linearized system studied by macroeconomists, consider a New Keynesian (NK) model. The model consists of equations for inflation, π_t, the output gap, y_t, and a policy rule for the nominal interest rate, i_t. In these equations, π^e_{t+1} denotes expected inflation, y^e_{t+1} is the expected output gap, r is the exogenously fixed real rate of interest, π̄ is the central bank's inflation target, and κ > 0, ϕ > 0 and λ > 0 are parameters. The adaptive learning literature (see Eusepi and Preston, 2018 for a recent review) treats the expectations as subjective, and not necessarily rational. Branch and McGough (2009) and Kurz et al. (2013) show that when expectations are heterogeneous, the dynamics can be described by the above relations with average expectations, π̄^e_{t+1} and ȳ^e_{t+1}, replacing π^e_{t+1} and y^e_{t+1} on the right side. Such a system can be rewritten in the matrix form (A.5). Individuals in the NK model with heterogeneous expectations aim to minimize errors in their expectations about inflation and the output gap, π_t and y_t. These variables correspond to the 'target' numbers in our 2DBC, and they indeed depend, via (A.5), on the average guesses. Our 2DBC model then maps to the NK model with heterogeneous expectations, albeit with different timing. Indeed, whereas the NK model has forward-looking expectations, expectations in our version of the 2DBC game are contemporaneous. 36 Suppose we consider standard calibrations of the New Keynesian model where 0 < κ ≤ 1, 0 < ϕ ≤ 1 (see, e.g., Galí, 2015). Then, one can show that for values of the policy parameter 0 < λ < λ̄, the system is a saddle with a positive unstable root. For values of λ > λ̄, the system has two stable eigenvalues, −1 < μ_1 < 0 < μ_2 < 1, so that the system becomes a sink and the steady state is globally stable. More generally, we wish to abstract from a particular macroeconomic application and to consider a variety of different 2D linear systems, as characterized by whether the steady state is globally stable (a sink), saddle-path stable, or unstable with two explosive roots (a source). At the equilibrium z^E, the target values coincide with the individual guesses for both numbers, ensuring the maximum possible payoff of 100 points for each participant, according to (3). Any deviation from this profile will lead to a lower payoff for the deviating participant.
36 Some recent "learning to forecast" experiments use systems like (A.5) and ask participants to predict the next-period values of both variables. We do not pursue this approach here, as we want to stay close to the standard BC experiments. The simpler timing of our experiment leads to a cleaner theoretical analysis. Sonnemans and Tuinstra (2010) note that, for one-dimensional forecasting game experiments, the difference in the timing of forecasts (contemporaneous or one-step-ahead) is not crucial for convergence properties.
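The sink/saddle/source taxonomy can be checked mechanically from the eigenvalue moduli. As an illustration, the matrix below is the one implied by the Sink-treatment target equations reproduced in Appendix C:

```python
import numpy as np

def classify(M):
    """Classify the 2x2 system z_t = M @ zbar_{t-1} + d by eigenvalue moduli."""
    mods = np.abs(np.linalg.eigvals(M))
    if np.all(mods < 1):
        return "sink"
    if np.all(mods > 1):
        return "source"
    return "saddle"

# Matrix implied by the Sink-treatment targets in Appendix C:
# A* = 30 + (2/3) abar, B* = 75 - (1/2) abar - (1/2) bbar.
M_sink = np.array([[2/3, 0.0], [-0.5, -0.5]])
print(classify(M_sink), np.linalg.eigvals(M_sink))   # sink; eigenvalues 2/3 and -1/2
```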

Appendix B. Equilibria of the 2DBC game
We will now show that asymmetric equilibria are impossible in this game. Assume that there is a pair of different strategies in an equilibrium profile of a participant. Without loss of generality, we assume that the a-numbers differ. Denote the left-most and right-most among all a-numbers in the profile as a_L and a_R, respectively. Then for the average of all a-numbers, ā, we must have a_L < ā < a_R. Let us show that if the target a* ≤ ā, then the participant increases his own payoff by replacing a_R with a_R − ε, with a small enough ε > 0, and keeping the same b-number. 37 Indeed, when (a_R, b) changes to (a_R − ε, b), the average of the a-numbers decreases by ε/N, leading to a new target a* − εm_11/N. The distance between the new target and the new strategy is a_R − a* − ε(N − m_11)/N. The target for the b-number changes to b* − εm_21/N. Overall, there is a gain of ε(N − m_11)/N for the a-target and there might be a maximal loss of ε|m_21|/N for the b-target. In total, there is an increase in performance, as measured by the sum of absolute deviations of the a- and b-numbers, of at least ε(N − m_11 − |m_21|)/N, and this quantity is strictly positive in all our treatments, where N = 10.
Since only symmetric equilibria are possible, an argument similar to the one above shows that all participants submit the same a-number and the same b-number in any equilibrium. Thus, z^E is the unique NE in the interior of the strategy space. Boundary-value NE are possible only when participants submitting boundary guesses receive targets outside the admissible strategy space. It is then straightforward to check the following:
• There are no boundary equilibria for the a-number in our treatments, except for the Source treatment, where 0 and 100 are two such equilibria. In this treatment, when a = 0, strategy b = 38 completes the equilibrium profile, and when a = 100, strategy b = 18 completes the equilibrium profile.
• There are no boundary equilibria for the b-number in our treatments, except for the SaddlePos treatment, where 0 and 100 are two such equilibria. In this treatment, when b = 0 or b = 100, strategy a = 90 completes the equilibrium profile.
This completes the proofs of the statements of Section 2 as summarized in Table 1.
37 If a* ≥ ā, the payoff is increased by replacing (a_L, b) with (a_L + ε, b).
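As a numerical check of the uniqueness condition, consider the Sink treatment, whose targets (from the Appendix C instructions) are A* = 30 + (2/3)ā and B* = 75 − (1/2)ā − (1/2)b̄; the other treatment matrices are not reproduced in this excerpt, so only this one is checked here.

```python
import numpy as np

# Sink-treatment targets (Appendix C): A* = 30 + (2/3) abar,
# B* = 75 - (1/2) abar - (1/2) bbar, so m_11 = 2/3 and m_21 = -1/2.
N = 10
M = np.array([[2/3, 0.0], [-0.5, -0.5]])
d = np.array([30.0, 75.0])

gain_coefficient = N - M[0, 0] - abs(M[1, 0])
print(gain_coefficient)                 # about 8.83, strictly positive

# The unique interior Nash equilibrium z_E = (I - M)^{-1} d for this treatment.
z_E = np.linalg.solve(np.eye(2) - M, d)
print(z_E)                              # [90. 20.]
```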

Appendix C. Experimental instructions
The following instructions (for treatment Sink) were distributed to every participant and read aloud. The equations were explained and displayed on the screen during the whole experiment. Welcome to this experiment in economic decision-making. Please read these instructions carefully as they explain how you earn money from the decisions you make in today's experiment. There is no talking for the duration of this session. If you have a question at any time during the experiment, please raise your hand and your question will be answered in private. Kindly silence and put away all mobile devices.
General information: You are in a group of 10 participants including you. There are 15 successive time periods 1, 2, . . . , 15 in this experiment. The same participants will be in your group during this experiment in all 15 periods. In each period you have to choose two numbers, an "A-number" and a "B-number". Your choice for each number must be between 0 and 100 inclusive which means that 0 or 100 are also allowed. Every participant in your group also chooses a pair of numbers, an "A-number" and a "B-number", between 0 and 100 inclusive. After you and every other participant in your group have chosen a pair of numbers, two target values will be determined as explained below, one for the "A-numbers" and another for the "B-numbers". Your earnings from this experiment will depend on how close your chosen numbers are to the corresponding target values. The closer your numbers are to the target values, the greater will be your earnings.

Determination of the target values:
In each time period, you and all other participants choose one "A-number" and one "B-number". After all participants have chosen their numbers, the target values for the "A-numbers" and "B-numbers" are determined on the basis of the average values of all 10 "A-numbers" and all 10 "B-numbers" (including your own choices). The average of all "A-numbers" is computed as the sum of all ten "A-numbers" chosen by the participants in your group in this period, divided by 10. The average of all "B-numbers" is determined as the sum of all ten "B-numbers" chosen by participants in your group in this period, divided by 10. The corresponding target values, A* and B*, are computed as follows:
A* = 30 + (2/3) × Average of all "A-numbers"
B* = 75 − (1/2) × Average of all "A-numbers" − (1/2) × Average of all "B-numbers".
Note that the target value, A*, depends on the choices made by all participants in your group (including yourself) of "A-numbers", whereas the target value B* depends on the choices of all participants in your group (including yourself) of both "A-numbers" and "B-numbers".
Here is an example: For simplicity suppose that there are only 3 participants in a group and in some period they submit the following "A-numbers": All results are rounded to 2 decimals. Please, note that this example is for illustration purposes only. The actual group size in the experiment is 10 participants.
About your task: The experiment lasts for 15 periods. Each period your only task is to choose two numbers, an "A-number" and a "B-number". Your goal is to choose these numbers to be as close as possible (in absolute value) to the target values A* and B* which will be determined in each period, using the procedure explained above after all participants in your group have made their choices. Both numbers you choose should be between 0 and 100, inclusive. You may enter a real number with up to 2 decimals.
About your earnings: Your decisions will determine how many points you receive each period. Your earnings will be based on the sum of all your points over the 15 periods, with 1 point = 1 US cent. In addition, you are guaranteed to receive $7 as a show-up payment. Your points in each of the 15 periods are based on how close your "A-number" and "B-number" are to the target values A* and B* and are calculated (by the computer program) as follows:
Your payoff in points each period = 500 / (5 + |your "A-number" − A*| + |your "B-number" − B*|),
where | · | is an absolute value (deviation), e.g., |3 − 5| = 2, |5 − 1| = 4. Notice several things. First, if you submit the exact target values for A* and B*, you receive the maximum payoff of 100 points. Second, deviations from the target values A* or B* have an equal effect on your payoff; the further you are away from either target value, the lower is your payoff in points. Third, the payoff of all 10 participants in your group is determined in a similar way. Finally, all 10 participants (including you) can earn the maximum of 100 points if all choose the exact target values for A* and B*. For your convenience, we provide a table on page 4 showing how your payoff changes depending on the deviations of your A and B choices from the target values A* and B*.
Information and record keeping: At the end of each period, you will see a screen that reports the results of the just completed period. Specifically, you will be informed of:
• The "A-number" and "B-number" that you submitted for the period
• The average of all "A-numbers" and the average of all "B-numbers" submitted by group members for the period
• The computed target values A* and B* for the period
• Your points earned for the period
Please record this information on your record sheet for each period under the appropriate headings. When you are done recording this information, click on the OK button in the bottom right corner of your screen.
As long as the 15th period has not yet been played, we will move to the next period's decision screen. On that screen you will have to type the "A-number" and "B-number" for the current period. Additionally, you will see a history table displaying, for each prior period:
• your chosen "A-number" and your chosen "B-number"
• the averages of all "A-numbers" and all "B-numbers"
• the computed target values A* and B*
• your points earned

Points table
The table gives the number of points for a given discrepancy of your "A-number" from the target value A* (the first column) and a given discrepancy of your "B-number" from the target value B* (the first row). The figure below shows the relation between the number of points you score (vertical axis) and the combined discrepancy |your "A-number" − A*| + |your "B-number" − B*| of your chosen numbers from the target values (horizontal axis). Notice that the table presents only some possibilities for your point earnings (the table is not exhaustive) and that the number of points you earn decreases more slowly as your discrepancies from the two target values increase.

Additional information
• Before the experiment starts, you will have to take a short quiz which is designed to check your understanding of the instructions.
• At the end of the experiment, you will be asked to answer a questionnaire before you are paid. Your answers will be processed in anonymous form only. Please fill in the correct information.
• During the experiment, any communication with other participants, whether verbal or written, is forbidden. The use of phones, tablets or any other gadgets is not allowed. Violation of the rules can result in removal from the experiment.
• You may use the back side of your record sheet as scratch paper if you wish. Do not write your name on it; only write your ID number on the front side.
• Please follow the instructions carefully at all stages of the experiment. If you have any questions or encounter any problems during the experiment, please raise your hand and the experimenter will come to help you.
Please ask any question you have now!

Appendix D. Classification of participant choices
Table 7 shows the results of the classification of participants' choices during periods 2 to 8, as defined in Section 3. The first two columns report the treatment, session, and participant ID in the session. Next, for each period t from 1 to 8, we report the k that minimizes the distance d_t(k) in Eq. (5). The row 'Average' reports an average level k for the corresponding period for each treatment, excluding 'E' from the computation. The column 'Type' shows our type classification based on the data for periods 2-8. This classification is either an integer reflecting the level k, or a label 'E', 'L', or 'M' (for Equilibrium, Learning, and Mixing types, respectively). If no such type could be identified, then 'n/c' is reported. The frequencies found in this classification are shown, for each treatment, in Fig. 4.
In addition to the classification based on distance, we analyze the choices of every participant using a regression analysis. For each participant i, we estimate the following regression, assuming the same coefficients for the a- and b-choices:
a_{i,t+1} = β_{i,0} a_t^s(0) + β_{i,1} a_t^s(1) + (1 − β_{i,0} − β_{i,1}) a_t^s(2) + ε_{i,t,a},
b_{i,t+1} = β_{i,0} b_t^s(0) + β_{i,1} b_t^s(1) + (1 − β_{i,0} − β_{i,1}) b_t^s(2) + ε_{i,t,b}. (D.1)
Recall that a_t^s(k) and b_t^s(k) denote the level-k choices in period t ≥ 1 in session s, where individual i was present. The regression is estimated on the same periods 2-8 that we used to classify subjects; that is, in (D.1) we use t = 1, . . . , 7, leading to 14 data points for each individual. Regression (D.1) describes the individual guesses as a combination of level-0, level-1 and level-2 choices. The regression coefficients, β_{i,0} and β_{i,1}, are shown in the corresponding columns of Table 7. Coefficients that are significantly different from zero at the 5% significance level are reported in boldface. We then test, for each participant, whether their choices can be assigned to one of three levels, 0, 1 or 2, and we report the results in the last column of Table 7, 'Supported models'. We list there the restricted models, out of the general model (D.1), that cannot be rejected at the 5% significance level with an F-test. Integers '0', '1' and '2' denote the level-0, level-1 and level-2 models, that is, the model (D.1) with the restrictions β_{i,0} = 1 and β_{i,1} = 0 for '0'; β_{i,1} = 1 and β_{i,0} = 0 for '1'; and β_{i,0} = β_{i,1} = 0 for '2'. In addition, 'M' denotes the mixed level 0-1 model, that is, the model (D.1) restricted by β_{i,0} + β_{i,1} = 1. This model nests both the level-0 and level-1 models. Of course, a participant's behavior may be consistent with several of these models, as can also be seen from the last column of Table 7. Fig.
6 visualizes the results of this individual-level regression analysis in the simplex for our four treatments. This scatter plot shows the estimated coefficients (β_{i,0}, β_{i,1}) from regression (D.1), with one point corresponding to one participant. The shape of the marker reflects the treatment that each participant belongs to (see the legend for details). We observe that, for all treatments, there is substantial heterogeneity of participants in the estimated behavior. Another tendency that is common across treatments is that many choices are located along the edge of the simplex with zero weight on level 2 (i.e., far from the origin), showing that very few subjects have behavior consistent with level-2 reasoning. In the converging Sink and SaddleNeg treatments, more points are observed in the top-left corner of the simplex, corresponding either to level 1 or to the mixed level 0-1 model. By contrast, in the Source treatment, many choices are close to the bottom-right corner, with many level-0 choices. Overall, the regression-based analysis is consistent with the classification based on the distances to the level-k choices illustrated in Fig. 3. Many participants are consistently type 0 or 1, but some participants are also mixing between these types.
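The constrained regression (D.1) can be estimated by subtracting the level-2 regressor and running least squares. The sketch below applies this to synthetic data for one hypothetical participant; all numbers are placeholders, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 7                                   # estimation periods t = 1..7, as in the text

# Synthetic level-k reference choices for one session (placeholder values);
# columns are the level-0, level-1 and level-2 choices.
a_lvl = rng.uniform(0, 100, size=(T, 3))
b_lvl = rng.uniform(0, 100, size=(T, 3))

# Simulate one participant mixing level-0 and level-1 with weights (0.3, 0.7).
weights_true = np.array([0.3, 0.7, 0.0])
noise = rng.normal(0, 1, size=(T, 2))
a_choice = a_lvl @ weights_true + noise[:, 0]
b_choice = b_lvl @ weights_true + noise[:, 1]

# Impose the adding-up constraint by subtracting the level-2 regressor:
# a_{t+1} - a_t(2) = b0 (a_t(0) - a_t(2)) + b1 (a_t(1) - a_t(2)) + eps,
# and stack the a- and b-equations (14 observations) as in (D.1).
X = np.vstack([a_lvl[:, :2] - a_lvl[:, 2:], b_lvl[:, :2] - b_lvl[:, 2:]])
y = np.concatenate([a_choice - a_lvl[:, 2], b_choice - b_lvl[:, 2]])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))            # close to the true weights (0.3, 0.7)
```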

Appendix E. Proof of Proposition 4.1
Let K be the largest level of thinking in the population and let H(K) denote the operator that describes the dynamics of the CH model (the map from period t − 1 to period t variables). If, for example, level 3 is the highest level in the population, then we can complete the model by weighting this best response with f_3 and assigning the remaining weight 1 − f_3 to the average play of level-0, level-1 and level-2 agents. This leads to the operator H(3), which is exactly what the statement claims in this case, when level 3 is the highest level in the population. Continuing in the same manner, the statement is verified for any K.
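The recursive construction of H(K) can be sketched numerically. In the sketch below, the map and the level frequencies are placeholders; level 0 repeats the past average, and each level k best responds to the normalized frequency-weighted average play of levels 0 through k − 1.

```python
import numpy as np

# Placeholder best-response map z -> M @ z + d and level frequencies f_0..f_3.
M = np.array([[2/3, 0.0], [-0.5, -0.5]])
d = np.array([30.0, 75.0])

def ch_step(zbar_prev, freqs):
    """One period of CH dynamics: level 0 repeats the past average; level k
    best responds to the frequency-weighted average play of levels 0..k-1;
    the new population average weights all levels by freqs."""
    plays = [zbar_prev]                                  # level-0 play
    for k in range(1, len(freqs)):
        w = np.array(freqs[:k]) / sum(freqs[:k])         # normalized lower-level shares
        belief = sum(wi * p for wi, p in zip(w, plays))  # perceived average play
        plays.append(M @ belief + d)                     # level-k best response
    return sum(f * p for f, p in zip(freqs, plays))

freqs = [0.4, 0.35, 0.15, 0.1]          # hypothetical shares of levels 0..3
zbar = np.array([50.0, 50.0])
for _ in range(30):
    zbar = ch_step(zbar, freqs)
print(np.round(zbar, 2))                # approaches the fixed point (90, 20)
```

For this stable placeholder map, the CH dynamics contract toward the equilibrium regardless of the exact frequency vector, since every level's play equals the equilibrium once the past average does.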