Probabilistic prediction and context tree identification in the Goalkeeper game

In this article we address two related issues on the learning of probabilistic sequences of events. First, which features make the sequence of events generated by a stochastic chain more difficult to predict. Second, how to model the procedures employed by different learners to identify the structure of sequences of events. Playing the role of a goalkeeper in a video game, participants were told to predict step by step the successive directions—left, center or right—to which the penalty kicker would send the ball. The sequence of kicks was driven by a stochastic chain with memory of variable length. Results showed that at least three features play a role in the first issue: (1) the shape of the context tree summarizing the dependencies between present and past directions; (2) the entropy of the stochastic chain used to generate the sequences of events; (3) the existence or not of a deterministic periodic sequence underlying the sequences of events. Moreover, evidence suggests that best learners rely less on their own past choices to identify the structure of the sequences of events.


Introduction
The aim of this work is to model the performance of a player trying to guess successive choices displayed by an electronic video game called the Goalkeeper Game (https://game.numec.prp.usp.br/). In this game, the participant, playing the role of a goalkeeper, has to guess at each trial the direction in which the penalty kicker will send the ball. An animation then shows the direction in which the ball was actually sent. The sequence of kicks is selected by a stochastic chain with memory of variable length.
Stochastic chains with memory of variable length were introduced by Rissanen (1983) [1] as a universal model for data compression. Rissanen observed that very often in experimental datasets composed of sequences of symbols, each new symbol appears to be randomly selected by taking into account a sequence of past units whose length is variable and changes as a function of the sequence of past units itself. Rissanen called a context the smallest sequence of past symbols required to generate the next one. The set of contexts can be represented by a rooted, labeled, oriented tree, henceforth called a context tree. The procedure to generate the sequence of symbols is defined by the context tree and an associated family of transition probabilities used to choose each next symbol, given the context associated with the sequence of past symbols at each time step. From now on, stochastic chains with memory of variable length will be called context tree models. Under suitable continuity conditions, stationary stochastic chains can be well approximated by context tree models [2]. For that reason, they have been widely used to model biological and linguistic phenomena [3][4][5][6][7][8][9][10][11]. In the experimental protocol considered here, the sequences of directions chosen by the kicker were generated by context tree models.
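The generation procedure described above can be illustrated with a minimal sketch. The tree and the transition probabilities below are hypothetical and are not one of the four models used in the experiment; the names `CONTEXT_TREE`, `find_context` and `generate_sequence` are introduced here for illustration only. Contexts are written from the oldest to the most recent symbol, so that 01 means that a 0 was followed by a 1.

```python
import random

ALPHABET = "012"

# Hypothetical context tree model (NOT one of the four used in the experiment).
# Keys are contexts written oldest-to-newest; values are the probabilities
# that the next symbol is 0, 1 or 2.
CONTEXT_TREE = {
    "0": [0.0, 0.0, 1.0],
    "2": [0.2, 0.8, 0.0],
    "01": [0.0, 0.0, 1.0],
    "11": [0.7, 0.0, 0.3],
    "21": [0.0, 1.0, 0.0],
}


def find_context(past, tree):
    """Return the suffix of `past` that is a context of the tree."""
    for length in range(1, len(past) + 1):
        suffix = past[-length:]
        if suffix in tree:
            return suffix
    raise ValueError(f"no context of the tree is a suffix of {past!r}")


def generate_sequence(tree, n, seed=0, start="2"):
    """Generate n symbols, each drawn given the context of the current past."""
    rng = random.Random(seed)
    past = start
    for _ in range(n):
        probs = tree[find_context(past, tree)]
        past += rng.choices(ALPHABET, weights=probs)[0]
    return past[len(start):]


print(generate_sequence(CONTEXT_TREE, 30))
```

In this sketch the amount of past needed to draw the next symbol varies: one symbol suffices after a 0 or a 2, whereas two symbols are needed after a 1.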
In the Goalkeeper game, playing the role of the goalkeeper, the volunteer was instructed to stop the penalty kicks. Obviously, the intrinsic randomness of the algorithm used by the kicker to choose the directions makes it impossible to stop all the penalty kicks. However, the full identification of the context tree and the associated family of probability distributions used by the kicker is an important asset to increase the goalkeeper's success rate. Moreover, adopting a good strategy to face the randomness of the kicker's choices might maximize the goalkeeper's success rate.
Two strategies have been proposed to address the problem of making correct guesses in sequences produced by stochastic chains. The kernel of the problem is to identify the structure of the chain in spite of the intrinsic randomness of its realization (see for instance, Schulzea et al. [12] and Koehler et al. [13]). The first strategy, known as the maximizing strategy, corresponds to always choosing the outcome with the highest probability. In the second strategy, called the matching strategy, the participant tries to emulate the selection procedure used to generate the sequences of events.
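The difference between the two strategies can be made concrete for a single transition probability distribution. The sketch below assumes that a matching goalkeeper draws each guess from the same distribution as the kicker, independently of the kicker's draw; under that assumption the expected proportion of correct guesses is the sum of the squared probabilities, whereas the maximizing strategy achieves the largest probability. The function names and the example distribution are hypothetical.

```python
def maximizing_score(probs):
    """Expected proportion of correct guesses when always choosing the mode."""
    return max(probs)


def matching_score(probs):
    """Expected proportion of correct guesses when the guess is drawn from
    `probs` independently of the outcome (assumption of this sketch)."""
    return sum(p * p for p in probs)


# Hypothetical distribution over left (0), center (1), right (2).
probs = [0.2, 0.8, 0.0]
print(maximizing_score(probs), matching_score(probs))
```

For this distribution the maximizing score is 0.8 and the matching score is approximately 0.68. For a context tree model, these per-context scores would be averaged over the contexts, weighted by how often each context occurs.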
In our experimental protocol an extra difficulty appears, namely the fact that the probability distributions used by the kicker depend on the successive contexts occurring in the sequence of his previous choices. This means that the goalkeeper must deal simultaneously with the problem of identifying the contexts and their associated transition probabilities as well as the problem of choosing a strategy. A double problem of this type was already considered by Wang et al. [14].
In this article we address two related issues. First, which features of the stochastic chain generating the sequences of events make it more difficult to predict. Second, how to model the procedures employed by different learners to identify the structure of sequences of events. This is done through a rigorous statistical procedure to identify both the context tree and the strategy used by the goalkeeper to make his guesses. We collected data from 122 participants, each one playing the role of the goalkeeper against a kicker that used one out of four different context tree models. By analyzing their sequences of responses, we investigate whether they correctly identify the context tree model used by the kicker and which strategy they use to face the randomness of the kicker's choices.

Results
The aim of the experiment was to model the performance of a player trying to guess successive symbols displayed by an electronic video game called the Goalkeeper Game (https://game.numec.prp.usp.br/demo). Playing the role of a goalkeeper, the participant was told to guess in which of three directions the penalty kicker would send the ball: left, center, or right, hereafter represented by the numbers 0, 1, and 2, respectively. An animation showed the direction in which the ball was actually sent (Figure 1A).
The sequences of shot directions were generated by four different context tree models (Figure 1B). Context tree models are characterized by two elements. The first element is a context tree and the second element is a family of transition probabilities indexed by the leaves of the context tree. In our experimental protocol, the four context tree models characterizing the sequences of the kicker's choices will be denoted by $(\tau^k_1, p^k_1)$, $(\tau^k_2, p^k_2)$, $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$. The upper index $k$ in the above notation stands for kicker. These four context tree models are represented in Figure 1B. Sequences generated by using each of these context tree models are depicted in Figure 1C.
For a fixed context tree and two different associated families of transition probabilities, we conjectured that the context tree model with higher entropy would be more difficult to learn. For the first pair (Figure 1B, left panel), changes in the transition probabilities associated with the contexts 01 and 21 increased the entropy from 0.65 in $(\tau^k_1, p^k_1)$ to 0.81 in $(\tau^k_2, p^k_2)$. We also conjectured that for a fixed context tree and two different associated families of transition probabilities, the one that displays a periodic structure would be easier to learn. For the second pair (Figure 1B, right panel), sequences generated by the context tree model $(\tau^k_3, p^k_3)$ can be described as a concatenation of strings 211 in which the symbol 1 is replaced by the symbol 0 with a small probability in an i.i.d. way. For the context tree model $(\tau^k_4, p^k_4)$, the interchange of the transition probabilities associated with 01 and 21, as well as of the most probable outcome of context 2, disrupts the periodic structure displayed by $(\tau^k_3, p^k_3)$ while keeping the entropy nearly unchanged (0.54 for $(\tau^k_3, p^k_3)$ and 0.56 for $(\tau^k_4, p^k_4)$). Finally, comparing the performance obtained with the context tree models $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$ with that obtained with $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$ might give an indication that increasing the number of contexts increases the learning difficulty.
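To make the notion of entropy of a context tree model concrete, the following sketch estimates it by simulation: it averages, along a long simulated sequence, the entropy of the transition distribution attached to the current context. The tree and probabilities are hypothetical (they are not those of Figure 1B), the logarithm is taken in base 2 as an assumption since the base is not stated in this section, and the function names are introduced for illustration only.

```python
import math
import random

# Hypothetical context tree model, not one of the four models of Figure 1B.
TREE = {
    "0": [0.0, 0.0, 1.0],
    "2": [0.2, 0.8, 0.0],
    "01": [0.0, 0.0, 1.0],
    "11": [0.7, 0.0, 0.3],
    "21": [0.0, 1.0, 0.0],
}


def find_context(past, tree):
    """Return the suffix of `past` that is a context of the tree."""
    for length in range(1, len(past) + 1):
        if past[-length:] in tree:
            return past[-length:]
    raise ValueError(f"no context found for {past!r}")


def entropy_bits(probs):
    """Shannon entropy (base 2, an assumption) of one transition distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def estimate_entropy_rate(tree, n=100_000, seed=0, start="2"):
    """Average the entropy of the current context's distribution over n steps."""
    rng = random.Random(seed)
    max_len = max(len(context) for context in tree)
    past, total = start, 0.0
    for _ in range(n):
        probs = tree[find_context(past, tree)]
        total += entropy_bits(probs)
        # Only the last max_len symbols are needed to find the next context.
        past = (past + rng.choices("012", weights=probs)[0])[-max_len:]
    return total / n


print(round(estimate_entropy_rate(TREE), 3))
```

Making a deterministic transition random, as in the change from the first to the second model of the experiment, can only increase this quantity.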
A total of 122 participants were divided into four groups of 30, 31, 31 and 30, respectively. Each context tree model in Figure 1B was played by a different group of participants (see section Methods). For each participant, a sample was constituted by collecting an ordered sequence of 1000 pairs in which the first element of each pair indicates the choice of the kicker at that step and the second element corresponds to that of the goalkeeper.

Time evolution of the performance per context tree model
Figure 2A shows the cumulative proportion of correct predictions across trials per participant for the four context tree models. An exploratory analysis of the cumulative proportion of correct predictions for models $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$ reveals that the participants tend to lie mostly between the matching and the maximizing strategy scores as the number of trials increases. This is not the case for models $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$. A sliding window approach was employed to further explore the temporal evolution of the participants' performance for each context tree model. Boxplots (Figure 2B) depict the distributions of the proportions of correct predictions across participants for each time window and each context tree model. For $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$ the interquartile range lies almost entirely above the matching strategy score from the third time window on.
For $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$ the median of the proportion of correct predictions across participants is above the theoretical matching strategy score from the third time window on. Also, the interquartile range of the proportion of correct predictions for $(\tau^k_2, p^k_2)$ is larger than for $(\tau^k_1, p^k_1)$, suggesting a higher performance variability. For $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$ the median of the proportion of correct predictions across participants is smaller than the theoretical matching strategy score in all time windows. In $(\tau^k_3, p^k_3)$, the third quartile of the boxplot almost reaches the theoretical matching strategy score from the fourth time window on. Results are even worse for $(\tau^k_4, p^k_4)$, as the third quartile is always clearly below the theoretical matching strategy score.
Finally, there is a much greater variability in the distribution of proportions of correct predictions across participants in $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$, as compared with $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$. Curiously, for $(\tau^k_1, p^k_1)$ there are some outliers in time window 6, suggesting that the performance of some participants deteriorated towards the end of the task.
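The sliding window analysis can be sketched as follows, reading "a sliding window of length 250 pacing at 150 trials" (Figure 2) as windows of 250 trials whose starting points are 150 trials apart. Under this reading, a sequence of 1000 trials yields six windows, which agrees with the six time windows used in the analyses. The function names are hypothetical.

```python
def window_starts(n_trials=1000, length=250, step=150):
    """Index of the first trial of each complete window."""
    return list(range(0, n_trials - length + 1, step))


def windowed_accuracy(kicker, keeper, length=250, step=150):
    """Proportion of correct guesses of the goalkeeper in each window."""
    scores = []
    for start in window_starts(len(kicker), length, step):
        pairs = zip(kicker[start:start + length], keeper[start:start + length])
        scores.append(sum(k == g for k, g in pairs) / length)
    return scores


print(window_starts())
```

With the default arguments, consecutive windows overlap by 100 trials and the last window covers trials 751 to 1000.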

Identifying the goalkeeper strategy
To identify the strategy to which a given participant was closest, we estimated, for each context tree model, a probability density of the proportion of correct predictions under the matching and the maximizing strategies. To this end, two samples of proportions of correct predictions, corresponding to a goalkeeper using the matching and the maximizing strategies, were simulated by generating 10000 kicker sequences of size 250 (the size of each window of analysis) and the corresponding response sequences. A kernel density estimator was then used to obtain a probability density estimate for each strategy. Finally, for each participant and each window of analysis, we compared the likelihoods that the participant's proportion of correct guesses was generated by each of the two distributions (matching vs. maximizing); see Figure 3A.
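A minimal sketch of this comparison is given below. It relies on several assumptions that are not taken from the text: the context tree model is hypothetical, the kernel is Gaussian with a rule-of-thumb bandwidth, far fewer than 10000 sequences are simulated to keep the example fast, and only the matching and maximizing strategies are compared (the criterion used to label a participant as undermatching is not specified in this section and is therefore left out). All function names are introduced for illustration.

```python
import math
import random
import statistics

# Hypothetical context tree model, not one of the four models of Figure 1B.
TREE = {
    "0": [0.0, 0.0, 1.0],
    "2": [0.2, 0.8, 0.0],
    "01": [0.0, 0.0, 1.0],
    "11": [0.7, 0.0, 0.3],
    "21": [0.0, 1.0, 0.0],
}
MAX_LEN = max(len(context) for context in TREE)


def find_context(past, tree):
    for length in range(1, len(past) + 1):
        if past[-length:] in tree:
            return past[-length:]
    raise ValueError(f"no context found for {past!r}")


def simulate_scores(tree, strategy, n_sims, window=250, seed=0):
    """Proportion of correct guesses of an ideal goalkeeper in each simulation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sims):
        past, hits = "2", 0
        for _ in range(window):
            probs = tree[find_context(past, tree)]
            if strategy == "maximizing":
                guess = max("012", key=lambda a: probs[int(a)])
            else:  # matching: guess drawn from the kicker's own distribution
                guess = rng.choices("012", weights=probs)[0]
            kick = rng.choices("012", weights=probs)[0]
            hits += guess == kick
            past = (past + kick)[-MAX_LEN:]
        scores.append(hits / window)
    return scores


def gaussian_kde(sample):
    """Gaussian kernel density estimate with a rule-of-thumb bandwidth."""
    n = len(sample)
    h = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = n * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def classify(score, density_matching, density_maximizing):
    """Label a score with the strategy under which it is more likely."""
    if density_maximizing(score) > density_matching(score):
        return "maximizing"
    return "matching"


matching = gaussian_kde(simulate_scores(TREE, "matching", 200, seed=1))
maximizing = gaussian_kde(simulate_scores(TREE, "maximizing", 200, seed=2))
print(classify(0.80, matching, maximizing), classify(0.88, matching, maximizing))
```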
Figure 3B depicts the proportion of participants per window of analysis that employed undermatching, matching, and maximizing strategies per context tree model. For $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$ the great majority of participants lay at either the matching or the maximizing strategy in all time windows. Interestingly, for $(\tau^k_2, p^k_2)$ the proportion of participants employing the matching strategy decreased considerably in favor of the maximizing strategy across time. For $(\tau^k_3, p^k_3)$ most participants started by employing an undermatching strategy, which was progressively succeeded by a matching strategy. Finally, for $(\tau^k_4, p^k_4)$ the undermatching strategy prevailed across time. Almost no participant achieved the maximizing strategy for $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$.

Figure 3. (A) A probability density of the proportion of correct predictions for the matching and maximizing strategies was estimated using a kernel density estimator on simulated data. For each participant and each window of analysis, the likelihood that the participant's proportion of correct guesses was generated by one of the two estimated distributions (matching, in red, vs. maximizing, in blue) is used to decide to which strategy the participant is closest. (B) Proportion of participants per window of analysis that undermatched (left), matched (center) and maximized (right) per context tree model.

ANOVA of participants' performance across time windows
To assess the differences in performance between the context tree models across time windows, a statistical analysis was done using a two-way mixed ANOVA. The intrinsic randomness of each of the context tree models used to guide the choice of the kicker implies that the optimal performance associated with the maximizing strategy differs from one model to another (see top dashed lines in Figure 2A). Therefore, for statistical analysis, the proportions of correct predictions obtained per participant and per time window were normalized using the theoretical maximizing strategy score of the corresponding context tree model. These normalized proportions of correct guesses were transformed using a logit transformation (see Supplementary Figure S1).
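The normalization and transformation can be sketched as follows, assuming that "normalized" means dividing the observed proportion by the theoretical maximizing strategy score. The function name is hypothetical, and the sketch simply rejects ratios outside the open interval (0, 1), since the text does not say how such cases were handled.

```python
import math


def normalized_logit(score, maximizing_score):
    """Logit of the proportion of correct guesses divided by the maximizing score."""
    ratio = score / maximizing_score
    if not 0.0 < ratio < 1.0:
        # Assumption of this sketch: boundary cases are not handled here.
        raise ValueError(f"normalized score {ratio} is outside (0, 1)")
    return math.log(ratio / (1.0 - ratio))


print(normalized_logit(0.6, 0.8))
```

A participant who reaches half of the maximizing score is mapped to 0, and values grow without bound as the performance approaches the maximizing score.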
To eliminate data outliers, a univariate linear regression model was fitted to each participant's normalized proportions of correct guesses (in logit scale) as a function of the time window. Participants displaying a negative slope in the estimated regression line were excluded from the subsequent analysis (see Supplementary Figure S2). As a consequence, the final numbers of participants per context tree model used in the analysis were 24, 24, 27 and 26, respectively.
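The exclusion rule can be sketched with an ordinary least squares slope computed per participant. The sketch assumes that the time windows are coded 1 to 6 and that the regression is an ordinary least squares fit; the function names and the example values are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def keep_participant(logit_scores):
    """Keep a participant unless the fitted slope across windows is negative."""
    windows = list(range(1, len(logit_scores) + 1))
    return ols_slope(windows, logit_scores) >= 0


# Hypothetical logit-scale scores over six time windows.
print(keep_participant([0.1, 0.3, 0.2, 0.5, 0.6, 0.7]))
print(keep_participant([0.7, 0.6, 0.5, 0.5, 0.3, 0.1]))
```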
The two-way mixed ANOVA of the participants' normalized proportions of correct guesses (in logit scale) considers the context tree model as a between-subject factor and the time window as a within-subject factor. In our case, the levels of the between-subject factor were $(\tau^k_1, p^k_1)$, $(\tau^k_2, p^k_2)$, $(\tau^k_3, p^k_3)$, $(\tau^k_4, p^k_4)$ and the levels of the within-subject factor were 1, 2, 3, 4, 5, 6. A significant interaction between the time window and the context tree model, F(8.22, 265.67) = 3.04, p = 0.003, indicated that the performance evolved differently across the four context tree models.
Figure 4 shows the interaction graph of the two-way mixed ANOVA. The differences between the means at consecutive time windows per context tree model were tested to assess the performance evolution for that context tree model.

A comparison of the means of the context tree models was performed per time window. To globally control the level of significance of the tests with multiple comparisons, the Benjamini & Hochberg correction was used [15]. For model $(\tau^k_1, p^k_1)$, the participants' performance strongly improved from the first to the second time window and then stabilized, with no further significant improvement (see Figure 4 and Supplementary Table S1 for exact p-values). Conversely, for model $(\tau^k_2, p^k_2)$, significant differences appeared up to the fourth time window; the performance then stabilized before improving significantly again at the last time window (see Figure 4 and Supplementary Table S1 for exact p-values). Besides, the comparison of performance between $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$ per time window revealed that the only significant difference occurs at the second time window. Changing the transitions associated with contexts 01 and 21 from deterministic in $(\tau^k_1, p^k_1)$ to random in $(\tau^k_2, p^k_2)$ increases the entropy of the corresponding stochastic chains from 0.65 to 0.81. As a consequence, the participants needed more time to learn the structure of the chain.
For model $(\tau^k_3, p^k_3)$, the performance of the participants improved significantly up to the fifth time window. For $(\tau^k_4, p^k_4)$, significant differences were detected only up to the third time window. Besides, $(\tau^k_3, p^k_3)$ significantly differed from $(\tau^k_4, p^k_4)$ in almost all time windows (see Figure 4 and Supplementary Table S1 for exact p-values). These results suggest that the changes made to model $(\tau^k_3, p^k_3)$ to obtain model $(\tau^k_4, p^k_4)$ made the latter significantly more difficult to learn than the former. Significant differences in performance also appeared between $(\tau^k_2, p^k_2)$ and $(\tau^k_3, p^k_3)$ for all time windows. Thus, differences in performance can be assumed to occur between $\{(\tau^k_1, p^k_1), (\tau^k_2, p^k_2)\}$ and $\{(\tau^k_3, p^k_3), (\tau^k_4, p^k_4)\}$.
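For readers unfamiliar with the Benjamini & Hochberg correction used in the comparisons above, the following sketch computes adjusted p-values with the standard step-up procedure. The function name and the p-values are hypothetical; they are not the values of Supplementary Table S1.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A comparison is declared significant at level 0.05 when its adjusted p-value is below 0.05, which controls the expected proportion of false discoveries among the rejected hypotheses.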

Does the goalkeeper identify the context tree used by the kicker?
To retrieve the structure of the context tree governing the goalkeeper's choices from the collected data, we introduce a statistical model selection procedure (see Methods, section Statistical model selection procedure), performed separately for each participant's data and each time window. Using this statistical procedure, we retrieved the context tree and the associated family of
For each context tree model $(\tau^k_i, p^k_i)$ and each time window $j$, we obtain a set of trees $\{\tau^{v,j}_i : v \in V_i\}$, where $V_i$ is the subset of participants that played against the kicker using the context tree model $(\tau^k_i, p^k_i)$. The mode context tree [11] of this set of trees is computed to summarize the results of the set of participants.
Figure 5 presents the mode context tree computed per time window for each context tree model. This is highlighted in a tree structure that contains all possible past strings up to length 4 that can be identified as a context.
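One ingredient of this summary, the proportion of participants for whom a given string was identified as a context (used to color the nodes in Figure 5), can be sketched as below. The sketch does not compute the mode context tree itself, whose definition is given in reference [11] and is not reproduced here. The function name and the example trees are hypothetical.

```python
from collections import Counter


def context_proportions(trees):
    """Proportion of participants whose estimated tree contains each context."""
    counts = Counter(context for tree in trees for context in set(tree))
    return {context: count / len(trees) for context, count in counts.items()}


# Hypothetical estimated trees of three participants in one time window.
trees = [
    {"0", "2", "01", "11", "21"},
    {"0", "1", "2"},
    {"0", "2", "01", "11", "21"},
]
for context, proportion in sorted(context_proportions(trees).items()):
    print(context, round(proportion, 2))
```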
It can be verified that the mode context tree matches the kicker's context tree as early as the first time window for models $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$. Nevertheless, a greater consensus around the contexts used by the kicker is observed in $(\tau^k_1, p^k_1)$ than in $(\tau^k_2, p^k_2)$ for all time windows. This suggests that context tree model $(\tau^k_2, p^k_2)$ is more difficult to learn than context tree model $(\tau^k_1, p^k_1)$. For models $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$, the mode context tree matches the kicker's context tree from the third and fourth time window, respectively. The fact that a higher number of participants misidentified the kicker's contexts indicates that these models are more difficult to learn.
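The first step of the procedure summarized in Figure 7, estimating the transition probabilities of the goalkeeper's guesses given the string of past directions chosen by the kicker, can be sketched by simple counting. The sketch assumes candidate strings of length 1 to 4 (the maximal length shown in Figure 5) and stops before the pruning step based on the BIC criterion, which is described in Methods and is not reproduced here. The function name and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(kicker, keeper, max_len=4):
    """Relative frequency of each goalkeeper guess after each kicker past string.

    keeper[t] is the guess made at trial t, before kicker[t] is revealed, so it
    is paired with the kicker's choices strictly before t.
    """
    counts = defaultdict(Counter)
    for t in range(len(kicker)):
        for length in range(1, max_len + 1):
            if t - length < 0:
                break
            counts[kicker[t - length:t]][keeper[t]] += 1
    return {
        past: {a: c[a] / sum(c.values()) for a in "012"}
        for past, c in counts.items()
    }


# Toy example: a periodic kicker and a goalkeeper who always guesses correctly.
probs = estimate_transitions("012012012", "012012012")
print(probs["0"], probs["12"])
```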

Figure 1. (A) Acting as a goalkeeper, the participant must guess, at each step, where the next penalty kick will be shot by pressing the corresponding keyboard arrow. The options are left, center or right, represented by the symbols 0, 1, and 2, respectively. An animation shows the direction in which the ball was actually sent. (B) Context tree models governing the kicker's choices and their corresponding entropy values. (C) Examples of sequences selected by the kicker using each one of the four context tree models. (D) Graph representation of the context tree models governing the kicker's choices.

Figure 2. (A) Time evolution from trial 100 to trial 1000 of the cumulative proportion of correct guesses for each context tree model. (B) Boxplots of proportions of correct guesses across participants in a sliding window of length 250 pacing at 150 trials for each context tree model. The proportions of correct guesses that could be achieved by a goalkeeper using the matching (bottom line in (A) and black square marker in (B)) and the maximizing (top line in (A) and black circle in (B)) strategies are indicated.

Figure 4. Interaction graph corresponding to the two-way mixed ANOVA using the logit transformation of the normalized proportions of correct predictions as dependent variable and the context tree and time window as factors. Marginal means and 95% confidence intervals of the means are represented with dots and bars, respectively. For each context tree model, the significance level of the difference between successive time windows is indicated using the following convention: *** for a p-value in the interval [0, 0.0001), ** for a p-value in the interval [0.0001, 0.01), * for a p-value in the interval [0.01, 0.05), • for a p-value in the interval [0.05, 0.1), null for a p-value in the interval [0.1, 1]. The same convention is used to indicate the significance level of the difference between the means of $(\tau^k_1, p^k_1)$ and $(\tau^k_2, p^k_2)$, $(\tau^k_2, p^k_2)$ and $(\tau^k_3, p^k_3)$, and $(\tau^k_3, p^k_3)$ and $(\tau^k_4, p^k_4)$, for each time window.

Figure 5. The context trees modeling the goalkeepers' choices are summarized for each of the four context tree models. To identify these models we used the responses of each goalkeeper and the kicker's choices within a sliding window of length 250 pacing at 150 trials. The nodes of each tree structure represent the strings that different goalkeepers identified as a context. Each node is colored from light pink to dark red according to the proportion of participants identifying the node as a context. Thick lines highlight the mode context tree. The leaves of the mode context tree are the strings that were identified as contexts most often across participants.

Figure 7. (A) For each context tree and each participant, a sample consisted of an ordered sequence of 1000 pairs of events, each pair corresponding to the successive directions chosen by the kicker and the corresponding guesses of the goalkeeper. (B) At each step, we use the string of past directions chosen by the kicker and the successive prediction made by the goalkeeper to estimate a transition probability. (B1, B2) To retrieve the context tree used by the goalkeeper, we prune the tree of candidate contexts. Starting from the leaves, we prune the tree branches using the BIC criterion. (B3) The penalty constant in the BIC is chosen so as to minimize the proportion of prediction errors [3]. (C) For each time window, the mode context tree was estimated from the retrieved set of context trees.