Intelligent systems for analyzing soccer games: The weighted centroid

New, intelligent systems have been developed recently to improve the quality of match analysis. These systems analyze the tactical behavior of the teams. However, the existing methods leave room for improvement. Thus, the main goal of this study is to refine the team centroid metric by considering all of the players on the team and the ball position. Furthermore, this study analyzes the relationship between the centroids of the two opposing teams. One 11-on-11 soccer match was analyzed to test the new centroid algorithm. The results provided strong evidence of the positive relation between the centroids of the two teams over time in the x-axis ($r_s$ = 0.781) and the y-axis ($r_s$ = 0.707). This study confirmed the results of previous studies that analyzed the relationship between team centroids. Furthermore, it was possible to prove the effectiveness of the new tactical metric and its relevance for adding information during a


Introduction
Athletic performance consists of a complex series of interrelationships among a wide variety of performance variables (Borrie, Jonsson and Magnusson, 2002). Therefore, the structures and configurations of play should be considered as a whole rather than analyzed individually (Clemente, Couceiro, Martins, Dias and Mendes, 2012). Systems with many dynamically interacting elements can produce rich and varied patterns of behavior that are clearly different from the behavior of individual players. Following this line of thought, McGarry et al. (2002) proposed that the rich and varied patterns that arise in team sports are the result of self-organization among many coupled oscillators (e.g., players). Thus, for many team sports, specific methods and metrics are required to analyze and evaluate the dynamic collective behavior, i.e., the tactical behavior (Clemente, 2012).
For the game of soccer, several match analysis techniques have been developed to assist coaches in decision making (Carling, Williams and Reilly, 2005). Notational analysis is the most common method for match analysis (Clemente, Couceiro, Martins and Mendes, 2012). However, this method does not make use of intrinsic knowledge of the procedures that lead to the results used in the analysis (Lees, 2002). Recently, intelligent systems have been developed to reach a deeper understanding of team behaviors, i.e., collective tactical behavior (Passos et al., 2011; Frencken et al., 2011). Through automatic tracking of player movements, it is possible to identify the players' positions on the field with reasonable accuracy. The position information can be analyzed to determine the collective behavior during a match. One of the most relevant and most used metrics is the centroid, which represents the team's geometrical center.

Related work: Team Centroid
The first application of the centroid method was presented by Frencken and Lemmink (2008) at the Fifth World Congress on Science and Football. This analysis method was applied to a 4-on-4 soccer game; only 9 offensive plays that resulted in shots on goal (excluding long shots) were recorded and analyzed. The centroid was defined as the average position of all of the players on a team (excluding the goalkeeper). From the centroid, three measures were derived: i) the x-distance (m), representing the longitudinal displacement; ii) the y-distance (m), representing the lateral displacement; and iii) the radial distance (m), comprising both the longitudinal and lateral displacements. For example, for an 11-man team the centroid defined by Frencken and Lemmink (2008) would be as shown in Figure 1. The results suggested an in-phase relationship between the centroids of the two teams; i.e., the motions of the two centroids were coupled. Frencken and Lemmink (2008) also noted that in 7 of 9 goal-scoring opportunities, the distance between the two centroids was nearly zero or the positions of the two centroids reversed; i.e., the attacking team's centroid was between the defending team's goal and the defending team's centroid. Nevertheless, these specific situations cannot be generalized. Considering the defensive tactical principles of concentration and unity, it is to be expected that most of the time the centroid of the defending team would be closer to that team's goal to prevent the attacking team from penetrating. Goal-scoring opportunities are generated by defensive imbalances and may not represent the majority of collective behavior, only an occasional situation. Furthermore, a 4-on-4 game may not represent the collective behavior in an 11-on-11 game.
Yue et al. (2008) developed the concept of the geometrical centroid, which represents the same analysis as the centroid. Its formula was:
Analyzing an 11-on-11 game, Yue et al. (2008) calculated centroids for 92 time series. The authors showed that the method can provide useful information for coaching and for predicting match outcomes.
Lames et al. (2010) analyzed the final match of the 2006 FIFA World Cup between Italy and France. The centroid was calculated excluding 9 players and only considering the difference between the maximum and minimum positions of the players except the goalkeeper:
This formulation is substantially different from that of Frencken and Lemmink (2008). At any instant, only two players determine the team's centroid (see Figure 2). There could be a case where nine players are at the maximum point and only one player is at the minimum point. Thus, this method may be misleading. Nevertheless, using 25 unspecified recordings, the authors presented results that corroborated the tendency of the team centroids to be coupled, as observed by Frencken and Lemmink (2008) in games with fewer players on a side. The in-phase relationship was momentarily absent when possession of the ball was lost or gained.
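To make the contrast between the two definitions concrete, the following minimal sketch computes the mean-position centroid (as described for Frencken and Lemmink, 2008) together with its radial distance, and compares it with a centroid taken as the midpoint of the extreme player positions. The midpoint form is only an assumption used for illustration, because the formula of Lames et al. (2010) is not reproduced in the text; the function names and sample coordinates are hypothetical.

```python
import math

def mean_centroid(players):
    """Average position of the players passed in (the caller excludes the goalkeeper)."""
    n = len(players)
    cx = sum(x for x, _ in players) / n
    cy = sum(y for _, y in players) / n
    return cx, cy

def extremes_centroid(players):
    """Assumed illustration: midpoint between the minimum and maximum positions."""
    xs = [x for x, _ in players]
    ys = [y for _, y in players]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2

def radial_distance(centroid, origin=(0.0, 0.0)):
    """Distance of the centroid from the origin, combining both axes."""
    return math.hypot(centroid[0] - origin[0], centroid[1] - origin[1])

# Hypothetical case from the text: nine players at the maximum, one at the minimum.
players = [(30.0, 0.0)] * 9 + [(-20.0, 0.0)]
mean_c = mean_centroid(players)
extremes_c = extremes_centroid(players)
print(mean_c, extremes_c, radial_distance(mean_c))
```

In this hypothetical configuration the mean-position centroid stays close to the group of nine players, whereas the extremes-based value is pulled to the middle of the range by a single player, which is the kind of distortion described above.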
The centroid method was applied to a 5-on-5 basketball game by Bourbousson et al. (2010). The centroid was calculated as in Frencken and Lemmink (2008). Six sequences of play, recorded during one professional basketball match and of sufficient duration to include intermittent changes in ball possession, were analyzed.
The results confirmed those of previous studies (Frencken and Lemmink, 2008; Lames et al., 2010); i.e., an in-phase relationship between the team centroids was observed, except for changes in ball possession and acyclic events. In the longitudinal axis, Bourbousson et al. (2010) showed that the defensive team spent less time changing their positions. The authors found strong evidence of anti-phase behavior in the lateral axis due to contraction or expansion; an example is shown in Figure 3. Despite the different player positions in the lateral field axis (cf. Figure 3), the position of the centroid is the same in both cases. Thus, the anti-phase relationship may be due to transitions of ball possession followed by a return to the in-phase state rather than to expansion/contraction. Frencken et al. (2011) analyzed team centroids in 5-on-5 games.
Using the formulation of Frencken and Lemmink (2008), Frencken et al. (2011) analyzed 19 open plays that resulted in goals being scored. Using the Pearson correlation test, the authors (Frencken et al., 2011) showed high and positive correlations between centroids in the longitudinal and lateral axes, suggesting that the two centroids tend to move in the same direction during the game, i.e., predominantly maintaining an in-phase relationship. Similar to the results of Bourbousson, Sève and McGarry (2010), it appeared that there was a higher correlation between centroids in the longitudinal axis, demonstrating their prevalence and association with offensive actions that resulted in scoring a goal (Frencken et al., 2011).
Calculating centroid positions in a manner similar to that of Frencken and Lemmink (2008), Duarte et al. (2012) analyzed 3 vs. 3 sub-phases of play in soccer games and confirmed that the predominant state was in-phase. As in two previous studies (Frencken et al., 2011; Bourbousson et al., 2010), the correlation between centroids was higher in the longitudinal axis. The results from the statistical analysis showed significantly superior centroid mean values at the moment of ball control by the passing player, compared to the moment of assisted pass or the ball crossing the defensive line.
Bartlett et al. (Bartlett, Button, Robins, Dutt-Mazumder and Kennedy, 2012) compared four groups of open attacking plays: i) those leading to goals; ii) those leading to a kick or a header that did not score a goal; iii) those that resulted in an active loss of possession; and iv) other plays in which possession was lost passively. Using calculations of the average player positions (the authors did not specify whether the goalkeeper was included), the authors found that the centroids of the two teams were highly correlated in the longitudinal and lateral axes. The authors (Bartlett et al., 2012) suggested that when comparing groups of plays, the correlation between the centroids is higher in the plays that lead to goals or shots on goal than in those that result in loss of the ball, which was contrary to the authors' expectations. However, it may be speculated that plays with less instability and efficacy showed lower correlation values because of an imbalance, resulting in ineffective attacker actions.

Statement of Contribution
Previous studies presented general results for team centroids that showed a predominantly in-phase relationship. These studies typically analyzed the longitudinal and lateral axes, and the highest values of positive correlation between centroids arose in the longitudinal axis.
It has been widely suggested that in-phase relationships are broken only by particular events such as a loss of ball possession or defense-to-attack or attack-to-defense transitions. However, the analyses of previous studies mostly focused on the centroid, i.e., investigating only the synchronized behavior of teams. A systemic analysis is fundamental and should not be dismissed; however, the centroid method can and should be properly exploited, particularly for an online match analysis. For this purpose, some changes should be implemented.
Only three of the studies discussed (Yue et al., 2008; Lames et al., 2010; Bartlett et al., 2012) calculated the centroid of an 11-man soccer team. Furthermore, one of these three studies (Lames et al., 2010) does not satisfy the requirements of observation, as discussed previously. Additionally, the goalkeeper is typically excluded from the centroid calculation (Frencken et al., 2011), and the position of the ball and the influence of the players closest to the ball are ignored.
Thus, the main goal of this study is to revise the centroid methodology to include the positions of the goalkeeper and the ball. Furthermore, this study analyzes the relationship between the centroids of the teams and the movement of the centroids in relation to the state of ball possession.

Participants
The tactical metrics were evaluated in an 11-on-11 soccer game.
The analysis was performed during an official soccer match between two professional teams in the Portuguese premier league.

Material
The actions of both teams were captured using a digital camera (GoPro Hero with 1280 × 960 pixels resolution) with a frame rate of 30 frames per second. The camera was placed at an elevation above the field to capture the entire field.

Procedures
The camera was placed 15 meters above the field and 10 meters from the touchline at mid-field to capture the entire field. The field dimensions were 104 × 68 meters. The first step in collecting the data was to record the players' movements using the digital camera as previously described. Because the camera had a field of view of 180°, it was not necessary to move the camera, thus ensuring consistent reference points on the images. The field was calibrated using 19 markers positioned on the field lines. After recording the soccer match, the physical space was calibrated using a direct linear transformation (DLT), which maps the positions of the elements (i.e., the players and the ball) from pixels to the metric space (Abdel-Aziz and Karara, 1971).
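As an illustration of the calibration step, the sketch below fits a planar DLT (a homography with its last coefficient fixed to 1) from marker correspondences by least squares and then converts pixel positions to field coordinates. It is a simplified sketch under stated assumptions, not the procedure used in the study: it ignores lens distortion, solves the normal equations with plain Gaussian elimination, and uses six synthetic markers generated from a hypothetical mapping (`true_map`) instead of the 19 field markers. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_planar_dlt(pixels, field):
    """Least-squares planar DLT from at least 4 pixel/field marker pairs."""
    # Rescale pixel coordinates to roughly [0, 1] to improve conditioning.
    scale = max(max(abs(u), abs(v)) for u, v in pixels)
    rows, rhs = [], []
    for (u, v), (fx, fy) in zip(pixels, field):
        u, v = u / scale, v / scale
        rows.append([u, v, 1, 0, 0, 0, -u * fx, -v * fx])
        rhs.append(fx)
        rows.append([0, 0, 0, u, v, 1, -u * fy, -v * fy])
        rhs.append(fy)
    # Normal equations: (A^T A) h = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)

    def to_field(u, v):
        """Map one pixel position to field coordinates in meters."""
        u, v = u / scale, v / scale
        w = h[6] * u + h[7] * v + 1.0
        return ((h[0] * u + h[1] * v + h[2]) / w,
                (h[3] * u + h[4] * v + h[5]) / w)
    return to_field

def true_map(u, v):
    """Hypothetical ground-truth mapping, used only to synthesize marker data."""
    w = 0.0004 * v + 1.0
    return (0.1 * u - 64.0) / w, (0.12 * v - 40.0) / w

marker_pixels = [(100, 100), (1200, 120), (1180, 900),
                 (90, 880), (640, 500), (300, 700)]
marker_field = [true_map(u, v) for u, v in marker_pixels]
to_field = fit_planar_dlt(marker_pixels, marker_field)
print(to_field(500, 400))
```

Using more markers than the minimum of four, as the 19 markers above would allow, makes the least-squares fit less sensitive to errors in locating any single marker.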
Following calibration, the positions of the players were tracked, and the virtual coordinates were transformed into physical coordinates at each second, thus providing the Cartesian (x and y) positions of the players during the match. The entire process associated with this approach (i.e., detection and identification of player trajectories, spatial transformations, and computation of the metrics) was performed using the MATLAB (R2013) programming environment. The process of identifying the virtual positions of the players and the ball in each frame was performed manually. For a more detailed description of the process, see Couceiro, Clemente and Martins (2013).
For the sake of efficiency, only the time when the ball was in play was considered, and the periods when the ball was not on the field (i.e., out of bounds) were excluded from the analysis. Because the methodology proposed here has some computational complexity, one instant per second was analyzed for each player and the ball.

Calculation Procedures: Weighted Centroid
Although the goalkeeper's movements are more limited, they should not be excluded from the centroid calculation; i.e., if the ball is closest to the goalkeeper, he or she will be more relevant than any forward player. Thus, assigning weights to the players' positions in relation to the ball should be considered in the centroid computation (cf. Figure 4). According to Frencken et al. (2011), the centroids of the teams can provide three measures: i) the x-distance (m), representing the lengthwise displacement (i.e., down-field); ii) the y-distance (m), representing the lateral displacement (i.e., across the field); and iii) the radial distance (m), comprising both the lengthwise and lateral displacements. These measures were obtained based on the centroid position relative to the origin O, i.e., (0, 0), which was defined at the center of the field. The position of the i-th player is defined as $(x_i, y_i)$. The relevance of each player to the team's centroid, i.e., the weight $w_i$, could be based on the Euclidean distance from each player to the ball (Clemente et al., 2013), where $(x_b, y_b)$ corresponds to the position of the ball and $d_{max}$ is the Euclidean distance of the farthest player from the ball at each iteration (Clemente et al., 2013). Thus, closer players have higher weights than farther players.
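The weighting idea can be sketched as follows. The exact equations are not legible in this text, so the sketch assumes a weight of the form $w_i = 1 - d_i / d_{max}$, where $d_i$ is the Euclidean distance from player i to the ball, and a centroid computed as the weighted mean of the player positions normalized by the sum of the weights. Both forms are assumptions chosen to agree with the description (closer players receive higher weights); the function name, the fallback rules, and the sample positions are hypothetical.

```python
import math

def weighted_centroid(players, ball):
    """Weighted centroid of one team.

    players: list of (x, y) positions, goalkeeper included.
    ball:    (x, y) position of the ball.
    Assumed weight: w_i = 1 - d_i / d_max, so the farthest player gets weight 0.
    """
    dists = [math.hypot(x - ball[0], y - ball[1]) for x, y in players]
    d_max = max(dists)
    if d_max == 0:
        # Every player is exactly on the ball: fall back to equal weights.
        weights = [1.0] * len(players)
    else:
        weights = [1.0 - d / d_max for d in dists]
    total = sum(weights)
    if total == 0:
        # All players are equally far from the ball: fall back to equal weights.
        weights = [1.0] * len(players)
        total = float(len(players))
    cx = sum(w * x for w, (x, _) in zip(weights, players)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, players)) / total
    return cx, cy

# Hypothetical example: three players on a line, ball at the first player.
players = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
ball = (0.0, 0.0)
cx, cy = weighted_centroid(players, ball)
print(cx, cy)
```

With these hypothetical positions the weights are 1, 0.5 and 0, so the weighted centroid lies at about x = 3.33 m, whereas the unweighted mean position would be x = 10 m; the centroid is drawn toward the ball.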

Statistical Analysis
To compute the correlations between the teams for the tactical metrics, which take both positive and negative values, Spearman's correlation test was used. The correlation tests were performed using the software SPSS version 19 (IBM Corp.) with a significance level of 5%.
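For readers who want to reproduce this kind of correlation outside SPSS, a minimal sketch of Spearman's coefficient is given below: each series is converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The function names and the two short series are hypothetical and are not data from the match.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical centroid x-positions (m) of two teams at five instants.
team_a_x = [-5.0, -2.0, 0.5, 3.0, 8.0]
team_b_x = [-3.0, -4.0, 2.5, 1.0, 10.0]
r_s = spearman(team_a_x, team_b_x)
print(round(r_s, 3))
```

Because the coefficient depends only on ranks, it does not require the centroid positions to be normally distributed, which suits positional data that oscillate around the field center.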
A one-way ANOVA was performed to determine if there were statistically significant differences between a team's centroid with and without possession of the ball. The assumption of a normal distribution in the one-way ANOVA for the three practice conditions (i.e., conservative, neutral and risky) was investigated using the Kolmogorov-Smirnov test with the Lilliefors correction. It was found that the distributions of the dependent variable were not normal. Although the distributions were not normal, given that n = 110, by the Central Limit Theorem (Maroco and Bispo, 2003; Pedrosa and Gama, 2004) we assumed a normal distribution (Akritas and Papadatos, 2004). The analysis of homogeneity was performed using the Levene test. It was found that there was no homogeneity of variance under the previously mentioned conditions. However, despite the lack of homogeneity, the F-test (ANOVA) is robust to homogeneity violations when the number of observations in each group is equal or approximately equal (Vicent, 1999; Pestana and Gageiro, 2010; Maroco, 2010), as in our case. As with the assumption of normality, violating this assumption does not radically change the F-value. A classification of effect size (i.e., the measure of the proportion of the total variation in the dependent variable explained by the independent variable) was performed as in Maroco (2010) and Pallant (2011). This analysis was performed using SPSS with a significance level of 5%.
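The F statistic and the effect size described above can be illustrated with a short sketch. It computes the between-group and within-group sums of squares for a one-way design and reports F together with eta squared, taken here as the between-group sum of squares divided by the total sum of squares, which matches the definition of effect size given in the text. The function name and the two small groups are hypothetical; p-values and power are omitted because they require distribution functions outside the standard library.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

# Hypothetical centroid x-positions (m) with and without ball possession.
with_ball = [2.0, 4.0, 6.0]
without_ball = [-1.0, 1.0, 3.0]
f_stat, eta_squared = one_way_anova([with_ball, without_ball])
print(round(f_stat, 3), round(eta_squared, 3))
```

A large F combined with a small eta squared, as can happen with many observations, indicates a difference that is statistically detectable but explains little of the total variation.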

Results
Spearman's correlation test showed strong evidence of the positive relation between the wCentroids of the two teams over time in the y-axis ($r_s$ = 0.707). This relationship can be observed in Figure 5, where the oscillations are similar for both teams. Similarly, the centroids of the teams in the x-axis showed a very high positive correlation ($r_s$ = 0.781). This relationship can be observed in Figure 6. The wCentroid positions in the x-axis showed statistically significant differences, with a small difference between the moments with and without possession of the ball for Team A (F = 86.171; p-value = 0.001; $\eta^2$ = 0.052; Power = 1.000; small effect size) and Team B (F = 43.553; p-value ≤ 0.001; $\eta^2$ = 0.027; Power = 1.000; small effect size). In both cases, the results suggested that when not in possession of the ball, teams move closer to their defensive zone.
The wCentroid positions in the y-axis showed statistically significant differences with a very small difference between the moments with and without possession of the ball for Team A (F = 11.545; p-value = 0.001; $\eta^2$ = 0.007; Power = 0.994; very small effect size); no differences were found for Team B (F = 0.213; p-value = 0.809; $\eta^2$ = 0.001; Power = 0.083; very small effect size).

Discussion
First, it is important to emphasize the significance of the new method of calculating centroids proposed in this study. Including the position of the ball, all of the team members, and their importance in relation to the ball in the calculation of the centroid improves its usefulness. As suggested previously, if the ball is closer to the goalkeeper, his or her influence will be substantially higher than that of the other team members. The following example compares the three methods of calculating centroids (cf. Figure 7). In considering the figure, the rigid nature of the centroid of Lames et al. (2010), where only two players determine the team's centroid, should be noted. This definition is unlikely to fully capture the dynamics of the team. The most widely applied metric in the literature (Frencken and Lemmink, 2008) does not include the ball position. Thus, this metric implies that the player farthest from the ball is as crucial as the one closest to the ball, distorting the influence of the player closest to the ball, i.e., his or her influence in the center of the game (Costa et al., 2010).
The proposed metric is more inclusive, considering all members of the team and the ball position, integrating all data as determinants of the centroid and its subsequent interpretation.
Considering Spearman's correlation test, it was possible to observe strong evidence of the positive relation between the centroids of the two teams over time. Additionally, it was possible to verify regular oscillations of the centroids between positive and negative values in the y-axis, a reflection of the attempts by the attacking team to unbalance the defense by moving the centroid away from the center of the field, which the defenders normally maintain to prevent the advance of their opponents. Evidence of the importance of an imbalance can be observed in the sequence that led to the goal by Team A (cf. Figure 6), where the team took possession of the ball on one side of the field, i.e., negative y values of the centroid, subsequently moved to the other side, i.e., positive y values of the centroid, and immediately shifted back to the other side again, thus unbalancing the defensive formation of the opponents. Hence, one can observe that while on the offensive, lateral attacks are fundamental to overcoming the defense (Lucchesi, 2001).
Similar to the results in the y-axis, a high positive correlation was found between the centroids of the two teams in the x-axis (i.e., lengthwise), confirming the tendency toward an in-phase relation between the teams over time because they try to maintain a defensive balance to protect their goal (Frencken et al., 2011). It is important to note that the positive values in the graph indicate that Team B is on the defensive and the negative values indicate that Team A is on the defensive. Nevertheless, through Figure 7 it is possible to verify that Team A defends by maintaining a larger distance in relation to the opponents and, conversely, Team B allows a greater proximity of Team A.
In the case where Team A is defending, the ability to maintain a greater distance between team centroids may suggest a smaller dispersion of the Team A players or a higher dispersion of the Team B players. This would represent a playing style with fewer players involved on offense and consequently result in a higher dispersion in the x-axis. Thus, it is possible to conclude that the centroid metric should include an indicator of the dispersion of the players.
We next consider the ability to detect defensive imbalances. Whenever the centroid of the attacking team is very close to that of the defending team (the distance is nearly 0), there is a greater chance that the attacking team will score (Frencken and Lemmink, 2008). If this situation occurs frequently during a match, it should be detected and corrected; the players should be repositioned to ensure the in-phase relationship and adjust the distance between the centroids.
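A simple way to detect such situations is to flag the instants at which the lengthwise distance between the two centroids falls below a threshold. The sketch below is only an illustration of that idea: the 1 m threshold, the function name, and the sample series are hypothetical choices, not values from the study.

```python
def flag_close_centroids(centroids_a, centroids_b, threshold_m=1.0):
    """Return the indices (seconds) where the x-distance between the two
    team centroids is smaller than threshold_m (an assumed threshold)."""
    return [t for t, (xa, xb) in enumerate(zip(centroids_a, centroids_b))
            if abs(xa - xb) < threshold_m]

# Hypothetical centroid x-positions (m), one sample per second.
team_a_x = [5.0, 3.0, 1.2, 0.4, -0.5]
team_b_x = [-4.0, -2.0, 0.5, 0.1, 0.2]
flagged = flag_close_centroids(team_a_x, team_b_x)
print(flagged)
```

Counting how often, and for how long, instants are flagged during a match gives a rough indication of how frequently the defensive balance is at risk.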
The proposed metrics offer more possibilities in the analysis of soccer because they can easily be adapted to automatic tracking systems. These metrics can provide greater knowledge about tactical behavior and thus measure the accomplishment of playing principles (Costa et al., 2010). It is also possible to measure the distance between centroids and identify the oscillations with various styles of play. This information can be useful for identifying the characteristics of tactical behaviors and collective organization.
The relevance of such methods should be understood in the context of the principles of the game and not interpreted without considering the dynamics.

Conclusions
The main goal of this paper was to propose a modification of the centroid metric used in the analysis of soccer games. Including the positions of all team members and the position of the ball allows a greater understanding of team behaviors. Furthermore, this intelligent system improves match analysis, allowing new feedback and understanding during a soccer game. An analysis using the revised definition of the centroid revealed strong correlations between the team centroids in the lateral and longitudinal directions. Additionally, it was concluded that winning teams, when on the defensive, maintained a separation between their own centroid and that of the opposing team, which made the defense more effective.

Figure 2. Example of the centroid as defined by Lames, Ertmer and Walter (2010)

Figure 3. Example of lateral positioning in expanding and contracting actions

Figure 4. Proposed centroid calculation considering the players' positions in relation to the ball

Figure 5. wCentroids of Teams A and B in the y-axis for a period of 100 seconds

Figure 6. wCentroids of Teams A and B in the x-axis for a period of 100 seconds