Automatic attribute construction for basketball modelling

Vračar, Petar; Štrumbelj, Erik; Kononenko, Igor

doi:10.1007/s10115-019-01361-2

Regular Paper
Published: 13 April 2019

Volume 62, pages 541–570, (2020)
Knowledge and Information Systems

Abstract

We address the problem of automatic extraction of patterns in the sequence of events in basketball games and construction of statistical models for generating a plausible simulation of a match between two distinct teams. We present a method for automatic construction of an attribute space which requires very little expert knowledge. The attributes are defined as the ratio between the number of entries and exits from higher-level concepts that are identified as groups of similar in-game events. The similarity between events is determined by the similarity between probability distributions describing the preceding and the following events in the observed sequences of game progression. The methodology is general and is applicable to any sports game that can be modelled as a random walk through the state space. Experiments on basketball show that automatically generated attributes are as informative as those derived using expert knowledge. Furthermore, the obtained simulations are in line with empirical data.
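The event-similarity idea in the abstract can be illustrated with a minimal sketch. It estimates, for each event type, the empirical distributions of the preceding and the following event, and compares two event types by averaging a distance over both distributions. The Hellinger distance, the equal weighting of the two contexts, and the event labels (START, MISS2, MISS3, MADE2, DREB, END) are assumptions made for illustration only; the abstract does not state which measure or event encoding the authors use.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_distributions(sequences):
    """Map each event type to (prev_dist, next_dist), the empirical
    distributions of the event seen just before and just after it."""
    prev_counts = defaultdict(Counter)
    next_counts = defaultdict(Counter)
    for seq in sequences:
        for before, after in zip(seq, seq[1:]):
            next_counts[before][after] += 1
            prev_counts[after][before] += 1

    def normalise(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()} if total else {}

    events = set(prev_counts) | set(next_counts)
    return {e: (normalise(prev_counts[e]), normalise(next_counts[e]))
            for e in events}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (dicts).
    Assumed measure; any distance between distributions could be used."""
    keys = set(p) | set(q)
    return sqrt(0.5 * sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2
                          for k in keys))


def event_distance(dists, a, b):
    """Average the distances between the 'previous' and 'next' contexts."""
    (prev_a, next_a), (prev_b, next_b) = dists[a], dists[b]
    return 0.5 * (hellinger(prev_a, prev_b) + hellinger(next_a, next_b))


# Hypothetical toy sequences; START and END act as dummy boundary events.
toy_sequences = [
    ["START", "MISS2", "DREB", "END"],
    ["START", "MISS3", "DREB", "END"],
    ["START", "MADE2", "END"],
]
dists = context_distributions(toy_sequences)
print(event_distance(dists, "MISS2", "MISS3"))  # identical contexts
print(event_distance(dists, "MISS2", "MADE2"))  # different follow-up events
```

In this toy data the two missed-shot events share the same neighbours, so they would be natural candidates for grouping into one higher-level concept, whereas the made shot differs in what follows it.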

Notes

The computation of silhouette coefficients is based on the maximal distance between items instead of average distance. The goal is a more conservative cut of the dendrogram that results in smaller and more cohesive clusters.
The data were obtained from https://stats.nba.com.
In sports, the two dummy events correspond to the start and the end of each period.
The value of the attribute in the last row of Table 2 is one minus the sum of the values of the attributes in the first and third rows of Table 2.
The impact of the event type ADREB will be represented in the selection, albeit indirectly, even though it will not be explicitly present in the numerator of any of the selected attributes.
At first glance, it seems that only Dur is required, but then we lose the ability to begin the simulation at an arbitrary starting point.
The rules of the sport determine which event types can follow each other. Picking the attribute PrevEvt as the root node leads to a natural division of event types into disjoint subsets, depending on what can follow.
As we have seen before, not all the sequence elements correspond to explicit feeds in the play-by-play data. In this particular example, the elements from category 0 represent immediate change in possession after made baskets.
Theoretically, there should be 24600 simulations, but games from the beginning of the season cannot be simulated because the teams’ skills cannot be estimated.
We excluded matches that went to overtime. We also excluded each team’s first home and away game because the teams’ skills have yet to be estimated.
The predicted score margin for a game is calculated as the average score margin in the generated simulations of that game.
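The silhouette variant described in the first note can be sketched as follows. This is one possible reading, not the authors' implementation: the mean distances in the standard silhouette coefficient (Rousseeuw 1987) are replaced by maximal distances, both within the item's own cluster and to each other cluster. The function name, the input layout, and the convention of scoring singleton clusters as 0 are assumptions.

```python
def max_silhouette(dist, labels):
    """Per-item silhouette using maximal instead of mean distances.

    dist   -- square symmetric matrix (list of lists) of pairwise distances
    labels -- cluster label of each item, in the same order as dist
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        others = [members for other, members in clusters.items()
                  if other != lab]
        if not own or not others:
            # Singleton cluster or only one cluster: score 0 by convention.
            scores.append(0.0)
            continue
        # a: worst-case distance to a member of the item's own cluster.
        a = max(dist[i][j] for j in own)
        # b: smallest worst-case distance to any other cluster.
        b = min(max(dist[i][j] for j in members) for members in others)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical example: four points on a line forming two tight clusters.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
scores = max_silhouette(dist, [0, 0, 1, 1])
print(scores)
```

Because the maximum is never smaller than the mean, a single distant member lowers the score of a cluster more than it would under the standard definition, which is consistent with the note's stated goal of smaller and more cohesive clusters.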

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia
Petar Vračar, Erik Štrumbelj & Igor Kononenko

Corresponding author

Correspondence to Petar Vračar.

About this article

Cite this article

Vračar, P., Štrumbelj, E. & Kononenko, I. Automatic attribute construction for basketball modelling. Knowl Inf Syst 62, 541–570 (2020). https://doi.org/10.1007/s10115-019-01361-2

Received: 12 May 2018
Revised: 25 March 2019
Accepted: 05 April 2019
Published: 13 April 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s10115-019-01361-2