Automatic attribute construction for basketball modelling

  • Regular Paper
  • Knowledge and Information Systems

Abstract

We address the problem of automatic extraction of patterns in the sequence of events in basketball games and construction of statistical models for generating a plausible simulation of a match between two distinct teams. We present a method for automatic construction of an attribute space which requires very little expert knowledge. The attributes are defined as the ratio between the number of entries and exits from higher-level concepts that are identified as groups of similar in-game events. The similarity between events is determined by the similarity between probability distributions describing the preceding and the following events in the observed sequences of game progression. The methodology is general and is applicable to any sports game that can be modelled as a random walk through the state space. Experiments on basketball show that automatically generated attributes are as informative as those derived using expert knowledge. Furthermore, the obtained simulations are in line with empirical data.
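
The event-similarity idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the event labels and the toy sequence are hypothetical, and the Jensen-Shannon divergence is an assumed choice of dissimilarity, since the abstract does not name the measure. For each event type the sketch estimates the distribution of following events and of preceding events, then compares two event types by averaging the divergences of those two distributions.

```python
from collections import Counter, defaultdict
import math


def context_distributions(sequence, events):
    """Empirical distributions of the events that follow and precede each event type."""
    following = defaultdict(Counter)
    preceding = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        following[current][nxt] += 1
        preceding[nxt][current] += 1

    def normalise(counter):
        total = sum(counter.values())
        # An event with no observed neighbour gets an all-zero vector.
        return [counter[e] / total if total else 0.0 for e in events]

    return ({e: normalise(following[e]) for e in events},
            {e: normalise(preceding[e]) for e in events})


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed measure, bounded by 1)."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def event_dissimilarity(sequence):
    """Pairwise dissimilarity: mean of the divergences of the two context distributions."""
    events = sorted(set(sequence))
    following, preceding = context_distributions(sequence, events)
    return {
        (a, b): 0.5 * (js_divergence(following[a], following[b])
                       + js_divergence(preceding[a], preceding[b]))
        for a in events for b in events
    }


# Hypothetical play-by-play fragment; the labels are illustrative only.
toy_sequence = ["START", "SHOT2", "MISS", "DREB", "SHOT3", "MADE",
                "SHOT2", "MADE", "SHOT3", "MISS", "DREB", "END"]
dissimilarity = event_dissimilarity(toy_sequence)
print(dissimilarity[("SHOT2", "SHOT3")], dissimilarity[("SHOT2", "DREB")])
```

In this toy data the two shot types are followed by the same outcomes, so they come out as more similar to each other than either is to a rebound. A matrix of this kind could then be passed to a hierarchical clustering step to form the higher-level concepts; the attribute ratios themselves are not covered by this sketch.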

Notes

  1. The computation of silhouette coefficients is based on the maximal distance between items instead of average distance. The goal is a more conservative cut of the dendrogram that results in smaller and more cohesive clusters.

  2. The data were obtained from https://stats.nba.com.

  3. In sports, the two dummy events correspond to the start and the end of each period.

  4. The value of the attribute in the last row of Table 2 is one minus the sum of the values of the attributes in the first and third rows of Table 2.

  5. The impact of the event type ADREB will be represented in the selection, albeit indirectly, even though it will not be explicitly present in the numerator of any of the selected attributes.

  6. At first glance, it seems that only Dur is required, but then we lose the ability to begin the simulation at an arbitrary starting point.

  7. The rules of the sport determine which event types can follow each other. Picking the attribute PrevEvt as the root node leads to a natural division of event types into disjoint subsets, depending on what can follow.

  8. As we have seen before, not all the sequence elements correspond to explicit feeds in the play-by-play data. In this particular example, the elements from category 0 represent immediate change in possession after made baskets.

  9. Theoretically, there should be 24600 simulations, but games from the beginning of the season cannot be simulated because the teams’ skills cannot be estimated.

  10. We excluded matches that went to overtime. We also excluded each team’s first home and away game because the teams’ skills have yet to be estimated.

  11. The predicted score margin for a game is calculated as the average score margin in the generated simulations of that game.
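
Note 1 can be made concrete with a small sketch. It shows one possible reading of that note, not the authors' code: both averages in Rousseeuw's silhouette definition are replaced by maxima, so that a(i) is the largest distance from item i to another member of its own cluster and b(i) is the smallest, over the other clusters, of the largest distance from i to a member of that cluster. The function name `max_silhouette` and the toy data are hypothetical.

```python
def max_silhouette(dist, labels):
    """Silhouette coefficients with maxima substituted for the usual averages.

    dist   -- square matrix (list of lists) of pairwise distances
    labels -- cluster label of each item
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            # Singleton clusters (or a single cluster overall) score 0 by convention.
            scores.append(0.0)
            continue
        # Cohesion: worst-case distance inside the item's own cluster.
        a = max(dist[i][j] for j in own)
        # Separation: nearest other cluster, measured by its farthest member.
        b = min(max(dist[i][j] for j in members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Four points on a line, split into two tight clusters (toy data).
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in points] for p in points]
toy_scores = max_silhouette(toy_dist, [0, 0, 1, 1])
print(toy_scores)
```

Averaging such scores for each candidate cut of the dendrogram and keeping the cut with the highest mean would be one way to choose the number of clusters under this reading.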

Author information

Correspondence to Petar Vračar.

About this article

Cite this article

Vračar, P., Štrumbelj, E. & Kononenko, I. Automatic attribute construction for basketball modelling. Knowl Inf Syst 62, 541–570 (2020). https://doi.org/10.1007/s10115-019-01361-2

  • DOI: https://doi.org/10.1007/s10115-019-01361-2
