DOI: 10.1145/3626252.3630813
research-article

Confidence vs Insight: Big and Rich Data in Computing Education Research

Published: 07 March 2024

ABSTRACT

There are now many large datasets available for programming education research. They tend to be very large in scale, but often lack context or detailed participant information. This "big data" contrasts with the "rich data" that has generally been collected in smaller, qualitative studies, with detailed context and participant information. Big data is often criticised for its lack of context, while rich data is often criticised for its small sample sizes, which make generalisable conclusions dubious. In this position paper we examine the constraints, advantages, and disadvantages of each type of data, and discuss how they can provide differing information on phenomena in programming education research. We argue that both types of data are useful, that we should value the potential findings of each, and that we should encourage their combination in order to provide a complete picture of how people learn to program.


    Published in

      SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1
      March 2024
      1583 pages
      ISBN: 9798400704239
      DOI: 10.1145/3626252

      Copyright © 2024 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 1,595 of 4,542 submissions, 35%
