research-article

Confidence vs Insight: Big and Rich Data in Computing Education Research

Authors:
Neil C. C. Brown, King's College London, London, United Kingdom (ORCID 0000-0001-6086-2479)
Mark Guzdial, University of Michigan, Ann Arbor, MI, USA (ORCID 0000-0003-4427-9763)

SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, March 2024, Pages 158–164. https://doi.org/10.1145/3626252.3630813

Published: 07 March 2024

ABSTRACT

There are now many large datasets available for programming education research. They tend to be very large-scale, but often lack context or detailed participant information. This "big data" contrasts with the "rich data" that has generally been collected from smaller, qualitative studies, with detailed context and participant information. Big data is often criticised for its lack of context, while rich data is often criticised for its small sample size, which makes generalisable conclusions dubious. In this position paper, we examine the constraints, advantages, and disadvantages of each type of data, and discuss how they can provide differing information on phenomena in programming education research. We argue that both types of data are useful and that we should value the potential findings of each, as well as encourage their combination in order to provide a complete picture of how people learn to program.

Index Terms

Social and professional topics → Professional topics → Computing education

Published in
SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1
March 2024
1583 pages
ISBN:9798400704239
DOI:10.1145/3626252
General Chairs: Ben Stephenson (University of Calgary, Canada), Jeffrey A. Stone (Penn State University)
Program Chairs: Lina Battestilli (North Carolina State University, USA), Samuel A. Rebelsky (Grinnell College), Libby Shoop (Macalester College)
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
big data
rich data
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,595 of 4,542 submissions, 35%