ABSTRACT
Many large datasets are now available for programming education research. They are large in scale but often lack context or detailed participant information. This "big data" contrasts with the "rich data" generally collected in smaller, qualitative studies, which comes with detailed context and participant information. Big data is often criticised for its lack of context, while rich data is often criticised for its small sample size, which makes generalisable conclusions dubious. In this position paper, we examine the constraints, advantages, and disadvantages of each type of data, and discuss how they can provide differing information on phenomena in programming education research. We argue that both types of data are useful, that we should value the potential findings of each, and that we should encourage their combination in order to provide a complete picture of how people learn to program.
Index Terms
- Confidence vs Insight: Big and Rich Data in Computing Education Research