Review of CSEDM Data and Introduction of Two Public CS1 Keystroke Datasets

doi:10.5281/zenodo.7646659

Published March 15, 2023 | Version v1

Journal article | Open access

1. Utah State University

Analysis of programming process data has become popular in computing education research and educational
data mining in the last decade. This type of data is quantitative, often of high temporal resolution,
and it can be collected non-intrusively while the student is in a natural setting. Many levels of granularity
can be obtained, such as submission, compilation, edit, and keystroke events, with keystroke-level logs
being the most fine-grained of commonly used dataset types. However, the lack of open datasets, especially
at the keystroke level, is notable. There are several reasons for this failing, with the most prominent
being the challenges of deidentification that are peculiar to keystroke log data. In this paper, we present
the public release of two fully deidentified keystroke datasets that are the first of their kind in terms of
both event and metadata richness. We describe our collection technique and properties of the data along
with deidentification techniques that, while not fully relieving researchers of significant effort, at least
reduce and streamline manual work in hopes that researchers will release similar datasets in the future.

Files (664.4 kB)

Name	Size
581Edwards1To31.pdf (md5:b0051f4687ab37fb6d373a7281bcaf47)	664.4 kB

Additional details

Is cited by: Journal article: https://jedm.educationaldatamining.org/index.php/JEDM/article/view/581

	All versions	This version
Views	185	184
Downloads	96	95
Data volume	72.4 MB	71.8 MB
