Input Determination for Models Used in Predicting Student Performance

To apply learning analytics in conventional classrooms, as much as possible of student behavior and environment must be captured and coded. Patterns among the successful inputs of existing online learning and learning management systems can be identified and used to find classroom data that exist but are not yet captured. The goal of this review is to propose ways of expanding the use of learning analytics in traditional classrooms. Predictors from learning models used in online learning can be applied to the traditional classroom, and analogues may be found for predictors that are unavailable there. Approaches used in developing these predictors can also be used to develop predictors for conventional classrooms. Existing data can be used, or data that are convenient to collect may be captured, with emphasis on approaches that work on smaller groups, where training of individual models should be attempted. The data collected should be simple to obtain, support the users in other ways, or have provable benefits.


Introduction
Student performance prediction has become a viable means of improving academic performance and course content in online learning. Predictive models such as artificial neural networks, decision trees and linear regression are used to transform inputs (e.g. past performance, social background, learning system usage patterns, test results) into outputs (course completion, expected grade, difficulties encountered, personalized learning suggestions). Often, existing quantitative data (e.g. from Massive Open Online Courses, MOOCs) drive model design, especially when such models are applied to the conventional classroom (using a Learning Management System, LMS), and the person delivering the course is a passive participant in designing models and delivering data. This produces a "streetlight effect", "looking for solutions where it is easy to search, not where the real solution might be" (Ochoa et al., 2016), as only data from online learning is used, and even there only data that can be acquired easily. One current approach to expanding the available data is therefore multimodal learning analytics, where automated systems capture multimodal sensory data from collaborative, real-world, non-computer-mediated environments (Ochoa et al., 2016).
Another approach may be to persuade educators and learners to directly input previously unused data into learning analytics systems, either by producing a more effective learning analytics system (one that improves learning) or by providing direct benefits from the data capture itself. For example, a checklist that records the time when feedback is provided or work is returned to students assures pedagogical personnel that they will never forget to return and comment on a student's work. In seeking to capture and code as much as possible of student behavior and environment so that learning analytics can be applied to a mostly conventional classroom, the most successful inputs (predictors) among existing models can be identified and categorized, and their common characteristics determined. Together with a study of formative and summative assessment methods (e.g. types of feedback and how it can be captured) and of factors affecting student performance in the classroom (e.g. environmental factors), this allows identifying existing classroom data that are not captured by current learning management systems, thus allowing expanded use of learning analytics and student performance prediction in traditional classrooms, with a focus on personalized suggestions.
The goal of the paper is to identify patterns among the inputs used in existing models of student learning (based on online learning and learning management system data mining) that can then also be applied to the traditional classroom within its existing workflow, by looking at the features used by existing and proposed learning analytics approaches.
Research question: how can characteristics common to effective predictors of student performance be used to identify predictors among data produced in the traditional classroom?

Methodology
A literature review is performed where inputs captured and features discovered in existing learning analytics systems are characterized, along with the methods used to identify features and the modelling approaches employed. The learning setting and the stakeholders who are expected to benefit from the studied approach are recorded. Studies that describe their methodology in an exhaustive manner are highlighted.
Searches were performed using Google Scholar, Ex Libris Primo, and the search features of academic databases such as ACM Digital Library, EBSCO Academic Search Ultimate, JSTOR, ScienceDirect and Wiley Online Library.
Search terms used were either broad ("learning analytics dataset", "classroom data", "multimodal learning analytics") or narrower, based on preconceived guesses ("classroom comfort performance", "student age learning analytics"). A significant limitation of the study is that the narrow terms produced some articles that were not found otherwise, which suggests that the results of the broad searches were not exhaustive. Conceptual articles are also excluded from the study, even though some did identify promising approaches. The focus of this study is on articles about specific methodologies used in learning analytics or educational data mining; review articles are therefore excluded as well.
An attempt is made to identify measures in online learning that may have analogues in the traditional classroom (e.g., seating patterns and communication in chatrooms) or for which proxies may be found (e.g., screen size and lighting quality, where the proxy is the classroom number).
The corresponding outputs are recorded where possible, with a focus on those that allow providing feedback for individual students or for course/curriculum deliverers/designers (to allow improving the success of future students in this course).

Results
A total of 250 articles responding to the search queries were selected based on their titles. Of these, 53 could be broadly classified as review articles, and 114 were either conceptual or dealt with other aspects of learning analytics, such as data presentation or providing feedback. Of the 85 articles that supported the effort to identify features that could be used in developing an LMS, 68 deal with analytics in MOOCs or datasets of existing LMSs; at least 15 of these provide significant methodological detail on the inputs used and are described in Table 1. Attempts to broaden the data available to learning analytics practitioners were found in 11 articles, as described in Table 2.
[Table 2, surviving row: Combine an interactive learning environment with physical monitoring | Monitor interactions with a tangible user interface using a Kinect sensor | Multimodal analytics is very promising even when using unsupervised learning algorithms]
Successful learning analytics approaches use fine-grained longitudinal data, where it is often impossible to predict which specific measures will be the best predictors. Therefore, at least during the prototyping stages, more inputs than would be used in production should be requested, while in production it should be easy for users, educators and learners alike, to volunteer more data than is requested by default. A dashboard could be provided for monitoring how effective features are at predicting student performance, with the ability to suggest features in addition to the built-in ones, as Veeramachaneni et al. (2014) have shown the success of user-invented inputs to student models.
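The idea of a dashboard that monitors how well each feature predicts performance can be illustrated with a minimal sketch. It assumes a plain Pearson correlation as the effectiveness measure, which is a simple stand-in and not the method of any cited study; the feature names and all values are hypothetical, including the user-volunteered feature.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_features(features, outcome):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, pearson(values, outcome)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data for five students (illustrative only, not from any study).
features = {
    "absences": [0, 2, 5, 1, 7],
    "homework_on_time": [10, 8, 4, 9, 2],
    "seat_row": [1, 3, 2, 3, 1],  # assumed example of a user-volunteered feature
}
final_grade = [9, 7, 5, 8, 3]

ranking = rank_features(features, final_grade)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A correlation on a handful of students says little about real predictive power; in practice the ranking would be computed on held-out data and tracked over time, so that a newly suggested feature can be compared with the built-in ones.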
If data are fine-grained enough, it becomes possible to train models for individual students, which Yang et al. (2017) have shown to be more effective than general models. To improve such models, or to adapt them to a specific cohort, it may also be possible to hold sessions in which additional sensors collect physical data for model training, which Ezen-Can et al. (2015) have shown to improve model performance. If resources permit, multimodal data have been shown to be promising (Schneider and Blikstein, 2015), and consideration should be given even to simply recording speech (Worsley et al., 2011), provided privacy issues are accounted for. In addition, model performance at the beginning of a course may be improved using directed surveys, which should also be given to dropouts (Hone and Said, 2016).
If possible, textual, rather than quantitative, data should be collected, as it has been shown to provide information on student state, often early, though this does seem to require a coded training set. With low effort, such data can be used to predict dropouts (Ramesh et al., 2014); if the model developer puts in more effort, they can be used to monitor student learning gains (Wang et al., 2015). In addition, this would permit applying LA techniques to classes where some of the most popular LA outputs are unavailable. For example, in classes where dropping out is not possible, motivation still drops, and this may become visible in students' textual output.
When constructing features, theoretical studies of learner behaviour from other pedagogic research fields can be used to inform model development (e.g. Jo et al., 2015) or to develop tools to monitor student learning strategies (e.g. Tabuenca et al., 2015).
Process mining that identifies activity patterns has been shown to be a powerful predictor (Maldonado-Mahauad et al., 2018); therefore, as much information on event timing as possible should be collected. In addition to monitoring students, instructor activity and instructors' interaction with learning analytics and with students should be monitored, as multiple studies have shown these to have a significant impact on learner performance (e.g. Gašević et al., 2015; Ma et al., 2015; Leeuwen et al., 2015). Early interventions are possible by combining demographic data that are already known about a learner at the outset with models trained on large datasets (Fernandes et al., 2019), though this may require access to datasets relevant to the specific region or educational system, which presents privacy issues and requires governmental stakeholder involvement.
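As an illustration of why event timing is worth collecting, the following minimal sketch derives two simple features from a timestamped event log: counts of transitions between consecutive activities, and the mean gap between events. It is a much-simplified stand-in for process mining, not the technique of the cited studies; the log layout, the activity labels and the function name `timing_features` are assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (student_id, ISO timestamp, activity label).
events = [
    ("s1", "2024-03-04T09:00:00", "read"),
    ("s1", "2024-03-04T09:20:00", "quiz"),
    ("s1", "2024-03-04T09:50:00", "read"),
    ("s1", "2024-03-04T10:00:00", "quiz"),
    ("s2", "2024-03-04T09:05:00", "quiz"),
    ("s2", "2024-03-04T11:05:00", "read"),
]

def timing_features(log):
    """Per student: activity transition counts and mean gap between events in minutes."""
    by_student = defaultdict(list)
    for student, stamp, activity in log:
        by_student[student].append((datetime.fromisoformat(stamp), activity))

    result = {}
    for student, rows in by_student.items():
        rows.sort(key=lambda row: row[0])   # order events by time
        pairs = list(zip(rows, rows[1:]))   # consecutive event pairs
        transitions = Counter((a[1], b[1]) for a, b in pairs)
        gaps = [(b[0] - a[0]).total_seconds() / 60 for a, b in pairs]
        result[student] = {
            "transitions": dict(transitions),
            "mean_gap_minutes": sum(gaps) / len(gaps) if gaps else None,
        }
    return result

features_by_student = timing_features(events)
for student, feats in features_by_student.items():
    print(student, feats)
```

Neither feature can be computed without timestamps, which is the practical argument for logging when each event happened rather than only that it happened.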
If possible, adaptations can be developed for specific types of courses. For example, programming courses benefit from instrumented integrated development environments (Blikstein et al., 2014), and it may be possible to develop generalizable data collection techniques for any computer-based activity using an instrumented virtual machine (Pardo and Kloos, 2011), though both approaches may require infeasible amounts of data processing even for small cohorts.

Discussion
Recently, there has been more focus on increasing the visibility into models of learning and on involving learning personnel in designing, modifying and running those models. Providing inputs and recognizing the features they represent determines the success of such models. Therefore, recognizing existing successes and applying them to formative assessment methods may be a means of recognizing additional inputs to, and features used in, models, while involving educators. Applying learning models to the traditional classroom as an integrated part of learning management (school record keeping/grading) systems may allow their use to be expanded while simultaneously increasing the predictive power and effectiveness of (personalized) suggestions, both by using existing data and by providing tools for educators to transform the feedback they already provide into data that can be used as inputs for models.
It is evident that for a learning analytics platform to be successful, it needs either to make data collection close to effortless or to provide benefits to those providing the data that are independent of any learning analytics output. The ways a learning analytics platform can be valuable to teachers include, for example, giving them access to tools that reduce their effort or improve their confidence. The course structures enforced by MOOCs have been shown to be successful (Hone and Said, 2016). If there is a desire to provide blended or online learning opportunities, providing such structures in the LMS can therefore attract both educators and students as users: Zacharis (2015) has shown optional content to be capable of improving motivation, and such structures would improve educators' confidence that their courses are structured according to best practices.
Another approach to providing value with relatively low effort is instrumenting already existing resources, as Prasad et al. (2016) have shown by developing an EPUB (an electronic book format) viewer that monitors interactions similarly to how a MOOC does. Many courses in schools use books that are available as PDFs; instrumenting such books may therefore provide clickstream-like information on reading habits without the need to redevelop course resources. In addition, for subjects that include field studies, techniques from mobile learning may be adapted; these have been shown to succeed when the developed applications actually support students in the learning goals required by the existing course content while providing LA data (Fulantelli et al., 2014).
As activity patterns are a powerful predictor of learner performance, an LMS should integrate as many of the "bookkeeping" functions of a school as possible (or be integrated with existing systems). This would permit collecting data without additional effort from the teacher (e.g., absences, lesson times, students that study together) or collecting additional data while providing value to the teacher. For example, when a teacher provides feedback or hands back work, he or she records this interaction in the LMS using a checklist; this provides event timing data while giving the teacher certainty that he or she has spoken with the specific student about the specific work and provided the required feedback. Rienties and Toetenel (2016) have shown that it is possible to identify successful course design patterns, again providing value to those educators who maintain their (often legally required) course (design) documentation in an LMS. An LMS may be capable of identifying learning patterns that are effective (Maldonado-Mahauad et al., 2018), and these patterns can then be suggested to other students, which applies even to flipped classrooms (Jovanović et al., 2017), again providing value.
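The feedback checklist mentioned above can be sketched as a small data structure that serves both purposes at once: it tells the teacher which work still awaits feedback, and it yields turnaround times as event timing data. This is a minimal sketch under assumed names (`WorkItem`, `pending_feedback`, `turnaround_days`) with hypothetical records, not the design of any existing LMS.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class WorkItem:
    """One piece of student work tracked by the (hypothetical) checklist."""
    student: str
    title: str
    handed_in: date
    feedback_given: Optional[date] = None  # set when the teacher ticks the item

    def turnaround_days(self):
        """Days between hand-in and feedback, or None if feedback is pending."""
        if self.feedback_given is None:
            return None
        return (self.feedback_given - self.handed_in).days

def pending_feedback(items):
    """Items the teacher has not yet returned: the direct benefit to the teacher."""
    return [item for item in items if item.feedback_given is None]

# Hypothetical records (illustrative only).
items = [
    WorkItem("Student A", "Essay 1", date(2024, 3, 4), date(2024, 3, 8)),
    WorkItem("Student B", "Essay 1", date(2024, 3, 4)),
]

for item in pending_feedback(items):
    print(f"Feedback still owed to {item.student} for {item.title}")

# The same records double as timing data for a learning analytics model.
turnarounds = [item.turnaround_days() for item in items]
```

The teacher only ever ticks a box; the date is recorded as a side effect, so the timing data cost no additional effort.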
Collaborative learning is gaining more and more prominence, hence tools that support it may also gain acceptance more easily. Donia et al. (2018) have shown that it is possible to improve teamwork with relatively low effort by using peer feedback tools that already exist. Hernández-García et al. (2018) have shown that closer monitoring of team performance is also possible, with caveats such as the need to verify that LMS/wiki users correspond to the actual students performing the work. Another approach may be to move the existing student chats used for collaborative activities to a monitored environment, as Ferguson and Shum (2011) have shown that real-time chat can be analysed, though students may be unwilling because of privacy concerns or the need to discuss matters among themselves.
It has been shown that providing feedback, at least in the form of a learning analytics dashboard available to students, can be detrimental to their performance (Lonn et al., 2015); therefore, availability of LA output should be carefully evaluated, and there should be monitoring, if possible, of when a teacher provides feedback so that effective types of feedback can be found.

Conclusion
Predictors used in learning models in online learning can be applied to the traditional classroom. Analogues may be found for predictors that are not available in the conventional classroom. Common characteristics and categorisation of predictors may be used to identify predictors among existing data, including data provided to students (e.g. formative feedback) that are not captured by existing learning management systems. As a conventional classroom may have small cohorts, approaches that work on smaller groups should be preferred and personalized training of models should be attempted. An LMS applied to a conventional classroom would ideally require minimal additional effort; therefore, the data collected should be simple to obtain, support the users in other ways, or demonstrably yield beneficial analytics.