An Open edX Extension for Parallel Programming Assignments with Automatic Configurable Grading

randres.velasquez@udea.edu.co Abstract —Computing devices of all types have almost converged to using central processing units featuring multiple processing cores. In order to develop efficient software for such devices, programmers need to learn how to write parallel programs. We present an infrastructure to support parallel programming assignments for online courses. We developed an extension to the Open edX platform with a backend that handles the execution of student codes on a cluster lab. The web user interface offers instructors a wide range of configuration options for the programming assignments as well as a flexible definition of criteria for automatic grading. We have successfully integrated the software with Open edX and tested it with a real parallel programming cluster


Introduction
The need for software engineers around the globe is ever increasing, with most countries being unable to meet the demands of the digital transformation that is rapidly happening in all areas of the economy [1] [2]. While many educational institutions and governments are rushing to increase their capacity for training software developers, there's a need to include some form of parallel programming training in order to keep up with technology changes [3]. Today, most computers are equipped with multiple processing cores that are capable of running, not only multiple applications simultaneously, but also a single parallelized program faster. While parallel programming is not required or at least not worth the effort for many types of programs, a significant and critical amount of applications can greatly benefit from it. Unfortunately, learning parallel programming requires a significant effort from learners and instructors, and any tools that help tackle such a challenge are welcomed.
Massive open online courses (MOOCs) are playing an ever more important role in increasing the capacity of training software engineers. Platforms such as Coursera, edX and Udacity have millions of students, many of whom are following computer science courses. At the same time, universities and colleges are using learning management systems such as Moodle or Blackboard to make teaching resources more iJEP -Vol. 11, No. 4, 2021 readily available to students and to virtualize to some extend their courses. In any case, programming courses require a significant amount of practice and instructors need ways of evaluating the student's level of accomplishment.
Code autograders are software that help automate the evaluation of a program, typically by comparing its outputs against previously computed correct outputs. By using such a tool, the instructor can easily monitor, even with large groups, which students are being able to provide a correct solution for a programming problem formulation and thus, reaching the learning goals. In the case of parallel programming assignments, instructors are also interested in evaluating the speedups achieved by the student's parallel codes compared to the sequential versions, or in measuring how performance scales with an increased number of cores, among others, which brings further challenges to an autograder for parallel code.
In this work we set to find out the benefits and drawbacks of implementing a flexible plugin for teaching parallel programing on Open edX, one the most popular LMS platforms. Thus, we present here a software that integrates and extends Open edX to specifically manage parallel programming assignments with automatic grading. From the user viewpoint, it provides one module for the instructor to configure an assignment and one module for the student to submit an assignment solution. We have carefully designed the assignment configuration module to let the instructor specify different kinds of tests, use various performance metrics and compute the grade according to the importance he/she wants to give to every test. Our autograder has the ability to run performance tests that allow the instructor to go further than only correctness, yet with the advantage of an automatic tool that allows him to manage larger groups or use the time for other course activities.
We have implemented the software, integrated it with an Open edX instance and linked it to a computing cluster lab to run the student assignments. The cluster lab has been built after a study that allowed us to select efficient single-board computers (SBCs) to be used as compute nodes, rather than using standard computers.
In Section 2, we present other works with a similar aim and in Section 3 we tell the steps we followed to develop and test the platform. In Section 4 we give an overview of the platform and its details are in Section 5 and 6. The prototype computing cluster lab is presented in Section 7. Finally, Section 8 and 9 discuss the platform developed and draw some conclusions.

Related Works
Before the dominance of today's MOOC platforms, there were early attempts at building virtual platforms for teaching programing [4], some of them specifically for parallel programming [5]. Such works represented a bigger development effort as they had to build the whole platform from scratch. The availability of MOOCs in parallel programming is relatively new. By 2017, only four courses in the subject where available [6]. Professor Wen-Mei Hwu of the University of Illinois Urbana-Champaign offered the first online parallel programming course in 2012 on the Coursera platform [7]. The course focused on teaching the essential parallel programming concepts for programming multi-core CPUs and GPUs using OpenCL or CUDA. A CUDA course released in 2013 at the Udacity platform was very similar but focused specifically on teaching how to program with CUDA without much discussion of the fundamentals of parallelism [8]. The specialization "Clouds, Distributed Systems, and Networking" on the Coursera platform was a suite of courses focused on cloud computing [9]. Another course titled "Parallel programming" also at the Coursera platform teaches shared-memory parallelism in the Scala programming language [10]. A common characteristic of the first online courses that appeared was to be highly focused on the language rather than teaching the concepts of parallel algorithms.
Recently, new courses have been released, aimed at achieving a broader learning of parallel programming and high-performance computing (HPC). Sarkar et al. [11] described the process and challenges faced while they constructed a Coursera Specialization on Parallel, Concurrent, and Distributed Computing in Java. Besides the challenges associated with curriculum development, authors highlight technical challenges such as mini-project development, a custom parallel programming library for supporting the course, and automatic grading of student submissions. In [6], Sarkar et al. reported that, by far, the most demanding task for maintaining an online course was answering questions in the forum. Additionally, the authors reported that the performance tests are often the primary cause for a bad learner experience. Authors recommended to account for the volatility of results and keep speedup expectations low. The course uses the auto-grading platform provided by Coursera, where student submissions are executed within containers and limited to four cores.
Mullen et al. [12] tackled the need for training and education in HPC. They reported an experience in building a course with the Open edX platform but they found it challenging for supporting hands-on HPC experience.
Solutions for integrating MOOC platforms with hands-on training have been reported [13] [14]. Dakkak et al. [13] proposed an online GPU development platform that provides students with a user-friendly scalable GPU computing platform. In this work, authors presented the original, revised, and upcoming WebGPU designs that address the requirements and challenges of offering sophisticated computing resources to many students. They developed an autograder for GPU codes using Open edX. The work presented here has a similar intention, but it is not for GPU programming. Instead, we aim at multicore CPU parallel programming using models such as OpenMP and Pthreads.
Staubitz et al. [14] introduced CodeOcean, a web platform to provide practical programming exercises for MOOCs. CodeOcean provides online courses with automated feedback and assessment of programming tasks. A web application provides the development environment for learners as well as an administration backend for instructors. Docker is used for code execution and assessment.
Carbunescu et.al. presented in [15] a comprehensive analysis of the different issues that arise when trying to implement an autograder for parallel codes. They warned about the challenges posed by, for example, the fact that many parallel programs may produce non identical outputs even if executed on the same hardware and software environment. In the same paper they also present an autograder developed as an extension of Moodle and their experience of using it for a bachelor course. They claimed that autograders often require customizations to meet the particularities of every course, particularly for the case of parallel programming.
While numerous authors [16][17] [18] have built and studied parallel computing clusters based on SBCs, our paper presents for the first time (to the best of our knowledge), one such state of the art cluster connected to an online autograder for teaching parallel programming. Here we show which tools are required as well as their benefits and drawbacks.
This paper is a continuation of our preliminary results published in [19], where the platform was presented with less detail and the focus was on the study performed to determine the best SBC to build the cluster. In this work, we present the whole platform in detail, discussing the strengths and limitations of the most relevant components.

Methodology
To develop the infrastructure for an online learning platform, that allowed us to find out the benefits and drawbacks of implementing a flexible plugin for teaching parallel programing, we went through a three-stage process: 1. We have defined all use cases to provide web services to support configurable parallel programming assignments on the remote cluster lab. The use cases were defined in a few working sessions among students and professors who teach parallel programming. The project team devised then the platform architecture as presented in Section 4 using the web services paradigm to allow for a clear separation and concurrent development of the frontend (Web Interface) and the backend (Lab Controller). We selected Open edX as the learning platform based on the positive experiences we had as students in the past with edX. We set out to study the technical options for developing an extension and decided to use Xblocks [20] as well as the Javascript language, Node.js [21] and Bootstrap [22][22] to achieve the more complex functionality of the Assignment Configuration module and the Lab Controller. 2. We built a cluster lab with SBC nodes in order to incorporate that technology trend as part of the parallel programming learning experience. A specialized software infrastructure was set up in order to provide a queuing system for orderly running student assignments. This part of the platform is managed by the Lab Controller. 3. Through multiple iterations in which typical parallel programming assignments were used, the frontend and the backend were refined until we were ready to integrate them. During the integration, common bugs were corrected and, in the end, we were able to validate the platform through careful functional testing. Once the platform was ready, a few controlled tests were carried out with teachers and students.

Platform Overview
The overall goal of the system presented here is to offer an online platform for teaching parallel programming. We developed two modules for Open edX to allow for the configuration, submission and automatic grading of parallel programming assignments. To achieve this, the platform is composed of three main blocks as shown in  The Web Interface is the platform's frontend and is, in general, the one provided by the Open edX platform with the configurations that the course creator chooses. All the resources and the organization of a parallel programming course are not discussed in this paper. Instead, we will explain one specific but very useful resource (see Section 5 and 6) we developed for any such a course.
Our contribution lies in the design and implementation from scratch of the two modules necessary for having configurable parallel programming assignments with automatic grading within Open edX. These two modules are: the one for the instructor to configure the assignment (Assignment Configuration inFig 1) and the one for the student to submit an assignment (Student's Assignment inFig 1). In our testing setup, the Open edX instance as well as the extension modules we developed, run on a cloud server but they could naturally run on any server provided the capacity is appropriate for the amount of users.
The Lab Controller is the platform's backend, receiving web service requests from the Web Interface, for example to run an assignment sent by a student, and issues the proper commands to the job scheduler, the Portable Batch System (PBS) [23] in our implementation. Thus, the PBS master receives execution requests that are then issued to the job scheduling agents (PBS slave) on the corresponding nodes to execute the code. Parameters, input and output data are handled by the Lab Controller as well as execution results to compute the grade. The Lab Controller should ideally run on an independent machine on the same network as the nodes where the student codes are run. However, it could also share the same machine with the assignment jobs as long as the hardware capacity and user traffic permits.
The so-called Cluster Lab is the infrastructure where the parallel programming assignments run. Although we have built a cluster based on SBC nodes (see Section 7), the other platform components (Web Interface and Lab Controller) are not restricted to working with such hardware and they only require one or more machines with PBS software installed to act as job scheduler.

Web Interface
Open edX is the learning management system (LMS) software used by edX, one of the largest e-learning providers today with millions of students around the world. The Open edX software is open source and is available on GitHub. We have installed Ironwood codenamed version (2019) of Open edX on a Linux server and developed an extension that provides two modules: one for the configuration of assignments on the instructor's view and one for the submission of assignments on the student's view. The Open edX installation plus the two modules make up the frontend from the point of view of the parallel programming assignment application.

Assignment configuration
When an instructor wants to set up a parallel programming assignment, he/she must access Open edX through the Content Management System (CMS) and add an instance of the parallel programming assignment Xblock. It will then offer the instructor's view with a wizard to configure an assignment in four steps. Although so far, our system only supports C/C++ as programming languages for the assignments, it could be easily extended to other languages. We hope to do so in the coming versions.
Step 1: Templates and input/output: The first step allows the instructor to specify the source and header files as well as a makefile to configure any compilation or linking parameters, if needed. At least one source file must be marked as editable as only editable files can be modified by the student. Such files can be blank or have segments of code for the student to complete. The Student Assignment module will not allow to modify the makefile so that it is not manipulated to influence performance metrics.
In the first step, the instructor should also specify the inputs and outputs of the program as shown in the example of Table 1. Inputs and outputs can be variable arguments or files, and can be marked as execution parameters or not (see Step 2). Outputs can be also marked as performance type, which allows them to be used for grading (see Step 3). Finally, the first step asks the instructor to specify the following limits for any execution resulting from a student's submission of the assignment being configured: • Core count: Maximum number of cores that may be used.
• Memory: Maximum system memory that may be consumed. These limits need to be set, on the one hand, for a test execution requested by a student before the actual submission and are meant to prevent the Cluster Lab to easily saturate by buggy codes. On the other hand, the limits should also be configured for all the executions defined by the instructor and are applied only when the student submits his / her assignment for evaluation and grading. They are meant to block attempts at cheating performance metrics.
Step 2: Test vectors and execution parameters: Next, test vectors for the evaluation of student's assignments are configured. Each test vector is a set of values and files to match the program inputs defined in Step 1. An execution is a set of values for all the execution parameters in Step 1. The idea is to allow for example a scalability evaluation of the program, having a fixed test vector but multiple executions with varying number of threads. Each test vector can have one or more executions and at least one test vector and one execution must be set up for an assignment. Table 2 shows an example with two test vectors of three inputs. The first test vector has only one execution associated whereas test vector 2 has three executions. Notice that all test vectors must have the same amount of inputs and all executions must have the same number of parameters. Both must match the ones defined in Step 1. Notice the structure of Table 2 matches Table 1. Step 3: Grading: In this step a weight is given as a percentage to every execution and to every test vector. Then, the instructor should specify a function to compute the grade of the assignment, using the performance results of the student and the reference as variables, as shown in Table 3's example. Notice all percentages of test vectors must total 100%, as well as all executions of a test vector.
Every execution may have a cost function to compute a grade that will be weighted with the other grades of other executions and tests in order to produce the assignment grade. The cost function may be built based on the performance outputs defined on Step 1 and the standard performance metrics the platform always measures: wall time, memory used and cpu utilization. The cost function may compare the student's performance values obtained with those of the instructor's solution. The expression to represent the cost function must use Javascript syntax.
For every execution, the platform will verify its correctness by comparing its outputs to the ones produced by the reference (instructor's) solution. If they match, the cost function will determine the grade, if they do not, that execution will get a zero grade. Then, the assignment grade would be computed by our platform as: grade = (40*(4)+60*(10*5+20*0+70*3)/100)/100 = 3.16 This particular evaluation configuration might seem a little harsh, but it is obviously only an example. By adding more executions such as Execution 1 of Test vector 2, in which only by submitting a code that produces correct outputs the student can get a 5.0, the final assignment grade will likely be higher in a real scenario. The flexibility of the grading configuration allows the instructor to carefully design the assignment evaluation by tuning not only the weights of every execution and test vector but also the cost function of every execution.
Step 4: Reference solution: Finally, the instructor must submit his/her solution code of the assignment. This code is run once for every execution that has been configured and the performance metrics are stored to be evaluated against the student's as specified in the cost function defined in Step 3.
After the reference solution is submitted, all tests will be applied so the process will naturally take some time. Once they finish, the status of the assignment configuration will switch to finished and only then it can be enabled for the students to start working on it.

Student's assignment
The student interface is meant to let him/her submit an assignment solution and it is composed of the following parts: • Templates: To let the student download a set of source files, some of which are meant for him / her to complete in order to obtain a program that satisfies the problem defined in the assignment. • Student solution: Let the student download his/her latest submitted code version of the assignment solution. • Code editor: A large text field with syntax highlighting for the student to write the code and see the code of any of the assignment source files. • Compiling: To compile the student's code and see the compiler's output. When the student issues a compiling action, the code is sent to the Lab Controller where it is compiled in the same tools' environment as the cluster nodes where the code is meant to run. • Test: To let the student set input values for the program and execute the binary produced in the latest successful compilation. When the student issues a test, the inputs are sent to the Lab Controller and it executes the latest compiled binary of that student on the Cluster Lab. The program's outputs as well as the performance metrics are sent back to the frontend and shown to the student. These test executions have resource limitations as defined by the instructor and are handled in a low priority queue by PBS master, possibly sharing cluster nodes with other executions. • Submit: When the student is satisfied with the test results, he/she can choose to submit his/her assignment solution. Then, the Lab Controller will run it with all test vectors and execution parameters defined by the instructor and unknown to the student. It may naturally take a while and in the end the student will be able to see his/her grade. Submit executions have different resource limitations meant to prevent cheating performance metrics and are sent to a high priority queue where they enjoy an isolated execution in order to minimize performance metrics noise. The assignment solution can be updated as long as the deadline has not passed.

Lab Controller
The Lab Controller is a custom software we have developed as a collection of web services to attend assignment compilation and execution requests from the online learning platform. It is a backend software developed using Node.js [21], the PBSJS [24] module to interact with the PBS master and a MongoDB database to store assignment data (IDs, inputs, outputs, etc). The job scheduling software requires the Lab Controller to run on a machine located on the same network as the cluster nodes.

6.1
Assignment submission use case Fig 2 shows the basic flow of steps that are followed when a student develops an assignment through the application presented in Section 5.2. After writing the code, the student needs to compile it. If successful, he/she can proceed to run the generated binary (with the inputs of his/her choice). If the student is satisfied with the results, he/she can submit the assignment for evaluation. In this case, the Lab Controller will run the student's binary multiple times using the various test vectors and execution parameters provided by the instructor when he/she configured the assignment. The Lab Controller will also compare the student's outputs with the reference outputs from the instructor and report back a grade based on passed and non-passed tests, and on the cost functions defined.
When the Lab Controller needs to launch an assignment execution, it does so through PBS (see Section 6.2) in order to guarantee the code will run exclusively on the allocated cluster nodes so that performance metrics are not distorted by competing executions.
For every assignment that has been configured by the instructor, the Lab Controller keeps an organization of files as follows: Instructor: Single folder containing the reference solution Notice that the above lists refer only to the assignment information that is stored as files. Naturally, other assignment data such as user IDs, performance metrics, grades, etc, are stored in a MongoDB database.

Fig. 2.
Platform components interaction for a student assignment submission.

Job scheduler
The Portable Batch System (PBS) [23] is a software for managing job scheduling in HPC environments. It was created 20 years ago for NASA, and it has evolved into a sophisticated tool with ample documentation and support. Today, PBS has joined the OpenHPC [25] initiative for an open-source HPC software stack with the Linux Foundation. In this project, we are using PBS Professional Open Source Project version 19.0.0.
PBS has four main components: Server, Scheduler, Communication and MoM. In a cluster platform with a server acting as a PBS frontend and multiple execution nodes, like the one we are using in this work, the Server, Scheduler, and Communication components are typically installed on the PBS frontend (PBS master). In contrast, the MoM component or Job Executor is installed at each execution node (PBS slave). The Server is in charge of receiving job execution requests and, along with the Scheduler, sends the jobs to the corresponding execution nodes through the Communication component. Dedicated routing and execution queues are defined to serve as communication channels to dispatch jobs to the corresponding nodes. At each execution node, the MoM component is in charge of placing the job into execution.
We have used PBS to achieve the following: • Queues to manage different priorities (e.g. instructors' executions have higher priority) • Isolated execution of assignment source codes in order to evaluate performance.
• Limit the resources that assignment executions can use such as walltime, memory and number of cpus. • Monitor the resources consumed by assignment executions.

7
Cluster Lab In the current prototype, our online learning platform is connected through the Lab Controller to a cluster of SBCs. We have chosen such devices, instead of traditional desktop-or server-like clusters, given their advantages in terms of low cost, low power consumption and recent utilization for edge computing. In fact, we acquired seven different cards (Vim3Pro, Orange Pi4B, Odroid N2, Hikey 970, Jetson Nano, Raspberry Pi4B, and Latte Panda) and evaluated them in terms of processing latency (by running standard benchmarks), price, and software compatibility [19]. Such devices are based on ARM processors typically found on embedded electronic devices and cellphones. In the study we also compared the price and computing power of SBCs against high-performance workstation processors and showed that SBC cores were on average six times cheaper while only two times slower than Intel Xeon cores. In the end, we selected the Vim3Pro and the Orange Pi4B as the best overall SBCs of the ones we studied and built a cluster with 20 and 32 such cards respectively.
On every SBC node of the Cluster Lab runs the PBS slave. We installed only the MoM component on it, as a daemon that listens to commands from the PBS master to run the assignment jobs isolatedly. The MoM reports back to the master when a job is completed so that the Lab Controller can collect the results from the network storage and send them back to the platform frontend. Finally, the MoM is able to kill any execution job that has exceeded the resources granted to the process.
Although we have taken the time to setup this exotic computing cluster, all the software developed will run smoothly on a standard cluster setup based on commodity computers.

Discussion
This paper presents the first version of our platform and as such, it is undeniably not mature yet. However, the first set of tests done by professors that teach parallel programming and a few students, allows us to analyze some of the advantages they found.

LMS base platform
Most of the major LMS platforms such as Udemy, Coursera, Blackboard or Udacity, among others, provide some form of interface for developing extensions. However, they are paid platforms which limit their adoption for educational institutions in developing countries, like ours. On the other hand, there are open source LMSs such as Open edX, Google Open Online Education, Open HPI, Open OLAT, Forma.lms, and notably Moodle, with wide adoption across the world. At our institution, the platform of choice is Moodle but we chose Open edX because of its simpler user interface and wide adoption by large institutions and reputed enterprises. With this project we have got to know Open edX much better from the technical point of view and are now able to make more informed decisions when picking an LMS for a specific need.

Assignment templates
One feature that has been greatly appreciated is the possibility for the instructor to submit templates for the student assignments. Instructors have the option of submitting incomplete source files for the students to edit and submit entire source files that can be made non-editable by students while still part of the solution. By doing so, students can focus on developing the key parts of the parallel algorithms that the instructor wants to evaluate, without spending too much time on program accessories that are not within the course scope. It is often the case that assignments are either too simple when trying to consider reasonable developing efforts or too long when trying to evaluate problems with appropriate complexity.
This templates' feature also allows the instructor to insert instrumentation code in order to obtain any specific performance metrics for a more in-depth analysis of the student's submissions.

Assignment evaluation
Beyond the basic metrics of correctness and execution time, parallel codes require more metrics: memory increase, strong scaling, weak scaling, kernel times, among others. An important part of the learning process is to be able to observe the impact of different algorithmic and language decisions on the various performance dimensions. Our platform not only computes a grade but also allows the instructor and the student to get an insight into the challenging task of performance tuning. The feedback obtained through these more complete metrics allow the student to better comprehend what he/she is doing and learning.
Naturally, a more elaborated evaluation requires the instructor to spend more time in configuring the assignment. Furthermore, the more metrics he/she wants to evaluate, the more executions he/she needs to configure and the longer it will take for student submissions to execute on the cluster. Nonetheless, it should be noticed that our platform saves the instructor the time he/she would need to spend in creating scripts to perform the tests or even worse, to do them manually.

Management of executions
Job schedulers are mature tools from the HPC community that efficiently manage shared computing clusters. They use priority settings in order to assign resources according the predefined policies. In our case, PBS allows the platform to give priority to student submissions for which performance metrics are critical over student self-tests that are mostly used for verifying correctness. It can also handle specific resource boundaries for each execution to prevent cheating and enable exclusive executions to reduce noise in time measurements. Finally, the job scheduler acts as an abstraction layer between the platform backend (Lab Controller) and the cluster hardware, so no changes in the backend are needed when, for instance, the number of nodes in the cluster changes.

Conclusion
We have developed an Open edx extension for parallel programming assignments. In this paper, we presented its features and the way it is used. Moreover, we argued how its flexibility has a positive impact on the parallel programming learning process.
We found that Open edx has good documentation and community support, and provides an easy-to-use interface for students. However, the installation process was rather annoying, requiring more hacks than we expected. When adding an Xblock one needs to recompile the whole Open edx which is also annoying.
The Xblocks API turned out to be a good way of developing the student view but too limited for the assignment configuration module that required more support from the frontend to make the interface friendly. We eventually developed it with Javascript tools and embedded it easily into Open edx. When compared to Moodle, for example, Open edx has far less configuration options and resources, but perhaps because of this, it presents the student a very easy and linear interface to navigate.
Although we have performed initial tests of the platform with a limited set of instructors and students, a larger scale pilot is required to obtain a more realistic assessment.