Concepts for E-Assessments in STEM on the Example of Engineering Mechanics: How to Assess Complex Engineering Problems Electronically

We discuss whether and how it is possible to develop meaningful e-assessments in Engineering Mechanics. The focus is on complex example problems resembling traditional paper-pencil exams. Moreover, the switch to e-assessments should be as transparent as possible for the students, i.e., it should not lead to additional difficulties, while still maintaining sufficiently high discrimination indices for all questions. Example problems have been designed such that a great variety of inputs can be accounted for, ranging from graphical to numerical and algebraic as well as string input types. Thanks to the implementation of random variables, it is even possible to create an individual set of initial values for every participant. Additionally, when dealing with complex example problems, errors carried forward have to be taken into account. Different approaches to do so are detailed and discussed, e.g., predefined paths for sub-questions, usage of students' previous inputs, or decision trees. The main finding is that complex example problems in Engineering Mechanics can very well be used in e-assessments if these questions are well structured into meaningful sub-questions and errors carried forward are accounted for.

Keywords: Engineering Mechanics, e-assessment, STEM, higher education, complex problems

However, the practical part of the exam, where the students have to solve two complex example problems, still uses a classical paper-pencil approach. At the same time, students as well as teachers typically give excellent feedback on the e-assessments already implemented. Thus, it is planned to take automated testing one step further by changing the exam to a full e-assessment that completely replaces the paper-pencil part. Moreover, such an approach might further foster the objectivity of exams, since automated testing eliminates the subjectivity that can occur when paper-pencil exams are graded manually, especially when teachers know their students personally. Additionally, a substantial time saving is expected once enough questions have been designed: apart from rephrasing problematic questions or correcting occasional typos, full e-assessments no longer have to be corrected manually. At the moment, manual correction takes about 330 hours per year in the authors' case, since approx. 1000 individual exams, each consisting of two paper-pencil example problems, have to be reviewed at a correction time of about 10 minutes per example problem. In order to check whether such a switch to full e-assessments is possible in Engineering Mechanics with a reasonable amount of effort, research has recently been conducted within a bachelor's thesis [4]. Its main goal was to show possibilities for testing students' abilities to solve complex example problems in an automated way. Furthermore, the aim is to fully retain the level of complexity of current paper-pencil exams. Thus, errors carried forward must be evaluated in the e-assessments in close analogy to manually corrected paper-pencil exams.
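The annual correction effort quoted above follows from a simple estimate based only on the figures given in the text:

```latex
t_\text{corr} \approx 1000\ \text{exams} \times 2\ \frac{\text{problems}}{\text{exam}} \times 10\ \frac{\text{min}}{\text{problem}} = 20\,000\ \text{min} \approx 333\ \text{h}
```

This is consistent with the stated figure of about 330 hours per year.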
This, in turn, makes it necessary to consider various approaches as well as the question types offered by Moodle and third-party plugins in order to lay out a roadmap towards full e-assessments in Engineering Mechanics.
The main questions to answer in this publication are: Is it possible to design complex example problems in such a way that the analytical skills necessary to solve them can be tested with sufficient validity within an e-assessment? Which limitations must be considered when switching from a classical paper-pencil approach to automated testing?

Current Research in Automated Testing in STEM
The use of digital media to support learning processes is increasing. Still, implementing e-assessments, especially in STEM disciplines, poses many difficulties. Creating an environment that is as transparent as possible, in the sense that it does not impose additional barriers on the students' approaches to solving a specific example problem, becomes harder the more complex the example problems are.
Within a recently completed master's thesis, a PHP-based quiz application was created to support physics lectures at Graz University of Technology [5,6]. This solution allows for maximum freedom and flexibility when creating questions. Moreover, the quizzes can be integrated directly into Moodle via LTI (Learning Tools Interoperability).
Gamage et al. [7] discuss the effectiveness of multimodal quizzes in teaching and assessing Hydraulics and Hydrology, a theoretical engineering course for third-year undergraduate students. According to that publication, such quizzes can efficiently replace conventional assessments and benefit time-poor academics. The quiz questions used are comparable to the ones discussed in a conference paper by the authors [3]. However, more complex problems are discussed in the present paper. Furthermore, [8][9][10] highlight the benefits of online quizzes, such as improving student motivation, enhancing understanding and active learning, and deterring cheating if questions are not too easy. For formative assessments, immediate, high-quality, and detailed feedback is crucial, as it enhances and reinforces student learning [11,12]. An elaborate discussion of students' satisfaction with different formative assessments is provided in [13]. A. Rasila et al. look at the interplay of automated assessments and conceptual understanding in mathematics. They argue that the pedagogical background of teaching and presenting mathematical knowledge is heavily based on concepts that emerged from using traditional books. Thus, information technology is anticipated to be a game-changer in learning and teaching mathematics [14]. These findings are based on experiences with an automated assessment system called STACK [15][16][17][18][19][20], which is discussed later in this work. Furthermore, A. Rasila developed a collaborative e-assessment material bank for STACK called Abacus [21] and reports experiences with automatic assessments in mathematics [22][23][24][25], while [26] discusses the use of STACK in circuit theory.
An approach combining e-assessments and learning analytics can be found in [27], where the aim is to develop a simulator that is able to provide automatic but nevertheless personalized feedback to students based on their level of activity. Moreover, activities provided to the students are personalized based on their level of ability. The publication aims to develop a framework for this so-called "assessment analytics".
However, most of the cited research discusses e-assessments in formative scenarios, either in the form of homework assignments or as a general means of learning. Taking the existing literature one step further, the paper at hand details work on (automated) e-assessments in formative as well as (final) summative scenarios.

Types of assessments
The limited objectivity of oral exams and the time-consuming evaluation of paper-pencil exams are discussed in [28]. These disadvantages can be eliminated by using Learning Management Systems (LMS). When creating such e-assessments, it is necessary not to simply transfer the contents to digital media, but rather to consider methodical and organizational aspects in order to maintain validity, objectivity, and reliability [29]. Assessments are differentiated as shown in Table 1. Summative assessments are used to determine and prove a level of proficiency. Formative assessments, on the other hand, are intended to foster learning while being part of the learning process itself. Diagnostic assessments can further be divided into voluntary self-assessment tests (SAT) and qualifying examinations.

Types of questions
Test questions for e-assessments might either be convergent or divergent. Convergent questions are characterized by a clearly defined set of solutions and can, therefore, be realized, e.g., with multiple-choice questions. Whereas the evaluation of such tasks is easy, the biggest challenge for the teacher is to formulate the question and generate reasonable distractors. Convergent questions are best suited for factual knowledge, even though they might also be used in combination with graphical representations in order to assess comprehension [29].
Divergent questions, on the other hand, are best for testing background knowledge, general approaches as well as explanations, as they require the student to work constructively in order to be able to solve the task [29].
Especially for the solution of example problems in the field of Engineering Mechanics, it is necessary to combine theoretical knowledge and computational skills. Thus, divergent questions requiring the student to provide inputs into blank fields are the preferred choice for the authors. For accompanying convergent questions, enough plausible distractors are provided to ensure meaningful results.

Learning analytics
Benchmarking has been identified as one of the goals of learning analytics [30]. It might help to spot weaknesses in a learning environment or in certain teaching activities themselves. A detailed literature review of learning analytics in higher education can be found in [31]. The possibilities arising in conjunction with e-assessments are numerous but beyond the scope of this work. For more details on the application of learning analytics and its implications, the reader is referred to [32,33].

Prerequisites
The aim of the work at hand is to create a full e-assessment that is as transparent as possible for the students while offering the teachers the possibility to assess a variety of input types. Transparency, in this context, means that the technology used to create and conduct the e-assessment should interfere as little as possible with a student's own approach to solving a given example problem. Furthermore, example problems should be constructed such that grading can be done in close analogy to a classical paper-pencil exam. One of the main aspects considered is the need to account for errors carried forward in automated grading. The first idea is to apply one-way navigation, where students can only navigate forward in the examination environment. After a final input has been made for a specific sub-question, the correct answers for that sub-question are displayed to the student to be used for further calculations. This means, however, that students cannot change inputs that have already been registered, so they cannot revise their previous answers. Moreover, disclosing whether a sub-question has been answered correctly has further drawbacks, e.g., it implicitly hints at whether the approach chosen by the student is valid in general. More importantly, though, it may discourage the student, thus creating an undesirable psychological bias on the final outcome of the exam.
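The one-way navigation idea can be illustrated with a minimal Python sketch. It is not how Moodle implements sequential navigation; the class and field names (`SubQuestion`, `OneWayExam`, `submit`) and the 1 % tolerance are hypothetical. Each submission is graded, locked, and followed by disclosure of the correct value, and there is deliberately no way to go back.

```python
from dataclasses import dataclass, field


@dataclass
class SubQuestion:
    name: str
    correct: float
    points: float
    rel_tol: float = 0.01  # hypothetical relative tolerance of 1 %


@dataclass
class OneWayExam:
    questions: list
    position: int = 0            # index of the sub-question currently shown
    score: float = 0.0
    revealed: dict = field(default_factory=dict)

    def submit(self, value: float) -> float:
        """Grade and lock the current sub-question, then reveal its solution."""
        if self.position >= len(self.questions):
            raise IndexError("all sub-questions have already been answered")
        q = self.questions[self.position]
        if abs(value - q.correct) <= q.rel_tol * abs(q.correct):
            self.score += q.points
        # The correct value is displayed so it can be used in later steps.
        self.revealed[q.name] = q.correct
        # Forward only: there is deliberately no method to go back.
        self.position += 1
        return q.correct


# Usage: two hypothetical sub-questions (support reactions in N)
exam = OneWayExam([SubQuestion("A", 500.0, 2), SubQuestion("B", 1500.0, 2)])
exam.submit(480.0)   # outside the tolerance: no points, 500.0 is revealed
exam.submit(1500.0)  # correct: 2 points
```

The sketch also makes the drawback discussed above visible: the moment `submit` returns, the student learns the correct value and therefore whether the previous answer was right.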
A more advanced method to account for errors carried forward is to calculate succeeding sub-questions based on the students' previous inputs. Such an option is available using the STACK question type [15][16][17][18]. This is not only more elegant, but also more transparent in the sense that it does not require interfering with the progress of the exam, since errors carried forward are taken into account quietly without passing the information on to the student. In combination with decision trees, which we plan to elaborate on in a future publication, this second approach imposes almost no limits on grading.
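A minimal Python sketch of grading with errors carried forward, assuming a hypothetical two-part beam problem (the function name and the 1 % tolerance are illustrative, not STACK syntax). Part (b) is checked against the value that follows from the student's own answer to part (a) instead of the master solution.

```python
import math


def grade_with_ecf(answer_a: float, answer_b: float,
                   F: float, a: float, l: float, x: float,
                   rel_tol: float = 0.01) -> int:
    """Grade a hypothetical two-part beam problem with errors carried forward.

    Simply supported beam of length l, point load F at distance a from the
    left support.
      (a) left support reaction:              A = F * (l - a) / l
      (b) bending moment at x (0 <= x <= a):  M = A * x
    """
    points = 0
    A_correct = F * (l - a) / l
    if math.isclose(answer_a, A_correct, rel_tol=rel_tol):
        points += 1
    # Part (b) is compared with the value implied by the student's own (a),
    # so a mistake in (a) is penalised only once.
    if math.isclose(answer_b, answer_a * x, rel_tol=rel_tol):
        points += 1
    return points
```

With F = 1000 N, a = 2 m, l = 5 m and x = 1 m, the correct reaction is 600 N. A student who enters 500 N for (a) and then consistently 500 N m for (b) still earns the point for (b).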
Additionally, it is important that not only numerical but also algebraic input is accepted, as this is very common in Engineering Mechanics. This, of course, requires the underlying algorithm to treat algebraically equivalent inputs as equal.
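STACK decides algebraic equivalence symbolically with the CAS Maxima. To illustrate the requirement without a CAS, the following Python sketch swaps in a different technique, randomized identity testing. Both expressions are evaluated at random points, and agreement at many points makes equivalence very likely but does not prove it. The function name and sampling range are assumptions.

```python
import math
import random
from typing import Callable


def probably_equivalent(f: Callable[..., float], g: Callable[..., float],
                        n_vars: int, trials: int = 20, seed: int = 0) -> bool:
    """Randomized identity test: compare f and g at random sample points."""
    rng = random.Random(seed)
    for _ in range(trials):
        # positive sample values avoid division by zero and roots of negatives
        point = [rng.uniform(0.5, 2.0) for _ in range(n_vars)]
        if not math.isclose(f(*point), g(*point), rel_tol=1e-9, abs_tol=1e-12):
            return False
    return True


def master(F, l):
    """Hypothetical master solution F*l/4."""
    return F * l / 4


print(probably_equivalent(master, lambda F, l: 0.25 * l * F, 2))  # True
print(probably_equivalent(master, lambda F, l: F * l / 2, 2))     # False
```

A symbolic check is preferable in an exam because it never accepts a wrong expression by chance. The sampling approach is shown only because it is short and dependency-free.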
A further requirement for transparency is a clear layout and easy-to-understand input syntax that allows effortless navigation and a good overview in order to guarantee that the students can completely focus on the example problems and are not confused by the examination environment in any way.

Variable numeric question "Dimension a shaft" using one-way navigation
This example problem requires the student to dimension a shaft. The instructions are followed by an explanation of how to use the exam environment.
In order to ensure a clear layout, the question is divided into several logical subsections with corresponding headlines displayed in the navigation bar on the left-hand side as shown in Fig. 1. All sub-questions belonging to one subsection are displayed on one page during the exam. The use of different pages allows a clear subdivision of the complex example problem. Moreover, this approach makes it possible to provide certain correct answers to sub-questions wherever necessary. This is done in order to enable the student to always use correct intermediate results, as explained earlier. Of course, the one-way navigation is vital in combination with these subdivisions.
As the student moves forward in the quiz following the predefined direction, already answered questions appear greyed out in the navigation section and, for obvious reasons, can no longer be changed. Once the free-body diagram is set up, the values of the forces are to be provided. Such numerical values are best tested using the variable numeric question type [34]. It allows random numbers to be used as input data for the given problem, thus individualizing the question for each student. This requires the question designer to provide a mathematical function of the input data, in a system-specific syntax, that reproduces the correct result. The individual numerical values for each student are taken from a range within well-defined limits in order to ensure physically meaningful results for each combination of variables.
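The principle behind such individualized questions can be sketched in Python. The example assumes a hypothetical beam problem, hypothetical limits and step sizes, and seeding by a student ID. It does not reproduce the syntax of the variable numeric question type. It only shows the pattern: values are drawn within well-defined limits, the reference result is a function of those values, and the student's number is accepted within a tolerance.

```python
import math
import random

# Hypothetical limits (chosen so every combination is physically meaningful)
# and step sizes for the random initial values.
LIMITS = {"F": (500.0, 1500.0, 50.0),   # point load in N: low, high, step
          "l": (2.0, 4.0, 0.1)}         # span in m:       low, high, step


def draw_values(student_id: int) -> dict:
    """Draw an individual but reproducible set of initial values."""
    rng = random.Random(student_id)  # same ID -> same values on every call
    values = {}
    for name, (low, high, step) in LIMITS.items():
        n_steps = round((high - low) / step)
        values[name] = round(low + step * rng.randint(0, n_steps), 6)
    return values


def reference_answer(v: dict) -> float:
    """Hypothetical target: maximum bending moment of a simply supported
    beam with a central point load, M = F * l / 4 (in N m)."""
    return v["F"] * v["l"] / 4


def grade(student_value: float, v: dict, rel_tol: float = 0.01) -> bool:
    """Accept the student's number if it lies within rel_tol of the reference."""
    return math.isclose(student_value, reference_answer(v), rel_tol=rel_tol)
```

Drawing on a discrete grid rather than from a continuous range keeps the initial values readable in the problem text. Seeding by ID makes a student's values reproducible for later review.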
To prevent errors from being carried forward all the way to the final solution, different subdivisions are created, as explained above. Whenever necessary for further calculations, the correct solutions are displayed, which implies that the student cannot change any entries at a later stage. To enable the student to use the correct intermediate results in his or her preferred way, not only the numerical value is shown but also a solution in terms of the given variables, as shown in Fig. 3. Wherever possible, displaying correct solutions is avoided so as not to discourage the student. To solve this specific example problem, it is also necessary to calculate the torsional moment and identify the position of its maximum. This is realized using a multiple-choice single-select question. Asking for the numerical value of the maximum, by contrast, requires the "variable numeric" question type explained above. The syntax for formulating mathematical expressions in that question type is shown in Fig. 4. It is important to note that students must type numerical values into the answer field in the given unit system but without the unit symbol. This must be clearly stated in the working instructions of the exam.
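Because numerical answers must be entered without a unit symbol, a forgiving input check can point the student to the mistake instead of silently marking it wrong. The following Python sketch uses a hypothetical accepted format and hypothetical messages. Moodle question types use their own parsers.

```python
import re

# A plain decimal number, optionally signed and with an exponent,
# e.g. "250", "-3.5" or "1.2e3" (hypothetical accepted format).
NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
# A number directly followed by letters, e.g. "250 Nm" or "12mm".
NUMBER_WITH_UNIT = re.compile(r"[+-]?[\d.]+\s*[A-Za-z]")


def parse_unitless(text: str) -> float:
    """Return the student's number or raise ValueError with a hint."""
    cleaned = text.strip()
    if NUMBER.fullmatch(cleaned):
        return float(cleaned)
    if NUMBER_WITH_UNIT.match(cleaned):
        raise ValueError("Enter the value in the given unit, without the unit symbol.")
    raise ValueError("Input is not a valid number.")
```

For example, `parse_unitless("250")` returns `250.0`, whereas `"250 Nm"` is rejected with a hint about the unit symbol.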

STACK question "Calculate the forces and moments"
STACK question type. STACK (System for Teaching and Assessment using a Computer algebra Kernel) is a question type that utilizes the computer algebra system (CAS) Maxima [35]. Due to its variety of options and possibilities regarding input values, correction and grading, it is suitable for many applications in STEM disciplines.
Creation of the exam question using STACK. Much emphasis has been put on a clear layout. To arrive at a readable representation, proper display of vectors and matrices plays a crucial role. The question at hand has deliberately been chosen to experiment with exactly that, as it requires input in the form of vectors. This is not possible with the previously described methods, as variable numeric questions are, e.g., restricted to one input field per question. The STACK question type, by contrast, provides several input fields, allowing more than one entry to be evaluated within a single question. Moreover, input fields are automatically displayed in the format of vectors or even matrices if the linked solution is a vector or a matrix. STACK furthermore allows evaluating algebraic expressions.
To exploit that feature, the example problem at hand expects students' inputs to be algebraic, using only the given variables. For grading, the inputs are checked for algebraic equivalence with the solutions, meaning that they do not need to be represented in exactly the same way as in the master solution.
In this specific example problem, students are expected to calculate the forces and moments shown in Fig. 5. The absolute value of the vector S is of particular importance. The following describes the step-by-step procedure necessary to set up such an example problem using STACK.

STEP 1 - Definition of variables. When starting to create a question using STACK in Moodle, one must define variables, as shown in Fig. 6. They may be used for subsequent calculations or expressions. In this example, the variables have been set to 3x1 vectors containing the initial values given in the description of the example problem.
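In STACK, such question variables are written in Maxima. The following Python sketch only mirrors the idea of STEP 1: initial values are defined as 3x1 vectors and used to compute a vector result and its absolute value. The actual vectors of Fig. 5, including S, are not given in the text, so the position `r`, the force `F`, and the resulting moment below are hypothetical stand-ins.

```python
import math

# Hypothetical initial values given as 3x1 vectors (x, y, z components).
r = [0.0, 2.0, 0.0]       # position of the load application point in m
F = [0.0, 0.0, -500.0]    # applied force in N


def cross(a: list, b: list) -> list:
    """Cross product a x b of two 3-component vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def magnitude(v: list) -> float:
    """Absolute value (Euclidean norm) of a vector."""
    return math.sqrt(sum(c * c for c in v))


M = cross(r, F)          # moment of F about the origin: [-1000.0, 0.0, 0.0]
print(M, magnitude(M))   # [-1000.0, 0.0, 0.0] 1000.0
```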
STEP 2 - Definition of input fields. In the next step, working instructions are given. At this point, input fields are defined as shown in Fig. 7. While creating the question, one can choose the names of the variables that store the students' inputs. These variables may subsequently be used for further calculations or display purposes.

Fig. 7. Definition of input fields. Students' input is stored in "anss".

STEP 3 - Definition of correct answers. The input variables are linked to the correct solutions, which may also be previously defined variables. In the example, the input variable "anss" has been linked to the variable "s" defined in the first step. Here, algebraic equivalence with strict syntax was chosen, as displayed in Fig. 8. Strict syntax requires the student to type a multiplication sign wherever needed and thus, for example, interprets adjacent letters without multiplication signs as one single variable. An example is the variable "rho", which can only be interpreted correctly using strict syntax, as it would otherwise be interpreted as "r*h*o". If the correct solution is a vector or a matrix, the input field is automatically displayed accordingly.

STEP 4 - Feedback variables. Next, so-called feedback variables can optionally be introduced, which are accessible during the grading process. These include expressions, initially defined variables, or input variables that contain the student's inputs. As shown in Fig. 9, the final solutions used for grading are calculated based on the input variable "anss", which stores the student's result for the absolute value of the vector S. If the student obtained a wrong value for the vector S but did not make any further mistakes, the subsequent results are still considered correct.
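The difference between strict and non-strict reading described in STEP 3 can be illustrated with a deliberately simplified Python model. It is not STACK's parser, which offers more options and also handles numbers and function names. It only shows why "rho" survives as one variable under strict syntax but turns into "r*h*o" when implied multiplication is inserted between letters. The function names are hypothetical.

```python
import re

LETTER_RUN = re.compile(r"[A-Za-z]+")


def loose_reading(expr: str) -> str:
    """Simplified non-strict reading: a run of letters is split into
    single-letter variables joined by implied multiplication signs."""
    return LETTER_RUN.sub(lambda m: "*".join(m.group(0)), expr)


def strict_names(expr: str) -> list:
    """Simplified strict reading: every run of letters is one variable name,
    so products must be typed with an explicit '*'."""
    return LETTER_RUN.findall(expr)


print(loose_reading("rho*g*h"))   # r*h*o*g*h
print(strict_names("rho*g*h"))    # ['rho', 'g', 'h']
print(strict_names("rhogh"))      # ['rhogh']: one unknown variable
```

The last line shows the price of strict syntax: a student who omits the multiplication signs produces a single unknown name, which is why the working instructions must state the required input syntax clearly.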
STEP 5 - Feedback tree. Finally, defining a feedback tree is mandatory when using STACK. One can choose which variables are compared and in what manner, as shown in Fig. 10. Various methods are available, among them algebraic equivalence, string, string sloppy, and numeric. For this specific task, algebraic equivalence is a suitable choice. Depending on the outcome of the check at each node, i.e., whether a comparison yields true or false, the tree structure allows adding or subtracting a certain number of points or even resetting the points to zero altogether. It even allows skipping certain checks, e.g., if they logically cannot be true due to incorrect prior inputs. The simplest possible feedback tree is displayed in Fig. 11. Considerations regarding more complex feedback trees will be discussed in a future publication.
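The logic of such a feedback tree (called a potential response tree in the STACK documentation) can be sketched in Python. Each node runs one test, applies a score rule ("=", "+" or "-") depending on the outcome, and then moves to the next node of the chosen branch or stops. The node structure, the clamping of the final score to [0, 1], and the two-node example tree with its follow-up relation T = 2S are assumptions made for illustration. They are not STACK's data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

ScoreRule = Tuple[str, float]    # (mode, value) with mode "=", "+" or "-"


@dataclass
class Node:
    test: Callable[[dict], bool]      # answer test on the student's inputs
    if_true: ScoreRule
    if_false: ScoreRule
    next_true: Optional[str] = None   # None ends the evaluation
    next_false: Optional[str] = None


def apply_rule(score: float, rule: ScoreRule) -> float:
    mode, value = rule
    if mode == "=":
        return value
    if mode == "+":
        return score + value
    if mode == "-":
        return score - value
    raise ValueError(f"unknown score mode: {mode!r}")


def evaluate(tree: Dict[str, Node], start: str, inputs: dict) -> float:
    """Walk the tree from `start`, following the true or false branch."""
    score, name, visited = 0.0, start, set()
    while name is not None:
        if name in visited:
            raise ValueError("feedback tree contains a cycle")
        visited.add(name)
        node = tree[name]
        passed = node.test(inputs)
        score = apply_rule(score, node.if_true if passed else node.if_false)
        name = node.next_true if passed else node.next_false
    return min(max(score, 0.0), 1.0)   # clamp to the range [0, 1]


# Hypothetical two-node tree: |S| first, then a follow-up result that is
# checked against a value derived from the student's own |S|.
tree = {
    "S": Node(lambda x: abs(x["S"] - x["S_ref"]) < 1e-6, ("+", 0.5), ("+", 0.0),
              next_true="follow", next_false="follow"),
    "follow": Node(lambda x: abs(x["T"] - 2 * x["S"]) < 1e-6, ("+", 0.5), ("+", 0.0)),
}
```

Because both branches of the first node lead to the follow-up check, a wrong |S| costs half the points once, while a consistent follow-up result is still rewarded. Setting `next_false=None` instead would skip the follow-up check entirely.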
From a student's point of view, the full example problem is well-structured. After entering the algebraic result, the student clicks the "check" button to have the inputs displayed in classical mathematical notation, with STACK showing how the input has been interpreted by the underlying CAS, see Fig. 12. A second click on the same button confirms the input.
Advantages of STACK. Clearly, the option to take errors carried forward into account offers many opportunities. It is possible to create complex example problems that process the student's input while providing random numbers as initial values. Hence, one-way navigation, as discussed earlier, is no longer needed. Furthermore, it is not necessary to display the correct values of intermediate results in order to avoid double-counting of errors. Moreover, the possibility to use algebraic, numerical, and other input formats side by side allows STACK to be used in a wide variety of applications. It is especially helpful when assessing complex example problems, which are typical of Engineering Mechanics.

Further Remarks
The presented approach aims at fully replacing paper-pencil exams with automated testing. However, solving typical paper-pencil exam problems in Engineering Mechanics usually requires considerable computational effort. Consequently, students need to be allowed to use draft paper to take notes while working on the example problems. Ultimately, it is planned to dispense entirely with this draft paper for grading in order to arrive at fully automated testing. During the transition phase, however, it is sensible to collect all draft papers from the students and cross-check the solutions of the e-assessments with the calculations found on the draft paper. Such a procedure is expected to provide further insights into how a complex example problem should be designed in an automated testing environment. Thus, these cross-checks can be used to refine the questions and the overall design of the exam, finally arriving at a methodologically sound e-assessment for Engineering Mechanics. Furthermore, collecting all draft paper can also help when unexpected problems occur during the e-assessment, e.g., problems leading to incomplete solutions within the LMS. In such a case, the draft paper can serve as a safety net to ensure proper grading of the exam. In any case, students will always need to be allowed to use draft paper for calculations, sketches, etc., even if it is ultimately no longer used for grading. Solving complex example problems merely by "thinking" about them is deemed neither practical nor feasible and is thus not considered at all.

Fig. 10. Using feedback trees, several elements can be compared using different methods. In this case, "anss" and "s" are compared with regard to algebraic equivalence.

Fig. 11. Simplest possible feedback tree always leading to the next node (flipped by 90°). The green path represents correct answers and the pertaining grading scheme, whereas the red one is followed when an incorrect answer has been given.

Fig. 12. Students can enter their solutions in vector format. STACK previews the input, which is especially helpful in combination with more complex results.

Limitations
As long as the draft paper used by the students during their calculations is collected and can be used as a safety net, limitations are kept within reasonable bounds. Of course, certain classes of example problems are better suited for automated testing than others, while some types of example problems might have to be excluded altogether. Furthermore, free-body diagrams can only be assessed using drag-and-drop questions, which is clearly easier than drawing a free-body diagram from scratch, as required in the paper-pencil approach. Nonetheless, generally speaking, most of the example problems that can be formulated for paper-pencil exams can also be reasonably transferred to electronic questions using the framework described above. However, when switching to a full e-assessment without taking the draft paper into account anymore, the limitations increase. In STEM disciplines, the particular method, i.e., the path that led to a certain solution, is often as important as the solution itself and thus plays an important role when correcting paper-pencil exams. With fully automated testing as described above, this path to the solution becomes somewhat obscured and can no longer be tested, at least not to the extent possible in a paper-pencil approach. One way to mitigate this problem is to clearly define which sub-steps on the way to a final solution are of special importance and then design the individual questions of an example problem such that these sub-steps can be validly tested individually. In such a case, a meaningful allocation of points is of special importance to ensure a valid exam.

Conclusion and Outlook
The present work has shown possibilities to switch from classical paper-pencil exams to full e-assessments in Engineering Mechanics using the LMS Moodle. Several question types are available as a basis for assessing complex example problems electronically. Not only numerical values can be evaluated; graphical relations, e.g., free-body diagrams, which are vital for Engineering Mechanics, can also be part of a question. Of special relevance for the authors is the possibility of algebraic comparisons using the STACK question type. In combination with complex decision trees and the opportunity to account for errors carried forward, it is indeed possible to closely digitize classical paper-pencil exams in Engineering Mechanics without sacrificing informative value. Clearly, a lot of effort has to be put into constructing comprehensive e-assessments, e.g., to track the path to a student's final solution. Furthermore, e-assessments have to be evaluated continuously, starting with their use in formative exams, e.g., to allow the students to evaluate their own knowledge and monitor their learning progress during a course or lecture, or in SATs prior to summative exams. Only then is it advisable to introduce complex problems into summative exams. It is recommended to start with a combination of one electronic and one paper-pencil question to generate further insight into the pros and cons of electronic questions.
Obviously, Engineering Mechanics is far from the only discipline where sophisticated e-assessments are needed. The challenges faced when transferring complex, computationally intensive example problems from paper-pencil exams to e-assessments are similar in most STEM fields. Intuitive handling and display, accounting for errors carried forward, complex grading schemes, and correct handling of various types of student input all have to be taken into account in order to set up an e-assessment environment that is as transparent as possible for the students. Thus, the approach described in the work at hand is considered highly relevant for many STEM disciplines.