1 Introduction

With the advancement of emerging technologies such as the Internet of things (IoT) (Zhou et al. 2021; Failed 2020a) in educational environments, e-learning systems have adopted intelligent techniques to increase learning performance in cloud-based computer-assisted learning applications (Zhou et al. 2019a). In higher education and e-learning, the coronavirus (COVID-19) epidemic has caused major difficulties and technical problems for universities (Demuyakor 2020), institutions and research centers in all countries, and many of them have moved to virtual education systems such as learning management systems (LMS) (Díaz-Antón and Pérez 2006). These unexpected changes to the educational system have left students and teachers facing many problems in how to teach and understand course concepts in the context of virtual education. IoT-based assisted learning systems, built on smart devices (Lv et al. 2021a), sensors, radio-frequency identification (RFID) and actuators, have become popular in modern societies. More and more data are accumulating in the cloud data centers of virtual education systems, and the student and teacher information contains many concealed and unknown patterns related to the behavioral aspects of teaching and learning. Data mining is therefore a powerful technique for searching for and detecting valuable patterns or knowledge in these data. Analyzing quality of experience (QoE) factors (Jiang et al. 2021) for the behavioral aspects of teaching and learning between lecturers and students is a critical issue given the side effects of COVID-19 on virtual education systems.

Data mining is the process of extracting previously unknown but potentially useful information and knowledge hidden in large, incomplete, noisy, fuzzy and stochastic practical application data (Lou et al. 2021a). Data mining techniques, including association rules mining, classification, clustering, sequence pattern mining, prediction and trend analysis, can be used to predict the QoE factors that influence the performance of virtual education systems in upcoming semesters in universities and colleges (Lv et al. 2021b; Zuo et al. 2017). To address the above-mentioned issues, this paper presents a prediction model that combines association rules mining, an important and basic technique, with supervised techniques focused on the behavioral aspects of teaching and learning in order to evaluate the QoE factors of virtual education systems. The suggested model applies IoT-based assisted learning systems to continuously monitor the behavioral aspects of students and teachers according to the existing structural attributes of virtual education systems. The presented model detects indications of interactive negotiations within a third-party framework that provides e-learning environments for students and teachers. The proposed prediction model is developed to detect the QoE factors and to identify changes in the educational levels of students and teachers with data mining methods. The main contributions of this paper are as follows:

  • Proposing a prediction model for evaluating performance of virtual education systems based on COVID-19 side effects.

  • Detecting behavioral aspects of teaching and learning based on QoE factors in virtual education systems.

  • Applying association rules mining and classification algorithms to predict behavioral aspects of teaching and learning based on QoE factors.

  • Evaluating the behavioral aspects based on QoE factors using association rules mining and supervised methods to predict the e-learning satisfactoriness in virtual education systems.

The rest of this paper is organized as follows. Section 2 presents a brief review and analysis of the current related works in this field. Section 3 presents the proposed prediction model in detail, including the behavioral aspects based on QoE factors for students in the virtual education system. Section 4 demonstrates the experimental results of the suggested prediction model through a statistical examination using existing supervised methods. Section 5 provides the conclusion and new research directions as future work in this field.

2 Related works

This section reviews relevant studies that applied data mining methods to educational systems, e-learning models and teaching styles.

Ashraf et al. (2020) proposed a prediction approach to evaluate academic student papers in educational data mining. They used a boosting algorithm combined with the synthetic minority oversampling technique and the J48 classifier, and compared it with the Naïve Bayes algorithm. The main weakness of this research is that it ignores feature selection and data preprocessing on the educational data set.

In another work, Isma’il et al. (2020) presented a new course recommender model to predict the expenditure of admission positions for undergraduate students. They analyzed the admission procedure of several departments and schools, such as health and medical science, business and economics, agriculture and management faculties. The authors did not discuss the main attributes of the data set or their values. In addition, the prediction model offers no novelty: the authors only applied Naïve Bayes (NB), support vector machine (SVM), K-nearest neighbor (KNN) and decision tree (DT) classifiers. Finally, they evaluated accuracy as the only prediction factor, without any cross-validation.

Troussas et al. (2020) proposed a new prediction method for detecting the interaction activities of students in learning modalities. In this method, important activities such as student traits are detected using fuzzy logic, and the refined important rules are analyzed as fuzzy output based on a decision-making approach. Only 40 samples of student activities were used to detect the important learning modalities. In addition, there is no technical evaluation of the accuracy, precision and recall factors.

Aslam (2019) presented a new statistical neutrosophic testing analysis for predicting teaching test cases for university students. In this analysis, an enhanced analysis of variance (ANOVA) algorithm was applied to evaluate important aspects of university students. The main weakness of this research is that the author ignored feature selection, which could improve the quality of the testing analysis in the t-test and p-test strategies. To analyze the physical aspects of education students, Zhu (2019) presented an association rules mining method based on a mutual exclusion model on the Apriori algorithm to minimize the time complexity of rule mining. The author did not apply a classification method to the optimal features selected from the important extracted rules. In a similar work, Qiang (2019) proposed a prediction method to enhance the correctness and increase the performance of physical education based on stable split argument and field programmable gate arrays (FPGA). There is no evaluation of the features selected by the FPGA method using the C4.5 classifier. In addition, the author evaluates the proposed model without comparing it with other algorithms.

For prediction in e-learning systems, Azzi et al. (2019) presented a new classification model for categorizing the learning style of teachers in e-learning environments. The authors extracted behavioral aspects of the teachers from different perspectives and applied the fuzzy C-means algorithm to cluster them. In the evaluation results, the authors only compared the number of courses, without a comparison against other clustering approaches. Daghestani et al. (2020) presented a gamified component-based learning architecture for the learning management system (LMS). The authors evaluated the proposed architecture using classifiers such as the NB, KNN and DT algorithms. In the evaluation results, some important prediction metrics such as error detection, accuracy and sensitivity were ignored.

In another work (Assunção Flores and Gago 2020), the authors discussed the advantages and weaknesses of teaching aspects in educational systems during the COVID-19 pandemic in Portugal. They also provided a synthetic analysis of the existing remote teaching techniques for the educational context in theory and practice. To analyze the side effects of COVID-19 on online education systems, Chen et al. (2020) examined various aspects of student satisfaction with the educational system during the COVID-19 pandemic. They analyzed the related coefficients of the effectiveness factors by applying some rule-based hypotheses. The authors evaluated the applied data set using neural network classification only. In addition, some critical factors for detecting teaching aspects, such as accuracy and recall, were ignored.

Mitrofanova et al. (2019) provided a conceptual classification and analysis of refining teaching courses and educational methods using data mining techniques. They categorized the technical aspects of educational data mining based on important factors such as adverse student behavior, social network analysis and knowledge tracing using data mining tools. There are no experimental or evaluation results on the technical aspects of the proposed classification.

Finally, Charitopoulos et al. (2020) presented a comprehensive review evaluating educational data mining using soft computing and learning techniques. The authors categorized the work published from 2010 to 2018 on soft computing methodologies for machine learning-based educational systems. Their paper evaluates 148 research studies related to soft computing techniques for educational data mining with respect to the e-learning context, teaching methods and learning management systems (LMS). The main weakness of this survey is that the authors did not present a technical taxonomy for categorizing classic and soft computing methodologies in educational data mining.

3 Proposed QoE-based prediction model

The evaluation of teaching is an important task that colleges carry out frequently. The results of evaluation, such as the teachers' proportion score, the satisfaction level with the e-learning system and the assessment of learning-assisted equipment, help us understand the overall teaching and studying situation of individual departments or the whole university. But what factors lead to good or bad evaluation results? In other words, which behavioral aspects are effective in the virtual education system, and which learning-assisted techniques do not perform sufficiently? To answer this question, data mining techniques can evaluate the behavioral aspects of teaching and learning based on QoE factors, taking full advantage of the original data of students' scores for teachers and virtual education systems in the recent semester to uncover hidden rules and patterns. Accordingly, Fig. 1 shows a QoE-based prediction model to evaluate the performance of virtual education systems using data mining techniques. This model includes four main phases: (1) virtual education system platform, (2) data preparation, (3) rule mining and (4) prediction model.

Fig. 1 QoE-based prediction model for virtual education system

In the virtual education platform, students and teachers log in to the system with their respective usernames and passwords. Students can browse a categorization of e-learning courses, and each teacher can choose his/her specified course to teach. Teachers can use existing computer-assisted learning methods such as presenting PowerPoint and PDF files, screen capturing, audio capturing, video capturing, file sharing, e-handwriting and many design skills. This collection of learning methods in the LMS platform, acting as the virtual education system, helps students acquire the best knowledge for each course section. On the other side, students' reactions to each e-learning tool can be specified as behavioral aspects in the QoE factors after the e-learning courses are finished. After completing each course, students fill in a questionnaire on the behavioral aspects of the virtual education system platform. Each completed questionnaire yields the QoE-based behavioral aspects that reflect the student's level of satisfaction with e-learning in the virtual education system.

In the data preparation phase, all QoE-based information is gathered into a feature collection database. To achieve optimal performance of the prediction method, data preprocessing is required to clean the existing QoE factors from the virtual education system. Furthermore, a normalization procedure is used to reduce the dimension of the data and ease the classification processes in the proposed prediction model. Some QoE attributes have a nominal domain and others have a numerical domain. With the feature normalization method, all QoE attributes are mapped to the same domain.
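The paper does not name the normalization method, so the following is only a minimal sketch of how nominal and numerical QoE attributes could be mapped to one common domain. It assumes min-max scaling to [0, 1] for numerical attributes and an evenly spaced code for nominal ones; the function names and the toy records are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_nominal(values):
    """Map each distinct nominal label to an evenly spaced code in [0, 1]."""
    labels = sorted(set(values))
    if len(labels) == 1:
        return [0.0 for _ in values]
    step = 1.0 / (len(labels) - 1)
    code = {label: i * step for i, label in enumerate(labels)}
    return [code[v] for v in values]


# Hypothetical toy records: GPA on the [0, 20] scale and academic degree.
gpa = [12.0, 15.75, 20.0, 8.0]
degree = ["bachelor", "master", "PhD", "bachelor"]

gpa_scaled = min_max_scale(gpa)
degree_scaled = encode_nominal(degree)
```

Note that coding nominal labels as numbers imposes an arbitrary order on them (here, alphabetical), which may or may not suit a given classifier.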

The rule mining phase performs knowledge-based rule extraction with association rules mining methods. The behavioral aspects of the QoE-based virtual education system are then specified. Based on the above QoE factors as behavioral features in the virtual education system, we can refine the important association rules in the data set to obtain a sufficient pattern for predicting the effective behavioral aspects of the QoE-based virtual education system.

Finally, in the prediction phase, machine learning is used to detect unknown patterns in the proposed data set through training and testing procedures. Prediction factors such as accuracy, precision, recall and F-measure are evaluated to examine the performance of the prediction model.

3.1 Data collection and preprocessing

For the proposed prediction model on QoE factors, we used the LMS platform of the virtual education system to gather data and instances of electronic courses (Footnote 1). Table 1 shows the content of each QoE factor collected from the e-learning platform, covering all features of the behavioral aspects of teaching and learning in English education systems. The QoE factors were collected through students' evaluation of the virtual education system and comprise 32 features.

Table 1 QoE factors as behavioral features in virtual education system

Among the QoE factors, five basic features are recorded for each student: gender, age, academic degree, academic field and employment status. The six academic fields are computer engineering, electrical engineering, mechanical engineering, economics, social sciences and accounting. In addition, 25 questions are provided to evaluate the virtual education system. These 25 questions mainly concern the behavioral aspects of students with respect to the existing computer-assisted learning methods in virtual education systems. Some of the questions relate to the technological aspects of e-learning and to students' level of access to virtual education infrastructure. Table 1 describes each question. Finally, two features, the GPA with a range of [0–20] and the status of each student, indicate each student's degree of satisfaction with e-learning in the virtual education system.

After data collection, the data preprocessing phase cleans the existing QoE factors from the virtual education system. Furthermore, the normalization procedure is used to reduce the dimension of the data and ease the classification processes in the proposed prediction model. We finalized 543 samples of student information for the proposed prediction model based on the QoE factors in the applied virtual education system.

3.2 Association rules mining method

Association rules mining searches for interesting relationships among data, that is, for items that tend to occur together (Wu et al. 2018; Zhu et al. 2020). Based on the above QoE factors as behavioral features in the virtual education system, we can refine the important association rules in the data set to obtain a sufficient pattern for predicting the effective behavioral aspects of the QoE-based virtual education system. For example, if a computer engineering (CE) student who rated the behavioral aspects of the virtual education system as “Very Good” (VG) and achieved an A score in total average is likely to obtain a “succeed” grade in final status, then we obtain the association rule CE \(\Rightarrow \) VG \(\Rightarrow \) A \(\Rightarrow \) succeed between the existing behavioral aspects based on QoE factors. This is an instance of association rules mining. The formal description of association rules mining is as follows: \(I = \{i_1, i_2, \ldots , i_k\}\) is a set of k different elements; each element is called an item and the set is called an item set. A set that includes k items is called a k-item set (Lou et al. 2021b; Bai et al. 2021).

Let D be a transaction database in which each transaction T is a subset of the item set I, that is, T \(\subseteq \) I. A transaction T contains an item set X if and only if X \(\subseteq \) T. An association rule is an implication of the form X \(\Rightarrow \) Y, in which X \(\subseteq \) I, Y \(\subseteq \) I and X \(\cap \) Y = \(\emptyset \). A rule X \(\Rightarrow \) Y holds in the transaction database with a support degree and a confidence degree (Abedini and Zhang 2021; Huang et al. 2020). The support degree states that s% of the transactions in D contain X \(\cup \) Y, that is, both X and Y. The confidence degree states that c% of the transactions that contain X also contain Y: confidence(X \(\Rightarrow \) Y) = (number of transactions that contain both X and Y) / (number of transactions that contain X) (Zhang and Wang 2020). The user supplies a minimum support MinSup and a minimum confidence MinConf; if the support degree of the rule X \(\Rightarrow \) Y is not lower than MinSup and its confidence degree is not lower than MinConf, the rule is accepted as a mined association rule. Semantically, the confidence degree expresses how reliable the rule is (Liu et al. 2020), while the support degree expresses the percentage of objects covered by the rule, i.e., the importance of the rule to the whole data set. For example, among 200 student grade records there are 30 records in which the capability of learning is A, and among these 30 records there are 15 records in which the grade of extracurricular activities is C. The rule that learning grade A implies extracurricular activities grade C therefore has the confidence degree C = 15/30 = 0.5 and the support degree S = 15/200 = 0.075. Figure 2 shows a conceptual Apriori algorithm in association rules mining (Shi et al. 2020a).
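The worked example can be checked with a short sketch. The item names and the grades of the remaining 185 records are hypothetical filler; only the counts 200, 30 and 15 come from the example in the text.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing X that also contain Y."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_xy = sum(1 for t in transactions if both <= t)
    return n_xy / n_x if n_x else 0.0


# 200 records: 30 with learning grade A, 15 of which have extracurricular grade C.
# The "B" grades are placeholders chosen only to fill the remaining records.
records = (
    [frozenset({"learning=A", "extracurricular=C"})] * 15
    + [frozenset({"learning=A", "extracurricular=B"})] * 15
    + [frozenset({"learning=B", "extracurricular=B"})] * 170
)

s = support(records, {"learning=A", "extracurricular=C"})
c = confidence(records, {"learning=A"}, {"extracurricular=C"})
print(f"support={s:.3f} confidence={c:.1f}")
```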

Fig. 2 Apriori algorithm in association rules mining (Wu et al. 2009; Lv 2021)

Having introduced association rules mining, we now present its core algorithm, the Apriori algorithm. It is regarded as the classic algorithm of association rules mining, and its core idea is as follows (Failed 2020b): the first step finds all the frequent item sets in the transaction database, where a frequent item set is an item set whose support degree is higher than or equal to MinSup; the rules whose confidence degree is not lower than MinConf are then derived from these frequent item sets. The first step can be briefly described as follows (Fig. 2). The algorithm proceeds level by level. First, it generates the set of frequent 1-item sets L1, then the set of frequent 2-item sets L2, and so on; when some r makes Lr empty, the algorithm stops (Zhang et al. 2019). In round k, it first generates the set Ck of all candidate k-item sets, then scans the database to compute the support degree of each candidate and retains Lk, which consists of the k-item sets whose support degree is higher than or equal to MinSup. The generation of Ck must satisfy two requirements. (1) Join: each item set in Ck is generated by joining two item sets of Lk−1 that share the same first k−2 items and differ in the last, (k−1)th, item. (2) Pruning: because any non-empty subset of a frequent item set is itself a frequent item set, every candidate produced by the join step that has a non-frequent subset must be deleted. Pruning lowers the cost of computing the support degree of the candidate item sets and improves the performance of the algorithm (Yin et al. 1045). To apply association rules mining, we used Weka 3.9 on a system running a 64-bit Windows operating system with 8 GB of RAM and an Intel® Core™ i5-6200U CPU.
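The frequent item set step, with its join and pruning rules, can be sketched as below. This is an illustrative pure-Python version, not the Weka implementation used in the paper; the function name `apriori`, the toy transactions and the MinSup value of 0.6 are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {item set: support} for every item set with support >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One database scan: count each candidate and keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_sup}

    level = frequent({frozenset([i]) for t in transactions for i in t})  # L1
    result = dict(level)
    k = 2
    while level:
        prev = sorted(tuple(sorted(s)) for s in level)
        candidates = set()
        for a, b in combinations(prev, 2):
            # Join: the two (k-1)-item sets share their first k-2 items.
            if a[:-1] == b[:-1]:
                cand = frozenset(a) | frozenset(b)
                # Prune: every (k-1)-subset of the candidate must be frequent.
                if all(frozenset(sub) in level
                       for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        level = frequent(candidates)  # Lk
        result.update(level)
        k += 1
    return result


# Hypothetical transactions built from the CE / VG / A / succeed example.
toy = [
    {"CE", "VG", "A", "succeed"},
    {"CE", "VG", "A", "succeed"},
    {"CE", "VG", "B", "average"},
    {"EE", "G", "B", "average"},
    {"CE", "VG", "A", "succeed"},
]
freq = apriori(toy, min_sup=0.6)
```

The loop ends as soon as a level comes back empty, which mirrors the stopping condition on Lr described above.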

Figure 3 illustrates the association rules extracted by the Apriori algorithm for the QoE factors of students in the virtual education system, according to the proposed data set. We show the ten most important rules, those with high confidence scores. Each confidence score, used as a context-based threshold, gives the percentage of acceptance for the dependency expressed by an extracted rule. The maximum confidence score, 1, belongs to rules 4 and 9, and the minimum confidence score, 0.84, belongs to rule 7.

Fig. 3 Extracted association rules using Apriori algorithm for QoE factors

3.3 Classification approach

In the classification process, existing classification techniques are applied to evaluate the proposed prediction model on the collected QoE instances. We applied machine learning algorithms such as multilayer perceptron (MLP) (Yu et al. 2021), C4.5 (J48) (Zhang et al. 2021), the KStar classifier (Gholipour et al. 2020), sequential minimal optimization (SMO) (Wang et al. 2021), K-nearest neighbors (IBk) (Xu et al. 2018), Random Forest (Chen et al. 2021), Naïve Bayes (NB) (Li et al. 2021) and the hybrid J48 + BinarySplits algorithm (Zuo et al. 2015). In the collected QoE data set, 70% of the instances are used for training and 30% for testing. In addition, the k-fold cross-validation technique is employed, in which the data set is randomly separated into k mutually exclusive folds of approximately equal size and the classification process is trained and tested k times. The evaluation factor in the cross-validation method is the total number of correct classifications divided by the number of instances in the data set.
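The two evaluation protocols can be sketched on instance indices alone. This is a minimal illustration using only the standard library, not the Weka procedure used in the experiments; the function names and the fixed seed are assumptions, while the 543 instances and the 70/30 ratio come from the text.

```python
import random


def holdout_split(n, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


train, test = holdout_split(543)           # 70% / 30% hold-out
splits = list(k_fold_indices(543, k=10))   # ten train/test rounds
```

Each instance appears in exactly one test fold, so summing the correct classifications over all rounds and dividing by the number of instances gives the cross-validated accuracy described above.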

4 Experimental results

To examine the efficiency of the proposed prediction model, five main predictive factors are considered: accuracy, precision, recall, F-measure and execution time. Tables 2 and 3 show the performance evaluation attributes and the prediction factors, together with their equations in terms of the confusion matrix, which are commonly employed to measure the performance of machine learning classifiers.

Table 2 Performance evaluation attributes to detect prediction parameters (Zhou et al. 2019b; Alam et al. 2021)
Table 3 The prediction’s parameters of the supervised method (Xie et al. 2018; Chao et al. 2020)
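As an illustration of the usual confusion-matrix definitions of these factors, the sketch below computes them for a single class treated as positive. The counts are hypothetical, and the function name is an assumption; for a multi-class status such as the one in this data set, per-class values would typically be averaged.

```python
def prediction_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: true/false positives and negatives for one class.
m = prediction_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```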

K-fold cross-validation is employed to reduce the bias caused by the random choice of training instances when assessing the accuracy of the different classification methods. In stratified k-fold cross-validation, a common procedure, each fold is built so that it contains approximately the same proportion of class labels as the whole data set (Shi et al. 2020b). Here, for performance evaluation of the applied classifiers, stratified k-fold cross-validation is employed with k = 5, 10, 15 and 20.
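Stratification can be sketched by dealing the instances of each class across the folds in turn. The class counts below are hypothetical (the paper reports only the total of 543 samples), and the function name is an assumption.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign instance indices to k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal the members of this class round-robin over the folds.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


# Hypothetical label distribution over 543 students.
labels = ["succeed"] * 300 + ["average"] * 180 + ["damaged"] * 63
folds = stratified_folds(labels, k=5)
per_fold = [
    {c: sum(1 for i in f if labels[i] == c) for c in ("succeed", "average", "damaged")}
    for f in folds
]
```

With this dealing scheme the per-class counts in any two folds differ by at most one, whatever the value of k.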

The results of the different classification methods are presented in Figs. 4 to 8. The figures show the performance of the mentioned classification algorithms for different numbers of cross-validation folds. As the figures show, all classification methods reached close and satisfactory outcomes in terms of accuracy, precision, recall and F-measure.

Fig. 4 The accuracy of each classifier algorithm with 20 cross-validation fold

According to Fig. 4, the J48 + BinarySplits classifier achieved an accuracy of 95.9%, higher than the other classifier algorithms. Hence, in the QoE-based prediction model, the hybrid J48 + BinarySplits classifier has the best performance among the evaluated classifiers.

Figure 5 illustrates the precision factor, which indicates what share of the selected QoE factors among all features of the data set is relevant for detecting the satisfaction status of students. We observed that the hybrid J48 + BinarySplits classifier has the highest precision, 95.8%, in detecting the relevant selected features for predicting the satisfaction status of each student in the evaluation of the virtual education system. The SMO and IBk classifiers follow with precision values of 94.2% and 93.7%, respectively, on the QoE data set.

Fig. 5 The precision factor of each algorithm with 20 cross-validation folds

For the recall factor shown in Fig. 6, we conclude that the hybrid J48 + BinarySplits classifier has the highest recall, 96.8%, in selecting the relevant features for predicting the satisfaction status of each student in the evaluation of the virtual education system.

Fig. 6 The recall factor of each classifier algorithm based on QoE data set

According to Fig. 7, a comparison of the F-measure of the existing classifier algorithms shows that the hybrid J48 + BinarySplits classifier has the highest F-measure, 96.2%, among all algorithms. The SMO classifier reaches an F-measure of 93%.

Fig. 7 The F-measure of each classifier with existing cross-validation folds

Since the execution time of the employed classifiers is also an important assessment factor, Fig. 8 reports it: the KStar classifier achieved the minimum execution time by a large margin compared with the other algorithms, while the MLP has the highest execution time of all classifiers.

Fig. 8 Average execution time for applied classifier algorithms

After applying the Apriori algorithm to the data set to select high-confidence features, we executed the existing classification algorithms again. Based on Fig. 9, the hybrid J48 + BinarySplits classifier achieved an accuracy of 98.3%, the highest among the classifier algorithms in the QoE-based prediction model, when the Apriori algorithm was applied for feature selection.
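The paper does not detail how features are selected from the mined rules. One plausible reading, sketched here purely as an assumption, keeps every attribute that appears in a rule whose confidence reaches a chosen threshold. The rule format (`attribute=value` items), the example rules and the function name are all hypothetical; only the 0.84 threshold echoes the lowest confidence reported for the ten displayed rules.

```python
def attributes_from_rules(rules, min_conf):
    """Collect attribute names that occur in rules with confidence >= min_conf."""
    selected = set()
    for antecedent, consequent, conf in rules:
        if conf >= min_conf:
            for item in list(antecedent) + list(consequent):
                # Items are written as "attribute=value"; keep the attribute part.
                selected.add(item.split("=", 1)[0])
    return sorted(selected)


# Hypothetical rules: (antecedent items, consequent items, confidence).
rules = [
    (["field=CE", "Q3=VG"], ["status=succeed"], 1.0),
    (["degree=bachelor"], ["status=damaged"], 0.84),
    (["Q7=G"], ["status=average"], 0.60),
]

selected = attributes_from_rules(rules, min_conf=0.84)
print(selected)
```

The classifiers would then be trained on the columns named in `selected` only, which is the sense in which rule mining acts as a feature selection step here.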

Fig. 9 The accuracy of each classifier algorithm after applying Apriori algorithm

Figure 10 illustrates the precision factor, which indicates what share of the selected QoE factors among all features of the data set is relevant for detecting the satisfaction status of students. After applying the Apriori algorithm, the hybrid J48 + BinarySplits classifier achieves the highest precision, 98.8%, in detecting the relevant selected features for predicting the satisfaction status of each student in the evaluation of the virtual education system. The J48 and IBk classifiers both reach a precision of 96% on the QoE data set.

Fig. 10 The precision factor of each algorithm after applying Apriori algorithm

For the recall factor shown in Fig. 11, we conclude that the hybrid J48 + BinarySplits classifier, after the Apriori algorithm is applied, has the highest recall, 99.3%, in selecting the relevant features for predicting the satisfaction status of each student in the evaluation of the virtual education system.

Fig. 11 The recall factor of each classifier algorithm after applying Apriori algorithm

According to Fig. 12, a comparison of the F-measure of the existing classifier algorithms shows that the hybrid J48 + BinarySplits classifier with Apriori-based feature selection has the highest F-measure, 99%, among all algorithms. The IBk classifier reaches an F-measure of 97%.

Fig. 12 The F-measure of each classifier after applying Apriori algorithm

Based on the association rules extracted from the QoE factors in Sect. 3.2, we present a statistical analysis of the classified QoE factors. Figure 13 shows the classification of the existing academic fields by satisfaction level. Blue denotes the “succeed” class, red the “average” class and green the “damaged” class. According to this classification, the largest number of students in the “succeed” class are computer engineering students. Only a small number of electrical engineering students have a “succeed” satisfaction score in the learning procedure of the virtual education system. All economics students fall into the “damaged” class. Social sciences and accounting students also show negative performance in the learning procedure of the virtual education system.

Fig. 13 Classification of existing academic fields based on level of satisfactory scores

Figure 14 illustrates the classification of the existing academic degrees by satisfaction level. The three academic degrees, bachelor, master and PhD, are distributed differently across the satisfaction statuses “succeed”, “average” and “damaged”. According to the classified academic degrees, bachelor students were damaged by the virtual education system under the side effects of COVID-19. PhD students were less damaged than master students in the e-learning environment based on QoE factors in the virtual education system. Finally, Fig. 15 shows the categorization of GPA scores by satisfaction status. We observed that some students with GPA scores above 15.75 succeeded in the virtual education system, while many students have an average satisfaction status. Students with GPA scores below 15.75 were damaged by using the virtual education system.

Fig. 14 Classification of existing academic degrees based on level of satisfactory scores

Fig. 15 Classification of GPA based on level of satisfactory scores

5 Conclusion

In this paper, a new QoE-based prediction model was proposed for evaluating the performance of virtual education systems during the COVID-19 pandemic. The proposed model used association rules mining and classification algorithms to capture the direct and indirect effects of the behavioral aspects of the virtual education system on the satisfaction level of students in e-learning. The Apriori analysis evaluated the important rules in the QoE factors to specify student sentiments in the virtual education system. The positive relationship established between the important QoE factors and the satisfaction status of students is valuable for improving the virtual education system architecture in the next semester. Several important classification algorithms were then applied to predict the important metrics of the behavioral aspects of students based on QoE factors in the virtual education system. The experimental results showed that our hybrid J48 + BinarySplits classifier, after the Apriori algorithm is applied, outperforms the other classifier algorithms with an accuracy of 98.3%, a precision of 98.8%, a recall of 99.3% and an F-measure of 99% in predicting the maximum number of “succeed” satisfaction cases related to the behavioral aspects of students based on QoE factors. A limitation of this research is that no meta-heuristic algorithm was applied for feature selection during educational data preprocessing. In future work, meta-heuristic algorithms can be applied to improve the feature selection strategy by finding the important behavioral aspects of QoE factors in the virtual education system.