Evaluation of the learning state of online video courses based on functional near infrared spectroscopy

Studying brain activity during online learning extends research on brain function to real online learning situations and promotes the scientific evaluation of online education. Existing research focuses on enhancing learning effects and evaluating the learning process from an attentional perspective. We aimed to compare prefrontal cortex (PFC) activity during the resting, studying, and question-answering states of online learning and to establish a classification model of the learning state that would be useful for evaluating online learning. Nineteen university students took part in experiments in which functional near-infrared spectroscopy (fNIRS) monitored the prefrontal lobes. The rest period at the start of the experiment was the resting state, watching 13 videos was the learning state, and answering questions after each video was the answering state. Differences in activity between these three states were analyzed using a general linear model. In addition, 1s fNIRS data clips, and features such as averages extracted from the three states, were classified using machine learning models such as support vector machines and k-nearest neighbor. The results show that the dorsolateral prefrontal cortex is more active in the resting state than in the learning state, that answering questions is the most active of the three states across the entire PFC, and that k-nearest neighbor achieves 98.5% classification accuracy on 1s fNIRS data. The results clarify the differences in PFC activity between the resting, learning, and question-answering states in online learning scenarios and support the feasibility of developing an online learning assessment system using fNIRS and machine learning techniques.


Introduction
Online video courses break through the limitations of time and space and have become an indispensable part of modern learning, so the scientific and effective evaluation of online education has received the attention of researchers. The methods actually used in online education evaluation are the analysis of educational big data [1] and students' feedback questionnaires [2], which cannot provide fully reliable feedback on students' actual learning situation. Current research therefore combines brain imaging modalities such as electroencephalogram (EEG) and functional near-infrared spectroscopy (fNIRS) with online learning from the perspectives of attention monitoring, learning ability evaluation and so on [3][4][5]. However, the experimental designs of these studies differ from real online learning, and the research mainly analyzes data from the video-watching stage while ignoring data from the question-answering stage, which limits understanding of the different learning states across the whole process of online learning behaviour. Watching videos and answering questions in online learning are different thinking stages: watching videos is a process of receiving knowledge, whereas answering questions is a process of active thinking and recalling the learning content. We believe that distinguishing between these two states can help us to better understand learning behaviour, which is important for building a process evaluation of online learning.
EEG and fNIRS have been applied to several aspects of research in cognitive neuroscience [6,7] and are powerful tools for tracking changes in brain activation in the field of educational neuroscience [8,9]. Brain activity provides a wide range of physiological information. The PFC plays an important role in cognitive activity [10]: it is the brain region responsible for decision making and working memory and is also involved in the ability to cope with novelty. EEG records the spontaneous rhythmic electrical activity of groups of brain cells using electrophysiological indices. Neuronal discharges consume energy and create additional metabolic demand. Meidenbauer et al. found that the metabolic demand associated with neuronal activity varies with task difficulty or cognitive load [11], and this change in metabolic demand induces local changes in blood oxygenation in the brain; Saikia et al. found that an increase in working memory load induced an increase in mean haemodynamic activation [12]. fNIRS is a scalp-based optical spectroscopy technique that uses a light source and detector to measure haemodynamic changes in brain tissue. It can record blood oxygen saturation levels and is sensitive to changes in the concentration of the two main oscillating absorption chromophores in the near-infrared spectral region: oxygenated and deoxygenated haemoglobin [13,14]. This property, coupled with the low absorption of water in the same wavelength range, makes it possible to measure the relative concentrations and oscillations of these substances. Zhu et al. found that a model based on fNIRS was able to help categorize cognitive load caused by different changes in task difficulty, and because fNIRS is robust to motion artefacts, it has more advantages than other neuroimaging methods for cognitive load classification [15]. Brockington et al. provide proof of concept for the use of fNIRS in scenarios that more closely resemble real classroom routines and daily activities [16]. The use of fNIRS for educational research still requires support from methods such as machine learning, which describes the capacity of systems to learn from problem-specific training data to automate the process of analytical model building and solve associated tasks [17]. Classification is one of the main tasks of machine learning; common algorithms include Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbours (KNN). These algorithms can learn features from the data, find the relationship between the data and the labels, and help us discover patterns in physiological signal data and use such patterns to build scoring models.
At present, research on near-infrared learning evaluation mainly focuses on evaluating learning ability and monitoring attention during the learning process. This research analyzes data from the learning stage and focuses on learning effectiveness and results, but does not study the differences in brain activity under different task states during learning. Choi et al. obtained fNIRS signals from university students while they completed seven cognitive tasks, used College Scholastic Ability Test levels (college entrance scores) to represent the students' learning ability, and classified the signals using several machine learning methods, including the XGBoost classifier, to provide new insights into the relationship between haemodynamics in fNIRS signals and college learning ability test levels [3]. However, learning ability is relatively stable over time and not under individual control, and the experimental tasks used in that article are classical paradigms that do not reflect students' online learning process. Chen et al. studied online learning from the perspective of attentional monitoring with EEG devices, comparing the effects of learning from a video with and without cueing by an attentional monitoring system, and demonstrated a significant effect of EEG attentional monitoring on learners' attentional state [4]. Sun et al. sought to determine how pedagogical agents (PAs) influence learning by investigating the influence of PAs on learning outcomes and brain activity during learning, and showed that students who learn with a PA perform better on learning outcome tests [5]. The difference in NIRS metrics between the first and second 10 minutes of a 30-minute sustained attention reaction time task was compared in [18], which demonstrated that a prolonged attention task results in longer reaction times, that the difference is reflected in NIRS metrics, and that SVM methods can classify the two types of signals with a high degree of accuracy. These studies all focused on the change of attention during task performance and were able to differentiate attention levels using EEG or fNIRS. However, they did not compare the change in brain activation levels across the different stages of online learning. Oku et al. collected fNIRS signals from students while they watched an astronomy video and answered questions. Using random forests and penalized logistic regression, they predicted the correctness of the answers and established a relationship between the fNIRS signal during video watching and answer correctness [19], but still did not pay enough attention to the fNIRS signal during question answering. As a result, the levels and areas of brain activity during resting, learning and answering have not been clearly distinguished, and it is not yet possible to judge students' learning status accurately or to use such differences for the process evaluation of online education.
In summary, we make two hypotheses. Hypothesis 1: the resting, learning, and answering states during online learning differ substantially in PFC activity levels. Hypothesis 2: machine learning techniques can learn the differences between learning states and judge the study state with high accuracy from fNIRS data. To verify the hypotheses, we gathered fNIRS signals in three conditions, namely resting, watching educational videos and answering questions. We analyzed and classified them to ascertain the actual state of students during online learning and to link their actual learning situation with online learning assessment. The aim was to obtain a more scientific evaluation of online education. In this study, we designed the experiment to simulate the actual online learning mode. The experimental materials were massive open online course (MOOC) videos. Nineteen university students participated in the experiment. Eight near-infrared channels covered the left and right sides of the prefrontal lobe. An evaluation model was established between prefrontal brain activity and online learning state, which offers a reference for the process evaluation of online video courses.

Participants
We enrolled 19 university students (11 males, 8 females; aged 17 to 19 years) from Xidian University in this study. The sample size was judged adequate using the sample size estimation function of G*Power 3.1 [20] and by referring to existing studies [21]. All subjects were first-year students from different colleges who had been at the university for about 3 months and had not yet studied the content of their majors in depth. Xidian is known for its electronics majors, and its students' foundation in science is generally stronger than in the liberal arts. Their learning backgrounds were therefore fairly consistent, which helped us to select suitable experimental materials. All participants were right-handed and had normal hearing and normal or corrected-to-normal vision. The study was conducted in accordance with the Declaration of Helsinki, and all participants signed an informed consent form for the experiment. The experiment lasted 20 to 25 minutes. fNIRS data from 4 subjects, which had 2 or more channels with poor connectivity throughout the experiment, were excluded from the follow-up analysis due to low quality. The remaining 15 participants comprised 7 males and 8 females with an average age of 18 years.

Stimuli
The experimental stimuli used in this paper were all video clips from icourse163. We wanted to simulate a realistic learning situation, to keep the subjects' attention while studying, and to ensure that they needed to watch the video carefully to answer the questions rather than answering from knowledge they had already learned. Therefore, after recruiting the subjects, and based on their learning background described above, we first prepared a questionnaire to investigate students' familiarity with certain courses and sent it to each participant; the details of the questionnaire can be found in Supplement 1 (Table S1). Based on the questionnaire results, we chose courses such as polymer chemistry, in which most students were not interested and which they had not studied systematically. We selected 13 video clips from 9 courses, each taught in Mandarin by a professional teacher of the subject. Each clip lasted about 1 minute (average: 59.3 seconds) and introduced some conceptual knowledge points of the course by combining speech and a PowerPoint presentation; the topic of every clip is shown in Supplement 1 (Table S2). We also designed a total of 45 multiple-choice questions that required knowledge from the videos. Pre-video questions were also included in the experiment to check again how well the subjects knew the video content; if a subject already performed well on the pre-video questions, we removed the corresponding data. This design avoided the effect of prior knowledge on answering time and accuracy, as well as experimental error caused by differences in working memory load arising from subjects' different levels of familiarity with the knowledge and the questions.

Experimental design
The experimental flow chart is shown in Fig. 1(a), and the experimental procedure was implemented in MATLAB using Psychtoolbox (PTB). Subjects first completed a 30-second resting state, which acted as a resting control. Then, two 45-second Russian lecture videos were presented as a control with audio and visual stimuli but without knowledge understandable to the Chinese subjects. After a 30-second break, subjects answered questions about the content of the first video, which showed their familiarity with that content. Subsequently, the learning video was watched. When the video ended, subjects read the questions and options displayed on the screen and pressed the number key corresponding to the selected option. There was no time limit for answering, and subjects could respond according to their own answering habits; PTB recorded each participant's key presses and the time taken to answer the questions. Because some of the videos were related in content or belonged to the same course, the 13 videos were divided into 3 groups, with 4 videos in each of the first 2 groups and 5 videos in the third group. There was a 30-second break between groups and a short 2-second break between videos within a group. After all the videos and questions were finished, there was a 10-second break and the experiment ended.

Data acquisition
We used the multichannel fNIRS system OctaMon (Artinis, Netherlands) with two wavelengths of near-infrared light (760 and 850 nm) to measure changes in the oxygenated hemoglobin (O2Hb) and deoxygenated hemoglobin (HHb) signals from 8 channels at a sampling rate of 10 Hz. The fNIRS probe contained 8 optical fibers with eight emitters and two detectors, and the distance between emitters and detectors was 3 cm. We placed the fNIRS probe over the prefrontal cortex as shown in Fig. 1(b). We also measured the Montreal Neurological Institute (MNI) coordinates of the eight channels based on the MNI ICBM-152 head model and then entered the coordinates into the NIRS-SPM [22] toolbox for spatial alignment. The spatial configuration of the channels [23] (MNI coordinates, anatomical markers and percentage overlap) is provided in Table 1.

Behavioral data analysis
First, we calculated the correct rates of the pre-study and post-study answers of the 15 students using the information recorded by the PTB program, as shown in Fig. 2(a), and compared them with a paired t-test, the results of which are shown in Fig. 2(b). This indicates that the students improved their correct rate by carefully watching the learning videos and acquiring the corresponding knowledge during the experiment.

Data pre-processing
NIRS_KIT [24] was used for the data pre-processing in this paper. The pre-processing consisted of 3 steps: a first stage de-drift, followed by Temporal Derivative Distribution Repair [25] to enhance the signal quality, and finally band-pass filtering at 0.01-0.08 Hz with a 3rd-order Butterworth Infinite Impulse Response (IIR) filter. A comparison of the signals before and after pre-processing is shown in Fig. 2(c, f).
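As an illustration of the first pre-processing step, the following Python sketch removes a least-squares straight line from a single channel. It assumes that the "first stage de-drift" corresponds to first-order (linear) detrending, which the text does not state explicitly; it is not the NIRS_KIT implementation, and the function name `linear_detrend` and the synthetic drift values are hypothetical. The TDDR and band-pass steps are not reproduced here.

```python
import math

FS = 10.0  # sampling rate in Hz, as stated in the data acquisition section


def linear_detrend(signal, fs=FS):
    """Subtract the least-squares straight line from a 1-D signal."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    t = [i / fs for i in range(n)]
    t_mean = sum(t) / n
    y_mean = sum(signal) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, signal))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [yi - (slope * ti + intercept) for ti, yi in zip(t, signal)]


# Synthetic example (made-up values): a slow oscillation on top of a linear drift.
drift_only = [0.5 + 0.02 * (i / FS) for i in range(300)]
raw = [d + 0.1 * math.sin(2 * math.pi * 0.05 * (i / FS))
       for i, d in enumerate(drift_only)]

flat = linear_detrend(drift_only)  # a pure drift is removed entirely
clean = linear_detrend(raw)        # the fitted line is removed; residual has zero mean
```

In practice each of the 8 channels, and each of O2Hb and HHb, would be detrended separately before the later steps.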
Based on the answering data recorded by the experimental program, the correct rate of each participant was calculated. Universities generally require an average score of 70 or more (on a percentage scale) to award a diploma, so we placed participants with a correct rate greater than or equal to 0.7 in one group and those with a rate below 0.7 in another. The percentage of correct answers differed significantly between the two groups (unpaired t-test), as shown in Fig. 2(e).

General linear model analysis
In this paper, we used a general linear model (GLM) to analyze the O2Hb signal of the pre-processed fNIRS data. Since the answering time differed between participants, we built a design matrix for every participant based on the times recorded by PTB, as shown in Fig. 2(d), where the white blocks indicate the condition corresponding to each sampling point. In the figure, con1 indicates the rest condition; the data in this condition come from the resting state at the beginning of the experiment and do not include the rest periods between videos and answers. con2 indicates the video-watching condition, con3 denotes the question-answering condition, and constant denotes the comparison constant. The GLM was used to obtain β values for each of the eight channels for each participant in each condition, which represent the degree of channel activity in each condition compared to the constant. Paired t-tests on β were conducted to determine the mean differences in the degree of activity of each channel across conditions. We applied the Benjamini-Hochberg procedure to correct the p-values of the paired t-tests for the false discovery rate and present the corrected significance results in the results section.
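The false discovery rate correction described above can be sketched as follows. This is a minimal Python illustration of the standard Benjamini-Hochberg adjusted p-value computation, not the code used in the study; the function name `benjamini_hochberg` is hypothetical, and the eight p-values are made-up placeholders standing in for one paired comparison across the eight channels.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values, one per channel (not results from the paper).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
adj_p = benjamini_hochberg(raw_p)

# Channels (0-based indices) that remain significant at the 0.05 level.
significant = [ch for ch, p in enumerate(adj_p) if p < 0.05]
```

With these placeholder values, five channels have raw p-values below 0.05 but only the first two stay below 0.05 after correction, which shows why the corrected results can contain fewer significant channels than the uncorrected ones.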

Machine learning analysis
After conducting a general linear model analysis to compare prefrontal activity levels across states, we employed machine learning techniques to construct a classification model that identifies a student's learning state from fNIRS signals. This approach enables us to differentiate between states, evaluate the process of a student's online learning, and determine the student's learning state accurately.
We divided the pre-processed data from Section 2.6 into three states: the resting state data are the rest period at the beginning of the experiment, the learning state data are the 13 segments recorded while watching videos, and the answering state data are the corresponding part after each video segment. As the fNIRS signal lags the neural activity by 3-5 seconds, we removed the first 5 seconds of data from each segment of the three states to ensure a one-to-one correspondence between the fNIRS signal and the state.
Two types of data sets were then constructed for the three states. The first consisted of non-overlapping 1s data slices, in which the 10 sample points per second of O2Hb and HHb data from the 8 channels were sequentially concatenated into one row and the corresponding state label was appended, forming a 1 × 161 row vector per sample. The second consisted of 10s data slices with 5s overlap, where 7 features were extracted from each 10s slice for O2Hb and HHb respectively: mean, variance, skewness, kurtosis, maximum, minimum and peak-to-peak [26]. The 7 features from the 8 channels were sequentially concatenated into a row and labelled to form a 1 × 113 row vector.
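The construction of the two data sets can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' code: the function names, the column ordering inside each row, and the use of population moments for variance, skewness and kurtosis are all assumptions, and the input segment is synthetic. Each sample is assumed to hold 16 values (O2Hb and HHb for 8 channels) recorded at 10 Hz.

```python
import math

FS = 10       # sampling rate in Hz (from the text)
DELAY_S = 5   # seconds removed from the start of each segment (from the text)


def trim_delay(segment, fs=FS, delay_s=DELAY_S):
    """Drop the first delay_s seconds to allow for the haemodynamic delay."""
    return segment[fs * delay_s:]


def one_second_slices(segment, label, fs=FS):
    """Non-overlapping 1 s slices, each flattened to 10 x 16 values plus a label."""
    rows = []
    for start in range(0, len(segment) - fs + 1, fs):
        window = segment[start:start + fs]
        rows.append([v for sample in window for v in sample] + [label])
    return rows


def seven_features(values):
    """Mean, variance, skewness, kurtosis, maximum, minimum, peak-to-peak."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance (assumed)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 if m2 > 0 else 0.0
    hi, lo = max(values), min(values)
    return [mean, m2, skew, kurt, hi, lo, hi - lo]


def feature_slices(segment, label, fs=FS, win_s=10, step_s=5):
    """10 s windows with 5 s overlap, 7 features per signal column plus a label."""
    if not segment:
        return []
    win, step = win_s * fs, step_s * fs
    rows = []
    for start in range(0, len(segment) - win + 1, step):
        window = segment[start:start + win]
        feats = []
        for col in range(len(segment[0])):
            feats.extend(seven_features([sample[col] for sample in window]))
        rows.append(feats + [label])
    return rows


# Synthetic 60 s "learning" segment: 600 samples x 16 columns (made-up values).
segment = [[math.sin(0.01 * i * (c + 1)) for c in range(16)] for i in range(600)]
usable = trim_delay(segment)                         # 550 samples remain
short_rows = one_second_slices(usable, "learning")   # 1 x 161 rows
long_rows = feature_slices(usable, "learning")       # 1 x 113 rows
```

The row lengths follow directly from the description: 10 samples × 2 signals × 8 channels + 1 label = 161, and 7 features × 2 signals × 8 channels + 1 label = 113.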
Both datasets suffer from unbalanced sample sizes between categories. Therefore, we first split the original dataset into training and test sets in an 8:2 ratio, and then used the SMOTETomek [27] mixed sampling method to up-sample the categories with smaller sample sizes. At the same time, categories with larger sample sizes were down-sampled so that the sample sizes of the three categories were in a 1:1:1 ratio before the classification model was trained. The sample ratios before and after sampling for both methods are shown in Table 2. To train the models, we used the Statistics and Machine Learning Toolbox in MATLAB 2021a. The computations in this toolbox were performed on the CPU, and training was carried out on a computer with an i7-10700K CPU and 32 GB of RAM. All models used 10-fold cross-validation during the validation process, with the training and validation sets split 9:1, and were finally tested on the data from the test set. In addition, because the differences between rest, learning and answering obtained from the GLM analysis showed some symmetry between the left and right prefrontal lobes, we also used the fine Gaussian SVM and fine KNN approaches to classify the pre-processed 1s slice data separately for channels 1-4 and channels 5-8.
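The hold-out evaluation described above can be illustrated with a small Python sketch. It is a stand-in for, not a reproduction of, the MATLAB toolbox workflow: it shows an 8:2 split followed by a 1-nearest-neighbour classifier with Euclidean distance, and it omits the SMOTETomek resampling and the 10-fold cross-validation. Stratifying the split by class is an assumption, the function names are hypothetical, and the three clusters are synthetic data with deliberately unbalanced class sizes.

```python
import math
import random


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split labelled rows (features..., label) 8:2 within each class."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[-1], []).append(row)
    train, test = [], []
    for label_rows in by_label.values():
        shuffled = label_rows[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def predict_1nn(train, features):
    """Return the label of the training row closest in Euclidean distance."""
    nearest = min(train, key=lambda row: math.dist(row[:-1], features))
    return nearest[-1]


def accuracy(train, test):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, row[:-1]) == row[-1] for row in test)
    return hits / len(test)


# Synthetic, well-separated clusters with unbalanced class sizes (made-up data).
rng = random.Random(42)
centres = {"rest": 0.0, "learning": 5.0, "answering": 10.0}
counts = {"rest": 20, "learning": 100, "answering": 60}
rows = [
    [rng.gauss(centre, 0.5) for _ in range(4)] + [label]
    for label, centre in centres.items()
    for _ in range(counts[label])
]

train, test = stratified_split(rows)
test_accuracy = accuracy(train, test)
```

On real fNIRS slices the training portion would additionally be resampled to a 1:1:1 class ratio before fitting, as the text describes; the test portion is left untouched so that the reported accuracy reflects the original class distribution.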

General linear model results
Comparing the rest and learning conditions, as shown in Fig. 3(a) and Fig. 3(d), there was a greater change in the dorsolateral and orbital parts of the superior frontal gyrus. Most of the channels were less active when watching the study video than when resting. Comparing the rest and answering conditions, as shown in Fig. 3(b) and Fig. 3(e), activity was higher during answering than during rest, again with larger changes in the dorsolateral prefrontal cortex (DLPFC) and orbital part than at other sites, but no channel reached significance. Comparing the video-watching and answering conditions, as shown in Fig. 3(c) and Fig. 3(f), there were greater changes in the middle frontal gyrus; channel 1 (p = 0.0192) showed a significant difference, and all channels were more active when answering questions than when watching the learning video.
Fig. 3. GLM analysis results. (a-c) Comparisons of fNIRS channel activity for con1-con2, con1-con3 and con2-con3, respectively (paired t-test with false discovery rate correction, p < 0.05), with significant differences in channel activity in the con2-con3 comparison. The color bars in (d) to (i) represent the degree of difference between the two sets of data on each channel: a color leaning towards yellow indicates that the activity level of state 1 is higher than that of state 2, and a color leaning towards blue indicates that it is lower. (d-f) 3D brain models corresponding to the panels directly above them. (g-i) Differences in brain activity between the subgroups with correct rates below 0.7 and above 0.7: (g) at rest (con1), (h) watching the video (con2), (i) answering the questions (con3).
Across all subjects, the level of prefrontal activation generally increased in the order learning, resting, answering, with some symmetry between the left and right frontal lobes. It may therefore be possible to distinguish the learning state from blood oxygenation changes collected unilaterally over the dorsolateral and orbital parts of the superior frontal gyrus.
Analysis of the GLM β values for the two correct-rate groups yielded the results shown in Figs. 3(g)-3(i). In the resting state, the group with a correct rate below 0.7 was more active than the group with a correct rate above 0.7 in the left hemisphere. In the learning state, the group with a correct rate below 0.7 was more active than the group with a correct rate above 0.7. In the answering state, there was no significant difference between the two groups in the level of activity (at the 0.05 significance level).
Overall, the group with a correct rate below 0.7 was more active than the group with a correct rate above 0.7 in all 3 states, but the magnitude and range of the differences varied and the differences in β were not significant. The resting state showed a greater difference in activity in the left prefrontal lobe, the learning state showed a greater difference in all measured channels, and the question-answering state showed a smaller difference, with some channels in which the group with the higher correct rate was more active.

Classification results
In the Statistics and Machine Learning Toolbox, we compared five different machine learning models on the combined O2Hb and HHb data, on the O2Hb data, and on the HHb data, respectively; the validation and testing results of all models are shown in Tables 3 and 4. When the models were trained on 1s signal slice data, classification was better without PCA, and the fine KNN had the highest test accuracy, as shown by the bold numbers in Table 3. The fine KNN used the Euclidean distance metric with 1 neighbour and reached a test accuracy of 98.5% with a training time of 45.53 s. The fine SVM also performed well: it used a Gaussian kernel function with a kernel scale of 3.2 and reached a test accuracy of 97.1% with a training time of 197 s. The confusion matrices for these two models are shown in Fig. 4. Hyperparameter optimization was then applied to the best-performing KNN method; after 30 iterations the test accuracy was 99.3%, but the training time was longer, at 1.38 hours. When training with the 10s data features, PCA had to be added for further feature selection. The models for this dataset performed well in validation but poorly in testing: even the most accurate model, the fine Gaussian SVM, reached a test accuracy of only 80.4% (shown by the bold numbers in Table 4), considerably lower than the accuracy of 1s signal classification. The 1s slice data of channels 1-4 and channels 5-8 were also modelled separately, and the classification results show that both the SVM and KNN methods differentiate the states better for channels 5-8 than for channels 1-4. The validation and testing results are shown in Table 5.

Discussion
In this study, we explored the changes in prefrontal activity of university students in different states during online learning by analyzing fNIRS signals. We designed an experiment with MOOC course videos, compared prefrontal activity in three different conditions using the GLM, and classified the signals from the different states using machine learning algorithms. The results showed that prefrontal activity was more intense when answering questions than when resting or watching videos, whilst activity in the DLPFC was less intense during studying than during rest; short data slices of near-infrared signals recorded from students whilst studying could be used to differentiate between the learners' learning states.
The intensity of activity whilst watching the video and answering the questions differed from that at rest, in line with the findings of previous studies [18,28,29]. The activity intensity whilst watching the video was lower than that at rest. We suggest that this may be because the PFC regions of experimental interest belong to the default network, part of the task-negative network [30], and because the changing images in the learning video may have caused a decrease in blood flow in the medial PFC [31]. Many channels were more active in the question-answering state than when watching the learning video. One reason, we believe, is that the questions are presented as static pictures, so visual stimulation decreases dramatically. In addition, watching the video is a guided process of accepting information, which belongs to passive learning, whereas reading the question and reviewing the video content to answer it is a process of outputting information with active thinking, which belongs to scaffolding learning and requires more energy [32]. The activity level is therefore higher when answering questions than when watching videos, which is consistent with the view that an increase in working memory load causes an increase in mean hemodynamic activation [12]. From an emotional perspective, students had to learn new subjects and knowledge points when watching the study videos, which may be accompanied by emotions such as nervousness, and they may also feel anxious when they do not understand. When answering questions, by contrast, there is no time constraint, subjects can consider carefully, and a right or wrong answer has no effect on what follows, so their emotions are relatively relaxed. PFC activity measured by fNIRS is thus higher when answering questions, which is consistent with the finding that the absence of stress and negative feedback leads to a higher level of PFC activity [33].
There was no significant difference in the PFC fNIRS data in any of the three states between students whose correct rate was higher than 0.7 and those whose correct rate was lower than 0.7. However, Fig. 3 shows that in the learning state, students with a correct rate lower than 0.7 had higher PFC activity than those with a correct rate higher than 0.7, whereas in the question-answering state they had lower PFC activity. The correct rate reflects, to a certain extent, a student's ability to acquire and use new knowledge. Students with different learning abilities experience the same learning material differently: students with strong learning ability find it easy, while students with weak learning ability find it difficult and mobilize more resources when watching the MOOC videos. This may explain why students with a correct rate lower than 0.7 showed more PFC activity in the learning state. When answering questions, making a correct judgment requires more accurate recall of the video content, or the mobilization of more resources and decision-making strategies, which is a more complex behavior. This may explain why students with a correct rate higher than 0.7 showed more PFC activity than those with a correct rate lower than 0.7 when answering questions. However, the two groups in this study differed in size and both were small, so these conclusions need to be verified with data from more subjects.
In the selection of data for machine learning classification, both the 1 s preprocessed data segments and the features extracted from them gave a classification accuracy above 90%. A previous study used the average of 1 s of fNIRS data for right- or left-hand squeezing imagery guided by an acoustic stimulus [34], suggesting that 1 s of fNIRS data can reflect differences between behaviours. This study explored the differences among states through the GLM analysis. The level of activity clearly fluctuates across the different states, each of which lasts for a period of time. When creating the dataset, we excluded the initial 5 seconds of data from each state. Each 1 s slice corresponds to exactly one state, which makes it suitable as a sample input to the machine learning classification model.
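The dataset construction described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the sampling rate `FS_HZ`, the helper names `slice_state` and `knn_predict`, the choice of k, and the synthetic two-channel signals are all assumptions made for illustration. The sketch discards the first 5 s of each state, cuts the remainder into non-overlapping 1 s windows, uses the per-channel mean of each window as the feature vector, and classifies a new window with a plain k-nearest-neighbor vote.

```python
import math
from collections import Counter

FS_HZ = 10   # assumed sampling rate (hypothetical; not taken from the study)
SKIP_S = 5   # initial seconds of each state that are discarded
WIN_S = 1    # window length in seconds


def slice_state(signal, label, fs=FS_HZ, skip_s=SKIP_S, win_s=WIN_S):
    """Cut one state's recording into labelled, non-overlapping 1 s windows.

    signal: list of samples, each sample a list of per-channel values.
    Returns a list of (feature_vector, label) pairs, where the feature
    vector holds the per-channel mean over the window.
    """
    start = skip_s * fs          # first sample kept after the discarded lead-in
    step = win_s * fs            # samples per window
    samples = []
    for i in range(start, len(signal) - step + 1, step):
        window = signal[i:i + step]
        n_channels = len(window[0])
        means = [sum(row[c] for row in window) / step for c in range(n_channels)]
        samples.append((means, label))
    return samples


def knn_predict(train, query, k=1):
    """Majority vote among the k training samples closest to the query
    (Euclidean distance). k=1 is an illustrative default."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic two-channel recordings, 8 s per state, with different mean levels.
rest = [[0.0, 0.1] for _ in range(8 * FS_HZ)]
learn = [[-0.5, -0.4] for _ in range(8 * FS_HZ)]
answer = [[1.0, 1.2] for _ in range(8 * FS_HZ)]

dataset = (slice_state(rest, "rest")
           + slice_state(learn, "learning")
           + slice_state(answer, "answering"))

# Classify an unseen 1 s feature vector.
predicted_state = knn_predict(dataset, [0.9, 1.1])
```

Because every window lies entirely inside one state, each sample carries an unambiguous label. With real recordings, windows from the same participant are strongly correlated, so how the training and test sets are split will influence the reported accuracy.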
The experimental design of this study has high ecological validity and is very close to the real online learning situation of college students. When selecting the videos, we used a pre-questionnaire to identify knowledge areas unfamiliar to the participants. This ensured that the participants answered the questions on the basis of the knowledge introduced in the learning videos, and it allowed us to relate learning ability to answer accuracy. Sections introducing the basic concepts of each subject were excerpted from MOOC courses as the experimental material. All the videos contained clips of the teacher explaining in Mandarin and clips presenting only the PPT slides, and there was no limit on the time participants could take to answer the questions. This simulates the real online learning scene as closely as possible in the laboratory environment. Because each video clip is short, the overall experiment does not last too long, which avoids the effect of fatigue caused by prolonged study.
This study also has a few limitations. The number of participants was small. Although the numbers of male and female subjects were similar, the sample was not large enough to analyze the effect of gender on the results [35]. Some of the questions we designed required only recalling the concepts taught in the learning video, while others required calculations based on those concepts, so we are also interested in whether the pattern of prefrontal activity differs between the two kinds of questions. The experiment did not pay much attention to the students' emotions, which limits our analysis in the emotional dimension. In the future, we will improve our understanding of the subjects' learning emotions through questionnaires and other means. We demonstrated differences in brain activity between the states of online learning and established a classification model with high accuracy, but the results need further testing before they can be used to evaluate online education in real scenarios. The current 8-channel experimental equipment is still too complex compared with the brain-signal acquisition equipment used in studies of real learning environments, and the existing KNN model is based on data from students with similar learning backgrounds, learning abilities, and ages. How well the model would perform on data from learners with different study backgrounds is still unknown. Therefore, in future work, we will improve the experimental paradigm by adopting the classical disciplinary model for selecting the video material, making comparisons between disciplines, and paying attention to changes in the subjects' learning mood during the experiment. We will also adopt a multimodal fusion approach to obtain data from more brain regions and dimensions for an in-depth analysis of students' online learning, and we will collect data from more students to build a more reliable classification model. We focus on the changes in students' brain activity during online learning and are committed to developing an online learning evaluation system that evaluates the learning state in real time.

Conclusion
Online video courses can break through the limitations of time and space, and simulating a realistic online learning environment to study the changes in students' brain activity during learning is an emerging but significant field. Current research combines brain imaging modalities such as EEG and fNIRS with online learning from the perspectives of attention monitoring, learning ability evaluation, and so on. These studies have mainly analyzed data from the learning stage of watching videos and have neglected data from the question-answering stage. We therefore investigated the differences in PFC blood oxygen levels among three states: resting, watching videos, and answering questions. We found that the intensity of prefrontal activity differed between states. During the online learning of college students, the differences between the resting state and each of the learning and question-answering states were concentrated in the DLPFC, whereas the differences between video watching and question answering covered a larger area, including part of the middle frontal gyrus. KNN classified the signals of the three states with high accuracy. Our findings indicate that PFC activity in students increases in the order of watching videos, resting, and answering questions, and they establish a model that uses fNIRS data from the PFC to determine which learning state a student is in. Our work uses fNIRS signals to determine the learning state of students, so that their learning can be monitored from the perspective of brain activity. This can compensate for the shortcomings of outcome evaluations based only on behavioral data, which may not reflect the true learning level of students. Judging the true state of students in this way can help to improve the quality of online education and is conducive to process-based evaluation of online learning. This is very different from the current method of judging whether students are in a learning state from their attention level, and from current educational evaluation, which focuses on results. This work gives us a deeper understanding of brain activity under different behaviors in online learning. Our model can distinguish between the learning and answering states, and it could be developed further to distinguish whether students are in an active thinking or a passive receiving process. This would allow more comprehensive and personalized evaluations of students and would promote both the personal development of students and the development of online learning.
However, the data are limited and the subjects have similar learning backgrounds, all being college students, so the applicability of the conclusions in this paper is still limited. Differences between disciplines and genders have not yet been analyzed, and learning mood has not been investigated sufficiently. In the future, we will work in these directions to improve the model's evaluation ability.

Fig. 2. Changes in correct answer rates before and after learning (a, b), data preprocessing effects (c, f), grouping of students by overall correctness (d) and GLM design matrix for a selected student (e).(a) Correct answer rate for all participants (blue: before learning, purple: after learning), (b) Statistical analysis of correct answer rates before and after study, (c, f) Comparison of data before and after pre-processing, (d) GLM design matrix for one participant, (e) Correctness analysis grouped by 0.7 correctness.


Fig. 4. Confusion matrix for fine KNN and fine Gaussian SVM models, (a) fine Gaussian SVM Confusion matrix for validation, (b) fine Gaussian SVM Confusion matrix for testing, (c) fine KNN Confusion matrix for validation, (d) fine KNN Confusion matrix for testing
