A Concise Fuzzy Rule Base to Reason Student Performance Based on Rough-Fuzzy Approach

A fuzzy inference system (FIS) that employs fuzzy if-then rules is able to model the qualitative aspects of human expertise and reasoning processes without employing precise quantitative analyses. This matters because much of the information acquired from human experts is uncertain, inconsistent, vague and incomplete (Khoo and Zhai, 2001; Tsaganou et al., 2002; San Pedro and Burstein, 2003; Yang et al., 2005). The drawbacks of an FIS are that considerable trial-and-error effort is needed to define the best-fitting membership functions (Taylan and Karagozoglu, 2009) and that no standard method exists for transforming human knowledge or experience into the rule base (Jang, 1993).


Fig. 1. The proposed Fuzzy Inference System
This chapter is divided into six sections. Section 1 is the introduction and the problem statement. Section 2 discusses student modeling and the learning criteria. Section 3 presents the Human Expert Fuzzy Inference System model, which defines the data representation and the rule base acquired from the human experts. Section 4 describes the ANFIS approach used to form a complete fuzzy rule base that addresses the incomplete and vague decisions made by humans. Section 5 presents the proposed Rough-Fuzzy approach, which determines the important attributes and refines the fuzzy rule base into a concise fuzzy rule base. Finally, Section 6 presents the conclusions of the work.

Student modeling and the learning criteria
A student model represents knowledge about the student's behavior and learning performance. In this work, a student's performance is classified into three categories: Has Mastered (HM), Moderately Mastered (MM), and Not Mastered (NM). The decision made about the student's performance also depends on the criteria set by the human expert. There are four input conditions, namely score (S), time (T), attempts (A), and helps (H), each of which is represented by three term sets with values (Norazah, 2005).
a. Score (S) is the average score, $x_1$, gained from the questions of a learning unit; its term sets are low ($S_1$), moderate ($S_2$), and high ($S_3$). It is found by dividing the total marks for a set of given questions by the total number of questions ($Q$) in the set, as shown in equation (1).

$$x_1 = \frac{1}{Q}\sum_{i=1}^{Q} m_i \qquad (1)$$
where $m_i$ is the marks obtained for the i-th question and $Q$ is the total number of questions in the set.
b. Time (T) is the average duration, $x_2$, taken by a student to answer each question of a learning unit, with three term sets: fast ($T_1$), average ($T_2$), and slow ($T_3$). The average time ($x_2$) is obtained by dividing the total time taken to answer a set of given questions by the total number of questions, as shown in equation (2).
$$x_2 = \frac{1}{Q}\sum_{i=1}^{Q} t_i \qquad (2)$$
where $Q$ is the total number of questions and $t_i$ is the time spent to answer the i-th question. Time is measured using the distribution method. Fig. 2 shows the T-score distribution, in which the mean is 50 and the standard deviation is 10.
$$t_i = \frac{1}{100}\left(50 + 10\,\frac{\tau_i - \bar{\tau}_i}{s_i}\right)$$
where $t_i$ is the time spent to answer the i-th question, $\tau_i$ is the time spent by the student, $\bar{\tau}_i$ is the mean of the time-spent distribution, and $s_i$ is the standard deviation for the i-th question. The constant 10 scales the distance from the mean, measured in standard deviations, while the constant 50 is the value of the mean. The T-score is divided by 100 to obtain a value in the range 0 to 1.
c. Attempt (A) is the average number of tries, $x_3$, for a given learning unit. Tries are counted after the student gives a wrong answer on the first attempt; the question is then repeated until the student answers correctly. The term sets are: a few ($A_1$), average ($A_2$), and many ($A_3$). The average number of attempts ($x_3$) is calculated as in equation (4), by dividing the total number of tries to answer a set of given questions by the total number of questions in the set.
$$x_3 = \frac{1}{Q}\sum_{i=1}^{Q} a_i \qquad (4)$$
where $Q$ is the total number of questions and $a_i$ is the number of tries for the i-th question.
d. Help (H) is the average amount of help, $x_4$, accessed by a student. It is calculated as in equation (6), by dividing the total amount of help accessed by a student in answering a set of given questions by the total number of questions in the set.
$$x_4 = \frac{1}{Q}\sum_{i=1}^{Q} h_i \qquad (6)$$
where $Q$ is the total number of questions and $h_i$ is the amount of help accessed by a student for the i-th question. The amount of help ($h_i$) is found by dividing the number of help links that the student accessed while answering a given question by the maximum number of help links provided for that question.
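The four input averages described in items a to d can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function and argument names are hypothetical, the time input is assumed to be the average of the per-question normalised T-scores, and the sample values are made up.

```python
def normalised_t_score(time_spent, mean_time, std_time):
    """T-score (mean 50, SD 10) divided by 100, as described for the time input."""
    return (50 + 10 * (time_spent - mean_time) / std_time) / 100


def student_inputs(marks, t_scores, tries, help_used, help_available):
    """Return (x1, x2, x3, x4), the per-question averages for one learning unit.

    marks          -- marks obtained for each question
    t_scores       -- normalised T-score of the time spent on each question
    tries          -- number of tries counted for each question
    help_used      -- number of help links accessed for each question
    help_available -- maximum number of help links offered for each question
    """
    q = len(marks)
    x1 = sum(marks) / q                      # average score
    x2 = sum(t_scores) / q                   # average (normalised) time
    x3 = sum(tries) / q                      # average number of tries
    # Per-question help ratio: links accessed / links provided.
    x4 = sum(u / n for u, n in zip(help_used, help_available)) / q
    return x1, x2, x3, x4


# Illustrative values only (not data from the study).
x1, x2, x3, x4 = student_inputs(
    marks=[1.0, 0.5, 1.0, 0.5],
    t_scores=[0.5, 0.6, 0.4, 0.5],
    tries=[0, 1, 2, 1],
    help_used=[1, 0, 2, 1],
    help_available=[2, 2, 2, 2],
)
```

Note that the normalised T-score only stays within 0 to 1 for times within five standard deviations of the mean, which covers practically all observations.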
The output consequent of the student model is the student's performance, which is represented as has mastered ($P_1$), moderately mastered ($P_2$) and not mastered ($P_3$). A student is classified as has mastered in a particular learning unit when the student earns a high score (i.e. greater than 75%) while spending less than 40% of the time and needing no more than 25% of the number of tries and helps. A student is classified as moderately mastered when the student earns a moderate score with moderate time spent, more than one try, and some help needed. For example, a moderate score is rated between 35% and 75%, time spent between 40% and 60%, tries between 25% and 75%, and help between 25% and 75%. A student is classified as not mastered when the student has a low score and needs a lot of time, many tries and much help. However, a problem in acquiring knowledge from human experts is that they cannot decide on all possible student learning performances. Based on a survey by Norazah (2005), only 18 decisions about the student's behavior were formed with certainty by seven subject matter experts; these decisions are considered the acceptable rules. All other decisions, which are uncertain or conflicting, are discarded from the rules.

Human expert Fuzzy Inference System
The human expert FIS uses a collection of fuzzy membership functions and rules to reason about a student's performance. The FIS consists of a fuzzification interface, a rule base, a database, a decision-making unit, and a defuzzification interface.
To compute the output of this fuzzy inference system given the inputs, four steps have to be followed (Norazah, 2005):
a. Compare the input variables with the membership functions on the antecedent part to obtain the membership values of each linguistic label. This step is called fuzzification.
b. Combine the membership values on the premise part to get the firing strength of each rule.
c. Generate the qualified consequents of each rule depending on the firing strength.
d. Aggregate the qualified consequents to produce a crisp output. This step is called defuzzification.

Fuzzification
In the fuzzification stage, the inputs and output of the fuzzy inference system are determined. The system has four input variables, each of which consists of three term values and labels, as discussed in Section 2. The fuzzy output follows zero-order Sugeno-style inference, in which the output value of each fuzzy rule is a constant (Sivanandam et al., 2007). Fig. 3 shows the four inputs and the single output of the Human Expert FIS.

Creating fuzzy rules
Fuzzy rules are a collection of linguistic statements that describe how the fuzzy inference system should make a decision when classifying an input or controlling an output. Fig. 5 presents the four-input, one-output reasoning procedure for the student's performance using a zero-order Sugeno fuzzy model. Each input has its own membership functions.

Fig. 5. Fuzzy reasoning procedures for Human Expert FIS model of Student's Performance
Each rule $R_i$ has four input variables and one output variable, as shown below:
$$R_i:\ \text{IF } x_1 \text{ is } \mu_i^1 \text{ AND } x_2 \text{ is } \mu_i^2 \text{ AND } x_3 \text{ is } \mu_i^3 \text{ AND } x_4 \text{ is } \mu_i^4 \text{ THEN } y \text{ is } w_i$$
where $R_i$ is the i-th rule in the fuzzy rule base, $\mu_i^j$ is the membership function of the antecedent part of the i-th rule for the j-th input variable, and $w_i$ is the weight of the consequent of the rule. For example, input 1 is the score, whose membership functions classify it as low, moderate or high. A typical rule reads: if score is high and time is fast and attempt is a few and help is little, then student performance is has mastered. The process of taking an input such as the score and passing it through the membership functions to determine the degree to which it is "high" is called fuzzification. Based on the human experts' experience and knowledge about the students' performance, 18 initial rules that are certain have been constructed, as shown in Table 4.

Combining outputs into an output distribution
The outputs of all of the fuzzy rules must now be combined to obtain one fuzzy output distribution. The output membership functions on the right-hand side of the figure are combined using the fuzzy operator AND to obtain the output distribution shown in the lower right corner of Fig. 5. For a zero-order Sugeno model, the output level $z$ is a constant. The output level $z_i$ of each rule is weighted by the firing strength $w_i$ of the rule (Lin and Lu, 1996). For example, for an AND rule with input 1 = $x$ and input 2 = $y$, the firing strength is as shown in equation (9).
$$w_i = \mathrm{AndMethod}\left(F_1(x), F_2(y)\right) \qquad (9)$$
where $F_1$ and $F_2$ are the membership functions for inputs 1 and 2.

Table 4. Initial fuzzy rules determined by human experts

Defuzzification of output distribution
The input to the defuzzification process is a fuzzy set and the output is a single number: crispness recovered from fuzziness. Given a fuzzy set that encompasses a range of output values, we need to return one number, thereby moving from a fuzzy set to a crisp output. The final output of the system is the weighted average of all rule outputs, computed as in equation (10).
$$\text{Final output} = \frac{\sum_{i=1}^{N} w_i z_i}{\sum_{i=1}^{N} w_i} \qquad (10)$$
where $N$ is the number of rules.
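The firing-strength and weighted-average steps can be sketched as below. This is a hedged illustration, not the chapter's implementation: the Gaussian membership shape, the use of min as the AND method, the function names and every numeric parameter are assumptions introduced for the example.

```python
import math


def gaussian(x, centre, width):
    """Gaussian membership degree of x (an assumed shape for illustration)."""
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))


def sugeno_zero_order(inputs, rules):
    """Weighted-average output of a zero-order Sugeno system.

    inputs -- dict mapping input name -> crisp value
    rules  -- list of (antecedent, z) pairs, where antecedent maps an
              input name -> (centre, width) and z is the constant consequent
    """
    weighted_sum = 0.0
    strength_sum = 0.0
    for antecedent, z in rules:
        # Firing strength: AND of the antecedent memberships (min assumed).
        w = min(gaussian(inputs[name], c, s)
                for name, (c, s) in antecedent.items())
        weighted_sum += w * z
        strength_sum += w
    if strength_sum == 0.0:
        raise ValueError("no rule fired for these inputs")
    return weighted_sum / strength_sum


# Two illustrative rules on inputs scaled to [0, 1]; all numbers are made up.
rules = [
    ({"score": (1.0, 0.2), "time": (0.0, 0.2)}, 1.0),  # high score AND fast
    ({"score": (0.0, 0.2), "time": (1.0, 0.2)}, 0.0),  # low score AND slow
]
halfway = sugeno_zero_order({"score": 0.5, "time": 0.5}, rules)
```

With an input exactly halfway between the two rules, both fire equally and the crisp output is the midpoint of the two constants.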
Finally, all the outputs of datasets for reasoning of the student's performance in the human expert FIS have been recorded.
The next section describes the ANFIS approach used to form a complete fuzzy rule base that addresses the incomplete and vague decisions made by humans.

Development of Adaptive Neuro-Fuzzy Inference System (ANFIS)
Fuzzy rules and fuzzy reasoning are the backbone of fuzzy inference systems, which are the most important modeling tools based on fuzzy sets (Jang et al., 1997). Fuzzy reasoning is an inference procedure that derives conclusions from a set of fuzzy if-then rules and known facts. The ANFIS model is proposed to form a complete fuzzy rule base, so that all possible input conditions of the fuzzy rules are generated.
It is necessary to take into consideration the scarcity of data and the style of input space partitioning. For example, for a single-input problem, usually 10 data points are necessary to come up with a good model (Jang et al., 1997). Details of the ANFIS model structure are described in Section 4.1.

ANFIS model structure
The ANFIS model structure consists of four nodes in the input layer, the nodes of the hidden layers, and one node in the output layer, as presented in Fig. 6. The input layer represents the antecedent part of the fuzzy rule, which is the student's learning behavior: the scores (S) earned, the time (T) spent, the attempts (A), and the helps (H). The output layer represents the consequent part of the rule, i.e. the student's performance (P). The size of the hidden layer is determined experimentally.
In this work, the ANFIS model is trained with the 18 fuzzy rules obtained from the human experts. These rules are considered certain. After that, 81 potential fuzzy rules, which represent the 3 × 3 × 3 × 3 rule antecedents, are used for testing the network.

Layer 1 is the input layer. Its neurons pass the crisp inputs S, T, A, and H to Layer 2.

Layer 2 is the fuzzification layer. Neurons in this layer perform fuzzification. In this student model, the fuzzification neurons have a Gaussian activation function, specified as:
$$y_i^{(2)} = \exp\left(-\frac{\left(x_i^{(2)} - c_i\right)^2}{2\sigma_i^2}\right)$$
where $x_i^{(2)}$ is the input and $y_i^{(2)}$ is the output of neuron i in Layer 2, $c_i$ represents the membership function's center, and $\sigma_i$ determines the membership function's width.

Layer 3 is the rule layer. Each neuron in this layer corresponds to a single Sugeno-type fuzzy rule. A rule neuron receives inputs from the respective fuzzification neurons and calculates the firing strength of the rule it represents. In an ANFIS, the conjunction of the rule antecedents is evaluated by the product operator (Negnevitsky, 2005). Each node output represents the firing strength of a rule. Thus, the output of neuron i in Layer 3 is obtained as
$$y_i^{(3)} = \prod_{j=1}^{k} x_{ji}^{(3)}$$
where $x_{ji}^{(3)}$ are the inputs of rule neuron i from its $k$ fuzzification neurons.

Layer 4 is the normalization layer. Each neuron in this layer receives inputs from all neurons in the rule layer and calculates the normalized firing strength of a given rule. The normalized firing strength is the ratio of the firing strength of a given rule to the sum of the firing strengths of all rules. It represents the contribution of a given rule to the final result.

Layer 5 is the defuzzification layer. Each neuron in this layer is connected to the respective normalization neuron and also receives the initial inputs S, T, A, and H. A defuzzification neuron calculates the weighted consequent value of a given rule as
$$y_i^{(5)} = x_i^{(5)}\left[k_{i0} + k_{i1}S + k_{i2}T + k_{i3}A + k_{i4}H\right]$$
where $x_i^{(5)}$ is the output of Layer 4 (the input of defuzzification neuron i), $y_i^{(5)}$ is the output of defuzzification neuron i in Layer 5, and $\{k_{i0}, k_{i1}, k_{i2}, k_{i3}, k_{i4}\}$ is the set of consequent parameters of rule i.

Layer 6 is represented by a single summation neuron. This neuron calculates the sum of the outputs of all defuzzification neurons and produces the overall ANFIS output ($y$).
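The layer-by-layer computation can be sketched as a single forward pass. This is an untrained, minimal illustration under stated assumptions: the function name, the grid of three Gaussian terms per input, the factor of 2 in the Gaussian exponent, and all parameter values are hypothetical, and no learning step is shown.

```python
import math
from itertools import product


def anfis_forward(x, centres, widths, consequents):
    """One forward pass through Layers 2 to 6 for inputs x = [S, T, A, H].

    centres, widths -- for each input, a list with one value per term
    consequents     -- dict mapping a rule (tuple of term indices, one per
                       input) to its parameters [k0, k1, k2, k3, k4]
    """
    # Layer 2: Gaussian membership degree of every input in every term.
    mu = [[math.exp(-((xj - c) ** 2) / (2 * s ** 2)) for c, s in zip(cs, ss)]
          for xj, cs, ss in zip(x, centres, widths)]

    # Layer 3: one rule neuron per combination of terms; product as AND.
    rules = list(product(range(len(centres[0])), repeat=len(x)))
    strengths = [math.prod(mu[j][term] for j, term in enumerate(rule))
                 for rule in rules]

    # Layer 4: normalise each firing strength by the sum over all rules.
    total = sum(strengths)
    normalised = [w / total for w in strengths]

    # Layer 5: weight each rule's linear consequent by its normalised strength.
    weighted = []
    for rule, n in zip(rules, normalised):
        k = consequents[rule]
        weighted.append(n * (k[0] + sum(kj * xj for kj, xj in zip(k[1:], x))))

    # Layer 6: the overall output is the sum of the weighted consequents.
    return sum(weighted)


# Illustrative parameters only: three terms per input on a [0, 1] scale and
# the same constant consequent for all 81 rules.
centres = [[0.0, 0.5, 1.0]] * 4
widths = [[0.2, 0.2, 0.2]] * 4
consequents = {rule: [0.7, 0.0, 0.0, 0.0, 0.0]
               for rule in product(range(3), repeat=4)}
y = anfis_forward([0.8, 0.3, 0.1, 0.2], centres, widths, consequents)
```

Because the normalised strengths sum to one, giving every rule the same constant consequent returns that constant, which is a convenient sanity check on the normalisation layer.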

Training with different training datasets
Preparing the input patterns for training the ANFIS involves converting the linguistic terms of the fuzzy rules into numeric values. Initially, there are 44 certain and consistent rules, which are obtained from the human experts. The total number of input patterns in the training dataset is 44 rather than 18 because the 'x' symbol used in rule 18 in Table 4 must be expanded into all possible linguistic terms for the respective antecedents.
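The enumeration of antecedents and the expansion of the 'x' symbol can be sketched as follows. The numeric coding 1, 2, 3 for the three terms of each input and the function names are hypothetical choices for illustration; the rule shown with three wildcards is an assumed example, not a transcription of rule 18 from Table 4.

```python
from itertools import product

WILDCARD = "x"  # stands for "any linguistic term" in a rule antecedent


def all_antecedents(n_inputs=4, n_terms=3):
    """Every combination of term codes: 3 x 3 x 3 x 3 = 81 for this model."""
    return list(product(range(1, n_terms + 1), repeat=n_inputs))


def expand_rule(antecedent, n_terms=3):
    """Replace each wildcard by all term codes and return the resulting patterns."""
    choices = [range(1, n_terms + 1) if value == WILDCARD else (value,)
               for value in antecedent]
    return list(product(*choices))


patterns = all_antecedents()
expanded = expand_rule((1, WILDCARD, WILDCARD, WILDCARD))
```

Under the assumption that one rule carries wildcards in three antecedents, it expands into 27 patterns, which together with 17 fully specified rules would give the 44 training patterns mentioned above.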
The training dataset is enlarged step by step until the ANFIS model provides the best result and is reasonably able to classify all of the student performances. Because of the problem of insufficient training data, increments of 10 training patterns were proposed. Therefore, besides the ANFIS model trained with 44 input patterns, this research also proposes ANFIS models trained with 54, 64 and 69 patterns.
To determine the best ANFIS model, ten tests were carried out for each model and their mean square errors (MSE) were calculated. The error is the difference between the output value in the training data and the ANFIS output for the same training data input. The ANFIS model with the lowest mean square error is chosen for the next experiment.
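The model-selection step can be sketched as below. The function names are hypothetical, averaging the MSE over the repeated tests is an assumed selection rule, and the numbers in the example are invented purely to exercise the code; they are not results from the study.

```python
def mean_square_error(targets, outputs):
    """Average squared difference between target and model outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def best_model(mse_per_model):
    """Return the model name whose tests have the lowest average MSE.

    mse_per_model -- dict mapping model name -> list of MSE values,
                     one per repeated test
    """
    return min(mse_per_model,
               key=lambda name: sum(mse_per_model[name]) / len(mse_per_model[name]))


# Made-up MSE values for two hypothetical models, three tests each.
example = {
    "model_a": [0.20, 0.25, 0.15],
    "model_b": [0.05, 0.10, 0.03],
}
chosen = best_model(example)
```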

Results and discussion on ANFIS
This section presents the testing results of the four ANFIS models selected from the previous experiment, each of which is tested on the 81 input data patterns. The results are plotted as line graphs to compare the ANFIS outputs based on 44, 54, 64 and 69 training datasets, respectively, with the testing data. Fig. 7 shows the comparison between the ANFIS outputs based on 44 training datasets and the testing data: 69.14% of the input patterns are classified successfully and 30.86% are misclassified.
Fig. 8 shows the comparison between the ANFIS outputs based on 54 training datasets and the testing data. From the graph, 85.19% of the input patterns are classified successfully and 14.81% are misclassified. The training dataset therefore needs to be enlarged further to achieve a better result.

Fig. 8. Comparison between ANFIS outputs based on 54 training datasets and testing data
After increasing the training data from 54 to 64 patterns, the results improve. Fig. 9 shows the comparison between the outputs of the ANFIS model based on 64 training datasets and the outputs of the checking data. The trained ANFIS classifies 96.3% of the patterns successfully. However, 3.7% of the decisions are still illogical.
Thus, another experiment was carried out using the 69 training datasets, and the ANFIS is finally able to classify all 81 input patterns successfully, as can be seen in Fig. 10. From the graph, the two sets of outputs are the same, so the ANFIS model can classify the student performance correctly under all possible conditions.

Rough-fuzzy approach
The ANFIS approach described in Section 4 successfully forms a complete fuzzy rule base that is able to solve the problem of incomplete and vague decisions made by humans. However, not all of the generated rules are significant, so it is important to extract only the most significant rules in order to improve the classification accuracy. In this work, we propose a Rough-Fuzzy approach to refine the complete fuzzy rule base into a concise fuzzy rule base (see Fig. 11).

Fig. 11. The rough-fuzzy approach to constructing concise fuzzy rules

Rough fuzzy phases
The three main phases in the rough-fuzzy approach are data pre-processing, reduct computation and data post-processing, as shown in Fig. 11 and described as follows:

Phase 1. Data pre-processing
In this phase, the complete fuzzy rules are converted from linguistic terms into numeric values that correspond to the rough set format.

Phase 2. Reduct computation
This phase consists of mapping the fuzzy rules into a decision system format, discretisation of data, computation of reducts from data, and derivation of rules from reducts.
a. In this problem, the fuzzy rules are mapped as rows, while the antecedents and the consequents of the rules are mapped into columns. In the rough set decision table, the antecedents and consequents of the fuzzy rules are labelled as condition and decision attributes, respectively.
b. Discretisation refers to the process of arranging the attribute values into groups of similar values. It involves the transformation of the fuzzy linguistic descriptions of the conditions and the decision attributes into numerical values. In this study, a conversion scheme is formulated to transform the conditions and decisions of fuzzy linguistic values into numerical representations.
c. Computation of reducts. The reduct computation stage determines the selection of the important attributes that can be used to represent the decision system (Carlin et al., 1998). It is used to reduce the decision system, thus generating more concise rules. The rough set approach employs two important concepts related to reduction: one is related to the reduction of rows, and the other to the reduction of columns (Chen, 1999). With the notion of an indiscernibility class, rows with certain properties are grouped together, while with the notion of dispensable attributes, columns with less important attributes are removed. Another essential concept in reduct computation is that of the lower and upper approximations: the computation involved in the lower approximation produces rules that are certain, while the computation involved in the upper approximation produces possible rules (Øhrn, 2001).
d. Rule generation. A reduct is converted into a rule by binding the condition attribute values of the object class from which the reduct originated to the corresponding attributes.
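The concepts used in Phase 2 (indiscernibility classes, lower and upper approximations, and reducts) can be sketched on a toy decision table. This is only an illustration under stated assumptions: it searches attribute subsets exhaustively for the smallest ones that keep the whole table consistent, which is a simpler technique than the genetic algorithm with object-related reducts used in Rosetta, and the table values and function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def indiscernibility_classes(table, attributes):
    """Group row indices that share the same values on the given attributes."""
    classes = defaultdict(list)
    for index, row in enumerate(table):
        classes[tuple(row[a] for a in attributes)].append(index)
    return list(classes.values())


def approximations(table, attributes, decision, value):
    """Lower and upper approximation of the rows whose decision equals value."""
    target = {i for i, row in enumerate(table) if row[decision] == value}
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        members = set(cls)
        if members <= target:   # class lies entirely inside the target: certain
            lower |= members
        if members & target:    # class overlaps the target: possible
            upper |= members
    return lower, upper


def is_consistent(table, attributes, decision):
    """True if rows that agree on the attributes always agree on the decision."""
    return all(len({table[i][decision] for i in cls}) == 1
               for cls in indiscernibility_classes(table, attributes))


def minimal_reducts(table, conditions, decision):
    """Smallest attribute subsets that preserve consistency (brute force)."""
    for size in range(1, len(conditions) + 1):
        found = [set(subset) for subset in combinations(conditions, size)
                 if is_consistent(table, subset, decision)]
        if found:
            return found
    return []


# Toy decision table with coded terms; the values are illustrative only.
table = [
    {"S": 1, "T": 1, "A": 1, "P": "NM"},
    {"S": 1, "T": 2, "A": 1, "P": "NM"},
    {"S": 2, "T": 1, "A": 1, "P": "MM"},
    {"S": 2, "T": 2, "A": 2, "P": "MM"},
    {"S": 3, "T": 1, "A": 1, "P": "HM"},
    {"S": 3, "T": 1, "A": 2, "P": "MM"},
]
reducts = minimal_reducts(table, ["S", "T", "A"], "P")
lower, upper = approximations(table, ["S"], "P", "MM")
```

In this toy table the score alone cannot separate the last two rows, so the lower and upper approximations of the MM class differ when only S is used, while the pair S and A is enough to decide every row.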

Phase 3. Data post-processing
The rules in rough set format are converted into linguistic terms of the concise fuzzy rule base.

Rough fuzzy experiment
Section 4 produced 81 data patterns that represent every possible combination of values of the fuzzy rules with full certainty. This dataset was used for the development of the ANFIS model. Using Rosetta as the rough set tool, reducts are computed with the genetic algorithm for object-related reducts (Øhrn, 2001). This method implements a genetic algorithm for computing minimal hitting sets, as described by Vinterbo and Øhrn (2000). Using rough sets, the fuzzy rules were trained incrementally with the different training datasets of 44, 54, 64 and 69 input data patterns described in Section 4. The purpose of iterating over the different ANFIS input patterns is to ensure that the decisions agree with those of the human experts. To determine whether the performance of the concise fuzzy rule base is consistent with that of the complete fuzzy rule base, the rule bases for each set of input patterns are compared. Table 7 shows that the decision outputs given by the two rule bases differ very little for each set of input patterns (in terms of mean square error). This result confirms that the concise fuzzy rule base does not degrade the performance of the complete fuzzy rule base.
Table 8 shows the four object-related reducts generated by Rosetta for the ANFIS with 69 input patterns. All reducts have 100% support, which means that all objects are mapped deterministically into a decision class. In other words, the support for the decision rule is the probability of an object being covered by the description that belongs to the class (Grzymala-Busse, 1991). Rules generated from reducts are representative rules extracted from the dataset. Since a reduct is not unique, rule sets generated from different reducts contain different sets of rules, as shown in Table 9. For example, the reduct {Score, Attempt} from Table 8 is represented by three rules in Table 9, namely $R_3$, $R_4$, and $R_5$. A distinctive feature of the rough set method is that it generates the rules that play an important role in predicting the output. Table 10 lists the rule generation analysis by Rosetta and provides statistics for the rules: support, accuracy, coverage and length. The rule coverage and accuracy are measured to determine the reliability of the rules. The rule statistics are defined below (Bose, 2006).

Rule set
a. The rule support is defined as the number of records in the training data that fully exhibit the property described by the IF condition.
b. The rule accuracy is defined as the RHS support divided by the LHS support.
c. The conditional coverage is the fraction of the records that satisfy the IF conditions of the rule. It is obtained by dividing the support of the rule by the total number of records in the training sample.
d. The decision coverage is the fraction of the training records that satisfy the THEN conditions. It is obtained by dividing the support of the rule by the number of records in the training sample that satisfy the THEN condition.
e. The rule length is defined as the number of conditional elements in the IF part.

Coverage gives a measure of how well the objects describe the decision class. The conditional coverage is measured by the ratio of the number of rules that fulfil the conditional part of the rule to the overall number of rules in the sample. The decision coverage is measured by the ratio of the number of rules that give the decision of the rule to the overall number of rules in the sample with that decision. Accuracy gives a measure of how trustworthy the rule is in the condition. It is the probability that an arbitrary object belonging to Class C is covered by the description of the reduct (Grzymala-Busse, 1991). According to Pawlak (1998), an accuracy value of 1 indicates that the classes have been classified into decision classes with full certainty and consistency.

For example, there are 27 objects that fulfil the conditional part of the rule R 1 , compared with the overall 81 rules in the sample. Therefore, the conditional coverage of this rule is about 0.3333. In addition, the decision for the performance and learning efficiency with the value of not mastered is used once in the fuzzy rule base and it is only given to rule R 1 . Therefore, the decision coverage for this rule is 1. Finally, the accuracy value of this rule is 1, which means that this rule belongs to Class C 1 and is covered. Thus, it is said to have full certainty and is consistent. In conclusion, because all of the rules in Table 10 have accuracy values of 1, the concise fuzzy rules are reliable because they are covered, have full certainty, and are consistent.
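The rule statistics and the $R_1$ example can be reproduced with a short sketch. The function name, the coded table, and the form of the rule (score coded 1 implies not mastered) are hypothetical, chosen only so that 27 of the 81 objects match the IF part as in the example; the actual rule is listed in Table 9.

```python
from itertools import product


def rule_statistics(records, condition, decision):
    """Support, accuracy, coverage and length of one IF-THEN rule.

    records   -- list of dicts, one per object in the decision table
    condition -- dict attribute -> value describing the IF part
    decision  -- (attribute, value) pair describing the THEN part
    """
    d_attr, d_value = decision
    lhs = [r for r in records
           if all(r[a] == v for a, v in condition.items())]
    rhs = [r for r in lhs if r[d_attr] == d_value]
    with_decision = [r for r in records if r[d_attr] == d_value]
    return {
        "lhs_support": len(lhs),   # objects matching the IF part
        "rhs_support": len(rhs),   # objects matching IF and THEN
        "accuracy": len(rhs) / len(lhs) if lhs else 0.0,
        "conditional_coverage": len(lhs) / len(records),
        "decision_coverage": len(rhs) / len(with_decision) if with_decision else 0.0,
        "length": len(condition),
    }


# 81 coded objects; the decision column is a made-up stand-in.
records = [
    {"S": s, "T": t, "A": a, "H": h, "P": "NM" if s == 1 else "other"}
    for s, t, a, h in product((1, 2, 3), repeat=4)
]
stats = rule_statistics(records, {"S": 1}, ("P", "NM"))
```

Under these assumptions the rule matches 27 objects, giving a conditional coverage of 27/81, a decision coverage of 1 and an accuracy of 1, in line with the values quoted for $R_1$.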

Conclusion
In this study, fuzzy inference models provide an efficient way to reason about a student's learning achievement in a quantitative way. A complete fuzzy rule base is formed using the ANFIS approach, in which all possible input conditions of the fuzzy rules are generated in addition to the 18 human experts' rules that are considered certain. By training the neural network with the 18 selected conditions that are certain, the ANFIS is able to recognize other decisions that were previously incomplete, in both the antecedent and consequent parts of the fuzzy rules. However, some of the decisions are found to be misclassified and inconsistent. In addition, the number of fuzzy rules formed is directly related to the number of fuzzy term sets defined in the antecedents. As the number of fuzzy term sets increases, the number of fuzzy rules also increases, which affects the computation time and space. Moreover, when there are too many rules, some of them may not be significant. Therefore, this work proposes the Rough-Fuzzy approach, which is able to reduce the complete fuzzy rule base into a concise fuzzy rule base. This approach is able to select the important attributes that can be used to represent the fuzzy rule base system. The condition space is therefore reduced by taking only a few conditions, to achieve a condition subspace of reasonable size. Finally, the proposed concise fuzzy rule base is said to be reliable because it is covered, is consistent, and has full certainty.