Predicting Performance of Schools by Applying Data Mining Techniques on Public Examination Results

This study presents a systematic analysis of various features of the higher grade school public examination results data in the state of Tamil Nadu, India, using different data mining classification algorithms to predict the performance of schools. Nowadays parents aim to select the right city and school for their children and to identify the factors that contribute to good school results. Factors such as ethnic mix, medium of study and geography could make a difference in results. The proposed work focuses on two aspects: applying machine learning algorithms to predict school performance with satisfactory accuracy, and evaluating which data mining technique gives the learning algorithms better accuracy. It was found that some apparent and some less noticeable attributes demonstrate a strong correlation with student performance. Data were collected from a credible source and then subjected to data preparation and correlation analysis. The findings revealed that public examination results data were a very helpful predictor of school performance, which can be used to improve results, and that the AdaBoost technique improved the overall accuracy.


INTRODUCTION
The Tamil Nadu Board of Secondary Education, established in 1910, is under the purview of the Department of Education, Government of Tamil Nadu, India. The Directorate of Government Examinations was formed as a separate directorate in February 1975. Dr. Lawrence planned and implemented the all India 10+1+2 pattern of education in 1978. The Higher Secondary Examinations were introduced in the year 1980 (Anon, 2014a, b). These examinations play a vital role in the career of any student completing school. The results largely determine students' career aspirations and are considered entry criteria for joining a college or university. Universities use the scores from the Higher Secondary Board examinations to determine eligibility and as a cut-off for admission to their courses. Thus this examination and its results play a vital role in the educational system of Tamil Nadu, India. Despite this importance and unique position, little or no predictive work exists on the Higher Secondary Examinations.
The transformation of examinations from a student selection and certification tool into an indicator of school effectiveness and an accountability instrument is a core reform in educational policy making (Naidoo et al., 2014). There is increasing research interest in using data mining in education. This emerging field, called Educational Data Mining (Barnes et al., 2009), is concerned with developing methods that discover knowledge from data originating in educational environments. Databases are rich with hidden information, which can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends (Micheline and Jiawei, 2008). School evaluation is part of the decision-making process in education; it involves judgments about the performance of schools through systematically collecting and analyzing information and relating this to explicit objectives, criteria and values. Ideally, school evaluation involves an (internal and external) assessment that covers all aspects of a school and their impact upon student learning. Such review and analysis covers a range of inputs, processes and outcomes reflected in such elements as staffing and physical resources, curriculum resources, the quality of leadership and management, learning and teaching activities and the standards achieved by students.
In our previous work (Macklin et al., 2014), we provided data cubes to analyse the exam results. In this work we test different classification algorithms to predict which schools perform best based on the historical result data. We selected three algorithms: Naïve Bayes, Random Forest and K-NN. After evaluating the outcomes of these classifiers, we decided to increase the accuracy using AdaBoost, with Naive Bayes, chosen from the three classifiers, as the weak classifier. Overall, the boosted classifier reached an accuracy of 98.12% after multiple iterations.

LITERATURE REVIEW
In all countries, a major component of evaluation and school reform includes attempts to improve academic standards and quality through the use of tests or examinations. Many assessment systems have come into the picture, including national assessment, which comprises public (external) examinations to select students for successive levels in the education system, and system assessments to determine whether children are acquiring certain knowledge, skills and values. Student results thus become a judgment on the school's performance. In a growing number of countries, 'league tables' of schools, especially at secondary level, are published in newspapers as information to the public, to allow parents to choose a school (Naidoo et al., 2014). Sonali et al. (2012) determined that data mining could be used to improve the education system and to enhance its efficacy and overall efficiency by optimizing the resources available. Brijesh and Saurabh (2011) used variables such as semester marks and attendance in classification techniques to predict end-semester results. Sundar (2013) predicted students' performance based on the exam results of engineering college students, comparing several classifiers, which helped students to focus on their performance areas. Kabakchieva (2013) took 10330 instances of data from Bulgarian schools as samples, classified with the labels Excellent, Very Good, Good, Average and Bad; these were used to predict the target label. Adeyemi (2008) reviewed the strategy by looking at the performance of students at Junior Secondary Certificate examinations in Ondo State, Nigeria. In an experiment evaluating the performance of various classification techniques on a distance education students' dataset, Naive Bayes was found to perform adequately, with an accuracy of 80.97% (García-Saiz and Zorrilla, 2011).

Data set:
The data set used in this work contains students' public examination results collected from the Directorate of Higher Secondary Education, Tamil Nadu. The data were drawn from our earlier work on building a data warehouse/data mart, based on Microsoft SSAS, to store and analyze the public examination results of higher grade students held by the Directorate of Government Examinations, Tamil Nadu, India (Macklin et al., 2014). The dataset has around 27994 rows, with data segregated by District, School, Sex, Average Marks in individual subjects and the overall pass percentage. There were 6269 schools covering around 71 districts. In total there are 2305726 (53%) female students and 2005502 (47%) male students. Since the volume of data was large, we chose to use MySQL with RapidMiner for loading data and training. The data were available in MS Access 2007 format; we exported them to CSV format and then loaded them into the MySQL database. Table 1 shows that the data included details of students who took the exam as private candidates. Those records were removed to obtain a refined dataset.

METHODOLOGY
Data mining: Data mining refers to extracting or "mining" knowledge from large amounts of data. Educational Data Mining is an emerging interdisciplinary research area that deals with the development of methods to explore data originating in an educational context. Data mining helps to discover underlying structures in the data, to turn data into information and information into knowledge. It can be defined as the process of extracting interesting, interpretable, useful and novel information from data from the educational domain, such as schools, colleges, e-learning platforms, intelligent tutoring systems and learning management systems (Romero and Ventura, 2010). Data mining consists of a set of techniques that can be used to extract relevant and interesting knowledge from data. It covers several tasks, such as association rule mining, classification and prediction, and clustering. Classification techniques are supervised learning techniques that assign each data item to a predefined class label. Classification is one of the most useful techniques in data mining for building models from an input data set, and such models are commonly used to predict future data trends. There are several algorithms for data classification, such as decision tree and Naïve Bayes classifiers. With classification, the generated model is able to predict a class for given data based on information previously learned from historical data. Figure 1 depicts the overall process of data mining.
Classification: Classification refers to the task of predicting a class label for a given unlabeled point (Zaki and Meira Jr., 2013). Based on our labelling approach, each training point belongs to one of four classes, namely "Excellent", "Good", "Average" and "Bad". In a multiclass prediction, the result on a test set is often displayed as a two-dimensional confusion matrix with a row and a column for each class. Each matrix element shows the number of test examples for which the actual class is the row and the predicted class is the column. Figure 2 gives a pictorial representation of classification.
As our intention is to choose the best tool and classification algorithms for handling educational datasets that can be integrated into our Java application tool, we have to search among those that support categorical and numeric data, handle large data sets and are accurate. The confusion matrix M of the classifier is interpreted in terms of true positives and true negatives using the one-vs-all methodology. Given that a row of the matrix corresponds to a specific value of the "truth", we have:

$$\mathrm{Precision}_i = \frac{M_{ii}}{\sum_j M_{ji}}, \qquad \mathrm{Recall}_i = \frac{M_{ii}}{\sum_j M_{ij}}$$

That is, precision is the fraction of events where we correctly declared i out of all instances where the algorithm declared i. Conversely, recall is the fraction of events where we correctly declared i out of all of the cases where the true state of the world is i.
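The per-class precision and recall described above can be sketched in a few lines of Python. This is only an illustration: the function name `precision_recall` and the 4x4 counts are hypothetical and are not the paper's data, and the matrix is assumed to have the true class in rows and the predicted class in columns.

```python
def precision_recall(matrix):
    """Per-class precision and recall from a square confusion matrix.

    Assumption: matrix[t][p] counts instances whose true class is t (row)
    and whose predicted class is p (column).
    """
    k = len(matrix)
    precision, recall = [], []
    for i in range(k):
        declared_i = sum(matrix[t][i] for t in range(k))  # column sum
        truly_i = sum(matrix[i])                          # row sum
        precision.append(matrix[i][i] / declared_i if declared_i else 0.0)
        recall.append(matrix[i][i] / truly_i if truly_i else 0.0)
    return precision, recall


# Hypothetical counts for Excellent, Good, Average, Bad (illustration only).
example = [
    [8, 2, 0, 0],
    [1, 6, 3, 0],
    [0, 2, 7, 1],
    [0, 0, 1, 9],
]
precision, recall = precision_recall(example)
```

Each class is treated in turn as the "positive" class, which is the one-vs-all reading of a multiclass confusion matrix.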
Rapid miner: While technology enables us to capture and store ever larger quantities of data, finding relevant information such as underlying patterns, trends, anomalies and outliers in the data, and summarizing them with simple, understandable and robust quantitative and qualitative models, is a grand challenge. RapidMiner is a system which supports the design and documentation of an overall data mining process. It offers not only an almost comprehensive set of operators, but also structures that express the control flow of the process. RapidMiner and RapidAnalytics provide an integrated environment for all steps of the data mining process: an easy-to-use Graphical User Interface (GUI) for interactive process design, data and results visualization, validation and optimization of these processes, and their automated deployment and possible integration into more complex systems. RapidMiner enables one to design data mining processes by simple drag and drop of boxes representing functional modules, called operators, into the process, to define data flows by simply connecting these boxes and to define even complex and nested control flows, all without programming (Markus and Ralf, 2013). Figure 3 shows the process block typically used in RapidMiner to set up the data mining process with different machine learning algorithms. In this example, the database is read first and the attributes/features are selected for the process; the Set Role operator then defines the feature to be considered for learning, and the process moves on to the validation step.
Consider the problem of estimating the classes for a test set of instances. The true classes are denoted C j, whereas the estimated classes, as defined by the considered classifier, are denoted Ĉ i; in our case the classes are Excellent, Good, Average and Bad (Cherif et al., 2011). Most measures are not computed directly from the raw classifier outputs, but from the confusion matrix built from these results. This matrix represents how the instances are distributed over estimated (rows) and true (columns) classes, as shown in Table 2.
The terms nij (1≤i, j≤k) correspond to the number of instances put in class number i by the classifier (i.e., Ĉ i), when they actually belong to class number j (i.e., C j). The rules for labelling the classes are given in Table 3.
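As a small illustration of how the terms nij can be tallied, the following Python sketch builds the matrix with estimated classes in rows and true classes in columns, the orientation stated for Table 2. The helper names and the six sample labels are hypothetical and serve only to show the counting.

```python
CLASSES = ["Excellent", "Good", "Average", "Bad"]


def confusion_matrix(true_labels, estimated_labels, classes=CLASSES):
    """Return n where n[i][j] counts instances estimated as class i
    that actually belong to class j."""
    index = {name: i for i, name in enumerate(classes)}
    n = [[0] * len(classes) for _ in classes]
    for truth, estimate in zip(true_labels, estimated_labels):
        n[index[estimate]][index[truth]] += 1
    return n


def accuracy(n):
    """Fraction of instances on the diagonal (estimate equals truth)."""
    total = sum(sum(row) for row in n)
    correct = sum(n[i][i] for i in range(len(n)))
    return correct / total if total else 0.0


# Hypothetical labels, for illustration only.
true_labels = ["Excellent", "Good", "Good", "Average", "Bad", "Bad"]
estimated_labels = ["Excellent", "Good", "Average", "Average", "Bad", "Good"]
n = confusion_matrix(true_labels, estimated_labels)
```

Transposing this matrix gives the layout in which rows are the true classes.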

RESULTS AND DISCUSSION
Naive Bayes process: A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. The Naive Bayes classifier assumes that attributes are independent, but it is still surprisingly powerful for many applications (Zaki and Meira Jr., 2013). In naive Bayes classifiers, every feature gets a say in determining which label should be assigned to a given input value. To choose a label for an input value, the naive Bayes classifier begins by calculating the prior probability of each label, which is determined by checking the frequency of each label in the training set (Steven, 2009). The contribution from each feature is then combined with this prior probability to arrive at a likelihood estimate for each label. The label whose likelihood estimate is the highest is then assigned to the input value. The independence assumption immediately implies that the likelihood can be decomposed into a product of dimension-wise probabilities:

$$P(a_1, a_2, \ldots, a_d \mid v_j) = \prod_{i=1}^{d} P(a_i \mid v_j)$$

We generally estimate P(ai | vj) using m-estimates.
Ensemble learning methods: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. Ensembles are well established as a method for obtaining highly accurate classifiers by combining less accurate ones (Dietterich, 2014). Since we have a large volume of data to be used for training, we envisaged the use of ensemble based systems. Ensemble based systems can be useful when dealing with large volumes of data or with a lack of adequate data. When the amount of training data is so large that training a single classifier is difficult, the data can be strategically partitioned into smaller subsets. Each partition can then be used to train a separate classifier, and the classifiers can then be combined using an appropriate combination rule.
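To make the Naive Bayes estimate used in this section concrete, here is a minimal Python sketch of a categorical Naive Bayes classifier with m-estimate smoothing, (n_c + m p) / (n + m). It is not the RapidMiner operator used in the study. The uniform prior p = 1 / (number of observed values), the default m = 1, the function names and the toy attribute values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, m=1.0):
    """Collect the counts needed for m-estimate Naive Bayes."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    domains = defaultdict(set)           # feature index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return {"labels": label_counts, "values": value_counts,
            "domains": domains, "m": m, "total": len(labels)}


def predict(model, row):
    """Return the label with the highest prior times product of m-estimates."""
    best_label, best_score = None, -1.0
    for label, n in model["labels"].items():
        score = n / model["total"]  # prior probability of the label
        for i, value in enumerate(row):
            n_c = model["values"][(i, label)][value]
            p = 1.0 / len(model["domains"][i])  # assumed uniform prior estimate
            score *= (n_c + model["m"] * p) / (n + model["m"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical rows of (area, medium of study); not the study's dataset.
rows = [("Urban", "English"), ("Urban", "Tamil"),
        ("Rural", "Tamil"), ("Rural", "Tamil")]
labels = ["Good", "Good", "Bad", "Bad"]
model = train_naive_bayes(rows, labels)
```

The smoothing keeps an attribute value that never occurred with a label from forcing the whole product to zero.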
Random forest: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest (Breiman, 2001). A random forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees (Anon, 2013). To classify a new object from an input vector, the input vector is put down each of the CARTs in the forest. Each CART gives a classification, and we say that the tree "votes" for that class. The forest chooses the classification having the majority of votes. Random forest was first attempted with Gain Ratio, which resulted in an accuracy of 67.96% within 43 sec; to further improve the performance, Information Gain was evaluated, which resulted in 71.21% within 1 min 10 sec. Figure 6 shows some of the decision trees generated by Random Forest.

Fig. 5: Naive Bayes rapid miner implementation results

Fig. 6: Decision trees from random forest

K-NN: KNN, originally proposed by Fix and Hodges, is a very simple 'instance-based' learning algorithm. The principle of this method is based on the intuitive concept that data instances of the same class should be closer in the feature space. While a training dataset is required, it is used solely to populate a sample of the search space with instances whose class is known. No actual model or learning is performed during this phase; for this reason, these algorithms are also known as lazy learning algorithms. Different distance metrics can be used, depending on the nature of the data. Euclidean distance is typical for continuous variables, but other metrics can be used for categorical data. Specialized metrics are often useful for specific problems, such as text classification. When an instance whose class is unknown is presented for evaluation, the algorithm computes its k closest neighbors and the class is assigned by voting among those neighbors. To prevent ties, one typically uses an odd choice of k for binary classification. For multiple classes, one can use plurality voting or majority voting (Anon, 2013). In the formal definition of KNNC with k = 1, the single nearest neighbor determines the class. Alternatively, K nearest neighbors can determine the class by voting; as this extension is straightforward, we do not formulate it separately. We implemented this algorithm using Mixed Euclidean distance. In our case, by implementing K-NN we were able to arrive at an accuracy of 68.49% within 1 min 55 sec.
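The voting rule described above can be sketched as follows. This is a simplified illustration rather than the RapidMiner K-NN operator: it assumes purely numeric features and plain Euclidean distance (the study used Mixed Euclidean distance, which also handles nominal attributes), and the function name and sample points are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, x, k=3):
    """Assign x the plurality class among its k nearest training points."""
    distances = sorted(
        (math.dist(point, x), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (subject average 1, subject average 2) pairs, illustration only.
points = [(90, 85), (88, 92), (60, 55), (58, 62), (40, 35)]
labels = ["Excellent", "Excellent", "Average", "Average", "Bad"]

nearest_three = knn_classify(points, labels, (85, 88), k=3)
nearest_one = knn_classify(points, labels, (42, 38), k=1)
```

With k = 1 the function reduces to the nearest-neighbor rule; larger k trades sensitivity to single noisy points for a smoother decision.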

Comparison of Classifier performance:
To gauge and investigate the performance of the selected classification algorithms, namely Naive Bayes, Random Forest and K-NN, we implemented the methodology outlined in the preceding section with the help of RapidMiner. All the implementations were done with 10-fold cross validation, and the final results are provided for comparison in Table 4.
Since Information Gain has a better accuracy rate than Gain Ratio for Random Forest, we use Information Gain for benchmarking.
Figure 7 compares the classifiers by classification of instances. From this chart we can see that Naive Bayes classified with the best accuracy, with 23504 instances predicted correctly. Because we use multiple classes in our research, we then compared the classifiers on the following measures across their respective classes: sensitivity (specificity), which approximates the probability of the positive (negative) label being true, in other words, it assesses the effectiveness of the algorithm on a single class; and F-score, a composite measure which benefits algorithms with higher sensitivity and challenges algorithms with higher specificity.
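The measures named above can be computed per class from a confusion matrix by treating one class as positive and all others as negative. The Python sketch below assumes rows are true classes and columns are predicted classes, and uses the balanced F-score (harmonic mean of precision and sensitivity), which is an assumption since the exact F-score variant is not stated. The function name and the counts are hypothetical.

```python
def one_vs_all_scores(matrix, i):
    """Sensitivity, specificity and F-score for class i.

    Assumption: matrix[t][p] has the true class t in rows and the
    predicted class p in columns.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[i][i]
    fn = sum(matrix[i]) - tp                          # true i, predicted other
    fp = sum(matrix[t][i] for t in range(k)) - tp     # other, predicted i
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, specificity, f_score


# Hypothetical counts for Excellent, Good, Average, Bad (illustration only).
example = [
    [8, 2, 0, 0],
    [1, 6, 3, 0],
    [0, 2, 7, 1],
    [0, 0, 1, 9],
]
sensitivity, specificity, f_score = one_vs_all_scores(example, 0)
```

Because the F-score ignores true negatives, a classifier can score well on specificity for a rare class and still obtain a low F-score for it.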

Validation of the performance of the classifier:
To validate the accuracy of the classifier, we predicted the performance of schools by providing inputs from an external source to the Java application; the results are given in Table 5. In this validation, Random Forest gives the better accuracy for the data selected from the year 2013.
Implementation of meta-algorithm: AdaBoost: AdaBoost (Abu Afza et al., 2011), short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Freund and Schapire (2007); we applied it iteratively to obtain better performance. The boosting approach is taken with the idea of creating a highly accurate classifier from a weak one.

After evaluating Naive Bayes and boosting it, its accuracy rose from 83.96 to 98.12%, about 14 percentage points higher than that of Naive Bayes without the AdaBoost implementation (Fig. 11).
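The reweighting idea behind AdaBoost can be illustrated with a short Python sketch. It is deliberately simplified and is not the configuration used in the study: it assumes binary labels of +1 and -1, one numeric feature and threshold "stumps" as the weak learner, whereas the study boosted Naive Bayes on four classes in RapidMiner. All function names and the six data points are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Train a weighted ensemble of stumps on labels in {+1, -1}."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correct ones lose weight.
        weights = [
            w * math.exp(-alpha * y * (polarity if x >= threshold else -polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump separates.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
```

On this toy data a single stump misclassifies two of the six points, while the weighted vote of three stumps labels all six correctly, which is the sense in which boosting turns a weak classifier into a more accurate one.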

CONCLUSION
Though there are different benchmarks comparing the performance and accuracy of different classification algorithms, very few experiments have been carried out on educational datasets such as the one used in this experiment. We have compared the performance and the interpretability of the output of different classification techniques applied to educational datasets with multiple classes, in order to determine which one is more suitable for integration with a Java application and for wide use. In conclusion, we have met our objective, which was to evaluate and investigate three selected classification algorithms that help to predict the performance of schools. Our experimentation shows that there is not one algorithm that obtains significantly better classification accuracy.

Fig. 3: Rapid miner process block

Naive Bayes m-estimate: The m-estimate of P(ai | vj) is

$$P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m}$$

where n = the number of training examples for which v = vj, nc = the number of examples for which v = vj and a = ai, p = the a priori estimate for P(ai | vj) and m = the equivalent sample size. Implementation of the Naive Bayes algorithm in RapidMiner with the dataset provides an accuracy of 83.96%, as depicted in Fig. 5.

KNNC definition: Suppose that we are given a training dataset of n points with their desired class, as shown below: {(x1, y1), (x2, y2), …, (xn, yn)}, where (xi, yi) represents data pair i, with xi as the feature vector and yi as the corresponding target class. Then for a new data point x, the most likely class is determined by KNNC (k = 1 in this case) as follows:

$$\mathrm{nnc}(x, 1) = y_p, \qquad p = \arg\min_i \lVert x - x_i \rVert^2$$

Fig. 7: Performance of classifiers based on classification of instances

Table 1: Statistics of students in private examination

Table 4: Comparison of performance of different classifiers

Table 5: Validation on the classifier with actual data

AdaBoost was the first practical boosting algorithm and remains one of the most widely used and studied, with applications in numerous fields. The weak learner, Naive Bayes, which provides an accuracy of 83.96%, is further improved by iterations with the help of this meta-algorithm; the iterations are shown in Table 6.

Table 6: Iterations

In terms of classification accuracy, Naive Bayes has the better rate in our case, 83.96%, compared with 68.49% for K-NN and 71.21% for Random Forest. The accuracy of the weak classifier Naive Bayes is further increased to 98.12% with the help of the AdaBoost algorithm. In addition, Naive Bayes delivers this performance within 11 sec. From the above results it is clear that Naive Bayes classification techniques can be applied to educational data to predict schools' outcomes and help improve their results. Our near-future work is to extend this experimentation by building a novel self-constructing cascading classifier algorithm for analyzing the public examination results.