Prediction of Students' Academic Performance by K-Means Clustering



Abstract
A schooling system should offer the best possible teaching and learning opportunities to meet educational requirements and ensure achievement for every student. Teachers monitor their students' progress throughout the year through formative assessment, questioning, feedback, and similar practices. This monitoring helps teachers continually assess students' academic performance and evaluate the effectiveness of their teaching. In this paper, k-means clustering combined with a deterministic model is applied to analyze students' overall performance. The results are important for educators in identifying students who are at risk academically and areas where teaching strategies may need adjustment to better meet these students' needs.

Research Highlights
Typically, the k-means algorithm is applied to numeric, continuous data [1]. It is commonly used in wireless sensor networks, pattern recognition, document classification, rideshare data analysis, diagnostic systems, etc. Its advantages are that it is relatively simple to perform, produces stable and tight clusters, converges after a number of iterations, and has a low computational cost [2].
K-means clustering categorizes and sorts a given data set into k clusters, where the number of clusters k is specified at the beginning. There are k centroids, one for each cluster. The algorithm finds groups of observations that have not been explicitly labelled in the data, with the aim of arranging the data well enough to support an appropriate decision. As a result, data with similar characteristics are grouped within the same cluster, while the clusters themselves are dissimilar. The target is to make data points as homogeneous as possible within the same cluster and as heterogeneous as possible across different clusters [3].

Research Objectives
The aim of clustering in this paper is to partition students into homogeneous groups according to their academic achievement, as measured by assessment scores [4]. Such applications can help both instructors and students improve the quality of education. The teacher can analyze the different causes of low academic achievement and introduce effective teaching-learning methods. The new teaching strategy may motivate students to study and to progress in their academic performance [5][6].

Methodology
The initial centroids are selected randomly and serve as the starting points for the clusters. Each data point is then assigned to a group through iterative calculations. The k-means algorithm attempts to gather objects with similar attributes into the same group and objects with dissimilar attributes into different groups. It assigns each data point to the group whose centroid is closest to it. The methodology thus gathers objects with similar features within clusters, so that homogeneity is found within each cluster.
The k-means algorithm can be summarized as follows:
1. Choose a value k to classify the data into k clusters.
2. Select a cluster centre randomly for each cluster.
3. Assign each data point to the cluster whose centre is at the shortest distance from that data point.
4. Determine the new cluster centre for each cluster.
5. Repeat steps 3 and 4 until no data point changes cluster.
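The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the paper: the function names (`squared_distance`, `mean_point`, `k_means`), the `seed` and `max_iter` parameters, the choice of existing data points as the random initial centres, and the handling of empty clusters are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def k_means(data, k, seed=0, max_iter=100):
    """Basic k-means loop; returns (labels, centres).

    Assumption: initial centres are k distinct data points drawn at random.
    """
    rng = random.Random(seed)
    # Step 2: choose the initial cluster centres at random.
    centres = [list(p) for p in rng.sample(data, k)]
    labels = [None] * len(data)
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in data
        ]
        # Step 5: stop once no data point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute each centre as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[j] = mean_point(members)
    return labels, centres


# Hypothetical toy data: four students, two subject scores each.
scores = [[45, 50], [48, 52], [82, 88], [85, 90]]
labels, centres = k_means(scores, k=2)
```

On this toy data the two low-scoring students end up in one cluster and the two high-scoring students in the other; which cluster receives index 0 depends on the random initial centres.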
The purpose of the algorithm is to minimize the sum of squared distances between the data points and their corresponding cluster centres. The objective function is given in equation (1).

Results
We applied equation (1) to the data set (students' scores in academic year 2019) of a college. The analysis involved 106 students and 8 dimensions (the total number of subjects). The Annual Average Mark (AAM) achieved by each student across all subjects attempted in the academic year is calculated. The AAM is out of 100.
We apply the deterministic model in equations (2) and (3) to find the overall performance. For every cluster size, the assessment is evaluated by finding the average mark in each cluster.

$$\text{AAM} = \frac{\sum (\text{credit value} \times \text{marks of each subject})}{\sum \text{credit values}} \tag{2}$$

where N = total number of students in a cluster.

The grades are associated with percentage intervals in order to compare the performance of the students. Students with marks of 70 and above are in category A; marks of 65-69, 60-64, and 50-59 are in categories A-, B, and C respectively. Marks below 50 are in category F.
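The AAM calculation, the grade bands, and the per-cluster average can be illustrated with a short Python sketch. The function names are hypothetical. Three points are assumptions rather than statements from the paper: category A is taken to start at exactly 70, fractional marks are placed in the band whose lower bound they reach (so 64.03 falls in B), and the overall performance of a cluster is read as the plain mean of its students' AAMs.

```python
def annual_average_mark(marks, credits):
    """Credit-weighted mean of subject marks, following equation (2)."""
    total_credits = sum(credits)
    return sum(c * m for c, m in zip(credits, marks)) / total_credits


def grade(mark):
    """Map a mark out of 100 to a grade category.

    Assumption: each band is half-open, e.g. B covers 60 <= mark < 65.
    """
    if mark >= 70:
        return "A"
    if mark >= 65:
        return "A-"
    if mark >= 60:
        return "B"
    if mark >= 50:
        return "C"
    return "F"


def cluster_performance(aams):
    """Assumed overall performance: mean AAM of the N students in a cluster."""
    return sum(aams) / len(aams)


# Hypothetical student: two subjects with credit values 3 and 1.
aam = annual_average_mark(marks=[80, 60], credits=[3, 1])  # (240 + 60) / 4
print(aam, grade(aam))
```

Applied to the overall performance figures reported for k = 6, `grade(70.66)` gives "A" and `grade(64.03)` gives "B" under these assumptions.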
For k = 6, the cluster of size 12 (cluster 1) had an overall performance of 70.66%, while the cluster of size 20 (cluster 2) had an overall performance of 64.03%.

Findings
The k-means clustering algorithm and a deterministic model are used to evaluate the academic performance of a college's students. This methodology will assist academic planners in measuring students' academic performance and in assessing whether students are progressing and meeting course requirements. It provides an indication of overall academic performance and of the steps that need to be taken to improve students' academic performance in the next academic year.

References
[1] Jafar OM. A New Hybrid Hard-Fuzzy (K-MFCM) Data Clustering Method for Finding Cluster International Journal of Advanced Research in Computer Science. 2018 Mar 1;9(2):626.
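$$J = \sum_{j=1}^{k} \sum_{i=1}^{m} \left| x_i^{(j)} - c_j \right|^2 \tag{1}$$

where m = total number of students in a cluster, n = total number of subjects taken by each student, and $\left| x_i^{(j)} - c_j \right|^2$ is the squared distance between a data point $x_i^{(j)}$ and its cluster centre $c_j$; summed over all data points, it gives the sum of squares of distances of the data points from their corresponding cluster centres.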
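The sum-of-squares objective described here can be evaluated for a given clustering with a few lines of Python. This is a sketch only: the function name `objective` and the data layout (one list of score vectors per cluster, paired with that cluster's centre) are assumptions made for the example.

```python
def objective(clusters, centres):
    """Sum of squared Euclidean distances from each point to its cluster centre.

    clusters: list of clusters, each a list of score vectors (assumed layout)
    centres:  list of centre vectors, one per cluster, in the same order
    """
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centre))
        for points, centre in zip(clusters, centres)
        for point in points
    )


# Hypothetical example: two clusters in two dimensions.
clusters = [[[1, 1], [3, 3]], [[10, 10]]]
centres = [[2, 2], [10, 10]]
print(objective(clusters, centres))  # 2 + 2 + 0 = 4
```

A lower value means the data points lie closer to their cluster centres, which is what the reassignment and centre-update steps of k-means work towards.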

Figure 1: Students' Overall Performance for Each Subject for Cluster k = 6