Application of regression methods to investigate the factors influence on student’s Grade Point Average
Ahmed Saied Rahama Abdallah* and Mohammed Omar Musa Mohammed

Article history: Received: July 7, 2020 Received in revised format: August 1


Introduction
Grade Point Average (GPA) is the average of the grade points a student earns and is the main standard for measuring students' performance at university level (Westrick, 2015). GPA can be represented in several ways, including numbers (a 1-5 scale), letters (A, B, C, D, E), or performance labels (Excellent, Good, Fair, Borderline, or Poor) (Suendarti & Liberna, 2018). University students obtain a GPA after finishing the first semester or the first year, and accumulated performance forms the basis of the GPA. For students in science colleges, GPA is related to the secondary school rate and the achievement test, because admission to these colleges depends on these two variables. This means that a student's university GPA could be affected by these two explanatory variables. Much previous research has studied the predictive power and predictive validity of secondary school tests, the achievement test, and the general aptitude test, but the results have been contradictory. Most studies did not reach firm conclusions about the ability of the admission criteria to predict the cumulative average. Therefore, this study examines the relation between cumulative GPA and the explanatory variables (the achievement test and the secondary school rate), continuing researchers' efforts to evaluate the efficiency of the admission criteria used in the science colleges at Prince Sattam Bin Abdul-Aziz University (PSAU). Regression has been used to describe the relation between a response variable and explanatory variables for over 200 years. Classical regression is the most common technique and measures the effect of the explanatory variables on the mean of the response. These traditional regressions assume that the regression coefficients are fixed across the population.
However, such average effects are not always what is of interest in many areas, and effects are sometimes quite heterogeneous. For instance, quantile regression (QR) has been applied to explore the relation between foreign direct investment and economic growth (Girma & Görg, 2005; Zhou, 2011) and in "precision health/medicine" (Collins & Varmus, 2015). Researchers in a variety of fields, including economics, finance, medicine, and politics, have shown increasing interest in variation across groups within the population rather than in the average alone. Mean regression cannot necessarily meet all these needs.

Problem statement
The problem addressed by this study is that students are admitted to the science colleges of the university according to specific admission criteria linked to their previous academic achievement in secondary school. This relationship needs to be tracked during students' university studies, either by taking the cumulative average at graduation in the final year or by taking the cumulative average for the first university year, to determine whether students' performance at university is affected by their results in the achievement test and their secondary school rate. The current study focuses on identifying the relationship between students' GPA and the variables that affect it (secondary school rate and achievement test).

Study significance
The research is important because it deals with the explanatory variables that affect students' performance and hence their grade point average. The cumulative average at the university is related to the student's performance on the achievement test and to the secondary school rate, since both are among the admission criteria for colleges of science. In addition, the relationship between GPA and these variables (achievement test and secondary school rate) can be investigated with regression models, especially the quantile regression model.

Study Objectives
This study attempts to achieve the following objectives:
1. Apply regression models to investigate the factors affecting students' GPA at Prince Sattam University.
2. Compare the results of the two regression models.

Source of data
The data for the study were obtained from PSAU, mainly through the Deanship of Admissions and Registration, for the academic year 2018. The data collected for each student included the achievement test, the secondary school rate, and the Grade Point Average (GPA) of students in the colleges of science at the university.

Study Population and Sample Size
The study population is the colleges of Prince Sattam University. The study focuses on the colleges of science, since they adopt the secondary school rate and the achievement test as admission criteria. The sample consisted of 175 students in their final year.

Models
In this paper, two regression approaches were applied, namely Ordinary Least Squares (OLS) and Quantile Regression (QR), to investigate the relationship between GPA and the explanatory variables (secondary school rate, achievement test, gender, and department) and to assess the strength of the effect of each independent variable on the dependent variable. Regression can also be used to predict the effect of changes in the explanatory variables.

Literature review
Lotsi (2019) examined determinants significantly affecting the GPA of Level 100 students at the University of Ghana. The study used a questionnaire to collect data and regression to analyse them. The results showed that gender, residential status, and previous high school did not significantly affect students' GPA. Habtamu (2018) found that the academic performance of undergraduate students at an Ethiopian university was significantly associated with the students' GPA and preparatory CGPA, gender, region, college, mismatch between teaching and learning style, and students' expectations about future job opportunities. However, variables such as age, study hours, entrance exam result, participation in a cooperative learning program, mother's education level, and department choice were insignificant. Similarly, Erdem et al. (2007) aimed to determine which socioeconomic and demographic factors affected students' GPA. They conducted a survey of fourth-grade students at Gaziosmanpaşa University and used a probit model to analyse the data. The findings showed that factors such as type of secondary school graduated from, gender, number of family members attending school, parents' educational level, expressed family expectations about school, and study time had an impact on GPA. Abdullah (2011) found that students' GPA was affected by nationality, age, and high school score. Aleidi et al. (2020) found that age, part-time employment, smoking, and short study duration were statistically significantly associated with achieving a good or lower GPA. Ahmed (2006) investigated the socioeconomic factors affecting student performance in the College of Business and Economics at UAEU. The sample comprised 864 students, and regression was used to analyse the data. The findings revealed that the most important determinants of student performance include students' proficiency in English and participation in classroom discussion. Determinants that negatively affect performance include absence from lectures and living in crowded houses.
Alshammari et al. (2017) studied factors influencing the academic performance of nursing students in Saudi Arabia. The study concluded that marital status, age, gender, socio-economic status, academic level, and previous school attended had varying influences on the sample's academic performance. Raneem (2013) found that factors such as motivation, gender, interest, marital status, and the transportation used to reach the faculty had significant effects on the sample's academic performance, whereas factors such as learning resources, age, motivation, study time, and type of transportation were shown to create a significant difference in GPA between males and females. Raychaudhuri et al. (2010) sought to identify factors affecting students' performance. The study employed regression on data from a random sample survey. The findings revealed that participants' academic performance was positively influenced by class attendance, the mother's education, and the presence of trained teachers in the school. Ibrahim and Yahaya (2015) concluded that quantile regression performed better than OLS. Along the same lines, John (2015) showed that quantile regression is more robust to outliers than OLS.

Multiple Linear Regression Model
Multiple linear regression (MLR) is a statistical tool that uses several predictor variables to predict an outcome variable. The aim of MLR is to study the linear association between the independent variables and the dependent variable. The MLR model is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \qquad i = 1, 2, \dots, n,$$

where $y_i$ is the response variable, $x_{i1}, \dots, x_{i,p-1}$ are the explanatory variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_{p-1}$ are the slope coefficients of the explanatory variables, and $\varepsilon_i$ is the model error term. To estimate the unknown parameters of the MLR model, Ordinary Least Squares is used:

$$\hat{\boldsymbol{\beta}} = (X'X)^{-1} X' \mathbf{y}.$$

The test of overall significance for the unknown parameters of the MLR model is $H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0$ vs $H_1: \beta_j \neq 0$ for at least one $j$. The test statistic is

$$F = \frac{SSR/(p-1)}{SSE/(n-p)},$$

where $SSR$ is the regression sum of squares, $SSE$ is the residual sum of squares, $p$ is the number of model parameters, and $n$ is the number of observations. For goodness of fit of the model, $R^2$ and adjusted $R^2$ are used:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad R^2_{adj} = 1 - \frac{SSE/(n-p)}{SST/(n-1)},$$

where $SST$ is the total sum of squares.
For testing individual regression coefficients, the t-test is used with hypotheses $H_0: \beta_j = 0$ vs $H_1: \beta_j \neq 0$. The test statistic is

$$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},$$

where $se(\hat{\beta}_j)$ is the square root of the $j$-th diagonal element of the covariance matrix of the estimated parameter vector $\hat{\boldsymbol{\beta}}$.
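As a hedged illustration of the quantities described in this section, the following sketch computes the OLS estimate, the overall F statistic, R², adjusted R², and the coefficient t statistics with NumPy. The data are simulated, since the study's data are not reproduced here; the variable names, coefficient values, and noise level are assumptions chosen only for illustration, and `p` counts all model parameters including the intercept.

```python
import numpy as np

# Simulated data for illustration only; these are NOT the study's data.
rng = np.random.default_rng(0)
n = 175                                    # matches the study's sample size; values are simulated
x1 = rng.uniform(70, 100, n)               # hypothetical secondary school rate
x2 = rng.uniform(50, 100, n)               # hypothetical achievement test score
y = 0.5 + 0.02 * x1 + 0.015 * x2 + rng.normal(0, 0.4, n)  # hypothetical GPA

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with an intercept column
p = X.shape[1]                             # number of model parameters (incl. intercept)

# OLS estimate: least-squares solution of X b = y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta_hat
sse = float(resid @ resid)                 # residual sum of squares
sst = float(((y - y.mean()) ** 2).sum())   # total sum of squares
ssr = sst - sse                            # regression sum of squares

f_stat = (ssr / (p - 1)) / (sse / (n - p)) # overall F statistic
r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))

# Standard errors: square roots of the diagonal of sigma^2 (X'X)^{-1}
sigma2 = sse / (n - p)
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))
t_stats = beta_hat / se

print("beta_hat:", np.round(beta_hat, 4))
print("F:", round(f_stat, 2), "R2:", round(r2, 3), "adj R2:", round(adj_r2, 3))
print("t:", np.round(t_stats, 2))
```

The F statistic would be compared with an F distribution on (p - 1, n - p) degrees of freedom and each t statistic with a t distribution on n - p degrees of freedom.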

Quantile Regression (QR)
Quantiles are cut points that divide the range of a probability distribution, or the observations in a sample, into contiguous intervals with equal probabilities. Quantile regression (QR) was proposed by Koenker and Bassett (1978) as an extension of linear regression that models conditional quantiles of the dependent variable, such as the 25th, 50th, or 90th percentile (the 0.90 quantile). Because any quantile can be modeled, it is possible to study several chosen points of the distribution. QR is especially appropriate when the rate of change in the conditional quantile, expressed by the regression coefficients, depends on the quantile. Median regression is the special case of quantile regression in which the quantile is 0.50. The main advantage of QR over MLR is its flexibility for modeling data with heterogeneous conditional distributions. Data of this type appear in various fields, such as econometrics, survival analysis, and ecology (Koenker & Hallock, 2001).
QR gives a more complete picture of the effects of the variables when a set of percentiles is modeled, and it makes no distributional assumption about the error term in the model.
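Let $Y$ be a random variable with distribution function $F(y) = P(Y \le y)$. The $\tau$-th quantile of $Y$ is defined as the inverse function

$$Q(\tau) = F^{-1}(\tau) = \inf\{y : F(y) \ge \tau\},$$

where $0 < \tau < 1$; the median, $Q(0.5)$, is a special case. Also, the linear quantile regression model is

$$Q(\tau \mid X = \mathbf{x}_i) = \mathbf{x}_i' \boldsymbol{\beta}(\tau), \qquad i = 1, \dots, n, \tag{8}$$

where $0 < \tau < 1$, $\mathbf{x}_i = (1, x_{i1}, \dots, x_{i,p-1})'$, and $\boldsymbol{\beta}(\tau) = (\beta_0(\tau), \beta_1(\tau), \dots, \beta_{p-1}(\tau))'$ is the vector of unknown parameters. Let $y_1, y_2, \dots, y_n$ be a random sample of $Y$. The sample median is known to be the minimizer of the sum of absolute deviations

$$\min_{\xi \in \mathbb{R}} \sum_{i=1}^{n} |y_i - \xi|.$$

Furthermore, the general $\tau$-th sample quantile $\xi(\tau)$, which is the analogue of $Q(\tau)$, can be formulated as the solution of the optimization problem

$$\min_{\xi \in \mathbb{R}} \sum_{i=1}^{n} \rho_\tau(y_i - \xi),$$

where $\rho_\tau(z) = z\,(\tau - I(z < 0))$, $0 < \tau < 1$. Here $I(\cdot)$ denotes the indicator function. Just as the sample mean, which minimizes the sum of squared residuals,

$$\hat{\mu} = \arg\min_{\mu \in \mathbb{R}} \sum_{i=1}^{n} (y_i - \mu)^2,$$

can be extended to the linear conditional mean function $E(Y \mid X = \mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}$ by solving

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2,$$

the linear conditional quantile function $Q(\tau \mid X = \mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}(\tau)$ can be estimated by solving

$$\hat{\boldsymbol{\beta}}(\tau) = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i'\boldsymbol{\beta})$$

for any quantile $0 < \tau < 1$. The quantity $\hat{\boldsymbol{\beta}}(\tau)$ is called the $\tau$-th regression quantile. The case $\tau = 0.5$, which minimizes the sum of absolute residuals, corresponds to median regression. The estimated conditional quantile model is given by

$$\hat{Q}(\tau \mid X = \mathbf{x}) = \mathbf{x}'\hat{\boldsymbol{\beta}}(\tau).$$

The estimated parameters in QR are interpreted in the same way as those of OLS, as rates of change (Buchinsky, 1998). More details about QR can be found in the literature (Koenker, 2005; Davino & Furno, 2013; Weisberg, 2005; Neter et al., 1996; Sen & Srivastava, 2012; SAS, 2014).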
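The check-function formulation described in this section can be sketched in a few lines of NumPy. The code below is an illustration under stated assumptions, not the procedure used in the study (which relied on SAS): `sample_quantile` searches the observed values for a minimizer of the summed check loss, and `quantreg_irls` is a simple iteratively reweighted least squares approximation to the regression quantile, a heuristic that is only approximate. All function names and the simulated data are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def sample_quantile(y, tau):
    """tau-th sample quantile as a minimizer of the summed check loss.
    A minimizer always exists among the observed values, so only those are searched."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - xi, tau).sum() for xi in y]
    return y[int(np.argmin(losses))]

def quantreg_irls(X, y, tau, n_iter=200, eps=1e-6):
    """Approximate regression quantile via iteratively reweighted least squares.
    Uses rho_tau(r) = w * r**2 with w = |tau - I(r < 0)| / |r|, holding w fixed
    at the previous residuals; eps guards against division by zero."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.abs(tau - (r < 0)) / np.maximum(np.abs(r), eps)
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, X.T @ (w * y))  # weighted least squares step
    return beta

# Simulated heteroscedastic data (illustrative only, not the study's data)
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

betas = {tau: quantreg_irls(X, y, tau) for tau in (0.25, 0.5, 0.75, 0.95)}
for tau, b in betas.items():
    below = float(np.mean(y < X @ b))    # share of points under the fitted line
    print(f"tau={tau}: intercept={b[0]:.3f}, slope={b[1]:.3f}, share below={below:.3f}")
```

For each quantile level, the share of observations falling below the fitted line should be close to that level, which is a quick sanity check on any quantile regression fit. In practice an exact solver (for example a linear programming formulation) would normally be preferred to this approximation.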

Advantage of QR compared to MLR
The advantages of QR compared to MLR are as follows:
1. QR allows the association between variables to be studied beyond the mean of the data, which makes it useful for modeling responses that are non-normally distributed or that have a nonlinear association with the explanatory variables.
2. QR is robust to outliers.
3. Unlike MLR, QR does not assume a distribution for the outcome or a constant variance.
4. QR provides more model robustness than MLR.
5. QR is flexible, as it does not need a link function or distributional assumptions.
6. QR provides a comprehensive view of the association between the dependent variable and the explanatory variables, as it allows the effect of covariates to be modeled at various quantiles of the dependent variable.
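The robustness to outliers can be seen in the simplest possible case, a model with an intercept only, where the OLS estimate is the sample mean and the median regression estimate is the sample median. The values below are made up for illustration and are not taken from the study's data.

```python
import statistics

# Hypothetical GPA-like values; NOT the study's data.
gpa = [2.1, 2.5, 3.0, 3.2, 3.6]
contaminated = gpa + [40.0]        # one grossly mis-recorded value

# Intercept-only OLS fit = mean; intercept-only median regression fit = median
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(gpa))
median_shift = abs(statistics.median(contaminated) - statistics.median(gpa))

print("shift in mean:  ", round(mean_shift, 3))
print("shift in median:", round(median_shift, 3))
```

A single extreme observation moves the mean by several units while the median barely changes, which is the behaviour item 2 refers to.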

Descriptive statistics
This section presents the QR findings for the 2018 students' GPA data with four predictor variables. The SAS QUANTREG procedure was used to fit the model. Table 1 shows the median, which is the value at the 50th percentile (the 0.5 quantile). GPA also increased gradually from the 0.25 quantile up to the 0.75 quantile.
Table 3 presents the parameter estimates and p-values at each quantile level. The results reveal that the secondary school rate, achievement test, and department have significant effects on GPA at the 0.25 quantile. At this quantile, the coefficient on the first explanatory variable (secondary school rate) is the rate of change of the 0.25 quantile (Q1) of the response distribution per unit change in that variable, holding the other predictor variables fixed. In particular, the Q1 regression coefficient indicates that the 0.25 quantile of GPA declines by 0.014 for each one-unit change in secondary school rate, holding the other predictor variables fixed. Q1 is the value below which 25% of the cases lie. At the 0.5 quantile, the intercept is -1.97, which is the predicted 0.5 quantile of GPA when all the predictor variables are zero. The estimate 0.356 is the rate of change of the 0.5 quantile (Q2) of the response distribution per unit change in the first regressor (secondary school rate), holding the other predictor variables fixed. In particular, the Q2 regression coefficient indicates that the 0.5 quantile of GPA increases by 0.356 for every one-unit change in secondary school rate, holding the other predictor variables fixed. Q2 is the value at or below which 50% of the cases lie. At the 0.75 quantile, the intercept is -3.80, which is the predicted 0.75 quantile of students' GPA when all the predictor variables are zero. The estimate 0.045 is the rate of change of the 0.75 quantile (Q3) of the response distribution per unit change in secondary school rate, holding the other predictor variables fixed.
In particular, the Q3 regression coefficient indicates that the 0.75 quantile of GPA declines by 0.045 for a one-unit change in secondary school rate, holding the other predictor variables fixed. Q3 is the value at or below which 75% of the cases lie; equivalently, 25% of cases are greater than it. At the 0.95 quantile, the intercept is -3.68, which is the predicted 0.95 quantile of students' GPA when all the predictor variables are zero. The estimate 0.608 is the rate of change of the 0.95 quantile (Q4) of the response distribution per unit change in gender, holding all the other predictor variables fixed. That is, the Q4 regression coefficient indicates that the 0.95 quantile of students' GPA decreases by 0.608 for each one-unit change in gender, holding the other predictor variables fixed. Q4 is the value at or below which 95% of the cases lie.
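The reading of a QR coefficient as a rate of change can be written out for the reported 0.5 quantile. In the sketch below, $x_1$ denotes secondary school rate and $\mathbf{x}_{-1}$ the remaining predictors; this notation is introduced here for illustration, and the terms for the other predictors are left unspecified because their estimates are not quoted in the text.

```latex
\begin{align*}
\hat{Q}(0.5 \mid x_1, \mathbf{x}_{-1}) &= -1.97 + 0.356\,x_1 + (\text{terms in } \mathbf{x}_{-1}),\\
\hat{Q}(0.5 \mid x_1 + 1, \mathbf{x}_{-1}) - \hat{Q}(0.5 \mid x_1, \mathbf{x}_{-1}) &= 0.356 .
\end{align*}
```

Under the fitted model, two students who differ by one unit in secondary school rate and are otherwise alike therefore have predicted median GPAs that differ by 0.356.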

Graphical Interpretation of the Predictor Covariates
This section gives a brief summary of the QR findings for the covariates in the present study. Each chart displays one coefficient of the QR model across quantile levels, with the shaded area showing the 95% confidence band. The intercept of the model in Fig. 2 can be interpreted as the estimated conditional quantile function of GPA across quantile levels. It is more negative in the upper quantiles than in the lower quantiles; the chart shows a downward-sloping line across the quantiles. The second chart shows that the effect of the achievement test on GPA was positive, particularly in the lower rather than the upper quantiles. The third chart shows that the secondary school rate has a positive effect on GPA in the upper quantiles. The third chart in Fig. 3 shows the positive influence of gender on GPA in the upper quantiles; the graph shows an upward-sloping line across the quantiles.
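Why a coefficient can trace a sloped line across quantile levels is easiest to see in a textbook location-scale model, used here purely as an assumed example and not as the model fitted in the study. If y = b0 + b1*x + (s + g*x)*e with e standard normal and s + g*x > 0, the tau-th conditional quantile is (b0 + s*z) + (b1 + g*z)*x, where z is the tau-th standard normal quantile, so the slope depends on tau whenever g is nonzero. The parameter values below are hypothetical.

```python
from statistics import NormalDist

def quantile_slope(tau, b1=0.5, g=0.1):
    """Slope of the tau-th conditional quantile in the assumed model
    y = b0 + b1*x + (s + g*x)*e, e ~ N(0, 1): slope(tau) = b1 + g * z_tau."""
    z = NormalDist().inv_cdf(tau)   # tau-th quantile of the standard normal
    return b1 + g * z

taus = [0.05, 0.25, 0.5, 0.75, 0.95]
slopes = [quantile_slope(t) for t in taus]
for t, s in zip(taus, slopes):
    print(f"tau={t:.2f}: slope={s:.3f}")
```

With g > 0 the slope rises with the quantile level and equals b1 at the median; with g = 0 (constant variance) the line would be flat, which is the case in which the OLS slope alone summarizes the effect.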

Fig. 2. Quantile processes with 95% band for secondary school rate, achievement test, and department of Biology

Table 4 shows the results of fitting the MLR model to examine the relation between participants' GPA in the Colleges of Science at PSAU and the explanatory variables secondary school rate, achievement test, department, and gender. The findings show a statistically significant relation between GPA and the explanatory variables secondary school rate, achievement test, and gender, whereas department showed an insignificant effect on GPA. The OLS R² statistic indicates that 25% of the variation in GPA is explained by the explanatory variables.

The Significance of the OLS Model
Analysis of variance (ANOVA) was used to test the significance of the overall model. The result, shown in Table 5, indicates that the model is significant at the 5% level of significance. In other words, the three explanatory variables secondary school rate, achievement test, and gender affected students' GPA positively.
Table 6 shows the results of fitting the OLS regression and QR models to investigate the relationship between GPA and the predictor variables. The independent variables secondary school rate, achievement test, and gender showed significant effects in the OLS regression. At the 0.25, 0.5, and 0.75 quantiles the explanatory variable department has an insignificant effect on GPA, while at the 0.95 quantile it showed a significant association with GPA in science colleges.

Discussion
This study used QR and OLS regression models to study students' GPA in science colleges. Estimates across quantile levels allow us to examine the effect of predictors on various quantiles of the dependent variable, and thus give a more complete picture of the relationship between the response variable (GPA) and the predictor variables (secondary school rate, achievement test, department, and gender). Secondary school rate, achievement test, and gender were found to be important variables that significantly affect GPA at all quantile levels. This result agrees with Abdullah (2011) and Raneem (2013). At the 0.95 quantile the variable department also has a significant effect on GPA, alongside the other three independent variables. This indicates that QR performs better at the upper quantile levels than at the lower ones. The findings show that QR performs better than OLS, in agreement with previous studies such as Ibrahim and Yahaya (2015). QR has many advantages compared to OLS (John, 2015). In addition, QR may lead to better understanding and inference. QR makes no assumptions about the distribution of the residuals. It also allows different aspects of the association between the dependent variable and the predictor variables to be explored. The findings agree with other results in the literature and could be used in future studies to investigate the factors affecting students' GPA.

Conclusion
This study used two approaches, OLS and QR, to investigate the factors affecting students' GPA. The results illustrate that understanding GPA is a critical issue for all university colleges and not just science colleges, since admission to university colleges depends on these factors. Furthermore, the results reveal that GPA is affected not only by the admission criteria but also by other factors such as department and gender. This indicates that future research should consider all variables that could affect GPA. University admission policy should take into account the impact of important determinants across all quantile levels, either to set admission policies for science colleges or to introduce new criteria and add them to the admission requirements. The study is limited by the lack of data on explanatory variables such as students' attendance, study hours, place of residence, mother's education, age, and teaching methods.

Recommendations
Based on the study results, the following recommendations are suggested:
1. Apply the regression methods to all colleges in the university.
2. Consider adding a new admission criterion suited to the specific character of each college or university.
3. The government, the concerned bodies, and policy makers need to work together to enhance students' achievement.
4. The university needs to provide accessible, high-quality learning facilities in order to increase students' academic achievement.