Determination of Support Vector Boundaries in Generalized Maximum Entropy for Multilevel Models

The Generalized Maximum Entropy (GME) approach is an alternative estimation method for regression analysis. GME is superior to classical approaches in terms of parameter estimation accuracy when some or all of the assumptions of the classical approaches are violated. However, determining the bounds of the parameter support vectors remains an open problem in this approach when researchers have no prior information about the parameters. If the support vectors cannot be determined correctly, the parameter estimates will not be obtained correctly. There are several theoretical studies on GME for different datasets in the literature, but fewer studies on how to determine the parameter support vectors. To obtain robust parameter estimates in GME, we introduce a new iterative procedure for determining the bounds of the parameter support vectors for multilevel datasets. In this study, the new iterative procedure was applied to a multilevel random intercept model and tested with both a simulation study and real-life data. The classical and the new GME estimations were compared with Generalized Least Squares estimations in terms of the Root Mean Square Error (RMSE) statistic. The estimates of the new approach yielded lower RMSE values than the classical methods.


Introduction
Shannon defined the term "entropy" as a measure of uncertainty in communication theory in 1948, and the basic principle of Generalized Maximum Entropy (GME) is based on Jaynes' Maximum Entropy Principle (Shannon, 1948; Jaynes, 1957). Golan, Judge and Miller generalized this principle to the regression framework (Golan, Judge, & Miller, 1996). In this approach, Golan et al. maximized Shannon's entropy formula under model consistency constraints. The GME approach requires fewer assumptions than the classical methods and has been used as an alternative approach for both classical linear and nonlinear estimation models (Golan, Judge, & Miller, 1996). The most important part of GME is the re-parameterization of the regression coefficients and the error term. The main issue in the GME approach is the determination of the support vector boundaries when researchers have no prior information about the parameters. Thus, this approach has been receiving increasing attention in the statistics literature.
In this study, the Random Intercept Model and the GME approach were combined. The main purpose of this study is to determine parameter support vector boundaries that yield more consistent parameter estimates than classical methods for multilevel data. The determination of the error vector boundaries is therefore not considered here; they were determined according to Pukelsheim's 3σ rule, following the literature (Pukelsheim, 1994).
The outline of this paper is organized as follows. The literature review is given in Section 2. Multilevel modelling is introduced and the GME estimation process is briefly described in Section 3. In Section 4, we suggest a new procedure for determining parameter support vector boundaries in multilevel random intercept models and present both a real-life application and a simulation study; we also compare the results obtained from the different methods. Section 5 concludes the study.

Literature Review
In the literature, the term "entropy" (also known as "maximum entropy") has been widely used in many disciplines such as thermodynamics, communication, education and statistics. In previous studies, researchers proposed new approaches that could be used instead of classical approaches and compared their results with different methods through simulation studies (Al-Nasser, 2014; Al-Rawwash & Al-Nasser, 2011). There have also been several attempts to answer the question of how the parameter and error term support vectors could be determined in the GME approach (Henderson, Golan & Seabold, 2015; Ciavolino & Calcagni, 2014; Golan & Gzyl, 2012; Fernandez-Vazquez, Mayor-Fernandez & Rodriguez-Valez, 2008; Caputo & Paris, 2008), especially when researchers have no prior information about the distribution of the parameters. The maximum entropy principle has also been combined with longitudinal analysis, data envelopment analysis, multilevel models, regression analysis and logistic regression analysis in order to compare GME results with those of classical approaches (Al-Nasser & Al-Atrash, 2011; Al-Nasser et al., 2010; Gastón & García-Vinas, 2011; Donoso et al., 2011). Furthermore, alternative solutions have been proposed when data suffer from multicollinearity and heteroscedasticity (Akdeniz et al., 2011).
Abellan, Baker and Coolen studied the Nonparametric Predictive Inference (NPI) model for multilevel data and proved that the lower and upper probabilities of the NPI could be obtained using only the singleton probabilities (Abellan, Baker, & Coolen, 2011).
The performance of the GME estimator in both large and small samples has been studied to assess the efficiency of the results in terms of estimation accuracy (Mittelhammer, Cardell, & Marsh, 2013; Gastón & García-Vinas, 2011).
The GME approach was combined with a hierarchical cumulative logit model in the study of Donoso, Grange and González in 2011. They compared the results of the hierarchical cumulative logit model with maximum likelihood estimation using Monte Carlo simulations (Donoso, Grange, & González, 2011). They showed that the simulations produced reduced bias in the estimates of the subjective value of time.
An alternative solution to the problem of multicollinearity was suggested by Akdeniz, Çabuk and Güler. Appropriate constraints were added to the classical GME approach according to the characteristics of the relationships among the independent variables, and the results were compared with OLS (Akdeniz, Çabuk, & Güler, 2011).
The mathematical properties of entropy, maximum entropy, minimum cross entropy and the Maximum Entropy Leuven method were also explained in detail in Altaylıgil's study in 2008. The performance of the parameter estimates of this method was tested against OLS, generalized OLS and ridge regression using Monte Carlo simulations (Altaylıgil, 2008).
In all of these studies, new algorithms or methodological comparisons between classical methods and GME were provided, but there were few studies on how to determine the parameter support vectors (especially the bounds of the support vectors), which is the missing part of this approach, particularly when researchers have no prior information about the parameter distributions.

Methods
Multilevel modelling is a generalization of linear regression models and as such can be used for a variety of purposes, including prediction, data reduction and causal inference from clustered or hierarchical datasets (Gelman, 2006; Raudenbush et al., 2005; Raudenbush & Bryk, 2002). GME, which is based on an optimization technique, is an alternative solution approach for linear and non-linear models. In this approach, all of the unknown parameters (β) and the error term (e) are re-parameterized using finite-dimensional known support vectors z and v (Golan, Judge & Miller, 1996). Similar to the coefficient re-parameterization process, e can be re-parameterized as a finite and discrete random variable with 2 ≤ J ≤ ∞ possible outcomes (Golan, Judge, & Miller, 1996). The adaptation of the re-parameterization process to random intercept models is given in the next section.

Adaptation of GME Approach to Random Intercept Models
In the study of Al-Nasser et al. (2010), the GME approach was adapted to the multilevel random coefficient model. In the present study, the GME approach is adapted to the random intercept model illustrated by Raudenbush and Bryk (2002) and Raudenbush et al. (2005), and a new iterative approach is proposed for the determination of the parameter support vectors for this model. The two-level random intercept model can be expressed as two equations:

Y_ij = β_0j + β_1j X_ij + r_ij    (1)

In Equation 1, i refers to level-1 units while j refers to level-2 units. Y_ij is the response variable for level-1 unit i within level-2 unit j, β_0j is the random intercept for level-2 unit j, β_1j is the slope of X_ij for unit j, and r_ij is the residual term for unit i within unit j.

β_0j = γ_00 + γ_01 W_j + U_0j
β_1j = γ_10 + γ_11 W_j    (2)

In Equation 2, γ_00 and γ_10 are the intercepts, while γ_01 and γ_11 are the slopes predicting β_0j and β_1j, respectively, from the level-2 predictor W_j. Furthermore, U_0j is the level-2 random error. Finally, substituting Equation 2 into Equation 1 gives Equation 3:

Y_ij = γ_00 + γ_01 W_j + γ_10 X_ij + γ_11 W_j X_ij + U_0j + r_ij    (3)

In Equation 3, there are four unknown parameters (γ_00, γ_01, γ_10 and γ_11) and two error terms (U_0j and r_ij). These are re-parameterized in order to use the Generalized Maximum Entropy approach (Al-Nasser et al., 2010).
After the re-parameterization process, each unknown parameter is written as a convex combination of its support points, γ_k = z_k' p_k, and each error term similarly as U_0j = v_u' w_0j and r_ij = v' w_ij, where the probability vectors p_k, w_0j and w_ij are the new unknowns.
Therefore, the GME model for the two-level random intercept model can be expressed by the following nonlinear programming system:

max H(p, w) = - Σ_k p_k' ln p_k - Σ_j w_0j' ln w_0j - Σ_ij w_ij' ln w_ij

Subject to:
Y_ij = z_00' p_00 + (z_01' p_01) W_j + (z_10' p_10) X_ij + (z_11' p_11) W_j X_ij + v_u' w_0j + v' w_ij   (model consistency)
1' p_k = 1, 1' w_0j = 1, 1' w_ij = 1   (adding-up constraints)

The new optimization problem for the random intercept model is solved using the Lagrangian method.
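To make the programming system concrete, the following is a minimal computational sketch, written for a single-level linear model y = Xβ + e to keep it short. The function name `gme_estimate`, the use of `scipy.optimize.minimize` with SLSQP, and the uniform starting weights are illustrative assumptions, not the authors' implementation:

```python
# Sketch of GME estimation for y = X beta + e: beta and e are re-parameterized
# over known support points, and the Shannon entropy of the weights is
# maximized under model-consistency and adding-up constraints.
import numpy as np
from scipy.optimize import minimize

def gme_estimate(X, y, z_bounds, M=5, J=5):
    n, K = X.shape
    s = y.std(ddof=1)
    # K x M parameter supports and a J-point error support (3-sigma rule).
    Z = np.array([np.linspace(lo, hi, M) for lo, hi in z_bounds])
    v = np.linspace(-3.0 * s, 3.0 * s, J)

    def unpack(theta):
        p = theta[:K * M].reshape(K, M)   # parameter weights
        w = theta[K * M:].reshape(n, J)   # error weights
        return p, w

    def neg_entropy(theta):
        # Minimizing sum(theta * ln theta) maximizes Shannon entropy.
        return np.sum(theta * np.log(theta + 1e-12))

    cons = [
        # Model consistency: y = X (Z p) + V w.
        {"type": "eq", "fun": lambda t: X @ np.sum(unpack(t)[0] * Z, axis=1)
                                        + unpack(t)[1] @ v - y},
        # Adding-up: each weight vector sums to one.
        {"type": "eq", "fun": lambda t: unpack(t)[0].sum(axis=1) - 1.0},
        {"type": "eq", "fun": lambda t: unpack(t)[1].sum(axis=1) - 1.0},
    ]
    theta0 = np.concatenate([np.full(K * M, 1.0 / M), np.full(n * J, 1.0 / J)])
    res = minimize(neg_entropy, theta0, method="SLSQP", constraints=cons,
                   bounds=[(1e-10, 1.0)] * theta0.size)
    p, _ = unpack(res.x)
    p = p / p.sum(axis=1, keepdims=True)  # guard against tiny adding-up residuals
    return np.sum(p * Z, axis=1)          # beta_hat = Z p
```

The same structure extends to Equation 3 by adding a second error-support term for the level-2 random error U_0j to the consistency constraint.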

The Procedure for the Determination of Parameter Support Boundaries
In the literature, there are two main rules for determining the support vector boundaries, based on prior knowledge about the parameters and the error term (Golan, Judge, & Miller, 1996):
• Support vector boundaries may be determined according to prior information about the parameters and the error term.
• When no such information exists, they may be determined to be wide enough to include the population parameters around zero, with Pukelsheim's 3σ rule used for the error term.
In the first rule, for example, if it is assumed that the mean, minimum and maximum values of β_1 are 1, 0.5 and 2, respectively, the parameter support vector should be z_1 = [0.5 1 2] for M = 3 and z_1 = [0.5 0.75 1 1.5 2] for M = 5. However, the literature does not clearly identify how accurate support vector boundaries can be determined when researchers have no prior information about the parameters and the error term. Following these literature findings, the number of discrete points of the parameter and error term support vectors was set to five (Golan, Judge, & Miller, 1996; Al-Nasser, 2011).
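The z_1 example above can be generated mechanically from a (minimum, mean, maximum) triple. The helper below is an illustration only: the function name and the midpoint rule for M = 5 are inferred from the example, not stated in the source:

```python
import numpy as np

def prior_support(lo, mean, hi, M=5):
    """Build a support vector from prior (min, mean, max) information.

    For M = 5, midpoints are inserted between the mean and each bound,
    reproducing the z1 = [0.5 0.75 1 1.5 2] example.
    """
    if M == 3:
        return np.array([lo, mean, hi])
    return np.array([lo, (lo + mean) / 2, mean, (mean + hi) / 2, hi])
```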
This study aimed to obtain prior information from the current dataset. For this aim, the following steps, which were called repeated sampling with replacement in Depren's study, were adapted to the multilevel dataset (Depren, 2014):
1. A new sample (of size n or 2n) was created by repeated sampling with replacement (n is the total number of observations).
2. Restricted Maximum Likelihood was used to obtain prior parameter estimates, without checking whether the data met the required assumptions, such as no multicollinearity, no autocorrelation and homoscedasticity.
3. Steps 1 and 2 were repeated t times.
4. All the obtained parameter estimates were sorted in ascending order.
5. The top and bottom 5% of the parameter values (outliers) were removed from the parameter matrix.
6. The support vector boundaries of each parameter were determined according to Equation (8).
Since the sample size (n), the number of repeats (t) and the parameter support vectors (z) must all be identified, the procedure is more complex than standard GME or other estimation techniques. However, simple code can be written in SAS, R or other statistical packages to run the procedure, so researchers can overcome this complexity in practice.
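The steps above can indeed be written in a few lines. In the following Python sketch, the function name `support_bounds`, the use of ordinary least squares as a stand-in for the Restricted Maximum Likelihood fit of the multilevel model, and the five-point linspace grid over the trimmed range are all assumptions for illustration:

```python
# Sketch of the repeated-sampling-with-replacement procedure for setting
# parameter support bounds (steps 1-6). OLS stands in for the REML fit.
import numpy as np

def support_bounds(X, y, t=1000, scale=2, trim=0.05, M=5, seed=1):
    rng = np.random.default_rng(seed)
    n, K = X.shape
    draws = np.empty((t, K))
    for r in range(t):
        idx = rng.integers(0, n, size=scale * n)   # sample of size n or 2n
        sol, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        draws[r] = sol                             # prior parameter estimates
    draws.sort(axis=0)                             # sort each parameter column
    lo, hi = int(t * trim), int(t * (1 - trim))    # drop top/bottom 5% outliers
    trimmed = draws[lo:hi]
    # Five-point support vectors spanning the trimmed range of each parameter.
    return np.array([np.linspace(c.min(), c.max(), M) for c in trimmed.T])
```

Running it with t = 1000 and scale = 2 corresponds to the 2n, 1000-repeat scenario chosen later in the simulation study.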
The results obtained from the classical and new approaches were compared using RMSE (Miller, 2002; Timm, 2002).
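The comparison statistic is a direct implementation of the usual RMSE definition; the function name is ours:

```python
import numpy as np

def rmse(beta_true, beta_hat):
    """Root mean square error between true and estimated coefficients."""
    beta_true, beta_hat = np.asarray(beta_true), np.asarray(beta_hat)
    return float(np.sqrt(np.mean((beta_hat - beta_true) ** 2)))
```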

Application
The new procedure was tested on both simulated and real-life datasets. The questions to be answered in this study were which sample size should be chosen (n or 2n) and how many repeats (t) should be run for the sampling-with-replacement procedure.

Background of Simulated Data
The simulation study was performed under the following assumptions: 1. 1000, 2000, 3000, 5000 and 10000 random samples of size n and 2n were generated with the repeated sampling with replacement technique.
5. Five-point support vectors for the parameters and the error term were used for the GME estimator (Al-Nasser, 2011).
6. For the determination of the parameter support vector boundaries, two different alternatives were tested:
i. Parameter support vector bounds were determined to be wide enough to include the population parameters around zero, and these bounds were narrowed down in every iteration for each sample.
ii. Parameter support vector boundaries were determined according to the process explained in Section 3.2.
7. The error support bounds were determined by Pukelsheim's 3σ rule: e = [-3s -1.5s 0 1.5s 3s], where s is the standard deviation of the dependent variable.
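The 3σ rule in step 7 translates directly to code. This is a trivial helper (name ours), assuming the sample standard deviation with ddof = 1:

```python
import numpy as np

def error_support(y):
    """Five-point error support from Pukelsheim's 3-sigma rule."""
    s = np.std(y, ddof=1)  # standard deviation of the dependent variable
    return np.array([-3 * s, -1.5 * s, 0.0, 1.5 * s, 3 * s])
```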

Simulation Study
In order to identify n and t for the sampling-with-replacement procedure, the results of the different scenarios were compared according to RMSE; there was no significant difference among the scenarios in terms of the β coefficients. For this reason, identifying n and t is not critical for the simulated dataset. In this study, the first scenario (sample size 2n and 1000 repeats) was chosen for further analysis.
The parameter support vectors used for all datasets are shown in Table 2. In this section, Support Vectors I, II and III were used for the classical GME estimations and Support Vector IV was used for the proposed GME estimation. Coefficients and standard errors are given in Table 3. In the new iterative approach (Support Vector IV), the standard errors were relatively small.

An Application to Real Life Data
The new procedure was tested with data from the Programme for International Student Assessment (PISA) conducted in 2009. The data consist of a total sample of 515,985 students nested within 1,535 schools (OECD, 2009). Similar to the study of Kılıç et al. (2012), Turkey and its neighbouring countries, namely Bulgaria, Greece, Azerbaijan, the Russian Federation, Israel, Serbia, Romania and Jordan, were included in this study in order to examine the learning strategies accounting for mathematics achievement. Thus, the results of the two studies could be compared in terms of estimation accuracy; the study of Kılıç et al. (2012) is referred to as the reference study. The PISA mathematics test scores of 42,417 15-year-old students were analysed. A three-level random coefficient model was used to model differences across countries and across schools. Mathematics achievement was the dependent variable. At the first level, gender, socio-economic status, elaboration, memorization, control strategy, home educational resources and cultural possessions were considered. School size and student-teacher ratio were the second-level variables, and gross domestic product (GDP) was the third-level variable. The mathematics achievement scores of the countries are given in Table 4.
The best-performing country is the Russian Federation, while the worst-performing country is Jordan in the PISA 2009 study in terms of mathematics achievement. Turkey ranks 4th among these countries.
First-Level Variables;
1. Gender: male coded as 1 and female coded as 0.
2. Socio-Economic and Cultural Status (ESCS): The index of ESCS was derived from three indices: home possessions, highest parental occupation (HISEI) and highest parental education expressed as years of schooling.
Second-Level Variables;
1. School Size (SCSIZE): Total number of male and female students in a school.

2. Student-Teacher Ratio (STRATIO): The number of students per teacher in a school.
Third-Level Variable;
1. Gross Domestic Product (GDP): The gross domestic product of a country. Since GDP has a right-skewed distribution, log(GDP) is used in this study.
Descriptive statistics of these variables are given in Table 5. The male-female ratio was 50%-50%. The new iterative approach (Support Vector IV) produced a lower RMSE, and notably the iterative technique required no assumptions regarding the distributions or the parameters.
Coefficients and standard errors are given in Table 9. In the new iterative approach (Support Vector IV), the standard errors were relatively small. Furthermore, the RMSE obtained from the new approach was smaller than that of the other alternatives.

Conclusion
In the literature, there are many different approaches for obtaining robust estimates. The GME estimator is one of the robust estimators, resistant to multicollinearity, heteroscedasticity and the existence of outliers. Although it is not necessary to make strict assumptions about the parameters or population distributions, the most important point in this approach is to specify the discrete supports for the coefficients and the error term, which has a significant effect on the results obtained.
In this study, a new approach based on an iterative process was presented for random intercept models. The classical GME approach, using wide parameter support boundaries, is mostly suggested in the literature when the researcher has no prior information. As opposed to this technique, a new iterative approach is suggested in this paper that helps the researcher obtain consistent parameter estimates without prior information. Furthermore, the approach was adapted to multilevel datasets.
The new approach was tested on both simulated and real-life datasets. The results showed that the suggested approach provides better parameter estimates and lower RMSE than the classical methods. The results produced with the support vectors suggested in the literature (Support Vectors I, II and III) were not close to the true regression parameters, whereas the parameter estimates of the proposed approach were relatively close to the true regression parameters.

Figure 1 .
Figure 1. RMSE statistics of the classical and alternative approaches for the simulated dataset. As shown in Figure 1, as the parameter support vector bounds were widened, the RMSE decreased. The new estimation (iterative) technique (Support Vector IV) produced better RMSE statistics.

3. Cultural Possessions (CULTPOSS): Derived from students' responses to the three items listed below.
a) Classic literature, b) Books of poetry, c) Works of art
4. Home Educational Resources (HEDRES): The PISA 2006 index of home educational resources was derived from students' responses to the following items.
a) A desk for study, b) A quiet place to study, c) Your own calculator, d) Books to help with your school work, e) A dictionary
5. Memorization (MEMOR): Derived from students' responses to the four items measuring preference for memorisation/rehearsal as a learning strategy for mathematics, listed below.
a) I go over some problems in mathematics so often that I feel as if I could solve them in my sleep, b) When I study for mathematics, I try to learn the answers to problems off by heart, c) In order to remember the method for solving a mathematics problem, I go through examples again and again, d) To learn mathematics, I try to remember every step in a procedure.
6. Elaboration (ELAB): Derived from students' responses to the five items measuring preference for elaboration as a learning strategy, listed below.
a) When I am solving mathematics problems, I often think of new ways to get the answer, b) I think about how the mathematics I have learnt can be used in everyday life, c) I try to understand new concepts in mathematics by relating them to things I already know, d) When I am solving mathematics problems, I often think about how the solution might be applied to other interesting questions, e) When learning mathematics, I try to relate the work to things I have learnt in other subjects.
7. Control Strategy (CSTRAT): Control learning strategies were derived from students' responses to the five items measuring preference for control as a learning strategy, listed below.
a) When I study for a mathematics test, I try to work out what are the most important parts to learn, b) When I study mathematics, I make myself check to see if I remember the work I have already done, c) When I study mathematics, I try to figure out which concepts I still have not understood properly, d) When I cannot understand something in mathematics, I always search for more information to clarify the problem, e) When I study mathematics, I start by working out exactly what I need to learn.

Table 1 .
Outputs of the alternative approach for the simulated dataset; model outputs are shown by n and t values for all datasets. ( ): Standard Deviation

Table 2 .
Parameter support vectors used in analysis

Table 3 .
Coefficients and standard errors for different parameter support vectors (Support Vectors I, II, III and IV). ( ): Standard Deviation

Table 4 .
Mathematics scores of countries

Table 5 .
Descriptive statistics of the variables

Table 7 .
Coefficient estimates and standard deviations for different sample sizes and numbers of repeats

Table 9 .
Coefficient estimates and standard deviations of the alternative approach for the PISA dataset. ( ): Standard Deviation