Parameter Evaluation for a Statistical Mechanical Model for Binary Choice with Social Interaction

In this paper we use a statistical mechanical model as a paradigm for educational choices when the reference population is partitioned according to the socioeconomic attributes of gender and residence. We study how educational attainment is influenced by these attributes in five selected developing countries. The model has a social and a private incentive part, with coefficients measuring the influence individuals have on each other and the external influence on individuals, respectively. The methods of partial least squares and ordinary least squares are used to estimate the parameters of the interacting and the noninteracting models, respectively. This work differs from the earlier work that motivated it in the following ways: (a) the reference population is divided into subgroups of unequal sizes, (b) the proportion of individuals in each subgroup may depend on the population size N, and (c) the method of partial least squares, rather than the least squares method of the earlier work, is used to estimate the parameters of the model with social interaction.


Introduction
Education provides people with the knowledge and skills that can lead to better employment opportunities and a better quality of life. The educational level attained by an individual explicitly determines the occupational choice of that individual. All these attainments and choices are made under certain socioeconomic conditions such as peers, neighbours, family members, wealth quintile, gender, residence, etc. Those who reside in rural areas and belong to the poorest wealth quintile are more likely to have a low level of education [1]. Here our focus is on how the collective behaviour of a reference group of individuals is determined by the intricate interactions among the individuals [2][3][4][5]. Collective behaviour such as self-organization has been observed in biological, ecological, and socioeconomic systems [6][7][8].
The collective behaviour of a large group of individuals may undergo sudden changes due to slight variations in the socioeconomic structure of the group. Examples include a change in the pronunciation of a language due to a small immigrant population and a substantial decrease in the crime rate resulting from actions taken by the authorities [9,10]. Such an abrupt change in macroscopic behaviour, caused by changes in the interactions among constituents, is referred to as a phase transition in the statistical mechanics literature. Phase transitions have been shown to exist for some classes of spin models designed to explain the phenomenon of ferromagnetism [11,12]. The simplest spin model within this class is the mean-field Ising model proposed in [11]. This model is tractable and has seen several applications in the social sciences [13], finance [14], chemistry [15], and ecology [16]. An interesting family, which has naturally emerged in applications, is a multispecies version of the mean-field Ising model for studying magnetism in anisotropic materials [17]. This model has seen social science applications in recent works of Contucci, Gallo, and Barra [18][19][20].
The above and the works in [2][3][4][5] highlight the need for importing statistical mechanical models into the social sciences to offer insights into how social interactions determine social outcomes. The multipopulation Curie-Weiss model serves as a paradigm for certain binary discrete choices where individuals are to choose between two options, say to stay in school or drop out of school, or to use a medicated mosquito net while sleeping at night or not, subject to the constraints of their socioeconomic environments. Here our key assumption is that individuals with the same socioeconomic attributes tend to behave the same way but people with different socioeconomic attributes may behave differently [19].
The authors of [18][19][20] considered a multigroup Curie-Weiss model where the fraction of individuals in each subgroup of the population is a constant that is independent of the total population size. In particular, the work in [19] provides a procedure to estimate the parameters in a multigroup Curie-Weiss model for suicidal tendencies and mode of marriage in Italy. The authors used a least squares estimation procedure, and the study considered residence as the only socioeconomic attribute. In this work we adapt a partial least squares estimation procedure to estimate the parameters of a multigroup Curie-Weiss model for educational attainment in five developing countries. Here we use the socioeconomic attributes of residence and gender. That is, we study how the collective choice of educational attainment of a group of individuals is influenced by their gender and place of residence. We focus on developing countries because we are interested in comparing the estimates across countries, and we therefore wanted countries from different parts of the globe that share similar socioeconomic attributes. We chose the Dominican Republic from North America, Kenya from East Africa, Egypt from North Africa, Ghana from West Africa, and Indonesia from Asia. The choice of these countries is also based on the availability of data from the Demographic and Health Surveys Program.
The rest of the paper is organized as follows: Section 2 addresses generalities on the Curie-Weiss model, its multipopulation version, and the parameter estimation procedure for the model. A case study on how educational choices are influenced by gender and residence is presented in Section 3. Section 4 discusses the main findings of the case study and we conclude the work in Section 5.

The Curie-Weiss Model
The Curie-Weiss model is made up of an energy function (Hamiltonian) that assigns interaction energies to spin configurations. This energy function takes the form
$$H_N(\sigma) = -\sum_{i,j=1}^{N} J_{ij}\,\sigma_i\sigma_j - \sum_{i=1}^{N} h_i\,\sigma_i, \tag{1}$$
where $\sigma_i = \pm 1$. The energy function consists of two parts, namely, the interaction part modulated by the interaction strengths $J_{ij}$ and the external field part controlled by $h_i$. The interaction between neighbouring spins tends to induce alignment of the neighbours if $J_{ij} > 0$; i.e., both neighbours will prefer to be either both $+1$ or both $-1$. Otherwise the neighbours will prefer to assume different spin values. The parameter $h_i$ is the external magnetic field applied to site $i$. If the magnetic field $h_i$ is positive it favours $+1$ spin values. On the other hand, if the field is negative it favours $-1$ spin values. Hence, for each site $i$, the external field contributes a term $-h_i\sigma_i$ to the energy function [25]. This model is of mean-field type, as each spin interacts with all the other spins. The effect of all the other spins on any given spin is approximated by the average effect of the rest, and this makes computations easy. In this paper we will use the Curie-Weiss model as a benchmark model for discrete choice with social interaction. More precisely, we are interested in how the educational attainment choices of individuals from some selected countries are influenced by the interaction among the individuals and the socioeconomic attributes of gender and residence. For related applications of statistical mechanical models to social science we refer the reader to [2][3][4][5][26].
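The role of the two parts of the energy function can be illustrated with a small numerical sketch. It assumes the standard form $H_N(\sigma) = -\sum_{i,j} J_{ij}\sigma_i\sigma_j - \sum_i h_i\sigma_i$; the function name `hamiltonian` and the coupling and field values are hypothetical and chosen only for illustration.

```python
from itertools import product


def hamiltonian(sigma, J, h):
    """Energy of a spin configuration: -sum_ij J[i][j] s_i s_j - sum_i h[i] s_i."""
    n = len(sigma)
    interaction = sum(J[i][j] * sigma[i] * sigma[j]
                      for i in range(n) for j in range(n))
    field = sum(h[i] * sigma[i] for i in range(n))
    return -interaction - field


# Hypothetical toy system: 4 spins, equal positive couplings, small positive field.
N = 4
J = [[0.0 if i == j else 0.25 for j in range(N)] for i in range(N)]
h = [0.1] * N

# Enumerate all 2^N configurations and pick the one with the lowest energy.
energies = {s: hamiltonian(s, J, h) for s in product((-1, 1), repeat=N)}
ground_state = min(energies, key=energies.get)
print(ground_state, energies[ground_state])
```

With positive couplings the two fully aligned configurations have the lowest interaction energy, and the positive field breaks the tie in favour of all spins equal to $+1$.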
2.1. Multipopulation Curie-Weiss Model. This work uses the partial least squares estimation procedure developed in [27] for the multipopulation Curie-Weiss model to estimate the parameters of an interaction-based logit model for educational attainment in some selected developing countries. Here our key assumption is that individuals with the same socioeconomic attributes tend to behave the same way but people with different socioeconomic backgrounds behave differently. This assumption helps us to reparametrise the parameters in the Hamiltonian (1). Thus we will focus on finding a suitable parametrisation for the interaction coefficients $J_{ij}$ and a systematic procedure that allows us to estimate the parameters characterizing the model from data. In our discrete choice model each individual $i$ is assigned $k$ socioeconomic attributes,
$$a_i = \left(a_i^{(1)}, a_i^{(2)}, \ldots, a_i^{(k)}\right), \quad \text{with } a_i^{(j)} \in \{0, 1\}. \tag{2}$$
Therefore a population of size $N$ can be partitioned into $2^k$ groups that do not overlap. Each of the groups is identified by one of the elements of $\{0, 1\}^k$. Let $P_l$ be the set of individuals in partition $l$, for $l = 1, \ldots, 2^k$, and $|P_l| = N_l$. Therefore
$$\sum_{l=1}^{2^k} N_l = N.$$
We suppose further that, for each $l = 1, \ldots, 2^k$,
$$\alpha_l^N := \frac{N_l}{N} \longrightarrow \alpha_l \quad \text{as } N \to \infty.$$
Our assumption above implies that individuals in the same group or partition are characterised by the same socioeconomic attributes. Therefore, all individuals in a partition or group $l$ have the same private incentive $h_l$ and, for any pair of groups $l$ and $l'$,
$$J_{ij} = \frac{J_{ll'}}{2N}$$
for every $i \in P_l$ and $j \in P_{l'}$. It follows from this assumption and (1) that
$$H_N(\sigma) = -N\left[\frac{1}{2}\sum_{l,l'=1}^{2^k} \alpha_l^N \alpha_{l'}^N J_{ll'}\, m_l m_{l'} + \sum_{l=1}^{2^k} \alpha_l^N h_l\, m_l\right]. \tag{4}$$
Here
$$m_l = \frac{1}{N_l}\sum_{i \in P_l} \sigma_i \tag{5}$$
is the average decision for the individuals in group or partition $l$. Our model has now been changed from individual choices to group choices. Note that $-H_N(\sigma)$ returns the level of satisfaction for the entire population.
$J_{ll'}$ measures the influence group $l$ has on group $l'$, and it is also the social incentive of groups $l$ and $l'$ to interact. When $J_{ll'}$ is positive, the groups are satisfied if their empirical means have the same sign; otherwise the groups prefer their empirical means to have different signs, i.e., it is not encouraging or rewarding for the groups to interact with one another. See Figure 1 for an artistic impression of the $J_{ll'}$'s. $h_l$ is the private incentive of group $l$, describing how the group is satisfied with itself.
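The passage from individual choices to group choices can be checked numerically. The sketch below assumes the mean-field scaling $J_{ij} = J_{ll'}/(2N)$ for $i$ in group $l$ and $j$ in group $l'$ (an assumption about the normalisation, stated here for illustration) and compares the individual-level energy with the expression written in terms of the group averages $m_l$ and proportions $N_l/N$. All names and numbers are hypothetical.

```python
def hamiltonian_individual(sigma, group, J_group, h_group):
    """Individual-level energy with J_ij = J_group[l][l'] / (2N), h_i = h_group[l]."""
    N = len(sigma)
    total = 0.0
    for i in range(N):
        for j in range(N):
            total -= J_group[group[i]][group[j]] / (2 * N) * sigma[i] * sigma[j]
        total -= h_group[group[i]] * sigma[i]
    return total


def hamiltonian_group(sigma, group, J_group, h_group):
    """Same energy written through group averages m_l and proportions alpha_l."""
    N = len(sigma)
    n = len(h_group)
    sizes = [group.count(l) for l in range(n)]  # every group must be non-empty
    alpha = [size / N for size in sizes]
    m = [sum(s for s, g in zip(sigma, group) if g == l) / sizes[l]
         for l in range(n)]
    quad = sum(alpha[a] * alpha[b] * J_group[a][b] * m[a] * m[b]
               for a in range(n) for b in range(n))
    lin = sum(alpha[l] * h_group[l] * m[l] for l in range(n))
    return -N * (0.5 * quad + lin)


# Hypothetical example: 7 individuals in two groups of unequal size.
sigma = [1, 1, -1, 1, -1, -1, 1]
group = [0, 0, 0, 1, 1, 1, 1]
J_group = [[1.0, 0.5], [0.5, -0.3]]
h_group = [0.2, -0.1]

h_ind = hamiltonian_individual(sigma, group, J_group, h_group)
h_grp = hamiltonian_group(sigma, group, J_group, h_group)
print(h_ind, h_grp)
```

The two values agree up to floating-point rounding, which is the sense in which the energy depends on the configuration only through the group averages.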
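Suppose the spins $\sigma_1, \sigma_2, \ldots$ are an independent and identically distributed sequence of random variables with
$$P(\sigma_i = 1) = P(\sigma_i = -1) = \frac{1}{2}.$$
We denote by $P_N$ the corresponding product measure on $\Omega_N = \{-1, 1\}^N$. The equilibrium state associated with the Hamiltonian in (4) is given by
$$G_N(\sigma) = \frac{e^{-H_N(\sigma)}\, P_N(\sigma)}{Z_N},$$
where
$$Z_N = \sum_{\sigma \in \Omega_N} e^{-H_N(\sigma)}\, P_N(\sigma) = \int_{[-1,1]^n} \exp\left(N\left[\frac{1}{2}\sum_{l,l'=1}^{n} \alpha_l^N \alpha_{l'}^N J_{ll'}\, m_l m_{l'} + \sum_{l=1}^{n} \alpha_l^N h_l\, m_l\right]\right) \nu_N(dm) \tag{8}$$
is the partition function of the model, $\nu_N$ is the law of the vector of empirical means $m = (m_1, \ldots, m_n) \in [-1, 1]^n$ under $P_N$, and we have set $n = 2^k$. In (8) we have used that, by (4), $H_N(\sigma)$ depends on $\sigma$ only through $m$. Note that, for any $m \in [-1, 1]^n$, the large deviation rate function of $\nu_N$ is
$$I(m) = \sum_{l=1}^{n} \alpha_l \left[\frac{1+m_l}{2}\log(1+m_l) + \frac{1-m_l}{2}\log(1-m_l)\right].$$
In what follows $\mathbf{J}$, $\mathbf{h}$, and $\boldsymbol{\alpha}$ shall be as follows:
$$\mathbf{J} = \left(\alpha_l \alpha_{l'} J_{ll'}\right)_{l,l'=1}^{n}, \quad \mathbf{h} = \left(\alpha_l h_l\right)_{l=1}^{n}, \quad \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n).$$
The pressure function of the model is then given by
$$p_N = \frac{1}{N}\log Z_N.$$
The large $N$ behaviour of the model is governed by the pressure function. It is known from [28] that the thermodynamic limit exists. The proof for the case $\alpha_l^N = \alpha_l$, for all $N$, was earlier given in [18,20].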

Theorem 1.
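For any choice of the parameters $\mathbf{J}$, $\mathbf{h}$, and $\boldsymbol{\alpha}$, the limiting pressure admits the following variational representation
$$\lim_{N \to \infty} p_N = \sup_{m \in [-1,1]^n}\left[\frac{1}{2}\langle \mathbf{J}m, m\rangle + \langle \mathbf{h}, m\rangle - I(m)\right],$$
where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product on $\mathbb{R}^n$ and $I$ is given in ( ).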
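As a hedged sketch of why a variational representation of this kind leads to equations of hyperbolic-tangent type, suppose the functional being maximised is $F(m) = \frac{1}{2}\sum_{l,l'}\alpha_l\alpha_{l'}J_{ll'}m_l m_{l'} + \sum_l \alpha_l h_l m_l - \sum_l \alpha_l I_0(m_l)$, with symmetric $J_{ll'}$ and $I_0$ the rate function of a fair $\pm 1$ coin. These specific forms are assumptions made for illustration, not formulas quoted from the source. Setting the partial derivatives to zero gives:

```latex
\begin{align*}
I_0(x) &= \tfrac{1+x}{2}\log(1+x) + \tfrac{1-x}{2}\log(1-x),
  \qquad I_0'(x) = \tfrac{1}{2}\log\frac{1+x}{1-x} = \operatorname{arctanh}(x), \\
\frac{\partial F}{\partial m_l}
  &= \alpha_l\Big(\sum_{l'} \alpha_{l'} J_{ll'}\, m_{l'} + h_l - \operatorname{arctanh}(m_l)\Big) = 0 \\
\Longrightarrow\quad m_l &= \tanh\Big(\sum_{l'} \alpha_{l'} J_{ll'}\, m_{l'} + h_l\Big),
  \qquad l = 1, \ldots, n.
\end{align*}
```

Because $\operatorname{arctanh}$ diverges at $\pm 1$, the maximisers lie in the interior of $[-1,1]^n$, so the stationarity conditions above apply to them.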
In this work we will consider the attributes of gender and residence, i.e., $k = 2$. This implies that we have two socioeconomic attributes and each takes two possible values; therefore we have $n = 4$ partitions. In what follows the group proportions are absorbed into the coefficients, and we continue to write $J_{ll'}$ and $h_l$ for the rescaled interaction coefficients and private incentives. The $U_l$'s for our case become
$$U_l = \sum_{l'=1}^{4} J_{ll'}\, m_{l'} + h_l, \quad l = 1, \ldots, 4. \tag{18}$$
The proof of this theorem is given in [29]. In [18] the case $k = 1$ of this model was studied under the assumption that $\alpha_l^N = \alpha_l$, for all $N$. The model's thermodynamic limit was proved and a rigorous derivation and some analytical properties of the model's solution were given. In particular, it was shown there that the model factorizes completely, so that all the information about the equilibrium states of the model is captured by the self-consistency equations (16).
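A self-consistency system of the form $m_l = \tanh(U_l)$, with $U_l$ linear in the group averages, can be solved numerically by fixed-point iteration. The sketch below assumes $U_l = \sum_{l'} J_{ll'} m_{l'} + h_l$ with the group proportions already absorbed into $J_{ll'}$; the function names and parameter values are hypothetical, and plain iteration is only guaranteed to converge when the couplings are small enough for the map to be a contraction.

```python
import math


def solve_self_consistency(J, h, m0=None, tol=1e-12, max_iter=10_000):
    """Iterate m <- tanh(J m + h) until successive iterates differ by less than tol."""
    n = len(h)
    m = list(m0) if m0 is not None else [0.0] * n
    for _ in range(max_iter):
        new_m = [math.tanh(sum(J[l][k] * m[k] for k in range(n)) + h[l])
                 for l in range(n)]
        if max(abs(a - b) for a, b in zip(new_m, m)) < tol:
            return new_m
        m = new_m
    raise RuntimeError("fixed-point iteration did not converge")


def choice_probability(U, sigma):
    """Logit probability of the choice sigma in {-1, +1} given the utility U."""
    return math.exp(U * sigma) / (math.exp(U) + math.exp(-U))


# Hypothetical two-group example with weak interactions (row sums below 1).
J = [[0.2, 0.1], [0.1, 0.3]]
h = [0.5, -0.2]
m = solve_self_consistency(J, h)
U = [sum(J[l][k] * m[k] for k in range(2)) + h[l] for l in range(2)]
print(m)
```

For any utility $U$, the difference `choice_probability(U, 1) - choice_probability(U, -1)` equals $\tanh(U)$, which is the expected choice of an individual with that utility.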
The factorization allows us, in particular, to write the probability that the $i$th individual in group $l$ will choose $\sigma \in \{-1, +1\}$ as
$$P(\sigma_i = \sigma) = \frac{e^{U_l \sigma}}{e^{U_l} + e^{-U_l}}.$$
From here we observe that the expected value of the $i$th individual's choice is given by
$$E[\sigma_i] = \tanh(U_l).$$
Therefore, the self-consistency equations (16) imply that, for every $l = 1, 2, \ldots, n$,
$$m_l = \tanh(U_l). \tag{21}$$
Note from (21) that $m_l$ is the average decision level for the individuals in group $l$. Further, the expression for the $U_l$'s in (18) is a linear regression model with respect to the parameters $J_{ll'}$ and $h_l$. The private incentive part $h_l$ is also modelled as a linear regression on the attributes (2) as follows:
$$h_l = \sum_{j=1}^{k} \lambda_j\, a_l^{(j)} + \lambda_0,$$
where $k$ is the number of attributes, the $\lambda_j$'s are the relative weights that individuals associate with their socioeconomic attributes, and $\lambda_0$ is the homogeneous private incentive for all the individuals. In our case $k = 2$. It follows from here that $J_{ll'}$, $\lambda_j$, and $\lambda_0$ are the parameters to be estimated. In the case $J_{ll'} = 0$ for every $l$ and $l'$, we get the noninteracting version of the model. In that case we have that
$$U_l = h_l = \sum_{j=1}^{k} \lambda_j\, a_l^{(j)} + \lambda_0.$$

2.2. Estimation. Due to the factorization property of the model in the thermodynamic limit, it follows from (21) that $m_l$, defined in (5), is an estimator for $\tanh(U_l)$ given in (21). Note that, for large enough $N$, $\alpha_l^N$ is approximately equal to $\alpha_l$. Therefore, in what follows we will suppress the dependence on $N$. Thus $J_{ll'}$ and $h_l$ will be independent of $N$.
Therefore, it follows from (21) that $\tanh(U_l)$ is the model's prediction and $m_l$ is the observed quantity, which is estimated by the empirical average choice $\tilde{m}_l$. The model parameters are then estimated with the help of the least squares method. Thus we need to find the parameter values which minimize
$$\sum_{l}\left(\tilde{m}_l - \tanh(U_l)\right)^2,$$
where $\tilde{m}_l$ is the estimated average choice of group $l$. The function $\tanh(U_l)$ is nonlinear in the parameters and makes the computations cumbersome. Berkson, back in the nineteen-fifties, encountered a problem of this nature when developing a statistical methodology for bioassay [30]. This stimulus-response kind of experiment bears a close relationship to the natural kind of applications for a model of social behaviour, such as linking stimuli given by incentives through policy and media to behavioural responses on the part of individuals in a population. Furthermore, the same approach is used in statistical mechanics, for example, within the context of finding the proper order parameter for a given Hamiltonian [31]. Since $U_l$ is a linear function of the model's parameters and $\tanh$ is an invertible function, the above minimization is reduced to minimizing
$$\sum_{l}\left(\operatorname{arctanh}(\tilde{m}_l) - U_l\right)^2.$$
The method breaks down whenever the average choice of a group equals $+1$ or $-1$, since $\operatorname{arctanh}(\pm 1) = \pm\infty$. If the groups are large enough this situation is unlikely to arise. In the next section we apply the above estimation procedure to estimate the parameters for educational choices against the socioeconomic attributes of gender and place of residence.
In the interacting case, the independent variables (the group averages) are correlated with one another, so the ordinary least squares method is not appropriate. The partial least squares estimation is used in that case.
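For the noninteracting model, the linearised least squares step can be sketched in a few lines. The example below assumes $U_l = \lambda_0 + \lambda_1 a_l^{(1)} + \lambda_2 a_l^{(2)}$ with two binary attributes, regresses $\operatorname{arctanh}$ of the average choices on the attributes, and solves the normal equations directly. The symbol names, the helper functions, and the synthetic data are all hypothetical; the authors' own computations were carried out in R.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_noninteracting(attributes, avg_choice):
    """Least squares fit of arctanh(m_l) = lambda_0 + sum_j lambda_j * a_l^(j)."""
    X = [[1.0] + [float(a) for a in attrs] for attrs in attributes]
    # math.atanh raises ValueError at +1 or -1, where the method breaks down.
    y = [math.atanh(m) for m in avg_choice]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y_i for row, y_i in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)  # [lambda_0, lambda_1, lambda_2]


# Synthetic check: generate averages from known weights and recover them.
true_params = (0.8, -0.3, 0.5)  # hypothetical lambda_0, lambda_1, lambda_2
attributes = [(1, 1), (0, 1), (1, 0), (0, 0)]  # (attribute 1, attribute 2) per group
avg_choice = [math.tanh(true_params[0] + true_params[1] * a1 + true_params[2] * a2)
              for a1, a2 in attributes]
estimates = fit_noninteracting(attributes, avg_choice)
print(estimates)
```

Because the synthetic averages are generated exactly from the model, the recovered weights match the true ones up to rounding; with real survey data the fit would only be approximate.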

Case Study
The data used in this case study are taken from five different developing countries, namely, Ghana, Kenya, Egypt, Dominican Republic, and Indonesia. Our choice of these countries is based on our desire to compare how interaction influences choices made by individuals from countries with similar characteristics and on the availability of data. Our data were taken over a six-year period from reports gathered by the Demographic and Health Surveys Program. These reports are nationally representative surveys of individuals in the various countries sponsored by USAID, UNICEF, UNFPA, UNDP, The Global Fund, ILO, Danida, and other national bodies. Our interest is in how place of residence and gender influence the educational choices of individuals.
In this section we look at data from the Ghana Demographic and Health Survey, the Egypt Demographic and Health Survey, the Kenya Demographic and Health Survey, the Encuesta Demográfica y de Salud República Dominicana, and the Indonesia Demographic and Health Survey [1][21][22][23][24]. Individuals in these reports are categorized according to socioeconomic attributes such as level of education, wealth quintile, place of residence (rural or urban), gender (male or female), age, etc. With the help of the attributes of place of residence and gender, the sampled population is categorized into four groups. Here, individuals with the same attributes are put together in a group. The key assumption here is that individuals with the same attributes will tend to behave the same way. With these choices of attributes we define
$$a_i^{(1)} = \begin{cases} 1, & \text{if individual } i \text{ is female}, \\ 0, & \text{if individual } i \text{ is male}, \end{cases}$$
and
$$a_i^{(2)} = \begin{cases} 1, & \text{if individual } i \text{ resides in an urban area}, \\ 0, & \text{if individual } i \text{ resides in a rural area}. \end{cases}$$
Note that if $a_i^{(1)} = 0$, then that attribute has no effect on the private incentive of the individual. In the sequel we will analyse, for each of the five countries, the effect of social interaction and the lack of it on educational choices. We will employ the above estimation procedure and the reparametrisation of the parameters of the model according to the attributes of place of residence and gender to study the educational choices made by the people in these countries. Table 1 contains the data used for the various countries.
3.1. Educational Attainment. Here we are looking at groups of individuals that have been partitioned according to the two binary attributes of gender, $a^{(1)}$, and residence, $a^{(2)}$. We want to investigate how residence and gender influence choices of educational attainment for each of the countries selected. In this case study we look at individuals choosing between having some level of education and having no education. With this we code the choice of the $i$th individual as follows:
$$\sigma_i = \begin{cases} +1, & \text{if individual } i \text{ has some level of education}, \\ -1, & \text{if individual } i \text{ has no education}. \end{cases}$$
Since the data cover different periods of time, we will sum our quantities of interest over the different years.
Note that
$$\tilde{m}_{l,t} = \frac{N^{+}_{l,t} - N^{-}_{l,t}}{N_{l,t}},$$
where $N^{+}_{l,t}$ is the number of people in group $l$ with some education for year $t$, $N^{-}_{l,t}$ is the number of individuals in group $l$ with no education for year $t$, and $N_{l,t}$ is the number of individuals in group $l$ for year $t$. With this the averages for the various years and countries can be calculated from the data set. The estimates of the parameters of the model are obtained with the R statistical software. Tables 2 and 3, respectively, contain the results for the model with interaction and the model with no interaction.
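With choices coded as $+1$ (some education) and $-1$ (no education), the average choice of a group in a given year is the difference between the two counts divided by the group size. The following sketch makes that arithmetic explicit; the function name and the counts are hypothetical and are not taken from Table 1.

```python
def average_choice(n_some_education, n_no_education):
    """Average of +1/-1 coded choices for a group, computed from the two counts."""
    total = n_some_education + n_no_education
    if total == 0:
        raise ValueError("group is empty")
    return (n_some_education - n_no_education) / total


# Hypothetical counts for one group over three survey years:
# year label -> (number with some education, number with no education).
yearly_counts = {1: (620, 380), 2: (700, 300), 3: (780, 220)}
yearly_average = {year: average_choice(*counts)
                  for year, counts in yearly_counts.items()}
print(yearly_average)
```

An average close to $+1$ or $-1$ signals that almost everyone in the group made the same choice, which is the regime where the $\operatorname{arctanh}$ transformation becomes unstable.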

Discussion
The estimates for the noninteracting model are discussed first, followed by the interacting model. In the above case study there are four groups, and these groups are explained in Table 4.

Table 1: Data for the selected countries grouped according to residence and gender. Taken from [1][21][22][23][24].

Table 4: The four groups of the case study.
Group 1: Females in urban area
Group 2: Males in urban area
Group 3: Females in rural area
Group 4: Males in rural area
4.1. Noninteracting Case. When the sum of the $\hat{\lambda}$'s of the private incentive of a group is positive, that group will make a choice favouring having some education, and when the sum is negative the group will make a choice favouring no education. The estimates for this case have been collected in Table 3.
The sums of the $\hat{\lambda}$'s are positive for all the countries except Egypt, which has a negative sum. The $\hat{\lambda}$'s of the Dominican Republic are all positive, signifying that the attributes of gender and residence contribute to the private incentive of the people in the Dominican Republic when it comes to educational attainment. For the same country, however, the estimate of the private incentive for gender, $\hat{\lambda}_1$, is close to zero, implying that gender does not really contribute to the private incentive of individuals towards educational attainment when interaction is absent. The estimates of $\hat{\lambda}_1$ for the rest of the countries are all negative, indicating that being female favours the no education choice.
All the selected countries have positive estimates for the private incentive of residence, $\hat{\lambda}_2$. This implies that residing in an urban area has a positive effect on the choice of attaining some level of education in those countries.
The base private incentive $\hat{\lambda}_0$ has positive values for all five countries, signifying that individuals will prefer the choice of having some level of education. In fact, the base private incentive is greater than the sum of the estimates for the private incentives associated with gender and residence. Thus choices under the noninteracting model are dominated by the base private incentive.
4.2. Interacting Case. The interacting model has a utility function that consists of both social and private incentives. Whenever the coefficient of the social incentive $\hat{J}_{ll'}$ for groups $l$ and $l'$ is positive, individuals in groups $l$ and $l'$ prefer to imitate one another, while when $\hat{J}_{ll'}$ is negative, the average choices of individuals in groups $l$ and $l'$ prefer to have different signs; here conformity is not rewarded. When the $\hat{\lambda}$'s of the private incentive of a group sum to a positive value, then at the private level individuals in that group will make a choice favouring the attainment of some level of education, and when they sum to a negative value individuals in the group will make a choice favouring no education. The estimates for this case are found in Table 2. Note that $\hat{J}_{11}$ is the interaction strength of females in an urban area interacting with themselves. The estimate $\hat{J}_{11}$ is negative for all the countries except the Dominican Republic, where it is positive. This signifies that imitation among females in urban areas is not rewarding in the other four countries, while such an interaction is rewarding in the Dominican Republic. The estimate $\hat{J}_{22}$ is positive for all five countries. Thus males in urban areas interacting with themselves could reinforce either attainment of some level of education or no education. $\hat{J}_{34}$ is also positive for all the countries except Egypt, where it is negative. The Dominican Republic has a higher estimate than the other four countries, which implies that females in the rural areas of the Dominican Republic have greater influence on males in the rural areas when it comes to educational attainment. $\hat{J}_{44}$ is positive for Ghana and Indonesia but negative for Kenya, Egypt, and the Dominican Republic.
For $\hat{J}_{41}$, Indonesia has the highest estimate. This indicates that in Indonesia males in rural areas have greater influence on females in urban areas. Ghana and Kenya have negative estimates for this parameter.
The estimate $\hat{\lambda}_1$ is positive for Egypt, which indicates that gender has a positive influence on the educational choices made by individuals in that country. On the contrary, the estimates are negative for the remaining countries.
4.3. Model Diagnostics and Validation. In this section our interest is in how well the interacting model fits the data. The partial least squares (PLS) method was used to estimate the parameters of the interacting model found in (18). From that equation, $U_l$ is used as our dependent variable and the $m_{l'}$'s as the independent variables. The PLS method predicts a set of dependent variables from a set of independent variables (predictors). In this method, orthogonal factors, called latent vectors, are generated from the independent variables, and those factors are arranged in decreasing order of their eigenvalues. The number of latent vectors to use is chosen according to how well the vectors explain the covariance between the independent and the dependent variables [32].
Our results and analyses are produced with the R statistical software, version 3.5.2. Tables 5 to 9 indicate the amount of variance explained in the independent variables and in the dependent variable for each of the selected countries, in terms of the latent vectors used in the prediction, together with their root mean square error of prediction (RMSEP).
It is observed from Table 5 that, for the Ghanaian case, the first three latent vectors explain 96.45% of the variance in the dependent variable and 65.42% of the variance in the independent variables. The variance explained in the dependent variable is good enough for modelling or prediction, but the variance explained in the independent variables suggests that the first three latent vectors are not sufficient. When four latent vectors are considered together, 93.63% and 96.60% of the variances in the independent and dependent variables are explained, respectively. The RMSEP decreases as the number of latent vectors increases in all the countries with the exception of Kenya. Similar trends are observed for the remaining four countries in Tables 6, 7, 8, and 9.
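The construction of latent vectors and the bookkeeping of explained variance can be illustrated with a minimal single-response PLS sketch (a NIPALS-style PLS1 loop). This is not the R routine used for Tables 5 to 9, it does not compute a cross-validated RMSEP, and the function name and data below are hypothetical. It only shows how each latent vector is built from the covariance between predictors and response and how the cumulative explained variances are obtained.

```python
import math


def pls1_explained_variance(X, y, n_components):
    """Return cumulative (explained variance in X, explained variance in y) per component."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centred predictors
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]  # centred response
    total_x = sum(v * v for row in Xc for v in row)
    total_y = sum(v * v for v in yc)
    explained = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance between X and y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm_w = math.sqrt(sum(v * v for v in w))
        if norm_w < 1e-12:  # nothing left to explain in y
            break
        w = [v / norm_w for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # latent vector
        tt = sum(v * v for v in t)
        loadings = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate predictors and response by the part explained by this latent vector.
        Xc = [[Xc[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        res_x = sum(v * v for row in Xc for v in row)
        res_y = sum(v * v for v in yc)
        explained.append((1 - res_x / total_x, 1 - res_y / total_y))
    return explained


# Hypothetical data: the response is an exact linear function of three predictors.
X = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [3.0, 1.0, 1.0],
     [4.0, 3.0, 2.0], [5.0, 2.0, 0.0], [6.0, 5.0, 1.0]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]
explained = pls1_explained_variance(X, y, 3)
for k, (ex, ey) in enumerate(explained, start=1):
    print(k, round(100 * ex, 2), round(100 * ey, 2))
```

The explained variance in the response never decreases as latent vectors are added, which mirrors the pattern of using more latent vectors until both the predictors and the response are adequately explained.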

Conclusion
The above study gives credence to the potential of statistical mechanical models for socioeconomic applications, as has been suggested by the authors of [2][3][4][5]. The statistical mechanical formulation allows us to use interactions among individuals to explain the macroscopic behaviour of such socioeconomic systems. Usually, what socioeconomists take as their building blocks turn out to be the limiting objects that result from statistical mechanical models. In fact, the statistical mechanical model studied here is a special case of a class of probabilistic choice models in the socioeconomic literature called the Luce or multinomial logit models. There are other probabilistic choice models, such as multinomial probit models and generalized extreme-value models [33]. The question here is: what are the statistical mechanical analogues of these models? The understanding gleaned from these statistical mechanical models will go a long way to aid the proper understanding of the socioeconomic models. Further, starting from the statistical mechanical models we can properly reparametrise the parameters of the socioeconomic models to aid their estimation. All of this should eventually result in software that makes these models accessible to practitioners.

Data Availability
The data used in this study are survey data collected by the Demographic and Health Surveys Program in developing countries. The data used are found in [1][21][22][23][24].