An analysis of the learning performance gap between urban and rural areas in sub-Saharan Africa

The learning gap between urban and rural areas is a persistent problem in many sub-Saharan African countries. Previous studies have attributed the urban-rural learning gap to differences in student characteristics and school resources between urban and rural areas. Our study updates this finding using the latest dataset and further examines how the attributed sources change over time. Using data from 15 educational systems in sub-Saharan Africa, we examined four potential sources of the gap: student, family, teacher, and school characteristics. Our results reveal that the urban-rural learning gap in recent years is attributed mostly to differences in school and family characteristics. We also found that the attribution remained largely the same from 2004 to 2011, although the share attributed to differences in family characteristics became slightly greater than the share attributed to differences in school characteristics.


Introduction
In many countries of sub-Saharan Africa, the learning performance of rural students is much lower than that of urban students, a gap that raises concerns about knowledge disparities and educational inequality. The learning performance gap between urban and rural students is a worldwide phenomenon (Echazarra & Radinger, 2019; William, 2005). In most countries, the gap reflects rural students' underperformance, whereas in some countries where governments and/or international organisations provide educational support specifically designed for rural areas, such as in Latin America, rural students outperform urban students (Luschei & Fagioli, 2016). In sub-Saharan Africa, rural children consistently perform more poorly than urban children, and the gap is particularly large compared with the world average (United Nations Educational, Scientific and Cultural Organization, 2017).
The question of what contributes to the urban-rural performance gap has been addressed for many countries, and studies generally show that the gap is attributed to two components: the difference in student characteristics and the difference in school characteristics (e.g., Echazarra & Radinger, 2019). For the sub-Saharan African region, two studies report similar findings, attributing the gap to differences in student and school resources (Burger, 2011; Zhang, 2006). While these two studies provide important empirical evidence for the region, they are somewhat out of date. Both used data collected between 2002 and 2004, and various inequalities might have been resolved or changed since then due to efforts made in the early 2000s. An updated examination is thus justified.
With this study we aimed to investigate the sources of the urban-rural learning gap by using the latest dataset, and we further examined the changes in the sources' attributions over time. We examined four potential sources that have critical urban-rural differences: student characteristics, family characteristics, teacher characteristics, and school characteristics. The research questions we address are twofold: to what extent are the four characteristics associated with the urban-rural learning gap, and how do the association proportions change over time?
In light of an increasing demand for ongoing educational efforts to close the regional learning gaps in sub-Saharan Africa and many other emerging economies, knowing the sources of learning gaps is important. If efforts are made without knowing the proper sources, we may end up widening the gap or, even worse, creating a new disparity. Our study provides the latest evidence about the sources of the learning performance gap and changes over time, which helps the efforts to close the gap.
In the next section, we first review the different conditions between urban and rural areas, which are related to learning performance in sub-Saharan Africa. Then, we present the data and the estimation technique. After that, according to the two research questions, we first present the descriptive summary of the differences in characteristics and the estimation results of the decomposition analysis, and then the results of changes in the decomposition result over time. Finally, we conclude.

Urban-Rural Differences in sub-Saharan Africa
There are various differences related to learning performance between children living in urban areas and those living in rural areas in sub-Saharan Africa. Looking at students, the socio-economic status (SES) of their families differs significantly between urban and rural children. SES in rural areas, where the poverty rate is 46%, is generally lower than in urban areas, where the poverty rate is 18% (Beegle, Christiaensen, Dabalen & Gaddis, 2016). Students with low SES often fail to learn adequately because their families require them for farming or household work, or because they are unable to afford school-related costs such as school lunch, boarding, or school uniforms (Chinyoka & Naidu, 2014; Ohba, 2011; Sumida, 2017). Parents' education level is another difference. Parents of children in rural areas generally have a lower level of education (Burger, 2011; Irvin, Meece, Byun, Farmer & Hutchins, 2011). Parents who have a lower level of education tend to attach a lower value to their children's schooling and may force children to work at home or fail to provide adequate support for their children's learning (Glick & Sahn, 2000; Jenkins, Anyabolu & Bahramian, 2019; Lloyd & Blanc, 1996). The language used at home also differs. In sub-Saharan Africa, the vast majority of people speak native and local languages rather than the languages of instruction (Lewis, Simons & Fennig, 2016), and classes are taught in local languages in the early grades, until around grade 3 or 4, and then switch to the official language of instruction (Trudell, 2016). Generally, rural students are less familiar with the language of instruction (Bashir, Lockheed, Ninan & Tan, 2018), so the language transition is particularly difficult for rural children (Kaahwa, 2011).
Comparing schools in urban and rural areas, urban schools generally have more resources and better facilities, such as books, learning materials, and educational equipment (Bashir et al., 2018; Mulkeen & Chen, 2008). Most urban schools have toilets, whereas rural schools do not always have adequate toilets (Viteri Chavez, 2016). The number of public schools and private schools also differs between urban and rural areas. Urban areas generally have more private schools than rural areas (Viteri Chavez, 2016), although in low-income regions like sub-Saharan Africa, private schools are expected to respond to the lower socio-economic population in particular (Day Ashley, Mcloughlin, Aslam, Engel, Wales, Rawal, Batley, Kingdon, Nicolai & Rose, 2014).
With respect to teachers, urban and rural schools have different types of teachers. In rural schools, there are more unqualified teachers and fewer female teachers compared with urban schools (Luschei & Chudgar, 2015). In rural areas, there are fewer options for safe accommodation with basic facilities such as electricity and water, and access to health care and leisure activities is very limited. Thus, teachers who are experienced and/or female are often unwilling to work in rural areas (Akyeampong & Lewin, 2002; Towse, Kent, Osaki & Kirua, 2002). Even in countries where the government controls teacher deployment, the government often fails to implement deployment plans adequately, which results in an imbalance in teacher quality between urban and rural areas (Mulkeen & Chen, 2008). Additionally, the teaching quality gap among teachers grows due to the lack of school-based support (Nkambule & Mukeredzi, 2017) and difficult teaching conditions in rural areas (Ramnarain & Hlatswayo, 2018).
Among these differences, two previous studies have revealed that the urban-rural learning performance gap in the region is attributed to differences in student and school characteristics. Zhang (2006) examined 14 educational systems in sub-Saharan Africa and showed that the learning gap is strongly associated with the differences in student characteristics, which are proxied by students' ages, sexes, and family SES. She also shows that the differences in school characteristics, which include school location, school SES, building conditions, school facilities, availability of instructional resources, and teacher quality, are also associated with the learning gap. Her estimation shows that the summed differences of student and school characteristics explain all of the learning gaps in 12 out of 14 sampled systems. Burger (2011) examined the case of Zambia and similarly found a gap attributed to student characteristics and school resources. He used four variables (students' assets, parents' education levels, English skills, and pupil-teacher ratios) and the decomposition method to find that the differences in these variables explain 55% of the urban-rural gap. While these two studies provide important evidence on the sources of the urban-rural learning gap in sub-Saharan Africa, they are somewhat out of date, which calls for additional investigation.

Method
Data
The data we used for this study are from the survey of the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ), a regional and cross-national survey of southern and eastern Africa. It is an assessment of academic performance for grade 6 students in primary schools and includes background information about the students, teachers, and schools. The survey was conducted in four waves: 1995-1999 (SACMEQ I), 2000-2004 (SACMEQ II), 2006 (SACMEQ III), and 2012-2014 (SACMEQ IV). The SACMEQ IV data were not yet available at the time of the present study, so we used the SACMEQ III data for the first question, which explored the sources of the urban-rural gap, and the SACMEQ III and II data for the second question, which examined the changes in the sources over time. Fifteen educational systems participated in the SACMEQ III: Botswana, Kenya, Lesotho, Mauritius, Malawi, Mozambique, Namibia, Seychelles, South Africa, Swaziland, Tanzania (mainland), Tanzania (Zanzibar), Uganda, Zambia, and Zimbabwe. Fourteen systems participated in the SACMEQ II; Zimbabwe did not.
The sample students were selected using a stratified two-stage sampling design. First, schools were selected on a probability-proportional-to-size (PPS) basis defined by the SACMEQ Co-ordinating Centre. The PPS technique gives larger schools a higher probability of selection than smaller schools. Then, 25 students were selected from all grade 6 classes in the selected schools using computer-generated random numbers. The total number of observations in the SACMEQ III was 61,396; in the SACMEQ II, the number was 41,686.
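The two-stage design described above can be illustrated with a minimal sketch. The source does not specify the exact PPS procedure used by the SACMEQ Co-ordinating Centre, so the sketch assumes successive weighted draws without replacement, which only approximates PPS, and it ignores stratification. All function names, school labels, and enrolment figures are hypothetical.

```python
import random

def select_schools_pps(enrolment, n_schools, rng):
    """Stage 1: draw schools one at a time with probability proportional
    to grade 6 enrolment, removing each chosen school (an assumption;
    this only approximates PPS without replacement)."""
    remaining = dict(enrolment)
    chosen = []
    for _ in range(n_schools):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen

def select_students(student_ids, n_students, rng):
    """Stage 2: simple random sample of students within a selected school."""
    if len(student_ids) <= n_students:
        return list(student_ids)
    return rng.sample(student_ids, n_students)

rng = random.Random(0)
# Hypothetical grade 6 enrolment per school.
enrolment = {"A": 120, "B": 40, "C": 300, "D": 25, "E": 80}
schools = select_schools_pps(enrolment, 3, rng)
sample = {s: select_students(list(range(enrolment[s])), 25, rng) for s in schools}
```

Larger schools (such as the hypothetical school "C") are more likely to enter the first stage, while the fixed take of 25 students per school keeps the within-school workload constant.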
To divide the sample students into an urban group and a rural group, we used the school location variable in the student questionnaire. The questionnaire had four choices for the school's location; we grouped the students in large cities and small towns into the urban group and the students in rural and isolated areas into the rural group (see Appendix A).
The dependent variable was reading test scores. We chose reading instead of math because the score gap between urban and rural areas is larger for reading than for math, which allowed us to observe the source associations more clearly. For the analysis, the test scores were standardised across the educational systems to have a mean of 500 and a standard deviation of 100.
For the explanatory variables, we examined four groups: student characteristics, family characteristics, teacher characteristics and school characteristics. For the student characteristics, we included four variables: older than the official age for grade 6, student's sex, experience repeating grades, and home language. For the student sex variable, we coded one if the student was female. For the over-aged variable, we coded one for students who were older than the official age for grade 6. For the experience repeating grades variable, we coded one if the student had experienced repeating a grade. For the home language variable, we coded one if the student spoke the country's instructional language at home.
For the family characteristics, we included three variables: family possessions, mother's education level, and father's education level. The family possessions variable captured the family's socio-economic status and was an aggregated value of 13 items at home including newspapers, clocks, radios, televisions (TVs), video cassette recorders (VCRs), cassette players, cars, motorcycles, bicycles, piped water, electricity, and tables. The aggregated value was rescaled to have a mean of zero and a standard deviation of one. For the parental education variables, we coded one for students whose parents completed at least a primary education.
For the teacher characteristics, we included four variables: teacher's sex, teacher's age, teacher's education level, and classroom resources. For the teacher's sex variable, we coded one if the reading teacher was female. For the teacher's age variable, we coded one if the teacher was younger than 30 years old. For the teacher's education variable, we coded one if the teacher had completed at least a senior secondary education. The classroom resources variable represented the quality of the teaching conditions. It included eight items in the classroom: a white board, chalk, wall chart, cupboard, bookshelf, classroom library, teacher table, and teacher chair. The total value of the eight items was standardised to have a mean of zero and a standard deviation of one.
For the school characteristics, we included four variables: the school head's sex, the school head's education level, the school type, and the school's resources. For the school head's sex variable, we coded one if the school head was female. For the school head's education variable, we coded one if the school head had completed at least a senior secondary education. For the school type variable, we coded zero if the school was a government school and one if the school was a private school. The school resources variable captured the level of the school's quality and consisted of 22 facilities: a library, hall, staff room, school head office, store room, first aid kit, sports ground, water, garden, electricity, telephone, fax, typewriter, duplicator, radio, tape recorder, overhead projector, TV, VCR, photocopier, computer, and fence. The total value was standardised to have a mean of zero and a standard deviation of one. Table 1 shows all of the variables included in this study.
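The coding rules described in this section (0/1 indicators, item totals rescaled to mean zero and standard deviation one, and test scores rescaled to mean 500 and standard deviation 100) can be sketched as follows. This is an illustration only: the records, field names, and the official age are hypothetical, the use of the population standard deviation is an assumption, and sampling weights are ignored.

```python
import statistics

def dummy(condition):
    """Return 1 if the condition holds, else 0."""
    return 1 if condition else 0

def rescale(values, mean=0.0, sd=1.0):
    """Linearly rescale values to a target mean and standard deviation
    (population SD is assumed here)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [mean + sd * (v - mu) / sigma for v in values]

# Hypothetical student records; "items" holds 0/1 flags for home possessions.
students = [
    {"sex": "F", "age": 14, "location": "rural", "items": [1, 0, 1, 0], "raw_score": 31},
    {"sex": "M", "age": 12, "location": "large city", "items": [1, 1, 1, 1], "raw_score": 47},
    {"sex": "F", "age": 12, "location": "isolated", "items": [0, 0, 1, 0], "raw_score": 22},
    {"sex": "M", "age": 13, "location": "small town", "items": [1, 1, 0, 1], "raw_score": 40},
]
OFFICIAL_AGE = 12  # assumed official age for grade 6; it varies by system

female = [dummy(s["sex"] == "F") for s in students]
over_aged = [dummy(s["age"] > OFFICIAL_AGE) for s in students]
urban = [dummy(s["location"] in ("large city", "small town")) for s in students]

# Item totals rescaled to mean 0, SD 1; raw scores rescaled to mean 500, SD 100.
possessions = rescale([sum(s["items"]) for s in students])
reading = rescale([s["raw_score"] for s in students], mean=500.0, sd=100.0)
```

The same `rescale` helper would apply to the classroom and school resource totals, since they follow the same sum-then-standardise rule.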

Estimation Method
To examine the association between the urban-rural learning gap and the differences in characteristics, we used the Oaxaca-Blinder decomposition technique (Blinder, 1973; Oaxaca, 1973). The Oaxaca-Blinder technique estimates the extent to which differences in observed characteristics between two groups explain a gap of interest, with differences in other characteristics explaining the remainder. It is based on a linear model that uses different regression coefficients across two groups. To illustrate the model within our study, we can draw the following educational production function for each of the urban and rural groups:

$$Y_U = X_U \beta_U + \varepsilon_U \qquad (1)$$
$$Y_R = X_R \beta_R + \varepsilon_R \qquad (2)$$

where $Y_U$ and $Y_R$ represent the mean test scores for urban and rural areas, respectively; $X_U$ and $X_R$ are vectors of the values for characteristics including student, family, teacher and school characteristics; $\beta_U$ and $\beta_R$ are vectors of coefficients for the characteristics that are calculated by the standard Ordinary Least Squares (OLS) regression; and $\varepsilon_U$ and $\varepsilon_R$ are random error terms.
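As a rough illustration of how group-specific OLS regressions feed a two-fold Oaxaca-Blinder decomposition, the sketch below fits one regression per group on simulated data and splits the mean gap into a part due to differences in characteristics and a remainder. It is not the estimation code used in the study: the data are simulated, the variable set is reduced to two regressors, rural coefficients are assumed as the reference for valuing characteristic differences, and all function names are hypothetical.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.
    Each row of X must start with a constant 1.0."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def col_means(X):
    """Column means of a list-of-rows matrix."""
    return [sum(row[j] for row in X) / len(X) for j in range(len(X[0]))]

def oaxaca_blinder(X_u, y_u, X_r, y_r):
    """Two-fold decomposition with rural coefficients as the reference."""
    b_u, b_r = ols(X_u, y_u), ols(X_r, y_r)
    m_u, m_r = col_means(X_u), col_means(X_r)
    gap = sum(y_u) / len(y_u) - sum(y_r) / len(y_r)
    explained = sum((mu - mr) * br for mu, mr, br in zip(m_u, m_r, b_r))
    unexplained = sum(mu * (bu - br) for mu, bu, br in zip(m_u, b_u, b_r))
    return gap, explained, unexplained

def simulate(n, possessions_mean, intercept, rng):
    """Simulated students: constant, possessions index, female dummy."""
    X, y = [], []
    for _ in range(n):
        possessions = rng.gauss(possessions_mean, 1.0)
        female = 1.0 if rng.random() < 0.5 else 0.0
        X.append([1.0, possessions, female])
        y.append(intercept + 30.0 * possessions + 5.0 * female + rng.gauss(0.0, 20.0))
    return X, y

rng = random.Random(42)
X_u, y_u = simulate(500, 0.5, 520.0, rng)   # simulated urban group
X_r, y_r = simulate(500, -0.5, 480.0, rng)  # simulated rural group
gap, explained, unexplained = oaxaca_blinder(X_u, y_u, X_r, y_r)
print(f"gap={gap:.1f} explained={explained:.1f} unexplained={unexplained:.1f}")
```

Because each regression includes a constant, the fitted values reproduce the group means, so the two parts sum to the raw gap up to floating-point error. Choosing urban rather than rural coefficients as the reference would generally change the split.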
With the knowledge of the values of the estimated coefficient vectors $\beta_U$ and $\beta_R$, we can compute a counterfactual of the following type: "what would the distribution of test scores of rural students be if we kept all of the observed characteristics the same as those of urban students?" With the knowledge of the counterfactual distribution of rural students, we can calculate the difference as follows:

$$
\begin{aligned}
Y_U - Y_R &= X_U \beta_U - X_R \beta_R \\
&= X_U \beta_U - X_U \beta_R + X_U \beta_R - X_R \beta_R \\
&= \left[ (X_U - X_R) \beta_R \right] + \left[ X_U (\beta_U - \beta_R) \right] \qquad (3)
\end{aligned}
$$

where $Y$ and $X$ denote the mean test scores and the values of the characteristics, with subscripts $U$ and $R$ for urban and rural areas. In line (3), the first bracket is the part attributed to the difference in the observed characteristics and is called the "explained component." The positive value of this bracket indicates that the difference in observed characteristics is positively related to the outcome gap, and it further means that if the difference in observed characteristics becomes narrowed, the learning gap also becomes narrowed. Therefore, it could be interpreted as the difference in characteristics that can explain the urban-rural learning gap. The second bracket is the part attributed to differences in the return structure of the observed or unobserved characteristics and is called the "unexplained component."

Differences in Characteristics
Table 2 shows the differences between urban and rural areas for the included variables (also see Appendix B). The mean values for each area are omitted due to space constraints. The differences were calculated by subtracting the mean value of the rural area from that of the urban area. Therefore, a positive value indicates that the value in the urban area is higher than that in the rural area, and a negative value indicates the opposite. The results show that the differences in test scores are all statistically significant except in the Seychelles, indicating the prevalence of lower performance among students in rural areas in sub-Saharan Africa. Zimbabwe, South Africa, and Namibia have particularly large differences, with 117, 100, and 79 points between urban and rural areas, respectively, whereas Malawi, Mozambique, and Mauritius have relatively small differences, with 20, 29, and 33 point differences, respectively.
Regarding the student variables, the differences in the age, grade repetition, and language variables point in a consistent direction across systems, with negative values for the age and grade repetition variables and positive values for the language variable. This indicates that students in rural areas are more likely to be over-aged and to have repeated grades, and less likely to speak the instructional language at home. The differences in the sex variable are mixed in direction or statistically insignificant. In Mozambique, Zambia, and Tanzania (Zanzibar), there are fewer female students in rural areas, whereas in Kenya and Lesotho, there are more female students in rural areas. In the other 10 systems, there is no statistically significant difference in students' sex between urban and rural areas. In the Seychelles, there are no significant regional differences in any of the student characteristics except for age.
For the family variables, all of the systems except for the one in the Seychelles have the same pattern in terms of the differences with positive values for all of the variables. This indicates that families in rural areas have fewer possessions, and both mothers and fathers of students in rural areas have lower levels of education than the parents of students in urban areas. The differences in family possessions are particularly large in Zimbabwe, Namibia, and Tanzania (Zanzibar), while the difference in parents' education levels is large in Malawi, Mozambique, and Tanzania (Zanzibar). In the Seychelles, the regional differences in family characteristics were not significant.
Regarding the teacher characteristics, the results show various patterns in the directions of differences across the systems, except for the variable of the teacher's sex. The variable of the teacher's sex has a positive sign in all of the systems, which indicates that schools in urban areas have more female teachers than those in rural areas. Regarding the teacher's age variable, eight systems have negative differences and three systems have positive differences. The negative values seen in the majority of systems indicate that teachers in rural areas are younger than those in urban areas. However, in three systems, those of the Seychelles, South Africa, and Uganda, teachers in rural areas are more likely to be older than those in urban areas. Regarding the teacher's education levels, many systems have positive differences, indicating that teachers in urban areas are more educated than those in rural areas. Two systems, those of Botswana and Mauritius, have negative values; this suggests that teachers in rural areas are more educated than those in urban areas. For classroom resources, nine systems have positive differences, indicating that classrooms in urban areas have more resources than those in rural areas. In three systems, those of Botswana, Kenya, and Mauritius, rural classrooms have more resources than urban classrooms.
For school variables, the differences show a similar pattern across the systems. The differences in the variable for the sex of the school head show positive values in 11 systems. This indicates that urban areas have more female school heads than rural areas. Only two systems, those in South Africa and Zimbabwe, have more female school heads in rural areas than in urban areas. For school types, many systems had positive values, indicating that there are more private schools in urban areas than in rural areas. As with the school head's sex variable, South Africa and Zimbabwe are the exceptions, with more private schools in rural areas. Regarding the education of school heads, 10 systems have positive differences, indicating that school heads in urban schools have higher education levels than those in rural schools. In Malawi, the Seychelles, and Swaziland, school heads in rural schools have higher education levels than those in urban areas. As for school resources, all of the systems except for that of the Seychelles have positive values, indicating that urban schools have more resources than rural schools. The differences in resources are particularly large in Zimbabwe, Namibia, and Zambia.

Decomposition Results
Table 3 shows the estimation result of the Oaxaca-Blinder decomposition for 14 systems. The Seychelles was excluded in this estimation because the learning gap there was not statistically significant. The table's first part shows mean test scores of urban (U) and rural areas (R) and the learning difference (D) between these two areas. The second part shows the decomposed values of the differences, the explained component (Q), the unexplained component (N), and the proportion that each component shares for the learning gap. The third part shows the breakdown of the explained components. This part consists of four characteristics: student (QP), family (QF), teacher (QT), and school (QS). To illustrate the proportions comparably, Figure 1 is drawn based on the result of the third part.
Note. Significance levels are: *p < 0.1, **p < 0.05, ***p < 0.01. Standard errors in parentheses.
In the second part, the explained component generally accounts for a high proportion of the gap in most systems. In 10 out of 13 systems, the explained component accounts for more than half of the learning gap. Particularly in Botswana, Mozambique, Namibia, South Africa, Zambia, and Zimbabwe, the proportions exceed 95%, suggesting that almost all of the learning gap can be explained by the included variables. In Lesotho, Malawi, and Tanzania (mainland), the unexplained component accounts for the larger proportion, at 58%, 61%, and 70%, respectively. This result indicates that in these systems the learning gaps are attributable mainly to characteristics other than the included variables.
Looking at the breakdown of the explained components, the results show that school characteristics account for the largest part. In eight systems (Kenya, Mauritius, Mozambique, Namibia, South Africa, Swaziland, Uganda, and Zimbabwe), school characteristics account for the largest proportion, ranging from 33% to 71%. Family characteristics also account for a large part of the explained components in Botswana, Tanzania (mainland), Zambia, and Tanzania (Zanzibar), with proportions of 15% to 50%. Combining the proportions of these two characteristics accounts for more than half of the learning gap in 10 systems. This suggests that for these systems, the primary sources of the learning gaps are the differences in school and family characteristics. In Lesotho and Malawi, the largest proportion is the student characteristics, at 17% and 29%, respectively, indicating that in these two systems, the learning gaps can be attributed primarily to the differences in student characteristics. Teacher characteristics are not the primary source of the learning gap in any system but are related to the learning gaps in Namibia, South Africa, Tanzania (Zanzibar), and Zimbabwe.
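The breakdown of the explained component by characteristic group can be sketched as follows, assuming (as in a standard detailed decomposition) that each group's contribution is the sum, over its variables, of the urban-rural mean difference multiplied by the rural coefficient. Every number below is invented for illustration and does not come from Table 3; the variable names and the function `group_shares` are hypothetical, and only one variable per group is used to keep the example short.

```python
def group_shares(mean_urban, mean_rural, beta_rural, groups, gap):
    """Share of the total gap attributed to each group of variables."""
    shares = {}
    for group, variables in groups.items():
        contribution = sum(
            (mean_urban[v] - mean_rural[v]) * beta_rural[v] for v in variables
        )
        shares[group] = contribution / gap
    return shares

# Invented means, coefficients, and gap for a single hypothetical system.
mean_urban = {"home_language": 0.4, "possessions": 0.5,
              "teacher_female": 0.7, "school_resources": 0.6}
mean_rural = {"home_language": 0.2, "possessions": -0.3,
              "teacher_female": 0.5, "school_resources": -0.4}
beta_rural = {"home_language": 10.0, "possessions": 20.0,
              "teacher_female": 5.0, "school_resources": 25.0}
groups = {
    "student": ["home_language"],
    "family": ["possessions"],
    "teacher": ["teacher_female"],
    "school": ["school_resources"],
}
gap = 50.0  # invented urban-rural difference in mean scores

shares = group_shares(mean_urban, mean_rural, beta_rural, groups, gap)
explained_share = sum(shares.values())
unexplained_share = 1.0 - explained_share
largest = max(shares, key=shares.get)
```

In this invented example the school group carries the largest share, which mirrors the way a "largest proportion" is identified for each system in the text.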
This result generally echoes the findings of the previous two studies, but it also provides more detailed evidence about the sources. In general, we confirmed the previous findings that more than half of the learning gap could be explained by the differences in student and school characteristics in many sub-Saharan African countries. In addition, we found that by distinguishing family characteristics from student characteristics, and teacher characteristics from school characteristics, the gap is explained more by family characteristics than by student characteristics, and more by school resource characteristics than by teacher characteristics.
Regarding the breakdown of the explained component, school and family differences remained the main sources of the urban-rural learning gap over time, but the gap became more attributable to family differences than to school differences across the systems. The combined proportion of school and family differences remained at more than half of the explained component of the gap in nine of 13 systems for both 2001 and 2007. Meanwhile, the systems with the largest proportions in school characteristics numbered 11 in 2001 and seven in 2007, and the systems with the largest proportions in family characteristics changed from two (Mauritius and Tanzania (mainland)) in 2001 to four (Botswana, Tanzania (mainland), Zambia, and Tanzania (Zanzibar)) in 2007. Looking at individual systems, the proportions of family differences in the learning gaps increased from 5% to 50% in Zambia, from 12% to 43% in Mozambique, from 16% to 27% in South Africa, and from 35% to 46% in Botswana. One notable case is Mauritius, where the learning gap was attributed to outside factors in 2001 but became more associated with school differences in 2007. In Zambia and Tanzania (Zanzibar), the main source shifted from school differences in 2001 to family differences in 2007.

Conclusion
This study aimed to investigate the sources of urban-rural learning performance gaps with the latest available data and to examine the changes in source attributions over time. Using the examples of 15 education systems in sub-Saharan Africa, we first documented urban-rural differences in four groups of characteristics: student, family, school, and teacher characteristics. Knowing the differences, we then estimated the associations between the learning gaps and the differences. After that, we further examined the changes in the characteristics' differences and in these associations from 2001 to 2007.
Our results show that urban-rural learning gaps are mostly associated with school characteristic differences and family characteristic differences. School characteristic differences were associated with the largest proportion of the learning gap in eight of 14 systems. Family differences were associated with the largest proportion of the learning gap in four systems. Combining the two proportions of school and family differences accounted for more than half of the learning gap in 10 systems. The associations of student and teacher differences with the learning gap were very small. This result generally confirms the previous two studies, which located the sources in student and school characteristics, but it also adds detailed evidence about the sources. While the previous studies looked at one or two groups of characteristics, student and/or school resources, we examined four groups of characteristics. By doing so, we were able to show that the gap's sources are associated more with family characteristics than with students' own characteristics, as well as more with school resource characteristics than with teacher characteristics.
Our second analysis shows that the main sources of the urban-rural learning gaps did not change greatly over time, but the weights of the characteristics involved shifted slightly from school to family attributes. In this study, by using the latest and best available dataset, we found that school and family differences were the main and persistent sources of urban-rural learning gaps in sub-Saharan Africa. However, further research is needed to explore other possible sources by including a wider range of variables. Our analysis showed that in a few systems, such as those in Lesotho, Malawi, and Tanzania (mainland), the learning gaps were not statistically associated with any of the included variables. This finding suggests that the source exists somewhere in the characteristics that we could not observe in this study. It is particularly crucial in the case of Malawi, since the association with the learning gap there has shifted over time from school differences to unobserved characteristics.