Mapping adolescent first births within three east African countries using data from Demographic and Health Surveys: exploring geospatial methods to inform policy

Background Early adolescent pregnancy presents a major barrier to the health and wellbeing of young women and their children. Previous studies suggest geographic heterogeneity in adolescent births, with clear “hot spots” experiencing very high prevalence of teenage pregnancy. As the reduction of adolescent pregnancy is a priority in many countries, further detailed information of the geographical areas where they most commonly occur is of value to national and district level policy makers. The aim of this study is to develop a comprehensive assessment of the geographical distribution of adolescent first births in Uganda, Kenya and Tanzania using Demographic and Household (DHS) data using descriptive, spatial analysis and spatial modelling methods. Methods The most recent Demographic and Health Surveys (DHS) among women aged 20 to 29 in Tanzania, Kenya, and Uganda were utilised. Analyses were carried out on first births occurring before the age of 20 years, but were disaggregated in to three age groups: <16, 16/17 and 18/19 years. In addition to basic descriptive choropleths, prevalence maps were created from the GPS-located cluster data utilising adaptive bandwidth kernel density estimates. To map adolescent first birth at district level with estimates of uncertainty, a Bayesian hierarchical regression modelling approach was used, employing the Integrated Nested Laplace Approximation (INLA) technique. Results The findings show marked geographic heterogeneity among adolescent first births, particularly among those under 16 years. Disparities are greater in Kenya and Uganda than Tanzania. The INLA analysis which produces estimates from smaller areas suggest “pockets” of high prevalence of first births, with marked differences between neighbouring districts. Many of these high prevalence areas can be linked with underlying poverty. Conclusions There is marked geographic heterogeneity in the prevalence of adolescent first births in East Africa, particularly in the youngest age groups. Geospatial techniques can identify these inequalities and provide policy-makers with the information needed to target areas of high prevalence and focus scarce resources where they are most needed.


Background
Pregnancy in adolescence can present a major barrier to the health and wellbeing of young women and their children, and can contribute to long term educational and socio-economic disadvantage [1]. As we move towards the broader post 2015 Sustainable Development Goals agenda, attention is focussed more closely on how more nuanced indicators can highlight the needs of vulnerable sections of the population, and track how their needs are addressed through policies and programmes. National level targets may well be achieved by focussing on certain sectors of the population or geographical region whilst leaving other groups lagging behind, which again points to the need for demographic and spatial disaggregation to highlight disparities. A recent study by Neal et al. [2] in Uganda, Tanzania and Kenya highlighted the marked concentration of adolescent first births (particularly among younger adolescents) among the poorest and least educated sections of the population, and also found that progress over time was poorest amongst the most disadvantaged. The study also identified very marked geographic disparities in rates of adolescent first births within the three countries at state (administrative level 1) level.
Spatial inequalities in adolescent pregnancy and births are found in both high and low income countries, and are likely to reflect underlying levels of deprivation as well as inadequate access to reproductive health services [3]. In addition, adolescent motherhood is often strongly rooted in cultural practices, and this may well lead to prevalence (particularly among the youngest age groups) being concentrated within geographical "pockets" where communities share particular beliefs, norms and practices as well as possibly suffer high levels of deprivation. Spatial mapping of adolescent pregnancies has been used in developed country contexts to understand these geographic distributions, and has identified "hotspots" of adolescent births: small localities with high levels of adolescent childbearing [4,5].
Prevalence mapping for disease or other adverse outcomes has become an important tool for policy makers in low income countries, and numerous studies have examined the spatial distribution of a range of maternal and child health and nutrition outcomes (e.g. [6][7][8][9][10]). While the value of this approach has been acknowledged with regards to adolescent programming (e.g. [11]) it has rarely been utilised for mapping the distribution of adolescent childbearing in low income country contexts. As the reduction of adolescent pregnancy is a priority in many countries, further detailed information of the geographical areas where they most commonly occur is of value to national and district level policy makers.
The aim of this study is to develop a comprehensive assessment of the geographical distribution of adolescent first births in Uganda, Kenya and Tanzania using Demographic and Household Surveys (DHS) data. In order to provide data that is useful for a range of policy makers and planners we present three separate approaches outlined by Ebener et al. [12] which can contribute to a greater understanding of spatial distribution of early first births: a. Descriptive/thematic mapping (creation of maps to convey information about a topic or theme) using choropleths b. Spatial analyses (extraction or creation of new information from spatial data) using adaptive bandwidth kernel density estimates c. Spatial modelling (spatial analysis that includes the use of statistical models to simulate phenomena) in a Bayesian framework.
These three approaches offer different perspectives and advantages for policy makers within the field of adolescent health. Descriptive mapping is generally used for presenting a visual representation of geographical variation for relatively large regions. It gives an overview of geographical inequities within countries, and a series of such maps can be used to highlight temporal trends or regions where progress in reducing adolescent births is particularly poor or good. Applying spatial analysis to information on adolescent childbearing using kernel density estimates provides an overall picture of "hotspots", which is not constrained by administrative boundaries. This can be particularly useful when looking at correlations with other factors that transcend boundaries such as ethnic groupings. Finally, spatial modelling can be used to estimate rates of adolescent first births for small areas such as districts, using additional correlated variables. These are useful for identifying pockets of high prevalence, and can assist district level policy makers in setting priorities.
We present results from the application of these three approaches and thus produce an outline of the geography of adolescent childbearing in three countries disaggregated by age at under 16 years, 16-17 years and 18-19 years. Separating out the age groups enables births among the most vulnerable younger adolescents to be identified and mapped separately. Our discussion suggests how underlying factors may explain these geographic inequalities. It also outlines the advantages and disadvantages of the three different methods, as well as highlighting how policy makers have used such data in low income countries, and the potential for future use.

Data
Data were extracted for these analyses from the most recent Demographic and Health Surveys at the time of writing for Tanzania (2010), Kenya (2008), and Uganda (2011) [13][14][15]. The sample was restricted to women aged 20 to 29 at the time of the survey, resulting in sample sizes of n = 3347 Tanzanians, n = 3167 Kenyans, and n = 3284 Ugandans. Global Positioning Systems (GPS) coordinates of corresponding cluster locations were also gathered through the DHS and mapped using ArcGIS software version 10.2.2 [16]. Participant confidentiality is maintained by the DHS through cluster displacement of up to 2 km for urban clusters and 5 km for rural clusters. For these analyses, a total of 457 clusters were used for Tanzania, 397 clusters in Kenya, and 400 clusters in Uganda. Figure 5 in Appendix 1 shows the locations of the displaced clusters with associated sample size and urban/ rural status. Of note, two districts in Tanzania contained no observed clusters containing women aged 20 to 29 years (Bukoba Urban and Pangani), while one district had only data for births between 18 and 19 years (Mafia). Data were weighted as outlined by DHS guidelines, using SAS version 9.4 software [17]. Administrative boundary shapefiles were obtained from the freely available Database of Global Administrative Areas (GADM) [18], while DHS regional shapefiles were obtained from the DHS [19], and projected using the World Geodetic System 1984 projection.
The outcome of interest was the percentage of women aged 20-29 at the time of survey who had given birth before the age of 20 years. As Neal et al.'s [2] earlier study found important differences in age patterns within the range of adolescent ages, we disaggregated the outcome into three different age groups: first birth before 16, 16-17 and 18-19 years.

Descriptive mapping
Descriptive analyses were performed and presented in Table 1, by country and age group. In addition we produced descriptive choropleth maps using ArcGIS software version 10.2. These are thematic maps in which areas are shaded proportional to the measurement of the statistical variable being displayed: in this case age at first birth. As these descriptive maps are based directly on the survey estimates for the outcome it is not feasible to carry out analysis for small areas as small sample sizes result in large confidence intervals. Thus, the maps are presented at administrative level 1. These maps employed weighted outcomes, as outlined by DHS guidelines.

Spatial analyses
Kernel density estimation (KDE) is a non-parametric method for estimating density, and uses all the data points to create an estimate of how the density of events varies over a given area [20]. It produces a smooth map in which the density at every location reflects the number of points in the surrounding area. This can then be used to create prevalence surfaces, or heat maps, by generating a ratio of case data to control data. We used this method to create heat maps of adolescent first births with the prevR package in R software [21]. Further details of the methodology used can be found in Appendix 2, and are described elsewhere in the literature [22].

Spatial modelling
The Integrated Nested Laplace Regression (INLA) modelling approach is a technique that can be used for small area estimation, which involves the estimation of parameters of sub-populations confined within a small geographical area as part of a larger survey population. It utilises a Bayesian hierarchical spatial regression modelling approach and was carried out here using the INLA package in R [23]. Such geoadditive models incorporating the INLA technique have been used previously in the DHS literature as a method to control for spatially correlated effects in a Bayesian framework [24]. By utilising a Bayesian framework, uncertainties in estimates can be quantified and presented, suggesting where future data collection efforts might be focussed.
For these analyses, proportions are presented at the administrative unit 2 level for Tanzania and Kenya, while the administrative unit 1 level was used for Uganda due to the high number of districts within the country (n = 168). By presenting provincial prevalence within Uganda, parity between geographical units can be maintained. Further methodological details can be found in Appendix 2, while associated confidence intervals and standard deviations for estimates are presented in Appendix 3.  Tables 1 and 2 show these sample characteristics broken down by country.
To provide a regional picture of adolescent first births in East Africa, choropleth maps were generated for DHS regions. Figure 1 reflects weighted sub-national proportions of women who had their first birth at less than 16 years old (Fig. 1a), from 16 to 17 years old (Fig. 1b), and 18 to 19 years old (Fig. 1c). Kenya and Uganda show marked geographic heterogeneity for first births under 16 years: for instance eastern Kenya and parts of Uganda have more than 20 % of women having a first birth before the age of 16, whereas for much of the country the figure is less than 10 %. In Tanzania there appears to be less geographical variation. Generally, as overall prevalence of first birth increases for ages 16/17 and 18/19 the heterogeneity also decreases.

Spatial analyses
Prevalence surfaces, or "heat maps", of maternal age at first birth were generated using an adaptive bandwidth technique encompassing an optimal number of persons surveyed through the DHS, similar to a nearest neighbour approach. The optimal N parameter (N opt ) used in these analyses is defined in Appendix 1, and has been published in detail elsewhere [22]. Figure 2a represents the percentage of women having their first birth before 16 years old, while Fig. 2b and c represent ages 16 to 17, and 18 to 19 years respectively. Prevalence of childbearing tended to increase with increasing age; therefore, to emphasize within-group regional heterogeneity, varying scales were used between age categories, as specified in the corresponding legend key for each map. This was done to highlight areas within East Africa which might have high prevalence of birth in a given age category, even though this proportion might be lower as compared with other age categories. The kernel density maps broadly correlate with the choropleths, although the different scales bring out more clearly inequities in the 16/17 and 18/ 19 year age groups. Again, there is less variation in Tanzania for all age groups than for Kenya or Uganda. The lack of constraint from administrative boundaries allows us to see how "hot spots" or "cool spots" cross and are unaffected by country boundaries e.g. there is an area of lower prevalence that spread along the border between Uganda and Kenya for adolescent births <16 years, as well as several areas of higher prevalence than traverse the borders between Tanzania and Kenya for all three age groups.

Spatial modelling
Predicted prevalence of maternal age at first birth is shown in Fig. 3 for less than 16 years (Fig. 3a), between 16 and 17 years (Fig. 3b), and 18 to 19 years (Fig. 3c). To emphasize within-country variation in prevalence across age groups, countries are presented in columns by age categories for Fig. 3a-c. As would be expected there are strong similarities between the maps produced by the three techniques (and the choropleth and INLA for Uganda based on the same administrative unit level are highly comparable). However, for Kenya and Tanzania the INLA technique provides more nuances and detailed estimates as compared to the previously mentioned techniques. While it again broadly complies with both the choropleths and the kernel density maps, it suggests in some cases marked differences in neighbouring districts, which come out less clearly using the other two methods e.g. it highlights high levels of first births under 16 years in Mbarali district, Tanzania. To reflect uncertainty in the mean estimates displayed in Fig. 3, we mapped standard deviations of the posterior distribution for each district, with corresponding 95 % confidence intervals listed in Appendix 3. These standard deviations reflect the range under which the presented estimates may fall, thereby providing an overall representation of variability within a given district which may assist policy makers in understanding the degree in which they can rely on the data for decision making. In general, the distribution of standard deviations approached normality with increasing age, most likely due to more frequent outcomes, or births (Fig. 6). Areas with highest associated standard deviations at less than 16 years of age included the north eastern region of Kenya and coastal areas of Tanzania. Such variation is likely a result of increasingly rare outcomes in more rural areas with already low sample sizes, and suggest future analyses examining adolescent motherhood might benefit from more focussed data collection efforts in rural areas. The area with highest standard deviation occurred in Mafia, an island off Tanzania's coast, for births between 18 and 19 years. Most notably, births at less than 16 years old and between 16 and 17 years were not observed for this region, while births between 18 and 19 were also low, resulting in high standard deviation and wide confidence intervals (SD: 0.20; 97.5 % CI: 0.07-0.81). Detailed posterior distribution parameters for each region are outlined in Tables 3  through 5 in Appendix 3.
The disaggregation by age group for the three countries makes it possible to note emerging age-related patterns. In Tanzania very few districts have significant numbers of births under 16 years, but this is not the case for Kenya and Uganda. However, for the 16/17 age groups there are a number of districts with very high proportions of first births in all countries including Tanzania, as well as marked heterogeneity between regions. If we consider the 18/19 age group, there is less heterogeneity as most districts have high rates of first birth, although Kenya still has a number of regions with relatively low proportions. Most  districts or regions show the pattern that would be expected where the proportion of first births increases with age, but there are exceptions: for example Mandera and Wajir actually have higher percentages of first births at <16 years, and these decline for 16/17 and 18/19 years (presumably because the majority or women have already given birth before these later age groups).

Discussion
The findings show marked geographical heterogeneity for adolescent first births, particularly in Kenya and Uganda. The distribution in Tanzania, however, is more homogenous, at least for the <16 and 18/19 year group. These differences are most marked for the <16 age groups. While the INLA estimates at district level reflect broader patterns shown in the regional level choropleths, the more detailed maps are in some cases able to demonstrate "hot spots", with marked heterogeneity across neighbouring districts.
A proportion of this heterogeneity is likely to reflect differences in underlying socio-economic determinants of adolescent fertility such as poverty and education. Previous work in Uganda and Kenya using small area estimation techniques has clearly demonstrated heterogeneity at the district level for various economic status indicators [25,26], and indeed there is marked correlation of "hot spots" of poverty with our own estimates of high prevalence of early first births. Probably the most marked area of high prevalence are found in Kenya in the Mandera and Wajir region. While the standard errors are relatively large due to these regions being sparsely populated, the findings are plausible as they generally have very poor socio-economic indicators: they are ranked the second and third poorest districts in the country [27]. In addition this area is most populated by nomadic pastoralists, including many from the Somali ethnic group (some of whom have arrived as refugees from the conflict in Somalia). These populations have strong traditions of early marriage, as well as low levels of autonomy for women [28,29]. In Uganda, the eastern districts with high levels of first births under 16 also have quite high levels of poverty, as well as having been affected by conflict, with a number of regions still experiencing high levels of displaced populations or food insecurity. Some findings are more difficult to explain. While moderate uncertainty parameters suggest the estimates should be interpreted with caution, some districts with very high levels of poverty in northern Uganda actually have relatively low levels of first births <16 years, which suggest cultural differences. Conversely, Mbarali district in Tanzania has a relatively high level of first birth <16 years compared to neighbouring districts, yet it is relatively wealthy. However, it does have a prevalence of HIV infection higher than the national average [30] which may suggest particular norms in sexual behaviour, and in addition to its geographical position on the Dar-Es Salaam -Mbeya corridor these findings may warrant further investigation. The area also has a large number of Maasai migrants who have a strong culture of early marriage, so this may also partially explain the findings [28,29]. The apparent high rate of first births to women under the age of 16 years in Dodoma Urban also warrants further analysis. The lesser degree of geographical heterogeneity in Tanzania is difficult to conclusively explain, but may partly reflect the lower level of socio-economic inequity compared with Kenya and Uganda as measured by the Gini coefficient of inequality and the percentage inequality in income [31]. A further reason could be explained by differences in ethnic composition: the Tanzanian population is composed of a large number of smaller ethnic groups, which may mean diversity between ethnic groups is less clearly visible within a geographical context (or indeed there may be less ethnic diversity among the groups in terms of adolescent pregnancy).
Differences could at least partly reflect differing access to contraception: DHS reports show wide geographic variations in contraceptive prevalence in all three countries [13][14][15]. However, contraceptive uptake to prevent first births in nulliparous women is extremely low in all three countries (5 % in Tanzania and Uganda, and 14 % in Kenya [13][14][15]), so this is probably not a major factor.
The high levels of first births in young women under the age of 16 years in some parts of Kenya and Uganda is particularly concerning: there is evidence that the health disadvantages faced by both adolescent mothers and their infants are concentrated among younger adolescents, so should be of particular concern to policy makers [32][33][34]. The disaggregation by age groups allows us to ascertain age-related patterns which are often lost in studies that use a single indicator for adolescent births. Several areas such as Mandera and Wajir require further investigation and possible interventions, as do other districts in Uganda and Tanzania where rates appear high. High rates of first births at an early age suggest areas where appropriate services and information must be made available at a young age before sexual activity commences, which may require a markedly different approach to those targetted at older adolescents to allow for different levels of cognitive and emotional development. In addition further investigation is needed to understand the contexts of these pregnancies (e.g. within or outside marriage) to enable a comprehensive approach to addressing the issue [35]. In many contexts this will ensure developing and enforcing legal frameworks to establish age at marriage and protect girls from abuse and exploitation.

Using mapping and Geographic Information System (GIS) techniques to inform policy and planning
This work provides examples of how mapping and spatial analyses using already-existing data can inform policymakers about locations where the prevalence of adolescent pregnancies is high. In recent years there has been a marked increase in the number of studies drawing on geospatial techniques to either map health indicators or examine geographical access to services (e.g. [7][8][9]). The growing availability of georeferenced information available through large scale surveys such as the DHS provide further opportunities to use these methods in low and middle income countries to guide policy and practice. Using a variety of methods, as in this study, enables findings to be triangulated to confirm areas of potential concern. Such methods may need to be supported by more detailed analysis of local level data from either existing sources such as vital registration in areas where this data is available for the majority of the population, or health records where nearly all births occur within the health system. Alternatively, it may be necessary to gather focussed and specific data collection methods which can provide more nuanced information and assist in the development of strategies to respond to need.
When we specifically look at how geospatial data has been integrated within policy and planning for adolescent pregnancy prevention, the UK Teenage Pregnancy Reduction strategy developed in 1999 provides an interesting example [36][37][38]. This included the collation and dissemination of ward-level data on teen conceptions in order to identify "hotspots" (defined as more than 6 % of 15-19 year olds becoming pregnant). These high prevalence neighbourhoods could then be targeted in terms of resources and interventions.
This paper has focussed on thematic mapping to identify areas of high adolescent prevalence. However further opportunities exist for using mapping and other geospatial modelling techniques to examine associations with other variables, or attempt to explain variance. At a simple level it is possible to layer different variables onto choropleth maps to show how different attributes may be associated to provide a clear visual representation: an example is shown in Fig. 4 which demonstrates the association between lack of education and births before age 16 years by region in Kenya. However, it must be noted that this is not always feasible at a small area level: the regional administrative level used for the map in Fig. 4 may restrict its value to policy makers. Alternative methodologies have been used to investigate how relationships between adolescent motherhood and underlying determinants vary spatially [4,39] and this offers opportunities for further analysis in low and middle income countries.

Limitations and advantages of geospatial techniques
The mapping techniques demonstrated in this paper have respective advantages and limitations for policy makers. The initial choropleths presented in this study based on prevalence are easy to carry out and present a visual representation of direct estimates that is easy to interpret. The technique does not need georeferenced data and can be created using free software. However, they cannot be used for small areas based on most national survey data sources as sample sizes will be too small and confidence intervals too great, which may limit their value to policy makers: this however may not be the case for data sources that do not rely on sampling (e.g. vital registration or census data). Kernel density estimates show "heat maps" of high prevalence areas that can be identified independently of administrative boundaries, and this can both be a positive or a negative attribute: when examining the relationship between adolescent motherhood and factors not affected by boundaries such as ethnicity it may be an advantage, but may be a disadvantage for policy makers keen to understand levels specifically within their own districts or regions.
INLA can provide small area information which can be tailored to match the relevant administrative unit for health, thus making it particularly valuable for policy makers and theoretically at least can be used on very small areas, making it less likely that smaller hotspots are overlooked. It must however, be remembered that this is modelled data rather than direct estimates, and attention should be paid to estimates of uncertainty when interpreting the results. The uncertainty estimates for this study vary, but in some districts it suggests that results should be interpreted with caution, particularly where the estimates are wide, and supports the need for triangulation of data from other sources to guide programmatic decisions. While freely available via open source software, it requires fairly specialized knowledge or staff to implement in a programmatic setting. The use of a number of different techniques as included in this study offers an opportunity to triangulate findings and present more robust evidence.
There are also possible limitations associated with the use of DHS data, which relies on retrospective reporting of birth histories to identify adolescent births. This may be prone to either intentional or unintentional recall bias around age at first birth, and in particular there is some evidence that very young adolescent births may be under-reported: a previous study suggests this is most likely when using a sample of [15][16][17][18][19] year olds [40], so our use of a sample of 20-29 year old women should minimise this. Further potential bias may be introduced as the survey will record the birth at the place where the mother was residing at the time of the survey, not where she was at the time of the birth.

Conclusion
Our studies demonstrate marked geographical heterogeneity in adolescent first births, particularly in Uganda and Kenya. These inequities are particularly marked for births under the age of 16 years, which is the group most likely to experience adverse outcomes from pregnancy for themselves and their infants The use of these three geospatial techniques enable these differences to be examined at regional and, in the case of Kenya and Tanzania, district level, as well as being able to display prevalence without the constraints of administrative boundaries. The use of several different methods allow results to be triangulated and enables greater confidence in the results. Such findings can provide policy-makers with the information needed to target areas of high prevalence and focus scarce resources where they are most needed. Geospatial methods have already proved valuable in guiding policy in developed countries and the proliferation of georeferenced data through surveys in low income countries offers greater opportunities to understand and address geographic inequities. Because DHS data uniquely employ clusters comprised of both cases and controls, spatial independence cannot be assumed when using these data. To address this limitation, the prevR package was developed by Larmarange et al. [22] specifically for use with DHS data to create smoothed prevalence surfaces using adaptive bandwidth kernel estimator techniques, with radii width varying to encompass an equal number of persons surveyed. This technique is advantageous over other fixed and adaptive bandwidth kernel estimator approaches specifically in regards to DHS data, where cases and controls for a given system cannot be assumed to be spatially independent of each other, and often are in fact geographically located within the same DHS cluster.
To create a unified prevalence surface of the East Africa region, DHS data from all three countries were pooled prior to analysis. Of note, prevalence scales vary between groups as indicated by the associated legend, emphasizing regional variation among age groups. Cluster weights were re-scaled to sum to 1 within country, then multiplied by the 2010 population of females 20 to 29, ensuring comparability across countries (UN World Population Prospects, 2012). This was done using the following equation for each of the three analysis countries: The optimal number of persons surveyed, or N parameter, was calculated using the following formula outlined in Larmarange et al. [22]: where n equals number of persons surveyed within the sample, p denotes sample prevalence, and c specifies number of sample clusters. Because kernel estimator approaches are highly contingent upon bandwidth size, this equation is beneficial in calculating the optimal bandwidth size to produce a surface with sufficiently high resolution smoothness, while also compensating for localized variations. For these analyses, n = 9050 total persons surveyed, with c = 1252 clusters.

INLA
A model incorporating both spatial and random effects was used because it produced the lowest deviance information criteria (DIC) as compared to models incorporating a spatial effect or random effect alone. We assumed an uninformative prior distribution, and defined the outcome of interest as following a Bernoulli distribution, with prevalence of adolescent first birth, p, ranging between 0 and 1. Covariates of the model included proportion of the sample with rural residences, proportion with no education, and proportion classified as being in the bottom two wealth quintiles.