A grid-based approach for refining population data in rural areas



INTRODUCTION
It is of great importance for many spatial decision-making processes to have data that is both accurate and up to date. Population data with an appropriate format and resolution is required for a variety of applications such as spatial planning processes, disaster and emergency management, and risk and vulnerability assessment (Aubrecht et al., 2010a; Hall et al., 2008; Schneiderbauer, 2007; Sweitzer and Langaas, 1995; Tatem and Linard, 2011). In many countries, particularly the developing countries, population censuses are carried out every ten years and the population data from the census is made available to the public in aggregated form as statistical yearbooks, usually divided into administrative areas or political units. However, as pointed out by Deichmann et al. (2001), cross-disciplinary studies require datasets that are referenced to a uniform coordinate system rather than to irregular administrative units, since the integration of such vector-based population data into spatial modelling is problematic. The availability of grid-based (raster) population data is therefore vital for spatial integration and analysis, for ease of computation, and for spatial modelling (Schneiderbauer, 2007; Aubrecht et al., 2013). High-resolution gridded data is also particularly important for spatial vulnerability and risk assessment, especially at a local or community level (Kienberger, 2012; Rafiq and Blaschke, 2012; Roy and Blaschke, in press). Schneiderbauer (2007) highlighted that the lack of recent population data at a high spatial resolution hampers crisis management activities. High-resolution contemporary data on human population distributions are a prerequisite for accurate measurement of the impacts of population growth, for monitoring changes, and for planning infrastructure (Tatem et al., 2007; Linard et al., 2012; Gaughan et al., 2013).

*Corresponding author. E-mail: dulalroy@yahoo.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
In recent years various scientific communities around the world have undertaken a number of initiatives aiming to develop techniques and methodologies for transforming population vector data (based on census counts) into gridded (raster) data. The first global population density estimate in raster format was developed in response to requests from international agriculture research institutes (Deichmann, 1996). The Center for International Earth Science Information Network (CIESIN) of Columbia University developed the "Gridded Population of the World (GPW)", a large-scale data product that shows the spatial distribution of population across the globe at a resolution of 2.5 arc-minutes, that is, approximately 5 km at the equator (CIESIN, 2005). It was first developed at the National Center for Geographic Information Analysis (NCGIA) in 1995 (Tobler et al., 1997). The GPW database uses two basic inputs: non-spatial population estimates (that is, tables of population counts listed by area names) and spatially explicit administrative boundary data. A proportional allocation gridding algorithm (areal weighting scheme), utilizing more than 300,000 national and sub-national administrative units, is used to assign population values to grid cells.
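As a rough illustration of proportional allocation (areal weighting), the sketch below splits the population of one administrative unit across grid cells in proportion to the share of the unit's area that falls in each cell. The function name, the cell identifiers, and the numbers are hypothetical and are not taken from the GPW documentation.

```python
def areal_weighting(unit_population, overlap_areas):
    """Split a unit's population across cells by area share.

    overlap_areas maps a cell id to the area of the unit inside that cell.
    """
    total_area = sum(overlap_areas.values())
    if total_area <= 0:
        raise ValueError("unit has no area inside the grid")
    return {cell: unit_population * area / total_area
            for cell, area in overlap_areas.items()}


# Hypothetical unit of 1,000 people overlapping three cells (areas in km^2).
cells = areal_weighting(1000, {"c1": 10.0, "c2": 5.0, "c3": 5.0})
```

Because every cell receives a share proportional to area, the cell values always sum back to the population of the unit, whatever the geometry.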
Other initiatives have included the Global Rural-Urban Mapping Project (GRUMP) population dataset developed by CIESIN, which is available at a resolution of 30 arc-seconds (CIESIN, 2005). The data products include population count grids (raw counts), population density grids (per square km), land area grids (actual area, net of ice and water), mean geographic unit area grids, urban extent grids, centroids, a national identifier grid, national boundaries, coastlines, and settlement points. The algorithm employed is based on the GPW approach described above and uses approximately one million national and sub-national geographic units.
In another initiative, the LandScan datasets developed by the Oak Ridge National Laboratory (ORNL) provide population density grids at a global level, with approximately 1 km resolution, through an interpolation method (LandScan, 2010). In this case the allocation of population is based on likelihood coefficients for slope categories, distances from major roads, populated places, night-time lights, and land cover (Mirella et al., 2005).
In a number of industrialized countries, gridded population data suitable for use in a variety of applications is available from the relevant statistical agencies (Aubrecht et al., 2010b). Kienberger et al. (2009) used gridded population data for spatial modelling of socio-economic vulnerability in the Salzach River catchment area, Austria. Gallego et al. (2011) described methods to produce a dasymetric population density grid of the European Union at a 100 m resolution. Their main ancillary information source was the CORINE land cover database distributed by the European Environment Agency, and they also integrated information from the Eurostat point survey (land use and land cover frame survey) into the parameter estimation of some of the approaches tested. Hall et al. (2012) compared gridded population data products for parts of Sweden with high-resolution population records obtained from the Swedish national registry through their regional office in Scania, Sweden. They concluded that further research was required into the quality of gridded population data, through comparisons with reference data such as high-resolution population data.
In a recent study, Scholz et al. (2013) disaggregated 1 km population density grids created by the European Forum for Geostatistics to target resolutions of 100 and 500 m. The resulting population grids were evaluated with respect to both reference population datasets and a random population dataset. It is interesting to note that the results from Scholz et al. (2013) indicate that the disaggregated population grid with 500 m resolution is more accurate than that with 100 m resolution and has a lower correlation with the random population grid. This study therefore indicates that the highest spatial resolution is not necessarily the most accurate. Balk et al. (2006) described the basic methods for constructing estimates of global population distribution with attention to recent advances in improving both spatial and temporal resolution. To evaluate the optimal resolution for the study of disease, they discussed the native resolution of the data inputs as well as that of the resulting outputs. Elvidge et al. (2009) produced a global poverty map at 30 arc-second resolution (approximately 1 km) using a poverty index calculated by dividing population count by the brightness of satellite-observed lighting. In July 2011 the AsiaPop project was initiated with the aim of producing population distribution maps for the whole of Asia (AsiaPop, 2013). This dataset was not available in time for our study, but the AsiaPop project (together with other attempts to provide continuous raster datasets) reflects the high level of demand for such data.
From studying the literature we observe that methodologies for the development of grid-based population data are not readily available, despite being urgently required in developing countries for crucial purposes such as emergency and crisis management and risk assessment. Many developing countries such as Bangladesh have very high population densities. Apart from the urban areas, population densities are also high in the coastal and rural areas of these countries (Rabbani, 2009). These coastal and rural populations are often at great risk from the adverse effects of climate change and frequent natural disasters such as floods and cyclones (Mondal and Tatem, 2012). Grid-based population datasets with higher resolutions are therefore urgently needed for risk and vulnerability assessment in these areas (Kienberger, 2012; Rafiq and Blaschke, 2012; Roy and Blaschke, in press).
In this paper we hypothesize that census-based rural population data can be transformed into gridded data using various datasets and techniques at a relatively high resolution. First, we outline the methodology for transforming census-based rural population data into gridded data at a relatively high resolution (100 m). Rural settlement data, population census data, and other geospatial datasets were collected from the relevant local authorities in Bangladesh; the methodologies and GIS techniques used for the transformation of population census data into gridded population data are described below. Finally, the resulting gridded population data is compared with the available LandScan global datasets, and validation techniques are used to evaluate the results of our transformation.
The approach developed in this study has a number of distinctive features. Firstly, it presents a grid-based methodology for developing gridded population data, especially in the context of developing countries. This approach addresses the problem of data availability in developing countries, where higher-resolution gridded population data is not readily available despite being urgently required for various crucial purposes. As described earlier, the spatial resolution of most existing gridded global population datasets is relatively coarse. As a result, they can be used at a continental level and sometimes at a national level, but are not suitable for use at local or community levels; such use would require population density datasets with higher resolutions. Therefore, the current approach focuses on developing gridded population data at higher resolution at the local or community level.

Study area
Bangladesh is bordered by India to the west, north, and northeast, by Myanmar to the southeast, and by the Bay of Bengal to the south. It lies between 20°34' and 26°38' N, and between 88°01' and 92°41' E. Bangladesh is located in the delta formed by three major rivers, the Ganges, the Brahmaputra, and the Meghna (the GBM), which is one of the largest deltas in the world. The combined basins of the GBM, together with their tributaries and distributaries, cover approximately 1.7 million km² in Bangladesh, Bhutan, India, Nepal, and Tibetan China; only 7.5% of the combined catchment areas lie within Bangladesh. The country is mostly flat except for some areas in the northeast and southeast, with about 50% of the land lying less than 7 meters above mean sea level (MoDMR, 2008).
The country has an area of 147,570 km², divided into 7 administrative divisions and 64 districts. The selected study area is the Dacope upazila (sub-district) of the Khulna District, which is located in the south-western coastal region of Bangladesh (Figure 1). It lies between 22°24' and 22°40' N, and between 89°24' and 89°35' E. The upazila occupies a total area of 991.57 km², comprising 706 km² (71%) in the Sundarbans reserve forest and 285.57 km² (29%) in non-forest areas (BBS, 2001). The 2001 national population census, which was the latest source of population information available during this study, indicated a total population for the Dacope upazila of 157,489. Using an annual growth rate of 1.4%, the upazila's population (excluding forest areas) is projected to have increased to 176,054 in 2010. This gives a population density for the total area of the upazila in 2010 (including the forest areas) of 183 per km². If the forest areas are excluded, the population density for the study area increases to 616 per km².
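The non-forest density quoted above is a simple ratio of the projected population to the non-forest area. The short sketch below reproduces that arithmetic using the two figures from the text; the function name is hypothetical.

```python
def population_density(population, area_km2):
    """Return persons per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


projected_2010 = 176_054   # projected 2010 population quoted in the text
non_forest_km2 = 285.57    # non-forest area quoted in the text

# Roughly 616 persons per km^2 for the non-forest study area.
density_non_forest = population_density(projected_2010, non_forest_km2)
```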
The upazila is divided into 10 unions (the lowest administrative units in the Bangladesh government system) and 26 mauzas (spatial units with one or more settlements). The southern border of the upazila lies in protected forests that extend to the Bay of Bengal coastline. The study area is frequently damaged by floods and erosion due to a high density of rivers and canals; road infrastructure in the upazila is therefore not in good condition. According to the 2001 population census, the predominant form of housing in the upazila is the 'kutcha' (89.81%), which is characterized by housing materials such as mud, thatch, and bamboo (BBS, 2001). These 'kutcha' structures are very susceptible to natural hazards such as floods, cyclones, and storm surges.

Data collection
Up-to-date, accessible, and reliable datasets are essential for the transformation of census population data into gridded data. The collection of geospatial and ancillary data is a key step in the present study. Different datasets were collected for transformation into gridded data, including population census data, transportation and infrastructure datasets, and data from satellite imagery. Population census data and other socio-economic data were obtained from the Bangladesh Bureau of Statistics (BBS). A census is conducted approximately every ten years in Bangladesh and, as stated previously, the 2001 national population census was the latest source of information available at the time of our data collection. Additional satellite images such as Landsat 7 ETM+, ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer), and IRS (Indian Remote Sensing) images were also acquired. The LandScan gridded population dataset was obtained from the Oak Ridge National Laboratory website (http://www.ornl.gov/sci/landscan). The contents of all the datasets collected in this study were checked for plausibility and topicality. Some of the GIS data layers, such as settlements, roads, and embankments, were not considered sufficiently up to date or accurate for inclusion, because changes that had occurred over time to these physical features within the study area were not reflected in the datasets.
In such instances, the relevant datasets were updated and modified with the help of available satellite imagery for the study area. These included Landsat 7 ETM+, ASTER, and IRS images from April 2009, November 2008, and February 2007, respectively. Figure 2 shows the census-based population density for the different mauzas within the study area (a), and a snapshot of the updated settlement data using the ASTER (15 m) satellite image (b).

GIS-based interpolation
As mentioned previously, our objective was to derive grid-based population data suitable for use in a variety of applications. As discussed in section 1, the spatial resolution of most gridded global population datasets is relatively coarse (for example, approximately 1 km for the LandScan dataset). The recently developed AsiaPop dataset provides gridded population data at 100 m resolution but was not available at the time of data acquisition for this study. We have developed a methodology to transform the census population data into gridded population data (100 m). For this purpose two assumptions were made, taking into account population density, settlement patterns, expert consultations, and other characteristics of the study area: firstly, that people only live within the outlines of rural settlements, and secondly, that they are evenly distributed within these areas. We consulted a number of experts for their opinions and experiences regarding the assumptions made in this study and other relevant aspects of population data in Bangladesh. The details of the selection of experts are provided in section 4.2. The expert consultations indicate that the population is uniformly distributed over the settlements, as the study area is predominantly rural in character. The methodology for transforming the census population data into gridded population data was divided into a number of consecutive steps.
In the preliminary stage of our work, rural settlement data was carefully checked against the census data for each mauza. It was observed that the rural settlement data was not separated by mauza; it was only available for the whole study area. Thus, the spatial distribution of rural settlements was checked against the boundaries of each mauza. This showed that the boundaries of some settlements overlapped with more than one mauza. Therefore, the settlements that overlapped mauza boundaries were divided between the relevant mauzas in order to distribute population over the settlements in a particular mauza. For that purpose, the settlements that overlapped mauza boundaries were differentiated by overlaying the 'mauza' polygons on the 'settlement' polygons (Figure 3). Afterwards, the areas of individual settlements and the total area of all settlements in a particular mauza were calculated as a basis for estimating settlement populations. The population allocated to an individual settlement within a particular mauza was calculated using the following equation:

$$P_{ij} = P_i \times \frac{S_{ij}}{\sum_{j} S_{ij}} \qquad (1)$$

where $P_{ij}$ is the population of settlement $j$ in mauza $i$. For this calculation the total population of mauza $i$ ($P_i$) was available from the population census data. The area of an individual settlement $j$ within mauza $i$ ($S_{ij}$) and the sum of areas of all settlements within mauza $i$ ($\sum_{j} S_{ij}$) were calculated in ArcGIS.

At this point the population of each individual settlement in a particular mauza is available. Our ultimate aim is to obtain population figures for grid cells with a size of 100 m. To this end, a polygon vector grid layer with a cell size of 100 m was created. The vector grid layer and the settlement layer with the population counts were then overlaid in order to identify the intersected settlements (Figure 4), that is, the pieces created by intersecting the settlement polygons with the vector grid layer. The areas of the individual intersected settlements were then calculated and their respective population numbers estimated from these areas. The population living in each intersected settlement within its grid cell was calculated using the following equation:

$$T_{km} = P_{ij} \times \frac{A_{km}}{S_{ij}} \qquad (2)$$

where
$T_{km}$ = population of intersected settlement $m$ within grid cell $k$
$A_{km}$ = area of intersected settlement $m$ within grid cell $k$
$P_{ij}$ = population of settlement $j$ in mauza $i$
$S_{ij}$ = area of settlement $j$ in mauza $i$

Following this calculation, each intersected settlement within a 100 x 100 m cell was assigned the corresponding population. After estimating the settlement populations, the intersected settlements, which have individual population counts, were converted into the 100 x 100 m raster grid. Each cell then contained the relevant population number. Figure 5 shows the conversion process from the intersected settlements into the population grid (100 x 100 m).
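The two allocation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS workflow used in the study: it assumes that both steps are proportional to area (as the symbol definitions for $P_i$, $S_{ij}$, $A_{km}$ and $T_{km}$ suggest), that the overlay areas have already been computed, and that all identifiers and numbers are hypothetical.

```python
from collections import defaultdict


def settlement_populations(mauza_pop, settlement_areas):
    """Step 1 (assumed form): P_ij = P_i * S_ij / sum_j(S_ij)."""
    total = sum(settlement_areas.values())
    return {j: mauza_pop * s / total for j, s in settlement_areas.items()}


def grid_populations(settlement_pop, settlement_areas, pieces):
    """Step 2 (assumed form): T_km = P_ij * A_km / S_ij, summed per cell.

    pieces is a list of (cell_id, settlement_id, area) tuples, one per
    intersected settlement piece.
    """
    cells = defaultdict(float)
    for cell_id, j, area in pieces:
        cells[cell_id] += settlement_pop[j] * area / settlement_areas[j]
    return dict(cells)


# Hypothetical mauza: 1,200 people, two settlements (areas in m^2).
areas = {"s1": 20_000.0, "s2": 10_000.0}
pop = settlement_populations(1200, areas)

# Hypothetical overlay with 100 m cells: s1 spans three cells, s2 spans two.
pieces = [
    ("k1", "s1", 8_000.0), ("k2", "s1", 8_000.0), ("k3", "s1", 4_000.0),
    ("k3", "s2", 4_000.0), ("k4", "s2", 6_000.0),
]
grid = grid_populations(pop, areas, pieces)
```

In this toy case cell `k3` receives people from both settlements, and the cell values add up to the 1,200 people of the mauza, which is the property that makes the method mass-preserving before any rounding.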

Population maps and comparison
A comparison was made between the census population data and the developed gridded data in order to check the accuracy of the results. Little difference can be observed between the census population data and the population calculated from our own gridded data. The total population of the study area for 2010, as projected from the 2001 census population, was 176,054, while the total population from all grid cells in the study area was 171,883, a difference of 4,171 or about 2.37%. The main reason for this difference is the rounding up or down of the population figures at the end of the calculation process. After the transformation process in the study, some settlements received fractional population numbers.

In order to evaluate the quality of our derived gridded population dataset, it was compared with the LandScan gridded dataset, which was obtained from the Oak Ridge National Laboratory (ORNL) website and has a resolution of approximately 1 km. This population data model uses sub-national level census counts for each country and primary geospatial and ancillary datasets (LandScan, 2010). For an effective comparison, the same spatial resolution (approximately 1 x 1 km grid size) was used for both of the gridded population datasets. For this purpose, our 100 x 100 m population grid was aggregated into a 1 km grid using spatial aggregation techniques in ArcGIS. Figure 6 shows a comparison between our own gridded population data and that from LandScan.
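Aggregating a 100 m population grid to 1 km amounts to summing each non-overlapping block of 10 x 10 cells. The sketch below shows this with plain Python lists; it is a stand-in for the ArcGIS aggregation mentioned above rather than a reproduction of it, and the function name and the toy grid are hypothetical.

```python
def aggregate_sum(grid, factor):
    """Sum non-overlapping factor x factor blocks of a 2-D list."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = [[0.0] * (cols // factor) for _ in range(rows // factor)]
    for r in range(rows):
        for c in range(cols):
            out[r // factor][c // factor] += grid[r][c]
    return out


# Hypothetical 20 x 20 grid of 100 m cells with 2 people per cell.
fine = [[2.0] * 20 for _ in range(20)]
coarse = aggregate_sum(fine, 10)  # 2 x 2 grid of 1 km cells
```

Summing (rather than averaging) keeps the values as population counts, so the total population is unchanged by the aggregation.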
The visual comparison reveals a greater variation in population density within our data than in the LandScan data, which consequently appears smoother. This seems plausible for a worldwide dataset that is based on statistical correlations between population, land cover, distance to roads, and similar variables (see Section 1). Examination of the LandScan data reveals a total population estimate for the study area of 262,487, which is 86,433 or about 49% higher than the figure projected from the 2001 census population. As mentioned previously, the total population in our gridded data is 171,883, which is approximately 2.37% (4,171) less than the census population. This appears to indicate that our results from the spatial disaggregation of the population are more realistic, and accurate to within less than 3%.
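The percentages quoted above follow directly from the three population totals. The sketch below recomputes them; the function and variable names are hypothetical, while the figures are those given in the text.

```python
def relative_difference(estimate, reference):
    """Signed difference of estimate from reference, in percent."""
    return 100.0 * (estimate - reference) / reference


census_2010 = 176_054  # projected from the 2001 census
gridded = 171_883      # total of the 100 m gridded dataset
landscan = 262_487     # LandScan total for the study area

diff_gridded = relative_difference(gridded, census_2010)    # about -2.37
diff_landscan = relative_difference(landscan, census_2010)  # about +49.1
```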

Validation
The accuracy of the gridded population data was checked by both quantitative and qualitative validations. Two qualitative validations were performed following the quantitative comparison described in the previous section. Firstly, our gridded population data was compared with the settlement population data calculated in section 3.2 using the 2001 census data. Figure 7 shows a rural settlement and the population grids (100 m) converted from the settlement using GIS techniques. As expected from the algorithm used, the total population from all grid cells within a settlement was equal to the total population of that particular settlement. Secondly, our gridded population data was evaluated by 10 experts with extensive experience in the field of disaster management and emergency response, especially in the coastal areas of Bangladesh. The disaster experts were selected from various organizations, including government, non-government, local disaster management committees, and voluntary, academic, and research institutions. During the field study, a standardized, structured questionnaire was completed by the disaster experts, with the objective of obtaining their opinions and suggestions based on their past experience in this field. Firstly, they were briefed about the background, the objectives, and the major components of the study. Secondly, they were asked to provide any relevant population data and information on the study area and were also invited to share their opinions and experiences regarding the availability and accuracy of population data, the requirement for gridded data, and other relevant aspects of population data in Bangladesh. Finally, they were asked to investigate and assess the results, and in particular to comment on the accuracy of the derived results and to raise any important observations.
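The first validation above, that the grid cells of a settlement sum to the population of that settlement, can be expressed as a simple check. The sketch below is illustrative only; the function name, the tolerance, and the numbers are hypothetical. The second call shows how rounding each cell to whole persons can break the equality, which is the kind of effect reported for the overall totals.

```python
import math


def is_mass_preserving(settlement_pop, cell_values, tol=1e-6):
    """True if the cell values of a settlement sum to its population."""
    return math.isclose(sum(cell_values), settlement_pop, abs_tol=tol)


# Hypothetical settlement of 320 people split over three 100 m cells.
exact = is_mass_preserving(320.0, [128.0, 128.0, 64.0])

# Rounding each cell to whole persons loses one person in this case.
rounded = is_mass_preserving(100.0, [round(33.4), round(33.3), round(33.3)])
```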
Based on these validations and the opinions of the local experts, our methodology for gridding rural population data shows promising and realistic results. One particular observation made by the local experts was that the areas that are highly populated are also assessed as highly populated in our gridded data. This observation is important in the context of vulnerability since it means we can assume that the vast majority of population hot spots are well represented.

Implications of the increased accuracy
The increased accuracy of gridded population data will impact disaster management and vulnerability assessment. In many developing countries, population densities are very high. The coastal and rural populations are often at great risk from the adverse effects of climate change and frequent natural disasters. It can be very useful to estimate the number of people at risk from floods and other natural disasters in a particular region. The availability of gridded population data at higher resolution can be very effective in monitoring extreme events and in supporting rescue and evacuation operations and rehabilitation programs. In developing countries, the increased accuracy of gridded population data can facilitate poverty analysis, poverty management, disease control, health care management, and other critical operations.

CONCLUSIONS
We have developed a grid-based methodology for calculating rural population density at relatively high resolution, which we consider to be applicable at a local level. The methodology uses census population data, rural settlement data, and other geospatial datasets to make grid-based estimates of rural population density, and we have applied it to a sub-district in Bangladesh.
The resulting dataset is shown to match the census-based population totals far more closely than the available LandScan global dataset. This is not simply because of the higher spatial resolution of our gridded dataset, since our results remain much closer to the population figures from the census data when the same spatial resolution is used for both datasets. We believe that the approach presented here provides a useful method for researchers or institutions to carry out geographical analyses linking population with land use and geographical features. Demand for this type of analysis is likely to increase in the future. Spatial databases of human population are particularly important in disease burden estimation and epidemic modelling, as well as in resource allocation, disaster management, accessibility modelling, transport and city planning, poverty mapping, and environmental impact assessment (AsiaPop, 2013; Linard et al., 2010). High-income countries often have extensive mapping resources and expertise at their disposal with which to create such databases. There is, however, either a complete lack of appropriate gridded population data for low-income regions of the world, or the data that is available is of poor quality. The scarcity of mapping resources, the lack of reliable validation data, and difficulties in obtaining high-resolution contemporary census statistics remain major obstacles to settlement and population mapping across the low-income regions of the world.
Our methodology demonstrates several distinct advantages over other existing methodologies for calculating gridded population densities, yielding reasonably accurate results in terms of population counts and providing relatively high-resolution (100 m) gridded population data. However, the overall performance of our approach depends largely on the quality of the input data, in particular the rural settlement data and census population data. This data therefore needs to be accurate and up to date in order to develop high-quality and reliable gridded population data.
We anticipate that the methodology presented here will be further improved in the future and can also be adapted to other appropriate contexts. More accurate and up-to-date rural settlement data will be required in order to develop high-accuracy population datasets in gridded (raster) format. The authors feel that a number of potential areas exist for future research with regard to the development of high-resolution gridded population data. Firstly, further research is required into the development of gridded population datasets with higher resolutions, especially in the context of developing countries. Secondly, gridded datasets for other socio-economic indicators such as poverty, employment, health, water, and sanitation may be developed for effective application in, for example, spatial risk and vulnerability assessment and emergency management. Thirdly, the development and application of proper validation methodologies is important for effective evaluation of the resulting gridded population datasets. The authors stress that further validations will be required in order to improve existing methodologies.

Figure 1. Location of the study area in the context of Bangladesh and its coastal areas

Figure 2. Census-based population density for the different mauzas within the study area (a), and settlement data updated from ASTER (15 m) satellite image (b)

Figure 4. Intersected settlements created after intersection of settlement layer with 100 m vector grid data layer

Figure 6. Visual comparison between our own gridded population data (a) and that from LandScan (b)

Figure 7. A rural settlement and population grids (100 m) converted from the settlement