Monitoring the population change and urban growth of four major Pakistan cities through spatial analysis of open source data

ABSTRACT Cities are complex, dynamic entities shaped by dense concentrations of people, so multi-temporal observations are needed to analyse and understand the urban context. Open-source data and geospatial intelligence are becoming important means of exploring, monitoring and predicting urban area growth and population increase. In the last few decades, unemployment and the absence of infrastructure in rural areas have promoted unplanned and haphazard urbanization across the urban centres of Pakistan. This study explores the potential of open-source and freely available datasets for mapping and spatially monitoring cities. It gives a spatial perspective on the rapidly growing cities of Pakistan, using Google Earth Engine to classify Landsat images over the last four decades and to identify sprawl patterns across the cities. The study finds that the built-up area has increased significantly alongside population growth over this period, and that there is a strong positive correlation between population growth and built-up expansion. Using open-source data (Landsat images and LandScan data), the study offers a technical solution, based on Google Earth Engine-supported statistical analysis and machine learning, for spatially monitoring the population change and urban growth of four major cities of Pakistan. The results are expected to provide timely and cost-effective information to policymakers, government officials and citizens for more sustainable urbanization.


Introduction
Urbanization is a continuous and dynamic process that concentrates a large number of people in relatively small areas. The process is driven by population growth, and population has been increasing dramatically for several decades, mainly due to migration, urbanization and rapid changes in fertility rates, which raises the need for urban planning (2021). Urban sprawl creates many socioeconomic and environmental challenges, with both positive and negative effects (Omwenga 2010). Currently, about 50% of the world's population lives in urban areas. This number is expected to be more than a billion by 2025 (Foley et al. 2005). Accelerated, poorly managed urbanization leads to inefficient resource use and large-scale deterioration of agricultural land, in addition to atmospheric, land and water pollution (Hove, Ngwerume, and Muchemwa 2013). Such an increase is a major challenge for low- and middle-income countries in managing population growth and urban sprawl (Pauleit et al. 2010).
Cities with frequently unplanned urbanization and fast-growing populations struggle to maintain their multisource spatial data and lack the means to monitor and understand city morphology, particularly in low- and middle-income countries. At the same time, spatial information on population growth and urban sprawl has become indispensable for planning and management, and remote sensing (RS) and GIS play an important role. Conventional methods of quantifying urban sprawl, such as population census statistics, are time-consuming and costly, and so are not effective tools for measuring urban scenarios. There is therefore an urgent need to analyse population growth and urbanization spatially, so as to develop integrated plans and mitigation measures.
In this study, supervised pixel-based categorization of remotely sensed images was carried out using Smile (CART) algorithms. The best input image for classification was chosen using a procedure that included band composition, statistical operators, seasonal composition strategies and derived class information (Praticò et al. 2021). The study's primary goal is to identify decisions made about land use, land cover and land management so that it can assess growth policies and their repercussions in Istanbul (Dereli et al. 2017).
Developing sustainable cities requires understanding based on in-depth analysis. Freely available open-source datasets, imagery, and big data analytics combined with geospatial and remote-sensing technology offer a huge opportunity to improve city structure, by helping government officials and urban planners analyse area expansion and urban growth. Remote sensing is a recognized approach for characterizing and identifying the major classes of land use and land cover (LULC), and Google Earth Engine (GEE) is used as the cloud computation platform (Hashem and Balakrishnan 2015; Lu et al. 2020). GIS, in turn, is a tool for integrating and analysing spatial information over time, with applications to planning and managing urban growth patterns. Rapid population increase, economic development and infrastructure development have clearly resulted in the expansion of cities. For sustainable cities, it is essential to monitor and foresee urban growth in terms of settlement planning, transit, landscaping, etc. Currently, geographic information system (GIS) techniques and remote-sensing data, particularly satellite data, allow us to obtain the required data with high spatial, spectral and temporal resolution. The purpose of this study is to identify the land use and land cover (LULC) with the help of freely available data sources (Dereli 2018).
By exploring the potential of freely available datasets for understanding the spatially heterogeneous landscape dynamics of a city, this study aims to analyse urbanization and population growth spatially in four major cities of Pakistan. Land-use distribution and spatial growth patterns, together with the dynamics of population density, are explored in the four cities using data from different sensors and high-resolution imagery. The study is intended to support cost-effective planning in middle-income, rapidly urbanizing cities (Mahboob, Atif, and Iqbal 2017).

Study area
To explore the potential of open-source datasets and tools (refer Figure 1), we selected four major provincial capital cities of Pakistan (Karachi, Lahore, Peshawar and Quetta), which are evenly distributed across the country. These are rapidly growing, urbanizing cities. Although they lie in different parts of Pakistan, the basic urban growth factors are common to all four. We constructed the mathematical model of urban growth from 1990 to 2020 using publicly available data and Landsat images, and visually explored city expansion using Google Earth Engine.

Materials and methods
Driven by various human activities and economic factors, urbanization has become a trend in the provincial capitals. These cities, being provincial capitals, also rank as the largest cities in their own provinces, and face urbanization and various city problems. Past published studies of urban sprawl are largely theoretical. In this study, GIS and remote-sensing techniques were used comprehensively for change detection and for the directional aspects of urban growth over four decades, from 1990 to 2020. Cloud computation in the Google Earth Engine was carried out to map the most recent and previous urban sprawl extents for the cities of Karachi, Lahore, Peshawar and Quetta (1990, 2000, 2010 and 2020). Data are classified into land cover types with the GEE cloud computation (Lu et al. 2020; Xian, Crane, and McMahon 2008).

Datasets and sources
In this study, Landsat 5 TM (Thematic Mapper) and Landsat 8 OLI (Operational Land Imager) satellite imagery of 30-metre spatial resolution and the LandScan population density dataset have been used. The satellite imagery was acquired from the USGS (United States Geological Survey) for land cover assessment. The datasets and sources are shown in Table 1.

Conversion to top of atmosphere (TOA) radiance
We conducted supervised image classification of Landsat I MSS sensor, Landsat V TM sensor and Landsat VII ETM sensor data, with the assistance of real-time raw scene data from the Landsat VII and Landsat VIII (Operational Land Imager) sensors. The Landsat satellites run in polar orbits and scan the experimental cities with a repeat overpass every 16 days. Each pixel value is originally stored as a digital number (DN). These values are measured by the sensor and are the integral result of energy transmission along the path from the observed objects to the satellite sensor. Digital numbers are essentially related to the surface reflectance values. Landsat data were converted into Top of Atmosphere (TOA) radiance by rescaling the raster image. Conversion to TOA radiance was performed as follows:

$$L_\lambda = M_L Q_{cal} + A_L$$

where $L_\lambda$ is the TOA radiance, $M_L$ is the multiplicative rescaling factor, $A_L$ is the additive rescaling factor and $Q_{cal}$ is the pixel value or digital number (DN).
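As a minimal sketch of this rescaling, the Python function below applies the linear relation "TOA radiance = multiplicative factor × DN + additive factor" to a list of pixel values. The function name and the numeric factors are illustrative assumptions and are not taken from any scene used in the study; in practice the two factors are read per band from the scene metadata (for Landsat 8 these are the RADIANCE_MULT_BAND_x and RADIANCE_ADD_BAND_x fields).

```python
def dn_to_toa_radiance(dn_values, mult_factor, add_factor):
    """Rescale digital numbers (Qcal) to TOA radiance: L = ML * Qcal + AL."""
    return [mult_factor * dn + add_factor for dn in dn_values]


# Placeholder rescaling factors (assumed for illustration, not real metadata).
ML = 0.012   # multiplicative rescaling factor
AL = -60.0   # additive rescaling factor

# Three hypothetical pixel DNs from one band.
radiance = dn_to_toa_radiance([5000, 10000, 20000], ML, AL)
print(radiance)
```

Because the conversion is linear and applied per band, the same function can be reused for every band once the matching pair of factors has been looked up.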

Conversion to land surface reflectance
With the support of numerical weather forecast data, such as NCEP grid data, atmospheric correction is performed. As a result, land surface radiance and, from it, land surface reflectance are obtained. Land surface reflectance is regarded as a spectral property of the land surface, which is used together with other properties for LULC classification.

Extraction of urban extent and accuracy assessment
Classification of temporal Landsat imagery was performed on the set of pixels. GEE offers several classification algorithms; here the Smile CART algorithm was used for supervised classification and for subsequently calculating the urban extent of the four cities. The temporal images of Landsat 5 TM and Landsat 8 OLI were acquired to compute the urban extent after classification on the Google Earth Engine platform. Training samples, test samples and validation samples were collected independently. For the Smile CART classifier, we collected distinct training samples for each class in each year. For each year analysed from 1990 to 2020, 30 sample points were collected for each class in order to determine the change in LULC class accurately (Tariq et al. 2023). Comparing classification results to another reference dataset (or higher-resolution imagery) is an essential component of accuracy assessment. In this study, the accuracy of the classification results was evaluated against ground truth sample points. All points were selected at random from the classified imagery and validated using GEE in 1990, 2000, 2010 and 2020. In particular, we employed 70% of sample points for classification testing and the remaining 30% for classification validation. We specify four categories: 1) Built up, 2) Water body, 3) Barren land and 4) Vegetation (refer Figure 2) (Pijanowski et al. 2005). Through image classification, classification results for the four different years are obtained. The results of classified LULC and calculated spatial extents are summarized in Figure 3 and Table 3 (Inostroza et al. 2019). In addition to identifying the changes that have taken place, monitoring analysis of LULC is carried out to discover spatial pattern, areal extent and characteristics. Table 2 displays the results of the four cities' 1990-2020 LULC classifications.
The accuracy evaluation gives a Kappa coefficient of 0.91 and an overall accuracy of 0.93 (Tariq, Yan, and Mumtaz 2022; Kafi, Shafri, and Shariff 2014).
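To make the two accuracy figures concrete, the following sketch computes overall accuracy and Cohen's Kappa from a confusion matrix. The matrix is hypothetical (four classes with nine validation points each) and is not the study's data; the function name is likewise an assumption. Overall accuracy is the share of points on the diagonal, and Kappa discounts the agreement that would be expected by chance from the row and column totals.

```python
def accuracy_metrics(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are reference (ground truth) classes, columns are classified classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement from the marginal totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical validation matrix; class order: built-up, water body,
# barren land, vegetation.
matrix = [
    [8, 1, 0, 0],
    [0, 9, 0, 0],
    [1, 0, 8, 0],
    [0, 0, 1, 8],
]
overall, kappa = accuracy_metrics(matrix)
print(f"overall accuracy = {overall:.3f}, kappa = {kappa:.3f}")
```

Kappa is always at most the overall accuracy when chance agreement is positive, which is consistent with the ordering of the two values reported above.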
Supervised classification with the maximum likelihood algorithm was performed for further LULC mapping. Landsat images and low-level data products, a subset of the area of interest, and samples were prepared. Then, classification with the nearest neighbour algorithm produced a number of land cover classes, such as built-up, vegetation, barren land and water body. In this way, the four-phase images were classified with reference to the same set of categories for urban sprawl estimation (Firozjaei et al. 2018). The classification results showed an immense increase in the built-up area of these major cities (Guan et al. 2011). The spatial distribution of LULC classes is shown in Table 3 and the growth change calculation is shown in Table 4. The results in Table 4 show the highest increase in area and change rate percentage during 2000-2019, whereas some changes were seen during 2000-2010.
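A growth change calculation of this kind can be sketched as an absolute difference plus a percentage change relative to the starting area. The example below uses the 1990 and 2020 built-up areas reported in the concluding remarks; whether Table 4 was derived with exactly this formula is an assumption, and the function name is hypothetical.

```python
def growth_change(area_start, area_end):
    """Return (absolute change, percentage change relative to the start area)."""
    change = area_end - area_start
    return change, 100.0 * change / area_start


# Built-up area in sq. km for (1990, 2020), as reported in the concluding remarks.
built_up_sq_km = {
    "Karachi": (241.5, 733.4),
    "Lahore": (284.5, 440.6),
    "Peshawar": (153.3, 313.5),
    "Quetta": (137.2, 494.5),
}

for city, (start, end) in built_up_sq_km.items():
    change, pct = growth_change(start, end)
    print(f"{city}: +{change:.1f} sq. km ({pct:.1f}% of the 1990 area)")
```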
The quality of the GEE analysis results has been evaluated in terms of classification accuracy. Accuracy assessment is an important part of any data classification, in which the result is compared with another reference dataset or higher-resolution imagery (Mondal, Das, and Dolui 2015;Maithani

Change detection
Change detection can be determined on the basis of: (i) Spectral classification of satellite images, (ii) Vectors of radiometric changes and (iii) The band-specific spectral differences with algebraic calculations.
Among the widely used techniques, the hybrid approach was used for the evaluation of change detection (Shawul and Chakma 2019). However, the choice of technique depends on many factors, such as data availability, data quality and constraints of time and cost. After investigation, we used the radiometric change vector-conversion approach to map the built-up extent over four decades (refer Figure 3) (Mahar et al. 2020).
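The text does not spell out the change vector computation, so the following is only a sketch of one common form of change vector analysis, not necessarily the procedure used here. For each pixel, the band values at two dates are treated as vectors, and the Euclidean length of their difference is compared with a threshold. The reflectance values, the two-band layout and the threshold of 0.2 are all hypothetical.

```python
import math


def change_magnitude(pixel_t1, pixel_t2):
    """Euclidean length of the spectral change vector between two dates."""
    return math.sqrt(sum((b2 - b1) ** 2 for b1, b2 in zip(pixel_t1, pixel_t2)))


def flag_changed(pixels_t1, pixels_t2, threshold):
    """Mark each pixel as changed when its change magnitude exceeds the threshold."""
    return [
        change_magnitude(p1, p2) > threshold
        for p1, p2 in zip(pixels_t1, pixels_t2)
    ]


# Hypothetical two-band reflectance values for two pixels at two dates.
earlier = [(0.10, 0.20), (0.30, 0.30)]
later = [(0.40, 0.60), (0.30, 0.31)]
print(flag_changed(earlier, later, threshold=0.2))
```

In this toy input the first pixel moves far in spectral space and is flagged, while the second barely moves and is not; choosing the threshold is the main tuning decision in this kind of approach.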

Results and discussions
Abnormal growth has been identified, driven by rapid population growth. Population growth acts as a catalyst in escalating urban sprawl. Urban sprawl is difficult to define and calculate precisely (Bihamta et al. 2015). Both the urban areas and their fringes are affected by the sprawl. Many quantitative studies have been carried out to estimate urban sprawl. Using GIS and RS techniques together, temporal analysis and change detection in the data domain, along with sprawl analysis and growth analysis in the application domain, were carried out for high-precision results (Dadashpoor and Nateghi 2017). The LULC results for 1990, 2000, 2010 and 2020 show that the study area has undergone various changes (Tariq and Shu 2020; Kumar, Radhakrishnan, and Mathew 2014).
Moreover, the built-up area has increased over the 30-year period. During 1990-2020, the built-up area of Karachi increased by 491 sq. km, Lahore by 156 sq. km, Peshawar by 160 sq. km and Quetta

Sprawl analysis
Timely urban planning and decision making are essential for quality of life, infrastructure and better services. A hasty increase in urban sprawl affects agricultural land and living standards, and leads to a lack of services in the area (Ahmed, Lu, and Ye 2008; SHAIKH and GOTOH 2006).
The sprawl analysis of Karachi shows that the built-up area has been spreading rapidly to the north and east over agricultural land for the past few decades. In Lahore, the built-up area is increasing to the south and south-east. In Peshawar, it is increasing to the east and west, and in Quetta, to the north and north-west. Over time, the sprawl has developed linearly along roadsides (D'Acci 2019). Most of the linear development took place during 2000-2020 (refer Figure 4). The sprawl in Lahore is towards the south, where limited main roads and highways were observed (Schetke et al. 2016). Spatial planning and timely decision making are needed to control this haphazard growth. If the same tendency in urban sprawl continues, these cities can be expected to expand abnormally and more rapidly than before (Rimal et al. 2018).

Urban sprawl vs population growth
For population density, the satellite image-generated LandScan population dataset has been used. The population density over the years 2000 to 2019 has been calculated from LandScan data (Xian, Crane, and McMahon 2008; Rose et al. 2020). Census details of 1990 and 2017 have also been integrated with the LandScan dataset for precise calculation and verification of population density. The LandScan population estimates for these four cities closely match the Population Census of Pakistan (96.4% accuracy) (Jokar Arsanjani et al. 2013; Batty 2005). According to the LandScan data, during 2000-2019 the population grew at a rate of 4.8% per annum in Karachi, 3.3% per annum in Lahore, 3.5% per annum in Peshawar and 5.6% per annum in Quetta. The population density mapping for 2000 and 2019 is shown in Figure 5 and also represented in Table 5. The census record is helpful for calculating the 1990 growth rate and population density (Sharma and Joshi 2013; Bununu 2017). Our working methodology was based on change detection, spatial pattern and continuity analysis, and statistical analysis at the grid/cell level. Our findings show that the LandScan dataset can be used to accurately estimate the population of Pakistan (Calka and Bielecka 2019). According to recent studies, LandScan performs best in terms of spatial accuracy and estimated errors (Yin et al. 2021). These results show that the population increased substantially during 2000-2019 in all four cities (refer Table 6). Population is concentrated in those areas where built-up values are higher and have changed. From the above analysis, the existing growth pattern is recognized as linear, owing to the large changes in population density and the increase in built-up area.
However, the future pattern of growth is not depicted here, because it depends largely on population density, which in turn depends directly or indirectly on infrastructure development and the creation of services (Bright and Coleman 2001; Rose et al. 2020).
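The link between a total growth percentage and a per-annum rate can be illustrated with the compound annual growth rate. The sketch below takes the total 2000-2019 population growth percentages reported in the concluding remarks and converts them to annual rates. The compounding formula and the choice of 20 annual periods are assumptions, since the text does not state how the per-annum figures were derived; under those assumptions the output falls within about 0.1 percentage point of the per-annum rates quoted in this section.

```python
def annual_growth_rate(total_growth_pct, periods):
    """Compound annual growth rate (as a fraction) from total percentage growth."""
    return (1.0 + total_growth_pct / 100.0) ** (1.0 / periods) - 1.0


# Total 2000-2019 population growth (%), as reported in the concluding remarks.
total_growth = {
    "Karachi": 158.2,
    "Lahore": 94.0,
    "Peshawar": 99.4,
    "Quetta": 201.0,
}

PERIODS = 20  # assumed number of annual compounding periods

for city, pct in total_growth.items():
    rate = annual_growth_rate(pct, PERIODS)
    print(f"{city}: {100.0 * rate:.2f}% per annum")
```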

Correlation of urban sprawl and population growth
Pakistan has witnessed a massive rural-urban migration trend over the last three decades. We compiled population statistics from LandScan/Census data and estimated urban growth from satellite image classification of LULC. We then carried out a correlation analysis of population increase and urban growth (refer Figure 6). Over the last three decades of rapid urbanization, Pakistan's cities have lost much agricultural land (farmland) as well as vegetation.
Studies of the relationship between urban sprawl and population change have also addressed the issue of energy resources meeting social needs (Mahtta et al. 2022). Moreover, the complex relationship between urban sprawl and population change is central to research on sustainable urban development (Wang 2021). The spatial growth rate and urban compactness index of the four cities were calculated from land use and population data for 1990, 2000, 2010 and 2020. The urban growth characteristics, changing population density and their relationship in the four cities were investigated over these four periods (Guo and Wang 2022). In general, there is a strong positive correlation between population growth and urban expansion. In particular, Pearson's r is 0.88 for Karachi, 0.98 for Lahore, 0.98 for Peshawar and 0.95 for Quetta over the period 1990 to 2020 (Shang et al. 2018).
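For readers unfamiliar with the statistic, the sketch below computes Pearson's r between a population series and a built-up area series observed at the same epochs. The two series are hypothetical placeholders for a single city at four dates, not values from this study, and the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical series for one city at four epochs (illustrative only).
population_millions = [5.0, 7.5, 10.0, 14.0]
built_up_sq_km = [200.0, 310.0, 380.0, 560.0]

print(f"r = {pearson_r(population_millions, built_up_sq_km):.2f}")
```

With only four observations per city, r summarizes how closely the two series move together but carries wide uncertainty, so it is best read alongside the mapped changes rather than on its own.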

Concluding remarks
Using open-source data (Landsat images and LandScan data), this study has offered a technical solution, based on Google Earth Engine-supported statistical analysis and machine learning, for spatially monitoring the population change and urban growth of four major cities of Pakistan.
Some substantial findings are obtained. Over the last four decades, from 1990 to 2020, the built-up area of Karachi increased from 241.5 sq. km to 733.4 sq. km, while its population grew by a total of 158.2% from 2000 to 2019. For Lahore, the built-up area increased from 284.5 sq. km to 440.6 sq. km, with total population growth of 94.0%. For Peshawar, it increased from 153.3 sq. km to 313.5 sq. km, with total population growth of 99.4%, and for Quetta from 137.2 sq. km to 494.5 sq. km, with total population growth of 201.0%. Between 1990 and 2020, there is a positive correlation between population growth and urban expansion, with Pearson's r of 0.88 for Karachi, 0.98 for Lahore, 0.98 for Peshawar and 0.95 for Quetta.
The increase in population and built-up area leads to a decrease in agricultural land and a future shortage of basic services. There is a dire need for spatial planning to secure a high quality of life for citizens and the sustainability of urban growth. If timely action is not taken while urban sprawl continues abnormally, it will have adverse effects on agricultural land use and regional economic development.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work is supported jointly by the Open Fund of Hubei Luojia Laboratory, and the Special Research Funding of LIESMARS at Wuhan University, China.

Data availability statement
Data will be available upon request to the corresponding author.