Introduction

The economic growth of any country or region is heavily influenced by its income and spending habits as well as its savings, loans, and investment patterns. A country with a developed economy is characterized by high levels of income and consumption (Headey, 2008). In contrast, a country's poor economic position is characterized by low income and expenditure levels. People living in underdeveloped or developing countries want to consume goods and services, but their demands go unmet due to a lack of buying power, which may result from insufficient income or poverty (Giri, 2015). Poverty incidence by geographic orientation shows an alarming rise in global poverty linked to poor economic development and significant income disparity. This geographical pattern necessitates a thorough study of poverty traps that affect poverty through various causal processes, the most prominent of which is inequality, which undermines the global pro-poor development process (Khan et al., 2019; Martinez et al., 2015). The main issue confronting the globe is the massive rise in poverty and hunger, as represented in the United Nations Sustainable Development Goals (SDGs), namely SDG-1 and SDG-2, which call on global economies to cut poverty and hunger by half by 2030 on a country-by-country basis (Mestrum, 2003).

Since ancient times, poverty has been a contentious issue in global politics. In 1981, over 1.9 billion people in the developing world lived on less than US$ 1.25 per day. By 2005, the number of people in developing nations living on less than US$ 1.25 per day had decreased to 1.4 billion (S. Chen & Ravallion, 2009). According to the most recent estimates from 2015, 736 million people lived below the US$ 1.90 purchasing power parity poverty level, accounting for 10% of the world's population (The World Bank, 2019). Most were from India, Nigeria, the Democratic Republic of Congo, Ethiopia, and Bangladesh (World Bank, 2020). In India, several committees have estimated poverty, notably the Lakdawala Committee (1993), the Tendulkar Committee (2009), and the Rangarajan Committee (2012). According to the Suresh Tendulkar Committee's methodology, India's poor population was 35.4 million (29.6%) in 2009–2010 and 69 million (21.9%) in 2011–2012 (Panagariya & Mukim, 2014). According to the Rangarajan Committee, the number of poor in India decreased from 454 million (38.2%) in 2009–2010 to 363 million (29.5%) in 2011–2012 (De et al., 2017). Although poverty decreases as India's economy grows, it remains a significant problem. As a result, the most crucial task for the researcher is to discover the most frequent and most intense causes of poverty in order to reduce poverty incidence.

Poverty has a geographical component. In every local community, especially in developing countries, geography, particularly the physical environment, significantly influences the prevalence of poverty (Bigman & Fofack, 2000). According to empirical data, geography and poverty are significantly linked, as the geographical environment influences the spatial pattern of poverty to some extent (Das, 2012; Gallup et al., 1999; ODI, 2013; Sachs et al., 2001; Zhou & Liu, 2019). Several geographic variables may contribute to substantial regional variation in the prevalence of poverty. These factors include natural resource endowments and agro-climatic conditions, market accessibility and proximity, access to transportation infrastructure, the influence of demographic variables, etc. (Vista & Murayama, 2011a). So, poverty is not just a socio-economic issue but also an essentially spatial one (Glasmeier, 2002). Discussions about poverty and its causes have traditionally focused on the socio-economic sphere, but academics and development practitioners have recently begun to realize the importance of geographical influence in understanding and evaluating poverty. The adoption of the Millennium Development Goals (MDGs) increased attention to the geographical aspects of poverty, which has resulted in efforts in the international arena (Hyman et al., 2005). As a result of advances in geographic information systems (GIS), remote sensing (RS), and statistical approaches, spatial data and satellite images are now being used as tools for poverty mapping and analysis (Deichmann, 1999; Hyman et al., 2005). Spatial analysis allows biophysical data and socioeconomic indicators to be combined to provide a more systematic and analytical picture of human well-being and equality (Henninger & Snel, 2002).

Statistical analysis, in turn, can help us comprehend the impact and severity of poverty-related variables (S. G. Kim, 2015). Several studies have used various statistical methodologies to determine the occurrence and degree of poverty (Dagum & Costa, 2004; Gräb, 2015; Leonard, 2014; Sugiyarto, 2007; U N STATISTICS, 2005; Ubale, 2017; Wang & Wang, 2016). Among these, one major work is that of Roy et al. (2019), who employed a multi-stage random sampling approach in West Bengal to quantify the incidence, depth, and severity of multidimensional poverty (MDP) among rural families and the contributions of dimensions to MDP using logistic regression analysis. Another study, by Guchhait and Sengupta (2021), investigated poverty in rural households in India, using the Purulia District of West Bengal as a case study. They employed quantitative approaches such as stepwise multiple regression and decomposition analysis using FGT indices to estimate the relative strengths of the variables influencing poverty.

The present poverty research was conducted using remote sensing (RS) and geographic information systems (GIS) to analyze the spatiality of determinants, as well as multivariate statistical techniques to examine the relationships among the critical determining variables. Many earlier studies have used multivariate statistical approaches (Coppi & Statistica, 1998; Dagum & Costa, 2004; Gräb, 2015; Hanley, 1983; Ipsum et al., 2015; Mondala et al., 2018; Mukhopadhyay, 2008) in conjunction with remote sensing (RS) and geographic information systems (GIS) (ADB, 2020; Akinyemi, 2008; Hersh et al., 2021; Netzband, 2010; Sangli, 1999; Shah, 2011; Thongdara et al., 2012) to analyze the geographical connection as well as the relationship between various poverty factors and their measurement and mitigation strategies. Such studies have integrated different socioeconomic, agro-climatic, and physical variables for analytical purposes. Based on previous research, this study examined the following variables associated with poverty: mean elevation, percent of the slope with and without agricultural constraints, percent of soil with and without agricultural constraints, mean drainage density, mean pond density, and mean annual rainfall as agro-climatic factors; mean road density as the access to transport and communication factor; the nearest estimated distance to town centers or major cities as the proximity to major markets factor; and population growth rate and literacy rate as demographic factors.

The primary objectives of this research are: (i) to extract the significant determinants of poverty, and their interrelationship with poverty, using geospatial and geostatistical tools and techniques; (ii) to identify the spatiality of the variables over the specific regions; and (iii) to determine the spatial distribution of the significant factors throughout the regions and the causes behind it.

No previous study of this kind has been carried out in this study area, which is what makes this research distinctive, and the authors have tried to fill this gap as far as possible. The second research gap is that many research projects deal with socio-economic factors to control poverty, but very few include physical factors (Minot et al., 2006; Vista & Murayama, 2011b). In this respect, this research focuses simultaneously on socio-economic and physical parameters. According to the philosophy of ‘economic determinism’, the human economy is largely determined by nature, so the authors have also tried to fill this research gap.

This is a quantitative study investigating the poverty factors of the test area. Because qualitative characteristics are not considered in a quantitative investigation, some deficiencies may arise from excluding qualitative parameters from the analysis. This is one of the limitations of this research. Another difficulty was that, owing to the spread of COVID-19, we were unable to carry out an extensive field survey or collect primary ground-truth data, so the entire work is based on secondary data. Nevertheless, the outcome of the work will be helpful to planners and social workers in properly implementing policies to eradicate or reduce poverty.

Materials and methods

Study area

To achieve these goals, the Birbhum District of West Bengal, an impoverished area in eastern India, has been chosen as a test location, where, as per the 2011 census, about 27.7 percent of the total population is below the poverty line (Mondala et al., 2018). The study area is located between 23° 32′ 30" and 24° 35′ 0" N latitude and 87° 5′ 25" and 88° 1′ 40" E longitude and covers an area of 4,545 sq km. The Ajay River runs between Birbhum and Bardhaman along the southern boundary. The state of Jharkhand lies on the northern and western border, while Murshidabad is on the east (Figure 1). According to the 2011 census of India, Birbhum has a population of 3,502,404. Males outnumber females, 1,790,920 to 1,711,484. The numbers of literate males and literate females are 2,158,447 and 956,966, respectively (Census, 2011).

Fig. 1 Location map of the study area: (a) India, (b) West Bengal, (c) Birbhum district

Figure 2 (a & b) shows the study area's physiographic divisions and geological condition. The landscape of this region is very similar to the Rarh area of Murshidabad, Burdwan, Bankura, and Midnapore. The dissected plateau of Jharkhand, stretching south-southeast, includes the district's western section. The landscape grows more undulating as you go east. The western highlands are made up of Gondwana sediments, laterite, and alluvium.

Fig. 2 (a) Physiographic division and (b) geological map of the study area

In contrast, the remainder of the region is made up of Archean crystalline rocks that are hard and impervious to water. There are several rock formations in this region. In the western half of the area, for example, a dissected plateau made of Archean granitic gneiss may be found. The Gondwana era sediments, as well as laterite and alluvium deposits, have produced the eastern section of the area.

Data source

The dependent variable in this study is poverty, expressed as poverty incidence, which is defined as the percentage of individuals living below the poverty line in a particular area (Coudouel et al., 2002). Census data and the Birbhum District Statistical Abstract, 2011 were used for this study. The block-wise poverty percentage (poverty incidence) was estimated from the block-wise number of total households and the number of below-poverty-line (BPL) families. Among the independent variables, mean elevation, percent of the slope with and without agricultural constraints, percent of soil with and without agricultural constraints, mean drainage density, mean pond density, and mean annual rainfall are used as agro-climatic variables; mean road density is used as the access to transport and communication variable; the nearest estimated distance to town centers or major cities is used as the proximity to major markets variable; and population growth rate and literacy rate are used as demographic variables (Table 1). The spatial distribution of rainfall in the test region was derived from the district planning map of NATMO at a 1:1,000,000 scale. Mean elevation data for the study region were obtained from the Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM V3, 1 arc-second). The slope gradient (percent) was derived from the same SRTM DEM using the Spatial Analyst tool in ArcGIS 10.5 (evaluation copy). The drainage density and pond frequency maps of the study area were produced in ArcGIS (evaluation copy) using the grid index method from the drainage map and pond map, respectively, which were digitized from Survey of India (SOI) topographical maps (sheet nos. 72P/12, 72P/14, 72P/15, 72P/16, 73M/1, 73M/5, 73M/6, 73M/9, 73M/10, 73M/13, 73M/14, 73D/3, 73D/4) at a 1:50,000 scale. The soil texture map was obtained from the Birbhum District Portal.
The road density map of the study area was prepared in ArcGIS (evaluation copy) using the line density method from the road map obtained from the Survey of India (SOI) website. Proximity to major markets was represented by the distance between each town center and the nearest large town, estimated with the ArcGIS Network Analyst feature based on the existing road network. Block-wise population growth was calculated using the 2001 and 2011 census data of Birbhum District. Lastly, block-wise literacy was calculated from the 2011 census of Birbhum. These variables directly or indirectly affect poverty in developing nations, particularly in agriculture-based societies (see Table 1).
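The two block-level ratios described above can be illustrated with a minimal Python sketch. The function names and the input figures are hypothetical, and the percent-change formula for population growth is an assumption, since the text only states that growth was calculated from the 2001 and 2011 censuses.

```python
def poverty_incidence(bpl_households: int, total_households: int) -> float:
    """Percent of households in a block that are below the poverty line (BPL)."""
    if total_households <= 0:
        raise ValueError("total_households must be positive")
    return 100.0 * bpl_households / total_households


def decadal_growth_rate(pop_2001: int, pop_2011: int) -> float:
    """Percent change in population between the two censuses (assumed formula)."""
    if pop_2001 <= 0:
        raise ValueError("pop_2001 must be positive")
    return 100.0 * (pop_2011 - pop_2001) / pop_2001


# Hypothetical block; these figures are invented for illustration only.
example_incidence = poverty_incidence(bpl_households=12_500, total_households=40_000)
example_growth = decadal_growth_rate(pop_2001=180_000, pop_2011=207_000)
print(f"poverty incidence: {example_incidence:.2f}%")
print(f"population growth: {example_growth:.2f}%")
```

With these invented inputs the block would have a poverty incidence of 31.25% and a decadal growth rate of 15%.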

Methods

Various geospatial and geostatistical methods were applied after the dataset had been collected from different sources. The methodology is divided into three main steps.

  • (a) Methodology regarding geospatial database generation.

  • (b) Methodology regarding the quantitative analysis.

  • (c) Methodology regarding the spatial mapping of controlling parameters.

The geographical data were created using ArcGIS 10.5 (evaluation copy), and the statistical analyses were carried out with IBM SPSS Statistics 25 (evaluation copy). Figure 3 illustrates the whole process in a methodological flow chart.

Fig. 3 Methodological flow chart

Methodology regarding geospatial database generation

To produce thematic layers, digital image processing was used together with the digitization of existing maps (Samson et al., 2017). Ten variables were examined in this study to determine block-level poverty incidence: mean annual rainfall; mean elevation; percent of slope with and without agricultural restrictions; mean drainage density; pond frequency; percent of soil with and without agricultural restrictions; mean road density; nearest estimated distance to town centers or major cities; population growth rate; and literacy rate. These variables are regarded as the most important in determining poverty levels. Table 2 gives an overview of the literature supporting their selection. The spatial database of this study was developed using ArcGIS 10.5 (evaluation copy). To extract data from each thematic layer in each block, 252 random points were generated. The Spatial Analyst tool in ArcGIS 10.5 was used to retrieve numerical data from all thematic layers at these 252 random sites.
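The point-sampling step can be sketched outside ArcGIS as follows. This is only an illustration under stated assumptions: the thematic layers are represented as hypothetical, co-registered NumPy grids filled with synthetic values, points are drawn uniformly over grid cells (so a cell may be drawn more than once), and the function name is invented here. It is not the Spatial Analyst workflow used in the study.

```python
import numpy as np


def sample_layers_at_random_points(layers, n_points, seed=0):
    """Read every layer at the same randomly chosen grid cells.

    layers: dict mapping a layer name to a 2-D array; all arrays must share one shape.
    Returns a dict mapping each layer name to a 1-D array of n_points sampled values.
    """
    shapes = {grid.shape for grid in layers.values()}
    if len(shapes) != 1:
        raise ValueError("all layers must share the same grid shape")
    n_rows, n_cols = shapes.pop()
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n_rows, size=n_points)  # random row index per point
    cols = rng.integers(0, n_cols, size=n_points)  # random column index per point
    return {name: grid[rows, cols] for name, grid in layers.items()}


# Synthetic layers for illustration only; the values are not from the study.
rng_demo = np.random.default_rng(1)
demo_layers = {
    "elevation_m": rng_demo.uniform(20, 150, size=(40, 60)),
    "rainfall_mm": rng_demo.uniform(1000, 1500, size=(40, 60)),
}
samples = sample_layers_at_random_points(demo_layers, n_points=252)
print({name: values.shape for name, values in samples.items()})
```

Sampling all layers at the same cells keeps the rows of the resulting table aligned, which is what the later regression and PCA steps require.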

Table 1 Data source of different variables and hypotheses
Table 2 Literatures on selection of conditioning variables for Poverty incidence

Methodology regarding quantitative analysis

Multiple linear regression and principal component analysis (PCA) have been performed using IBM SPSS Statistics 25 software to determine the linear relationship between explanatory variables and response variables and extract the significant factors influencing poverty.

Multiple linear regression (MLR)

We must understand the influence of the independent variables (mean annual rainfall, mean elevation, percent of slope with and without agricultural restrictions, mean drainage density, pond frequency, percent of soil with and without agricultural restrictions, mean road density, nearest estimated distance to town centers or major cities, population growth rate, and literacy rate) on the dependent variable (poverty incidence). Linear regression is a widely used statistical approach for identifying linear relationships. MLR is suitable when there are two or more independent variables and just one dependent variable. So, in this study, MLR was used to determine the direct effect of several independent factors on a single dependent variable (Jobson, 1991; Plotts, 2011; Turóczy & Marian, 2012; Uyanık & Güler, 2013). This statistical technique establishes a linear relationship between several explanatory (independent) variables and one response (dependent) variable (Jobson, 1991; Uyanık & Güler, 2013).

Multiple linear regression (MLR) can be expressed mathematically as:

$$y_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3} + \cdots + \beta_{p} x_{ip} + \varepsilon$$
(1)

where yi = response or dependent variable (poverty incidence); xi1, xi2, xi3, …, xip = explanatory or independent variables (mean annual rainfall, mean elevation, percent of the slope with and without agricultural restrictions, mean drainage density, pond frequency, percent of soil with and without agricultural constraints, mean road density, nearest estimated distance to town centers or major cities, population growth rate, and literacy rate); β0 = intercept or constant; β1, β2, β3, …, βp = slope coefficients for each explanatory variable; ε = error term (residual) of the model.
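As an illustration of Eq. (1), the sketch below fits the coefficients by ordinary least squares with NumPy. It is not the SPSS procedure used in the study; the function name and the synthetic data are assumptions made purely for demonstration.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bp*xp.

    Returns (coefficients with the intercept first, R squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones for b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic data: true intercept 2.0 and slopes 1.5, -0.7, 0.0, plus small noise.
rng_mlr = np.random.default_rng(42)
X_demo = rng_mlr.normal(size=(50, 3))
y_demo = 2.0 + X_demo @ np.array([1.5, -0.7, 0.0]) + rng_mlr.normal(scale=0.01, size=50)
coef_demo, r2_demo = fit_mlr(X_demo, y_demo)
print(np.round(coef_demo, 2), round(r2_demo, 4))
```

The sign of each fitted slope indicates whether the corresponding variable is positively or negatively related to the response, which is how the regression results are read later in the paper.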

Before running an MLR, we must first determine whether the independent variables are linearly related to one another. Multicollinearity tests may be used to assess the linear relationships among independent variables. Multicollinearity occurs when the regression model includes independent variables that are closely correlated with each other (J. H. Kim, 2019; Young, 2017).

When an investigator tries to determine how effectively each variable can be used to predict or understand the response variable in a statistical model, multicollinearity may lead to skewed or misleading findings (Frank E. Harrell, 2001). Several research studies have addressed multicollinearity in regression models, stressing that the main problems are unstable and biased standard errors and impractical interpretations of findings (Jobson, 1991; Turóczy & Marian, 2012; Uyanık & Güler, 2013). The Variance Inflation Factor (VIF) is one of the most widely used methods for identifying multicollinearity (Shrestha, 2020). The VIF measures how much the variance of an estimated regression coefficient is inflated by correlation among the independent variables. The VIF is computed as follows:

$$\mathrm{VIF} = \frac{1}{1 - R^{2}} = \frac{1}{\mathrm{Tolerance}}$$
(2)

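Here R² is the coefficient of determination obtained when one independent variable is regressed on all the other independent variables. The tolerance is simply the reciprocal of the VIF. The lower the tolerance, the more likely it is that the variables are multicollinear.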
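A minimal NumPy sketch of Eq. (2) follows: each independent variable is regressed on the remaining ones, and the resulting R² gives its VIF and tolerance. The function name and the synthetic data are hypothetical; SPSS reports the same quantities as part of its collinearity diagnostics.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (VIF, tolerance) arrays with one entry per column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vif[j] = 1.0 / (1.0 - r2)  # undefined if a column is an exact linear combination
    return vif, 1.0 / vif


# Synthetic example: x3 is almost a copy of x1, x2 is independent of both.
rng_vif = np.random.default_rng(0)
x1 = rng_vif.normal(size=200)
x2 = rng_vif.normal(size=200)
x3 = x1 + 0.1 * rng_vif.normal(size=200)
vif_demo, tol_demo = vif_and_tolerance(np.column_stack([x1, x2, x3]))
print(np.round(vif_demo, 1), np.round(tol_demo, 3))
```

In this synthetic case the two nearly collinear columns receive large VIF values, while the independent column stays close to 1.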

Principal component analysis (PCA)

Principal component analysis (PCA) is one of the most effective methods for extracting the major determining factors of poverty (Ipsum et al., 2015; Suryahadi et al., 2014). PCA was performed using IBM SPSS Statistics 25. A Karl Pearson correlation matrix was created to identify associations among the individual factors of poverty incidence (Suryahadi et al., 2014). A correlation value > 0.5 indicates a strong correlation between variables, a value < 0.5 a weak correlation, and a value of 0.5 a moderate correlation (Libório et al., 2020). PCA summarizes the components of a large data set while minimizing information loss. In this research, PCA with varimax rotation was interpreted in accordance with the Guttman–Kaiser rule: only components with an eigenvalue greater than one (> 1) were considered significant factors (Cangelosi & Goriely, 2007). Therefore, the eigenvalue threshold in IBM SPSS 25 was set to 1 for extracting the poverty incidence components. Cattell's (1966) scree test is a graphical representation of the eigenvalues against the factors (Cattell, 1966): a simple line plot shows the share of the total variation explained by each factor. When substantial factors are present, the slope of the line is steep, but when the factors correspond to error, the slope is flat. The following steps were used to compute the PCA.

Step-1: The data were standardized using the following formula:

$$Y_{ij} = \frac{X_{ij} - \overline{X}_{j}}{S_{j}}$$
(3)

where i = 1, 2, 3, …, n indexes the sample points; j = 1, 2, 3, …, p indexes the original variables; and the mean and standard deviation of variable j are given by

$$\overline{X}_{j} = \frac{1}{n}\sum_{i=1}^{n} X_{ij} \;\&\; S_{j} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_{ij} - \overline{X}_{j}\right)^{2}}$$
(4)

After standardization, each variable has a mean of 0 and a variance of 1.

Step-2: Calculate the correlation matrix R of the variables.

Let the sample correlation matrix be \(R = \left( r_{jk} \right) = \frac{1}{n-1} Y^{\prime} Y\), where Y is the matrix of standardized data from Step-1, with correlation coefficients

\(r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} Y_{ij} Y_{ik}\), and \(r_{jk} = r_{kj}\), \(r_{jj} = r_{kk} = 1\). So R is a symmetric matrix whose main diagonal elements are 1.

Step-3: Calculate the eigenvalues \(\lambda_{1} \ge \lambda_{2} \ge \lambda_{3} \ge \cdots \ge \lambda_{p} > 0\) of the correlation matrix of variables, and the corresponding eigenvectors:

$$\alpha_{1} = \begin{bmatrix} \alpha_{11} \\ \alpha_{21} \\ \vdots \\ \alpha_{p1} \end{bmatrix},\; \alpha_{2} = \begin{bmatrix} \alpha_{12} \\ \alpha_{22} \\ \vdots \\ \alpha_{p2} \end{bmatrix},\; \ldots,\; \alpha_{p} = \begin{bmatrix} \alpha_{1p} \\ \alpha_{2p} \\ \vdots \\ \alpha_{pp} \end{bmatrix}$$
(5)

Step-4: The final PCA can be expressed as:

$$F_{1} = \alpha_{11} X_{1} + \alpha_{21} X_{2} + \alpha_{31} X_{3} + \cdots + \alpha_{p1} X_{p}$$
(6)

where F1 is the first principal component; X1, X2, X3, …, Xp are the original variables (namely mean annual rainfall, mean elevation, percent of the slope with and without agricultural restrictions, mean drainage density, pond frequency, percent of soil with and without agricultural constraints, mean road density, nearest estimated distance to town centers or major cities, population growth rate, and literacy rate); and α11, α21, α31, …, αp1 are the coefficients or weights of each variable for the first principal component F1, i.e., the elements of the eigenvector α1 in Eq. (5).
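The four steps can be sketched in NumPy as follows. This is an unrotated PCA on the correlation matrix with the eigenvalue-greater-than-one cut-off; the varimax rotation applied in SPSS is not reproduced, component scores are computed from the standardized variables (an assumption), and the function name and synthetic data are hypothetical.

```python
import numpy as np


def pca_correlation(X):
    """Unrotated PCA on the correlation matrix, keeping eigenvalues > 1."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: standardize each variable (sample standard deviation, n - 1).
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix of the variables.
    R = (Y.T @ Y) / (n - 1)
    # Step 3: eigenvalues and eigenvectors, sorted from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: component scores for the components retained by the Kaiser rule.
    keep = eigvals > 1.0
    scores = Y @ eigvecs[:, keep]
    return R, eigvals, eigvecs, scores


# Synthetic data: two pairs of strongly correlated variables.
rng_pca = np.random.default_rng(0)
base_a = rng_pca.normal(size=100)
base_b = rng_pca.normal(size=100)
X_pca = np.column_stack([
    base_a, base_a + 0.3 * rng_pca.normal(size=100),
    base_b, base_b + 0.3 * rng_pca.normal(size=100),
])
R_demo, eigvals_demo, eigvecs_demo, scores_demo = pca_correlation(X_pca)
print(np.round(eigvals_demo, 2), scores_demo.shape)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by p gives the share of total variance explained by that component, which is what a scree plot displays.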

Methodology regarding the spatial mapping of controlling parameters

The Inverse Distance Weighting (IDW) technique was applied in ArcGIS 10.5 using the Spatial Analyst tool to represent the spatial pattern of the controlling factors derived from principal component analysis (PCA) (Ajaj et al., 2018). IDW is an interpolation technique in which values at unsampled locations are estimated as a weighted average of the sample values in the vicinity, on the assumption that nearby values are more similar than distant ones (Burrough & McDonnell, 1999). Fig. 5 shows the IDW principle, which states that an unknown value may be estimated from the closest known values. The National Weather Service of the United States established this idea in 1972 (F. W. Chen & Liu, 2012). IDW can be expressed mathematically using the following formula:

$$\widehat{R}_{p} = \sum_{i=1}^{N} w_{i} R_{i} \;\&\; w_{i} = \frac{d_{i}^{-x}}{\sum_{k=1}^{N} d_{k}^{-x}}$$
(7)

where Ȓp = estimated value at the unknown station in a thematic layer; Ri = known data value at station i of the thematic layer; N = number of known stations used for the estimate; wi = weight of station i; di = distance from the unknown station to known station i; and x = power (distance exponent) that controls how quickly the weights decrease with distance.
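Eq. (7) can be illustrated with a small NumPy sketch for a single unknown location. The function name, the sample coordinates, and the exponent of 2 are assumptions for illustration; the text does not state which power was used in ArcGIS.

```python
import numpy as np


def idw_estimate(known_xy, known_values, query_xy, power=2.0):
    """Inverse distance weighted estimate at one query location."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    d = np.linalg.norm(known_xy - query_xy, axis=1)  # distances d_i
    if np.any(d == 0):
        # The query coincides with a known station: return its value directly.
        return float(known_values[np.argmin(d)])
    w = d ** (-power)  # unnormalized weights d_i^(-x)
    w /= w.sum()       # normalize so that the weights sum to 1
    return float(w @ known_values)


# Two hypothetical stations with values 10 and 20, located 2 units apart.
stations = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]
print(idw_estimate(stations, values, (1.0, 0.0)))  # midpoint: equal weights
print(idw_estimate(stations, values, (0.5, 0.0)))  # closer to the first station
```

At the midpoint both stations receive equal weight, while moving the query towards the first station pulls the estimate towards that station's value.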

Results

Spatial variability and importance of parameters

Spatial variability means the distribution of the determinants throughout the study region. The spatial variation of the major determinants is discussed below.

Rainfall

Rainfall is the most important agro-climatic factor that influences crop production and poverty levels in every agro-based community (Kyei-Mensah et al., 2019; Ndamani & Watanabe, 2015). From June to September, the southwest monsoon winds provide 80 percent of the rainfall for the study region. In this district, the average annual rainfall is 1430.5 mm. Rainfall is greater in the western half of the district and lower in the eastern section. The whole district has been divided into five rainfall zones based on rainfall distribution, i.e., very high (> 1400 mm), high (1300–1400 mm), moderate (1200–1300 mm), low (1100–1200 mm), and very low (< 1100 mm) rainfall zones (Figure 4a).
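The five-zone classification can be sketched with `numpy.digitize`. How values that fall exactly on a class boundary are assigned is an assumption here (each boundary value goes to the higher zone), since the text does not specify it; the names below are hypothetical.

```python
import numpy as np

# Class boundaries (mm) and labels taken from the zone definitions in the text.
ZONE_EDGES = [1100, 1200, 1300, 1400]
ZONE_LABELS = ["very low", "low", "moderate", "high", "very high"]


def rainfall_zone(rain_mm):
    """Map annual rainfall values (mm) to the five rainfall zone labels."""
    idx = np.digitize(rain_mm, ZONE_EDGES)  # 0 for < 1100, ..., 4 for >= 1400
    return [ZONE_LABELS[i] for i in np.atleast_1d(idx)]


# Invented rainfall values, one per zone.
print(rainfall_zone([1050, 1150, 1250, 1350, 1450]))
```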

Fig. 4 Principle of inverse distance weighting (IDW) technique

Elevation

Elevation, also an agro-climatic factor, directly affects living conditions and the prevalence of poverty in any region (Merrey et al., 2018; Olsson et al., 2015). Because the Chhotanagpur plateau extends into it, the western portion of this district is highly elevated and has undulating topography. The central and eastern parts of the district, on the other hand, have flat terrain (Figure 4b).

Slope

Slope is one of the most significant agro-climatic factors that affect people's economic and poverty conditions (Petterson et al., 2019). Slope affects the irrigation system and crop productivity because it influences surface runoff and vertical water percolation (Fombe & Tossa, 2015; Wezel et al., 2002). Slopes with agricultural limitations, i.e., slopes greater than 8%, directly affect agricultural output and the prevalence of poverty because of their rolling to hilly and steep terrain. Lands with a slope of 0–8 percent, characterized by flatlands to rolling hills, on the other hand, have a comparatively low degree of poverty incidence. A 0–8% slope characterizes the majority of the research area. A slope higher than 8% may be seen in certain sections of the western side (Figure 4c).
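The split into slopes with and without agricultural constraints can be sketched as a simple threshold on slope values at the 8% cut-off given above. The function name and the sample values are hypothetical, and treating a slope of exactly 8% as unconstrained is an assumption.

```python
import numpy as np


def percent_slope_classes(slope_percent, threshold=8.0):
    """Return (percent without constraints, percent with constraints)."""
    slope = np.asarray(slope_percent, dtype=float)
    valid = slope[~np.isnan(slope)]  # ignore no-data cells
    constrained = 100.0 * np.mean(valid > threshold)
    return float(100.0 - constrained), float(constrained)


# Invented slope values (percent) for four cells in one block.
print(percent_slope_classes([2.0, 5.0, 8.0, 12.0]))
```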

Drainage density

The stream length per unit area in a region is referred to as drainage density (Horton, 1932; Strahler, 1952). Drainage density is also one of the most critical agro-climatic variables that influence people's economic conditions and poverty circumstances. Individuals with greater access to river resources have a geographic advantage because rivers provide water for agriculture and fishing (Kamra et al., 2019). In this research region, higher drainage density is seen over the western portion along the plateau border, whereas lower drainage density is observed over the eastern part of the district, where the terrain has a gentle slope (Figure 4d).
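For a single grid cell, the definition above reduces to total stream length divided by cell area. The sketch below uses invented segment lengths and a hypothetical 1 km² cell; it is not the grid index workflow run in ArcGIS.

```python
def drainage_density(stream_lengths_km, cell_area_km2):
    """Total stream length per unit area (km per square km) for one grid cell."""
    if cell_area_km2 <= 0:
        raise ValueError("cell_area_km2 must be positive")
    return sum(stream_lengths_km) / cell_area_km2


# Three invented stream segments crossing one cell of 1 square km.
density_demo = drainage_density([1.2, 0.8, 0.5], cell_area_km2=1.0)
print(density_demo)
```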

Pond frequency

The number of ponds per unit area is called pond frequency. Pond frequency is also one of the most important agro-climatic factors that affect people's economic situations and poverty levels, since ponds offer water for agricultural irrigation and a platform for inland fishing (Bichsel et al., 2016; Golam Rabbani et al., 2018). The pond frequency in this research area is shown in Fig. 4e. The eastern and central parts of the research region have very high to high pond frequency, while the western portion has low to moderate pond frequency.

Soil texture

Soil texture is another significant agro-climatic variable affecting crop productivity and the poverty level of an agriculture-based community. Not all soils are conducive to agricultural production. The ideal agricultural soils are balanced in mineral components, soil organic matter, air, and water inputs (Kraaijvanger et al., 2016; Liu et al., 2014). So, there is a clear link between poverty incidence and soil with and without agricultural constraints. Fig. 4f depicts a soil texture map of the study region. The research region was divided into five main soil texture groups: sandy, clay loam, sandy loam, clay, and loam.

Access to transport and communication

Transportation and communication are significant predictors of poverty incidence, since well-connected transportation is critical for regional development: agricultural and industrial output is moved to markets through transportation networks (Milewski & Zaloga, 2013; Rodrigue, 2016). This research uses mean road density as the indicator of access to transport and communication. Fig. 4g shows the average road density map of the research area. In the central part of this research area, where major cities are situated, road density is higher than in the outer regions.

Proximity to major markets

Market location is another key factor of regional development, given that the market plays an essential role in the exchange of commodities and services. Markets deal in a range of agricultural and household-industry products, and the market sector may thus offer additional employment. This is why market location is an important variable affecting the prevalence of poverty in an area (Imran et al., 2017; Kero, 2002). Fig. 4h shows the location of this research region's different markets and road networks.

Population growth

Population growth refers to the increase or decrease in the size of a population over time, which may be positive or negative depending on the balance of births, deaths, and migration. There is a direct relationship between population growth rate and poverty incidence. When the world's population outgrows its resources, the additional pressure on land makes economic growth harder and increases poverty and hunger throughout the world (Malthus, 1798). So, it is an important demographic factor that influences the incidence of poverty. The population growth rate in this research area is shown in Fig. 4i. The blocks in the western and northern portions, namely Md. Bazar, Rampurhat-II, Nalhati-II, and Murarai-II, have seen the most population growth. The blocks of Rajnagar, Suri-I, Nalhati-I, Mayureswar-I, and Murarai-I have seen the least population growth. The rest of the blocks have seen moderate population growth.

Literacy

Literacy is another significant demographic variable that has an impact on economic growth and the prevalence of poverty. Literacy improves job chances and raises one's socioeconomic standing. Increased literacy rates also result in lower population growth rates, allowing a country's resources to be better distributed among fewer people. Those with inadequate literacy skills are much more likely to be poor and suffer health issues (Dahal, 2017; Desai, 2012). Fig. 4j shows the literacy map of this research area. Two blocks in the northern portion, namely Murarai-I and Murarai-II, have the lowest literacy rate, i.e., less than 58.03 percent. Two blocks, Illambazar and Nalhati-I, have the highest literacy rate, i.e., more than 73.30 percent. The rest of the blocks have a moderate literacy rate, i.e., 58.03 to 73.30 percent.

Poverty incidence

This is the dependent variable for this research. It is the conventional metric, calculated as the proportion of people in a particular area who live below the poverty line (Coudouel et al., 2002). The greater the prevalence of poverty in a community, the poorer it is compared to another. Figure 4k shows the poverty incidence map of the study area. The blocks in the northern portion, namely Murarai-I, Murarai-II, Nalhati-II, Rampurhat-I, and Rampurhat-II, have a high to very high prevalence of poverty. The blocks of Rajnagar, Mayureswar-I, Mayureswar-II, and Nalhati-I have a moderate poverty rate. The rest of the blocks have a low incidence of poverty.
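As a hedged illustration of the headcount measure described above, the following Python sketch computes poverty incidence as the percentage of people living below the poverty line. The function name and the figures in the usage line are hypothetical and are not taken from the study data.

```python
def poverty_incidence(below_poverty_line: int, total_population: int) -> float:
    """Return the headcount ratio as a percentage of the total population."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    if not 0 <= below_poverty_line <= total_population:
        raise ValueError("below_poverty_line must lie between 0 and total_population")
    return 100.0 * below_poverty_line / total_population


# Hypothetical block: 42,000 people below the poverty line out of 120,000 residents.
example_incidence = poverty_incidence(42_000, 120_000)
```

Computing the same ratio block by block would give the kind of values that a poverty incidence map classifies into low, moderate, and high categories.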

The statistical linearity between poverty incidence and other explanatory variables

MLR is used to test the statistical linear relationship. Table 3 shows the MLR model's tolerance and VIF values for each independent variable. VIF is the reciprocal of tolerance, so a smaller tolerance indicates stronger multicollinearity. A VIF of 5–10 indicates multicollinearity (O’Brien, 2007). Table 3 shows that the tolerance values of all independent variables employed in this study are well above 0.20 and that their VIF values are much lower than 5. This result shows no problem with multicollinearity among the independent variables applied in this study. We therefore proceed with a multiple linear regression to determine the relationship between the explanatory and response variables (Fig. 5).
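The tolerance and VIF diagnostics can be sketched in Python as follows. This is a minimal illustration using NumPy, not the SPSS procedure used in the study; the function name is hypothetical. Each explanatory variable is regressed on the others, tolerance is taken as 1 − R², and VIF as its reciprocal.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (tolerance, vif) arrays with one entry per column of X.

    For column j, R_j^2 comes from an ordinary least squares regression of
    column j on all other columns plus an intercept.
    tolerance_j = 1 - R_j^2 and vif_j = 1 / tolerance_j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tolerance = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ beta
        ss_res = residuals @ residuals
        ss_tot = ((y - y.mean()) ** 2).sum()
        tolerance[j] = ss_res / ss_tot  # equals 1 - R_j^2
    return tolerance, 1.0 / tolerance
```

Under the thresholds quoted above, a variable would be flagged when its tolerance falls below 0.20 or its VIF reaches 5 or more.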

Table 3 Regression coefficient
Fig. 5 Spatial pattern of different variables: a Annual rainfall, b Elevation, c Slope in percent, d Drainage density, e Pond frequency, f Soil texture, g Road density, h Proximity to major market: towns and road network, i Population growth rate, j Literacy rate, k Poverty incidence: percent of BPL population

Multiple linear regression depicts the linear relationship between several independent variables and a single dependent variable. The whole model was run using IBM SPSS Statistics 25 (evaluation version). According to Table 4 and Fig. 6, poverty incidence is positively related to mean elevation, slope in percent, and population growth, and negatively related to rainfall, road density, and pond frequency. Except for soil texture, all factors in this study showed a significant link with poverty in the study region. Most of the correlation values are significant at the 1% significance level. The following paragraphs examine each variable and its impact on poverty in the study area.

Table 4 Correlation matrix among variables
Fig. 6 Diagrammatic presentation of the correlation matrix
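For readers who wish to reproduce this kind of correlation analysis, the sketch below computes the Pearson correlation between each explanatory variable and the response with NumPy. The function and variable names are hypothetical, and the numbers in the usage lines are made up for illustration only.

```python
import numpy as np


def correlation_with_response(X, y, names):
    """Return {name: Pearson r} between each column of X and the response y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return {
        name: float(np.corrcoef(X[:, j], y)[0, 1])
        for j, name in enumerate(names)
    }


# Made-up values: the first column rises with y, the second falls with y.
X_demo = np.array([[2.0, 4.0], [4.0, 3.0], [6.0, 2.0], [8.0, 1.0]])
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
r_demo = correlation_with_response(X_demo, y_demo, ["rising", "falling"])
```

A coefficient close to +1 or −1 indicates a strong direct or inverse linear association, respectively.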

Table 4 shows the expected link between all explanatory factors and poverty incidence, whereas Fig. 6 reveals the analytical outcome. Rainfall and poverty incidence were expected to be inversely related, and the correlation coefficient of −0.899 confirms this. Higher rainfall benefits farmers by supplying more water for agricultural irrigation, which boosts farm productivity and in turn reduces poverty incidence. Mean elevation was expected to be directly related to the prevalence of poverty, and the positive coefficient of 0.966 confirms this hypothesis: poverty is more prevalent at higher elevations. For slope, the coefficient of 0.957 likewise supports the expected positive relationship, since hilly terrain and steep slopes directly influence both the rate of agricultural production and the poverty level.

Conversely, flatlands to rolling hills with a slope of 0–8% have a low incidence of poverty. The coefficients for drainage density and pond frequency are negative, which confirms the expected inverse relationship. Increasing drainage density and pond frequency would therefore reduce poverty, since rivers and ponds offer water for agricultural irrigation and fishing. However, the results show that agricultural soil texture is not a statistically significant predictor of poverty incidence, so soil is not a factor affecting poverty in the examined region. The negative coefficient (−0.749) between road density and poverty incidence supports the original premise. Well-connected transportation infrastructure is essential for regional development, which relies on delivering agricultural and industrial products to markets. It was also expected that access to big marketplaces would reduce poverty, and the positive coefficient of 0.949 supports this hypothesis. Because marketplaces are essential for exchanging commodities and services, increasing distance from towns and big markets increases poverty. The positive coefficient (0.822) for the association between population growth and poverty incidence supports the original premise, so it is plausible to infer that rising population growth will increase poverty. Finally, the negative coefficient (−0.945) between literacy and poverty incidence validates the original hypothesis, so it is plausible to infer that higher literacy will reduce poverty.

Predicting poverty incidence from explanatory factors requires explaining the model first. In the model summary, r = 0.861 and r² = 0.741, and the Durbin-Watson statistic of 1.260, being below 2, indicates positive autocorrelation. The standard error of the estimate is 0.7891 (Table 5), which suggests that the MLR model fits the data well. The model's significance level is 0.000 (Table 6), so the null hypothesis (H0) is rejected.
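The two model-summary statistics can be illustrated with a short NumPy sketch. It assumes the observed values, fitted values, and residuals are already available; the function names are hypothetical, and this is not the SPSS output itself. A Durbin-Watson value near 2 suggests no first-order autocorrelation, values below 2 suggest positive autocorrelation, and values above 2 suggest negative autocorrelation.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = ((y - y_hat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def durbin_watson(residuals):
    """Sum of squared successive residual differences over the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return (np.diff(e) ** 2).sum() / (e ** 2).sum()
```

As a consistency check on the reported figures, squaring r = 0.861 gives approximately 0.741, which matches the reported r².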

Table 5 Model summary of multi linear regression
Table 6 Anova

Table 3 is important because it contains the coefficients needed to forecast the poverty level. From this table we can see that the intercept value (β0) is 98.711 and that the coefficients of pond frequency (xi1), mean drainage density (xi2), mean elevation (xi3), mean annual rainfall (xi4), literacy rate (xi5), population growth rate (xi6), slope in percent (xi7), mean road density (xi8), proximity to major market (xi9), and soil texture (xi10) are −4.498, −0.764, −0.162, −0.008, −0.420, 0.668, 0.153, −0.010, −0.511, and 0.000, respectively. With these values, the linear relationship between the explanatory and response variables may be represented mathematically as follows:

Predicted poverty level (yi) = 98.711(β0) + (− 4.498) *Pond frequency (xi1) + (− 0.764) *Mean drainage density (xi2) + (− 0.162) *Mean elevation (xi3) + (− 0.008) *Mean annual rainfall (xi4) + (− 0.420) *Literacy rate (xi5) + 0.668*Population growth rate (xi6) + 0.153*Slope in percent (xi7) + (− 0.010) *Mean Road density (xi8) + (− 0.511) *Proximity to major market (xi9) + 0.000*Soil texture (xi10) + 0.7891.
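The equation above can be expressed as a small Python function. This is a sketch only: the intercept and coefficients are those reported in Table 3, while the dictionary keys and function name are hypothetical, and the units of each input are assumed to match those used when the model was fitted. The trailing 0.7891 in the equation is the standard error of the estimate; here it is left out of the point prediction and exposed as a separate constant.

```python
INTERCEPT = 98.711
STANDARD_ERROR = 0.7891  # standard error of the estimate (Table 5), not added to predictions

# Coefficients as reported in Table 3; key names are illustrative.
COEFFICIENTS = {
    "pond_frequency": -4.498,
    "mean_drainage_density": -0.764,
    "mean_elevation": -0.162,
    "mean_annual_rainfall": -0.008,
    "literacy_rate": -0.420,
    "population_growth_rate": 0.668,
    "slope_percent": 0.153,
    "mean_road_density": -0.010,
    "proximity_to_major_market": -0.511,
    "soil_texture": 0.000,
}


def predict_poverty_level(values: dict) -> float:
    """Return the predicted poverty level for one block from its ten explanatory values."""
    missing = set(COEFFICIENTS) - set(values)
    if missing:
        raise KeyError(f"missing explanatory variables: {sorted(missing)}")
    return INTERCEPT + sum(
        coefficient * values[name] for name, coefficient in COEFFICIENTS.items()
    )
```

Passing a dictionary with one value per explanatory variable returns the point prediction; adding or subtracting `STANDARD_ERROR` gives a rough sense of the typical prediction error.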

The p-value in the Anova table (Table 6) is below 0.05, which means that the MLR model is valid and acceptable at the 5% significance level. So, given the values of all the explanatory variables, we can compute the projected poverty level for this study area in future years.

Major factors affecting poverty in the test region

This research used Principal Component Analysis (PCA) to identify important poverty determinants. The PCA yields five factors that predict poverty in the study area. Table 7 shows the variance explained in the factor analysis, together with the eigenvalues. The first five components (Table 7) explain about 70% of the total variance (the first principal component accounts for 20.577 percent of the total variance; the second for 15.052 percent; the third for 13.625 percent; and the fourth and fifth for 10.686 percent and 10.082 percent, respectively). The remaining components were omitted, and this study concentrates only on the first five components.
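A minimal sketch of how eigenvalues and explained variance can be obtained is given below. It uses NumPy on the correlation matrix of the variables, which is an assumption about the setup rather than a description of the SPSS run, and the function names are hypothetical. The retention rule shown (eigenvalue greater than one) follows the criterion mentioned in the conclusion.

```python
import numpy as np


def pca_explained_variance(X):
    """Return eigenvalues (descending), percent variance, and cumulative percent."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)       # correlation matrix of the columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    percent = 100.0 * eigenvalues / eigenvalues.sum()
    return eigenvalues, percent, np.cumsum(percent)


def retained_components(eigenvalues, threshold=1.0):
    """Number of components whose eigenvalue exceeds the threshold."""
    return int((np.asarray(eigenvalues) > threshold).sum())
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each percentage is simply the eigenvalue divided by that number, times 100.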

Table 7 Variable explained for factor analysis

The eigenvalues were plotted using Cattell's scree test (Cattell, 1966). This simple line graph shows how the eigenvalues fall from one component to the next. In Fig. 7, the line is steep where components capture substantial variance and flat where components mainly reflect error. The first five components shown in Fig. 7 are the ones that matter, whereas the remaining components mostly contain error and form the flat part of the curve. In principal component analysis, the variables are rotated to generate new variables. Rotation is a strategy for determining factor loadings and interpreting the factors (Acal et al., 2020; Kaiser, 1958; Rösler & Manzey, 1981). To generate uncorrelated components, this research used the varimax rotation, the result of which can be seen in Table 8.
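For illustration, a commonly used SVD-based iteration for the varimax rotation is sketched below in NumPy. It is not the SPSS implementation used in this study (for example, it applies no Kaiser normalization), and the function name and default parameters are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x factors) loading matrix with an orthogonal varimax rotation.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Matrix whose SVD yields the next orthogonal rotation.
        target = L.T @ (rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt
        new_objective = s.sum()
        if new_objective < objective * (1.0 + tol):
            break  # improvement has become negligible
        objective = new_objective
    return L @ rotation, rotation
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes, which is what makes the rotated matrix easier to interpret.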

Fig. 7 Scree plot of the eigenvalues of the factors

Table 8 Rotated Component Matrix

Factor 1: agro-climatic factors

Factor 1 accounts for 20.577 percent of the overall variance in the data (Table 7). This factor loads heavily on mean elevation, slope, and mean annual rainfall (Table 8). These agro-climatic elements play a role in the prevalence of poverty in the area under investigation. Factor 1 therefore represents the effect of agro-climatic factors on the poverty level in the study region.

Factor 2: infrastructural and educational factors

Factor 2 explains 15.052% of the overall variance (Table 7). This factor has a negative loading on literacy and a positive loading on the estimated distance to town centers or major cities (Table 8). Distance to big markets is therefore positively associated with the incidence of poverty in the area under investigation, whereas the literacy rate has a negative effect. These are termed the educational and infrastructural factors.

Factor 3: hydrological factors

Factor 3 accounts for 13.526 percent of the total variance (Table 7). This component has strong negative loadings on mean drainage density and pond frequency (Table 8). It identifies mean drainage density and pond frequency as hydrological variables that negatively affect the incidence of poverty in the study area. Consequently, this factor is classified as hydrological.

Factors 4 and 5: demographic and pedological factors

Factors 4 and 5 explain 10.686% and 10.087% of the total variance, respectively (Table 7). Factor 4 has a significant positive loading on the population growth rate, while Factor 5 has a large positive loading on soil texture (Table 8). Factor 4 thus identifies population growth rate as a demographic component that positively affects poverty incidence in the study region. Factor 5, on the other hand, describes the percent of soil with and without agricultural constraints as an agro-climatic factor that positively influences the incidence of poverty in the study area. Collectively, these are considered the demographic and pedological factors.

Spatial pattern of factors

Mean elevation, annual rainfall, and slope in percent are all included in Factor 1. This factor is more concentrated in the western part of the study area than in the eastern part (Fig. 8a), in line with the spatial patterns of its constituent variables (Fig. 5a, b & c). The west side of the Chota Nagpur plateau is higher and steeper than the east side.

Fig. 8 Spatial pattern of different factors: a Factor-1, b Factor-2, c Factor-3, d Factor-4, e Factor-5

Factor 2 includes the literacy rate, the mean road density, and the proximity to major markets. These are non-physical factors. According to Fig. 8b, this factor is most concentrated in the middle of each block in the study area, which is consistent with the patterns of mean road density and proximity to major markets (Fig. 5g & h). The high literacy rate in the southern and central parts of the area also contributes to this concentration (Fig. 5j).

Factor 3 comprises pond frequency and mean drainage density, which are physical factors. The largest concentrations of this factor were found in the eastern, south-eastern, southern, and south-western parts of the study area (Fig. 8c; Fig. 5e). Owing to its higher elevation and undulating topography, the western section of the region has the highest drainage density (Fig. 5d), which contributes to the factor's concentration.

Factor 4 represents population growth. For this factor, particular blocks in the western and southern parts have the highest population growth rate (Fig. 8d). There has been a slight increase in population in the southern and central parts (Fig. 5i).

Soil texture, an essential pedological element, makes up Factor 5. The greatest concentrations of this factor occur in the western parts of several blocks (Fig. 8e), which comprise both agricultural and non-agricultural soils (Fig. 5f).

Discussion

According to the correlation study, there is a negative relationship between poverty and physical parameters such as rainfall and pond frequency. Thus, when the value of such a physical parameter increases, deprivation decreases, and vice versa. If rainfall increases, agricultural productivity will increase, and more ponds will help the villagers engage in fishing or other aquatic activities. However, poverty is positively correlated with elevation because, as altitude increases in the study region, agricultural activities may decrease, and there are fewer alternative sources of income. There is also a negative correlation between literacy and poverty; as individuals grow more literate, they are better able to find other income sources. Poverty is, however, positively associated with population growth. This relationship is logical since rapid population growth may put pressure on food security, job security, income security, and so on, accelerating the rise in poverty.

Turning to the primary causes of poverty in the study area, we may conclude that agro-climatic conditions play a crucial role in determining the level of poverty, since the majority of this region's economy is dependent on agriculture. Apart from the agro-climatic factors, marketing activities and education are the most critical factors in this region, as there are six significant municipal marketing areas. Many rural residents rely on the marketing operations of these marketplaces and are interested in selling their agricultural products in the municipal market areas. Soil is one of the five primary determinants; however, its influence on poverty is minimal. Soil directly affects agricultural productivity but has only an indirect effect on poverty. Therefore, we consider soil to play a minor role in the occurrence of poverty.

Overall, physical variables have a significant influence on the level of poverty in the area under investigation. This distinguishes the present study from earlier research, which did not emphasize physical factors. Notably, such research has not yet been conducted in this region; hence, we cannot compare our results with comparable analyses of this study area.

Conclusion

This research article identifies the major determinants of poverty in the study region. Multivariate statistical methods, namely multiple linear regression and factor analysis, were used to obtain the results. The multiple linear regression shows that the independent variables explain approximately 74% of the variance in the dependent variable. From the linear relationship between the dependent and independent variables, a mathematical model has been developed that can be applied to any region to forecast poverty incidence when the other parameters are known. Additionally, because the p-value for the regression is 0.000, the model is statistically significant. The second part of this research applied another multivariate technique, factor analysis. Five factors were extracted using an eigenvalue threshold of 1; that is, a factor with an eigenvalue greater than one is considered significant. The rotated component matrix shows that Factor 1 captures the effect of agro-climatic conditions on poverty in the study region and explains 20.577 percent of the variance. Factor 2 explains 15.052% of the total variance and can be broadly classified as a transportation and communication factor. Factor 3 accounts for 13.526 percent of the total variance and reflects physical factors influencing the incidence of poverty. Factors 4 and 5 account for 10.686 percent and 10.087 percent of the total variance, respectively, with Factor 4 classified as a demographic component and Factor 5 as an agro-climatic factor.

This study addresses a critical aspect of the current income issue. If a government wishes to enact measures to eliminate poverty, the first step is to identify the primary causes of poverty, because effective strategies can be designed only when the factors are accurately identified. Since the determinants differ from region to region, the execution of policies may also vary by location.

With this study, a planner or a social scientist can readily comprehend the current state of poverty incidence and the determinants that contribute to it. As we face a critical pandemic situation, applying geospatial technology and statistical tools to identify the specific determinants of poverty in the study area will assist the administration in properly identifying and addressing the major determinants of poverty.

A limitation of this study is that, because a significant part of it focuses on physical parameters, very few socio-economic parameters were considered. An extensive field study was not possible owing to the pandemic situation, which is why this study primarily emphasizes physical parameters. Nevertheless, the application of advanced statistical tools and geospatial technology made this work more scientific. The outcomes of this work help us better understand how the determinants of poverty can be extracted scientifically and how environmental determinism still controls people’s economic earnings in the era of modern technology. This study therefore has important future potential. Whenever policies are implemented to eradicate or reduce poverty, the first question to answer is why poverty prevails. The answer lies in extracting and justifying the causes of the poverty level, which can only be done by extracting the determinants of poverty. Higher authorities need to carry out such research to implement policies better.