Predicting the Air Quality Index of Industrial Areas in an Industrialized City in India Using a Markov Chain Model

ORIGINAL ARTICLE Introduction: Rapid urbanization coupled with industrial development in Indian cities has led to air pollution that harms human health. It is therefore crucial to track air quality in the industrial areas of a city to protect the public from harmful air pollutants. The present study examined and predicted air quality index (AQI) levels in industrial areas of Hyderabad, India. Materials and Methods: A Markov chain model was developed to predict AQI levels in three industrial areas of Hyderabad city using secondary AQI data from January 2016 to December 2019. The state transition probabilities were used to find the predicted probabilities for the next 4 years. The study also analyzed the mean return time for specific states. Results: The highest observed frequency of month-to-month transitions was 31, for the moderate state in the second industrial area. The longest mean return times were 23.585 months and 23.259 months, for the good and poor states of industrial area 3, respectively. Conclusions: The AQI varies among industrial areas depending on the nature of the industries and the type of emissions. Predicting the AQI can help local authorities implement measures that minimize the impact of pollutants on human health. Article History: Received: 14 September 2020 Accepted: 20 November 2020


Introduction
Air quality degradation poses an imminent danger to public health in India. The accelerated expansion of cities, compounded by industrial development, has alarmingly degraded air quality. Among the major health risk factors in India, poor air quality ranks fifth as a cause of mortality and seventh in its overall effect on public health 1. Air quality monitoring in India is predominantly focused on metropolitan regions, with limited coverage in the rest of the country 2. Air pollution is a serious problem globally, especially in urban areas of developing countries such as India, which are experiencing rapid growth in both population and industrialization. Other factors contributing to air pollution in India are crowded residential zones, insufficient public amenities, and solid waste management problems 3. Reducing air pollution is a complex task that requires collective effort from all stakeholders 4. The National Air Quality Monitoring Programme (NAMP) is a nationwide programme initiated by the Central Pollution Control Board (CPCB) of India to measure pollutants including sulphur dioxide, nitrogen dioxide, and particulate matter (PM10 and PM2.5) in cities and towns. The Indian Government has established sixty air quality monitoring stations in thirty-five urban locations across fourteen states to continuously record the concentrations of various pollutants. In addition, state governments regularly track air pollution levels within their states 5,6. Urban air pollution is a serious issue across the globe; in developing countries, urban areas face serious problems due to rising particulate matter and nitrogen dioxide levels 7. Santosa et al. reported that the deterioration of urban air quality was probably the most serious environmental issue arising from the rapid growth of industries 8.
Air quality prediction in cities is a constructive approach to protecting public health and encouraging the public to become actively involved in protecting the environment 9.
Recent studies in urban areas of the USA concluded that metal/steel industries and vehicle emissions contribute significantly to rising particulate matter levels 10,11. Studies carried out in European cities concluded that mixed industrial and fuel-oil combustion was the key source of particulate matter concentrations 12. Rapid industrialization and urbanization in China led to alarmingly severe air pollution, with increasing adverse effects on human wellbeing 13. Emissions from industries and increasing vehicular traffic in rapidly growing cities pose a threat to human health. Several researchers and policymakers have expressed concern over declining air quality, particularly in urban regions where the population is very dense and emissions from vehicles and industries are constantly increasing 14.
Researchers have adopted several methods to predict air quality. Support vector regression (SVR) and multiple linear regression were used to predict the air quality index in Delhi, and the findings were consistent with the SVR method 15. To assess pollutants in China, three measures were developed: mean air pollution index (MAPI), air pollution ratio (APR), and continuous air pollution ratio (CAPR) 16. The current trend towards industrialization and urbanization in developing countries has a major impact on the environment: urbanization increases the sources of pollutants and gives rise to environmental pollution 17. The AQI of an Indian city was analyzed using two forecasting models, autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average (SARIMA), and the ARIMA model gave satisfactory results 18. An artificial neural network (ANN) was developed for AQI prediction in the USA using ARIMA and daily data on four air pollutants (nitrogen dioxide, sulphur dioxide, carbon monoxide, and ozone) from 2008 to 2017 19. ARIMA, ANN, and fuzzy time series were used to predict the AQI at different places in Malaysia; a comparison of the three methods showed that the ANN is a reliable approach for controlling and managing air quality 20. A study in Klang and Miri, Malaysia, forecast the AQI using a Markov chain model. In a Markov chain model, the state transition matrix and the transition probabilities are the key tools for resolving the AQI, and they depend on the prevailing conditions. The findings indicated that the model was easy to apply and could estimate the future behavior of the pollutants 21. Markov chain models have also been adopted to describe the probabilistic behavior of wind direction using data from Mersing, Malaysia, and the results identified the dominant direction for the study area in terms of probability metrics 22.
It is vital to examine pollution information in heavily populated urban areas to ascertain the impact of air pollution on health 23 .
A two-phase method was developed to predict air pollution levels in India using monthly data from 2000 to 2010. The dataset was preprocessed in Python, and the preprocessed data were analyzed in two phases to predict air pollution levels; the results showed an acceptable level of accuracy 24. Past studies indicate that predicting the AQI is vital in industrialized urban areas and should be done regularly. This study proposes a Markov chain model, rather than a time series model, to analyze and estimate the AQI, since the Markov chain model does not require in-depth analysis of the dynamic structure of change and is comparatively straightforward to infer from AQI data. The objective of the study was to predict the air quality index in industrial areas located in cities. Past studies adopted several hybrid and non-hybrid methodologies to predict the air quality index; however, these are elaborate and difficult to interpret for air quality information. The present study applied a Markov chain model to predict the AQI. A Markov chain is a random process indexed by time in which, given the present state, the future is independent of the past 25. The goal was to develop precise models to predict the monthly AQI and to assess these models for monitoring the AQI.

Materials and Methods
The methodology section is divided into the study area, data collection, and the step-by-step procedure of the Markov model.

Study Area
Hyderabad is the capital of the newly formed state of Telangana in India. It covers 625 km² and is located in the northern part of South India. Hyderabad has a population of about one crore (10 million) in its urban areas, making it the fourth most populous city and the sixth most populous urban agglomeration in India. Hyderabad is an industrialized urban area whose industries include pharmaceutical, chemical process, food processing, and manufacturing units. These industries are located in industrial regions surrounded by residential areas. The three industrial areas studied are Sanathnagar (IA 1), Jeedimetla (IA 2), and Pashmylaram (IA 3). We analyzed the available data for these three areas.

Data Collection
The monthly AQI data for three industrial areas in Hyderabad City from January 2016 to December 2019 were collected from the Telangana State Pollution Control Board (TSPCB); these secondary data were used in the analysis. The AQI measures overall air quality on a scale from 0 to 500, divided into six levels ranging from good to severe 26. These levels represent the impact on the public and provide a quantifiable yardstick for people's outdoor activities: a low score indicates good air quality and a high score indicates poor air quality. The AQI levels are presented in Table 1; accordingly, the data were categorized into six states in the proposed Markov chain model.

Step by Step Procedure
The procedure adopted in developing the Markov chain model is detailed as follows 22,25,27.
Step 1: Define the states of the Markov chain process. The data used to build the model must first be assigned to the states of the Markov chain; here, each monthly AQI value is placed in one of the six AQI categories.
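Step 1 can be sketched as a function that assigns each monthly AQI value to a state. Table 1 is not reproduced in this section, so the breakpoints below are an assumption: they are the six categories of India's National AQI (0-50 good, 51-100 satisfactory, 101-200 moderate, 201-300 poor, 301-400 very poor, 401-500 severe). The function name `aqi_to_state` and the sample values are hypothetical.

```python
# Assumed breakpoints: upper bound of each Indian National AQI category,
# taken to match Table 1 of the study (not confirmed by this section).
AQI_STATES = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderate"),
    (300, "Poor"),
    (400, "Very poor"),
    (500, "Severe"),
]


def aqi_to_state(aqi: float) -> str:
    """Map an AQI value on the 0-500 scale to one of six Markov chain states."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must lie in [0, 500], got {aqi}")
    for upper, state in AQI_STATES:
        if aqi <= upper:
            return state
    # Unreachable: any value <= 500 matches the last category above.
    return AQI_STATES[-1][1]


monthly_aqi = [42, 88, 135, 210, 97]  # illustrative values, not study data
print([aqi_to_state(v) for v in monthly_aqi])
# ['Good', 'Satisfactory', 'Moderate', 'Poor', 'Satisfactory']
```

The resulting sequence of state labels is the input for Step 2.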
Step 2: Construct the state transition matrix, N, and the state transition probability matrix, P.
The transition matrix N of the Markov chain records the observed frequency of transitions from one state to another, as shown in equation (1):

$$N = \begin{bmatrix} n_{11} & n_{12} & \cdots & n_{1m} \\ n_{21} & n_{22} & \cdots & n_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ n_{m1} & n_{m2} & \cdots & n_{mm} \end{bmatrix} \tag{1}$$

where $n_{ij}$ is the number of transitions in the sequence in which state $i$ is followed by state $j$, and $m$ is the number of states. Let P be the transition probability matrix that describes the transition probabilities between all states of the model, as shown in equation (2):

$$P = \begin{bmatrix} p_{ij} \end{bmatrix}, \qquad p_{ij} = \frac{n_{ij}}{\sum_{j=1}^{m} n_{ij}} \tag{2}$$

Step 3: Confirmation of an ergodic Markov chain. The chain must be confirmed to be ergodic in order to establish that a limiting distribution exists; this is done by classifying the states of P. The classification covers three properties: irreducibility, periodicity, and whether states are recurrent or transient 28.
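Step 2 can be sketched in plain Python: count consecutive month pairs to fill $N$, then divide each row by its total to obtain $P$ as in equation (2). The helper names, the short state sequence, and the choice to leave the row of a never-left state as all zeros are illustrative assumptions, not details given in the study.

```python
from typing import List, Sequence

STATES = ["Good", "Satisfactory", "Moderate", "Poor", "Very poor", "Severe"]


def transition_counts(sequence: Sequence[str], states: Sequence[str] = STATES) -> List[List[int]]:
    """Build N, where N[i][j] counts months in state i followed by state j."""
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    counts = [[0] * m for _ in range(m)]
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current]][index[nxt]] += 1
    return counts


def transition_probabilities(counts: List[List[int]]) -> List[List[float]]:
    """Row-normalise N into P with p_ij = n_ij / sum_j n_ij.

    Assumption: a state that is never left (row total 0) keeps an all-zero row.
    """
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([n / total if total else 0.0 for n in row])
    return probs


# Illustrative six-month sequence (five transitions), not study data.
months = ["Good", "Good", "Satisfactory", "Moderate", "Satisfactory", "Good"]
N = transition_counts(months)
P = transition_probabilities(N)
print(N[0][:3], [round(p, 3) for p in P[0][:3]])  # [1, 1, 0] [0.5, 0.5, 0.0]
```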
Irreducible Markov chain: State $j$ is accessible from state $i$ if $p_{ij}^{(n)} > 0$ for some $n \geq 0$. If each of two states is accessible from the other, the states communicate, written $i \leftrightarrow j$ 29. A Markov chain is irreducible when all of its states communicate with one another.

Step 4: Markov process probability values. In this step, the stationary probability distribution and the mean return times are obtained from the Markov process probabilities. The stationary probability distribution describes the long-term dynamics of the AQI: when the chain runs for a long period, it settles into steady-state probabilities that are independent of the initial conditions 31. For an ergodic Markov chain, the limiting distribution exists and equals the stationary distribution.
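The accessibility condition of Step 3 can be checked with a graph search over the positive entries of $P$: state $j$ is accessible from $i$ exactly when some path of positive-probability transitions leads from $i$ to $j$. This is a minimal sketch with hypothetical names. Because some AQI states never occur in the study's data, a check like this would presumably be applied to the observed states only.

```python
from typing import List, Set


def reachable_from(P: List[List[float]], i: int) -> Set[int]:
    """States j with p_ij^(n) > 0 for some n >= 0 (n = 0 gives i itself)."""
    seen = {i}
    stack = [i]
    while stack:
        k = stack.pop()
        for j, p in enumerate(P[k]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(P: List[List[float]]) -> bool:
    """True if every pair of states communicates (i <-> j for all i, j)."""
    m = len(P)
    return all(len(reachable_from(P, i)) == m for i in range(m))


P_toy = [[0.5, 0.5, 0.0],
         [0.2, 0.6, 0.2],
         [0.0, 1.0, 0.0]]  # hypothetical three-state matrix
print(is_irreducible(P_toy))  # True
```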
Then $p_j^{(n)} = \sum_k p_k^{(n-1)} p_{kj}$ becomes, as $n \to \infty$,

$$\pi_j = \sum_k \pi_k\, p_{kj}, \qquad \sum_j \pi_j = 1,$$

for every state $j$. The value of $\pi_j$ is high if state $j$ occurs with high probability. Long-run prediction also has pitfalls in various problems, such as missing information and accumulated errors 32.
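Together with the normalization condition, the balance equations above form a linear system. The study solved its system in MATLAB. Purely as an illustration, the sketch below solves the same kind of system in plain Python by Gaussian elimination, replacing one redundant balance equation with $\sum_j \pi_j = 1$. It assumes an ergodic chain over the states it is given; the function name and the two-state example are hypothetical.

```python
from typing import List


def stationary_distribution(P: List[List[float]]) -> List[float]:
    """Solve pi_j = sum_k pi_k p_kj with sum_j pi_j = 1 (ergodic chain assumed)."""
    m = len(P)
    # Row j encodes sum_k pi_k (p_kj - [k == j]) = 0, i.e. (P^T - I) pi = 0.
    A = [[P[k][j] - (1.0 if k == j else 0.0) for k in range(m)] for j in range(m)]
    b = [0.0] * m
    # One balance equation is redundant; replace it with the normalisation.
    A[-1] = [1.0] * m
    b[-1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: the chain may not be irreducible")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * m
    for r in range(m - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, m))) / A[r][r]
    return pi


P2 = [[0.9, 0.1],
      [0.5, 0.5]]  # hypothetical two-state matrix
pi = stationary_distribution(P2)
print([round(p, 4) for p in pi])  # [0.8333, 0.1667]
```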
Furthermore, the mean return time must be calculated to identify the average time the chain takes to return to a specific state $j$ after leaving it. It is denoted $m_{jj}$ and given by $m_{jj} = 1/\pi_j$.
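Given the stationary probabilities, the mean return times follow directly from $m_{jj} = 1/\pi_j$. The sketch below uses a hypothetical function name and input. It also encodes the convention used in the Results, where a state with zero stationary probability has an infinite mean return time.

```python
import math
from typing import List


def mean_return_times(pi: List[float]) -> List[float]:
    """m_jj = 1 / pi_j; a state with pi_j = 0 is never revisited, so m_jj is infinite."""
    return [1.0 / p if p > 0 else math.inf for p in pi]


print(mean_return_times([0.5, 0.25, 0.25, 0.0]))  # [2.0, 4.0, 4.0, inf]
```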
Step 5: Forecasting and model validation. The initial probabilities and the state transition probabilities are used to calculate the forecast values with equation (3):

$$P(S_j)^{(t+1)} = \sum_{i} P(S_i)^{(t)}\, p_{ij} \tag{3}$$

where $P(S_i)$ is the initial probability of state $i$ and $p_{ij}$ is the state transition probability.
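Equation (3) amounts to multiplying the current state-probability row vector by $P$, and repeating the multiplication gives forecasts further ahead. Table 6 is not reproduced in this section, so the two-state matrix below is hypothetical. The 48 monthly steps stand in for the four-year horizon used in the Results.

```python
from typing import List


def forecast(initial: List[float], P: List[List[float]], steps: int) -> List[List[float]]:
    """Return the state-probability vectors for steps 1..steps.

    Each step applies P(S_j)^(t+1) = sum_i P(S_i)^(t) p_ij, i.e. a row vector times P.
    """
    m = len(P)
    current = list(initial)
    history = []
    for _ in range(steps):
        current = [sum(current[i] * P[i][j] for i in range(m)) for j in range(m)]
        history.append(current)
    return history


P_toy = [[0.6, 0.4],
         [0.3, 0.7]]  # hypothetical two-state matrix
path = forecast([1.0, 0.0], P_toy, steps=48)  # 48 months, i.e. four years
print([round(p, 4) for p in path[0]], [round(p, 4) for p in path[-1]])
# [0.6, 0.4] [0.4286, 0.5714]
```

Over many steps the forecast approaches the stationary distribution (here 3/7 and 4/7). For an ergodic, aperiodic chain, long-horizon forecasts and the stationary distribution of Step 4 therefore coincide.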
For model validation, a chi-square test was used to check the validity of the Markov chain by testing the independence assumption 21. The chi-square value was calculated using equation (4). The null hypothesis is that the states in two consecutive time periods are independent; the alternative hypothesis is that they are dependent.
(4)
If the calculated chi-square value is greater than the tabulated value at the 5% level of significance, the null hypothesis is rejected 21.
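The exact form of equation (4) cannot be recovered from this section. Purely as an illustration, the sketch below substitutes Pearson's chi-square test of independence on the transition count matrix $N$, with expected counts computed from the row and column totals. The source's statistic may be different, for example a likelihood-ratio form. Degrees of freedom are taken as $(m-1)^2$, which gives the 25 quoted in the Results for six states. The function name and counts are hypothetical.

```python
from typing import List, Tuple


def chi_square_independence(N: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square for H0: next month's state is independent of this month's.

    Expected counts are e_ij = (row_i total * column_j total) / grand total.
    Cells with e_ij = 0 (states that never occur) are skipped.
    """
    m = len(N)
    row_tot = [sum(row) for row in N]
    col_tot = [sum(N[i][j] for i in range(m)) for j in range(m)]
    total = sum(row_tot)
    stat = 0.0
    for i in range(m):
        for j in range(m):
            expected = row_tot[i] * col_tot[j] / total
            if expected > 0:
                stat += (N[i][j] - expected) ** 2 / expected
    return stat, (m - 1) ** 2


N_toy = [[8, 2],
         [2, 8]]  # hypothetical counts showing strong month-to-month persistence
stat, df = chi_square_independence(N_toy)
print(round(stat, 2), df)  # 7.2 1
```

With one degree of freedom, the 5% critical value is 3.841, so the toy statistic of 7.2 would reject independence.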

Results
The results are as follows.

State transition matrix and probability
The monthly frequency of each AQI state, the state transition matrices, and the transition probabilities for January 2016 to December 2019 in the three industrial areas of Hyderabad are given in Tables 2 to 6. According to Table 4, the highest observed frequency in the transition matrix is 31 month-to-month transitions, for the moderate state of IA 2.
Vol (5), Issue (4), December 2020, 1135-1144

Jehsd.ssu.ac.ir
Similarly, state transition probability matrices were developed for IA 2 and IA 3. Based on the transition probabilities of IA 1 in Table 6, the predicted probabilities for the next 4 years (2020-2023) were determined by formulating equations (3) to (7). These are balance equations of the form $\pi_j = \sum_k \pi_k p_{kj}$ together with the normalization condition (7). For example, the balance equation for the good state begins

$$\pi_{\text{Good}} = 0.6364\,\pi_{\text{Good}} + \cdots$$

and the system is closed by

$$\pi_{\text{Good}} + \pi_{\text{Satisfactory}} + \pi_{\text{Moderate}} + \pi_{\text{Poor}} + \pi_{\text{Very poor}} + \pi_{\text{Severe}} = 1 \tag{7}$$

In the same way, the equations for IA 2 and IA 3 were framed from their respective state transition probability matrices. The equations for the three industrial areas were solved in MATLAB. In addition, a transition probability chain was constructed for IA 1 (Figure 1); similar chains can be constructed for IA 2 and IA 3.

Stationary probability distribution
The stationary probability distribution exists when the Markov chain is ergodic. It is required to assess long-term variations in the pollutants and does not depend on the initial state. The stationary distributions of the three industrial areas are shown in Table 7. The highest and lowest probabilities, 0.7438 and zero, were observed for the moderate and good states of industrial area 2, respectively. According to the secondary data collected from TSPCB, industrial area 2 never reached the good state during the study period. Similarly, the probabilities of the very poor and severe states were zero, since none of the three industrial areas ever reached those states (Table 2).

Mean return time
The mean return time for each AQI state was calculated following Step 4 of the methodology (Table 8). The longest mean return times occurred in IA 3: 23.585 months for the good state and 23.259 months for the poor state. The mean return time for the good state of IA 2 was infinite, because that state never occurred in the initial dataset.
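Because $m_{jj} = 1/\pi_j$, the reported return times can be inverted to recover the implied stationary probabilities. This gives a quick consistency check (values rounded to four decimals):

$$\pi_{\text{Good}}^{\text{IA3}} = \frac{1}{23.585} \approx 0.0424, \qquad \pi_{\text{Poor}}^{\text{IA3}} = \frac{1}{23.259} \approx 0.0430, \qquad \pi_{\text{Poor}}^{\text{IA1}} = \frac{1}{11.765} \approx 0.0850, \qquad \pi_{\text{Good}}^{\text{IA2}} = 0 \;\Rightarrow\; m_{\text{Good}}^{\text{IA2}} = \infty$$

Both IA 3 values are close to the observed share of 2 months out of 47 for each of these states ($2/47 \approx 0.0426$).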

Forecasting and Model Validation
Based on equation (3), the next-period probabilities are obtained by multiplying the initial state vector by the state transition probability matrix. The AQI of IA 1 at the end of 47 months was in the good state, so its initial state vector is (1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000). $P_{AQI}$ is the state transition probability matrix shown in Table 6. The suitability of the method for the data was checked by hypothesis testing. The null hypothesis was that the AQI in consecutive months is independent, and the alternative hypothesis was that it is dependent. The chi-square value calculated with equation (4) for IA 1 was 183.43, which exceeds the critical value of 37.65 at the 5% significance level with 25 degrees of freedom. Similarly, the calculated chi-square values for IA 2 and IA 3 were 203.24 and 196.78, respectively. Since the calculated values for all three industrial areas exceed the critical value, the null hypothesis is rejected.
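The 25 degrees of freedom are consistent with the usual $(m-1)^2$ rule for testing independence in an $m \times m$ transition count matrix with all six AQI states counted. That the test was set up this way is an assumption. Under it, the decision for the three areas reads:

$$\text{df} = (m-1)^2 = (6-1)^2 = 25, \qquad \chi^2_{\text{IA1}} = 183.43,\; \chi^2_{\text{IA2}} = 203.24,\; \chi^2_{\text{IA3}} = 196.78 \;>\; \chi^2_{0.05,\,25} = 37.65$$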

Discussion
AQI levels vary among industrial areas depending on the nature of the industries and the type of emissions. In addition, the numerous industries around the city, vehicular traffic, and construction activity are among the major factors affecting the concentration of particulate matter in ambient air. All of these factors contribute to raising AQI levels 8. A Markov chain model was developed to predict AQI levels in three industrial areas of Hyderabad City using secondary AQI data from January 2016 to December 2019. This study introduced the Markov chain as a tool to evaluate the long-term distribution of AQI levels. Markov chains are commonly used in many areas because of their efficiency in predicting long-run behavior 21. The findings show how the model fits the data and estimates the future AQI pattern.
Close observation of the AQI over the study period showed that it moved among four different states. Over the 47 months, the AQI of IA 1 was good in 11 months, satisfactory in 17, moderate in 15, and poor in 4. Tables 4 and 5 show that the corresponding frequencies of these four states were 0, 12, 35, and 0 months for IA 2 and 2, 21, 22, and 2 months for IA 3. The state transition probability matrix of IA 1 (Table 6) gives a broad indication of the direction of changes in AQI levels during the study period. The row elements of the transition probability matrix indicate the extent of decrease in AQI levels. The column and diagonal elements of the matrix in Table 6 indicate the probability of gain and retention of AQI levels in the industrial areas, respectively 27.
The stationary probability distribution is required to evaluate the long-run proportion of time the air pollution process spends in each state. The stationary probabilities of IA 1 are 0.22, 0.36, 0.31, and 0.08 for the good, satisfactory, moderate, and poor states, respectively. Based on the small proportions obtained for the poor, very poor, and severe states, the risk of increasing AQI levels in the three industrial areas is low 21.
The mean return time is the average number of months the chain takes to return to a state after leaving it. The stationary probability distribution was used to determine the mean return time for all AQI states. For IA 1, the longest mean return time was 11.765 months, for the poor state. For IA 3, the longest mean return times were 23.58 and 23.25 months for the good and poor states, respectively. The very poor and severe states are not expected to be reached in the future, since their mean return times are infinite for all three industrial areas 28.
The suitability of the model for the data was verified by hypothesis testing with the chi-square test. The validation results show that the AQI of the current month depends on that of the previous month, which supports the dependency assumption 21.

Conclusions
The study was conducted in Hyderabad, an industrialized city, to predict AQI levels using secondary data from January 2016 to December 2019. Among the stationary probabilities of the three industrial areas, the highest was 0.7438 (moderate state) for IA 2, followed by 0.4670 (moderate) and 0.4457 (satisfactory) for IA 3. The mean return times show that IA 3 requires approximately two years to return to the good and poor states, while IA 1 requires approximately one year to return to the poor state. The mean return time for the good state of IA 2 was infinite. The state pollution control board, the department of industries, and industrial managements should implement remedial measures, particularly in IA 2 and IA 3, to safeguard the health of people in the vicinity. A limitation of the study is that the results are probabilities of AQI states; therefore, the predictions do not give actual AQI values.
These analyses are useful for ascertaining the current status of the AQI and for predicting it. The AQI therefore plays a major role in helping decision-makers and policymakers understand air pollution levels. Air quality should be monitored appropriately so that stakeholders can enforce abatement strategies and avert the threats posed by air pollution.