Variability analysis in PM2.5 monitoring

The United States Environmental Protection Agency (US EPA) and the Central Pollution Control Board (CPCB) are two major agencies monitoring air quality in India; both measure the concentration of particulate matter up to 2.5 μm in size (PM2.5). The study of PM2.5 over southern Asia is significant from an environmental and ecosystem viewpoint (Abdullah et al., 2007; Dockery and Stone, 2007). For raising alerts and controlling pollutants, not only forecasting but also the accuracy of forecasts has attracted attention from research departments and air quality monitoring agencies, and the quest to reduce forecasting error continues. Forecasting depends first on data monitoring. Focusing on this initial phase of data analysis, PM2.5 concentrations were collected from both agencies within an area of radius 3.1 miles for the year 2016. Using these data, a variability analysis is carried out to assess the efficiency of these vital environment protection agencies.


Data
The daily concentrations of fine particulate matter monitored by US EPA and CPCB from 1 January 2016 to 19 December 2016 were taken for the present study. The data were obtained from http://cpcb.nic.in/ (CPCB) and https://www.epa.gov/ (US EPA); the time series are shown in Fig. 1 and the descriptive statistics in Table 1 [1,3].

Study area
The U.S. Embassy and Consulates manage airborne fine particulate matter monitoring. PM2.5 is a standard recognized by US EPA and permits comparison against U.S. standard measures [5]. The US EPA monitor covers the Chanakyapuri area in Delhi. The Central Pollution Control Board (CPCB) of India is the apex organization in the country for monitoring pollution [6]. One of its monitoring stations is at RK Puram, Delhi. RK Puram and Chanakyapuri are 3.1 miles apart. New Delhi, the capital of India, has latitude and longitude coordinates 28.644800, 77.216721. Chanakyapuri has coordinates 28.593853, 77.188736 and RK Puram 28.566008, 77.176743.

Value of the data
The dataset used in this article reflects the variability in monitoring by the United States Environmental Protection Agency and the Central Pollution Control Board. The Air Quality Index calculated from the data gives the status of the air we breathe. The dataset will help to determine the effect of fine particles. The information contained in this article can be used to assess environmental impact. The information provided can form the basis for issuing health advisories.

Experimental design, materials and methods
The study is divided into two parts: statistical analysis and predictive analysis. Descriptive and inferential statistics are a vital part of data analysis. The descriptive analysis summarizes the data sets using the measures listed in Table 1.
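As an illustration of the descriptive step, the following minimal sketch computes a few common summary measures with Python's standard library. The function name `describe` and the sample readings are hypothetical and are not taken from the dataset; the measures actually reported are those in Table 1.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of daily readings."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical PM2.5 readings in micrograms per cubic metre (illustration only).
sample = [80.0, 95.0, 110.0, 70.0, 145.0]
summary = describe(sample)
```

Applying the same function to each agency's series would give two comparable summaries.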
A further step is to observe whether there is any significant relation between the two data sets. The correlation coefficient is a measure of the strength of the linear relationship between two such variables and is calculated as

$$r_{uv} = \frac{\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{n} (u_i - \bar{u})^2 \, \sum_{i=1}^{n} (v_i - \bar{v})^2}} \qquad (1)$$

where $r_{uv}$ lies between $-1$ and $+1$ inclusive, as discussed in Bhardwaj and Pruthi, 2016 [2]. The value of the Pearson correlation is 0.933, significant at the 0.01 level, supporting the reliability of the observed data.
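The Pearson coefficient can be computed directly from its standard definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes two equally long lists of paired daily readings; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient of two paired sequences."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    # Sums of squared deviations for each variable.
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    if ss_u == 0 or ss_v == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_u * ss_v)
```

Days on which either agency has a missing value would need to be dropped before pairing; the sketch does not handle that.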
Location and scale (estimated normal distribution parameters) of the US EPA and CPCB data sets, for unweighted cases using Blom's proportion estimation formula, are given in Table 2. The probability-probability (P-P) plot in Fig. 2 depicts the deviation from the normal distribution. The location and scale values show a persistent trend for the PM2.5 data sets.
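Blom's formula assigns the observation of rank $i$ among $n$ cases the cumulative proportion $(i - 3/8)/(n + 1/4)$. The sketch below shows one way the points of a normal P-P plot could be formed from it, pairing each observed proportion with the cumulative probability expected under a normal distribution fitted to the same data. It is a simplified illustration: the function names are hypothetical, tied values are not given averaged ranks, and it is not claimed to reproduce the software used for Fig. 2.

```python
from statistics import NormalDist


def blom_positions(n):
    """Blom's estimated cumulative proportions for ranks 1..n."""
    return [(rank - 0.375) / (n + 0.25) for rank in range(1, n + 1)]


def pp_points(values):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    ordered = sorted(values)
    # Location (mean) and scale (sample SD) estimated from the data.
    fitted = NormalDist.from_samples(ordered)
    observed = blom_positions(len(ordered))
    expected = [fitted.cdf(x) for x in ordered]
    return list(zip(observed, expected))
```

Points lying close to the diagonal indicate agreement with the normal distribution; systematic departures indicate deviation from it.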
The widely used t-test is carried out to analyze the difference between the data monitored by US EPA and CPCB. The null hypothesis is $H_0$: no mean difference between the US EPA and CPCB monitored data, i.e. $\mu_{\mathrm{USEPA}} - \mu_{\mathrm{CPCB}} = 0$, and the alternative hypothesis is $H_a$: $\mu_{\mathrm{USEPA}} - \mu_{\mathrm{CPCB}} \neq 0$. The calculated t-value is compared with the t-value corresponding to the degrees of freedom (see Table 3), where $\mu$ represents the mean and SD the standard deviation. The F-test statistic from one-way ANOVA is evaluated to emphasize the mean difference between the two data sets (Table 4), where MSE is the error sum of squares divided by its associated degrees of freedom (df) and MSR is the regression mean square (the regression sum of squares divided by its df). ANOVA and the t-test both indicate a significant difference between the PM2.5 data monitored by the two major agencies, US EPA and CPCB.
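The relation between the two tests can be illustrated with a short sketch. The text does not state which form of the t-test was used, so the code assumes an independent two-sample t-test with pooled variance; under that assumption, the one-way ANOVA F statistic for two groups equals the square of the t statistic. All function names and the example numbers are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq(a) + sum_sq(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


def anova_f(groups):
    """One-way ANOVA: F = MSR / MSE, with between- and within-group df."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    msr = ss_between / df_between  # mean square between groups
    mse = ss_within / df_within    # mean square error (within groups)
    return msr / mse, df_between, df_within


# Hypothetical readings, for illustration only.
a = [100.0, 120.0, 140.0]
b = [90.0, 100.0, 110.0]
t_value, t_df = pooled_t(a, b)
f_value, df1, df2 = anova_f([a, b])
```

If the daily readings are instead treated as matched pairs, a paired t-test on the day-by-day differences would be the appropriate variant, and the identity with the one-way F statistic no longer applies.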

Autoregressive integrated moving average
The preferred method in time series modeling is Box-Jenkins. The Box-Jenkins method has three components, "AR" (autoregressive), "I" (integrated) and "MA" (moving average), which together give ARIMA. The ARIMA model equations were fitted to the PM2.5 data, following an approach of identification, estimation and diagnostics. ARIMA describes large-scale variation in the behavior of a stationary time series and is built upon present and past values of the response and the residuals. The main steps are:
1. Identification: the objective is to determine whether the process is stationary ($d = 0$) or not. If not, it has to be transformed into a stationary one. No connection between any two sequential observations implies $p = 0$.
2. Estimation: constructing models and estimating their parameters [7].
3. Diagnostics and selection of the model: the residuals and the quality of approximation of the model are examined. Theoretically, it is assumed that the residuals are random and normally distributed.
4. Application of the predictive model: forecasts, analysis of dependencies, and study of problem-solving capabilities [4].
Using the ARIMA algorithm, PM2.5 is forecasted. Tables 5–7 summarize the output of the fitted ARIMA model.
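The identification step can be sketched with two small helpers: differencing, which is the usual transformation toward stationarity and determines $d$, and the sample autocorrelation, whose value near zero at lag 1 suggests no connection between sequential observations. This is only an illustration of the idea, not the fitting procedure behind Tables 5–7; the function names are hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom
```

For example, `difference([1, 3, 6, 10])` gives `[2, 3, 4]`, and differencing twice gives `[1, 1]`. In practice, estimation and diagnostics would be carried out with a dedicated time series package.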

Air quality index
The Air Quality Index (AQI) is not just a number signifying the quality of air; it also explains what we are inhaling. AQI was introduced in 1968. The objective was to make the public aware of deteriorating air quality and to raise an alarm so that precautionary measures can be taken. AQI is calculated following www.cpcb.nic.in and www.epa.gov. To focus on the effects of monitoring variability, the air quality index is calculated for both data sets. AQI is divided into categories; Fig. 3 shows the AQI for those days on which the two values fall in different categories. The first and second bars in Fig. 3 represent the calculated AQI corresponding to the US EPA and CPCB data, respectively. In a short span of 354 days, the AQI falls in different categories on 58 days.
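A sub-index of this kind is usually obtained by linear interpolation between concentration breakpoints: $I = \frac{I_{hi} - I_{lo}}{C_{hi} - C_{lo}}(C - C_{lo}) + I_{lo}$. The sketch below applies that formula. The breakpoint table is an assumption, believed to be the US EPA 24-hour PM2.5 table in force in 2016, and should be checked against the agency documentation. A CPCB table could be passed to the same function in the same form. The names `US_EPA_PM25` and `sub_index` are hypothetical.

```python
import math

# ASSUMED breakpoints (C_lo, C_hi, I_lo, I_hi) for 24-hour PM2.5 in µg/m³;
# verify against the agency's published table before use.
US_EPA_PM25 = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def sub_index(conc, breakpoints=US_EPA_PM25):
    """Linear-interpolation sub-index for one pollutant concentration."""
    # Truncate to one decimal place; the small offset guards against
    # floating-point error in the multiplication.
    c = math.floor(conc * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            value = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return round(value)  # note: Python rounds exact halves to even
    raise ValueError("concentration outside the breakpoint table")
```

Two readings for the same day fall in different categories when their concentrations land in different rows of the applicable tables, which is the situation counted in Fig. 3.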

Concluding remark
It is noted that the PM2.5 data monitored by US EPA and CPCB show a significant difference. Mathematically, the US EPA measured PM2.5 data can be obtained from the CPCB data by adding 13.7425 ± 7.856331729, and vice versa by subtracting. The calculated AQI values fall in different categories as per national standards. This might have led to the issuance of wrong public health advisories in the past, and if this difference is not accounted for, as is done in the present study, it may have a significant adverse impact on human health in the future as well.