Complex PM2.5 Pollution and Hospital Admission for Respiratory Diseases over Big Data in Cloud Environment

With the establishment of China's national air quality monitoring network, large amounts of monitoring data are available for different kinds of users. How to process and use this big data is a tough problem for users: most users have limited computing power, and new data are collected at every moment. Cloud computing may be an efficient and low-cost way to solve this problem. This paper investigates a problem of a complex system: the impact of PM2.5 on hospitalization for respiratory diseases. A change-point detection method based on grey relation analysis was used to solve this problem. Daily air pollution monitoring data and patient data were used in this study. Our results showed that (1) PM2.5 pollution was positively correlated with hospital admission for respiratory disease; (2) most patients went to hospital 2 days after PM2.5 pollution events; and (3) males, children, and older people were significantly affected by PM2.5 pollution. Our study can help the government formulate suitable policies to reduce the damage caused by PM2.5 pollution and help hospitals allocate medical resources efficiently.


Introduction
Generally speaking, the development of the global economy, especially in the Third World countries, is closely related to environmental problems. At present, the rapid development of the Chinese economy and the acceleration of urbanization make the contradiction between economic growth and the environment more and more prominent. Consequently, China is suffering from serious air pollution. With a population of over 1.4 billion, China's air pollution situation is extraordinary [1]. In recent years, the annual death toll from air pollution in China has been over 1 million [2], and air pollution has cost about 2.0% of China's GDP (gross domestic product) [3].
In 2012, the newly revised Ambient Air Quality Standard went into effect [4], and China began to build a national air quality monitoring network. The real-time hourly concentration of six monitoring indicators, sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), ozone (O3), particulate matter with a diameter smaller or equal to [...] programs through a system composed of multiple servers to get results, and then return them to users. The advantages of cloud computing include high flexibility, scalability, and high cost performance [10]. Users no longer need expensive supercomputers with large storage space. They can choose relatively inexpensive PCs to form a cloud, reducing costs while achieving computing performance not inferior to that of supercomputers.
In this paper, a change-point detection method based on grey relation analysis (GRA-CP) is introduced and fully investigated to reveal the correlation between PM2.5 pollution and hospitalization for respiratory diseases by employing the collected air pollution big data and records of hospital admission. Our aim is to predict the potential impact of PM2.5 pollution on patients with respiratory diseases. Our contributions are as follows: (1) we investigate whether the grey correlation method is suitable for solving public health problems; (2) we identify which populations are more susceptible to air pollution and suffer from respiratory diseases; and (3) we study residents' hospital visits after haze pollution.

Related Work
Many scholars have pointed out that PM2.5, a critical component of air pollution, poses serious threats to humans [11]. Because PM2.5 particles are very small, they can be inhaled into the respiratory tract, pass through the lungs, enter the circulatory system through the alveoli, and damage other organs [12][13][14]. After entering the vascular system, PM2.5 can cause thrombosis, hypertension, and coronary heart disease [15,16]. PM2.5 may also increase the suicide rate and cause mental illnesses [17,18].
China has taken significant measures, driven by government-oriented policies, to reduce the personal health risks and property losses caused by severe air pollution [19]; however, the PM2.5 concentration is still much higher than the WHO (World Health Organization) standard of 10 μg/m3 [20]. Although China has made some achievements in controlling air pollution, it still suffers from the burden of disease caused by PM2.5: each year PM2.5 caused about 1.3 million deaths and reduced life expectancy by about 3 years [21]. The influence of air pollution on the respiratory system has drawn many scholars' attention. Saldiva et al. [22] found that São Paulo's air pollution was severe enough to cause adverse health effects in the exposed population. Zhao et al. [23] reported that the immune system can be weakened by severe air pollution. Farhat et al. [24] found that air pollution could greatly increase the probability of children suffering from respiratory diseases. Outdoor air pollution was classified by the WHO as a cancer-causing agent in 2013 [25]; their research shows that outdoor air pollution significantly increases the incidence of lung cancer and raises the risk of bladder cancer.
As PM2.5 is the most lethal element of air pollution, it has drawn many researchers' attention. Ostro et al. [26] found that some components of PM2.5 were related to various respiratory diseases among children; the results from Vinikoor-Imler et al. [27] showed that PM2.5 was closely related to lung cancer morbidity and mortality; and Song et al. [28] suggested that, in 2015, PM2.5 was associated with 40.3% of stroke deaths and 23.9% of lung cancer deaths. Xing et al. [29] studied the harm of PM2.5 to the respiratory system and suggested that residents should try their best to avoid exposure to air pollution. He et al. and Zheng et al. separately investigated the components and sources of PM2.5 pollutants in Beijing [30,31].
Nanjing is the capital of Jiangsu Province. The city has 11 districts and an administrative area of 6,600 km2, with a total population of 8,436,200 as of 2018. Air quality in Nanjing has shown some improvement in recent years, but the annual mean concentration of PM2.5 in 2018 was about 41 μg/m3, which is still above the WHO standard. The main sources of PM2.5 pollution in Nanjing are industrial emissions, vehicle exhaust emissions, construction sites, and road dust. Although service industries dominate, accounting for about 60% of the GDP of the city, there are still many heavily polluting industries in Nanjing, such as Yangzi Petrochemical, Jinling Petrochemical, Nanjing Chemical, and Nanjing Iron and Steel Company. By the end of 2018, there were about 2.6 million vehicles in Nanjing. Nanjing is situated in the Yangtze River Delta region, with a humid subtropical climate influenced by the East Asian monsoon, so its air contains a high level of atmospheric moisture, which can act as a binder and increase PM2.5 pollution. With different kinds of fine particulate matter coming from different sources, PM2.5 pollution in Nanjing is a complex system. Therefore, our study is of great significance in helping the government formulate correct policies to reduce the damage caused by PM2.5 pollution.

Grey Correlation Theory.
Grey relation analysis (GRA) is a comparative method to study the trend in a system [32,33]. This method has many advantages: it requires neither a large sample size nor a typical distribution law, its computational cost is relatively low, and its results are more consistent with those of qualitative analysis.
GRA is widely applied in environmental protection and air quality evaluation. Lu et al. [34] pointed out that PM2.5 pollution in eastern China is mainly caused by human activities, and that in northwest China dust is also a component of PM2.5 pollution; Han et al. [35] found that males in eastern China had a higher chance of developing lung cancer than those in western China; and Ouyang et al. [36] demonstrated that the spatial distributions of PM2.5 and cancer incidence are similar, indicating a close link between PM2.5 and cancer incidence.
GRA was described in our previous study, in which it was used to identify which population segment was more susceptible to lung cancer caused by air pollution in Nanchang, China, and which air pollutant was the main cause of lung cancer [37]. Our previous results showed that PM10 is the main cause of lung cancer. GRA has been applied and investigated as follows:
(1) Construct the reference sequence and comparison sequences. Let $A_0 = [a_0(1), a_0(2), \ldots, a_0(n)]$ be the reference sequence (the data sequence that reflects the characteristics of the system), and let $A_i = [a_i(1), a_i(2), \ldots, a_i(n)]$ be the comparison sequences (the data sequences that affect the behavior of the system).
(2) Dimensionless processing of the reference sequence and comparison sequences. Because the factors in the system usually have different dimensions, they cannot be compared directly, and it is difficult to get a correct result when comparing them. Therefore, dimensionless data processing is generally needed before performing grey correlation analysis.
(3) Find the grey correlation coefficient $\xi(A_i)$. The degree of correlation refers to the geometric difference between the curves of the reference and comparison sequences. For one reference sequence $A_0$, there are many comparison sequences, $A_1, A_2, \ldots, A_n$, and the correlation coefficient $\xi_i(k)$ of $A_i$ at each time $k$ can be calculated by the following:
$$\xi_i(k) = \frac{\min_i \min_k |a_0(k) - a_i(k)| + \rho \max_i \max_k |a_0(k) - a_i(k)|}{|a_0(k) - a_i(k)| + \rho \max_i \max_k |a_0(k) - a_i(k)|},$$
where $\rho > 0$ is the resolution coefficient, and the value of $\rho$ is usually taken as 0.5.
(4) Correlation degree ($r_i$) calculation. The correlation coefficient takes a separate value at each time, so the information is too fragmentary for an overall comparison. It is therefore essential to gather the correlation coefficients at all times into one value, that is, to take their mean as the degree of correlation:
$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k).$$
(5) Correlation degree ranking. The correlation degrees of the subsequences to the parent sequence are sorted by size to form the correlation sequence $\{x\}$, which reflects the "superior or inferior" relationship of each subsequence to the parent sequence. If $r_{0i} > r_{0j}$, $x_i$ is said to be better than $x_j$ for the same parent sequence $x_0$, and this is recorded as $x_i > x_j$.
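The GRA steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study: the dimensionless step is assumed to be mean-value normalization (the text does not say which transform was used), the coefficient is the widely used Deng form with ρ = 0.5, and all function names and the toy numbers are hypothetical. Using PM2.5 as the reference sequence and group admissions as comparison sequences is an illustrative choice only.

```python
from statistics import mean

def mean_normalize(seq):
    """Dimensionless step (assumed): divide each value by the sequence mean."""
    m = mean(seq)
    return [v / m for v in seq]

def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Return the correlation degree r_i of each comparison sequence."""
    ref = mean_normalize(reference)
    comps = [mean_normalize(c) for c in comparisons]
    # Absolute differences |a_0(k) - a_i(k)| for every sequence i and time k.
    deltas = [[abs(r - c) for r, c in zip(ref, comp)] for comp in comps]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every comparison sequence coincides with the reference.
        return [1.0 for _ in comps]
    degrees = []
    for row in deltas:
        # Correlation coefficient at each time k, then its mean over time.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(mean(coeffs))
    return degrees

def rank_by_degree(names, degrees):
    """Sort sequence names from highest to lowest correlation degree."""
    return sorted(zip(names, degrees), key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: five days of PM2.5 and admissions for two groups.
pm25 = [80.0, 120.0, 60.0, 150.0, 90.0]
admissions = {
    "children": [40, 62, 31, 74, 45],
    "adults": [30, 28, 33, 29, 31],
}
degrees = grey_relational_degrees(pm25, list(admissions.values()))
ranking = rank_by_degree(list(admissions), degrees)
```

In this toy example the children's series follows the PM2.5 curve closely, so it ranks first. A different dimensionless transform (for example, initial-value normalization) would change the numerical degrees and possibly the ranking.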

GRA-CP.
A change point is a sudden change in a time series data set [38]. Change point search identifies when a change in the time series happens [39]. A change point reflects a qualitative change in things or processes. In order to accurately reflect the changes of a process and deal with them correctly, the change point problem cannot be ignored. The change point problem arises in many fields of production and life, such as computing, signal processing, meteorology, finance, and medicine.
Based on GRA, some scholars have developed a new method to solve the change point problem: GRA-CP. This method keeps the advantages of GRA: the amount of calculation is relatively small, and there is no strict requirement on the amount of data. Wong et al. [40] proposed a grey correlation test method for searching for change points, taking the Shunde river network area as an example. Zhang and Gong [41] used a time series of the agricultural disaster area in eastern China to verify the practicability and strength of the grey correlation algorithm; Chen and Gong [42] identified the change points and cycles of CO2 emission trends from China's energy consumption; and Wang et al. [43] calculated the change points of cumulative CO2 emissions from 1995 to 2004 in three eastern China jurisdictions and performed cycle division.
GRA-CP was applied as follows: (1) Construct the reference sequence, where $W_s \le W_e$ and $W_s$, $W$, and $W_e$ are integers. In this study, n is the number of days (n = 1095), and both the daily air pollution monitoring data and the patient data were processed so that each sequence contains the same number of values.
(2) Comparison sequence construction: the comparison sequences are constructed based on the reference sequence. Formula (5) is a comparison sequence set of order $n - 2W + 1$.
(2) Theoretically, $W_s$ can be taken as 1, but when $W_s$ takes a very small value, the method in this paper will be meaningless. Therefore, in numerical applications, $W_s$ should be chosen reasonably; for example, $W_s$ should be greater than or equal to 5.
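The general idea of a grey-relation change-point search can be illustrated with the sketch below. The exact reference and comparison sequence formulas are not reproduced in this text, so everything here is an assumption made for illustration and not the authors' formulation: for a window width W, each candidate split compares the W values before it with the W values after it, which yields n − 2W + 1 candidates (matching the order of the comparison sequence set stated above), and the split with the lowest grey correlation degree is taken as the most likely change point. The function names and the toy series are hypothetical.

```python
from statistics import mean

def window_degrees(series, width, rho=0.5):
    """Grey correlation degree between the windows before and after each split."""
    n = len(series)
    if width < 1 or 2 * width > n:
        raise ValueError("width must satisfy 1 <= width <= len(series) // 2")
    # Candidate split indices: there are n - 2 * width + 1 of them.
    splits = range(width, n - width + 1)
    deltas = {
        s: [abs(a - b) for a, b in zip(series[s - width:s], series[s:s + width])]
        for s in splits
    }
    d_min = min(min(row) for row in deltas.values())
    d_max = max(max(row) for row in deltas.values())
    if d_max == 0:
        # Constant series: no window pair differs, so no change is indicated.
        return {s: 1.0 for s in splits}
    return {
        s: mean([(d_min + rho * d_max) / (d + rho * d_max) for d in row])
        for s, row in deltas.items()
    }

def most_likely_change(series, width, rho=0.5):
    """Return the split index whose two windows are least correlated."""
    degrees = window_degrees(series, width, rho)
    return min(degrees, key=degrees.get)

# Hypothetical toy series with a level shift at index 10.
toy = [1.0] * 10 + [5.0] * 10
change_index = most_likely_change(toy, width=5)
```

On the toy series the lowest degree occurs at index 10, where the level shift happens. Consistent with the remark on $W_s$ above, very small widths make the window comparison too noisy to be useful.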

Hospital Admission Data.
Daily hospital admission data were gathered from a local major hospital from January 1, 2013, to December 31, 2015. These records include case number, gender, age, time of diagnosis, and ICD (International Classification of Diseases) code. Respiratory disease records (ICD-10 J00-J99) were selected from all records.
Then, those records were further categorized into groups by gender (female and male) and age (0-14, 15-64, and 65+).
Table 1 shows the concentration of PM2.5 from 2013 to 2015 in Nanjing. The annual average and the maximum and minimum concentrations of PM2.5 in Nanjing showed a downward trend, meaning that PM2.5 pollution in Nanjing is falling, but the concentration of PM2.5 is still much higher than the WHO safety standard (10 μg/m3). Figure 1 shows that, in Nanjing from 2013 to 2015, the daily concentration of PM2.5 was much higher in winter, while spring and summer had lower concentrations; this is related to the increased use of fossil fuel for heating in winter and to the relatively stable atmospheric circulation in winter, which makes PM2.5 pollution difficult to diffuse.
Table 2 shows the percentage of respiratory diseases in different groups and the population proportion in Nanjing from 2013 to 2015. Among all hospital admission records, respiratory system diseases account for 42.34%, the largest share among all diseases. Of the patients, 46.26% were children and 14.72% were older people; the disease percentages for ages 0-14 and 65+ were much higher than the corresponding population proportions, which means children and older people were more likely to suffer from respiratory diseases. About 54.33% of patients were male, and the disease percentage for males was higher than the population proportion for males, which suggests that men are more susceptible to respiratory diseases.
Table 3 shows the change point in different groups. In all groups, the change point appeared on the second day, which means that most patients went to the hospital 2 days after the PM2.5 pollution events. The 65+ group is an exception in that it also has a change point on the third day, which means some of the older people went to the hospital on the third day after the PM2.5 pollution event.
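The record screening and grouping described in this section (selecting ICD-10 J00-J99 records, then grouping by gender and by the age bands 0-14, 15-64, and 65+) can be sketched as follows. This is an illustrative sketch only: the record layout, the field names `icd`, `gender`, and `age`, and the sample records are hypothetical and do not reflect the hospital's actual data format.

```python
from collections import Counter

def is_respiratory(icd10_code):
    """True for ICD-10 codes in the range J00-J99, e.g. 'J18.9'."""
    code = icd10_code.strip().upper()
    return len(code) >= 3 and code[0] == "J" and code[1:3].isdigit()

def age_group(age):
    """Map an age in years to the bands used in this study."""
    if age <= 14:
        return "0-14"
    if age <= 64:
        return "15-64"
    return "65+"

def group_counts(records):
    """Count respiratory admissions per gender and per age band."""
    counts = Counter()
    for rec in records:
        if is_respiratory(rec["icd"]):
            counts[("gender", rec["gender"])] += 1
            counts[("age", age_group(rec["age"]))] += 1
    return counts

# Hypothetical sample records; the I10 record is not a respiratory code.
sample = [
    {"icd": "J18.9", "gender": "male", "age": 6},
    {"icd": "J45", "gender": "female", "age": 70},
    {"icd": "I10", "gender": "male", "age": 50},
    {"icd": "J20.9", "gender": "male", "age": 30},
]
counts = group_counts(sample)
```

Dividing each group's count by the total number of respiratory records gives the kind of group percentage that is compared with population proportions in Table 2.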

Discussion
In this paper, we studied the impact of PM2.5 pollution on hospital admission for respiratory disease in Nanjing, China, from 2013 to 2015. We found that PM2.5 pollution was closely related to hospital admission for respiratory disease. The lag between PM2.5 pollution and hospital admission varied slightly among different age groups: most patients went to the hospital 2 days after PM2.5 pollution events, while some people over 65 years old waited one more day. These findings may help mobilize medical resources more efficiently and reasonably. Table 4 shows that, among the 7 days of the week, most people see a doctor on Sunday, followed by Monday; this trend appeared in all groups except older people and was particularly significant among children. This might be because children and working-age people need to go to class or work on working days, while older people have been laid off or have retired and can therefore schedule their time more flexibly than younger people.
There are some drawbacks regarding the hospital admission data in this article: the data were collected from only one local hospital, which cannot cover the entire population of Nanjing, and they are not up to date, covering only 2013 to 2015. The reason is that the Chinese government has strict restrictions on data access. We failed to get the hospital admission data of the entire Nanjing from Jiangsu Provincial [...]. Moreover, due to privacy reasons, the hospital admission data do not include the addresses of the patients, so we are unable to filter out the data of non-Nanjing patients.
Based on our findings, we make the following recommendations to the government: (1) take a more flexible approach to coordinating medical resources and allocating the work and rest time of medical staff, and increase the number of staff at the peak of medical treatment on Monday and Sunday; (2) establish an early warning system to prepare for a surge in the number of patients after severe haze pollution appears; (3) take appropriate measures to reduce air pollutant emissions, such as increasing the proportion of new energy vehicles, installing energy-saving and emission reduction equipment, and installing dust suppression equipment at construction sites; and (4) make non-sensitive data available to the public.

Conclusions
In this paper, we utilized GRA-CP to study the correlation between PM2.5 pollution and hospitalization for respiratory diseases based on the analysis of daily air pollution datasets and daily records of hospital admission. We found the following: (1) PM2.5 pollution was closely related to respiratory disease; (2) children and older people are more likely to suffer from respiratory diseases due to PM2.5 pollution, and women are less susceptible than men to respiratory diseases caused by PM2.5 pollution; and (3) most patients went to hospital 2 days after PM2.5 pollution events, while some of the older people waited one more day.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.