China Source Profile Shared Service (CSPSS): The Chinese PM2.5 Database for Source Profiles

China Source Profile Shared Service (CSPSS, www.speciate.org.cn), a new database of emission source profiles for particulate matter (PM), has been developed by researchers from the Chinese Research Academy of Environmental Sciences (CRAES). The first release, CSPSS 1.0, consists of comprehensive data from China that reveal the emission profiles of different sources in selected regions. Source categories include coal-fired boilers, industrial processes, fugitive dust, vehicle exhaust, biomass burning and cooking. Compositing methodology and data quality control were applied to create high-quality composite profiles for each source category. Three statistical measures (the correlation coefficient, the t-test and the distribution of weighted differences) can be used to compare similarities and differences among individual and composite profiles. In addition, SPECIATE and CSPSS data were compared. The chemical composition shows distinct characteristics in different source categories. For example, SO₄²⁻ and OC mark coal-fired boilers; Ca and Ca²⁺ are the most abundant species in cement production and construction dust emissions; Cl⁻, K⁺ and K mark biomass burning; several metals such as V, Zn, Sn and Pb could be used as tracers for paved road dust, while Sr, Ba and Pb mark industrial emissions. The highest abundances of organic matter are observed in cooking emissions. Toxic species such as Cr and As are enriched in PM2.5 from coal combustion. The distinct features of source profiles in SPECIATE and CSPSS indicate that knowledge of local source profiles is needed for further research. This database should better reflect the emission profiles observed in the Chinese environment. Sensitivity tests have been conducted to examine the impact of the sub-composite source profiles that are usually used to establish the composite ones. The results show that using sub-composite source profiles of coal combustion does not affect the apportionment results for biomass burning, whereas other sources are affected to varying degrees.


INTRODUCTION
High ambient concentrations of particulate matter (PM) have received great attention in China because of their potential impact on air quality, global climate and human health (Cao et al., 2012a; Huang et al., 2014; Tao et al., 2014; Wang et al., 2014; Liu et al., 2016a). To guide local controls of urban air quality, the Ministry of Environmental Protection (MEP) in China recently issued the Technical Policy on the Comprehensive Prevention and Control of Atmospheric Fine Particulate Matter Pollution (MEP, 2013). Chemical source profiles are the mass abundances of measured chemical species relative to the total PM in emission sources. Local or regional chemical profiles are usually used as input data for source apportionment with receptor models such as Chemical Mass Balance (CMB) and Positive Matrix Factorization (PMF) (Watson et al., 2002; Samara, 2005; Held et al., 2005; Khan et al., 2010; Kong et al., 2010; Liu et al., 2015; Pernigotti et al., 2016; Zhang et al., 2016). In addition, the fingerprints of source profiles can be used to interpret ambient measurement data, verify multivariate model factors, and create emission inventories (Simon et al., 2010).
Since the 1980s, the US, Europe and other developed countries have carried out source apportionment studies and compiled emission inventories (Bo et al., 2008). The SPECIATE database of the US EPA is currently the most comprehensive collection of source profiles available, containing over 3000 PM profiles from the literature (Simon et al., 2010). Source categories include fugitive dust, motor vehicle exhaust, biomass burning, industrial boilers, residential coal burning and so on. Detailed information, such as source categories, sampling and analytical methods and data quality assessment, is also recorded. To gain more knowledge about the European environment, the SPECIEUROPE database of PM emission source profiles became accessible in 2015 (Pernigotti et al., 2016); in that paper, the authors explored the relationships between profiles from different sources using cluster analysis. In China, studies on atmospheric pollution have developed rapidly in recent years. An increasing number of studies have been devoted to the emission features of both anthropogenic and natural pollution sources (Zhang et al., 2007, 2008; Zhang et al., 2012; Zhang et al., 2014a; Cheng et al., 2015; Huang et al., 2015; Zhao et al., 2015; Li et al., 2016; Liu et al., 2016b; Pei et al., 2016; Tian et al., 2016; Wu et al., 2016). For example, Shen et al. (2016) discussed the chemical characteristics of fugitive dust from northern Chinese cities on a regional scale. Wu et al. (2016) analyzed the emission characteristics of diesel exhaust in Beijing and compared profiles for vehicles meeting the China III and China IV emission standards. However, more local source profile measurements are still necessary for accurate source apportionment results. Source emission characteristics in China may differ from those in other countries because of differences in fuel, control technology or emission standards. For example, local and non-local source profiles of coal combustion may differ because the emissions depend largely on the constituents of the locally used coal, which vary greatly among regions of the world. Source profiles that conform to national conditions are needed.
China Source Profile Shared Service (CSPSS, www.speciate.org.cn) was developed by researchers at the Chinese Research Academy of Environmental Sciences (CRAES). Hundreds of original profiles (derived from measured results) and composite profiles (merging different source profiles of subcategories) have been collected in CSPSS. Source categories include coal-fired boilers, industrial processes, fugitive dust, vehicle exhaust, biomass burning and cooking. The objectives of CSPSS are to develop a shared database service of speciation profiles for different regions of China, provide possible fingerprints for regional sources, and supply scientific support for receptor models and air quality management.
The paper has two parts. The first part presents the structure of the database, the methodology for establishing source profiles and the characteristics of the source profiles in the database (see the section GENERAL DESCRIPTION OF DATABASE). The second part reports sensitivity tests on source apportionment results (see the section SENSITIVITY OF SOURCE APPORTIONMENT RESULTS).

GENERAL DESCRIPTION OF DATABASE
Structure of the Database
The construction of the database consists of two parts (as shown in Fig. 1): reference data input from previous studies and new source profile establishment. For the reference data, species with CAS ID, relative concentrations and their uncertainties are allocated on the basis of source categories. Basic information on sources and publications is also reported when available. For the newly measured profiles, sampling and analytical methods are included. For example, a dilution sampling system (dilution ratio of 11-15) developed for real-world source characterization was applied to stationary sources, while a resuspension method was used to test fugitive dust emissions (Ho et al., 2003; Cao et al., 2008). The filter samples were chemically analyzed to obtain original source profiles, with three parallel samples for each. The analytical methods used in this study are detailed in Ren et al. (2014). The quality rating scheme in this study follows the profile rating criteria described in the EPA SPECIATE database development documentation (Ying et al., 2016). These data were uploaded to CSPSS for further research.
The CSPSS website is a Java web application built on the Struts-Spring-Hibernate (SSH) framework, a combination of three Java-based frameworks for web development. The SSH framework improves the efficiency of software development and separates the project into loosely coupled layers. The website contains three main parts. The Search part describes source categories, profile information, analytical methods and data sources. In the Upload part, licensed users can upload source profiles, which must be approved by an administrator for quality control. In the Source Apportionment part, the shared platform is designed to be integrated with receptor models so that source apportionment analysis can be carried out conveniently online.

Methodology for Establishing Source Profiles
Data Quality Control Methodology
Over 200 profiles in CSPSS 1.0 were found to be derived from measurements of PM10 or TSP. These profiles were excluded. To obtain high-quality PM2.5 source profiles, profiles whose measured chemical abundances summed to more than 100% were also excluded. Source profiles with fewer than three samples and those tested before 2006 were excluded as well, according to SPECIATE's profile rating criteria. In addition, we calculated additional species that were not in the original source profiles to obtain the reconstructed mass (RM). Six major constituents were estimated: crustal minerals, trace components, organic matter, inorganic ions, other ions and elemental carbon. The six constituents are calculated as follows:
(1) Crustal minerals were expressed as 1.89Al + 2.14Si + 1.4Ca + 1.2K + 1.43Fe + 1.67Ti, assuming the common oxide forms Al₂O₃, SiO₂, CaO, K₂O, Fe₂O₃ and TiO₂ (Macias et al., 1981; Ni et al., 2013). The IMPROVE recommended soil formula expresses minerals as the sum of the oxides of Al, Si, Ca, Ti and Fe, and compensates for other unmeasured compounds by multiplying by a factor of 1.16. However, this factor is thought to be an overestimate. This can be examined by comparing the calculated crustal mass with the measured mass of samples after subtracting organic matter and ionic concentrations. Thus the first formula was used in our research.
(2) Trace components were determined by multiplying trace elemental abundances (all elements except Al, Si, Ca, K, Fe and Ti) by an oxygen-to-metal ratio. The ratio for each element was obtained from Reff et al. (2009), based on the most common oxidation states of the metals.
(3) Organic matter (OM) was calculated by multiplying the OC abundance by an OM/OC ratio. Chow et al. (2015) found that multipliers varied from 1.2 to 2.6 depending on the extent of OM oxidation and secondary organic aerosol formation. In this study, a ratio of 1.25 was used for vehicle exhaust and 1.7 for biomass burning, following Reff et al. (2009), who computed the median of OM/OC ratios obtained in previous studies. A ratio of 1.4 was applied to all other source categories, as it is the long-standing and most commonly used value in numerous studies.
(4) Inorganic ions are the unweighted sum of SO₄²⁻, NO₃⁻ and NH₄⁺ (Chow et al., 1994).
(5) Other ions include Na⁺, Mg²⁺, Ca²⁺, K⁺, F⁻ and Cl⁻.
(6) EC abundances are taken from the original source profiles without any multiplier.
The RM equation therefore takes the following form: RM = OM + Crustal minerals + Trace components + Inorganic ions + Other ions + EC. Profiles with RM within 80%-120% of the PM2.5 mass were retained. Deviations may be attributed to unknown species, measurement errors and improper multipliers.
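The six-term reconstruction above can be sketched in a few lines of Python. This is a minimal illustration rather than the CSPSS implementation: the dictionary keys, the function names and the example profile are assumptions made here, and the oxygen-to-metal ratios must be supplied by the caller (the text takes them from Reff et al., 2009, but does not list them).

```python
# Oxide multipliers for crustal elements, from formula (1) in the text.
CRUSTAL_FACTORS = {"Al": 1.89, "Si": 2.14, "Ca": 1.4, "K": 1.2, "Fe": 1.43, "Ti": 1.67}
# OM/OC ratios stated in the text; 1.4 applies to all other categories.
OM_OC_RATIOS = {"vehicle exhaust": 1.25, "biomass burning": 1.7}
DEFAULT_OM_OC = 1.4
# Species keys are naming assumptions made for this sketch.
INORGANIC_IONS = ("SO4", "NO3", "NH4")
OTHER_IONS = ("Na+", "Mg2+", "Ca2+", "K+", "F-", "Cl-")


def reconstructed_mass(profile, category, oxygen_to_metal):
    """Return RM in % of PM2.5 for one profile (species -> abundance in %).

    oxygen_to_metal maps trace elements to caller-supplied ratios;
    crustal elements are skipped there because term (1) already covers them.
    """
    crustal = sum(f * profile.get(el, 0.0) for el, f in CRUSTAL_FACTORS.items())
    trace = sum(
        ratio * profile.get(el, 0.0)
        for el, ratio in oxygen_to_metal.items()
        if el not in CRUSTAL_FACTORS
    )
    om = OM_OC_RATIOS.get(category, DEFAULT_OM_OC) * profile.get("OC", 0.0)
    inorganic = sum(profile.get(ion, 0.0) for ion in INORGANIC_IONS)
    other = sum(profile.get(ion, 0.0) for ion in OTHER_IONS)
    ec = profile.get("EC", 0.0)
    return om + crustal + trace + inorganic + other + ec


def rm_is_acceptable(rm, low=80.0, high=120.0):
    """Apply the 80%-120% retention window described in the text."""
    return low <= rm <= high


# Hypothetical biomass burning profile (values invented for illustration).
example = {"OC": 40.0, "EC": 5.0, "K+": 8.0, "Cl-": 6.0, "SO4": 3.0, "Al": 1.0}
rm = reconstructed_mass(example, "biomass burning", oxygen_to_metal={})
print(round(rm, 2), rm_is_acceptable(rm))
```

Species absent from a profile contribute zero here, which is a simplification; a stricter version might reject profiles that lack a major constituent.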
For the carbon fraction, it is worth noting that different analytical methods lead to different OC and EC results for the same sample, whereas total carbon (TC) is fairly consistent (Reff et al., 2009; Chow et al., 2015). In this study, the thermal-optical reflectance (TOR) method recommended by IMPROVE (Interagency Monitoring of Protected Visual Environments) was taken as the reference method. To ensure the consistency of test results, the raw OC and EC fractions were summed to calculate TC in each source profile. For each source category, the ratio of average OC to average TC was calculated over all profiles that used a TOR method. This ratio was multiplied by the TC values of the non-TOR profiles in the same source category to estimate their OC values. EC was then recomputed as TC minus the estimated OC for each non-TOR profile (Reff et al., 2009).
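The carbon correction described above amounts to a short calculation per source category. The sketch below is one possible reading of it; the record layout (dictionaries with `OC`, `EC` and `method` keys) and the function name are assumptions made for illustration.

```python
from statistics import mean


def harmonize_carbon(profiles):
    """Rescale OC and EC of non-TOR profiles within ONE source category.

    profiles: list of dicts with keys 'OC', 'EC' (abundances in %) and
    'method' (e.g. 'TOR' or any other label). Returns new dicts; TOR
    profiles are returned unchanged.
    """
    tor = [p for p in profiles if p["method"] == "TOR"]
    if not tor:
        raise ValueError("at least one TOR profile is needed as reference")
    # Ratio of averages, as in the text: average(OC) / average(TC).
    oc_tc = mean(p["OC"] for p in tor) / mean(p["OC"] + p["EC"] for p in tor)

    corrected = []
    for p in profiles:
        q = dict(p)
        if p["method"] != "TOR":
            tc = p["OC"] + p["EC"]   # TC is preserved by the correction
            q["OC"] = oc_tc * tc     # estimated OC
            q["EC"] = tc - q["OC"]   # EC recomputed as TC minus OC
        corrected.append(q)
    return corrected


# Hypothetical profiles from a single category (values invented).
category_profiles = [
    {"OC": 30.0, "EC": 10.0, "method": "TOR"},
    {"OC": 10.0, "EC": 10.0, "method": "TOR"},
    {"OC": 20.0, "EC": 40.0, "method": "TOT"},
]
fixed = harmonize_carbon(category_profiles)
```

Because OC and EC are only redistributed, the total carbon of each non-TOR profile is unchanged, which matches the observation that TC is fairly consistent across methods.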

Compositing Methodology
After the exclusion and carbon correction steps, 182 high-quality PM2.5 profiles were chosen to develop the final composite emission profiles, as shown in Table 1. Averages and standard deviations were calculated to create composite profiles of sub-source categories. Missing values were excluded, whereas zeros were included, in the statistical analysis. To assess how representative the composite source profiles are, similarities and differences among individual and composite profiles were compared within each sub-source category. The statistical measures used in this section are as follows: (1) the t-test determines whether the chemical abundances differ significantly; (2) Pearson's correlation coefficient (r) quantifies the strength of the statistical relationship between paired profiles, with r > 0.8 regarded as a good correlation according to previous studies (Cheng et al., 2015); and (3) the distribution of weighted differences, residual (R)/uncertainty (U), defined as $R/U = (C_{i1} - C_{i2}) / (\sigma_{i1}^2 + \sigma_{i2}^2)^{0.5}$, quantifies the differences for individual species between paired profiles, where $C_{ij}$ is the chemical abundance of species i from source j and $\sigma_{ij}$ is the uncertainty (the standard deviation in this study) of $C_{ij}$. The normal probability function is used to evaluate the R/U ratios (68%, 95.5% and 99.7% for ±1σ, ±2σ and ±3σ, respectively). When r > 0.8, p > 0.05 and 80% of the R/U ratios are within ±3σ, the two profiles are considered to be similar, as described by Chow et al. (2003). It is worth noting that different data sources and artifacts of the analysis process may also result in large R/U ratios. Table 2 gives an example for gasoline vehicle exhaust profiles. The correlation coefficients (r) exceed 0.8 and p > 0.05. More than 90% of the R/U ratios between paired abundances fall within ±3σ, except for GV2/GVC. The composite gasoline vehicle exhaust profile is therefore sufficient to represent most chemical abundances of the individual profiles. Similar results are obtained for other sub-source categories. However, large differences are found among the coal-fired boiler and fugitive dust profiles. Over 30% of the correlation coefficients (r) between paired profiles are lower than 0.7, and on average 27% of the R/U ratios fall outside ±3σ. This may be attributed to differences in coal and geological characteristics, which need further research. Only the composite profiles derived from the individual ones are discussed in this study.
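Two of the three similarity measures above, Pearson's r and the R/U distribution, can be computed with the standard library alone. The following sketch assumes that both profiles list the same species in the same order and that all uncertainties are positive; the function names and numbers are illustrative. The t-test is left out because its p-value needs a t-distribution routine from a statistics package, and the text does not say which test variant was used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long abundance lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either list is constant


def ru_ratios(c1, s1, c2, s2):
    """Weighted differences (C_i1 - C_i2) / sqrt(sigma_i1^2 + sigma_i2^2)."""
    return [(a - b) / sqrt(u * u + v * v) for a, u, b, v in zip(c1, s1, c2, s2)]


def fraction_within(ratios, k=3.0):
    """Share of species whose |R/U| is at most k."""
    return sum(abs(r) <= k for r in ratios) / len(ratios)


def r_and_ru_pass(c1, s1, c2, s2, r_min=0.8, frac_min=0.8):
    """Check the r and R/U criteria only; p > 0.05 must be tested separately."""
    r_ok = pearson_r(c1, c2) > r_min
    ru_ok = fraction_within(ru_ratios(c1, s1, c2, s2)) >= frac_min
    return r_ok and ru_ok


# Hypothetical abundances (%) and standard deviations for four species.
c_a, s_a = [10.0, 20.0, 30.0, 5.0], [1.0, 2.0, 3.0, 0.5]
c_b, s_b = [11.0, 19.0, 33.0, 5.5], [1.0, 2.0, 3.0, 0.5]
print(r_and_ru_pass(c_a, s_a, c_b, s_b))
```

Keeping the thresholds as parameters makes it easy to rerun the comparison with the stricter or looser cut-offs mentioned in the text (for example r < 0.7 as a sign of poor agreement).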

Characteristics of Source Profiles
Chemical Composition
Twenty-eight elements (Al, Si, K, Ca, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, As, Se, Sr, Mo, Cd, Sn, Sb, Ba, La, Ce, Be, W, Tl and Pb), seven ions (Na⁺, NH₄⁺, K⁺, Ca²⁺, Cl⁻, NO₃⁻ and SO₄²⁻) and carbon-containing species (organic carbon and elemental carbon) were determined to construct these profiles. Fig. 2 summarizes the distributions of chemical abundances in the final composites for eighteen source and sub-source categories in CSPSS.
Distinguishing features were observed among composite profiles for different source types. Coal-fired boiler emissions vary with the type of coal as well as with the methods of desulfurization and dust removal. SO₄²⁻ and organic carbon were the most abundant species in coal-fired boiler emissions in this study, accounting for 7.5%-19.3% and 5.3%-19.5%, respectively. NH₄⁺ accounts for 3.4% ± 2.5% of PM2.5 in converter emissions, which indicates that some of the NH₃ used as a reducing agent in the denitration device might have escaped and reacted with SO₂ and SO₃ (Pei et al., 2016). Similar results were found in previous studies. For example, SO₄²⁻ (20%-54%) was the most abundant species emitted from the industrial boilers tested by Wang et al. (2009). Zhang et al. (2008) observed that elemental and organic carbon emissions from industrial boilers were much lower than those from residential stoves. Zhang et al. (2012) observed a good correlation between NH₄⁺ and SO₄²⁻ from residential coal burning. Watson et al. (2001) investigated PM2.5 chemical source profiles for vehicle exhaust, biomass burning and coal burning in northwestern Colorado. In that paper, the carbon fraction ranged from 1% to 10% in coal-fired power plant emissions and from 50% to 90% in biomass burning emissions, and exceeded 95% in vehicle exhaust emissions. Heavy metals such as Cu, Pb, Zn and As were highly enriched in coal-fired boiler emissions, which is consistent with the SPECIEUROPE study in Europe (Pernigotti et al., 2016).
The highest abundances of organic matter were observed in cooking emissions. Fe marked steel and iron production emissions. Sofilić et al. (2004) [...] PM2.5 from brick production. Metallurgical processes release specific metals such as Cu, Fe and Zn (Minguillón et al., 2007). Cl⁻, K⁺ and K marked biomass burning. Prior literature used only K⁺ or K as an indicator of biomass burning (Watson et al., 2001; Li et al., 2007; Zhang et al., 2007). However, it is not necessarily a unique tracer, as other sources may contribute significantly to K⁺ and their contributions can change from day to day (Brown et al., 2016). Simoneit (2002) identified levoglucosan (LG) as a more specific tracer, which promotes source apportionment studies targeting biomass burning. Tao et al. (2015) lowered the uncertainties of source attribution to PM2.5 by combining the biomass burning tracers K⁺ and LG.

Fig. 3. Composite source profiles of industrial processes from CSPSS, including brick kiln, cement, iron and steel production and metallurgy. The height of each bar indicates the chemical abundance relative to PM2.5. The position of each triangle shows the uncertainty, which includes measurement errors and source variability.

One of the major contributors to urban PM is fugitive dust, which contributes 17%-32% of summer PM2.5 mass and 12%-34% of winter PM2.5 mass in 14 Chinese cities (Cao et al., 2012b). Geological dust is also a great concern in other parts of the world. For example, Watson et al. (2001) estimated regional PM2.5 emissions in western Colorado and found that in summer natural dust contributed 21% of PM2.5, while 11% was emitted from agricultural tilling. Ca and Ca²⁺ can serve as tracers of cement production and construction dust (Kong et al., 2014; Shen et al., 2016), with abundances 8-32 times higher than those in other profiles. The mean Ca/Al ratios ranged from 0.25-0.39 in vehicle exhaust emissions to 0.76-1.99 in urban fugitive dust, confirming previous observations that Ca/Al is a good marker for urban fugitive dust. Shen et al. (2016) used high Ca²⁺/Ca ratios (0.73-0.81) to indicate urban fugitive dust from northern Chinese cities. The ratios in our study are relatively low (0.15-0.47), possibly because the dust samples came mostly from southwest China (Liu et al., 2016b). Ratios of other crustal elements such as Si, Fe, Ti and K to Al have also been used as markers to characterize soil dust from the Loess Plateau, desert regions and Asian dust (Kim et al., 2003; Cao et al., 2008; Zhang et al., 2014a; Zhang et al., 2014b; Shen et al., 2016). This component can be resuspended from bare soil by local winds (Belis et al., 2013). In addition, long-range transport such as Asian dust events (ADEs) can affect northern and even southwestern China (Zhao et al., 2010; Li et al., 2015). Paved road dust can be characterized by several metals such as Sn, Sb and Ba, which may be attributed to motor vehicle contributions such as brake wear, oil drips and tire wear (Pant et al., 2015). In addition, Cr, V and Ni show a good correlation (r > 0.89) and are likely related to vehicle exhaust emissions (Cheng et al., 2015). Correlations among crustal elements such as Al, Si and Ti were good, with 0.73 < r < 0.89 across the four fugitive dust sub-source categories.

Vehicle exhaust emissions may vary with engine type and fuel combustion process. In this study, OC/EC ratios range from 1.4 to 9.9 and increase in the order diesel, gasoline and gas vehicle exhaust. The carbon fraction (OC and EC) together with metals such as Cu, Fe, Ba and Zn can be used in receptor models to distinguish emissions from gasoline and diesel vehicles (Belis et al., 2013).

Comparison with SPECIATE Source Profiles
The USEPA SPECIATE database of source profiles has been available since 1988 (Simon et al., 2010), and as of September 2016 it is at version 4.5. It is the most comprehensive repository of source category-specific emission speciation profiles. The SPECIATE datasets for PM2.5 source profiles were analyzed for an overall comparison with CSPSS. For the comparison, profiles in SPECIATE were combined by source category, using the median, to create a composite profile. The median was used rather than the mean to reduce large errors stemming from outlier samples and measurements (Reff et al., 2009). To prevent over-weighting, raw profiles that were repeated samples from a single study were first combined into sub-composites before their inclusion in the final composite. Sub-composites were created with the same median method. The data quality control was the same as that applied in CSPSS. Source categories not included in CSPSS were eliminated for consistency of comparison. A number of high-quality source profiles were chosen for the composites. The numbers of composite and sub-composite profiles are shown in Tables 3 and 4, respectively. Fig. 6 shows the species abundances of the composite source profiles of CSPSS and SPECIATE for the given source categories. Twenty-nine common species are displayed. The abundances of various species differed broadly between the CSPSS and SPECIATE source profiles. In most cases, abundances of crustal species such as Al, Si, Ca, Ti and Fe were much higher in SPECIATE than in CSPSS, except for cement production, biomass burning and vehicle exhaust. The differences may be due to different geological characteristics in China and America. The abundances of K, K⁺ and Cl⁻ from biomass burning were markedly higher in CSPSS than in SPECIATE, whereas the opposite was true of the carbon fractions. In vehicle exhaust, the carbon fraction in SPECIATE is much higher than that in CSPSS. This may be attributed to the quality of the PM source profiles of vehicle exhaust in SPECIATE (over 90% were established before 2008); fuel, lubricating oil and engine technology have been updated in recent years. OC/EC ratios range from 1.4 to 62.9 across the given source categories in CSPSS and from 0.6 to 17.3 in SPECIATE. It is worth noting that Pb still persists in paved road dust in CSPSS, although direct emissions of Pb from vehicles have been forbidden in China since 2003. This may be attributed to deposits from earlier vehicle exhaust and industrial emissions, as Shen et al. (2016) reported for 14 northern Chinese cities.
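The two-stage median compositing used for the SPECIATE comparison can be illustrated as follows. This is a sketch under assumptions: raw profiles are given as (study identifier, species dictionary) pairs, species missing from a profile are skipped rather than treated as zero, and the names and numbers are invented for the example.

```python
from collections import defaultdict
from statistics import median


def median_profile(profiles):
    """Species-wise median over a list of profiles (species -> abundance)."""
    species = set().union(*(p.keys() for p in profiles))
    # Profiles that do not report a species are left out of its median.
    return {s: median(p[s] for p in profiles if s in p) for s in species}


def composite_by_study(raw_profiles):
    """Build sub-composites per study, then the final composite.

    raw_profiles: list of (study_id, profile) pairs for ONE source category.
    Grouping by study keeps a heavily sampled study from dominating.
    """
    by_study = defaultdict(list)
    for study_id, profile in raw_profiles:
        by_study[study_id].append(profile)
    sub_composites = [median_profile(ps) for ps in by_study.values()]
    return median_profile(sub_composites)


# Hypothetical raw profiles: three repeats from study A, one from study B.
raw = [
    ("A", {"OC": 10.0, "EC": 2.0}),
    ("A", {"OC": 20.0, "EC": 4.0}),
    ("A", {"OC": 90.0, "EC": 6.0}),
    ("B", {"OC": 40.0, "EC": 8.0}),
]
composite = composite_by_study(raw)
```

In this example the outlying OC value of 90 in study A does not pull the sub-composite upward, and study B carries the same weight as study A despite having a single profile.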

SENSITIVITY OF SOURCE APPORTIONMENT RESULTS
Ambient data were collected as 24-h average samples every seventh day at the same site during January, April, July and October 2015 (Table S2). All measurements were blank corrected. Twenty samples were randomly chosen. These samples were used to run the CMB model four times, once with the coal combustion composite profile and the other times with the sub-composite ones. The terms "composite case" and "sub-composite case" are used below for convenience.
CMB 8.2, developed by the USEPA, was used to apportion PM2.5 in this study. Chemical abundances and uncertainties from both the source profiles and the ambient data were taken into account in the calculations. Several indicators were evaluated against performance standards, such as R² > 0.8, χ² < 4 and percent mass between 80% and 120%. Fig. 7 shows the variation in results between the composite and sub-composite cases. The source contribution estimate (SCE) from the composite case is shown on the x-axis, while the SCEs from the sub-composite cases are shown on the y-axis. The CMB results are considered insensitive to the profile change when the slope (k) and R² of the regression line are close to 1. Secondary aerosol contributions are not shown because the results were consistent in both cases. The biomass burning contribution is insensitive to replacing the composite profile with sub-composite ones. The slopes are 0.95-1, which shows that the SCE of biomass burning is consistent when sub-composite profiles of coal combustion are applied. The average uncertainty of the SCE for biomass burning is 5.1% in the composite case, which is larger than the errors caused by the sub-composite cases. R² > 0.96 suggests that the SCE was stable in the CMB model calculations. This may be attributed to stable tracers such as K⁺ for biomass burning. The correlation coefficient between SCE [...] Emission characteristics of coal combustion are complicated. The coal combustion contribution is sensitive to the variation introduced by sub-composite source profiles. More elaborate source apportionment is needed for air quality management.
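The slope and R² comparison behind Fig. 7 can be reproduced from paired SCE values with a simple least-squares fit. The sketch below assumes an ordinary regression with an intercept, which the text does not specify; the function names and sample values are illustrative, and the CMB 8.2 model itself is not reimplemented here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = k*x + b; returns (k, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    k = sxy / sxx
    return k, my - k * mx, (sxy * sxy) / (sxx * syy)


def meets_cmb_criteria(r2, chi2, percent_mass):
    """Performance targets quoted in the text for a single CMB run."""
    return r2 > 0.8 and chi2 < 4 and 80.0 <= percent_mass <= 120.0


# Hypothetical SCEs (ug/m3) for one source over five samples:
# x = composite case, y = one sub-composite case.
sce_composite = [2.0, 4.0, 6.0, 8.0, 10.0]
sce_subcomposite = [1.9, 4.1, 5.8, 7.9, 9.8]
k, b, r2 = fit_line(sce_composite, sce_subcomposite)
print(round(k, 3), round(r2, 3))
```

A slope and R² near 1 for a source indicate that its estimated contribution barely changes when the coal combustion profile is swapped, which is the reading applied to biomass burning above.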

CONCLUSIONS
The online open-access CSPSS database is important for better understanding and advancing atmospheric environment research in China. The database contains the latest information on the emission profiles of different emission sectors. It can be used as (1) a reference (chemical composition of PM sources) for source apportionment and emission inventories, (2) scientific evidence for new emission targets, and (3) support for a regional classification system for air quality management.
The compositing and quality control methodology applied in this study is expected to serve as a reference for other related studies. The R/U ratios can be used to quantify similarities and differences among source profiles and to distinguish collinear profiles such as soil dust and road dust.
The different chemical composition characteristics of source profiles in SPECIATE and CSPSS indicate that better knowledge of local source profiles is needed for further studies. More local and regional source profiles are also needed to develop a more comprehensive CSPSS database and to guide air pollution control efforts suited to China.
Sensitivity tests have been conducted to examine the impact of the sub-composite source profiles that are usually used to establish the composite ones. The results show that using sub-composite source profiles of coal combustion does not affect the apportionment results for biomass burning, whereas other sources are affected to varying degrees. The coal combustion contribution is sensitive to the variation introduced by sub-composite source profiles. More elaborate source apportionment is needed for air quality management.
As the first attempt to establish a comprehensive database for China, the current version of CSPSS has limited coverage of both source categories and geographical regions. In addition, comparison among different size fractions such as PM10 and TSP is needed in further study.