Ordinal analysis applied to the results of positive matrix factorization of chemical ionization mass spectrometry data

As an innovative analytical approach, ordinal analysis is applied to positive matrix factorization (PMF) outputs to identify the most important species and factors in chemical ionization mass spectrometry (CIMS) data. The procedure and outcome of the ordinal analysis facilitate further automated data analysis. Prior to standard PMF analysis, the CIMS data were normalized to ensure equal comparison of signals and to facilitate the analysis. The ordinal analysis was applied to the Factor Profile (FP) results, in which mass numbers (m/z) are ranked by their FP fractions. This ranking identifies the most influential compounds driving each factor, and the top-ranked m/z can be investigated further, e.g. by peak assignment. Rank maps can be plotted from the ordinal results, converting the FPs into a different space that can potentially be used for cluster analysis. The rank maps provide an additional method for factor identification, especially when factors are difficult to recognize from their time series or other representations of the dataset.
• Ordinal analysis identifies the most important fingerprint species driving each factor.
• Rank maps visualize the features of each factor.
• The method can be used as an online approach for source apportionment of atmospheric pollutants.


Specification table
Subject area: Atmospheric Chemistry
More specific subject area: Organic aerosol; biomass burning; emission; source apportionment
Method name: PMF ordinal analysis
Name and reference of original method: PMF (Paatero P, et al., 1994;5: 111-126.)
Resource availability: The ordinal analysis can be carried out on PMF solutions from any PMF program; EPA PMF 5.0, used in this study, is an open source program.

Methods
Positive Matrix Factorization (PMF) [6, 7] is an analytical technique that has received widespread attention in the atmospheric science community [8][9][10][11]. Here we report an innovative ordinal analysis method applied to standard PMF outputs. The input dataset for the PMF analysis was measured by a high-resolution time-of-flight chemical ionization mass spectrometer (HR-ToF-CIMS) [3] equipped with a filter inlet for gases and aerosols (FIGAERO) [4]. The FIGAERO inlet was switched between gas-phase and particle-phase measurements at 45 min intervals. The PMF analysis was conducted with the open source program EPA PMF 5.0 [5]. The data matrix includes the peak integrals over a ±0.5 span around each integer m/z between m/z 140 and m/z 339. The CIMS data were averaged to a time resolution of 1 min.
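The unit-mass binning and 1-min averaging described above can be sketched in Python with NumPy. This is a minimal illustration, not the processing software used in the study: the function names `unit_mass_integrals` and `average_to_1min` are hypothetical, the input is assumed to be a centroided spectrum given as parallel m/z and intensity arrays, and timestamps are assumed to be in seconds.

```python
import numpy as np

def unit_mass_integrals(mz, intensity, mz_min=140, mz_max=339):
    """Sum the signal within +/-0.5 of each integer m/z from mz_min to mz_max.

    Assumed input: one centroided spectrum as parallel m/z and intensity arrays.
    Peaks outside [mz_min - 0.5, mz_max + 0.5] are ignored.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Bin edges at integer m/z +/- 0.5: 139.5, 140.5, ..., 339.5 (200 bins)
    edges = np.arange(mz_min - 0.5, mz_max + 1.0, 1.0)
    sums, _ = np.histogram(mz, bins=edges, weights=intensity)
    return sums

def average_to_1min(t_seconds, rows):
    """Average spectra (one row per timestamp, in seconds) into 1-min bins."""
    rows = np.asarray(rows, dtype=float)
    minute = np.floor(np.asarray(t_seconds, dtype=float) / 60.0).astype(int)
    keys, inverse = np.unique(minute, return_inverse=True)
    sums = np.zeros((keys.size, rows.shape[1]))
    np.add.at(sums, inverse, rows)   # accumulate all spectra within each minute
    counts = np.bincount(inverse)    # number of spectra per minute
    return keys * 60, sums / counts[:, None]
```

Stacking the per-spectrum outputs of `unit_mass_integrals` row by row and passing them to `average_to_1min` would yield a time × m/z matrix of the shape PMF expects.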
Normalization was applied to the intensities of each nominal mass. This step is motivated by the following considerations: (1) normalization allows for an equal comparison of all investigated signals; (2) units are eliminated, which allows gas-phase (time series) and particle-phase (time series of thermograms) data to be analyzed by PMF simultaneously; (3) normalized data can be processed more efficiently than raw data, whose signal magnitudes vary greatly. The analytical uncertainties, which dominated the overall errors, are proportional to the signal intensities [1]. Here we used a typical error of ~5% of the signal intensities, based on laboratory calibration. An example input dataset is shown in Table 1.
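The exact normalization scheme is not specified above, so the following sketch assumes one common choice: each nominal-mass column is divided by its maximum. The uncertainty matrix is set to 5% of the normalized signal, and a small hypothetical `floor` is added so that zero signals do not receive zero uncertainty; both the max-scaling and the floor are assumptions, not part of the described method.

```python
import numpy as np

def normalize_columns(X):
    """Scale each m/z column (nominal mass) to a maximum of 1.

    Assumption: max-scaling per column; the study does not state its scheme.
    X has shape (n_times, n_mz).
    """
    X = np.asarray(X, dtype=float)
    col_max = X.max(axis=0)
    col_max[col_max == 0] = 1.0  # leave all-zero columns unchanged
    return X / col_max

def proportional_uncertainty(X_norm, rel_error=0.05, floor=1e-6):
    """Uncertainty of ~5% of each normalized value, with a hypothetical floor."""
    return np.maximum(rel_error * np.abs(X_norm), floor)
```

Because the uncertainty is proportional to the signal, scaling a column by a constant scales its uncertainties by the same constant, so the relative error of ~5% is preserved after normalization.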
After the PMF runs, the outputs include Factor Profiles (FPs), Factor Contributions, Residuals and Run Comparisons. The ordinal analysis uses only the FP results. An example of the FP results is shown in Table 2, where the fraction of each factor in each m/z is listed. Fig. 1 displays an example FP matrix as a color map. The CIMS data were used as integrals over a ±0.5 span around each integer m/z, i.e. at unit mass (low) resolution. One advantage of PMF is that the solution indicates how much each factor contributes to each nominal mass (Factor Fraction, Table 2 and Fig. 1), which relates to the multiple components one would deconvolve from the high-resolution mass spectra. Thus, this study shows that even without examining the high-resolution data, PMF can reasonably reflect the composition of the unit-mass-resolution peaks.
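If only the factor contribution matrix G and profile matrix F are at hand, factor fractions similar to those in Table 2 could be computed as below. This assumes the fraction is the share of each m/z's total reconstructed signal attributed to each factor; the exact definition used by a given PMF program may differ, and `factor_fractions` is a hypothetical helper.

```python
import numpy as np

def factor_fractions(G, F):
    """Fraction of each m/z's reconstructed signal attributed to each factor.

    G: (n_times, n_factors) factor contributions.
    F: (n_factors, n_mz) factor profiles.
    Returns (n_factors, n_mz); each column sums to 1 where the m/z has signal.
    """
    G = np.asarray(G, dtype=float)
    F = np.asarray(F, dtype=float)
    # Total signal of factor k at m/z j, summed over all time points
    per_factor = G.sum(axis=0)[:, None] * F
    total = per_factor.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        frac = np.where(total > 0, per_factor / total, 0.0)
    return frac
```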
The FP fractions directly reflect the significance of each m/z in each factor, i.e. how strongly each m/z influences the factor. Therefore, for every factor, the most influential m/z can be identified by ranking the m/z by their fractions. An example of ranked m/z is shown in Table 3. The highest-ranked m/z represent the most typical species in each factor, and in-depth investigation of the high-resolution mass spectra can further identify their molecular composition. Details can be found in a recent paper [2]; as an example, Table 4 shows representative molecules for each factor identified by the ordinal analysis. Note that 2 of the 7 factors (Factor Lignin 1 and Factor Lignin 2, as shown in Fig. 1) were combined into 1 factor (Factor Lignin). The ranking results can be visualized in rank maps. Fig. 2 shows example rank maps of 4 factors, where the ranks are plotted against m/z. Note that a high ranking (a small rank number) results in a low position in the figures.
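The ranking step can be sketched as follows, assuming the FP fractions are held in a factors × m/z array. The helper name `rank_mz` is hypothetical; rank 1 is assigned to the largest fraction, and ties are broken by m/z order, which is an arbitrary choice.

```python
import numpy as np

def rank_mz(frac, mz):
    """Rank m/z within each factor; rank 1 = largest FP fraction.

    frac: (n_factors, n_mz) FP fractions; mz: (n_mz,) integer masses.
    Returns ranks with shape (n_factors, n_mz) and, for each factor,
    the m/z sorted from highest to lowest rank.
    """
    frac = np.asarray(frac, dtype=float)
    mz = np.asarray(mz)
    # Descending order of fraction; stable sort keeps m/z order for ties
    order = np.argsort(-frac, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(frac.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, frac.shape[1] + 1)
    return ranks, mz[order]
```

Plotting `ranks[k]` against `mz` for factor k (for example as a scatter plot) gives a rank map in which rank 1 sits at the bottom, matching the convention of Fig. 2; the first few entries of the second output give the top m/z for peak assignment.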
An important advantage of the rank maps is that the factor profiles are converted into a different space (rank versus m/z), which can be used as a 2D representation for cluster analysis. To illustrate the advantage of the ordinal analysis, Fig. 4 shows the raw factor profiles of the 4-factor and 8-factor solutions without ordinal analysis, from which similar factors are difficult to recognize. Moreover, in some cases the rank maps can assist factor identification when factors are difficult to recognize from their time series (Factor Contributions, one of the PMF outputs). Fig. 3 shows an example in which some factors disappeared as the number of factors increased (the green factor disappeared when changing from the 5-factor to the 6-factor solution), but the missing factor reappeared when the number of factors was increased further (the green factor reappeared in the 8-factor solution). To summarize, the ordinal analysis and the rank maps provide additional ways to identify factors and fingerprint species, and they have good potential for automated data analysis approaches related to PMF. One important application of this method is online analysis and source apportionment of complex atmospheric pollutants, which supports quick responses during air quality monitoring.
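One possible way to exploit the rank space when comparing factors across solutions, such as the 4-factor and 8-factor solutions in Fig. 4, is a Spearman-type correlation of rank vectors. This is an illustrative technique, not the cluster analysis procedure of this study; the helper names are hypothetical, and the ranks are assumed to be tie-free vectors over the same set of m/z.

```python
import numpy as np

def rank_similarity(ranks_a, ranks_b):
    """Correlation between the rank vectors of factors from two PMF solutions.

    ranks_a: (ka, n_mz), ranks_b: (kb, n_mz), ranks over the same m/z.
    The Pearson correlation of ranks equals Spearman's rho when there are no ties.
    Returns a (ka, kb) similarity matrix with values in [-1, 1].
    """
    a = np.asarray(ranks_a, dtype=float)
    b = np.asarray(ranks_b, dtype=float)
    # Standardize each rank vector (population std), then average the products
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return a @ b.T / a.shape[1]

def best_matches(sim):
    """Index of the most similar factor in solution B for each factor in A."""
    return np.argmax(sim, axis=1)
```

A matrix such as `1 - sim` could then serve as a distance for hierarchical or other clustering, for instance to track a factor that disappears and reappears as the number of factors changes.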