PTRwid: A new widget tool for processing PTR-TOF-MS data

PTRwid is a fast and user-friendly tool developed to process data from proton-transfer-reaction time-of-flight mass spectrometers (PTR-TOF-MS) that use HTOF (high-resolution time-of-flight) mass spectrometers from Tofwerk AG (Switzerland). PTRwid is designed for a comprehensive evaluation of whole laboratory or field-based studies. All processing runs autonomously, and entire laboratory or field campaigns can, in principle, be processed with a few mouse clicks. Unique features of PTRwid include (i) an autonomous and accurate mass scale calibration, (ii) the computation of a “unified mass list” that, in addition to a uniform data structure, provides a robust method to determine the precision of attributed peak masses, and (iii) fast data analysis due to well-considered choices in data processing.


Introduction
The development of PTR-TOF-MS (Jordan et al., 2009; Graus et al., 2010) has advanced this technology mainly through two key features: (i) the high mass resolution allows distinguishing between ions with a fractional mass difference such as, for example, isoprene and furan, which are both detected at the same integer mass (69 Da), and (ii) the recording of full mass spectra is an inherent feature of time-of-flight (TOF) mass spectrometers. While the PTR-TOF-MS community is readily exploiting the former feature (e.g., Liu et al., 2013; Veres et al., 2013), there are only a handful of publications that evaluate full mass spectra (e.g., Brilli et al., 2014; Park et al., 2013a). One reason for this is that a full evaluation of TOF mass spectra is still challenging, and currently only one community-based evaluation tool is available (Müller et al., 2013; https://sites.google.com/site/ptrtof/).
The data processing package introduced here pursues different (and unconventional) approaches in several aspects. For example, the mass scale calibration does not rely on peaks of known ion masses and subsequent statistical analysis of ion counts as in Müller et al. (2013), but on maximizing the matches with a library of ion masses. The underlying idea is that C_xH_yO_0-5 compounds (i.e., compounds of carbon and hydrogen with up to five oxygen atoms) are always abundantly detected, either as constituents of the sample or due to contamination in an experimental setup. Another example is the method to determine the peak shape, which is not retrieved from selected peaks that can be entirely attributed to ions of the same composition (Müller et al., 2013), but from all detected ion peaks, disregarding the fact that many peaks are distorted due to signal from ions with different composition. The underlying idea is that superimposing all detected peaks will envelop the true peak shape. The third and last example worth mentioning here is the determination of the precision of detected peak masses, which is not based on statistics and propagation of errors from calibration peaks (Müller et al., 2013), but on a top-down evaluation of typically hundreds of peak lists obtained from entire field or lab campaigns. Besides all statistical uncertainties, these precisions also include uncertainties associated with the contribution of ions of different compositions to a detected peak (unresolved ions). These three examples constitute essential parts of the backbone of PTRwid that allows autonomous operation with a minimum of user input (e.g., no calibration peaks need to be defined, and no peaks to retrieve the peak shape need to be identified).
In order to reduce the computational cost, a large fraction of the required analysis, most importantly peak detection and mass scale calibration, is done on the SumSpectrum, which is the total of all mass spectra saved in an HDF5 data file. This approach works best when new data files are created every 2 h or more frequently, including high-frequency data appropriate for eddy covariance analysis (Park et al., 2013b). Using the SumSpectrum for basic analysis greatly increases computing speed, but drifts in the derived parameters that might have occurred during the period covered by the data in the file cannot be captured. However, under typical operating conditions such drifts should be minor on time scales of 1-2 h. The core of PTRwid performs basic data processing such as peak detection, mass scale calibration, baseline correction, determination of peak shape and mass resolution, signal corrections for ion peaks that are not fully separated due to the limited mass resolving power of the mass spectrometer, and the computation of raw signals (counts per second) and volume mixing ratios (nmol mol⁻¹) for all detected peaks. In the current version two "extended processing" tools for general use are available: (i) a tool for averaging, categorizing, and merging data, and (ii) a tool for attributing possible chemical formulas to detected ion masses. In future releases, dedicated analysis tools can be integrated as additional extended processing tools.
Published by Copernicus Publications on behalf of the European Geosciences Union. R. Holzinger: PTRwid: A new widget-tool for processing PTR-TOF-MS data
PTRwid has been designed for an efficient and consistent analysis of full TOF mass spectra obtained over time periods that range from single lab measurements (> hour) to field campaigns (weeks) and long-term monitoring (year). The processing speed of a full mass spectra analysis is comparable to the processing time required to extract and process selected ions. PTRwid is programmed in IDL (Interactive Data Language) and also runs on the free IDL Virtual Machine. The code is based on routines that have been used in our earlier publications (e.g., Holzinger et al., 2010a, 2013; Park et al., 2013b), but has now been cast into a user-friendly graphical interface and is ready to be shared with a broader community. The full source code is available at http://www.imau.nl/PTRwid/. The following sections describe the algorithms and evaluate their performance. A second aim, however, is to familiarize new users with the PTRwid framework so that the brief information in the Appendices is sufficient to successfully start using PTRwid.

The key features of PTRwid
During the development of PTRwid, maximizing autonomous processing and minimizing required user input was emphasized. The organization of the data processing framework is modular: different tasks are typically performed by dedicated procedures. The modular design facilitates adjustments and extensions, e.g., the development and assimilation of new add-on tools that perform dedicated tasks.

Basic data processing
The basic data processing is controlled by the PTRwid main widget window (Fig. B1). Brief start instructions, organization of data, and a brief description of interactive elements of PTRwid are provided in Appendices A and B, respectively.

Peak detection
The DetectPeaks routine applies Savitzky-Golay smoothing filters (Vetterling et al., 1992) to the SumSpectrum and scans through the smoothed SumSpectrum and its derivative (referred to as Spec and dSpec hereafter, respectively). Depending on the settings of the TOF mass spectrometer, the SumSpectrum contains several 10⁵ data points, which correspond to the signal detected at the TOF bins (typically 10⁻¹⁰ s per TOF bin). The following strategy is applied to identify peaks.
1. A potential start of a peak is detected at a TOF bin where dSpec exceeds 6 times the median of the absolute value of dSpec in the neighboring range (±1 Da). This TOF bin marks the point where the signal emerges from the noise and is labeled as "PeakStart".
2. The corresponding TOF bins of the peak maximum (dSpec = 0), and the position at which the slope becomes positive again after the peak maximum (dSpec > 0), are identified and labeled "PeakMax" and "PeakEnd", respectively.
After scanning through dSpec, the TOF-bin scale is converted to an ion-mass scale (see Sect. 2.1.2). Using the ion masses of "PeakStart", "PeakMax", and "PeakEnd", the "peak broadness" is calculated, i.e., m[PeakMax] / (m[PeakEnd] − m[PeakStart]). Peaks whose "peak broadness" is outside the range 20-10 000 are dismissed. As a second criterion the signal at "PeakMax" must exceed the background by 8 times the variability of the background. Here, the background in the vicinity of the peak is calculated as the median signal around "PeakMax" in the range ±0.5 Da. The variability of the background is calculated as the standard deviation of all values in the vicinity of "PeakMax" that are below the background. Figure 1 shows a small section of Spec and dSpec (TOF bins 100294-100644) that corresponds to the mass range 142.6-143.6 Da. According to the procedure described above, five peaks were detected in this section. The range "PeakStart" to "PeakEnd" is indicated by horizontal blue lines below and above the peak in the upper chart, and the ticks on these lines mark the position of "PeakMax".
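The scan for "PeakStart", "PeakMax", and "PeakEnd" can be sketched as follows. This is a minimal illustration and not the DetectPeaks code: the function name `find_peaks` is hypothetical, a finite difference stands in for the Savitzky-Golay derivative, and the noise window is given in bins rather than as ±1 Da.

```python
from statistics import median

def find_peaks(spec, window=50, factor=6.0):
    """Return (PeakStart, PeakMax, PeakEnd) index triples for a smoothed spectrum."""
    # Finite-difference stand-in for the Savitzky-Golay derivative (dSpec).
    dspec = [spec[i + 1] - spec[i] for i in range(len(spec) - 1)]
    peaks = []
    i = 0
    while i < len(dspec):
        lo, hi = max(0, i - window), min(len(dspec), i + window + 1)
        # Local noise level: median of |dSpec| in the neighboring range.
        noise = median(abs(d) for d in dspec[lo:hi])
        if dspec[i] > factor * noise:
            start = i
            # Walk uphill until the slope is no longer positive (PeakMax).
            while i < len(dspec) and dspec[i] > 0:
                i += 1
            peak_max = i
            # Walk downhill until the slope becomes positive again (PeakEnd).
            while i < len(dspec) and dspec[i] <= 0:
                i += 1
            peaks.append((start, peak_max, i))
        else:
            i += 1
    return peaks
```

The broadness and background criteria described above would then be applied to the returned triples after the conversion to the mass scale.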

Mass scale calibration
The conversion of the TOF-bin scale to an ion-mass scale is done by three different methods, the quality of which is assessed in Fig. 2. All methods convert the scales according to Eq. (1),
$$ m = \left( \frac{t_{\mathrm{TOF}} - t_0}{a} \right)^{1/\mathit{ex}} \qquad (1) $$
where m and t_TOF represent the ion mass and TOF bin, respectively, and a, t₀, and ex are parameters that can be optimized. Since the ions enter the TOF region with the same kinetic energy, t_TOF should be proportional to the square root of m. Thus ex in Eq. (1) should be equal to 0.5. However, in practice better results are obtained if small deviations from 0.5 are allowed for this parameter.
Figure 1. […] 002, 143.036, 143.071, 143.105, and 143.142, respectively. The range "PeakStart" to "PeakEnd" is indicated by two horizontal blue lines above the x axis and in the upper chart area, respectively. The blue ticks on these lines mark the position of "PeakMax". The vertical red lines in the upper chart indicate the integration boundaries (Sect. 2.1.6) that correspond to a range of ±2σ of the quasi-Gaussian peak shape (Sect. 2.1.4). The black and red ticks above the selected peak in the upper chart correspond to ion masses of the closest matches in the compound library (see masslib function in Appendix C). The largest peak at 143.071 has been matched to C₇H₁₀O₃H⁺; other compounds in the vicinity are plotted in the chart area together with their deviation from the detected peak.
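As a small sketch of the scale conversion, assume Eq. (1) relates TOF bin and ion mass as t_TOF = a · m^ex + t₀, which is consistent with t_TOF being proportional to the square root of m for ex = 0.5. The function names and the numerical parameter values in the usage note are illustrative assumptions, not PTRwid identifiers or instrument values.

```python
def tof_to_mass(t_tof, a, t0, ex=0.5):
    """Convert a TOF bin to an ion mass by inverting t_tof = a * m**ex + t0."""
    return ((t_tof - t0) / a) ** (1.0 / ex)

def mass_to_tof(m, a, t0, ex=0.5):
    """Forward relation: TOF bin at which an ion of mass m is expected."""
    return a * m ** ex + t0
```

For example, with the made-up parameters a = 8000 and t₀ = −100, `tof_to_mass(mass_to_tof(59.049, 8000.0, -100.0), 8000.0, -100.0)` returns 59.049 up to rounding error.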
The first method (CalCrude, Appendix C) builds on the assumption that primary ions are among the 16 largest signals in the mass spectrum. An example spectrum is plotted in the top chart of Fig. 2. The 16 largest peaks are marked by red ticks. For this method the parameter ex is kept constant at 0.5. Parameters a and t₀ are calculated by combining any two of the largest 16 peaks and assuming that these peaks correspond to a pair of primary ions. For the standard operation mode based on proton transfer from H₃O⁺ the pair of primary ions is H […] In order to find the right set of a and t₀ (and to further refine it), all these sets of parameters are used to convert the TOF-bin scale to a mass scale. Subsequently it is checked whether the scale conversion produces peaks that can be attributed to a second pair of ions: isotope peaks of the primary ion signal or other omnipresent ions, which do not suffer from detector saturation and should thus yield a much improved conversion. For the standard proton transfer mode H₃¹⁸O⁺ (21.022 Da) and C₃H₆OH⁺ (59.049 Da) are used as the second pair, and for operation in the charge transfer mode ¹⁵NO⁺ and N¹⁸OO⁺, H₃¹⁸O⁺ and ¹⁵NO⁺, ¹⁸OO⁺ and N¹⁸OO⁺, and H₃¹⁸O⁺ and ¹⁸OO⁺ are used. If positively identified, the parameters a and t₀ are recalculated based on the second pair, and all detected peaks are matched with a compound library (∼ 2400 compounds, see masslib function in Appendix C). The correct set of parameters yields the most matches with the library. Note that this procedure does not require any user input and nevertheless provides a robust method that allows (i) a reasonable time-to-mass conversion, and (ii) auto-detection of the operation mode (proton transfer or charge transfer). The second chart in Fig. 2 shows the result of comparing the obtained peak masses with library compounds. The x axis represents the mass attributed to all detected peaks, and the y axis shows the deviation of the attributed mass from the closest match in the library. The points plotted in red are considered to be matched, as their smallest deviation from a library compound is less than 20 ppm. As can be seen clearly in Fig. 2, for masses above 80 Da the deviation starts drifting towards negative values, thereby indicating the limitations of this method, which relies on a constant ex parameter and two ions with relatively low masses.
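The two building blocks of this method, solving for a and t₀ from a pair of ions and scoring a parameter set by its number of library matches, might look like the sketch below. It assumes the relation t_TOF = a · m^ex + t₀; the function names are hypothetical and the three-entry library in the usage note only stands in for the roughly 2400 compounds of masslib.

```python
def two_point_calibration(t1, m1, t2, m2, ex=0.5):
    """Solve t = a * m**ex + t0 for a and t0 from two (TOF bin, mass) pairs."""
    a = (t2 - t1) / (m2 ** ex - m1 ** ex)
    t0 = t1 - a * m1 ** ex
    return a, t0

def count_matches(peak_bins, library, a, t0, ex=0.5, tol_ppm=20.0):
    """Count peaks whose converted mass is within tol_ppm of a library mass."""
    n = 0
    for t in peak_bins:
        m = ((t - t0) / a) ** (1.0 / ex)
        # Deviation from the closest library entry, in ppm of that entry.
        dev_ppm = min(abs(m - lib) / lib * 1e6 for lib in library)
        if dev_ppm < tol_ppm:
            n += 1
    return n
```

A candidate pair would be passed to `two_point_calibration`, and the resulting parameters to `count_matches`; the parameter set with the highest count is kept.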
The second method (CalFine, Appendix C) uses the parameters a, t₀, and ex from the first method and performs a variation of constants. The parameters are optimized by maximizing the number of matches with library compounds. The improved parameters resulting from this procedure (third chart of Fig. 2) no longer show the obvious drift seen in chart two of Fig. 2. The number of matches with library compounds increased from 88 to 252.
The third method (Cal3pt, Appendix C) allows the user to specify ion masses (see Appendix D for a list with adjustable parameters) in the low, middle, and high region of the mass scale, respectively. In a first step the parameters a, t₀, and ex are calculated by using the specified ion masses and the corresponding "PeakMax" values. In a second step the specified ion masses are allowed to vary within given boundaries and the number of matches with library compounds is maximized by performing a variation of constants. The improved parameters resulting from this procedure further increased the number of matches with library compounds to 255 (bottom chart of Fig. 2).
These three methods yield three sets of parameters a, t₀, and ex. The set that yields the most matches with library compounds is chosen for all subsequent analysis.

Baseline signal
The baseline signal consists of electronic noise that causes signal in the absence of ions. Basically the baseline can be considered the signal between regions where ions are expected to be detected, i.e., the regions between integer masses. This signal has to be subtracted in order to determine the correct peak shape (Sect. 2.1.4) and to calculate the signal that is attributed to an ion (Sect. 2.1.6). Figure 3 shows the baseline of a SumSpectrum. The baseline varies over several orders of magnitude and remains enhanced for several 10⁻⁷ s after ion peaks of high intensity.
The baseline is computed according to the following procedure. First, the SumSpectrum is divided into segments of 90 ns. In each segment the following two steps are repeated seven times: (1) the position with the highest signal is identified and (2) 9 ns of data around this position are symmetrically removed (unless asymmetry is forced by the edges of the 90 ns segment). These steps remove 70 % of the original data in each segment. The baseline values of the segments are calculated as the mean of the remaining 30 % of the data. In Fig. 3 the baseline values of individual segments can be easily seen at TOF-bin numbers in the range 3.5-5.5 × 10⁴ (blue data row). If the baseline value for an individual segment is calculated to be zero, it is replaced by the minimum of all non-zero segment values. The baseline is calculated by smoothing over the individual segment values (red data in Fig. 3).
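The per-segment procedure can be sketched as below. The function names are hypothetical, and one detail is an assumption: the remaining data are closed up after each cut, so that seven cuts always remove 7 × 9 ns = 63 ns (70 %) of a 90 ns segment. Lengths are given in data points, so at 0.1 ns per TOF bin a segment would have 900 points and a cut 90 points.

```python
def segment_baseline(segment, cut_len, repeats=7):
    """Mean of the data left after repeatedly cutting out the largest signal."""
    data = list(segment)
    for _ in range(repeats):
        # Position of the highest remaining signal.
        top = max(range(len(data)), key=data.__getitem__)
        # Centre the cut on that position; shift it inwards at the segment edges.
        lo = min(max(0, top - cut_len // 2), len(data) - cut_len)
        del data[lo:lo + cut_len]
    return sum(data) / len(data)

def fill_zero_segments(values):
    """Replace zero segment values by the smallest non-zero segment value."""
    floor = min(v for v in values if v > 0)
    return [v if v > 0 else floor for v in values]
```

Smoothing over the segment values, the last step of the procedure, is not shown here.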
Calculating the baseline signal on the basis of the SumSpectrum can cause high and low biases when high count rates in the SumSpectrum are not caused by a constantly high signal at a certain mass but by fluctuating signals ranging from low to very high. During periods with low count rates the calculated baseline is too high, and during episodes with high count rates it is too low. These artifacts may even result in negative values for the computed volume mixing ratio; however, under most circumstances these artifacts are very minor and hardly noticeable. On the other hand, baseline calculations that rely on single measurements (or shorter periods) can become inaccurate due to poor counting statistics.

Peak shape and resolution
Peak shape analysis is needed for three fundamental reasons: (i) to determine the mass resolution, (ii) to set correct boundaries for peak integration (Sect. 2.1.6), and (iii) for corrections of overlapping peaks (Sect. 2.1.6).
In order to determine the peak shape, we consider all peaks with a maximum signal (in counts per TOF bin) in the following range: (a) a predefined minimum (the default value is 800 counts, see Appendix D), and (b) a maximum which is the larger of either 10 times the minimum signal or 1 % of the maximum signal of the entire SumSpectrum. The lower boundary is set to avoid distortion of the peak shape due to poor counting statistics, and the upper boundary is set to avoid distortion by the largest signals that cause saturation in the counting electronics of the mass spectrometer.
The peak shape analysis starts by estimating the mass resolution (defined as full width at half maximum, FWHM) as the mean of all calculated "peak broadness" values (Sect. 2.1.1). Next, the baseline signal is subtracted and all individual peaks are rescaled (relative to FWHM) and normalized. The left-hand chart of Fig. 4 shows the relative (normalized) intensity of all considered individual peaks (137) in the range −4 to 4 times FWHM. Not surprisingly, for a number of peaks the signal does not decrease before and/or after the peak maximum because overlapping peaks interfere. However, all individual peaks together envelop the true peak shape. Technically, the true peak shape is retrieved by calculating the 10 % quantile of all considered peaks and using a Savitzky-Golay filter to obtain a smoothed true peak shape. The right-hand chart of Fig. 4 shows that there is little difference between the individual peak at 59.049 Da and the true peak shape (smoothed in blue, unsmoothed in red). In this example, the mass resolution has been calculated to be 3371 (FWHM). Assuming a quasi-Gaussian peak shape, the ratio of the FWHM and the standard deviation (σ) of the distribution is constant (FWHM/σ = 2√(2 ln 2) ≈ 2.35). Using this relationship, we calculated the fraction of the total signal that is within and outside the boundaries of ±2 and ±4σ (right-hand chart of Fig. 4). Typically more than 90 % of the total signal is expected within the ±2σ range, and these are the standard boundaries for peak integration (Sect. 2.1.6).
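For an ideal Gaussian the quantities used here follow directly from the error function; the short sketch below shows the arithmetic. It assumes that the mass resolution is defined as m/FWHM, and the function names are illustrative. The real peak shape is only quasi-Gaussian, so the ideal fractions are an upper guide rather than the values PTRwid reports.

```python
import math

# Ratio of full width at half maximum to standard deviation for a Gaussian.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355

def sigma_from_resolution(mass, resolution):
    """Peak standard deviation in Da, assuming resolution = mass / FWHM."""
    return mass / resolution / FWHM_PER_SIGMA

def fraction_within(k):
    """Fraction of a Gaussian's area within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))
```

With the resolution of 3371 quoted above, `sigma_from_resolution(59.049, 3371)` gives about 0.0074 Da, and `fraction_within(2)` gives about 0.954 for the ideal case.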

Unified mass list
For extended lab and field studies, the problem arises that it is unlikely to retrieve exactly the same set of peaks for different measurement intervals. The main reason is trivial: different samples yield different results. Other reasons include the possibility that the operating conditions of the PTR-TOF-MS instrument changed over time, and that, within the limits of precision, the same ion may be attributed to a slightly different mass. The "unified mass list" routine produces a uniform peak list that is obtained from file peak lists, which in turn are calculated from the SumSpectra of individual HDF5 data files (Sect. 2.1.1). The "unified mass list" routine sequentially calls all routines described above. However, if the maximum signal of a SumSpectrum does not exceed a certain threshold value (the default value is 10⁶ counts, Appendix D), the SumSpectra of the subsequent files are added. The summing of SumSpectra is interrupted when the threshold value is exceeded, or when the gap between the end of the last measurement in the current file and the start of the first measurement in the following file exceeds 10 min (default value, Appendix D). A preparatory step is the computation of a bin-mass scale with bin widths of 1 mDa for ion masses below 125 Da and 8 ppm above 125 Da. Thus, above 125 Da the bin width increases steadily and corresponds to 1.6, 3.2, and 4.8 mDa for ion masses of 200, 400, and 600 Da, respectively. All individual file peak lists (one for each HDF5 raw data file) are subsequently loaded and the peak count value of a mass bin is increased by one for every peak that is detected within the mass range covered by the bin.
Figure 5 shows a small section of the detections-per-mass-bin data (DpB) corresponding to the range 142.92-143.22 Da. The data are from a ∼ 2-week field campaign with our thermal-desorption proton-transfer-reaction mass spectrometer (TD-PTR-MS, Holzinger et al., 2010a) at a rural site in the Netherlands. More than 300 peak lists from individual files have been included, each covering a period of 75 min.
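The bin-mass scale and the counting of detections per bin can be illustrated as follows. This is a sketch with hypothetical function names; note that 8 ppm of 125 Da equals 1 mDa, so the two regimes join without a step. In this simplified version the last bin is open-ended.

```python
from bisect import bisect_right

def bin_mass_scale(m_min, m_max, switch=125.0, width_da=0.001, width_ppm=8.0):
    """Lower edges of the mass bins between m_min and m_max (in Da)."""
    edges = []
    m = m_min
    while m < m_max:
        edges.append(m)
        # Constant 1 mDa bins below the switch mass, constant 8 ppm bins above.
        m += width_da if m < switch else m * width_ppm * 1e-6
    return edges

def detections_per_bin(edges, peak_lists):
    """Count, for every bin, how many peaks of all file peak lists fall into it."""
    counts = [0] * len(edges)
    for peaks in peak_lists:
        for m in peaks:
            if m >= edges[0]:
                counts[bisect_right(edges, m) - 1] += 1
    return counts
```

Each file peak list contributes at most a handful of counts per bin region, so a mass that is detected in most files shows up as a narrow pile of counts in the DpB data.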
Next, the DpB data, the smoothed DpB data (by a running mean of five points, DpB_sm), and its derivative (dDpB_sm) are evaluated in order to create the unified mass list. First, all masses where dDpB_sm crosses from positive to negative are added to the "unified mass list". Second, all peaks are removed that do not fulfil the following two criteria: (i) DpB_sm must be larger than both 0.55 and 5 % of the maximum value of DpB_sm; (ii) the minimum distance between two peaks must be at least six bins on the bin-mass scale. This corresponds to 6 mDa or 48 ppm for ion masses below or above 125 Da, respectively. The precision of the mass attributed to the peak is calculated as σ (standard deviation) of a Gaussian fit including 11 data points (i.e., a range of at least 11 mDa or 88 ppm) around a detected peak. An example fit is shown in Fig. 5, where the precision of the peak at 143.070 Da has been calculated to be ±11 ppm. The unified mass list and associated parameters (such as precision and integration boundaries) are saved in the data directory together with a list of possible molecular formulas within the ±2σ boundaries for every peak.
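A compact sketch of the peak selection on the DpB data is given below. The names are hypothetical, and one rule is an assumption that the text does not specify: when two maxima are closer than six bins, the higher one is kept. The Gaussian fit for the precision is not included.

```python
def running_mean(values, width=5):
    """Running mean with a shortened window at both ends of the data."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def pick_unified_peaks(dpb, min_abs=0.55, min_rel=0.05, min_gap=6):
    """Bin indices of accepted maxima of the smoothed detections-per-bin data."""
    sm = running_mean(dpb)
    threshold = max(min_abs, min_rel * max(sm))
    peaks = []
    for i in range(1, len(sm) - 1):
        # The derivative crosses from positive to negative at a local maximum.
        is_max = sm[i] - sm[i - 1] > 0 and sm[i + 1] - sm[i] <= 0
        if is_max and sm[i] > threshold:
            if peaks and i - peaks[-1] < min_gap:
                # Assumed tie-break: keep the higher of two close maxima.
                if sm[i] > sm[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks
```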

Peak integration, applied corrections, calculation of mixing ratios
The export procedure is activated by clicking on the "Export" button, and implements (together with the associated subroutines CorrPoissDead, Integrate, Overlap, and Calcppb) peak integration, several corrections, and the computation of mixing ratios. Either the "file mass list" or the "unified mass list", as well as different formats to store data, can be chosen for the data export (Appendices C and D). First, the CorrPoissDead routine corrects for physical limitations of the ion detection system. These limitations arise from recovery times (dead times up to 2 × 10⁻⁸ s) that follow an ion detection event, and from the inability to discriminate between single and multiple ion detections in a data acquisition interval of typically 1-2 × 10⁻¹⁰ s. These effects have been thoroughly studied and appropriate corrections have been developed and tested (Titzmann et al., 2010; Cappellin et al., 2011). PTRwid corrects for these issues using the formula for a combined Poisson and dead time correction developed by Titzmann et al. (2010). Since the effect on lower signals is very minor, the corrections are applied only for signals that exceed one detection per 1000 data acquisition periods (of typically 35-60 µs). The instrument-specific extending and non-extending dead times can be adjusted (Appendix C); 1.3 and 15 ns are used as default values, respectively.
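To illustrate why such a correction is needed, the sketch below applies the textbook Poisson (counting-loss) correction alone: if a time bin can register at most one event per acquisition period, the expected true number of ions follows from the fraction of periods with a registered event. This is a deliberately simplified stand-in and not the combined Poisson and dead-time formula of Titzmann et al. (2010) that PTRwid uses; the function name is hypothetical.

```python
import math

def poisson_correct(counted, n_periods, threshold=1e-3):
    """Poisson-only correction of counts registered in n_periods acquisitions."""
    rate = counted / n_periods  # fraction of periods with a registered event
    if rate <= threshold:
        # Below one detection per 1000 periods the correction is negligible.
        return float(counted)
    # For Poisson arrivals P(at least one ion) = 1 - exp(-mean), solved for mean.
    return -n_periods * math.log(1.0 - rate)
```

For instance, 8333 registered events in 83 333 acquisition periods correspond to roughly 8780 ions under this simplified model, a correction of about 5 %.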
Second, the Integrate routine calculates the raw signal for all ions included in the chosen mass list (unified or file mass list). The signal is calculated as the total signal within 2 standard deviations around the peak maximum. In case of overlapping integration boundaries, the boundaries are set at equal distance between the peak maxima (see upper chart of Fig. 1). Note that a fraction of the signal within the integration boundaries of a particular peak may originate from neighbor peaks, if these neighbor peaks are close enough. Furthermore, the fraction outside the ±2σ range is not accounted for in the raw signal data. Note that constant integration boundaries are used for all spectra in any particular raw-data file. This is a consequence of the strategy to calculate the time-to-mass conversion parameters and the resolution per raw-data file (from the SumSpectrum).
Third, the Overlap routine corrects for the above-mentioned deficiencies of the raw signal data. Mathematically, this represents the problem of solving a system of coupled linear equations for each measurement. Knowledge of the peak shape and the distance of a close-by neighbor peak allows calculating the fraction of the neighbor's total signal that is expected within the integration boundaries. Solving the system of linear equations results in a correct attribution of (i) signal (ion counts) from neighbors that is expected within the integration boundaries, and (ii) ion counts outside the integration boundaries that have not been accounted for by the Integrate routine. The result of the Integrate and Overlap routines is a data set that contains the measured signal for every detected peak in units of counts per second (cps). Mixing ratios are subsequently calculated from this data set.
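The linear system behind the overlap correction can be written as A · x = r, where r holds the raw signals of the integration windows, x the true total signals of the peaks, and A[i][j] the fraction of peak j's total signal that falls into window i. The sketch below builds A for an ideal Gaussian peak shape and solves the system with a small Gaussian elimination. The names and the Gaussian shape are assumptions for illustration; PTRwid uses the retrieved peak shape.

```python
import math

def gauss_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian peak."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def overlap_matrix(centers, sigma, bounds):
    """A[i][j]: fraction of peak j's total signal inside integration window i."""
    return [[gauss_cdf(hi, c, sigma) - gauss_cdf(lo, c, sigma) for c in centers]
            for lo, hi in bounds]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x
```

For two peaks at 143.036 and 143.071 Da with σ = 0.018 Da, the ±2σ windows overlap, so the shared boundary is placed midway at 143.0535 Da; solving the system then redistributes the signal that each window picked up from its neighbor and adds back the part outside the windows.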
Fourth, the Calcppb routine calculates mixing ratios according to the method outlined in Holzinger et al. (2010b). Input parameters are temperature, pressure, and the electrical potential across the PTR-TOF-MS reaction chamber (drift tube), and the electrical potential between the last ring of the drift tube and the entrance lens to the TOF region. The mass-dependent transmission efficiency of ions through the TOF mass spectrometer is the result of the increasing duty cycle for larger ion masses (Chernushevich et al., 2001). The transmission efficiency is by default corrected using the simple parameterization that has been suggested and tested by Cappellin et al. (2012). However, other correction curves can be easily implemented by adjusting the parameters of the sixth-order polynomial that defines the mass-dependent transmission efficiency (see Appendices C and D). Aging of the front multi-channel plate (MCP, the device detecting ions in the TOF) or an insufficient electrical potential between the MCPs produces non-linear mass-dependent discrimination that cannot be corrected (Müller et al., 2014). Therefore, proper tuning and monitoring of TOF parameters during the measurements (according to the recommendations given in Müller et al., 2014) is essential to obtain correct mixing ratios. As default reaction rate constant for the reactions with both H₃O⁺ and H₂O·H₃O⁺ a value of 3 × 10⁻⁹ cm³ s⁻¹ is used. Note that the reaction rate constant for organics with H₂O·H₃O⁺ can be set to zero (Appendix C) if the reaction with the water-hydronium cluster should not be considered. The primary ion signal is not measured directly, but calculated from signals at isotope peaks using the natural abundance of stable oxygen isotopes: the H₃O⁺ signal is calculated from the signal measured at 21.022 (mostly H […]

Extended processing
Extended processing is activated by clicking on the "Extended Processing" button in PTRwid (Fig. B1). A pop-up menu shows the available tools for extended analysis: currently the "Average and Merge Data" tool, the "Attributing Formulas" tool, and the "Filter Sampling" tool. The first two tools are of general use and will be briefly described below.
The "Filter Sampling" tool is more specialized since it is dedicated to processing aerosol filter data obtained with our offline-TD-PTR-MS method (Timkovsky et al., 2015) and will not be discussed here. In the future more add-on analysis tools can be integrated here. Note that, so far, all extended analysis tools use data exports that are based on the unified mass list (and cannot be used on data based on the file mass list).

The "Average and Merge Data" tool
The main purpose of this tool is categorizing data and reducing data size by averaging over pre-defined periods. This tool opens a new window (Fig. B2a) that allows (i) displaying data from multiple selected files, (ii) creating index files to categorize data, and (iii) exporting merged, averaged, and categorized data. As an example, Fig. 6a shows a 7.5 h timeline of benzene mixing ratios in ambient air at the CESAR observatory (http://www.cesar-observatory.nl/) recorded with 5 s time resolution, which corresponds to six selected data files. Two periods of background measurement (ambient air cleaned through a Pt-catalyst operated at 350 °C) are visible around 1496.7 and 1496.95, and a plume with mixing ratios up to 1 nmol mol⁻¹ was observed shortly after 1496.8. The time period for averaging and the index that separates background and ambient data can be set in a new window by clicking on the "Create New Index File" button (label 7 in Fig. B2). The index file name and time period of averaging (in seconds, i.e., 300 s in Fig. B2b) are set in the first row. In the following rows up to 10 conditions can be defined that, if met, assign index values to individual measurements. In the example shown in Figs. B2b and 6b only one condition is defined: if the data at m/z = 79.053 (benzene) is larger than 0.085 nmol mol⁻¹ the index value is 100 (ambient air), otherwise the index value is 300 (zero air). Besides ion count data, all collected engineering data can be used to specify conditions. Note that false attributions due to statistical noise are suppressed by considering neighboring data points. Figure 6b shows that the above-defined condition (in combination with suppressing noise attributions) is sufficient to separate ambient and zero air. After clicking the "Export" button (Fig. B2a) all data files (specified by file numbers below the "Export" button) are loaded and individual measurements are averaged until (i) the time period for averaging is reached, or (ii) the index value changes.
An example for advanced categorization is shown in Fig. 7, which depicts an almost 4 h period of m/z 83.085 Da […] high concentrations of organics were typically observed as the sampler was heated to the next temperature level (blue and red data in Fig. 7). The three-stage denuder sampled semivolatile and volatile organic compounds on three serially assembled denuders. After sampling, the denuders were sequentially heated, which was associated with high concentrations of organics in the N₂ carrier gas (green data in Fig. 7).
Footnote 2: More information at http://climate.envsci.rutgers.edu/SOAS/SOAS_White_Paper_final.pdf and http://wiki.envsci.rutgers.edu/index.php/Main_Page
Besides defining (up to 10) conditions, up to four index values can be manipulated in the "hardcopy section" (Fig. B2b). This feature has been used, for example, to separate the different desorption temperatures of the two externally controlled aerosol samplers of our TD-PTR-MS system during the SOAS campaign. The PTR-TOF-MS instrument recorded only whether it was connected to aerosol sampler A or B, to which index values of 100 and 200 were assigned, respectively. In the hardcopy section we added 1 to the first 36 measurements (i.e., 3 min since a measurement was recorded every 5 s), 2 to the following 36 measurements, and so on (see Fig. B3 and reddish and bluish index values in Fig. 7), whenever the original index value was 100 or 200, respectively. In this way the hardcopy section could be used to separate different desorption temperatures even though this parameter was not directly monitored by the PTR-TOF-MS.
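The index manipulation described for the SOAS example amounts to a few lines of bookkeeping, sketched below. The function name is hypothetical, and it is an assumption that the count of measurements restarts whenever a new uninterrupted run of index value 100 or 200 begins.

```python
def hardcopy_index(index, block=36, targets=(100, 200)):
    """Add 1, 2, 3, ... to successive blocks of measurements within each run."""
    out = []
    run = 0  # position within the current uninterrupted run of a target value
    for i, value in enumerate(index):
        if value in targets:
            run = run + 1 if i > 0 and index[i - 1] == value else 1
            out.append(value + (run - 1) // block + 1)
        else:
            run = 0
            out.append(value)
    return out
```

With one measurement every 5 s, a block of 36 measurements corresponds to the 3 min steps mentioned above, so index values 101, 102, 103, ... mark successive temperature levels of sampler A.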
The last part in the "New Index" window (Fig. B2b) allows creating a "sampling" index for up to three different inlets, which is useful for all in situ PTR-TOF-MS applications that involve a sampling/pre-concentration step. The sampling index can take two values (10 or 20), which are typically associated with sampling of ambient or zero air, respectively.
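The averaging rule applied on export (average until the averaging period is reached or the index value changes) can be sketched as follows. The function name and the tuple layout of the result are assumptions, and the suppression of noise-induced index flips is not included.

```python
def average_by_index(times, values, index, period):
    """Average consecutive measurements until `period` seconds have elapsed
    or the index value changes; returns (start_time, index, mean) tuples."""
    out = []
    start = 0
    for i in range(1, len(times) + 1):
        done = (i == len(times)
                or index[i] != index[start]
                or times[i] - times[start] >= period)
        if done:
            chunk = values[start:i]
            out.append((times[start], index[start], sum(chunk) / len(chunk)))
            start = i
    return out
```

An index such as the benzene example above could be built with `[100 if v > 0.085 else 300 for v in benzene]` before calling the function with a 300 s period.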

The "Attribute Formulas" tool
The "Attribute Formulas" tool lists possible empirical formulas in the range ±150 ppm around a detected peak.Since the precision of the detected mass peaks is typically in the range 10-25 ppm, this list contains much more compounds than the list saved along with the unified mass list (Sect.2.1.6).Constraints are provided through the analysis of a data set, which is used to check for consistency with the natural abundance of the two stable carbon and nitrogen isotopes, i.e., 13 C / 12 C ≈ 0.011 and 15 N / 14 N ≈ 0.0037.Optionally, the selected data set can be reduced by providing index values, in which case the analysis is limited to the specified subset.
The tool loops through all possible formulas (referred to as candidates hereafter) and performs the following tasks: (i) the stable isotope distribution of the candidate is calculated; (ii) the candidate's major isotopologues are matched with the unified mass list; (iii) if a match is found (in the range ±150 ppm), the ratio of the two data rows (corresponding to the major isotopologues) is calculated. In order to reduce the uncertainty due to a low signal-to-noise ratio, ratio values are dismissed when the corresponding values of the data assigned to the candidate are below the 0.6 quantile (i.e., the 60th percentile). Of the remaining 40 % of the ratio values, the median is used for further evaluation. A candidate is labeled as "dismissed" if (i) the median ratio suggests that the candidate accounts for less than 10 % of the measured signal, or (ii) a more abundant isotopologue (at least by a factor of 3) could not be matched with the unified mass list (labeled "no parent peak" in Fig. B4). The tool returns a report that lists possible formulas, the fractional mass difference relative to the detected mass, the maximum fraction of the total signal that can be attributed to the compound (only if constrained by isotopologues), and a statistical summary (mean and median) of the data set (Fig. B4).
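The ratio test can be sketched as follows. This is one possible reading of the procedure, not the PTRwid code: the function names are hypothetical, and the sketch assumes that the maximum attributable fraction is the observed isotopologue ratio divided by the ratio expected for the candidate, capped at 1.

```python
from statistics import median


def median_ratio_high_signal(candidate_signal, partner_signal, quantile=0.6):
    """Median of partner/candidate ratios, using only the points whose
    candidate signal is at or above the given quantile."""
    ordered = sorted(candidate_signal)
    position = min(int(quantile * len(ordered)), len(ordered) - 1)
    threshold = ordered[position]
    ratios = [p / c for c, p in zip(candidate_signal, partner_signal)
              if c >= threshold and c > 0]
    return median(ratios)


def max_attributable_fraction(observed_ratio, expected_ratio):
    """Assumed definition: share of the detected signal that is consistent
    with the candidate's expected isotopologue ratio."""
    return min(1.0, observed_ratio / expected_ratio)


def is_dismissed(fraction, min_fraction=0.10):
    return fraction < min_fraction


# Usage: a C7 candidate (expected 13C ratio 7 * 0.011) whose isotopologue
# peak carries only half of the expected signal.
signal = [float(k) for k in range(1, 11)]
isotopologue = [0.0385 * s for s in signal]
observed = median_ratio_high_signal(signal, isotopologue)
fraction = max_attributable_fraction(observed, 7 * 0.011)
```

Restricting the ratios to the upper 40 % of the signal keeps the points with the best counting statistics, which is the stated purpose of the percentile cut.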

Performance
The performance is demonstrated by processing 54 h (17-20 June 2013) of measurements with our PTR-TOF 8000 instrument (Ionicon Analytik GmbH, Austria) during the SOAS campaign. Throughout the campaign, ions were injected into the time-of-flight region of the mass spectrometer every 60 µs and the detector was operated at 10 GHz (0.1 ns time bins). A 5 s time resolution was obtained by internally totalling the signal of 83 333 initial mass spectra. Thus, throughout the 54 h period, more than 3.8 × 10^4 mass spectra (6 × 10^5 data points per spectrum) were collected and processed. The data were stored in 43 HDF5 data files, each containing 900 measurements or 75 min of data.
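The numbers quoted in this paragraph follow from the acquisition settings, as the short Python check below shows. The variable names are chosen for this example only.

```python
extraction_period_s = 60e-6   # ions injected every 60 µs
time_bin_s = 0.1e-9           # 10 GHz detector, 0.1 ns time bins
time_resolution_s = 5         # one stored measurement every 5 s
campaign_hours = 54
measurements_per_file = 900
n_files = 43

# Initial spectra summed into one stored 5 s measurement.
spectra_per_measurement = int(time_resolution_s / extraction_period_s)

# Data points (time bins) in one spectrum.
points_per_spectrum = round(extraction_period_s / time_bin_s)

# Stored 5 s spectra in the 54 h period.
total_measurements = campaign_hours * 3600 // time_resolution_s

# Duration covered by one file and by all files.
minutes_per_file = measurements_per_file * time_resolution_s / 60
hours_in_files = n_files * minutes_per_file / 60

print(spectra_per_measurement, points_per_spectrum, total_measurements,
      minutes_per_file, hours_in_files)
```

The 43 files cover 53.75 h, which is consistent with the 54 h period stated in the text.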
Processing was done on a laptop computer (Dell Latitude E6430, purchased in 2013) with an Intel Core™ i7-3720QM processor (2.60 GHz), 4 GB RAM, and a 32-bit operating system (Windows 7). The analysis based on the SumSpectrum of individual files (peak detection, mass scale calibration, baseline signal, peak shape, mass resolution, and unified mass list) was started by clicking the "unified mass list" button (Fig. B1) and completed in less than 15 min. The unified mass list contained 843 ion peaks in the range 16-820 Da. The extraction of these 843 ion signals from the raw data (including Poisson and dead time corrections, correction of overlapping peaks, and the computation of mixing ratios) was started by clicking the "Export" button (Fig. B1) and completed in 46 min. Note that the processing time of this step is largely independent of the number of ion peaks, since a large fraction of the processing time is consumed by opening the compressed HDF5 raw data.
The extended analysis tools require less processing power: averaging and merging according to the categories shown in Fig. 7 is completed in less than 2 min, and creating a report with the "Attribute Formulas" tool is a matter of ∼ 20 s.

Conclusions and outlook
PTRwid processes PTR-TOF-MS data and runs under IDL or the free IDL Virtual Machine. Much of the analysis is done on the SumSpectrum, which reduces the processing time significantly. The main innovative features are the autonomous mass scale calibration and the computation of a uniform "unified mass list", which also provides a robust method to determine the precision of attributed ion masses.
The modular design allows for flexible adjustments and easy integration of extended processing tools, which can be add-on procedures dedicated to specialized tasks. An example of this is the "Filter" tool that allows efficient processing of data collected with our offline-TD-PTR-TOF-MS setup (Timkovsky et al., 2015). Several extensions are already underway. These include (i) a peak modeling tool that allows identifying low-intensity peaks whose signal is completely submerged by high-intensity peaks, (ii) a tool to create plots such as van Krevelen diagrams and presentations of desorption thermograms or carbon oxidation state, which are useful for the interpretation of TD-PTR-MS data (e.g., Holzinger et al., 2013), and (iii) a tool for exploiting high-frequency data sets, for example, to calculate fluxes of biogenic volatile organic compounds (BVOCs) according to Park et al. (2013b).
B2 "Average and Merge Data" tool widget windows (Fig. B2)
A short description of all interactive elements of the main window (Fig. B2a) is given below. Additional information on the New Index window is provided in Fig. B3.
Select data directory field (label 1 in Fig. B2a): a click with the right mouse button activates a context menu that allows selecting a data directory with data exports (usually the "w_data" subdirectory of the HDF5 raw data directory).
File list field (label 2 in Fig. B2a): after the directory has been specified, all available data files are listed. Multiple files can be selected in the list. Available engineering data and the mass peaks of the unified mass list are displayed in the engineering data and the mass peaks fields, respectively (labels 3 and 4 in Fig. B2a).
Engineering data field (label 3 in Fig. B2a): individual engineering data can be selected and are plotted.
Mass peaks field (label 4 in Fig. B2a): individual mass peaks can be selected and are plotted.
Drop-down lists (label 5 in Fig. B2a): these four drop-down lists allow for the specification of the data for the parameters (pdrift, udrift, udx, Tdrift) that are needed to calculate the mixing ratios from ion counts (cps). Note that volume mixing ratios are re-calculated here based on averaged counts of primary and product ion signals.
Index file field (label 6 in Fig. B2a): a click with the right mouse button activates a context menu that allows selecting an index file (*.ind). When an index file is selected, the displayed data are categorized according to the index parameter.
New Index button (label 7 in Fig. B2a): this button opens a new window (Fig. B2b) in which a new index can be created and saved.
Export button (label 8 in Fig. B2a): averages and merges all data selected in the text fields below the button (label 9 in Fig. B2a) into one data file.

Figure B3. The New Index window with the values that were used to categorize data from the SOAS campaign (as shown in Fig. 7).
The two text fields in the first row allow for the specification of an index name and the maximum length (in seconds) of the chunks of data that are averaged. The get current values button reads the values of the selected index file (label 6 in Fig. B2a) into the respective fields. The three main sections are marked by red brackets: (1) up to 10 conditions can be used to categorize the PTR-TOF-MS data, (2) pointwise addition of the provided values to specified index values, and (3) six conditions to define up to three sampling indices (two conditions, i.e., sample and blank, per sampling index). The different types of entry columns are marked by blue letters: (A) droplists that allow selecting engineering data and adding the corresponding index to column B; (B) engineering data indices or ion masses can be provided here and will be processed according to the condition selected in column C; (C) droplists that allow selecting different rules to compare the selected data (B) with the provided target value (D); (E) index values that are attributed if the condition is met; (F) specifies the index value to which sampling indices are associated or to which the values provided in column G are added.

Figure 1. A small section of Spec (upper chart, smoothed in blue, unsmoothed SumSpectrum in black) and dSpec (lower chart). The baseline signal (Sect. 2.1.3) is plotted as a red line in the upper chart. The peak search routine detected five peaks at 143.002, 143.036, 143.071, 143.105, and 143.142. The range "PeakStart" to "PeakEnd" is indicated by two horizontal blue lines above the x axis and in the upper chart area, respectively. The blue ticks on these lines mark the position of "PeakMax". The vertical red lines in the upper chart indicate the integration boundaries (Sect. 2.1.6) that correspond to a range of ±2σ of the quasi-Gaussian peak shape (Sect. 2.1.4). The black and red ticks above the selected peak in the upper chart correspond to ion masses of the closest matches in the compound library (see masslib function in Appendix C). The largest peak at 143.071 has been matched to C7H10O3H+; other compounds in the vicinity are plotted in the chart area together with their deviation from the detected peak.

Figure 2. The top chart shows the SumSpectrum; the largest 16 peaks are marked in red. For better visibility the SumSpectrum has been truncated by setting high values equal to 10 % of the maximum value. Charts 2-4 demonstrate the quality of the three methods of mass scale calibration (see text for details): each point represents a detected peak (ion mass, Da, on the x axis). The peaks were compared with the compound library. The y axis shows the deviation from the closest library match (in mDa). The quality of the scale conversion is assessed by the number of matches (red data) with library compounds. The mass scale calibration is best for method 3 (chart 4).

Figure 3. An example SumSpectrum zoomed in to the baseline (grey data). The blue stepwise data show the baseline values of individual 90 ns segments (see text for details). The baseline is obtained by smoothing over these individual baseline values (red data).

Figure 4. The left-hand chart shows 137 normalized and re-scaled individual peaks retrieved from an example mass spectrum. The individual peaks envelop the true peak shape (blue line), which is retrieved by smoothing the 0.1 percentile of the relative intensity of all 137 peaks (brown line, barely visible because it is superimposed by the blue line). The right-hand chart shows the true peak shape (as in the left-hand chart) together with an example peak detected at m/z 59.049. The calculated mass resolution (FWHM) and the signal fractions expected within and outside the ±2σ and ±4σ boundaries are labeled in the plot area. 92.4 % of the total signal is expected within the ±2σ boundary. A small shoulder is apparent at the high mass end, and therefore 1.7 % of the signal is detected outside the +4σ boundary.

Figure 5. Histogram of peak detections in 8 ppm mass bins in the range 142.92-143.22 Da. More than 300 peak lists from individual files, each covering a period of 75 min, have been included; i.e., the data are from a ∼2-week field campaign. The precision of a peak at 143.070 has been determined as σ (standard deviation) of a Gaussian fit. The red markers are the 11 points that have been fitted, and the red line shows the Gaussian fit. The long red and blue vertical lines above the peak mark the ±1σ and ±2σ boundaries, respectively. The short vertical lines mark the position of close matches with compounds in the library (also plotted in the chart area). Compounds plotted in red and blue correspond to compounds that are within ±1σ and ±2σ, respectively.

Figure 6. A 7 h timeline of m/z = 79.053 (benzene) collected at the CESAR observatory (http://www.cesar-observatory.nl/) during the ACTRIS winter campaign in Jan/Feb 2013. The time axis displays days since 1 January 2009; the displayed period (in ordinary date format) is labeled on the top. The upper chart (a) displays the raw data. In the lower chart, markers of different colors correspond to index values as defined in the "New Index File" widget window (Fig. B2b). The black lines correspond to the 300 s means and the error bars indicate the standard error due to counting statistics.

Figure 7. Mixing ratios detected at m/z 83.085 over a 4 h period during the SOAS campaign in summer 2013. See text body for more details. The data have been categorized using the index parameters depicted in Fig. B3. The PTR-TOF-MS was connected to three inlets: aerosol sampler A (blue index values, 100-108), aerosol sampler B (red index values, 200-208), and a thermal desorption three-stage denuder inlet (green index values, 300-308) to sample semivolatile compounds in the gas phase. The sampling times for aerosol inlets A (blue), B (red), and the three-stage denuder (green) are indicated as horizontal lines below the zero line. The lower lines indicate sampling of background (i.e., particle-free ambient air for the aerosol inlets and zero air for the gas phase inlet), and the upper lines indicate sampling of ambient air. The lower chart shows engineering data that show the status of different valves (i.e., 0 closed, 1 open) in the setup. The blue line, for example, is 1 when aerosol inlet B is analyzed. This data set can thus be used to label data associated with aerosol inlet B.

Figure B1. Main PTRwid control window. In the top-left text field the user selects a data directory. The creation times (i.e., start of measurement) of all HDF5 files in the data folder (including subdirectories) are listed in the field labeled "file creation times". Individual files can be selected and first processing results are displayed in the session log on the right-hand side of the window. The names of available engineering data and the detected mass peaks of the selected file are displayed in separate lists. The interactive elements are labeled with bold red numbers and briefly described in Appendix B.

Figure B2. The "Average and Merge Data" tool widget windows. The main window (a) allows selecting and displaying data (example in Figs. 6 and 7) in order to produce and test useful data categorizations (Figs. 6b and 7) by creating an index file. The interactive elements are labeled with bold red numbers and briefly described in Appendix B. The "New Index File" widget window (b) allows creating complex data categories by using up to 10 conditions. Up to four index values can be manipulated in the hardcopy section, and up to three sampling indices can be created for in situ PTR-TOF-MS applications.

Figure B4. The Attribute Formulas tool window and excerpts from the report produced from a 3-day period during the SOAS campaign. The full report is also saved in a report file.

Table D1. Continued.
ExportOne 0 Export If set to 1, only one data file is exported (existing files are overwritten) and jpg pictures depicting the entire data set (similar to Fig. D2 in Holzinger et al., 2010a) are saved for quality control purposes.
Export Maximum number of values that are loaded as one chunk from the HDF5 raw data file. Low numbers increase the processing time, but if the value is too high insufficient virtual memory will cause the program to crash.
Export Unit is nanoseconds. More information in Titzmann et al. (2010) and Müller et al. (2013).