Software for peak finding and elemental composition assignment for glycosaminoglycan tandem mass spectra

Glycosaminoglycans (GAGs) covalently linked to proteoglycans (PGs) are characterized by repeating disaccharide units and variable sulfation patterns along the chain. GAG length and sulfation patterns impact disease etiology, cellular signaling, and structural support for cells. We and others have demonstrated the usefulness of tandem mass spectrometry (MS2) for assigning the structures of GAG saccharides; however, manual interpretation of tandem mass spectra is time-consuming, so computational methods must be employed. In the proteomics domain, the identification of monoisotopic peaks and charge states relies on algorithms that use averagine, or the average building block of the compound class being analyzed. Although these methods perform well for protein and peptide spectra, they perform poorly on GAG tandem mass spectra, because a single average building block does not characterize the variable sulfation of GAG disaccharide units. In addition, it is necessary to assign product ion isotope patterns to interpret the tandem mass spectra of GAG saccharides. To address these problems, we developed GAGfinder, the first tandem mass spectrum peak finding algorithm developed specifically for GAGs. We define peak finding as assigning experimental isotopic peaks directly to a given product ion composition, as opposed to deconvolution or peak picking, which are terms more accurately describing the existing methods previously mentioned. GAGfinder is a targeted, brute force approach to spectrum analysis that uses precursor composition information to generate all theoretical fragments. GAGfinder also performs peak isotope composition annotation, which is typically a subsequent step for averagine-based methods. Data are available via ProteomeXchange with identifier PXD009101.



Introduction
Glycosaminoglycans (GAGs) exist either as the glycan portion of proteoglycans (PGs) or as extracellular matrix (ECM) polysaccharides. The three classes of sulfated GAGs, heparan sulfate (HS), chondroitin sulfate (CS), and keratan sulfate (KS), are characterized by their long, linear chain, a repeating disaccharide unit (specific to each GAG class), and variable patterns of sulfation and acetylation. Due to their locations on the cell surface and in the ECM, as well as their sequence variation, they interact with many growth factors and growth factor receptors and therefore modulate cellular signaling and signal transduction pathways [1][2]. Furthermore, spatial and temporal regulation of the structures of GAGs characterizes physiology and pathophysiology in eukaryotes. For instance, cancer cells remodel HS chains in their microenvironments to avoid immune system targeting and allow proliferation [3]. In the motor neuron-degenerative disease amyotrophic lateral sclerosis, KS sulfation has been shown to correlate with disease progression [4]. Indeed, GAG expression is required for embryonic development [5], and GAGs are required for the proper functioning of all mammalian biological systems [6].
Clearly, assigning GAG sequences from tandem mass spectral data is necessary to establish their roles in diverse disease mechanisms.
Tandem mass spectrometry (MS2) entails isolating a precursor ion in the first stage, and dissociating it in subsequent stages. Manual interpretation of tandem mass spectra is tedious, time-consuming, and subjective. The first step of interpretation is to assign the m/z and charge states for product ions. Once this is done, neutral masses and isotope compositions can be assigned. Once these assignments are made, an algorithm can be used to identify the GAG sequence [7].
Wolff and colleagues first applied electron activated dissociation methods to GAG oligosaccharides, using both electron detachment dissociation (EDD) [8] and negative electron transfer dissociation (NETD) [9]. More recently, Huang and colleagues showed the effectiveness of electron activated dissociation for minimizing sulfate loss during HS mass spectrometry experiments [10]. Resulting tandem mass spectra after electron activated dissociation are extremely rich in that they contain a large number of product ions with varying charge states and isotope patterns. In the proteomics domain, several computational methods for automatic recognition of isotopic patterns and assignment of charge states and neutral mass values have been developed, including THRASH [11], Decon2LS [12], and MS-Deconv [13], among others. These methods assume product ion isotopic distributions will match the pattern produced by the molecule's average building block, or averagine; however, performance for GAG saccharide tandem mass spectra is inadequate, due to the variable levels of sulfation along their chains and the relatively abundant 34S isotope. Figure 1 shows two examples of the large difference in the expected isotopic distributions of non-sulfated and fully sulfated GAG fragments. Plainly, there is no GAG averagine that would accurately recover the correct monoisotopic peak for each fragment, and that leads to incorrect and missing assignments. Averagine-based approaches also do not assign elemental compositions for monoisotopic ions, a step necessary for interpretation of GAG saccharide tandem mass spectra. We sought to solve these problems.
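The effect of sulfur on the isotope pattern can be illustrated with a small calculation. The sketch below is a plain polynomial convolution at nominal-mass resolution, not the method used by any of the tools cited above, and the two elemental compositions are hypothetical examples chosen only to contrast a sulfur-free fragment with one carrying three sulfur atoms. The isotope abundances are approximate natural values.

```python
# Approximate natural isotope abundances as (nominal mass shift, probability).
ISOTOPES = {
    "C": [(0, 0.9893), (1, 0.0107)],
    "H": [(0, 0.999885), (1, 0.000115)],
    "N": [(0, 0.99636), (1, 0.00364)],
    "O": [(0, 0.99757), (1, 0.00038), (2, 0.00205)],
    "S": [(0, 0.9499), (1, 0.0075), (2, 0.0425), (4, 0.0001)],
}


def convolve(a, b, max_shift):
    """Multiply two isotope polynomials, truncating above max_shift."""
    out = [0.0] * (max_shift + 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j <= max_shift:
                out[i + j] += pa * pb
    return out


def isotope_pattern(composition, max_shift=4):
    """Relative abundances of the A, A+1, ..., A+max_shift peaks.

    composition maps element symbols to integer atom counts.
    """
    pattern = [1.0] + [0.0] * max_shift
    for element, count in composition.items():
        atom = [0.0] * (max_shift + 1)
        for shift, prob in ISOTOPES[element]:
            if shift <= max_shift:
                atom[shift] += prob
        for _ in range(count):
            pattern = convolve(pattern, atom, max_shift)
    total = sum(pattern)
    return [p / total for p in pattern]


# Hypothetical compositions: the second adds three SO3 groups to the first.
non_sulfated = isotope_pattern({"C": 12, "H": 19, "N": 1, "O": 11})
sulfated = isotope_pattern({"C": 12, "H": 19, "N": 1, "O": 20, "S": 3})
print("A+2 / A, non-sulfated:", non_sulfated[2] / non_sulfated[0])
print("A+2 / A, sulfated:    ", sulfated[2] / sulfated[0])
```

The A+2 to A ratio rises with each added sulfur atom, which is why a single averaged building block cannot describe both fragments.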

Suggested location for Figure 1
Previous work in GAG tandem mass spectra analysis and annotation has typically been a step in a further sequencing project. For instance, Yu and colleagues recently sequenced the dermatan sulfate (DS) chain of the pericellular PG decorin using a genetic algorithm based on known sulfate modification information from disaccharide analysis, but mentioned in-house data interpretation software only in passing [14]. In addition, two GAG sequencing efforts from Chiu and colleagues, GAG-ID [15] and a multivariate mixture model to estimate identification accuracy [16], represent recent attempts at automated GAG sequencing using a weighted hypergeometric distribution to match spectra to potential sequences. However, both papers describe a method that considers only high intensity peaks, rather than full isotopic distributions, and the method requires an intensive experimental workup for chemical derivatization that replaces sulfate groups with heavy isotope acetyl groups.
Averagine-based deisotoping and charge state deconvolution algorithms were developed to circumvent the combinatorial explosion of the number of possible protein sequences as the length of the chain increases. Due to this expansion, brute force methods searching all possible proteins and protein product ions are not feasible. While the number of possible GAGs also increases exponentially as a function of chain length, the rate of increase is much lower. Figure 2 shows the log10 of the number of possible structures of unmodified proteins, HS GAG saccharides, CS GAG saccharides, and KS GAG saccharides, as a function of the length of the chain. Notice how the slopes for each GAG class are much smaller than the slope for proteins, and consider how many more protein structures are possible when post-translational modifications are included. Given the reduced search space and the variable sulfation along GAG chains, we developed a brute force product ion search algorithm using the Python programming language, GAGfinder, for MS2 of GAG saccharides of a given composition. GAGfinder iterates through every possible fragment of a GAG composition at multiple charge states and tests its theoretical isotopic distribution against the observed spectral pattern. GAGfinder is available for download at http://www.bumc.bu.edu/msr/software. This paper describes the steps in GAGfinder and its performance as a means to identify the GAG monoisotopic product ions, charge states, and neutral mass values versus an averagine-based peak finding algorithm.

GAGfinder overview
A flowchart of the steps in GAGfinder is shown in Figure 3. The details of each step are described below. The term "product ion" will be used to refer to ions observed in tandem mass spectra. The term "fragment" will be used to refer to theoretical GAG saccharide substructures in a database.

Suggested location for Figure 3
Inputs - There are a number of required and optional inputs for GAGfinder to return accurate results. The spectrum data must be in the mzML file format [17]; the raw data can be converted using any format conversion tool, such as MSConvert [18] or compassXport (Bruker Daltonics, Inc.). Other required inputs include the GAG class, the precursor m/z, the precursor charge, and the output format for the results. Either the top percentile or the top N results can be returned, but not both. Optional inputs include the reducing-end derivatization formula (if any), the adducted metal and the number of adducts (if there is metal adduction), the NETD cation reagent (if NETD), a user-specified internal precision for mapping fragments to isotopic distributions, a Boolean value for whether noise has already been removed from the spectrum, and the number of labile sulfate losses to consider. These inputs are arguments for the GAGfinder command line program.
Step 1: Load mzML file and connect to GAG fragment database - The first step of GAGfinder is connecting to GAGfragDB, the database developed in SQLite for easy storing and retrieval of all possible fragments of a precursor composition up to hexadecamer. There are 4,150 unique compositions, 65,664 fragments, and 17,156,928 precursor-fragment mappings in GAGfragDB. The composition with the most possible fragments, (1,7,8,4,15) with a key of (dHexA, HexA, HexN, Ac, SO3), has 21,299 child fragments associated with it in HS. GAGfragDB includes a controlled vocabulary designed to give each fragment a unique text identifier that does not assume anything about the structure of the precursor or the fragment. In other words, a fragment that has one composition but could be a terminal fragment or any number of internal fragments will have only one identifier. Supplemental Figure S1 shows the relational schema for GAGfragDB. The connection to GAGfragDB is established by the Python sqlite3 module.
After connecting to GAGfragDB, GAGfinder loads the mzML file into Python using the pymzML module [19]. The pymzML module has a number of spectrum processing methods, including centroiding peaks, finding peaks in the spectrum within a particular error tolerance, and a number of others.
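The database lookup in Step 1 can be sketched with the standard sqlite3 module. The table and column names below are hypothetical stand-ins; the actual GAGfragDB schema is the one given in Supplemental Figure S1 and is not reproduced here. The sketch only shows the general pattern of a precursor-to-fragment mapping query.

```python
import sqlite3


def open_fragment_db(path=":memory:"):
    """Open a SQLite database; rows can be accessed by column name."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    return conn


def create_toy_schema(conn):
    """Create a hypothetical three-table layout (not the real GAGfragDB schema)."""
    conn.executescript(
        """
        CREATE TABLE precursor (id INTEGER PRIMARY KEY, label TEXT, mass REAL);
        CREATE TABLE fragment (id INTEGER PRIMARY KEY, label TEXT UNIQUE, mass REAL);
        CREATE TABLE precursor_fragment (
            precursor_id INTEGER REFERENCES precursor(id),
            fragment_id INTEGER REFERENCES fragment(id)
        );
        """
    )


def fragments_for(conn, precursor_id):
    """Return (label, mass) for every fragment mapped to one precursor."""
    rows = conn.execute(
        """
        SELECT f.label, f.mass
        FROM fragment AS f
        JOIN precursor_fragment AS pf ON pf.fragment_id = f.id
        WHERE pf.precursor_id = ?
        ORDER BY f.mass
        """,
        (precursor_id,),
    ).fetchall()
    return [(row["label"], row["mass"]) for row in rows]
```

Storing each fragment once and linking it to precursors through a mapping table is one way to keep a single identifier per fragment composition, as the text describes.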
Step 2: Normalize scan(s) and remove noise - Once the tandem mass spectral data have been loaded into Python, GAGfinder normalizes and averages the scans of the data file using the total ion current (TIC). GAGfinder first divides each scan in the file by the summed TIC intensity and then calculates the average over all scans. This step prevents any of the scans from biasing the results over the rest of the scans, and is performed using methods in the pymzML package. After normalizing the scans, GAGfinder removes noise from the spectrum, if the spectrum has not already been denoised by the user prior to runtime. GAGfinder uses an implementation of the noise reduction algorithm MasSPIKE [20].
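The normalization in Step 2 can be illustrated in plain Python. This sketch assumes, for simplicity, that every scan has already been placed on a common m/z axis so that intensities can be averaged position by position; GAGfinder itself performs this step with pymzML methods, which are not shown. The function names are illustrative.

```python
def tic_normalize(intensities):
    """Divide each intensity by the scan's total ion current (TIC)."""
    tic = sum(intensities)
    if tic == 0:
        return [0.0] * len(intensities)
    return [value / tic for value in intensities]


def average_scans(scans):
    """Average TIC-normalized scans that share one m/z axis.

    scans is a list of equal-length intensity lists.
    """
    if not scans:
        raise ValueError("at least one scan is required")
    normalized = [tic_normalize(scan) for scan in scans]
    n_scans = len(normalized)
    return [sum(column) / n_scans for column in zip(*normalized)]


# Two scans with very different absolute intensities contribute equally.
print(average_scans([[1.0, 1.0], [30.0, 10.0]]))
```

Because each scan is scaled to unit TIC before averaging, an unusually intense scan cannot dominate the averaged spectrum.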
Step 3: Determine precursor composition - Given the precursor m/z and charge, the neutral mass of the precursor can be calculated, and based on this and the GAG class, the precursor composition can be determined. GAGfinder considers metal adduction and reducing end derivatization information in order to calculate the neutral mass matching the composition in GAGfragDB. GAGfinder selects the composition with the neutral mass closest to the calculated precursor mass as the precursor composition.
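The calculation in Step 3 can be sketched as follows. The sketch assumes that charge arises only from proton loss (negative mode) or proton gain (positive mode); the `mass_offset` parameter is a hypothetical placeholder for the adduct and derivatization corrections mentioned above, whose exact form is not specified here. The candidate masses in the usage example are made-up values, not entries from GAGfragDB.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, charge, mass_offset=0.0):
    """Neutral mass from m/z and a signed charge.

    For charge z < 0 the ion is [M - |z|H]^z-, so M = |z| * mz + |z| * m_p.
    For charge z > 0 the ion is [M + zH]^z+, so M = z * mz - z * m_p.
    mass_offset (hypothetical) is subtracted to account for adducts or tags.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return abs(charge) * mz - charge * PROTON_MASS - mass_offset


def closest_composition(target_mass, candidates):
    """Return the candidate name whose mass is nearest to target_mass.

    candidates maps a composition name to its neutral mass.
    """
    return min(candidates, key=lambda name: abs(candidates[name] - target_mass))


# Illustrative values only.
mass = neutral_mass(500.0, -2)
print(mass, closest_composition(mass, {"A": 900.0, "B": 1002.1, "C": 1100.0}))
```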
Step 4: Determine reducing end and non-reducing end monosaccharides - In order to reduce the search space as much as possible, GAGfinder attempts to determine the monosaccharides at each precursor saccharide terminus. There are several cases in which this is possible, and Figure 4 shows the decision tree for determining this. First, if the non-reducing end is an unsaturated uronic acid (in the cases of CS and HS saccharides generated by polysaccharide lyase enzyme digestion), GAGfinder assumes that the reducing end monosaccharide is a hexuronic acid if the precursor contains an odd number of monosaccharides, and a hexosamine if the precursor contains an even number of monosaccharides. If this is not the case, then GAGfinder checks whether there is an unequal number of the parts of the repeating disaccharide for the current GAG class. If the number is unequal, then whichever monosaccharide there is more of will be on both the non-reducing and reducing end. If the number is equal, then GAGfinder cannot assign the end fragments and must search through the entire search space.
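The decision logic of Step 4 can be written as a short function. This is a sketch of the rules as stated in the text, using HS/CS residue names (dHexA, HexA, HexN); for KS the two parts of the repeating disaccharide would differ, and the function name and return convention are illustrative.

```python
def terminal_monosaccharides(n_dhexa, n_hexa, n_hexn):
    """Return (non_reducing_end, reducing_end), or (None, None) if undetermined.

    n_dhexa: number of unsaturated uronic acids (lyase digestion products)
    n_hexa:  number of saturated hexuronic acids
    n_hexn:  number of hexosamines
    """
    if n_dhexa > 0:
        # Unsaturated uronic acid sits at the non-reducing end; chain-length
        # parity then fixes the reducing end.
        total = n_dhexa + n_hexa + n_hexn
        reducing_end = "HexA" if total % 2 == 1 else "HexN"
        return ("dHexA", reducing_end)
    if n_hexa > n_hexn:
        return ("HexA", "HexA")
    if n_hexn > n_hexa:
        return ("HexN", "HexN")
    # Equal counts: the termini cannot be assigned from composition alone.
    return (None, None)


print(terminal_monosaccharides(1, 1, 2))  # lyase-generated tetrasaccharide
print(terminal_monosaccharides(0, 2, 3))  # odd-length saturated chain
print(terminal_monosaccharides(0, 2, 2))  # undetermined
```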

Suggested location for Figure 4
Step 5: Retrieve and modify all theoretical fragments for the precursor - Next, GAGfinder retrieves every possible fragment for the current precursor from GAGfragDB. The possible fragments stored in GAGfragDB include glycosidic bond cleavages and all cross-ring cleavages except for those involving cleavage of adjacent bonds. Supplemental Figure S2 shows each cross-ring cleavage GAGfinder considers. GAGfragDB stores the theoretical fragments as neutral masses without considering sulfate losses or any other modification information, so GAGfinder must modify and search each fragment in order to maximize spectrum coverage. For each fragment, the modifications included are water loss (for glycosidic fragments only), hydrogen loss (up to 2), sulfate loss (up to the amount designated by the user), and reducing end derivatization (if any). This information is used to determine whether a given fragment corresponds to the reducing terminus. Product ions that have the same chemical composition are merged. For every combination of these modifications, the fragments are pushed through the algorithm.
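The enumeration of modification combinations in Step 5 can be sketched with itertools.product. The sketch covers water, hydrogen, and sulfate (SO3) losses only; reducing end derivatization and the merging of ions by full chemical composition are left out, and the function name and arguments are illustrative. Keying the result by the (water, hydrogen, sulfate) loss counts keeps one entry per combination.

```python
from itertools import product

WATER_MASS = 18.010565     # H2O, monoisotopic, Da
HYDROGEN_MASS = 1.007825   # H, monoisotopic, Da
SO3_MASS = 79.956815       # SO3, monoisotopic, Da


def modified_masses(base_mass, is_glycosidic, n_sulfates,
                    max_sulfate_loss, max_hydrogen_loss=2):
    """Map (water_loss, hydrogen_loss, sulfate_loss) to a neutral mass.

    Water loss is considered for glycosidic fragments only; sulfate loss is
    capped by both the user limit and the number of sulfates present.
    """
    water_options = (0, 1) if is_glycosidic else (0,)
    hydrogen_options = range(max_hydrogen_loss + 1)
    sulfate_options = range(min(n_sulfates, max_sulfate_loss) + 1)

    masses = {}
    for w, h, s in product(water_options, hydrogen_options, sulfate_options):
        masses[(w, h, s)] = (base_mass
                             - w * WATER_MASS
                             - h * HYDROGEN_MASS
                             - s * SO3_MASS)
    return masses


variants = modified_masses(500.0, is_glycosidic=True,
                           n_sulfates=3, max_sulfate_loss=1)
print(len(variants))  # 2 water x 3 hydrogen x 2 sulfate options
```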
Step 6: Score each theoretical fragment - Once all of the theoretical fragments have been retrieved and modified as needed, they are scored against the tandem mass spectrum.
GAGfinder considers charge states from -1 to that of the precursor ion plus one for each fragment. The decision to use the charge state of the precursor ion plus one for the upper bound rather than that of the precursor ion is due to two main reasons. First, the number of product ions with the same charge state as the precursor is a small percentage of all of the product ions, meaning including this charge state in GAGfinder's searching would find only a few more product ions while introducing more false positives. Second, many of the product ions with the same charge state as the precursor are actually derivatives of the precursor, meaning they provide no additional structural information. A theoretical relative isotopic distribution (TID) is calculated for each fragment using the BRAIN algorithm [21], which employs polynomial expansion and applies the Newton-Girard theorem and Viète's formulae to this end. Once the TID is calculated, GAGfinder searches the tandem mass spectrum for product ion peaks at the m/z values of the TID within either a user-specified error tolerance or the default error tolerance of 20 parts-per-million (ppm), storing them as the experimental isotopic distribution (EID). The EID is then divided by the sum of its intensities so that it is also a relative distribution. GAGfinder employs a G-test of goodness-of-fit to determine how similar the EID is to the TID. Equation 1 shows the expression for the G score, where i is the index of each peak in the matched isotopic distributions. According to the G-test, the G score follows a chi-squared distribution under the null hypothesis that the EID has the same distribution as the TID, and so can be used to compute p-values. This way, a lower G score yields a higher p-value and thus represents a better fit.

$$G = 2 \sum_{i} \mathrm{EID}_i \, \ln\!\left(\frac{\mathrm{EID}_i}{\mathrm{TID}_i}\right) \qquad \text{(Eq. 1)}$$
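The peak matching and scoring in Step 6 can be sketched as below. The sketch assumes the standard G-test statistic, G = 2 Σ O_i ln(O_i / E_i), applied to relative distributions, with unmatched isotopic peaks given zero intensity and contributing nothing to the sum. Taking the most intense peak inside each tolerance window is also an assumption, and the TID is assumed to be strictly positive and normalized. Function names are illustrative.

```python
import math


def within_ppm(observed_mz, theoretical_mz, tol_ppm=20.0):
    """True if the observed m/z is within tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm


def match_eid(peaks, tid_mzs, tol_ppm=20.0):
    """Collect one observed intensity per theoretical isotopic peak.

    peaks is a list of (mz, intensity); the most intense peak inside each
    tolerance window is used, or 0.0 when no peak matches.
    """
    eid = []
    for theoretical_mz in tid_mzs:
        hits = [inten for mz, inten in peaks
                if within_ppm(mz, theoretical_mz, tol_ppm)]
        eid.append(max(hits) if hits else 0.0)
    return eid


def g_score(eid, tid):
    """G statistic between a matched EID (raw intensities) and a relative TID."""
    total = sum(eid)
    if total == 0:
        return math.inf  # nothing matched, so the fit is as poor as possible
    g = 0.0
    for observed, expected in zip(eid, tid):
        rel = observed / total
        if rel > 0:
            g += rel * math.log(rel / expected)
    return 2.0 * g


peaks = [(500.000, 10.0), (500.500, 5.0), (600.000, 99.0)]
eid = match_eid(peaks, [500.004, 500.5016])
print(eid, g_score(eid, [0.6, 0.4]))
```

A G score of zero means the normalized EID equals the TID exactly; larger values indicate a worse fit.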
Step 7: Rank product ions by G score and return top hits - Once all theoretical fragments have been scored for goodness-of-fit, they are ranked by increasing G score. Depending on the user's input, either the top percentile or the top N results are returned.

Data acquisition and preprocessing
We chose ten synthetic GAG standards to demonstrate the effectiveness of GAGfinder (Figure 5). These standards were chosen due to their range of modification distribution and precursor charges. Compounds 1 and 10 were synthesized as described [22]. Details regarding the tandem mass spectrometric acquisition methods can be found in Hu et al. [7]. Raw data files were converted to mzML format for input into GAGfinder by either MSConvert GUI version 3.0.5084 [13] or compassXport command line utility 3.0.13 (Bruker Daltonics, Inc.). The mass spectrometry glycomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [24] partner repository with the dataset identifier PXD009101.

Suggested location for Figure 5
We first sought to demonstrate the ability of GAGfinder to identify product ion isotope clusters and charge states. To do this, we compared a list of product ions generated by a traditional averagine-based method (the SNAP peak finder in Bruker DataAnalysis 4.2) with the list generated by GAGfinder. In order to retrieve every product ion SNAP identified, we set the quality factor threshold at 0, the signal-to-noise ratio (S/N) threshold at 1, the relative intensity threshold (base peak) at 0%, and the absolute intensity threshold at 0. For each GAG saccharide tested, we set the maximum charge state to the absolute value of the precursor charge state minus one, so that SNAP would behave comparably to GAGfinder. We set the repetitive building block to C6H11.375N1.125O9.5S1.5, as used in previous methods [25]. SNAP returned a matrix with columns for m/z, charge, intensity, resolving power, and quality factor.

Method comparison
In order to judge GAGfinder's performance in assigning tandem mass spectral monoisotopic product ions and charge states, we employed two separate statistical methods. Each method required unbiased expert manual selection of monoisotopic product ion peaks to serve as the set of true positives. In both methods we had GAGfinder return scores for 100% of the tested theoretical fragments in order to ensure maximum spectral coverage. The first method compared the GAGfinder performance against that of a random selection of monoisotopic product ions. The second compared GAGfinder's performance to that of an averagine-based peak finding algorithm.

The first method for judging GAGfinder's performance was a permutation test that gauged GAGfinder's performance in selecting true positive product ion peaks compared against random selection of product ion peaks. First, we calculated a performance score (PerfScore) for the GAGfinder results using the equation

$$\mathrm{PerfScore} = \sum_{j} G_j \, \mathrm{Hit}_j \qquad \text{(Eq. 2)}$$

where j is the index of the current product ion, G_j is the G score for fragment j, and

$$\mathrm{Hit}_j = \begin{cases} 1, & \text{if product ion } j \text{ is a "real" hit} \\ 0, & \text{if product ion } j \text{ is not a "real" hit} \end{cases} \qquad \text{(Eq. 3)}$$

Once we calculated the performance score for the GAGfinder results, we permuted the Hit vector 10,000 times and recalculated the performance score for each permutation. Since G scores are smaller for better fits, a smaller performance score represents a better performance. The performance scores of the 10,000 permutations represent a background distribution for performance against which we compared the GAGfinder performance score. We plotted GAGfinder's performance score against the background distribution and recorded its rank among all of the permuted performance scores.
The second method for testing GAGfinder's performance was a binary classifier evaluation that compared the GAGfinder performance versus that of an averagine-based algorithm, SNAP. Precision-recall (P-R) curves show how the classifier's precision and recall change as the classifier's threshold is changed, and the area under the curve (AUC) represents the classifier's performance. Precision is defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad \text{(Eq. 4)}$$

where TP stands for true positives and FP stands for false positives, and recall, also known as sensitivity, is defined as

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad \text{(Eq. 5)}$$

where TP stands for true positives and FN stands for false negatives. A perfect classifier has a precision and a recall of 1, and therefore, the closer the P-R AUC is to 1, the better the classifier has performed. The complete results for GAGfinder were generated by requesting 100% of the product ions tested.
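The permutation test used as the first evaluation method can be sketched as follows, taking PerfScore as the sum of G scores over the product ions marked as hits. The function names, the fixed random seed, and the choice to report the number of permutations scoring strictly lower than the observed value are illustrative assumptions.

```python
import random


def perf_score(g_scores, hits):
    """Sum of G scores over product ions flagged as hits (1) versus not (0)."""
    return sum(g * h for g, h in zip(g_scores, hits))


def permutation_rank(g_scores, hits, n_permutations=10000, seed=0):
    """Return (observed PerfScore, number of permutations scoring lower).

    The hit vector is shuffled while the G scores stay in place, which
    simulates a random selection of the same number of product ions.
    """
    rng = random.Random(seed)
    observed = perf_score(g_scores, hits)
    shuffled = list(hits)
    n_lower = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if perf_score(g_scores, shuffled) < observed:
            n_lower += 1
    return observed, n_lower


# Toy data: the two best-fitting ions are the true hits.
observed, n_lower = permutation_rank([0.1, 0.2, 5.0, 6.0, 7.0, 8.0],
                                     [1, 1, 0, 0, 0, 0],
                                     n_permutations=1000)
print(observed, n_lower)
```

When the true hits carry the lowest G scores, no permutation can score lower, which corresponds to the best possible rank.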
For GAGfinder results, we generated the vector of precision and recall values by ordering the results by G-score in ascending order and calculating the precision and recall of GAGfinder at each G-score threshold. Similarly, for SNAP, we ordered the results by quality factor in descending order and calculated its precision and recall at each quality factor threshold. Because the number of true positive peaks is limited to those fragment masses extracted from GAGfragDB, GAGfinder identifies fewer monoisotopic peaks and charge states than does an averagine-based algorithm. In order to compare the effectiveness of the peaks assigned in common by both algorithms, we removed peaks that were not searched by GAGfinder. These peaks were likely due to fragmentation or chemistry that GAGfinder does not consider.
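The construction of the precision and recall vectors can be sketched as below. Results are sorted by ascending G score and each position is treated as a threshold. The trapezoidal integration over recall, anchored at recall 0 with the first point's precision, is an assumption, since the text does not state how the AUC was computed, and ties in G score are not handled specially. For SNAP, the same code would apply after sorting by descending quality factor, for example by passing the negated quality factor as the score.

```python
def precision_recall_points(scored, n_true):
    """Return (precision, recall) at each score threshold.

    scored is a list of (score, is_true_positive) with lower scores better;
    n_true is the total number of manually assigned true positives.
    """
    ordered = sorted(scored, key=lambda item: item[0])
    tp = 0
    fp = 0
    points = []
    for _, is_hit in ordered:
        if is_hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / n_true))
    return points


def pr_auc(points):
    """Trapezoidal area under the precision-recall curve."""
    area = 0.0
    prev_precision, prev_recall = points[0][0], 0.0
    for precision, recall in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_precision, prev_recall = precision, recall
    return area


# Toy example: both true hits rank ahead of both false hits.
points = precision_recall_points(
    [(0.1, True), (0.2, True), (0.5, False), (0.9, False)], n_true=2)
print(points, pr_auc(points))
```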

GAGfinder performance compared to random sampling
For each of the ten GAG saccharide tandem mass spectra tested, the GAGfinder performance score significantly outperformed that of the permutations. Table 1 compares GAGfinder's performance score versus the mean and standard deviation of the 10,000 random permutations for each saccharide; the distribution plots for each saccharide can be seen in Supplemental Figure S3. For every compound, the PerfScore for GAGfinder was in the top ten lowest scores, and for seven of the compounds, the PerfScore for GAGfinder was lower than every permutation's PerfScore. This indicated that GAGfinder produced a better performance than a random selection. Furthermore, the GAGfinder PerfScore was at least three standard deviations lower than the average of the permutations for every saccharide, signifying significant outperformance compared to a random selection of peaks. There was no correlation between the PerfScores for GAGfinder and the means and standard deviations of the permutations, the dissociation method, or the precursor charge state. This indicated a lack of bias for the GAGfinder algorithm. We concluded based on these numbers that GAGfinder significantly outperforms a random selection of peaks.

Suggested location for Table 1

GAGfinder performance compared to averagine-based peak finding
Table 2 compares the P-R curve AUCs for each spectrum for GAGfinder versus SNAP, including summary statistics. The P-R curves for each spectrum are shown in Supplemental Figure S4. The average GAGfinder P-R AUC is higher than the average SNAP P-R AUC, while the median GAGfinder P-R AUC is almost equal to the median SNAP P-R AUC. For seven of the ten spectra, GAGfinder has a higher P-R AUC. Of these seven, four were generated by NETD, while the other three were generated by EDD, and there is no correlation between charge state and performance difference between GAGfinder and SNAP, indicating a lack of bias in the performance of each.
These numbers show that GAGfinder identifies monoisotopic peaks and charge states with accuracy similar to that of the averagine-based SNAP algorithm. We note again that GAGfinder assigns the elemental compositions for all identified monoisotopic peaks.

Suggested location for Table 2

Runtime numbers for GAGfinder
GAGfinder tracks and reports the length of runtime for each analysis. The amount of time required for GAGfinder to search each fragment varies based on a variety of factors, but the two that affect runtime the most are the number of possible fragments and whether or not the noise was removed prior to analysis. Table 3 shows GAGfinder's runtime for each saccharide, as well as the total number of fragments for that composition and charge state combination and whether or not the data were pre-processed. As can be seen, analyzing a spectrum without noise removed greatly increases the runtime. This is not due to GAGfinder's noise removal step taking an inordinate amount of time, but rather due to the larger number of data points to average across scans. For instance, samples 1-5 did not have noise removed prior to analysis, leaving that step for GAGfinder, which slowed down runtime. However, samples 6-10 did have noise removed prior to analysis, and their faster runtime shows it.

Suggested location for Table 3

Discussion
Here we have presented GAGfinder, the first GAG-specific isotopic distribution finding software for high resolution tandem mass spectra. GAGfinder uses a targeted, brute force approach to search observed product ions against a set of theoretical fragments calculated based on the precursor ion exact mass, composition based on GAG biosynthesis rules, and expected NETD and EDD tandem mass spectrometry dissociation patterns. The software is easy to use on any operating system and outperforms traditional peak finding software that was designed for peptide fragments. For this manuscript, GAGfinder was run as a command line utility on a MacBook Pro, and all tandem mass spectrometric data are available on the PRIDE Proteomics IDEntifications archive. While the software is currently available only in command line form, a web application and interface are under development and will be available soon.
We tested GAGfinder on the EDD and NETD spectra of a diverse set of synthetic GAGs and showed that it accurately and consistently returns valid fragments for the precursor being tested. GAGfinder consistently scored true positive fragments better than false fragments across all tested GAGs, and performed comparably to traditional peak finding methods. Unlike traditional peak finding methods, GAGfinder assigns elemental compositions to the monoisotopic product ions that are essential for assigning the saccharide structure. While we tested GAGfinder exclusively on high resolution spectra in the negative ion mode, the software was designed in principle to handle any resolution level in either the negative or positive ion mode. For low resolution spectra, we hypothesize that the G-scores for assigned monoisotopic product ions will be worse than with high resolution data; however, this is due to the whole distribution of G-scores shifting, and we anticipate that the correct IDs will still be found at or near the top of the ranked list of G-scores.
While GAGfinder succeeds at identifying product ions that fall within the set defined in the GAGfragDB, it does not identify product ions that arise from undefined dissociation processes. Such undefined processes include rare dissociation patterns, a charge state equal to or higher in absolute value than that of the precursor, and random instrument noise. In these cases, traditional methods will have a greater likelihood of identifying m/z values and charge states but will not identify the elemental composition. Furthermore, these ions are not actually useful for GAG structure determination, which is the ultimate goal of GAG sequencing. While it is possible to add rare dissociation processes to the GAGfragDB, this would increase search space size at the expense of algorithm run time.
An interesting case where GAGfinder outperforms the traditional peak finding method SNAP arises when the fragment composition substantially differs from that of the averagine used. As shown in Figure 1, selecting an appropriate averagine that fits all GAG fragments is difficult due to the variable number of sulfur atoms in the fragments.
Compound #9 contains a heavily sulfated reducing end, with three sulfate groups on one GlcNAc. While GAGfinder finds the Y1-S and Y1 ions for this compound and scores them in the top ten, SNAP is unable to find them. Figure S5 shows the annotated spectra, using the top 20 (or so) most intense fragments for each saccharide, and Figure S6 shows the portion of the spectrum containing these fragments. In both cases, there are other isotopic distributions interspersed, but none of these precisely overlap with their peaks.
Wolff and colleagues first showed how metal cationization can help curb sulfate loss in EDD [26], an approach that has gained popularity in the years since. While our group typically avoids metal adduction during GAG analysis due to the negative effects on the instrument and the extra work up, we nonetheless designed GAGfinder to be able to handle samples that have been cationized. In GAGfinder, cationization adds to the search space, and therefore the runtime, without necessarily improving peak finding performance, reduced sulfate loss aside. While metal cationization can help remove the ambiguity of tandem mass spectra, allowing for easier GAG sequencing, its utility is seen mostly in that step of the sequencing pipeline. GAGfinder is only looking for fragments and isotopic distributions of given compositions, regardless of whether there is metal cationization or not, and therefore, metal cationization should not affect GAGfinder's peak finding performance.
In conclusion, use of GAGfinder will allow researchers to swiftly and accurately assign elemental compositions and product ion types to product ions in GAG saccharide tandem mass spectra. While GAGfinder was tested exclusively on pure, synthetic compounds, we are evaluating its ability to assign product ion m/z, charge state, and elemental composition for biological samples. Finally, we demonstrate that the use of a brute force method for peak finding balances search space size and overall analysis time compared to traditional methods.

Tables
Table 1. Performance scores for GAGfinder compared to the mean and standard deviation of the 10,000 permutations for each of the ten synthetic compounds. In each case, GAGfinder's PerfScore was lower than at least 99.9% of the permutations', indicating a better performance.