Exploring the Relationship between Polymer Surface Chemistry and Bacterial Attachment Using ToF‐SIMS and Self‐Organizing Maps

Biofilm formation is a major cause of hospital‐acquired infections. Research into biofilm‐resistant materials is therefore critical to reduce the frequency of these events. Polymer microarrays offer a high‐throughput approach to enable the efficient discovery of novel biofilm‐resistant polymers. Herein, bacterial attachment and surface chemistry are studied for a polymer microarray to improve the understanding of Pseudomonas aeruginosa biofilm formation on a diverse set of polymeric surfaces. The relationships between time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) data and biofilm formation are analyzed using linear multivariate analysis (partial least squares [PLS] regression) and a nonlinear self‐organizing map (SOM). The SOM models revealed several combinations of fragment ions that are positively or negatively associated with bacterial biofilm formation and that were not identified by PLS. With these insights, a second PLS model is calculated in which interactions between key fragments (identified by the SOM) are explicitly considered. Inclusion of these terms improved the PLS model performance and showed that, without such terms, certain key fragment ions correlated with bacterial attachment may not be identified. The chemical insights provided by the combination of PLS regression and SOM will be useful for the design of materials that support negligible pathogen attachment.


Introduction
Nosocomial (hospital-acquired) bacterial infections that are resistant to standard antimicrobial treatments result from biofilm formation in 80% of cases, as reported by Davies. [1,2] Such infections are the most frequent adverse event in hospitals, and urgent control measures are required for implanted and indwelling medical devices. Most strategies for reducing biofilm-associated infections focus on modifying the existing materials used to manufacture indwelling medical devices by introducing anti-biofilm compounds such as antibiotics. [3] For example, polyurethane central venous catheters impregnated with minocycline and rifampicin reduced bacterial colonization by 18% and prevented bloodstream infections. [4,5] Non-antibiotic approaches with various efficacies have also been described, including treatments employing silver sulfadiazine, nitrofurazone, chlorhexidine and polymerized quaternary ammonium surfactants, [6][7][8][9][10][11] whilst a number of strategies to disrupt biofilms have also been explored. [12] Sustaining the efficacy of loaded devices is challenging and has limited their clinical impact. [13][14][15][16] Materials that are inherently resistant to bacterial attachment and biofilm formation provide an alternative approach to preventing device-associated infections that avoids the development of antimicrobial resistance. [17] However, the understanding of biofilm formation on polymeric materials is not yet sufficiently developed to enable the ab initio design of biofilm-resistant materials.
High-throughput experimental polymer microarray screening has enabled the discovery of anti-biofilm materials without requiring an understanding of the underlying biological-material interactions. Beyond materials discovery, the hundreds of experimentally measured biofilm formation data points acquired on these polymeric materials provide an ideal dataset for developing an understanding of how polymers influence bacterial behavior.
Time-of-flight secondary ion mass spectrometry (ToF-SIMS) provides a comprehensive assessment of the surface chemistry of polymers, [18][19][20] delivering high surface sensitivity and detailed molecular information [21,22] that can be used to construct structure-function relationships. [23] Multivariate analysis (MVA) techniques, such as partial least squares (PLS) regression and principal component analysis (PCA), have traditionally been used to interpret large, high-dimensional ToF-SIMS spectral datasets. [24,25] MVA is useful for datasets containing linear correlations; [26] however, many structure-property relationships in materials are nonlinear. The presence of outliers and missing data also affects the quality of MVA outcomes. [27] Hook et al. reported a PLS regression study of the role of weakly amphiphilic properties in resistance to biofilm formation. [28] Multiple chemical moieties were implicated in bacterial adhesion; specifically, ethylene glycol and hydroxyl side groups were found to promote bacterial adhesion, whilst the combination of esters with hydrophobic side groups was shown to resist it. [28,29] The same datasets were also used to construct nonlinear quantitative structure-property relationship machine learning models using a Bayesian regularized artificial neural network (ANN). [30][31][32] These studies showed that nonlinear machine learning models are often more robust and predictive than conventional linear regression models, and identified relationships between specific molecular features (associated with hydrophobicity or proton transfer, and molecular shape) and the propensity of a polymer to prevent bacterial attachment. The effect of combinations of chemical features, however, could not be completely resolved, suggesting that further study of the important and complex features present in these systems is needed.
Eighty years ago, McCulloch and Pitts introduced the concept of threshold logic, a mathematical and algorithmic computational model designed to mimic the biological neural system. [33][34][35] This model was the precursor of the ANN algorithms that are now in widespread use. Kohonen later introduced a nonlinear algorithm based on ANNs known as the self-organizing map (SOM), or Kohonen network. [36] Variants of the SOM that are capable of predicting characteristics of interest have since been developed, such as the counter-propagation artificial neural network (CPN), the supervised Kohonen network (SKN), and the X-Y-fused Kohonen network (XYF). [27] SOMs have found broad application in diverse fields such as engineering, [37] environmental science, [38] hyperspectral imaging, [39] and finance. [40] Their ability to provide a visual interpretation of high-dimensional data makes them highly valuable in many contexts. We have previously studied the application of unsupervised SOMs to acrylate polymer microarrays, [41] where SOMs showed exceptional performance in distinguishing chemically very similar polymers based on the topological relatedness of their surface chemistries.
In this study, a polymer microarray was prepared to include material properties and surface chemistries previously implicated in preventing bacterial biofilm formation, enabling an investigation of bacterial attachment to polymers using ToF-SIMS to characterize surface chemistry (Figure 1). The study has a particular focus on identifying where multiple factors act in complex, complementary, and competitive ways. A PLS regression analysis is first presented so that the conventional analysis approach can be compared directly with SOMs. We then employ SOMs to identify relationships between polymer surface chemistry, resolved at the level of discrete molecular structures by ToF-SIMS, and function, namely the attachment of Pseudomonas aeruginosa, a pathogen frequently implicated in hospital-acquired bacterial infections. Figure 2 shows a schematic description of the SOMs employed in this study, highlighting the information flow in each type of SOM with respect to sample class assignment. Using the dataset shown in Figure 1 as an example, each weight layer (W_x) represents one secondary ion fragment selected from the ToF-SIMS spectra (Figure 1c), whereas each class layer (C_L) represents a level of P. aeruginosa attachment obtained from the bacterial attachment assay (Figure 1d). All SOMs are initialized by assigning a random number to each weight layer of every network unit; these units are known as neurons (N_x). In other words, each neuron holds one weight for every selected secondary ion fragment, and each of these weights starts as a random number. The unsupervised Kohonen network (UKN), illustrated as the grey block in Figure 2, is an unsupervised SOM in which the input object for each sample S is presented to the entire set of network units. The neuron whose weight vector is most similar to the sample is labelled the winning neuron. The weight vectors of the winning neuron and its nearest neighbors are then updated towards the sample assigned to the winning neuron.
This matching and adjustment process is repeated for a user-defined number of epochs. Eventually, clusters of neurons with similar weight vectors form, and samples with similar ToF-SIMS intensities are grouped into their respective clusters, hence the name self-organizing map. The term "unsupervised" indicates that no classification based on a property of interest (label) is incorporated into the UKN or its data array. Because the principal focus of this study was the relationship between the input data and the assigned classes, the UKN model was not used in the current work; its description is provided to introduce the supervised SOMs that have been developed from it.
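To make the matching and adjustment steps above concrete, the following is a minimal sketch of an unsupervised Kohonen map in Python with NumPy. It is not the Kohonen and CPANN Toolbox implementation used in this work; the square non-toroidal grid, the Gaussian neighborhood, the linearly decaying learning rate and radius, and the function names train_som and best_matching_units are all illustrative assumptions.

```python
import numpy as np


def train_som(X, side=10, epochs=100, lr0=0.5, seed=0):
    """Train a minimal unsupervised Kohonen map on a square, non-toroidal grid.

    X is an (n_samples, n_features) array, assumed range-scaled to [0, 1].
    Returns the weights as an array of shape (side, side, n_features).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Every weight layer of every neuron starts as a random number
    W = rng.random((side, side, d))
    rows, cols = np.indices((side, side))  # grid coordinates of each neuron
    sigma0 = side / 2.0
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):  # present samples in random order
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            x = X[i]
            # Winning neuron: smallest Euclidean distance to the sample
            dist = np.linalg.norm(W - x, axis=2)
            r, c = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood centered on the winner
            g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
            # Move the winner and its neighbors part of the way towards the sample
            W += lr * g[..., None] * (x - W)
            step += 1
    return W


def best_matching_units(X, W):
    """Return the (row, col) grid position of the winning neuron for each sample."""
    flat = W.reshape(-1, W.shape[2])
    d = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)
    idx = np.argmin(d, axis=1)
    return np.stack(np.unravel_index(idx, W.shape[:2]), axis=1)
```

Because each update moves a neuron only part of the way towards a sample, neurons that are repeatedly pulled by similar spectra drift together, which is what produces the clusters of similar weight vectors described above.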

Principles of the SOM
The CPN, also known as a pseudo-supervised model, is calculated using the same approach as the UKN. The only difference is that an associated output map is trained simultaneously, with the output vector calculated for the input signals. The unidirectional information flow is indicated by the arrow from the Kohonen layer to the output layer (Figure 2a). The class membership of each neuron is therefore predicted from the classes assigned to the samples and has no effect on the training of the Kohonen layer. This property makes the CPN powerful for highlighting and confirming indirect relationships between input data and assigned classes. In the SKN (Figure 2b), the input data and the class vector of a sample are merged and serve as the input for training the network. After training, the combined input and output layers are decoupled and presented as a topographic map. The training of the SKN is affected by the class memberships of the samples, and the SKN is therefore a supervised model. For the XYF network, the input and output layers are calculated independently and updated simultaneously, as indicated by the arrows in Figure 2c. A single winning unit, shared by the input and output layers, is identified. When training is complete, both the input objects and the predefined classes have contributed equally to the organization of the XYF map. The mathematics and strategy behind each supervised model are detailed in Melssen et al. [27]
Figure 1. … Over 400 different combinations of the monomers in (a) were printed on a microarray. c) ToF-SIMS analysis to create positive and negative spectra for each polymer, with data acquired using peak selection. d) Intensity map after incubating the microarray with green fluorescent protein (GFP)-transformed P. aeruginosa for 72 h. e) SOM training, performed by feeding the intensities of the selected peaks to the Kohonen layer and the P. aeruginosa count to the class layer to find the relationship between surface chemistry and bacterial attachment properties.
Genetic algorithms (GA) are commonly used to identify the optimal settings for supervised SOM calculations based on model fitness, [42] to avoid overfitting or underfitting. The details of the GA strategy used in this study are explained in Ballabio et al. [43] Each chromosome consists of two genes, one representing the size of the network (number of neurons) and the other the number of training iterations (number of epochs). A chromosome with a random set of genes is used for each evaluation. Ten runs are performed for a GA calculation, with 25 evaluations for each run. Ten percent of the data is excluded for cross-validation in each evaluation. Each evaluation is repeated five times to obtain the mean non-error rate of correctly assigned samples in the internal validation set (NER_valid) and the non-error rate of correctly assigned calibration samples (NER_calib), which are used to calculate the fitness function, [43] the optimization criterion for the GA. The ten chromosomes with the highest fitness values in each run are retained as surviving units. The chromosome with the highest survival rate and the highest fitness value is then chosen as the optimal architecture for weight analysis and prediction.
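The non-error rate used in the fitness calculation is commonly defined in the chemometrics literature as the mean of the per-class sensitivities. The sketch below assumes that definition; how NER_valid and NER_calib are combined into the GA fitness function follows Ballabio et al. [43] and is not reproduced here. The function name non_error_rate is hypothetical.

```python
import numpy as np


def non_error_rate(y_true, y_pred, classes=None):
    """Non-error rate (NER): the mean of the per-class sensitivities.

    Classes that do not occur in y_true are skipped (an assumption of this
    sketch) so that they do not cause a division by zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if classes is None:
        classes = np.unique(y_true)
    sensitivities = []
    for c in classes:
        mask = y_true == c
        if mask.any():
            # Fraction of the samples truly in class c that were assigned to c
            sensitivities.append(np.mean(y_pred[mask] == c))
    return float(np.mean(sensitivities))
```

For example, with y_true = [1, 1, 1, 1, 5, 5] and y_pred = [1, 1, 1, 1, 1, 5], the accuracy is 5/6 but the NER is (1.0 + 0.5)/2 = 0.75, because the single error in the smaller class weighs more heavily. This makes NER better suited than plain accuracy to datasets with unequal class sizes.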

Bacterial Attachment on a Polymer Microarray Library
Polymer microarrays were prepared using the procedures described by Anderson et al. (2004); [44] bacterial microarray screening and ToF-SIMS analyses were undertaken using the approach reported by Hook et al. (2012). [28] In brief, stock solutions were prepared from 75% (v/v) monomer, 25% (v/v) dimethylformamide (DMF) and 1% (w/v) 2,2-dimethoxy-2-phenylacetophenone (DMPA). [44] Monomers were purchased from Aldrich, Scientific Polymers and Polysciences, and were printed onto epoxy-coated slides (Xenopore) dip-coated in 4% (w/v) pHEMA (Aldrich), using 946MP6B pins (ArrayIt) and a Pixsys 5500 robot (Cartesian) or an XYZ3200 dispensing workstation (Biodot). A UV lamp (UVP Blak-Ray) was added to the workstation for polymerization, exposing the microarray to long-wave UV for ≈10 s after each round of printing. The chemical structures of the monomers are shown in Figure 1a. Monomers 1-16 were mixed with Monomers A-F at molar ratios of 100:0 (homopolymer, 6 repeats), 90:10, 70:30, 60:40, 50:50, and 0:100 (homopolymer, 16 repeats) to create 576 solutions. The number in each sample name represents the molar percentage of Monomer A-F. The arrays were dried at a pressure below 50 mTorr for at least 7 days. Before use, the arrays were sterilized by exposure to UV for 30 min on each side, then washed twice with phosphate buffered saline (PBS) for 30 min and twice with medium for 30 min to remove residual monomer or solvent.
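The total of 576 solutions follows from combining each of Monomers 1-16 with each of Monomers A-F (96 pairs) at six molar ratios. The short Python sketch below enumerates the library under that reading of the text, which also accounts for the stated replicate counts: each numbered homopolymer (100:0) appears once per lettered partner (6 repeats), and each lettered homopolymer (0:100) once per numbered partner (16 repeats). The tuple layout and names are illustrative only.

```python
from itertools import product

NUMBERED = [str(i) for i in range(1, 17)]  # Monomers 1-16
LETTERED = list("ABCDEF")                   # Monomers A-F
# Molar ratios (numbered : lettered), including the two homopolymer cases
RATIOS = [(100, 0), (90, 10), (70, 30), (60, 40), (50, 50), (0, 100)]


def build_library():
    """Enumerate every (numbered monomer, lettered monomer, ratio) solution."""
    return [(m, s, r) for m, s, r in product(NUMBERED, LETTERED, RATIOS)]


def count(library, predicate):
    """Count library entries for which predicate(m, s, r) is true."""
    return sum(1 for entry in library if predicate(*entry))


library = build_library()
print(len(library))                                                # 576
print(count(library, lambda m, s, r: m == "1" and r == (100, 0)))  # 6
print(count(library, lambda m, s, r: s == "A" and r == (0, 100)))  # 16
```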
The microarrays were incubated for 72 h in suspensions of planktonic P. aeruginosa (PAO1, Nottingham strain) transformed with the plasmid pME6032-GFP, which constitutively expresses green fluorescent protein (GFP). [45] Bacterial attachment was quantified from the fluorescence signal (F) measured with a GenePix Autoloader 4200AL scanner (Molecular Devices, US) using a 488 nm excitation laser and a standard blue emission filter (510-560 nm). The total fluorescence intensity from each polymer spot was acquired using GenePix Pro 6 software (Molecular Devices, US). Details of the bacterial growth conditions and of the model for estimating bacterial counts from the fluorescence signal were previously presented by Hook et al. [28]

ToF-SIMS Analysis of the Polymer Microarray
ToF-SIMS measurements were conducted on a ToF-SIMS IV instrument (IONTOF GmbH, Germany) operated with a monoisotopic 25 keV Bi3+ primary ion source in "bunched mode." A 1 pA primary ion beam was rastered, and both positive and negative secondary ions were collected from a 100 × 100 µm area. The typical mass resolution (at m/z 41) was approximately 6000. Charge compensation was achieved using a flood gun. Positive spectra were calibrated to the CH3+, C2H5+, C3H7+ and C4H7+ ions, whilst negative spectra were calibrated to CH−, C2H−, C3H− and C4H−. Identical positive-polarity ToF-SIMS spectra were collected for samples 521 to 576, most likely because of a sample navigation issue. These 56 samples were excluded, giving a total of 520 (N) samples for analysis.

Data Preprocessing
ToF-SIMS spectra (positive and negative ion) were transformed into data arrays for subsequent analyses by conventional peak list formation using curated peak selections. Peaks corresponding to secondary ions representing molecular species and structures of interest were selected from both the positive and negative ion mass spectra, giving a total of 415 peaks. The intensities of the selected peaks were extracted, combined into a data array and normalized to the total ion count (based on the selected peaks) per pixel. Data columns/ion images were autoscaled prior to PLS regression and independently range scaled between 0 and 1 prior to SOM training. The order of the samples was randomized, and 10% of randomly selected samples were excluded as a test set for prediction.
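A minimal NumPy sketch of the preprocessing chain described above follows. It assumes one spectrum per sample, with the selected positive and negative peak intensities already concatenated into columns; the function name preprocess and its random seed are hypothetical, and this is not the software used in this work.

```python
import numpy as np


def preprocess(intensities, test_fraction=0.1, seed=0):
    """Normalize, scale and split a (n_samples, n_peaks) peak-intensity array.

    Returns (X_pls, X_som, train_idx, test_idx).
    """
    X = np.asarray(intensities, dtype=float)
    # Normalize each sample to its total count over the selected peaks
    X = X / X.sum(axis=1, keepdims=True)
    # Autoscaling for PLS: zero mean and unit variance per column
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # guard against constant columns
    X_pls = (X - X.mean(axis=0)) / sd
    # Range scaling for SOM training: each column mapped to [0, 1]
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    X_som = (X - X.min(axis=0)) / span
    # Randomize the sample order and hold out a test set
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    n_test = int(round(test_fraction * X.shape[0]))
    return X_pls, X_som, order[n_test:], order[:n_test]
```

As in the text, the scaling statistics here are computed on the full dataset before the test set is held out; for 520 samples the split gives 468 training and 52 test samples.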
Five classes based on the measured bacterial fluorescence were defined for the SOM analysis of the 520 homopolymers and copolymers, ranging from Class 1, representing zero attachment, to Class 5, representing the highest measurable attachment, as shown in Table 1.
These five classes were employed in subsequent CPN, SKN, and XYF analyses to elucidate and quantify relationships between bacterial attachment and molecular surface chemistry.

PLS Analysis
The quantified P. aeruginosa count was mean-centered prior to PLS regression analysis, which was performed using PLS_Toolbox (version 9.0, Eigenvector Research, Manson, WA) in MATLAB R2020a (The MathWorks Inc., USA). Contiguous blocks were used for cross-validation, with a maximum of 20 latent variables (LVs) and 10 data folds. The number of LVs giving the lowest root mean square error of cross-validation (RMSECV) was selected for model calculation.
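The LV selection rule above can be sketched with a NumPy-only PLS1 regression (NIPALS) and contiguous-block folds. This is an illustration under assumptions, not PLS_Toolbox: NIPALS may differ from the algorithm the toolbox uses by default, and the per-fold autoscaling of X, the per-fold mean-centering of y, and the function names are choices made for this sketch.

```python
import numpy as np


def pls1_coefficients(X, y, n_lv):
    """PLS1 (NIPALS) regression coefficients for centered X (n, p) and y (n,)."""
    X = np.array(X, dtype=float)  # copies, so the caller's arrays are untouched
    y = np.array(y, dtype=float)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)     # weight vector
        t = X @ w                  # scores
        tt = t @ t
        p = X.T @ t / tt           # X loadings
        c = (y @ t) / tt           # y loading
        X -= np.outer(t, p)        # deflate X
        y -= c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def rmsecv_contiguous(X, y, max_lv=20, n_folds=10):
    """RMSECV for 1..max_lv LVs with contiguous-block folds; returns (rmsecv, best_lv)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    sq_err = np.zeros(max_lv)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test_idx)
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0
        Xtr = (X[train] - mu) / sd      # autoscale X on the training fold
        Xte = (X[test_idx] - mu) / sd
        ybar = y[train].mean()          # mean-center y on the training fold
        for k in range(1, max_lv + 1):
            b = pls1_coefficients(Xtr, y[train] - ybar, k)
            sq_err[k - 1] += np.sum((Xte @ b + ybar - y[test_idx]) ** 2)
    rmsecv = np.sqrt(sq_err / n)
    return rmsecv, int(np.argmin(rmsecv)) + 1
```

With as many LVs as linearly independent columns, PLS1 reduces to ordinary least squares; the RMSECV minimum is what stops the selected model short of that overfitted limit. The maximum number of LVs must not exceed the rank of the training data.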

SOM Analysis
Bacterial attachment classes were used to supervise the training of three types of Kohonen network (CPN, SKN, and XYF), with the unsupervised UKN providing a point of reference. SOMs were calculated using the Kohonen and CPANN Toolbox (version 4.1) (Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Italy) [46,47] in MATLAB R2019b. All calculations were performed on a computing cluster (CMSS-MATLAB; 16 CPUs) at La Trobe University. k-fold cross-validation was used to optimize the SOM models. In this approach, the entire dataset is partitioned into k distinct groups of samples, called folds. One of the k folds is withheld, the remaining k − 1 folds are used to train the model, and the performance of the model is then measured on the withheld fold. This process is repeated k times, such that each fold is withheld exactly once, and the final performance of the model is the mean performance across the k folds. Ten folds were used to cross-validate the models, and the venetian blinds method was used to partition the data: for the ith fold, where i ∈ {1, 2, …, k}, venetian blinds assigns every kth sample to the fold, starting from sample i.
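The difference between the venetian blinds partitioning used here for the SOMs and the contiguous blocks used for PLS can be seen in a few lines of Python. The sketch uses 0-based indices, so fold i holds samples i, i + k, i + 2k, and so on; the helper names and the generic fit_and_score callback are hypothetical.

```python
import numpy as np


def venetian_blind_folds(n_samples, k=10):
    """Fold i (0-based) holds samples i, i + k, i + 2k, ... (every kth sample)."""
    return [np.arange(i, n_samples, k) for i in range(k)]


def contiguous_block_folds(n_samples, k=10):
    """Fold i holds one block of consecutive samples."""
    return np.array_split(np.arange(n_samples), k)


def kfold_score(n_samples, k, fit_and_score, splitter=venetian_blind_folds):
    """Mean of fit_and_score(train_idx, test_idx) over the k folds.

    Each fold is withheld exactly once; the model is trained on the rest.
    """
    all_idx = np.arange(n_samples)
    scores = []
    for test_idx in splitter(n_samples, k):
        train_idx = np.setdiff1d(all_idx, test_idx)
        scores.append(fit_and_score(train_idx, test_idx))
    return float(np.mean(scores))
```

For 12 samples and k = 4, venetian blinds puts samples 0, 4 and 8 in the first fold, whereas contiguous blocks puts samples 0, 1 and 2 there. With an ordered dataset, contiguous blocks can leave whole groups of similar samples out of a training fold, whereas venetian blinds interleaves them.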
Map sizes ranging from 12 × 12 to 24 × 24 neurons and 100 to 30 000 epochs were explored during SOM optimization by means of the GA. The bubble plots in Figure S9, Supporting Information, summarize the relative frequency of occurrence and the mean optimization criterion for each model. The optimal architecture selected for each model was used for the subsequent SOM calculations: 24 × 24 neurons and 10 000 epochs for the CPN, and 22 × 22 neurons and 2000 epochs for the SKN. Although the XYF map with 20 × 20 neurons and 30 000 epochs had the highest relative frequency, its mean fitness value was below average. Therefore, another set of hyperparameters (24 × 24 neurons and 5000 epochs), with the highest fitness value and the second highest frequency of selection, was used for the XYF calculation for further comparison.
Homopolymers of the standard Monomers A and B (shown in Figure 3a) produced low levels of bacterial attachment (Class 1 or 2), whilst medium levels of bacterial attachment (Class 3) were observed for Monomers C-F. Most homopolymers and copolymers formed from Monomers 9-11, which contain either ethylene glycol or propylene glycol functional side groups, produced high levels of bacterial attachment (Classes 4 and 5). The lowest bacterial attachment to homopolymers was observed for Monomer 13 ((0.5 ± 0.3) × 10⁶), whilst Monomer 1 was the most effective at maintaining low levels of bacterial attachment (Class 2) when mixed with standard Monomers C-F. The lowest overall bacterial attachment (Class 1) was observed for homopolymers of Monomer A and for copolymers produced using Monomers A or B with Monomers 1, 2, or 12. ToF-SIMS spectra collected for each sample in the microarray were coupled with the mean F value of the sample to form a new dataset, which was analyzed with PLS regression, a technique previously used for correlating surface chemistry with cell attachment. [28,48,49] The R² value for the PLS model shown in Figure 3b was 0.70, indicating a linear relationship between the predicted and measured bacterial attachment for this new set of polymer microarrays. This linear relationship suggests that the attachment of P. aeruginosa depends on the surface chemistry of the polymers, as represented by the ToF-SIMS data. The regression coefficients (RCs) shown in Figure 3c represent the influence of each secondary ion on bacterial attachment: ions with large positive RCs were associated with high bacterial attachment, and those with large negative RCs were associated with high resistance to bacterial attachment. These results were consistent with those reported by Hook et al. (2012), [28] where most oxygen-containing ions from ethylene glycol (C2H2O2−) and hydroxyl groups (C2H6O+, C4H9O+) were correlated with high bacterial attachment, and secondary ions representing cyclic carbon groups (C4H5−, C7H3−, C5H5O−) and aliphatic groups (CH3+) were associated with high bacterial resistance. However, some cyclic carbon groups (C8H8+, C10H12+) were also found to be correlated with high bacterial attachment. Moreover, fluorine-containing (positive RC: … surface chemistry and bacterial attachment are more complicated than is indicated by the linear PLS model. Given that the same fragment ion can be associated with more than one chemical structure, this is not unexpected. As such, complementary methods are needed to explore these relationships further.

Comparison of SOMs Supervised by Bacterial Attachment Data
We used SOMs to investigate the bacteria-material interactions underpinning the observed bacterial attachment and compared the output weights of the SOMs with the PLS model. If bacterial attachment is modulated by surface chemistry, samples clustered by molecular composition in the SOM models should also be clustered by bacterial attachment class. This approach also allowed identification of the molecular functionalities associated with low, intermediate, and high attachment. We first needed to decide which SOM algorithm was best suited to our dataset. Figure 4 compares supervised CPN, SKN, and XYF SOMs trained on the same ToF-SIMS dataset. Note that the hyperparameters for each model were chosen using a GA, as discussed in the Experimental section. Figure 4a-c shows SOMs supervised with the five bacterial attachment classes. As an additional test of SOM supervision performance, Figure 4d-f shows the corresponding SOMs with randomly assigned class memberships. We use this class scrambling to study the influence of the class labels on the clustering in the SOM.
To quantify this effect, we calculated Moran's I (a measure of spatial autocorrelation) for each SOM based on neuron class assignment. Note that the algorithm was modified to suit a toroidal map topology, allowing even spacing between samples. A random arrangement of classes on the SOM would produce a Moran's I value close to 0, while well-discriminated clusters would produce a value closer to 1. We also calculated the topographic error (TE), the fraction of samples whose best and second-best winning units are not neighbors on the map, to evaluate local map discontinuity. [50,51] This allowed us to determine the influence of the assigned classes during SOM training; that is, an increase in TE associated with class assignment suggests that the class labels are decreasing the topological accuracy of the model. Finally, all SOMs were used to predict the class assignments of the test set (10% of randomly selected samples excluded from training) to study the predictive ability of each model. Figures 4a and 4d show CPN topographic maps in which the classes are determined by bacterial attachment and by random assignment, respectively. The same TE was expected for both CPN maps, because in the CPN the class labels do not affect the model topology. Consistent with this, the replicates of homopolymer A (outlined in red) are well clustered in both cases. There were significantly more unassigned neurons when random class assignment was used. These neurons were equally associated with more than one class, and their increased abundance reflects the lack of a relationship between the input ToF-SIMS data and the assigned classes. The bacterial attachment classes (Figure 4a) form domains on the topographic map, consistent with a structure-property relationship between molecular surface composition and bacterial attachment, resulting in a moderate Moran's I value of 0.35.
In contrast, the randomly assigned classes (Figure 4d) are scattered across the topographic map, as indicated by the low Moran's I value. Figures 4b and 4e show SKN topographic maps with the bacterial and random class assignments, respectively. Compared with the CPN model (Figure 4a), the bacterial attachment classes formed larger, more uniform domains on the SKN map (Figure 4b), giving a higher Moran's I value associated with the supervision. The TE, however, was slightly higher for the SKN than for the CPN, especially when random labels were used, indicating that class assignment influenced the topological accuracy of the SOM. Interestingly, the Moran's I value for the SKN with random labels was lower than that of the equivalent CPN, although only marginally. Finally, the prediction accuracy was slightly higher for the SKN than for the CPN; the SKN would therefore be preferable for predictive purposes. Figures 4c and 4f show XYF topographic maps, again with the bacterial and random class assignments, respectively. In this case, the XYF maps differed clearly from both the CPN and SKN maps. For both the bacterial and random labels, very large class domains formed on the SOM, with correspondingly high Moran's I values. Given that randomly assigned classes cannot have any ordered relationship with surface chemistry, this result shows that the outcome is strongly biased by the class assignment itself. This is further supported by the higher TE compared with the CPN and SKN. Additionally, replicates of homopolymer A (outlined in red) are well clustered for the bacterial attachment classes (Figure 4c) but not for the randomly assigned classes (Figure 4f).
Given the Moran's I values and prediction accuracies for each map, particularly for the pseudo-supervised CPN, these results provide strong evidence for a relationship between surface chemistry and bacterial attachment, consistent with previous quantitative reports [28,52] and with the PLS regression results. Although the XYF model clustered the classes well, the high TEs obtained for the XYF maps indicate that the assigned classes strongly biased the model, leading to a loss of topological information. This is detrimental to our study, in which we sought to investigate the relationships between data topology and bacterial attachment. While the SKN provided better clustering quality than the CPN, the TE was slightly lower for the CPN, indicating higher topological accuracy. Nevertheless, given the similar results obtained with the CPN and SKN, the weights of both maps were studied to explore the relationship between surface chemistry and bacterial attachment further.
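Both map-quality measures used in this comparison can be computed in a few lines. The sketch below is one possible implementation, not the modified algorithm used in this work: it treats the class of each neuron as a numeric label, excludes unassigned neurons (NaN), uses binary rook (4-neighbor) weights with wrap-around for Moran's I, and counts the best and second-best matching units as neighbors when they are within one step in any direction (including diagonals) on the torus. All of these are assumptions.

```python
import numpy as np


def morans_i_torus(class_map):
    """Moran's I of neuron class labels on a toroidal grid (rook adjacency).

    class_map: 2-D array of numeric class labels; NaN marks unassigned neurons,
    which are excluded (an assumption of this sketch).
    """
    z = np.asarray(class_map, dtype=float)
    valid = ~np.isnan(z)
    dev = np.where(valid, z - np.nanmean(z), 0.0)
    num, w_sum = 0.0, 0.0
    # The four rook neighbors, with wrap-around at the map edges
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        nb_dev = np.roll(dev, shift, axis=axis)
        pair = valid & np.roll(valid, shift, axis=axis)
        num += np.sum(dev * nb_dev * pair)
        w_sum += np.sum(pair)
    n = valid.sum()
    return float(n / w_sum * num / np.sum(dev ** 2))


def topographic_error(X, W):
    """Fraction of samples whose best and second-best matching units are not
    adjacent (8-neighborhood) on the toroidal map with weights W (rows, cols, d)."""
    rows, cols, d = W.shape
    flat = W.reshape(-1, d)
    dist = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    r1, c1 = np.unravel_index(order[:, 0], (rows, cols))
    r2, c2 = np.unravel_index(order[:, 1], (rows, cols))
    dr = np.abs(r1 - r2)
    dr = np.minimum(dr, rows - dr)  # wrap-around distance along rows
    dc = np.abs(c1 - c2)
    dc = np.minimum(dc, cols - dc)  # wrap-around distance along columns
    return float(np.mean(np.maximum(dr, dc) > 1))
```

A perfect checkerboard of two classes on an even-sized torus gives Moran's I = -1, while two solid half-map domains on a 4 × 4 torus give 0.5, illustrating how larger class domains push the value towards 1.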

Analysis of Clusters Correlated with Bacterial Attachment
Weights from the Class 1 and Class 5 neurons in the CPN and SKN, which represent the polymers with the lowest and highest bacterial attachment, respectively, were selected for further analysis to compare the chemical groups associated with high bacterial resistance and high attachment directly. … presented in Figures S1 and S2, Supporting Information. The CPN and SKN topographic maps in Figure 5, color coded with class weights, provide more insight into the clusters of Class 1 and Class 5 neurons than the class assignment maps shown in Figure 4: the class assignment maps only show a binary assignment for each class, whereas the weights show the continuous assignment fraction for each neuron. For example, although Clusters C and D in the CPN class assignment map (Figure 4a) appear to be connected, the Class 5 weight map (Figure 5a, red) shows that these Class 5 neurons are grouped into two sub-clusters separated by neurons with lower weights. For both models, the smaller Class 1 clusters are labelled A and the larger ones B; the Class 5 clusters are then labelled alphabetically according to their distance from the Class 1 clusters.
Based on Figure 5, the SKN better separated the Class 1 clusters, since Cluster A and Cluster B were separated from each other, whereas the CPN appeared to identify the different Class 5 clusters more clearly, with four major chemical groups associated with high bacterial attachment instead of two in the SKN map.
The 17 secondary ions with the highest mean weights in each cluster are summarized in Table 2 and may be used to explore the clusters in the CPN and SKN further. The secondary ions in Table 2 are color coded by their associated chemical functional groups. Note that a separate SKN (Figure S3, Supporting Information) was calculated using the replicates of the homopolymers, and the weights of each set of replicates were analyzed to assist with assigning functional groups. The top 15 weights for each set of homopolymers are summarized in Figure S4, Supporting Information, illustrating the chemical group assignment process.
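Ranking ions by their mean weight within a cluster, as done for Table 2, can be sketched as follows. The boolean cluster mask is a hypothetical input: the text does not specify how cluster boundaries were drawn on the maps, so the mask must be supplied by the analyst, and the function name is illustrative.

```python
import numpy as np


def top_cluster_ions(W, cluster_mask, ion_labels, k=17):
    """Rank secondary ions by their mean weight over the neurons of one cluster.

    W: (rows, cols, n_ions) Kohonen-layer weights
    cluster_mask: (rows, cols) boolean array selecting the cluster's neurons
    ion_labels: one name per ion (last axis of W)
    Returns a list of (ion label, mean weight) pairs, largest first.
    """
    mean_w = W[cluster_mask].mean(axis=0)  # average weight of each ion
    top = np.argsort(mean_w)[::-1][:k]     # indices of the k largest means
    return [(ion_labels[i], float(mean_w[i])) for i in top]
```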
Clusters A and B in the CPN and SKN have very similar weights, which are plotted against each other in Figures S5 and S6, Supporting Information, highlighting the consistency of the SOM models. While a high weighting of nitrogen-containing secondary ions is seen for all Class 1 clusters, Cluster A, the smaller Class 1 cluster, also has a high weighting for fluorine-containing secondary ions. This suggests that, in addition to amine/nitrogen-containing groups, surfaces with a combination of amine and fluorine functional groups exhibit good bacterial resistance. Figure 5 shows that the SKN was more effective in separating the fluorine-containing samples from the other amine-containing materials. A selection of the major secondary ions associated with high bacterial attachment and resistance is illustrated in Figure 6 for improved visualization. Some secondary ions, such as C2H4O+ and C2H4O2+, may be derived from either glycol or hydroxyl groups, which could complicate conventional linear multivariate analysis.
For the CPN Class 5 clusters, Cluster C consisted mostly of glycol-containing secondary ions, Cluster D mostly of phenyl, Cluster E mostly of hydroxyl, and Cluster F mostly of cyclic moieties. In the SKN, these four chemical groups merged into two major clusters: Cluster C contained a high proportion of glycol and hydroxyl groups, and Cluster D contained mostly phenyl and other cyclic groups. The ability of the SKN to identify a contaminated sample is demonstrated by the isolated Class 5 neuron containing sample 12F(3). High levels of nitrogen- and phenyl-containing secondary ions suggest that 12F(3) is contaminated, since Monomers 12 and F contain neither amine nor phenyl groups. The nitrogen and phenyl contamination may have originated from residual DMF (C₃H₇NO) and DMPA (C₁₆H₁₆O₃), respectively, or from inadequate washing of the microarray pins during generation of the polymer spots, with potential crosstalk from the washing procedures. The sample was still classified as Class 5, even though nitrogen-containing secondary ions were previously shown to be associated with negligible bacterial attachment.
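Spotting an isolated neuron, such as the one holding 12F(3), can be sketched as a neighbourhood check on the map's class labels. This sketch assumes a rectangular grid with four-connected neighbours and one class label per neuron (for example, the class of the samples mapped to it); SOMs are often hexagonal, in which case the neighbourhood definition would need adapting. The function name is hypothetical.

```python
def isolated_neurons(labels):
    """Flag neurons whose class label differs from every 4-connected neighbour.

    labels: 2D list of per-neuron class labels (rectangular grid assumed).
    """
    rows, cols = len(labels), len(labels[0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            # Collect labels of the up/down/left/right neighbours on the grid
            neighbours = [labels[r + dr][c + dc]
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if all(label != labels[r][c] for label in neighbours):
                flagged.append((r, c))
    return flagged


# Hypothetical 3 x 3 map: a single Class 5 neuron surrounded by Class 1
print(isolated_neurons([[1, 1, 1], [1, 5, 1], [1, 1, 1]]))
```

Samples mapped to a flagged neuron would then be inspected individually, as was done for 12F(3), rather than removed automatically.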
From Table 2, it appears that the CPN more clearly separated the different functional groups associated with bacterial attachment, since the phenyl-containing samples were separated from those with glycol and hydroxyl moieties. However, among the phenyl-containing monomers, only Monomer 9 was found in the Class 5 clusters, even though Monomers 7 and 8 also contain a phenyl group. Most of the samples located in the Class 5 clusters contain Monomers 9, 10 and 11, suggesting that the major chemical group contributing to high bacterial attachment was the long glycol chain, not the phenyl group of Monomer 9. Hence, the SKN appeared to identify the main functional groups associated with high bacterial attachment more clearly than the CPN, because of its supervised learning process.
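The difference in supervision between the two networks lies mainly in how the winning neuron is chosen. Following the usual descriptions of these networks in the SOM literature (not necessarily the exact implementation used in this study), a CPN selects the winner from the ToF-SIMS input layer alone, whereas an SKN concatenates the input vector with a scaled class vector, so the class label also influences the winner. The scaling factor `alpha`, the function names, and the toy weights below are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def winning_neuron(x, y, x_weights, y_weights, mode, alpha=1.0):
    """Return the index of the best-matching neuron for one sample.

    x: ToF-SIMS vector; y: one-hot class vector.
    mode="cpn": winner chosen from the X (input) layer only.
    mode="skn": winner chosen on the concatenation [x, alpha * y], i.e. the
    class-label distance is added with weight alpha ** 2.
    """
    best_index, best_distance = None, float("inf")
    for i, (wx, wy) in enumerate(zip(x_weights, y_weights)):
        distance = squared_distance(x, wx)
        if mode == "skn":
            distance += alpha ** 2 * squared_distance(y, wy)
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index


# Hypothetical two-neuron map: the spectrum is closest to neuron 0,
# but the sample's class label matches neuron 1.
x_weights = [[0.1, 0.0], [0.2, 0.0]]
y_weights = [[0.0, 1.0], [1.0, 0.0]]
sample_x, sample_y = [0.0, 0.0], [1.0, 0.0]
print(winning_neuron(sample_x, sample_y, x_weights, y_weights, "cpn"))
print(winning_neuron(sample_x, sample_y, x_weights, y_weights, "skn"))
```

With `alpha=0.1` the label term becomes small and the SKN winner reverts to neuron 0, which shows how the relative scaling controls how strongly class labels shape the map.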

Using the SOM to Identify Key Interaction Variables
It is valuable to compare the PLS regression and SOM results directly to identify both similar and conflicting outcomes. First, we note that both methods identified C₂H₂O₂⁻ (ethylene glycol), C₄H₉O⁺ (hydroxyl), and C₈H₈⁺ (cyclic) as associated with bacterial attachment, and CH⁺ (aliphatic), C₃H₃O₂F₂⁻, and C₂H₂F₅⁻ as associated with high bacterial resistance. However, PLS regression did not identify some of the fragment ions associated with high or low bacterial attachment, such as C₅H₈O₂⁺, C₄H₅O₂⁺, and ions containing amine groups. One possible reason is that a given fragment ion may not be unique to a specific polymer and hence may not correlate strongly (positively or negatively) with bacterial attachment on its own. This is critical, as it indicates a limitation of the PLS regression approach, which seeks to identify correlations between individual fragment ions and bacterial attachment. The SOM methods, on the other hand, explore the relationship between data topology and bacterial attachment and therefore reveal information complementary to PLS. In particular, the SOM identified clusters of surface chemistries with high or low bacterial attachment, from which the combinations of fragment ions associated with each cluster could be studied and compared.
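The argument that a shared fragment can correlate weakly with attachment on its own, yet strongly when combined with a second fragment, can be illustrated with a toy calculation. The four "surfaces" below are synthetic and only demonstrate the arithmetic; they are not data from this study, and `pearson` is a plain helper written for the sketch.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Synthetic toy data: ion_a appears on two surfaces, but attachment is
# highest only where ion_a co-occurs with ion_b.
ion_a = [0.0, 1.0, 0.0, 1.0]
ion_b = [0.0, 0.0, 1.0, 1.0]
attachment = [0.5, 0.0, 0.5, 1.0]
product_ab = [a * b for a, b in zip(ion_a, ion_b)]

r_single = pearson(ion_a, attachment)            # zero: no marginal correlation
r_interaction = pearson(product_ab, attachment)  # about 0.82
print(round(r_single, 3), round(r_interaction, 3))
```

A method that screens ions one at a time would see no relationship for `ion_a` here, even though its co-occurrence with `ion_b` tracks attachment closely.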
As an additional validation of the SOM results, and to show how PLS regression and the SOM can be used together, we calculated a second PLS regression model, this time including additional interaction variables. These interaction variables were calculated from the three highest-weighted positive and three highest-weighted negative secondary ions associated with each of the Class 1 and Class 5 SKN clusters. The intensities of each positive ion were multiplied by those of each negative ion within the same cluster, giving nine additional interaction variables per cluster (36 interaction variables in total across the four clusters).
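The construction of the interaction variables can be sketched as follows, assuming the ion intensities are held in a dictionary keyed by ion label. The helper name, the key format, and the placeholder ion labels are hypothetical, and any scaling applied to the new columns before PLS is not reproduced here.

```python
from itertools import product


def interaction_variables(intensities, clusters):
    """Build positive-ion x negative-ion interaction terms for each cluster.

    intensities: dict mapping ion label -> list of per-sample intensities
    clusters: dict mapping cluster name -> (top positive ions, top negative ions)
    """
    terms = {}
    for name, (pos_ions, neg_ions) in clusters.items():
        # Every positive ion is paired with every negative ion of the same cluster
        for pos, neg in product(pos_ions, neg_ions):
            key = f"{name}:{pos}*{neg}"
            terms[key] = [a * b for a, b in zip(intensities[pos], intensities[neg])]
    return terms


# Placeholder ions: three positive and three negative per cluster, two samples
intensities = {f"{c}{s}{k}": [1.0 + k, 2.0]
               for c in "ABCD" for s in "+-" for k in range(3)}
clusters = {c: ([f"{c}+{k}" for k in range(3)], [f"{c}-{k}" for k in range(3)])
            for c in "ABCD"}
terms = interaction_variables(intensities, clusters)
print(len(terms))
```

Three positive ions times three negative ions gives nine terms per cluster, and four clusters give the 36 variables described above.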
We compared the two PLS regression models to investigate whether the additional interaction variables improved the model. A slightly higher R² value (Figure S7a, Supporting Information) and a lower root mean square error of cross-validation (RMSECV) (Figure S7b, Supporting Information) were obtained after adding the interaction variables, indicating that they slightly improved the prediction accuracy of the PLS model. Interaction variables with either strongly positive or strongly negative correlation with bacterial attachment were also identified. Table 3 shows the six most strongly positively and negatively correlated interaction variables (the RCs calculated for all 36 interaction variables are shown in Table S8, Supporting Information). For comparison, Table 3 also shows the RCs of each individual ion in the interaction, as calculated from the original PLS regression model without interaction variables. Several important observations can be made from Table 3. In all cases, the interaction variables exhibited RCs of larger magnitude than their constituent fragment ions alone. Interaction variables combining cyclic hydrocarbon ions with oxygen-containing ions were the most positively correlated with bacterial attachment. This agrees with the SKN result, which indicated that a phenyl group attached to a long glycol chain (Monomer 9) had a much stronger bacterial attachment ability than a phenyl group alone (Monomers 7 and 8). Importantly, the C₄H₂O₂⁻, C₂HO⁻, and C₄HO⁻ ions were not individually strongly positively correlated with attachment, instead exhibiting low positive or even negative correlation. This suggests that these ions are not generally correlated with bacterial attachment but are strongly correlated with it when present together with cyclic hydrocarbon ions.
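How RMSECV can reveal the benefit of an interaction variable is sketched below with a from-scratch NIPALS PLS1 model and leave-one-out cross-validation. Both choices are assumptions: the cross-validation scheme and preprocessing used in the study are not stated in this section, this is not the authors' software, and the data are synthetic.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    components = []  # (weights w, X-loadings p, y-loading q) per component
    for _ in range(min(n_components, m)):
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-10:  # no remaining X direction covaries with y
            break
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the variation captured by this component
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one sample by repeating the centring and deflation steps."""
    x_mean, y_mean, components = model
    xc = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(xc, w))
        y_hat += q * t
        xc = [a - t * b for a, b in zip(xc, p)]
    return y_hat


def rmsecv(X, y, n_components):
    """Leave-one-out root mean square error of cross-validation."""
    press = 0.0
    for k in range(len(X)):
        model = pls1_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], n_components)
        press += (y[k] - pls1_predict(model, X[k])) ** 2
    return math.sqrt(press / len(X))


# Synthetic data (not from the study): the response depends on the product
# of two variables, plus a small deterministic perturbation.
grid = [(a, b) for a in range(4) for b in range(4)]
y = [a * b + 0.05 * ((7 * i) % 5 - 2) for i, (a, b) in enumerate(grid)]
X_main = [[a, b] for a, b in grid]          # individual variables only
X_inter = [[a, b, a * b] for a, b in grid]  # plus one interaction variable
rmse_main = rmsecv(X_main, y, n_components=2)
rmse_inter = rmsecv(X_inter, y, n_components=3)
print(f"RMSECV without interaction: {rmse_main:.3f}, with: {rmse_inter:.3f}")
```

Because the synthetic response is driven by the product term, only the model that includes it can capture that structure, so its RMSECV drops sharply; with real ToF-SIMS data the improvement was, as reported above, much smaller.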
The results in Table 3 also help to explain why positive RCs were obtained for some cyclic groups in the original PLS model (Figure 3), even though cyclic groups were previously found to be associated with high bacterial resistance. [28] The interaction variables featuring nitrogen- and fluorine-containing ions further emphasize the limitations of the original PLS regression model. Individually, both C₄H₄N⁺ and F⁻ were found to be positively correlated with bacterial attachment. However, the interaction of the two ions gave a highly negative RC. In fact, all interaction terms for nitrogen- and fluorine-containing ions were found to be associated with bacterial resistance (Table S9, Supporting Information). This consistent result further supports the SKN finding that fluorine groups combined with amine groups lead to high P. aeruginosa resistance. Further studies are needed to fully understand why the combination of these two functional groups prevents P. aeruginosa attachment. Several factors, such as hydrophobicity, chemical reactions between the polymer and bacterial surfaces, and the orientation and arrangement of the functional groups, could contribute to this observation.
Together, these results highlight the importance of complementing standard PLS regression analysis with techniques such as the SOM, which can provide further information about potential interactions between fragment ions. In this regard, understanding data topology is critical to accurately interpret and improve upon the information provided by PLS.

Conclusion
The relationship between ToF-SIMS spectra and P. aeruginosa attachment for a set of polymer combinations previously found to be associated with low, moderate, and high resistance to bacterial attachment was assessed by PLS and SOM models. The observed correlation between surface chemical data and bacterial attachment suggests that for the polymeric materials studied, the surface chemistry plays a key determining role in the bacterial response.
We explored different SOM variants and their ability to reveal information about structure-property relationships in a polymer microarray. In our case, the use of class labels biased the ability of the XYF to learn the data topology, which is undesirable when the aim is to identify the relationship between polymer surface chemistry and performance. For the SKN, while class membership clearly influenced topographic map formation, the modelled topology much more closely resembled that of the pseudo-supervised CPN, suggesting a lower influence of the class labels and therefore a higher influence of the ToF-SIMS molecular data. Comparing the weights of the clusters representing the highest and lowest bacterial attachment for the CPN and SKN suggested that the SKN had a stronger ability to cluster and identify the major chemical functional groups associated with bacterial resistance and bacterial attachment. The weights of the SKN suggested that fluorine- and nitrogen-containing groups hinder bacterial attachment, while hydroxyl and long glycol groups promote it. The weights also showed that contaminated samples and other outliers could be identified in the SKN topographic map as isolated neurons. This ability of the SKN allows contaminated samples and other outliers, which could complicate conventional regression analyses, to be excluded with justification.
While PLS is a powerful tool for detecting correlations between individual ions and bacterial attachment, the SOMs detected ions associated with low and high P. aeruginosa resistance that were not easily detected by the PLS model. Moreover, the SOMs identified different classes of materials with different chemistries that behaved similarly in terms of bacteria-surface interactions. Combining a SOM with PLS showed that interaction variables are important and highlighted limitations of the original PLS model. Coupling these two data analysis methods is therefore crucial for a deeper understanding of the complex interactions between surface chemistry and bacterial surfaces. The coupled approach could also be applied to other types of bacteria to assess whether the bacterial resistance of the polymer array is universal.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.