Combination of Structure Databases, In Silico Fragmentation, and MS/MS Libraries for Untargeted Screening of Non-Volatile Migrants from Recycled High-Density Polyethylene Milk Bottles

Chemical contamination is one of the major obstacles to the mechanical recycling of plastics. In this article, we built and open-sourced an in-house MS/MS library containing more than 500 plastic-related chemicals and developed mspcompiler, an R package for compiling libraries from various sources. We then proposed a workflow to process untargeted screening data acquired by liquid chromatography high-resolution mass spectrometry. These tools were subsequently applied to data from recycled high-density polyethylene (rHDPE) obtained from milk bottles. A total of 83 compounds were identified, 66 of which were readily annotated using our in-house MS/MS libraries and the mspcompiler R package. In silico fragmentation, combined with data obtained from gas chromatography–mass spectrometry and lists of plastic-related chemicals, was used to identify the remaining unknowns. A pseudo-multiple reaction monitoring method was also applied to sensitively target and screen the identified chemicals in the samples. Quantification results indicated that good sorting of postconsumer materials and better recycling technology may be necessary for food contact applications. Removal or reduction of non-volatile substances, such as octocrylene and 2-ethylhexyl-4-methoxycinnamate, remains challenging but is vital for the safe use of rHDPE as a food contact material.


The mspcompiler R package
As detailed on GitHub (https://github.com/QizhiSu/mspcompiler), the objective of the mspcompiler R package is to provide a means of compiling either EI or tandem mass spectral libraries from various sources, such as NIST (if you have it installed), MoNA, and GNPS, and organizing them into a neat and up-to-date *.msp file that can be used in MS-DIAL.
If you have the NIST library installed, you can follow the step-by-step instructions on the abovementioned website to convert it into *.msp format. However, the converted version contains neither SMILES (used for visualising the chemical structure in MS-DIAL) nor retention indices (RI, useful for improving the reliability of identification in GC-MS data processing). For this reason, the package offers a function to extract chemical structures and assign the corresponding SMILES to the *.msp file, so that the analyst can get a general idea of the chemical structure of the candidates, which is useful for data interpretation. Different manipulations may be required depending on the specific publicly available EI libraries that need to be compiled. For instance, if you are working with the MoNA library, you may need to use the reorganize_mona() function to reorganize the SMILES and clean the chemical names.
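The idea of attaching SMILES to *.msp entries can be illustrated with a minimal Python sketch. It is not the mspcompiler implementation: the functions parse_msp and add_smiles, the name-based lookup, and the placeholder peaks are all assumptions made for illustration, and the parser assumes one "m/z intensity" pair per line.

```python
import re


def parse_msp(text):
    """Split MSP text into records with header fields and a peak list.

    Assumes records are separated by blank lines and that each peak line
    holds a single "m/z intensity" pair (an illustrative simplification).
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, peaks = {}, []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if line[0].isdigit():
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip().upper()] = value.strip()
        if fields or peaks:
            records.append({"fields": fields, "peaks": peaks})
    return records


def add_smiles(records, smiles_by_name):
    """Add a SMILES field where the (lower-cased) name is in the lookup.

    Returns the names that could not be matched, so they can be reviewed.
    """
    missing = []
    for record in records:
        name = record["fields"].get("NAME", "")
        smiles = smiles_by_name.get(name.lower())
        if smiles:
            record["fields"]["SMILES"] = smiles
        else:
            missing.append(name)
    return missing


# Placeholder spectrum: the peak values are not real data.
example = """NAME: Octocrylene
Num Peaks: 2
100.0 999
200.0 500
"""
lookup = {"octocrylene": "CCCCC(CC)COC(=O)C(=C(c1ccccc1)c1ccccc1)C#N"}
records = parse_msp(example)
unmatched = add_smiles(records, lookup)
```

Matching on a structure identifier such as an InChIKey would usually be more robust than matching on names; names are used here only to keep the sketch short.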
For EI libraries, the package offers a way to extract the reference RI from NIST and assign them to the *.msp file according to the polarity of the column used. For chemicals that have no reference RI in NIST, you can export their SMILES, estimate their RI using a prediction model, for example the one developed by Matyushin et al. (Matyushin, Dmitriy D., Anastasia Yu Sholokhova, and Aleksey K. Buryak. 2019. "A Deep Convolutional Neural Network for the Estimation of Gas Chromatographic Retention Indices." Journal of Chromatography A 1607: 460395. https://doi.org/10.1016/j.chroma.2019.460395), and then assign the predicted RI to the *.msp file. In practice, it is helpful to round the reference RI to the nearest integer while keeping two decimal places for the predicted RI, because reference and predicted values can then be told apart at a glance when results are checked manually in MS-DIAL.
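The rounding convention described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's code: the function names and the dictionary-based inputs are hypothetical, and the RI values in the usage lines are placeholders.

```python
def format_ri(value, predicted):
    """Write reference RI as an integer and predicted RI with two decimals."""
    if predicted:
        return f"{value:.2f}"
    return str(round(value))


def assign_ri(keys, reference_ri, predicted_ri):
    """Prefer the reference RI; fall back to the predicted RI if available.

    keys: compound identifiers (hypothetical, e.g. InChIKeys or names).
    reference_ri / predicted_ri: dicts mapping identifier -> RI value.
    Compounds with neither value are left out of the result.
    """
    assigned = {}
    for key in keys:
        if key in reference_ri:
            assigned[key] = format_ri(reference_ri[key], predicted=False)
        elif key in predicted_ri:
            assigned[key] = format_ri(predicted_ri[key], predicted=True)
    return assigned


def is_predicted(ri_text):
    """Under this convention, a decimal point marks a predicted value."""
    return "." in ri_text


# Placeholder values, for illustration only.
ri = assign_ri(["A", "B"], {"A": 1712.4}, {"B": 1550.0})
flags = {key: is_predicted(value) for key, value in ri.items()}
```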
For MS/MS libraries, after converting the NIST library into *.msp format, you may need to split it into two parts by polarity, since MS-DIAL processes positive and negative mode data separately. If you are working with the GNPS library, you may also need to convert it from *.mgf to *.msp format in addition to the polarity separation.
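A minimal Python sketch of the two manipulations mentioned here (converting *.mgf entries to *.msp text and separating entries by polarity) is given below. It is not the mspcompiler implementation; the function names are hypothetical, and the sketch assumes that each MGF entry carries PEPMASS, IONMODE, and NAME fields as KEY=VALUE lines, which should be checked against the actual file.

```python
def parse_mgf(text):
    """Read BEGIN IONS / END IONS blocks into field dicts and peak lists."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {"fields": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                entries.append(current)
            current = None
        elif current is not None and line:
            if line[0].isdigit():
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
            elif "=" in line:
                # Split on the first "=" only, since SMILES may contain "=".
                key, value = line.split("=", 1)
                current["fields"][key.strip().upper()] = value.strip()
    return entries


def to_msp(entry):
    """Render one parsed entry as MSP-style text (field choice is illustrative)."""
    fields = entry["fields"]
    lines = [f"NAME: {fields.get('NAME', 'Unknown')}"]
    if fields.get("PEPMASS"):
        # PEPMASS may be "m/z intensity"; keep only the m/z.
        lines.append(f"PRECURSORMZ: {fields['PEPMASS'].split()[0]}")
    for key in ("IONMODE", "SMILES"):
        if key in fields:
            lines.append(f"{key}: {fields[key]}")
    lines.append(f"Num Peaks: {len(entry['peaks'])}")
    lines += [f"{mz}\t{intensity}" for mz, intensity in entry["peaks"]]
    return "\n".join(lines) + "\n"


def split_by_polarity(entries):
    """Group entries by ion mode; entries without a recognizable mode are skipped."""
    groups = {"positive": [], "negative": []}
    for entry in entries:
        mode = entry["fields"].get("IONMODE", "").lower()
        if mode.startswith("pos"):
            groups["positive"].append(entry)
        elif mode.startswith("neg"):
            groups["negative"].append(entry)
    return groups
```

Each group can then be written to its own file by joining the to_msp() output of its entries with blank lines, giving one positive and one negative library.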
Once each library has been cleaned and reorganized, the libraries can be combined and exported in *.msp format.

Construction of the in-house library
The in-house library was built following the strategy proposed by Tada et al. (Tada, Ipputa et al. 2019. "Creating a Reliable Mass Spectral-Retention Time Library for All Ion Fragmentation-Based Metabolomics." Metabolites 9(251): 1-15.). Standards were, in principle, prepared in methanol and combined into mixtures of 10 standards each. No molecular formula was allowed to occur more than once within a mixture, so that each formula could be attributed to a single standard. Each mixture was injected at 5 different concentrations into the UPLC-QTOF-MS system under the following conditions, which are identical to those used in this study. A Waters Acquity UPLC equipped with an Atlantis™ Premier BEH C18 AX column (2.1 × 100 mm, 1.7 μm particle size; Milford, MA, USA) was employed for the separation. The column temperature was set at 40 °C and the flow rate at 0.3 mL/min. Water and methanol, both containing 0.1% formic acid, were mobile phases A and B, respectively, in both positive and negative modes. A 13 min run was used with the following gradient: the initial mobile phase A/B 95/5 was shifted to A/B 0/100 in 7 min, held for 4 min, returned to the initial composition in 0.1 min, and maintained for an additional 1.9 min to re-equilibrate the system for the next injection. The injection volume was 10 μL.
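The mixing rule (at most 10 standards per mixture, no molecular formula repeated within a mixture) can be sketched as a simple greedy assignment in Python. This is an illustration only, not the procedure actually used to design the mixtures; the function name is hypothetical, and formulas are assumed to be written in one consistent notation so that string comparison is meaningful.

```python
def assign_mixtures(standards, max_size=10):
    """Greedily place each standard in the first mixture that has room and
    does not already contain the same molecular formula.

    standards: list of (name, formula) tuples.
    Returns a list of mixtures, each a list of standard names.
    """
    mixtures = []  # each item: {"names": [...], "formulas": set()}
    for name, formula in standards:
        for mix in mixtures:
            if len(mix["names"]) < max_size and formula not in mix["formulas"]:
                break  # this mixture can take the standard
        else:
            # No existing mixture fits, so open a new one.
            mix = {"names": [], "formulas": set()}
            mixtures.append(mix)
        mix["names"].append(name)
        mix["formulas"].add(formula)
    return [mix["names"] for mix in mixtures]


# Isomers sharing a formula end up in different mixtures.
plan = assign_mixtures([
    ("Dibutyl phthalate", "C16H22O4"),
    ("Diisobutyl phthalate", "C16H22O4"),
    ("Octocrylene", "C24H27NO2"),
])
```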
The QTOF-MS was coupled to the UPLC through an electrospray ionization (ESI) probe. The conditions were as follows: resolution mode, capillary voltage 3.0 kV, sampling cone voltage 45 V, extraction cone voltage 4.0 V, source temperature 150 °C, desolvation temperature 350 °C, cone gas flow rate 40 L/h, and desolvation gas flow rate 600 L/h. Data-independent acquisition (DIA, i.e., MSE), with a low energy of 6 V and a high-energy ramp of 10-30 V, was used, and masses from 50 to 1200 Da were scanned. Leucine enkephalin (CAS 58822-25-6) at 2 ng/mL was used for on-line mass correction. A test mix from Waters was injected every 20 injections to verify the accuracy of the data. The data were then manually checked against the following criteria: (1) the mass error is within the tolerance (5 ppm); (2) the intensity increases with concentration; and (3) the main fragment ions can be explained by MS-FINDER's in silico fragmentation.
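The first two criteria are simple numerical checks and can be written down as a short Python sketch. The checks were performed manually in this work; the code below only illustrates the logic, the function names are hypothetical, and the third criterion is passed in as a yes/no judgement because it relies on inspecting the MS-FINDER results.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def intensity_increases(intensities):
    """True if intensities, ordered by increasing concentration, strictly increase."""
    return all(low < high for low, high in zip(intensities, intensities[1:]))


def passes_manual_checks(measured_mz, theoretical_mz, intensities,
                         fragments_explained, tol_ppm=5.0):
    """Combine the three criteria; fragments_explained is the analyst's verdict."""
    within_tolerance = abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm
    return bool(within_tolerance
                and intensity_increases(intensities)
                and fragments_explained)
```

For a precursor at m/z 362.2115, for example, a 5 ppm tolerance corresponds to a window of about ±0.0018 Da.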
When all these criteria were met, the post identification was deemed confident, and the peak, along with its Name, Adduct, InChIKey, SMILES, Formula, and MS/MS spectrum, was exported to MS-FINDER. Where MS-DIAL had assigned a wrong adduct, it was corrected in MS-FINDER, and the entries were then saved as a *.msp file. Finally, the MS-LIMA software was used to check the library for errors and to replace the measured precursor m/z values with the theoretical ones.
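The theoretical precursor m/z follows from the molecular formula and the adduct. The Python sketch below shows the arithmetic for a few common singly charged adducts; it is not how MS-LIMA works internally, the function names are hypothetical, and the element table is deliberately limited to a handful of elements with standard monoisotopic masses.

```python
import re

# Monoisotopic masses (Da) for a small set of elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.007825032,
    "N": 14.003074004,
    "O": 15.994914620,
    "P": 30.973761998,
    "S": 31.972071174,
    "Na": 22.989769282,
}
ELECTRON = 0.000548580

# Mass shift of each adduct relative to the neutral molecule (singly charged).
ADDUCT_SHIFT = {
    "[M+H]+": MONOISOTOPIC["H"] - ELECTRON,
    "[M-H]-": -(MONOISOTOPIC["H"] - ELECTRON),
    "[M+Na]+": MONOISOTOPIC["Na"] - ELECTRON,
    "[M+NH4]+": MONOISOTOPIC["N"] + 4 * MONOISOTOPIC["H"] - ELECTRON,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C24H27NO2'.

    Parentheses, isotopes, and charges are not supported in this sketch.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def theoretical_mz(formula, adduct):
    """Theoretical m/z of a singly charged adduct ion."""
    return monoisotopic_mass(formula) + ADDUCT_SHIFT[adduct]


# Octocrylene, C24H27NO2, as [M+H]+
mz = theoretical_mz("C24H27NO2", "[M+H]+")
```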

Compilation of MS/MS libraries used for identification in MS-DIAL
Following the mspcompiler R package manual (https://github.com/QizhiSu/mspcompiler), we consolidated and restructured MS/MS libraries from various sources, including those compiled by the MS-DIAL developer, downloaded from GNPS (https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp), and from NIST 17. We also incorporated an in-house library containing 449 and 172 chemicals associated with food packaging in positive and negative modes, respectively. The compiled libraries, both positive and negative, were employed for the identification accordingly.

Converting GC-MS identified substances into a structure database used by MS-FINDER
To correlate LC-MS signals with a list of volatile and semi-volatile compounds for a given sample, a common approach is to compute the molecular formula from the LC-MS signals and manually search for that formula in the list of compounds. However, the molecular formula alone may not be sufficient to identify the correct candidate, because several compounds in the list can share the same formula. One could download the *.mol file of each candidate and use an in silico fragmentation tool to confirm its plausibility, but doing so by hand is time-consuming and labor-intensive.
To address this issue, we propose a method based on MS-FINDER that uses the list of volatile and semi-volatile compounds as a structure database. We developed an R function, export4msfinder, in the labtools R package (https://github.com/QizhiSu/labtools) to convert any list of chemicals into a structure database for use in MS-FINDER (the function can also be used to convert other lists of chemicals related to the samples under investigation). With this method, the structure information of the compounds is retrieved automatically, and after computing the molecular formulas, MS-FINDER uses this database to fragment in silico the structures that share the computed molecular formula and ranks them on factors such as in silico fragmentation probability. This approach can substantially reduce the manual interpretation required and improve the efficiency and accuracy of compound identification.
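The formula-matching step that a structure database makes automatic can be sketched in Python as follows. This is neither the export4msfinder code nor MS-FINDER's file specification: the function names and the column layout (name, formula, smiles) are assumptions chosen for illustration, and formulas are assumed to use one consistent notation.

```python
from collections import defaultdict


def index_by_formula(compounds):
    """Group compound records (dicts) by their molecular formula."""
    index = defaultdict(list)
    for compound in compounds:
        index[compound["formula"]].append(compound)
    return index


def candidates_for(formula, index):
    """Names of all listed compounds sharing the formula computed from LC-MS."""
    return [compound["name"] for compound in index.get(formula, [])]


def to_tsv(compounds, columns=("name", "formula", "smiles")):
    """Write the list as tab-delimited text (illustrative column layout)."""
    lines = ["\t".join(columns)]
    for compound in compounds:
        lines.append("\t".join(str(compound.get(col, "")) for col in columns))
    return "\n".join(lines) + "\n"


gc_ms_list = [
    {"name": "Dibutyl phthalate", "formula": "C16H22O4",
     "smiles": "CCCCOC(=O)c1ccccc1C(=O)OCCCC"},
    {"name": "Diisobutyl phthalate", "formula": "C16H22O4",
     "smiles": "CC(C)COC(=O)c1ccccc1C(=O)OCC(C)C"},
]
index = index_by_formula(gc_ms_list)
isomers = candidates_for("C16H22O4", index)
```

In this example both phthalate isomers are returned for the same formula, which is exactly the situation where ranking the candidates by in silico fragmentation becomes necessary.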