Enhancing Molecular Characterization of Dissolved Organic Matter by Integrative Direct Infusion and Liquid Chromatography Nontargeted Workflows

Dissolved organic matter (DOM) in aquatic systems is a highly heterogeneous mixture of water-soluble organic compounds, acting as a major carbon reservoir driving biogeochemical cycles. Understanding DOM molecular composition is thus of vital interest for the health assessment of aquatic ecosystems, yet its characterization is challenging because of its complex and dynamic chemical profile. Here, we performed a comprehensive chemical analysis of DOM from highly urbanized river and seawater sources and compared it to drinking water. Extensive analyses by nontargeted direct infusion (DI) and liquid chromatography (LC) high-resolution mass spectrometry (HRMS) on an Orbitrap instrument were integrated with novel computational workflows to allow molecular- and structural-level characterization of DOM. Across all water samples, over 7000 molecular formulas were calculated using both methods (∼4200 in DI and ∼3600 in LC). While the DI approach was limited to molecular formula calculation, the downstream processing of MS2 spectral information, combining library matching and in silico predictions, enabled a structural-level characterization of 16% of the molecular space detected by LC-HRMS across all water samples. The two analytical methods proved complementary, covering a broad chemical space that includes more highly polar compounds with DI and less polar ones with LC. The integration of diverse analytical techniques with a computational workflow introduces a robust and widely applicable framework that contributes to understanding the complex molecular composition of DOM.


SI-1: Chemicals and materials
Water and methanol (LiChrosolv® LC/MS grade) for solid phase extraction were acquired from Merck (Darmstadt, DE). Water and methanol (Optima® LC/MS grade) for instrumental analysis were purchased from Thermo Fisher Scientific. Hydrochloric acid 25% (ACS grade) was purchased from Merck (Darmstadt, DE). Glass fiber filters (GF/F, 0.7 μm pore size, 47 mm diameter) were purchased from Whatman (Little Chalfont, UK). SPE cartridges with a styrene-divinylbenzene stationary phase, Bond Elut PPL (Priority Pollutant), were acquired from Agilent Technologies (Santa Clara, USA). The reference material Suwannee River Fulvic Acid (SRFA, 2S101F) was purchased from the International Humic Substances Society.

SI-2: Sample treatment
Samples were filtered through 0.7-µm Whatman GF/F glass fiber filters immediately after collection, acidified to pH 2.0 with hydrochloric acid, and stored at -20°C overnight. Before the extraction, PPL cartridges were washed three times with 1 mL of methanol, soaked overnight with 3 mL of methanol, and conditioned with 3 mL of HPLC-grade water acidified to pH 2 with hydrochloric acid. Samples were loaded under vacuum at about 3 mL/min. Cartridges were then washed with 3 mL of acidified HPLC-grade water and dried. Elution was performed with 3 mL of methanol, and the final extract was collected in pre-weighed chromatographic vials. Two 1-mL aliquots of each sample were prepared, one for direct infusion analysis and one for liquid chromatography. Samples were stored at -20°C until the day of analysis, when each sample was diluted 1:1 with ultrapure water. The SRFA powder was weighed and diluted to 50 mg/L in MeOH:HPLC-grade water (1:1).

SI-3: SRFA analysis by DI-HRMS and LC-HRMS
The reference material SRFA was analyzed by DI-HRMS and LC-HRMS under the same conditions as the other samples. A procedural blank was injected at the same time, and the blank signals were removed during data processing. Following the recommendations of the interlaboratory comparison study on DOM composition (Hawkes et al., 2020), results were compared using the online data comparison tool provided with that study (https://kairos.warwick.ac.uk/InterLabStudy), which facilitates the comparison of DOM reference standards. Raw data are available in the MassIVE repository (MSV000094643) and results are reported in Supplementary_data2.
For the molecular formula calculation of the SRFA sample injected in DI, the following rules were applied: maximum error 5.0 ppm; O/C ratios in the range 0-1; H/C ratios in the range 0.3-2.5; DBE-O (double bond equivalent minus O atoms) between −10 and 10; m/z between 200 and 800; admitted atoms C 4-40, H 1-80, O 1-40, N 0-1, S 0-1, with and without one 13C. A total of 3453 peaks were assigned a molecular formula. Of these, 1071 were found to be in common with the reference dataset provided in InterLabStudy, yielding a 95% match. The corresponding metric data were calculated as follows: H/C metric, 1.1008; O/C metric, 0.5344; AImod metric, 0.3256; and MW metric, 403.4267. These results indicate that our DI-HRMS method is consistent with previous approaches used to characterize DOM.
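The assignment rules listed above can be sketched as a simple filter. This is a minimal illustration, not the authors' processing code: it assumes the ratio and DBE-O limits are evaluated on the neutral CHNOS formula, it ignores the 13C isotopologue, and it uses the modified aromaticity index as commonly defined by Koch and Dittmar, which is an assumption about how the AImod metric was obtained. All function names are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def dbe(c, h, n=0):
    """Double bond equivalents of a neutral CHNOS formula.

    Divalent O and S do not change the count.
    """
    return c - h / 2 + n / 2 + 1


def ai_mod(c, h, o, n=0, s=0):
    """Modified aromaticity index (assumed definition, see lead-in).

    Returns 0 when numerator or denominator is not positive.
    """
    num = 1 + c - 0.5 * o - s - 0.5 * (n + h)
    den = c - 0.5 * o - n - s
    if num <= 0 or den <= 0:
        return 0.0
    return num / den


def passes_rules(c, h, o, n, s, mz, error_ppm):
    """Apply the limits quoted in the text to one candidate formula."""
    return (
        abs(error_ppm) <= 5.0
        and 200 <= mz <= 800
        # Element ranges are checked first, so c is never 0 in the ratios.
        and 4 <= c <= 40 and 1 <= h <= 80 and 1 <= o <= 40
        and 0 <= n <= 1 and 0 <= s <= 1
        and 0 <= o / c <= 1
        and 0.3 <= h / c <= 2.5
        and -10 <= dbe(c, h, n) - o <= 10
    )


# Illustrative candidate: C10H12O5 observed near m/z 211.0612 with 1.2 ppm error.
print(passes_rules(10, 12, 5, 0, 0, mz=211.0612, error_ppm=1.2))
print(round(ai_mod(10, 12, 5), 4))
```

For this example candidate, O/C is 0.5, H/C is 1.2, and DBE-O is 0, so every limit is satisfied.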
The SRFA sample injected by LC-HRMS was processed according to the workflow described in the main document. Because this application is novel, the results are not strictly comparable with previous studies. A molecular formula and a structure candidate were attributed to 573 features. In this case, only 2% of common peaks were observed in the comparison with InterLabStudy. Figure S3 shows the Van Krevelen (VK) diagram of these results, which confirms what was also observed in the samples of the present study. The LC analysis and workflow developed here focus only on a fraction of the organic matter that differs from the one detected by DI-HRMS methods and overlaps very little with it.
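The match percentages quoted for DI and LC can be understood as a set overlap between assigned formulas and a reference list. The sketch below assumes the percentage is the share of reference formulas that are also found in our results, which is consistent with the Figure S3 caption but is not stated explicitly in the text. The function name and the example formula strings are hypothetical.

```python
def match_percentage(assigned, reference):
    """Percentage of reference formulas that also appear in `assigned`.

    Both arguments are iterables of formula strings written in one
    consistent notation (e.g. "C10H12O5").
    """
    reference = set(reference)
    if not reference:
        raise ValueError("reference list is empty")
    common = set(assigned) & reference
    return 100.0 * len(common) / len(reference)


# Toy data, not taken from the study.
ours = {"C10H12O5", "C11H14O6", "C12H16O7"}
ref = {"C10H12O5", "C11H14O6", "C9H10O4", "C8H8O3"}
print(match_percentage(ours, ref))  # 50.0
```

Using the reference list as denominator means that a large number of formulas unique to our results does not lower the score.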

SI-4: LC-HRMS analysis in DDA acquisition mode
The chromatographic separation was performed under the same conditions as the DIA experiment. The UHPLC system was coupled to a Q-Exactive Orbitrap (Thermo Fisher Scientific) operated in both positive (ESI+) and negative (ESI-) mode in full scan (m/z range 67-1000 Da, 70000 nominal resolution FWHM at m/z 200), with parallel data-dependent acquisition (DDA) of MS2 spectra from the top 5 most abundant ions per cycle (m/z range 200-1000 Da, 35000 resolution, normalized collision energy 35). Full details are reported in Table S1. Data were processed as in the DIA experiment, including formula calculation, feature-based molecular networking, and network annotation propagation. Raw and processed data are available in MassIVE (MSV000094642). Molecular networks and library matches are available in GNPS (LC-ESI+: link; LC-ESI-: link).
A total of 740 features were associated with a formula and a corresponding molecular structure (714 in river water, 641 in seawater, and 341 in drinking water). Of these, 123 (17%) were annotated in the GNPS libraries. This approach yields more library annotations, but the overall number of compounds is five times lower than in DIA, resulting in a loss of structural information and very low coverage of the DOM chemical space.
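The limited coverage of DDA follows from how precursors are chosen: in each cycle only the most abundant ions are fragmented. The sketch below illustrates that top-N selection on a single full-scan spectrum. It is a simplification under stated assumptions, since real instrument logic also applies settings such as dynamic exclusion that are not modeled here, and the function name and example peaks are hypothetical.

```python
def select_precursors(scan, top_n=5):
    """Return the m/z values of the `top_n` most intense peaks in one scan.

    `scan` is a list of (mz, intensity) pairs from a full-scan spectrum.
    """
    ranked = sorted(scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


# Toy full-scan spectrum with seven peaks; only five get an MS2 spectrum.
scan = [
    (150.0, 9e5), (250.1, 5e5), (300.2, 8e5), (410.3, 1e5),
    (520.4, 7e5), (615.5, 2e5), (700.6, 6e5),
]
print(select_precursors(scan))  # [150.0, 300.2, 520.4, 700.6, 250.1]
```

Peaks outside the top 5 receive no MS2 spectrum in that cycle, which is one way to see why fewer features obtain structural information than in DIA.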

Figure S1. Van Krevelen plots showing the distribution of the molecular formulas assigned in DI, LC (ESI-), and LC (ESI+). Each hexagon represents one or more molecular formulas, with color serving as a quantitative measure of the number of formulas overlapping in different regions of the plot.

Figure S2. Van Krevelen plots showing the total number of molecular formulas assigned by SIRIUS in LC-HRMS ESI negative (blue) and ESI positive (red) ion mode. Formulas that found consensus in NAP, and therefore also have a structure prediction, are shown in yellow.

Figure S3. Van Krevelen plots showing the results of SRFA analysis by the DI-HRMS and LC-HRMS workflows in comparison with the InterLabStudy. Green squares represent the matches between our results and ions that were detected and assigned by at least three instruments tested in the InterLabStudy (22% in DI and 4% in LC). Purple circles represent the matches between our results and assigned ions common to all instruments tested in the InterLabStudy (95% in DI and 2% in LC). Yellow triangles represent the ions that appear only in our results. These plots were downloaded directly from the InterLabStudy comparison platform (https://kairos.warwick.ac.uk/InterLabStudy).