Methods matter: Influential purification and analysis parameters for intracellular parasite metabolomics

Due to improved instrument sensitivity and access, metabolomics is gaining traction for the study of many organisms and pathogens. For the intracellular malaria parasite, Plasmodium falciparum, both targeted and untargeted metabolite detection have improved our understanding of pathogenesis, host-parasite interactions, parasite response to antimalarials, and the impacts of resistance. However, purification protocols are not optimized for investigations of intracellular pathogens, and noise-limiting analysis parameters are not well defined. To explore influential parameters, we purified a diverse set of in vitro-grown intra-erythrocytic P. falciparum parasites for untargeted metabolomics studies. Following metabolite identification, data processing included normalization to double-stranded DNA, total protein, or parasite number to correct for differences in sample size and parasite stage. We found that parasite-derived variables were most appropriate for normalization, as they separate sample groups and reduce noise within the data set. However, these post-analysis steps did not remove the contribution from the host erythrocyte, in the form of membrane-rich 'ghosts', and technical sample variation persisted. In fact, we found that host contamination is as influential on the metabolome as sample treatment. This analysis also identified metabolites with the potential to serve as markers for quantifying host contamination. In conclusion, purification methods and normalization choices during the collection and analysis of untargeted metabolomics data heavily affect the interpretation of results. Our findings provide a basis for developing improved experimental and analytical methods for future metabolomics studies of P. falciparum and other intracellular organisms.

Importance
Molecular characterization of pathogens, such as the malaria parasite, can lead to effective treatment strategies and an improved understanding of pathogen biology.
However, the distinctive biology of the Plasmodium parasite, such as its repetitive genome and its requirement for growth within a host cell, hinders progress toward this goal. Untargeted metabolomics is one promising approach for learning about pathogen biology and how the pathogen responds to different treatments. By measuring many small molecules in the parasite at once, we gain a better understanding of important pathways that contribute to this response. Although metabolomics is increasingly popular, protocols for isolating the parasite from the host cell and the available analysis options are not well explored. The findings presented in this study emphasize the critical need for improvements in these areas to limit misinterpretation due to host metabolites and to correct for variation between samples. This will aid both basic biological investigations and clinical efforts to understand important pathogens.

Introduction
Malaria continues to be responsible for hundreds of thousands of deaths annually, most of which result from infection with the protozoan parasite Plasmodium falciparum (1). Characterization of the biology of this important pathogen can lead to improved treatment strategies. The molecular mechanisms behind interesting P. falciparum phenotypes are challenging to understand due to a lack of traditional methods of investigation in this organism, such as forward and reverse genetics. Unbiased 'omics approaches (transcriptomics and proteomics) are widely used, but the limited annotation of the parasite genome makes these data sets challenging to interpret. One way to alleviate this lack of functional knowledge is to use network-based modeling to facilitate data interpretation (2). Additionally, the measurement of direct mediators of the phenotype, such as metabolite reactants and products of enzymatic reactions, can improve our ability to make predictions about cellular function under certain conditions. For this reason, metabolomics is becoming increasingly popular for studying P. falciparum (3-12). These studies have allowed for a greater understanding of malaria pathogenesis (13), strain-specific phenotypes (11), and host-parasite interactions (9). Although metabolomics can successfully identify metabolic signatures that correlate well with biological function, such as time- and dose-dependent responses to antimalarial treatment (3, 5) and resistance-conferring mutations (12), there are distinct challenges that need to be considered when performing metabolomic studies in P. falciparum.

Challenges such as host contamination, limited parasite yield, and parasite stage-specificity arise due to certain properties of this organism (see Table 1). For example, experimental samples typically contain few parasites and abundant host material. One contributing factor is that parasitemia is limited during in vitro culture and clinical infections (<5%, i.e. fewer than five infected erythrocytes per 100 total (14, 15)). Additionally, P. falciparum is an intracellular parasite during the asexual cycle in the human bloodstream; the host erythrocyte accounts for up to 10-fold more cellular material than early stage parasites (16, 17). Because late stage parasites can be enriched using magnetic purification (18), study of these larger parasites has historically allowed for efficient genomic, transcriptomic, and proteomic analysis of parasite biology. These stages have typically been thought of as more metabolically active than early stage parasites due to increased activity of well-studied cellular pathways, including robust hemoglobin degradation (19), nuclear genome replication, and protein synthesis (20, 21). The smaller early stage parasite is particularly hard to study because few effective enrichment methods exist, making it difficult to isolate adequate amounts of parasite material (22). Thus, studies must be designed to overcome these challenges, limiting sample-to-sample variation and optimizing metabolite recovery (i.e. the total number of metabolites detected).

In this study, we sought to define critical parameters that would help overcome these challenges and allow the collection of high quality metabolomics data. We show that diverse sample groups can be differentiated, but that both the choice of analytic parameters for data processing and host cell contamination heavily influence the parasite metabolome. In particular, we investigated normalization approaches to assess the impact of host contamination and found that adjustment to parasite-derived variables better removes sample noise. However, even appropriate normalization fails to remove host noise completely, as host contamination is as influential on the metabolome as sample treatment. Thus, we propose that the combination of improved purification and analytic parameters will generate more accurate measures of the metabolome, increasing the utility of unbiased metabolomics to investigate intracellular parasite biology.

Parasite sample groups are metabolically distinct
To ensure that our metabolomics approach can identify obvious differences between sample groups, we compared parasite groups that differed in stage, origin, and growth conditions (Fig. 2A). Distinct purification procedures were used to prepare each sample group (see Materials and Methods and Fig. 1), resulting in different amounts of parasite material (Fig. 2B, Table S1). Replicates of sample group 1, which were simply lysed from host cells at a mean parasitemia of 1.14%, contained between 1.3 × 10^6 and 6.9 × 10^6 total parasites. Sample group 2 was enriched for late stage parasites using magnetic purification to a mean parasitemia of 53.6% (Table S1). These replicates contained between 4.7 × 10^7 and 6.7 × 10^8 total parasites (up to 100-fold more individual parasites). Despite these differences, mean protein abundance did not differ significantly between sample groups, although it was more variable in sample group 2 (group 1 SD: 12.7, group 2 SD: 38.2, see supplementary information for code and replicating DNA, and, thus, have increased and variable genome copy number per cell. Protein does not correlate with parasite number or DNA abundance (data not shown, see supplementary information for code).

We conducted metabolomics on the samples described above (Fig. 1). Cultured parasites were lysed from host erythrocytes and analyzed via UPLC-MS. In comparison 1, we detected 375 total metabolites annotated by Metabolon, Inc., comprising 10 energy-associated metabolites, 159 lipid species, 108 peptides and amino acids, 40 nucleotides, 28 cofactors, 20 carbohydrates, and 10 others (Fig. 2C); 143 of these were detected in every sample. Samples from group 1 contained between 182 and 242 metabolites, while those from group 2 contained between 267 and 368 metabolites (Fig. 2C). Fifteen metabolites were found in every group 1 sample but not in all group 2 samples, and 111 metabolites were found in every group 2 sample but not in all group 1 samples. Thus, distinct samples, due to parasite origin, stage, growth conditions, and purification differences, have distinct metabolomes.

(Fig. 2D). In all cases, principal component (PC) 1 primarily represents between-group variation, and PC2 represents within-group variation (Fig. 2D). Without normalization, PC1 and PC2 summarize 78.4% of sample variation. After parasite number and DNA normalization, these principal components summarize 87.7% and 80.6% of sample variation, respectively. With protein normalization, 79.1% of variation is summarized. PC2 tends to separate samples within group 1 better than samples within group 2 (Fig. 2D).

The metabolites that contribute most to group or sample variation differ with each normalization approach (Table S2). Thus, metabolome differences between groups depend on the normalization approach. Yet, there are several striking trends across analyses. For example, the PC structure following protein normalization closely mimics that of the unnormalized data, and similar metabolites contribute to PC1 and PC2 in both analyses: sphingomyelin species contribute to within-group variation (PC2), and orotidine and dipeptides contribute to between-group variation (PC1; Table S2). Upon DNA or parasite number normalization, phenylalanine, tryptophan, leucine, putrescine, and sedoheptulose 7-phosphate contribute to PC2, or within-group variation (Table S2).

Beyond comparing the metabolomes of artificially distinct sample groups, we explored the metabolic changes induced by antimalarial treatment. We collected metabolomics data from treated and untreated early stage parasites that were identical in growth conditions and purification approach and were matched for blood batch (Fig. 3A, Table S1; see Materials and Methods for group 1). Following data processing, the metabolomes of antimalarial-treated and untreated parasites did not form separate clusters via PCA (Fig. 3B). Accordingly, univariate statistical analysis revealed no differentially abundant metabolites between treated and untreated samples (see supplemental information for code).

To further explore the host contribution to the metabolome, we built two Random Forest classifiers to identify metabolites associated with either erythrocyte ghosts or antimalarial treatment. We first built a classifier to predict blood batch in early stage parasites (Fig. 3A). These samples likely have a large host contribution due to the inability to enrich for erythrocytes infected with early stage parasites. Ninety-five metabolites (of 298), including AMP, ADP-ribose, aspartate, and sphingosine, improved classifier accuracy in predicting blood batch (most influential depicted in Fig. 3D; see supplemental information for code); the remaining metabolites had no effect on classifier performance or worsened its predictive capability, indicating that they are not associated with blood batch, owing either to high variability or to association with other features that differentiate samples. This classifier predicted blood batch with a 30% error rate. Thus, a subset of the measured metabolome was predictive of blood batch.

To determine whether blood batch is as influential on the metabolome as antimalarial treatment, we built a similar classifier to predict treatment within early stage samples (Fig. 3A).
Early stage parasites were classified into two treatment conditions with a 30% class error rate. One hundred and eighteen metabolites (of 298)

Here, we explore metabolomics methods used in the in vitro study of intraerythrocytic P. falciparum. The parasite's intracellular lifestyle introduces challenges in implementing traditional protocols, predominantly due to limited amounts of parasite material and host metabolite contamination. In our study, we sought to determine critical parameters for the collection of high quality metabolomics data despite these challenges. In particular, we investigated normalization approaches and conducted a detailed assessment of the impact of host contamination. Overall, we found that parasite-derived variables are best suited for normalization. Despite these analytic approaches, host noise permeates the analysis, as host contamination is as influential on the metabolome as antimalarial treatment. Thus, improvements in both purification and analytic parameters must be combined to generate accurate metabolomes and increase our ability to learn about the parasite's biology.

Normalization of metabolite levels aims to limit technical or non-biological variation, thus enhancing interpretation of results. Normalization can be calculated by a variety of methods and is implemented either before or after analysis (Table 1; (24, 25)). Often, pre-analysis normalization is conducted by isolating the same number of cells for analysis (26) studies is essential to ensure that parasite-derived metabolites, and not host-derived metabolites, are measured and interpreted to make conclusions.

We explore three post-analysis normalization approaches: protein, DNA, and parasite number.
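All three post-analysis normalizers share the same arithmetic: each sample's metabolite abundances are divided by a single per-sample quantity. The pure-Python sketch below is a minimal illustration under stated assumptions, not the R code used in this study: abundances are held in dictionaries keyed by hypothetical sample and metabolite names, and the numbers are invented. The metabolite labels are borrowed from those named in this study only to make the example readable.

```python
from typing import Dict

# metabolite name -> abundance, for one sample
Abundances = Dict[str, float]


def normalize(samples: Dict[str, Abundances],
              factor: Dict[str, float]) -> Dict[str, Abundances]:
    """Divide each sample's metabolite abundances by that sample's normalizer.

    `factor` holds one positive value per sample, for example dsDNA amount,
    total protein, or total parasite number (hypothetical inputs).
    """
    normalized: Dict[str, Abundances] = {}
    for sample_id, abundances in samples.items():
        f = factor[sample_id]
        if f <= 0:
            raise ValueError(f"normalizer for {sample_id!r} must be positive")
        # Every metabolite in this sample is scaled by the same factor.
        normalized[sample_id] = {name: value / f for name, value in abundances.items()}
    return normalized


# Invented raw abundances and parasite counts for two samples.
raw = {
    "s1": {"orotidine": 200.0, "HEPES": 50.0},
    "s2": {"orotidine": 800.0, "HEPES": 60.0},
}
parasites = {"s1": 2.0e6, "s2": 8.0e6}
per_parasite = normalize(raw, parasites)
```

In these invented numbers, orotidine scales with parasite number and becomes equal per parasite in both samples (1e-4), whereas HEPES does not scale with parasite number and ends up about 3.3-fold higher in s1 after normalization. A signal that tracks host or media material rather than parasite material is rescaled by this step, not removed.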
We argue that the host erythrocyte heavily contributes to protein abundance and, thus, that this metric is not solely parasite-derived. In our analysis, this was most clearly observed when comparing protein abundances between our sample groups (Fig. 2B). We expected a proportional increase in protein amount as parasite size increases throughout the intraerythrocytic life cycle (from sample group 1 to 2; early to late stage); however, this increase was not detected, implicating a host erythrocyte contribution. Furthermore, heavy host contamination explains the observations that 1) protein variability is higher in group 2 (explained by the wider range in parasitemia and thus in host erythrocyte contribution, Table S1), 2) host/media metabolites such as kynurenine, phenol red, and HEPES were detected in this analysis (see below and supplemental data), and 3) protein normalization minimally changes the PCA data structure and top contributing metabolites (Fig. 2D and Table S2). (Table 1).

Clearly, parasite-to-parasite sample variation can influence metabolomics data, but we also found that host erythrocyte material can heavily impact a sample's metabolome. Many studies employ erythrocyte lysis prior to sample purification ((8, 32) and our current study, see Materials and Methods). However, this approach does not eliminate the potential for host contamination; host membrane fragments devoid of internal components, colloquially referred to as erythrocyte "ghosts", remain in purified samples (Fig. 3C). Despite this concerted effort to limit host metabolites through lysis, our studies support a heavy erythrocyte contribution to the P. falciparum metabolome. HEPES, as well as cholesterol (a metabolite excluded from parasite membranes (43, 44)), are correlated prior to normalization, and these correlations persist following normalization.
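One way to see why protein normalization could leave the PCA structure largely unchanged (point 3 above) is to look at how a per-sample normalizer interacts with PCA. Assuming abundances are log-transformed before PCA (an assumption; the preprocessing is not restated in this section), dividing a sample by its normalizer subtracts the log of that normalizer from every metabolite in that sample. A normalizer that is constant across samples therefore cancels out when each metabolite is mean-centered, one that varies little changes the PCA only slightly, and one that varies widely can reshape it. The Python/NumPy sketch below, not the R code used in this study, computes the fraction of variance captured by PC1 and PC2 with and without a normalizer; the function name and the random data are hypothetical.

```python
import numpy as np


def pc_variance_explained(abundance, norm_factor=None, n_pcs=2):
    """Fraction of total variance captured by each of the first `n_pcs` PCs.

    abundance   : samples x metabolites array of positive abundances
                  (missing values assumed to be imputed already).
    norm_factor : optional per-sample normalizer (e.g. DNA, protein, or
                  parasite number); each sample's row is divided by it.
    """
    X = np.asarray(abundance, dtype=float)
    if norm_factor is not None:
        f = np.asarray(norm_factor, dtype=float)
        X = X / f[:, np.newaxis]   # one factor per sample (row)
    X = np.log2(X)                 # assumed log transform
    X = X - X.mean(axis=0)         # center each metabolite (column)
    # PC variances are proportional to the squared singular values of X.
    s = np.linalg.svd(X, compute_uv=False)
    var = s ** 2
    return var[:n_pcs] / var.sum()


# Hypothetical example: 10 samples x 40 metabolites.
rng = np.random.default_rng(1)
data = rng.lognormal(mean=5.0, sigma=1.0, size=(10, 40))
nearly_constant = rng.normal(100.0, 1.0, size=10)  # varies little across samples
widely_varying = rng.uniform(1e6, 1e8, size=10)    # varies ~100-fold across samples

print("raw:            ", pc_variance_explained(data))
print("nearly constant:", pc_variance_explained(data, nearly_constant))
print("widely varying: ", pc_variance_explained(data, widely_varying))
```

Comparing the three printed pairs shows how much each kind of normalizer moves the variance summarized by PC1 and PC2; the loadings (which metabolites drive each PC) would be read from the right singular vectors, which this sketch does not return.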
Phenol red also contributed to the accuracy of our antimalarial treatment classifier, further confirming that blood batch effects influenced the dataset. Lastly, lipid species were the major class of metabolites detected in our analysis (Fig. 2C) and contributed heavily to PC2 in the unnormalized and protein-normalized data sets (Table S2), perhaps due to the remaining erythrocyte membranes. These results add to the overwhelming evidence of host cell and media contamination in untargeted metabolomics studies of parasites.

Following these observations, we also explored the effect of different blood batches on metabolome measurements. Because generating sufficient Plasmodium biomass for adequate biological replicates is time-intensive, many experiments require multiple batches of human blood donations. To avoid batch effects, we controlled blood batches across sample groups (Table S1). Prior to these studies, we predicted that blood batch would have some effect on the metabolome; we did not anticipate, however, that it would be as influential as known stressors, such as treatment with antimalarials with established metabolic effects (3, 5). Several results from our analysis support this observation. First, samples from the two treatment groups did not cluster separately via PCA (Fig. 3B). Second, we detected few or no metabolites whose levels were significantly different between conditions (none with versus without antimalarial treatment, and one between blood batches). Lastly, the treatment and blood batch classifiers predicted samples with equal accuracy (30% error rate; top predictive metabolites displayed in Fig. 3D and E). Overall, from these analyses, we concluded that sample-to-sample variation exceeded the variation associated with either grouping.
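The differential abundance testing behind these counts used Welch's t-tests with false discovery rate adjustment (see Materials and Methods). The pure-Python sketch below outlines both steps under stated assumptions: the FDR method is assumed here to be Benjamini-Hochberg, and converting a t statistic to a p-value requires the t distribution (for example `scipy.stats.t.sf`), which is left out to keep the example dependency-free. It is not the R code used in this study, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances in the two groups. A two-sided p-value
    would be 2 * scipy.stats.t.sf(abs(t), df) (not computed here).
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical usage: one raw p-value per metabolite, then a 0.05 cutoff
# on the adjusted values.
# significant = [q < 0.05 for q in bh_adjust(raw_pvalues)]
```

Because the adjustment multiplies each p-value by the number of metabolites tested divided by its rank, a few hundred simultaneous tests with small groups leave little room for any single metabolite to pass a 0.05 cutoff, which is consistent with detecting few or no significant differences when within-group variation is high.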
We also found 1-arachidonoyl-GPE to be significantly different in abundance across blood batches; this metabolite could be explored as a potential biomarker of host contamination. To expand on this idea, we were also able to predict a set of metabolites that are most likely to be host erythrocyte-derived (or influenced by the host environment) by identifying the metabolites that are most predictive of blood batch (Fig. 3D). Additional investigations are required, since these metabolites may be parasite-derived but produced only in particular environments (e.g. blood batches). Going forward, it may be possible to use these metabolites to quantify the host cell contribution to the metabolome and assess parasite sample purity or control for host contamination during analysis.

Overall, the methodology and findings from the current study provide a basis for

Table S1) to maintain parasitemia between 1-3%, with the culture medium changed every other day (Fig. 1; Step 1). Cultures were incubated at 37 °C with 5% oxygen, 5% carbon dioxide, and 90% nitrogen (14). Some samples were treated with antimalarials with metabolic effects to maximize differences between groups (see below and Antimalarial treatment in Table S1).

For isolation of sample group 1, two distinct laboratory-adapted clinical isolates of P. Cultivation above). After the late stage population was confirmed using microscopy, cultures were checked every one to two hours for the development of newly invaded ring stage parasites. Antimalarial treatment, where applicable, was performed at this stage. Fourteen flasks containing early ring stage parasites (<3 hours post invasion) were subsequently released from host erythrocytes by lysis with 0.15% saponin, as previously described (46) (Fig. 1; Step 3). Prior to lysis, parasite material was sampled to determine erythrocyte count (hemocytometer) and parasitemia (SYBR Green-based flow cytometry (47) analysis. This procedure was performed five times for each parasite line to provide 10 replicates for group 1 metabolomic analysis. Additionally, matched parasites (same parasite lineage, media type, stage, blood batches, and purification methods) were grown without drug treatment (Table S1) to generate 10 additional untreated group 1 samples (see the second comparison in Fig. 3).

For isolation of sample group 2, two Dd2-derived laboratory-adapted clones of P. Table S1) first underwent an initial sorbitol synchronization step as above. The resultant early stage parasites were then incubated at 37 °C in cRPMI to allow P. falciparum to transition to the late trophozoite and schizont stages, which occurs 24 to 30 hours after initial synchronization. Next, this predominantly late stage population was enriched through magnetic purification using a MACS quad-magnet and MACS multistand (Miltenyi Biotec, Bergisch Gladbach, Germany), as previously described (18) (Fig. 1; Step 2).
Briefly, parasite cultures were passed through LS columns with attached sterile syringe needles (BD Biosciences, San Jose, CA) at a rate of 2-3 seconds per drop. Two to three column washes were then performed with 5 ml of warmed cRPMI. To elute the desired material, the column was removed from the magnet before 5 ml of cRPMI was added.
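Total parasite number, one of the normalizers compared in this study, can be estimated from the erythrocyte count and parasitemia measured before lysis. The exact calculation is not spelled out in this section, so the sketch below assumes the simplest form (erythrocyte concentration × volume lysed × fraction infected) and ignores erythrocytes infected by more than one parasite. The function name and the example numbers are hypothetical.

```python
def total_parasites(rbc_per_ml: float, volume_ml: float, parasitemia_pct: float) -> float:
    """Estimate the number of parasites in a harvested sample.

    rbc_per_ml      : erythrocyte concentration from a hemocytometer count
    volume_ml       : culture volume taken for lysis
    parasitemia_pct : percent of erythrocytes infected (e.g. from flow cytometry)

    Assumes one parasite per infected erythrocyte (multiple infections ignored).
    """
    if not 0.0 <= parasitemia_pct <= 100.0:
        raise ValueError("parasitemia must be a percentage between 0 and 100")
    return rbc_per_ml * volume_ml * parasitemia_pct / 100.0


# Hypothetical numbers: 1e8 erythrocytes/ml, 5 ml lysed, 1.14% parasitemia.
estimate = total_parasites(1e8, 5.0, 1.14)  # about 5.7 million parasites
```

The same arithmetic shows why enrichment matters: at fixed erythrocyte input, raising parasitemia from about 1% to about 50% by magnetic purification increases the parasite yield roughly 50-fold while the host material that accompanies it shrinks.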

Column flow-through from 5 flasks containing late stage parasites was allowed to recover in cRPMI for 30 min at 37 °C prior to saponin lysis, as described above (Fig. 1; Step 3). Determination of parasite count and protein quantification, as well as subsequent sample washing and freezing, were performed as described above for sample group 1. This procedure was performed five times for each parasite line to provide 10 samples for group 2 metabolomic analysis. centrifuged for extraction (Fig. 1; Step 4). Sample extracts were dried and reconstituted in solvents containing standards (see below) at fixed concentrations to ensure injection and chromatographic consistency. A Waters ACQUITY ultra-performance liquid chromatography (UPLC) system and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer were used for metabolite detection (Fig. 1; Step 5) (Fig. 1; Step 6). Resultant processed metabolite abundances were used for univariate and multivariate statistics, as well as classification. All analyses were conducted using R (51-59). Welch's t-tests, which assume normally distributed data with unequal variances, were used to compare group means for differential abundance determination, and p-values were adjusted to control the false discovery rate. The significance cutoff was 0.05. See supplementary information for code and detailed analysis.

Fig. 2A), polyploid genome
  - Can use magnetic enrichment (Fig. 1)
Mixed stages
  - Effects of stage variation on data
Media batches
  - Relevant if using serum-based media formulations
Blood batches
  - Must be recorded and ideally matched within comparisons (Table S1)
  - Useful to assess host contamination levels (Fig. 3D)
Additional controls
Uninfected erythrocytes
  - Use to identify host metabolites
  - Does not replace normalization

Enrichment methods
Saponin, other lytic reagents
  - Compatible with all stages (Fig. 1)
  - Parasites remain in ghosts (Fig. 3C)
  - Need improved methods that isolate parasite from host cell
Magnetic purification
  - Increases parasite to host ratio (Fig. 1