Determination of the Minimum Sample Amount for Capillary Electrophoresis-Fourier Transform Mass Spectrometry (CE-FTMS)-Based Metabolomics of Colorectal Cancer Biopsies

The minimum sample volume for capillary electrophoresis-Fourier transform mass spectrometry (CE-FTMS), which is useful for analyzing hydrophilic metabolites, was investigated using samples obtained from colorectal cancer patients. One, two, five, and ten biopsies were collected from tumor and nontumor parts of the surgically removed specimens from each of five patients who had undergone colorectal cancer surgery. Metabolomics was performed on the collected samples using CE-FTMS. To determine the minimum number of specimens based on data volume and biological interpretability, we compared the number of annotated metabolites in each sample with different numbers of biopsies and conducted principal component analysis (PCA), hierarchical cluster analysis (HCA), quantitative enrichment analysis (QEA), and random forest analysis (RFA). The number of metabolites detected in one biopsy was significantly lower than those in 2, 5, and 10 biopsies, whereas those detected among 2, 5, and 10 pieces were not significantly different. Moreover, a binary classification model developed by RFA based on 2-biopsy data perfectly distinguished tumor and nontumor samples in the 5- and 10-biopsy data. Taken together, two biopsies would be sufficient for CE-FTMS-based metabolomics from a data content and biological interpretability viewpoint, which opens the way to biopsy metabolomics in practical clinical applications.


Introduction
According to the World Health Organization, in 2020, colorectal cancer ranked third in incidence and second in deaths among malignant tumors [1]. Since colorectal cancer is commonly encountered in clinical practice, the analysis of cancer characteristics at the molecular level and the development of therapeutic agents based on these characteristics have proceeded at a faster pace than for other cancers. Fluorouracil (5-FU) was developed in 1957 to play a central role in the treatment of colorectal cancer [2]. In the 1990s, the efficacy of irinotecan and oxaliplatin was demonstrated [3][4][5][6], and after the 2000s, the usefulness of bevacizumab, cetuximab, panitumumab, ramucirumab, and other molecular targeting agents was demonstrated, expanding treatment options [7][8][9][10][11][12].
In recent years, the development of therapeutic agents for colorectal cancer has been dominated by the development of new molecularly targeted agents. The use of genomics, a type of omics, is essential for the development of these molecularly targeted drugs. Omics analysis includes genomics targeting DNA sequences, transcriptomics targeting RNAs, proteomics targeting proteins, and metabolomics targeting metabolites, among others. Genomics is currently the primary application field in clinical practice. However, genomics can provide only limited information on the actual phenotypes because it is based on the analysis of upstream gene sequences in homeostasis. Conversely, metabolomics analyzes the most downstream metabolites in homeostasis and thereby allows a better understanding of signals directly associated with phenotypes. Thus, in the future, therapeutic agents targeting metabolites identified by metabolomics may be developed.
Several studies have shown metabolites and metabolic pathways characteristic of colorectal cancer using real tissues, and these metabolites are mostly measured by capillary electrophoresis (CE), liquid chromatography, and gas chromatography (GC) connected to mass spectrometry (MS) [13,14]. Among these, CE-MS is best suited for analyzing ionic metabolites, especially highly charged or phosphate compounds, the main components of energy metabolism in cancer. However, in these conventional methods, the required specimen amount is up to 20-40 mg, and the difficulty of obtaining a sufficient amount of specimen using nonsurgical means has greatly limited their clinical application. Therefore, capillary electrophoresis-Fourier transform mass spectrometry (CE-FTMS) has been developed and applied, showing approximately tenfold higher sensitivity than conventional CE connected to time-of-flight MS.
The development of CE-FTMS may enable the analysis with a smaller sample volume and actual clinical application in the future; however, the specific minimum amount of sample has not been clarified for CE-FTMS-based metabolome analysis. Therefore, this study aimed to evaluate the minimum amount of biopsied samples needed to ensure the quality of metabolomics by collecting colorectal tumor specimens and analyzing them with CE-FTMS.

Specimen Collection and Pretreatment
This study was approved by the ethical review committee of Tokyo Medical and Dental University Hospital (M2019-225). All biopsied samples were collected from five patients with colorectal cancer at Tokyo Medical and Dental University Hospital. Based on preoperative examination findings, we selected lesions with sufficient tumor volume so that collecting tumor tissue would not affect the diagnosis. Tissue collection was initiated within 15 min after surgically removing the specimen. Using biopsy forceps for lower gastrointestinal endoscopy, 1, 2, 5, and 10 pieces were taken from the tumor site, and each set was placed in its own batch. Pieces were collected from the normal mucosa in the same way and likewise placed in batches. The collected tissues were frozen with liquid nitrogen in batches and stored in a freezer at ≤−80 °C until metabolome analysis.

Metabolite Extraction
Metabolite extraction and metabolome analysis were conducted at Human Metabolome Technologies, Inc. (HMT), Tsuruoka, Japan. Biopsied frozen tissue samples were weighed and placed in homogenization tubes along with zirconia beads (5 mm and 3 mm). Next, 50% acetonitrile/Milli-Q water containing internal standards (H3304-1002, HMT, Tsuruoka, Yamagata, Japan) was added to the tubes, and samples were completely homogenized at 1500 rpm at 4 °C for 60 s using a bead shaker (Shake Master NEO, Bio-Medical Science, Tokyo, Japan). Then, the homogenate was centrifuged at 2300× g at 4 °C for 5 min. Subsequently, the upper aqueous layer was centrifugally filtered through a Millipore 5-kDa cutoff filter (UltrafreeMC-PLHCC, HMT) at 9100× g at 4 °C for 180 min to remove macromolecules. The filtrate was evaporated to dryness under a vacuum and reconstituted in Milli-Q water for metabolome analysis at HMT.

Metabolome Analysis
Metabolome analysis was conducted using HMT's ω Scan package with CE-FTMS based on the previously described methods [15]. Briefly, CE-FTMS analysis was performed using an Agilent 7100 CE capillary electrophoresis system equipped with a Q Exactive Plus (Thermo Fisher Scientific Inc., Waltham, MA, USA), an Agilent 1260 isocratic HPLC pump, an Agilent G1603A CE-MS adapter kit, and an Agilent G1607A CE-ESI-MS sprayer kit (Agilent Technologies, Inc., Santa Clara, CA, USA). The systems were controlled by the Agilent MassHunter workstation software LC/MS data acquisition for 6200 series TOF/6500 series Q-TOF version B.08.00 (Agilent Technologies) and Xcalibur (Thermo Fisher Scientific) and connected by a fused silica capillary (50 µm i.d. × 80 cm total length) with commercial electrophoresis buffer (H3301-1001 and I3302-1023 for cation and anion analyses, respectively; HMT) as the electrolyte. The spectrometer was scanned from m/z 60 to 900 and from m/z 70 to 1050 in positive and negative modes, respectively [16]. Peaks with S/N > 3 were extracted using MasterHands 2.18.0.1, an automatic integration software (Keio University, Tsuruoka, Yamagata, Japan), to obtain peak information, including m/z, peak area, and migration time (MT) [16]. Signal peaks corresponding to isotopomers, adduct ions, and other product ions of known metabolites were excluded, and the remaining peaks were annotated based on their m/z values and MTs using HMT's metabolite database, which was developed by running authentic chemical standards under the same analytical conditions. Areas of the annotated peaks were then normalized to internal standards and also by sample weights to obtain relative levels of each metabolite (Table S1).
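The final normalization step can be illustrated with a minimal sketch. It assumes one internal-standard peak area per sample and a plain ratio; the exact formula used in the HMT pipeline is not given in the text, and the function name, argument names, and numbers below are hypothetical.

```python
def relative_level(peak_area, internal_standard_area, sample_weight_mg):
    """Relative metabolite level: peak area scaled by the internal standard
    and by tissue weight (assumed form; the vendor's formula is not stated)."""
    if internal_standard_area <= 0 or sample_weight_mg <= 0:
        raise ValueError("internal standard area and sample weight must be positive")
    return peak_area / internal_standard_area / sample_weight_mg


# Hypothetical numbers: identical peak-to-standard ratio, but twice the
# tissue weight halves the relative level.
light = relative_level(2.0e6, 1.0e6, 4.0)
heavy = relative_level(2.0e6, 1.0e6, 8.0)
```

Dividing by weight is what makes batches of 1, 2, 5, and 10 pieces comparable on the same scale despite their different masses.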

Statistical Analysis
Principal component analysis (PCA) [17] was performed using the HMT's proprietary R program. Statistical significance was evaluated using Welch's t-test, and detected metabolites were plotted on metabolite pathway maps using VANTED 2.1.0 software [18]. For subsequent data analyses, as a pre-processing of metabolome data, metabolites with missing values of ≥5 out of 10 samples were excluded for statistical analysis. By default, missing values were imputed by 1/5 of the minimum positive values of each detected metabolite, and metabolite levels were transformed to z values (mean-centered and divided by the standard deviations of each metabolite). Hierarchical clustering analysis was conducted using the MeV v4.9.0 software with Euclidean distance as the distance calculation method [19]. Quantitative metabolite set enrichment analysis (QMSEA) was performed using the MetaboAnalyst 5.0 software [20,21]. The Kyoto Encyclopedia of Genes and Genomes database was selected as the metabolite set library [22]. Random forest was performed with twofold cross-validation to make the binary classification model. The number of metabolites in each tree was optimized, and the number of decision trees for ensembles was set at 500. Metabolite selection was performed using recursive feature elimination and fivefold cross-validation. The importance of metabolites in the random forest model was measured by the mean decrease in accuracy. All computations regarding the random forest were performed using the caret package in R.
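The pre-processing described above (exclusion of metabolites missing in ≥5 of 10 samples, imputation with 1/5 of the minimum positive value, and z transformation) can be sketched as follows. This is an illustration only, not the authors' R code: the function name, the data layout, the demo values, and the use of the sample standard deviation (n − 1) are assumptions.

```python
import statistics


def preprocess(levels, max_missing=5):
    """levels: dict mapping metabolite name -> list of values (None = missing).

    Returns a dict of z-scored lists for metabolites that pass the filter.
    """
    result = {}
    for name, values in levels.items():
        n_missing = sum(v is None for v in values)
        if n_missing >= max_missing:
            continue  # excluded: missing in >= 5 of 10 samples
        positives = [v for v in values if v is not None and v > 0]
        if not positives:
            continue  # nothing to impute from
        fill = min(positives) / 5.0  # 1/5 of the minimum positive value
        filled = [fill if v is None else v for v in values]
        mean = statistics.mean(filled)
        sd = statistics.stdev(filled)  # sample SD (n - 1); an assumption
        if sd == 0:
            continue  # constant metabolite cannot be z-scored
        result[name] = [(v - mean) / sd for v in filled]
    return result


# Hypothetical relative levels for 10 samples.
demo = {
    "met_A": [10.0, 12.0, None, 8.0, 5.0, 9.0, 11.0, 7.0, 6.0, 10.0],
    "met_B": [1.0, None, None, None, None, None, 2.0, 3.0, 1.5, 2.5],
}
z = preprocess(demo)
```

In this demo, `met_B` is dropped because it is missing in 5 of 10 samples, and the single missing value of `met_A` is filled with 1.0 (one fifth of its minimum, 5.0) before scaling.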

Patient Characteristics
Patient characteristics are shown in Table 1. All five colorectal cancers were left-sided colorectal cancers, and four of these were well-differentiated adenocarcinomas. Table 2 shows the number of metabolites detected in the different biopsy samples after mass correction. Figure 1 plots the number of metabolites detected (Y-axis) against the tissue weight (X-axis). The average numbers of metabolites detected in 1, 2, 5, and 10 pieces of biopsied samples were 424 ± 34 (average ± SD), 458 ± 18, 455 ± 23, and 450 ± 22, respectively; thus, with ≥2 pieces of biopsied samples, the average number of detected metabolites was ≥450. The number of metabolites detected in 1 piece of the biopsied sample was significantly lower than that in 2 and 5 pieces (p = 0.015 and p = 0.027, respectively) and tended to be lower than that in 10 pieces (p = 0.058). The number of metabolites detected among 2, 5, and 10 pieces was not significantly different.
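The comparison between biopsy counts can be illustrated from the reported summary values alone. The sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom; the group size of n = 10 (five patients, tumor and nontumor each) is an assumption, as are the function and variable names, and the original analysis may have been set up differently.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Reported values: 1 biopsy (424 ± 34) vs. 2 biopsies (458 ± 18).
# n = 10 per group is assumed, not stated for this test in the text.
t_stat, df = welch_from_summary(424, 34, 10, 458, 18, 10)
```

Under these assumptions the statistic is about 2.79 on about 13.7 degrees of freedom. A two-sided p-value would then be read from the t distribution with that many degrees of freedom, a step left out here to keep the sketch free of external dependencies.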

The number of metabolites detected in the tumor and nontumor sites was not significantly different.

PCA and Heat Maps
PCA showed that the tumor and nontumor sites were separated by the PC2 axis (Figure 2). Most samples from the same patient were plotted close to each other. However, A1-N1, A2-T1, and A3-N1 were separated by the PC1 axis but showed a similar trend to the separation of tumor and nontumor sites in the PC2 axis.
Figure 3A shows the heat map with clustering; the heat map suggests that metabolomic profiles in one piece of the biopsied sample tend to be different from those in other samples. Figure 3B shows the heat map created after excluding the 1-biopsy data, using only those metabolites detected in 2, 5, and 10 pieces of biopsies that showed a statistically significant difference between tumor and nontumor sites. Figure 3B makes the metabolite differences between tumor and nontumor sites visible, with similar metabolite sets detected within each of the two sites.

Random Forest
Finally, random forest analysis with twofold cross-validation was performed using 2-biopsy data to develop a binary classification model for distinguishing tumor and nontumor samples. As a result, 15 metabolites were selected as multi-metabolite markers based on their variable importance (Table 4). The top three metabolites, 5-hydroxyindoleacetic acid (5-Hydroxy-IAA), indoleacetaldehyde, and formylanthranilate, are all Trp metabolites. 5-Hydroxy-IAA was significantly lower in tumor tissues (p < 0.012), whereas formylanthranilate was higher (p < 0.037). The classification model was then applied to the 5- and 10-biopsy data and predicted the tumor or nontumor status with 100% accuracy, suggesting that 2 biopsies could be sufficient for developing a classification model that distinguishes tumors from nontumors as accurately as when using 5 or 10 biopsies (Figure 4).

Discussion
In this study, metabolomic analysis of colorectal biopsies was performed using the newly developed CE-FTMS and examined for the minimum number of specimens required for analysis. The metabolic characteristics of colorectal cancer were also examined based on the results of the CE-FTMS analysis.
Results of the CE-FTMS analysis showed that the number of detected metabolites was equivalent if ≥2 pieces of biopsies were used. In addition, Figure 1 implies that roughly 5 mg or more is needed to secure appropriate data in terms of the number of detected metabolites. The weight of one piece of biopsy, however, varies considerably (from 1.0 to 10.6 mg), and thus, there is a risk in using just one piece of biopsy sample for CE-FTMS-based metabolomics and biological interpretation. Not only the number of metabolites detected but also the metabolomic profiles resembled each other among the data obtained by ≥2 biopsies. Indeed, in the heat map, the detected metabolite profiles in 2, 5, and 10 biopsies were similar. QMSEA, using the data from ≥2 biopsies, identified 10 common pathways enriched in tumor and nontumor comparisons. Previous studies have also shown that most of these 10 pathways are altered in colorectal cancer metabolism. Furthermore, the classification model developed based on 2-biopsy data perfectly predicted tumor or nontumor status when applied to the 5- and 10-biopsy data, suggesting that a crucial metabolite set for distinguishing the two groups can be captured with 2-biopsy data. Thus, CE-FTMS can detect the same biological features as conventional analysis methods with a smaller sample amount, such as biopsy specimens. Therefore, the minimum number of biopsies required for CE-FTMS analysis was considered to be two pieces (average mass 8.2 ± 4.6 mg). Since previous studies using conventional methods required sample volumes of 50-100 mg, CE-FTMS, which can perform accurate analysis with an average sample volume of 8.2 mg, is considered very useful clinically [23,24].
Pathway map results are particularly important for the clinical application of metabolomic analysis results. In recent years, cancer metabolic pathways have been attracting attention in the fields of tumor markers and new drug development; however, many aspects of metabolic pathways in colorectal cancer are still unclear. Among the pathways that showed significant differences in this study, pathways particularly relevant to cancer metabolism will be discussed.
Random forest analysis generated a tumor versus nontumor classification model comprising 15 metabolites; interestingly, tryptophan metabolites occupied the top three positions in the list, which echoes the results obtained in QMSEA. Indeed, the top three metabolites, 5-hydroxyindoleacetic acid (5-Hydroxy-IAA), indoleacetaldehyde, and formylanthranilate, represent three major pathways in tryptophan metabolism: serotonin, indole, and kynurenine (Figure 5). In general, in cancer metabolism, indoleamine-2,3-dioxygenase (IDO)1, IDO2, and tryptophan-2,3-dioxygenase (TDO2) are activated in the first step of tryptophan degradation [25,26]. This phenomenon results in the accumulation of kynurenine, which suppresses T-cell differentiation and function and promotes immune tumor escape. This study showed that serotonin and indole pathways were enhanced in nontumor sites of the colon, whereas the kynurenine pathway was predominantly enhanced in tumor sites, suggesting the promotion of immune escape in the tumor regions.
In nitrogen metabolism (Figure S1), glutamine has reportedly been metabolized more than other nonessential amino acids in cancer cells [27]. In the present study, glutamine metabolism was enhanced in tumor parts, suggesting increased glutamate production. MYC and KRAS (G12D mutation) are thought to be involved in this glutamine metabolism. In colorectal cancer, regardless of the presence or absence of KRAS mutations, glutamine is absorbed into the cell to produce fatty acids, proteins, and nucleic acids essential for cell survival and growth [28]. For glutamine that has entered the cell to feed the TCA cycle, glutaminase must be activated to convert glutamine to glutamate, and previous studies have shown that this reaction is enhanced in colon cancer [29]. In this study, this mechanism may have resulted in decreased glutamine and increased glutamate levels at the tumor site.
Figure 5. Tryptophan metabolism. Blue and red bars represent nontumor and tumor sites, respectively. Serotonin and indole pathways were relatively enhanced in nontumor sites of the colon, whereas the kynurenine pathway was predominantly enhanced in tumor sites.
Purine and pyrimidine metabolism (Figures S2 and S3) may reflect the status of nucleic acid synthesis. In purine metabolism, both AMP and GMP were increased in tumor sites. In general, in adenosine metabolism, increased conversion of ATP to ADP and ADP to AMP implies increased energy expenditure, and in guanosine metabolism, increased GMP indicates a similar event. The increases in AMP and GMP at the tumor site may be due to the following two reasons: first, the synthesis of nucleotides at the tumor site may have increased energy consumption and enhanced conversion from ATP and ADP; second, the purine salvage pathway may have been enhanced at the tumor site, resulting in increased AMP and GMP production from adenine and guanine [30]. In pyrimidine metabolism, although no significant difference was observed in UMP between the tumor and nontumor sites, UDP and UTP were significantly increased in the tumor area. This suggests that RNA synthesis is also enhanced through pyrimidine metabolism.
In cysteine and methionine metabolism (Figure S4), the results suggest that cystathionine, a metabolite peripheral to the methionine cycle, is significantly higher at the tumor site. Furthermore, the conversion of cysteine, a related metabolite, to cystine was significantly enhanced at the tumor site. The majority of malignant cells are in an oxidative state due to cellular metabolism changes caused by oncogenes. Oxidative stress at the tumor site may have enhanced the conversion from cysteine to cystine. The mean cysteine/cystine ratio in this study was 0.02 in the tumor and 0.10 in the nontumor sites. A lower cysteine/cystine ratio indicates greater exposure to oxidative stress [31], and this feature is more likely observed in the tumor than in nontumor sites.
Overall, the fact that two biopsies are sufficient is clinically useful. For example, it is practically impossible to obtain during pretreatment endoscopy the 20-40 mg specimen required for TOFMS-based metabolome analysis; however, two biopsies can be easily performed. The ability to analyze such a small amount of specimen eliminates the need to resect the tumor site for analysis, enabling clinical applications with less invasive and less expensive procedures. A future challenge is to make the analysis more convenient and immediate. If the time required for metabolome analysis is further reduced, a quick and detailed diagnosis based simply on analyzing metabolites in biopsy specimens from the tumor site may become possible. Furthermore, it would be clinically significant if a small, endoscopically collected biopsy specimen could guide the selection of future colorectal cancer drugs that target metabolites.
Several limitations should be considered in this study. First, because this is a pilot study, the number of patients is small. In particular, a larger number of patients are needed to examine metabolic pathways. Second, the study was limited to patients with colorectal cancer. Since the histological type, genotype, and grade of cancer differ depending on the primary site, further studies are needed for other types of cancer. Third, the specimens in this study were not taken directly from patients but from surgically resected colon or rectum tissues. Therefore, there may be some differences in the metabolites detected when compared to biopsy samples directly collected from living subjects.

Conclusions
This study clarified that CE-FTMS-based metabolomic analysis is feasible with a minimum of 2 biopsies (8.2 ± 4.7 mg), yielding data comparable to those obtained using 5 (16.3 ± 5.0 mg) or 10 (47.3 ± 21.0 mg) biopsies, as supported by the number of identified metabolites and by the biological interpretability tested by QMSEA and random forest analysis. This paves the way for biopsy-based clinical metabolomics for tumor characterization and patient stratification in the future.
Author Contributions: Conceptualization, M.T. and K.K.; methodology, M.T. and K.K.; formal analysis, K.K., K.T. and H.Y.; data curation, T.S. and H.S.; writing-original draft preparation, T.S.; writing-review and editing, T.S., M.T. and K.K.; visualization, T.S. and K.K.; supervision, M.T., K.K. and Y.K.; project administration, T.S., M.T. and K.K. All authors have read and agreed to the published version of the manuscript.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author due to ethical concerns.