A Multiomics, Molecular Atlas of Breast Cancer Survivors

Breast cancer imposes a significant burden globally. While the survival rate is steadily improving, much remains to be elucidated. This observational, single time point, multiomic study utilizing genomics, proteomics, targeted and untargeted metabolomics, and metagenomics in a breast cancer survivor (BCS) and age-matched healthy control cohort (N = 100) provides deep molecular phenotyping of breast cancer survivors. In this study, the BCS cohort had significantly higher polygenic risk scores for breast cancer than the control group. Carnitine and hexanoyl carnitine were significantly different. Several bile acid and fatty acid metabolites were significantly dissimilar, most notably the Omega-3 Index (O3I) (significantly lower in BCS). Proteomic and metagenomic analyses identified group and pathway differences, which warrant further investigation. The database built from this study contributes a wealth of data on breast cancer survivorship where there has been a paucity, affording the ability to identify patterns and novel insights that can drive new hypotheses and inform future research. Expansion of this database in the treatment-naïve, newly diagnosed, controlling for treatment confounders, and through the disease progression, can be leveraged to profile and contextualize breast cancer and breast cancer survivorship, potentially leading to the development of new strategies to combat this disease and improve the quality of life for its victims.


Introduction
Breast cancer (BrCa) remains a formidable global health challenge, affecting more than two million women and their families each year and leading to almost 700,000 deaths globally [1].In 2020, female breast cancer had the highest incidence rate of all cancers in both sexes and in female-specific cancers, accounting for 11.7% of all cancers and 24.5% of female cancers [1].In the United States, breast cancer accounted for an estimated 31% of new female cancer cases and an estimated 15% of female cancer-related deaths in 2022 [2].As such, women with breast cancer represent the largest sub-population of those with cancer in the United States and worldwide.
The present study was established as part of a two-step investigation to address this need.The first step, reported herein, was designed to apply a comprehensive multiomic analysis (using untargeted genomics, proteomics (serum), metabolomics (plasma, urine, analyzed.For significance testing of parametric variables, Student's t-tests were conducted.For non-parametric statistics, the Wilcoxon Rank-Sum Test was performed.All statistical analysis for metadata was performed using R version 4.2.3.

Whole Genome Sequencing
The WGS protocol used in this experiment was similar to the protocol previously described by Meydan et al. and will be briefly summarized here [18].Stool samples were collected using Thorne's Gut Health Test, which provides metagenomic sequencing of the microbiome.To isolate microbial DNA from the samples, an automated protocol, and MoBio's PowerMag ® (+CleaMag ® ) microbiome DNA isolation kit were utilized on the KingFischer TM Flex Instrument.The concentration of extracted DNA from each sample and estimated sample purity were determined by Qubit measurement and spectrophotometry (A260/280 and A260/230 absorbance ratios), respectively.
Nextera XT Library Prep (Illumina, San Diego, CA, USA) was used to enzymatically fragment and tag primer sites for adapter index addition, creating next-generation sequencing libraries.Sequencing adapters and indices were added during polymerase chain reaction (PCR) amplification, followed by library verification by fragment analysis (Agilent Bioanalyzer, Agilent, Santa Clara, CA, USA).DNA sequencing was performed utilizing the Illumina NextSeq platform, which generated 350 M reads per sample (150 × 150 read length) and translated to 24× coverage.
Whole genome sequences were aligned to the hg38 reference genome using BWA mem with the default parameters [19].Sentieon LocusCollector and Dedup algorithms were used to remove duplicate reads [20].After indel realignment and base recalibration using Sentieon default parameters, Somatic SNVs, and SVs were called with the DNAscope algorithm [21].Known variants were annotated using dbSNP SNVs, Indels, and Mills, and 1000 genomes indels [22,23].After variant calling, variants were filtered by QC, including removing any sites with p > 0.25 or marked as low quality.
Polygenic risk score (PRS) matrices were collected from PGS Catalog [24] for signatures of breast cancer-specific scores and other breast cancer subtypes, as well as genetic signatures for 54 health and disease scores [25].PRS for each individual was calculated by scoring the variants of the subject using the PGS Catalog calculator (PGS Catalog Calculator (in preparation [0]).PGS Catalog Team.PGScatalog/pgsc_calc).Statistical comparison of healthy controls versus breast cancer survivors was performed on these scores using the Wilcoxon rank-sum test.

Proteomics via Aptamer
Proteomic profiling was conducted at SomaLogic Operating Co., Inc. (Boulder, CO, USA), using the SomaScan Assay.The SomaScan method has been previously described in detail by Gold and colleagues and Kim et al. [26,27].Briefly, SOMAmer (Slow Off-rate Modified Aptamers) reagents consist of short single-stranded DNA sequences with a high binding affinity for proteins.The diverse chemical properties allow for a greatly expanded library from which SOMAmer reagents are selected.SOMAmer reagents are discovered in vitro using the SELEX (Systemic Evolution of Ligands by EXponential enrichment) process, which incorporates chemically modified nucleotides with structural similarity to desired amino acid side chains.This enhances the specificity and affinity of the target protein binding interaction.
Initially, samples were diluted and incubated with SOMAmer reagent mixes attached to streptavidin (SA)-coated beads.The beads had been washed and tagged with an NHS-biotin reagent.SOMAmer complexes and unbound SOMAmer reagents were subjected to ultraviolet light to cleave a "photo-cleavable" linker within the SOMAmer reagent which released them into a solution containing an anionic competitor.The anionic competitor displaced the SOMAmer reagents in the eluent, which was then incubated with a second SA-coated bead.Subsequent washing removed the free SOMAmer regent, followed by a final elution step, in which the protein-bound SOMAmer reagent was released from the proteins when mixed with a denaturing agent.Quantification of SOMAmer reagents was achieved by hybridization to custom DNA microarrays, which were detected via their cyanine-3 signal.
Principle component analysis was performed to reduce the dimensionality of a dataset by identifying and retaining the most significant features.The distance matrix from generalized unifrac metrics was fed into t-distributed stochastic neighbor embedding (t-SNE) to project the samples into a two-dimensional space [28].Log-normal univariate (Model A) and multivariate modeling (Model B and Model C) were performed.Model A was an uncorrected comparison for BCS versus healthy cohort.Model B corrected for age and menopause status and Model C corrected for age, menopause, and treatment type.Pathway enrichment analysis was performed, and the top 50 pathways were presented for each model.
The peak of the individual analyte product ions was measured against the peak of the corresponding internal standards.Quantitation was performed using a weighted linear least squares regression analysis generated from fortified calibration standards and prepared immediately prior to each run.
LC-MS/MS raw data were collected and processed using AB SCIEX software Analyst 1.6.3 and SCIEX OS-MQ software v1.7.Data reduction was performed using Microsoft Excel for Office 365 v.16.
Calibration samples were prepared at eight different concentration levels by spiking a phosphate-buffered saline/bovine serum albumin (PBS/BSA) solution with a corresponding calibration spiking solution.Calibration samples, study samples, and quality control samples were spiked with a solution of isotopically labeled internal standards and subjected to protein separation with acidified methanol (organic solvent).Following centrifugation, an aliquot of supernatant was evaporated and dried in a stream of nitrogen.The dried extracts were reconstituted and injected on an Agilent 1290 Infinity/SCIEX QTRAP 6500 LC-MS/MS system equipped with a C18 reverse phase UHPLC column.The mass spectrometer was operated in negative mode using electrospray ionization (ESI).
The peak area of each bile acid parent (pseudo-MRM mode) or the product ion was measured against the peak area of the respective internal standard parent (pseudo-MRM mode) or the product ion.Quantitation was performed using a weighted linear least squares regression analysis generated from fortified calibration standards prepared immediately prior to each run.
LC-MS/MS raw data were collected using SCIEX software Analyst 1.7.3 and processed using SCIEX OS-MQ v.1.7.Data reduction was performed using Microsoft Excel for Office 365 v.16.

Fatty Acid Panel
In order to assess red blood cell (RBC) fatty acids, dried blood spot (DBS) samples were collected according to manufacturer instructions (OmegaQuant Analytics, LLC.; Sioux Falls, SD, USA).Samples were processed and analyzed according to OmegaQuant Omega-3 Index ® methodology, as follows.A drop of blood was collected on filter paper that was pre-treated with a cocktail solution (Fatty Acid Preservative Solution, FAPS™) and allowed to dry at room temperature for 15 min.Following drying, samples were kept at 2-8 • C. The DBS was shipped to OmegaQuant for fatty acid analysis.One punch of the DBS was transferred to a screw-cap glass vial followed by the addition of BTM (methanol containing 14% boron trifluoride, toluene, methanol; 35:30:35 v/v/v; Sigma-Aldrich, St. Louis, MO, USA).The vial was briefly vortexed and heated in a hot bath at 100 • C for 45 min.After cooling, hexane (EMD Chemicals, Gibbstown, NJ, USA) and HPLC-grade water were added.Subsequently, the tubes were recapped, vortexed, and centrifuged to help separate layers.An aliquot of the hexane layer was transferred to a gas chromatography (GC) vial.GC was carried out using a GC-2010 Gas Chromatograph (Shimadzu Corporation, Columbia, MD, USA) equipped with an SP-2560, 100-m fused silica capillary column (0.25 mm internal diameter, 0.2 um film thickness; Supelco, Bellefonte, PA, USA).
* The chromatographic conditions used in this study were sufficient to isolate the C16:1trans isomers and the C18:2 D 9t-12c, 9t-12t, and 9c-12t isomers; the latter is reported as C18:2n6t.However, each individual C18:1 trans molecular species (i.e., C18:1 D6 through D13) could not be separated but appeared as two blended peaks that eluted just before oleic acid.The areas of these two peaks were summed and referred to as a C18:1 trans.

Untargeted Metabolomics
Untargeted metabolomics analyses were conducted on stool, plasma, and urine samples.Stool samples were homogenized by sonification after adding 1 × PBS (10 mL per mg tissue) prior to prepping.Plasma, urine, and stool homogenates were deproteinized with 6× volume of cold acetonitrile:methanol (1:1) solution following the addition of 13C6phenylalanine (3 µL at 250 ng/)µL as internal standard.Samples were kept on ice with intermittent vortexing and centrifuged at 18,000× g for 30 min at 4 • C. The supernatants were divided into 2 aliquots and dried down for analysis on a Quadruple Time-of-Flight Mass Spectrometer (Agilent Technologies 6550 Q-TOF, San Diego, CA, USA) coupled with an Ultra High-Pressure Liquid Chromatograph (1290 Infinity UHPLC Agilent Technologies, San Diego, CA, USA).Profiling data were acquired under both positive and negative electrospray ionization conditions over a mass range of 100-1200 m/z at a resolution of 10,000-35,000 (separate runs).Metabolite separation was achieved using two columns of differing polarity, a hydrophilic interaction column (HILIC, ethylene-bridged hybrid 2.1 × 150 mm, 1.7 mm; Waters) and a reversed-phase C18 column (high-strength silica 2.1 × 150 mm, 1.8 mm; Waters).For each column, the run time is 20 min using a flow rate of 400/µL min.A total of four runs per sample was performed to give maximum coverage of metabolites.A quality control sample, made up of a subset of samples from the study, was injected several times during each run.All raw data files obtained were converted to compound exchange file format using Masshunter DA reprocessor software (Agilent, San Diego, CA, USA).Mass Profiler Professional (Agilent, San Diego, CA, USA) was used for data alignment and to convert each metabolite feature (m/z × intensity × time) into a matrix of detected peaks for compound identification.Putative Identifications (IDs) were given to components based on mass molecular weights and further examined by comparison to a reference standard of the proposed compound.The mass accuracy of the Q-TOF method was <5 ppm with retention time precision better than 0.2%.A 1.2× fold change can be detected with a precision of 4%.13C6 phenylalanine internal standard was used to check for recovery of each sample only and not in normalization of the data.
An unsupervised principal component analysis, ANOVA, 3D plot and heat map, and a Partial Least Square discrimination analysis (PLS-DA) comparison between groups were generated for analysis.The R-package MetaboanalystR software (version 3.3.0)was used for data normalization, differential expression analysis, and visualization.In each mode, metabolites were row-wise normalized to a constant sum (SumNorm), log-transformed, and then scaled by mean centering (MeanCenter).Normalized data were analyzed by multivariate Principal Component Analysis (PCA) to reveal data heterogeneity, groupings, outliers, and trends.Hierarchical clustering analysis (HCA) was performed to reveal clustering between sample injections, group replicates, and multiple metabolite clusters that correlate with different subsets of clinical variables.The univariate Student's unpaired t-test was conducted for between-group analysis with multiple testing corrections to identify differentially expressed metabolites between two groups with statistical significance (FDR adjusted p-value ≤ 0.05 and |fold change| ≥ 1.5).

Gut Metagenomics
Gut metagenomics analyses were performed at CosmosID (Germantown, MD, USA).DNA from stool samples was isolated using the QIAGEN DNeasy PowerSoil Pro Kit, according to the manufacturer's protocol.DNA samples were quantified using the GloMax Plate Reader System (Promega, Madison, WI, USA) using the QuantiFluor ® dsDNA System (Promega, Madison, WI, USA) chemistry.DNA libraries were prepared using the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, USA) and IDT Unique Dual Indexes with a total DNA input of 1 ng.Genomic DNA was fragmented using a proportional amount of Illumina Nextera XT fragmentation enzyme.Unique dual indexes were added to each sample followed by 12 cycles of PCR to construct libraries.DNA libraries were purified using AMpure magnetic beads (Beckman Coulter) and eluted in QIAGEN EB buffer.DNA libraries were quantified using Qubit 4 fluorometer and Qubit™ dsDNA HS Assay Kit.Libraries were then sequenced on an Illumina NextSeq 2000 platform 2 × 150b.
Sequences were trimmed by Trimmomatic [29], and then aligned to human genome reference using BWA [19].Taxonomic annotation was performed by utilizing KrakenUniq [30] and subsequently Bracken [31] on a database that includes all bacterial, archaeal, viral, and fungal references from RefSeq along with human references.The lowest common ancestor taxonomic annotations were adjusted within the lineage until at least 10% of the unique k-mers belonged to a specific clade and not its parent, then filtered for at least 10 reads and a minimum Bracken-adjusted relative abundance of 0.005%.Pearson correlation was calculated by taking the log abundances of the species (or other relevant ranks) and comparing these between the two samples.The functional annotations for genes were performed by using HUMAnN3 with the UniRef90 clusters and summarized as MetaCyc v19.1 pathways by HUMAnN3 [32].The alpha diversity analysis was performed using Shannon entropy and species richness on the Bracken results of each replicate.Beta diversity was calculated as a generalized unifrac distance between the samples [33,34].Statistical analyses of beta diversity were performed by comparing the within-group distance to inter-group distances.Differential abundance was calculated using MaAsLin2 with a negative binomial model and trimmed mean of M-values normalization (TMM) [35,36].Differentially abundant taxa were modeled in three ways: (1) BCS vs. HC, (2) BCS grouped by stage at diagnosis, and (3) BCS grouped by cancer subtype.The correction factors were age, as well as the age and menopause state at the time of sampling.For differentially abundant taxa, taxon disease annotation was accomplished via the Disbiome database [37].

Quality of Life Questionnaires
Participants each electronically completed the Generalized Anxiety Disorder-7 questionnaire (GAD-7) and the Patient Health Questionnaire-8 (PHQ-8).Results were used as a means to assess the differences in the general quality of life (QoL) between the two groups.The PHQ-8 consists of 8 questions (0-3 points each) focused on anxiety (4 questions) and depression (4 questions).Scores range from 0-24 and are interpreted as 5-9 mild, 10-14 moderate, 15-19 moderately severe, and 20+ severe depression.The GAD-7 consists of 7 questions (0-3 points each) focused on generalized anxiety.Scores range from 0-21 and are interpreted as 5-9 mild, 10-14 moderate, and 15+ severe anxiety.Wilcoxon rank-sum tests were performed on the total score from each questionnaire.

Results
To accomplish the stated objectives of this single time point, observational controlled study, metadata were gathered by questionnaire, taken (blinded for the study team) from electronic health records of BCS participants, and analyzed from biological samples (blood, urine, feces).Whole genome sequencing (WGS) from blood yielded polygenic risk scores via analysis of single nucleotide polymorphisms (SNP).Targeted metabolomics panels included acylcarnitines and bile acids from plasma and fatty acids from RBCs.Untargeted proteomics was analyzed from plasma and assessed via aptamers.Untargeted metabolomics assays were run on plasma, urine, and stool to unmask unknown or unexpected differences.Metagenomics via NextGen sequencing was completed for the microbiome from stool samples.The data map in Figure 1 outlines these multiomic features and the following sections report the results from each class of data gathered.

Metadata: Description of the Cohort
Data were collected from a total of 100 female participants comprised of two equal groups-50 Breast Cancer Survivors (BCS) and 50 healthy controls (HC).These results are shown in Table 1.The mean age of all participants was 63 years (BCS = 62.8;HC = 63.2).Participants assigned to the BCS group had a prior diagnosis of stage 0-3 (in situ or invasive) breast cancer and had completed active therapy (e.g., chemotherapy, radiotherapy, endocrine therapy, or some mixture of treatment modalities), with a mean-time from treatment of 518 days (IQR: 411-613 days) prior to provision of the samples.Questionnaire data for the PHQ-8 and the GAD-7 are shown in Figure A1a,b, respectively.There were no significant findings between the two cohorts, in regards to the responses to those questionnaires.
microbiome from stool samples.The data map in Figure 1 outlines these multiomic features and the following sections report the results from each class of data gathered.

Metadata: Description of the Cohort
Data were collected from a total of 100 female participants comprised of two equal groups-50 Breast Cancer Survivors (BCS) and 50 healthy controls (HC).These results are shown in Table 1.The mean age of all participants was 63 years (BCS = 62.8;HC = 63.2).Participants assigned to the BCS group had a prior diagnosis of stage 0-3 (in situ or invasive) breast cancer and had completed active therapy (e.g., chemotherapy, radiotherapy, endocrine therapy, or some mixture of treatment modalities), with a mean-time from treatment of 518 days (IQR: 411-613 days) prior to provision of the samples.Questionnaire data for the PHQ-8 and the GAD-7 are shown in Figure A1a,b, respectively.There were no significant findings between the two cohorts, in regards to the responses to those questionnaires.  PHQ-8: Patient Health Questionnaire 8; GAD-7: Generalized Anxiety Disorder 7 questionnaire; Prebiotics: ("Are you currently taking prebiotic supplements?");Probiotics: ("Are you currently taking probiotic supplements?);Antibiotics: ("Have you ever taken antibiotics for more than 3 months in the past year?);NA: not applicable; VUS: variant of unknown significance; BrCa: breast cancer. 2Wilcoxon Rank-Sum Test, between-groups p-value < 0.05. 3 The participants with Type II diabetes (T2D) post-treatment were the same participants with T2D pre-treatment.

Genetic Analysis via Whole Genome Sequencing
Whole genome sequencing (WGS) was completed for each participant in both cohorts.Using these data, we applied multiple polygenic risk score (PRS) calculations to the cohort in order to understand their genetic predisposition for health and disease.PRS categories included breast cancer (108 scores), lipid (65 scores), insulin (23 scores), cardio (47 scores), and general health and disease (54 scores) [38].These were assessed for a total of 297 individual scores.The complete results for BrCa-specific PRSs are reported in Appendix B (Table A1), with selected PRSs for BrCA and other disease signatures shown in Figure 2.
As a first-pass genetics analysis, we expected this would give us the best overarching look at genetic differences between the HC and BCS cohorts.Within this analysis, two major themes stood out: (1) the genetic homogeneity present between these two age-matched cohorts and (2) the significant predisposition for BrCa risk present in the BCS cohort.Of the 189 non-breast cancer-specific PRSs assessed, significant differences were found in only 7. Two p-wave duration scores (p = 0.043 and 0.038) and gout (p = 0.041) were significantly higher in BCS vs control, while fasting insulin (p = 0.046) and insulin sensitivity index (p = 0.052), were significantly lower in BCS compared to HC.In the BrCa risk PRS category, 73 of the 108 PRSs assessed showed significance (q < 0.05), indicating a strong genetic predisposition for a genetic component of BrCa.This battery of predispositions correlated well with BrCa incidence in the BCS cohort (as the BCS cohort was defined by participants who had dealt with BrCa and recovered successfully).
were significantly higher in BCS vs control, while fasting insulin (p = 0.046) and insulin sensitivity index (p = 0.052), were significantly lower in BCS compared to HC.In the BrCa risk PRS category, 73 of the 108 PRSs assessed showed significance (q < 0.05), indicating a strong genetic predisposition for a genetic component of BrCa.This battery of predispositions correlated well with BrCa incidence in the BCS cohort (as the BCS cohort was defined by participants who had dealt with BrCa and recovered successfully).

Targeted Metabolomics
The investigation of targeted metabolomics was restricted to three important molecular classes with relevance to BCS and their largely yet-to-be-observed dynamics in each of the following: acylcarnitines (AC), bile acids (BA), and omega fatty acids (OFA).More specifically, ACs were investigated due to their relevance to dysfunctions in energy metabolism, BA was investigated due to their relevance to other types of cancers, and fatty acids were investigated because of their relevance in inflammatory balance.

Primary and Exploratory Proteomic Analysis
PCA analysis revealed a significant overlap between the two cohorts with several outliers (Figure 4a).PC1 represented 14.75% of the variance, while PC2 represented 8.19% of the variance.The initial exploratory untargeted proteomic analysis showed 108 proteins that were significantly different between BCS and HC (Figure A3a).

Univariate and Multivariate Proteomic Analysis
We ran several log-normal univariate and multivariate models on the proteomic data.The first model (Model A) found no features that significantly differentiate between BCS and HC (Figure 3b).The second model (Model B), contrasting BCS and HC while correcting for age and menopause status, found more than 60 proteomic features that were higher and 30 proteomic features that were lower as a function of age (Figure 4b).In this same model, BCS cohort status and menopause status (except for FSH) showed no significant differentiation (Figures 4b and A3b).
In a third model (Model C) of BCS versus HC, which corrected for age, menopause, as well as treatment type, age was again the largest differentiator of significant proteomic features, with more than 30 proteins being higher and 30 proteins being lower (Figure 4e).BCS cohort status and treatment type showed several higher proteomic features in this model (Figure 4b).Additionally, Model C found that stathmin (STMN1), syntaxin-7 (STX7), and the ubiquitin-conjugating enzyme E2 E1 (UBE2E1) were lower in the BCS cohort (Figure 4d).
Figure A3c shows the enrichment of biological pathways based on the proteomic analysis in the context of Models A, B, and C used for the analysis above.Pathways of interest included the BRACA1 PCC network, FoxE1 target genes, chronic myelogenous leukemia up, and GOMF RNA binding domains, among others, which were significantly enriched (up/down) in all models (Figure A3c).

Univariate and Multivariate Proteomic Analysis
We ran several log-normal univariate and multivariate models on the proteomic data.The first model (Model A) found no features that significantly differentiate between BCS and HC (Figure 3b).The second model (Model B), contrasting BCS and HC while correcting for age and menopause status, found more than 60 proteomic features that were higher and 30 proteomic features that were lower as a function of age (Figure 4b).In this 3.5.Untargeted Metabolomics 3.5.1.Untargeted Plasma Metabolomics PCA yielded a high overlap between BCS and HC cohorts, indicating a similarity between the two groups (Figure A4a), in regards to urine and stool metabolomics, respectively.Variable importance plots (VIP) for different MS modes highlighted major metabolites of interest (Figure 5c).In multivariate modeling, when controlling for breast cancer type, age, and menopause status in the BCS versus HC comparison, 3 metabolites were higher and 4 metabolites were lower in the BCS cohort (Figure 5a).The molecules that were significantly higher included diaminopimelic acid (p = 2.646 × 10 −5 ), 12Z-tetradecyl acetate (p = 0.0422), and one metabolite that could not be annotated.The metabolites that were significantly lower in BCS compared to HC were Asp-Pro-Thr (p = 0.0045), Ser-Ser-His (p = 0.0045), glycylproline (p = 0.0075), and 8-hydroxyalanylclavam (p = 0.0451; Figure 5a).

Stool Microbiome
In the dataset, where the majority of samples fell within the ~10-20 million range, PCA and Principal Coordinates Analysis (PCoA) highlighted a noticeable overlap of cohorts with minor separation (Figure 6a).No significant global differences were observed in phylum distribution (Figure 6b), alpha diversity (Shannon index; Figure 6c), or species richness (Figure A5a).However, beta diversity (Bray-Curtis dissimilarity) analysis revealed several noteworthy distinctions, indicating unique community structure profiles between the two cohorts (Figure 6d).The volcano plots for genus and species revealed

Untargeted Urine & Stool Metabolomics
PCA yielded a high overlap between BCS and HC cohorts, indicating a similarity between the two groups (Figure A4b,c).When controlling for breast cancer type, age, menopause, and treatment type (tamoxifen, chemo, endo, radio) using multivariate model-ing (Model C), 48 identifiable metabolites (shown in Figure 5b and listed in Table A2) were found to be significantly lower (q < 0.1).When controlling for breast cancer type, age, and menopause status (Model B), no significantly different metabolites were found to be higher or lower in either the untargeted urine or untargeted stool results (Figure A4d,e).

VIP Plots and Metabolite Type Annotation
For each of the three sample types (plasma, serum, and stool), a PLS-DA comparison revealed that 122 analytes were of importance.Of those metabolites, 63 were found to be higher in the BCS cohort and 59 were found to be lower (Figure 5c).An analysis aimed at deeper annotation of those 122 metabolites represented in the VIP plots showed that 82 (67%) of the metabolites had some biological or chemical function that was annotated in the literature.The remaining 41 (33%) metabolites had no easily identifiable function.These 82 metabolites were grouped into four categories: (1) known drug (N: 28, 34%), (2) of dietary or supplement origin (N: 17, 21%), (3) endogenous metabolite (N: 14, 17%), or 4) exogenous metabolites or known toxicant (N: 23, 28%).When comparing the BCS cohort to the healthy control, of the: (1) drug metabolites 14 were higher and 14 were lower, (2) dietary or supplement metabolites, 11 were higher and 6 were lower, (3) endogenous metabolites, 7 were higher and 7 were lower, and (4) exogenous or known toxicant metabolites, 9 were higher and 14 were lower.

Gut Microbiome Metagenomics Stool Microbiome
In the dataset, where the majority of samples fell within the ~10-20 million range, PCA and Principal Coordinates Analysis (PCoA) highlighted a noticeable overlap of cohorts with minor separation (Figure 6a).No significant global differences were observed in phylum distribution (Figure 6b), alpha diversity (Shannon index; Figure 6c), or species richness (Figure A5a).However, beta diversity (Bray-Curtis dissimilarity) analysis revealed several noteworthy distinctions, indicating unique community structure profiles between the two cohorts (Figure 6d).The volcano plots for genus and species revealed separation between the two cohorts in a large number of features (Figure 7a,b).Similarly, the volcano plots for family, order, class, and phylum revealed separation in a number of features that varied between plots (Figure A5b-e).Utilizing univariate and multivariate models to account for factors such as stage at diagnosis, breast cancer type, age, and menopause status uncovered substantial differences in the abundance of specific genera and species (Figure 7a-d).Importantly, these distinctions remained consistent across models.Furthermore, delving into the published literature (via a proprietary database) unveiled associations between the identified genus Utilizing univariate and multivariate models to account for factors such as stage at diagnosis, breast cancer type, age, and menopause status uncovered substantial differences in the abundance of specific genera and species (Figure 7a-d).Importantly, these distinctions remained consistent across models.Furthermore, delving into the published literature (via a proprietary database) unveiled associations between the identified genus and species differences and a diverse range of diseases with microbiome correlates (Figure 8).HC taxon disease annotations and selected disease similarities.

Cohort
This age-matched, entirely female, primarily post-menopausal cohort was statistically similar across groups in all measured demographic variables except anxiety, which occurred more frequently in the BCS cohort.Participants provided samples from 93 to 1043 days post-treatment end.The BrCa types and stages as well as modes of treatment were heterogeneous in this study, potentially confounding the identification of patterns but important in terms of the broader application of this data set as a foundational database to build upon for future work.

Genetics
Specific to this study, we found a difference in preexisting risk between the two cohorts, which emphasizes the usefulness of polygenic scores as a screening measure in women to assess their risk of BrCa.Of the 189 non-breast cancer-specific PRSs assessed, significant differences were found in only 7 scores (3.2%).This finding was evidence of the homogeneity of the cohort (for issues beyond BrCa) and success in age-matching participants.Alternatively, the BCS cohort had a significantly higher predisposition for breast cancer when compared to the healthy controls (73 of 108 scores significantly different between groups or 67.6%).BrCa is known to be a disease where genetics are predictive of future risk.Studies have found that combined risk from single nucleotide polymorphisms (SNPs) is associated with breast cancer in Genome-Wide Association Studies (GWAS) and explains over 30% of breast cancer heritability [39].

Untargeted Proteomics
Univariate modeling (Model A) comparing only BCS status yielded no significant differences between the groups.When correcting for age and menopause in multivariate models in BCS vs. healthy controls (Model B), follicle-stimulating hormone (FSH) was

Cohort
This age-matched, entirely female, primarily post-menopausal cohort was statistically similar across groups in all measured demographic variables except anxiety, which occurred more frequently in the BCS cohort.Participants provided samples from 93 to 1043 days posttreatment end.The BrCa types and stages as well as modes of treatment were heterogeneous in this study, potentially confounding the identification of patterns but important in terms of the broader application of this data set as a foundational database to build upon for future work.

Genetics
Specific to this study, we found a difference in preexisting risk between the two cohorts, which emphasizes the usefulness of polygenic scores as a screening measure in women to assess their risk of BrCa.Of the 189 non-breast cancer-specific PRSs assessed, significant differences were found in only 7 scores (3.2%).This finding was evidence of the homogeneity of the cohort (for issues beyond BrCa) and success in age-matching participants.Alternatively, the BCS cohort had a significantly higher predisposition for breast cancer when compared to the healthy controls (73 of 108 scores significantly different between groups or 67.6%).BrCa is known to be a disease where genetics are predictive of future risk.Studies have found that combined risk from single nucleotide polymorphisms (SNPs) is associated with breast cancer in Genome-Wide Association Studies (GWAS) and explains over 30% of breast cancer heritability [39].

Untargeted Proteomics
Univariate modeling (Model A) comparing only BCS status yielded no significant differences between the groups.When correcting for age and menopause in multivariate models in BCS vs. healthy controls (Model B), follicle-stimulating hormone (FSH) was significantly higher in BCS (Figure A3b).Interestingly, a 2010 study found a positive association between serum FSH concentrations and better prognosis during tamoxifen therapy in a cohort of postmenopausal breast cancer patients [40].Another group of researchers found that Her-2+ patients had higher serum FSH levels compared to Her-2patients and patients with high Ki67 expression also had higher levels of FSH [41].
With the multivariate Model C, we corrected for age, menopause, and treatment and found that Stathmin (STMN1), Syntaxin-7 (STX7), and Ubiquitin-conjugating enzyme E2 E1 (UBE2E1) were higher in the BCS cohort.STMN1 has been associated with breast cancer due to its influence on cell proliferation, differentiation, and motility.Phosphorylation of the four serine residues of STMN1 leads to inhibition of microtubule polymerization [42].Furthermore, phosphorylated STMN1 contributes to the regulation of cell migration, cell invasion, and cancer metastasis.Specifically, phosphorylation at serine residues 25 (ser25) and 38 (ser38) of STMN1 was shown to be increased in cells with higher metastatic potential and was also associated with increased morbidity and mortality in breast cancer patients.Following phosphorylation at ser25 and ser38, STMN1 binds to glucose-regulated protein of molecular mass 78 (GRP78).GRP78 is an endoplasmic reticulum chaperone and heat shock protein that is known to be involved in tumor proliferation, survival, and metastasis.In contrast to phosphorylation at ser25 and ser38, phosphorylation at ser16 and ser63 of STMN1 led to improved morbidity and mortality.Researchers also noted that inhibition of MEK kinase, which specifically phosphorylates ser25 and ser38, strongly reduced GRP78 binding.This led the researchers to believe that tumor progression was impacted by site-specific phosphorylation of STMN1 at ser25 and ser38 and subsequent GRP78 binding [43].In breast cancer tissue, one study found stathmin expression was associated with tumor proliferation, p53 status, basal-like differentiation, BRCA1 genotype, high-grade histology, tumor angiogenesis, immune response, and survival.Essentially, linking stathmin expression to increased severity and worse outcomes in breast cancer [44].Another study found that STMN1 can be used as a prognostic indicator, based on elevated STMN1 leading to a worse prognosis in breast cancer patients [45].
STX7 is a member of the SNARE family of proteins, which are generally associated with vesicle trafficking.Specifically, STX7 has been associated with the fusion of late endosomes with lysosomes and the homotypic fusion of lysosomes as well as the fusion of lysosomes with phagosomes [46].Although little research exists on the role of STX7 in breast cancer, a recent in vitro study highlighted the involvement of STX7 as a promoter of invadopedia (i.e., matrix-degrading structures) formation during cancer cell invasion.Thus, increased levels of STX7 are implicated as a major contributing factor in breast cancer cell invasion [47].
Ubiquitin-conjugating enzymes (E2s) are involved in breast cancer progression and are hypothesized to increase resistance in triple-negative breast cancer [48].There are a number of E2 enzymes that can contribute to breast cancer pathology, playing a myriad of roles depending on the specific enzyme.Elevated UBE2E1 is an interesting finding in BCS.Previously, UBE2E1 has been implicated in acute myelogenous leukemia (AML) and pancreatic cancer.A study of UBE2E1 in AML found that patients with high UBE2E1 expression were non-responsive to chemotherapy and had worse prognoses while those with lower UBE2E1 were more responsive and likely to enter complete remission [49].UBE2E2 has been shown to promote cancer cell movement and invasion in breast cancer cells through its action on ISG15 [50].For a complete review of ubiquitin-conjugating enzymes and their involvement in breast cancer, as well as many other cancers, see the works of Du et al. [51] and Voutsadakis [52].

Targeted Metabolomics 4.4.1. Acylcarnitines
Dysregulation in energy metabolism and associated obesity are well-established risk factors for BrCa, especially in postmenopausal women [53].Further, there is a known link between metabolic disorders and aberrant acylcarnitine metabolism [54], which was the impetus for assessing acylcarnitine in this study.Our results indicated that hexanoyl carnitine was significantly higher and carnitine was significantly lower in the BCS cohort compared to healthy controls.To our knowledge, this is the first study to analyze acylcarnitines in a cohort of breast cancer survivors.However, previous studies have assessed acylcarnitine levels in breast cancer patients.A study published in 2013 by Shen et al. found that hexanoyl carnitine levels were significantly higher in breast cancer patients compared to controls [55].The majority of our results suggest that BCS patients become more like their healthy counterparts as time goes on.In this instance, however, the significantly elevated hexanoyl carnitine levels from our study of BCS are consistent with the significantly elevated hexanoyl carnitine levels of breast cancer patients found by Shen.Another study found that serum carnitine levels were significantly lower in breast cancer patients compared to controls [56].This finding by Ozmen et al. shares similarities with our finding of significantly decreased carnitine.However, the Ozmen group measured serum samples and we analyzed plasma samples.Additionally, their group only analyzed patients who had undergone radiotherapy.Therefore, only a general, and not a direct, comparison can be made between the breast cancer patients of the Ozmen study and the BCS of our study.

Bile Acids
The role of bile acids in breast cancer (and cancer, in general) is complex and warrants further investigation and clarification, as much is still unknown.Furthermore, contradictory results make interpretation of the role of bile acids in breast cancer even more difficult.Although assessments of bile acids in BCS vs. healthy controls have not been previously conducted to our knowledge, bile acids have been studied in breast cancer as well as many other cancer types.Our results showed that chenodeoxycholic acid (CDCA), tauroursodeoxycholic acid (TUDCA), cholic acid (CA), glycodeoxycholic acid (GDCA), glycochenodeoxycholic acid (GCDCA), deoxycholic acid (DCA), and ursodeoxycholic acid (UDCA) were significantly lower in BCS patients compared to healthy controls.
CDCA, a major primary bile acid, generally induces oxidative stress, DNA damage, and inflammation.CDCA has been implicated as a tumor promotor in several cancers, however, it has been suggested that CDCA is a tumor suppressor in breast cancer [57].One study showed that CDCA inhibits the proliferation of cells in tamoxifen-resistant breast cancers via FXR activation [58].However, another team of researchers reported that the activation of FXR is positively correlated with estrogen receptor expression and supports cell proliferation [59].Another study comparing the bile acid profiles in the serum of breast cancer patients to patients with benign breast disorder (BBD) found that CDCA was elevated in breast cancer patients [60].
A study by Costarelli and Sanders found that the mean concentration of DCA was 52% higher in breast cancer patients compared to controls, leading them to suggest that DCA may be involved in the etiology of breast cancer, potentially as a tumor promoter [61].A recent cell study showed that Clostridium-specific DCA plays a molecule-specific role in breast cancer cell proliferation.The researchers found that DCA significantly promoted the proliferation of HER2+ cells but did not affect triple-negative breast cancer cells [62].
Interestingly, UDCA, TUDCA, and GUDCA have been studied as potential treatments for other types of cancer but not specifically for breast cancer.A study analyzing breast tissue samples from breast cancer patients found that several bile acids, including DCA and GCDCA, accumulate in breast tumors.This study also found that the accumulation of these bile acids showed an inverse relationship with proliferation scores.Increased bile acids led to decreased tumor cell proliferation, which led to improved prognostic outcomes [63].

RBC Fatty Acids
In the current study, targeted analysis of RBC fatty acids in BCS patients revealed significantly increased levels of arachidic acid, lignoceric acid, behenic acid, and Omega-3 Index as well as significantly lower levels of EPA, DHA, and DPA.Additionally, the AA:EPA ratio was significantly higher in BCS patients.Significantly altered omega-3 fatty acids may be a contributing factor in the etiology and progression of breast cancer.Furthermore, depleted omega-3 fatty acids may increase susceptibility to the development of breast cancer, impact the effectiveness of treatment, and play a role in survivorship.As such, fatty acid supplementation may be an effective adjunct therapeutic intervention in breast cancer.
Previous studies have found changes in omega-3 fatty acid status in breast cancer patients.For example, Cala et al. found significant changes in 10 fatty acids in their breast cancer cohort of Colombian Hispanic women [64].Pakiet et al. assessed the serum fatty acid profiles of breast cancer survivors to examine if fatty acid levels normalized following successful treatment.They found a lack of normalization of fatty acid profiles after breast cancer resection, as well as significant differences in serum fatty acid levels before and at 12-and 24-month follow-ups.Interestingly, they found significantly increased levels of EPA and DPA relative to control factors [65].Shen et al. also found a significant increase in DPA in patients with triple-negative breast tumors compared to the control factors [55].DHA, behenic acid, and arachidic acid were found to be significantly increased in breast cancer patients with invasive ductal carcinoma [66].
Hidaka et al. looked at the relationship between fatty acid levels and cytologic atypia, a condition that often leads to breast cancer.Their research team found that EPA and DHA were significantly lower in women with cytologic atypia in multiple blood lipid compartments.Additionally, they found that the EPA+DHA:AA ratio, the omega-3:6 ratio, and DPA were all significantly lower in plasma triacylglycerides of women with atypia [67].
A recent review by Fabian et al. presented studies showing a high intake ratio of marine omega-3 fatty acids EPA and DHA relative to the omega-6 fatty acid arachidonic acid reduces the risk of breast cancer compared to those with low intake ratios [68].Similarly, other studies have shown that a higher dietary intake ratio of omega-3:omega-6 is associated with a lower risk of breast cancer [69,70].
While more research is needed to fully elucidate the role of omega-3 fatty acids in breast cancer and breast cancer survivorship, this study adds to a growing body of evidence implicating omega-3 fatty acids as an important contributing factor in breast cancer progression.Based on our results, we believe it is pertinent for all breast cancer patients to have their omega-3 fatty acid status assessed.Furthermore, we believe that omega-3 fatty acids have potential as an adjunct therapeutic intervention for breast cancer patients.

Untargeted Metabolomics
Our findings from the untargeted plasma metabolomics analysis showed several three-amino acid sequences that were significantly decreased in breast cancer patients compared to healthy controls (i.e., Asp-Pro-Thr and Ser-Ser-His).Recently, an interesting review highlighted research findings that specific amino acids are altered by 10-15% in breast cancers [71].Lai et al. showed that alanine (Ala), histidine (His), threonine (Thr), arginine (Arg), proline (Pro), glutamate (Glu), and glycine (Gly) are six times more likely to be decreased than increased in cancer patients [72].An interesting meta-analysis of metabolomics in the diagnosis of breast cancer highlighted the frequency of specific amino acids mentioned in the literature.They reviewed diagnosis-related studies that noted either increased or decreased amino acids in various tissue samples (saliva, blood, urine, and breast tissue).Seven studies found changes in His (3 up and 4 down).However, only one of the seven studies showed a decrease in His in plasma.Six studies found changes in Pro (3 up and 3 down), six studies found changes in Ser (5 up and 1 down), five studies found changes in Thr (2 up and 3 down), and four studies found changes in Asp (3 up and 1 down) [73].Jobard et al. found that in premenopausal women Ser was associated with a decreased breast cancer risk [74].As for Asp, results show that it both increases [75] and decreases [76] with the risk of breast cancer.Studies have shown that increased levels of both His and Thr increase the risk of developing breast cancer [77].
Glycylproline (Gly-Pro), a dipeptide that is typically an end-product of collagen metabolism, was significantly decreased in BCS vs HC.As noted, Gly and Pro have a high likelihood of being decreased in cancer patients.Alternatively, some neoplasia have reported increased proline content [78,79].In the context of breast cancer, a study by Zareba et al. in MCF-7 breast cancer cells suggested that Gly-Pro and Gly-Pro-derived proline, along with the level of activity of the enzymes that cleave it (e.g., proline dehydrogenase/proline oxidase and prolidase) may play an intricate role in the balance between apoptosis, autophagy, and the growth of neoplastic cells [80].Furthermore, in breast cancer, estrogen increases prolidase activity and collagen biosynthesis [81].
Variable importance plots (VIP) provide an estimation of the importance of each variable in the overall effect.The VIP plots for the untargeted plasma metabolomics showed that the Gly-Pro dipeptide was a highly important metabolite.The remaining metabolites found in the untargeted plasma metabolomics analysis, 8-hydroxyalanylclavam, and 12Z-Tetradecenyl acetate, have not previously been associated with cancer.The PCA results from both the untargeted urine and stool metabolomics showed a high overlap between the BCS and HC cohorts.No metabolites were significantly higher or lower in stool or urine samples when controlling for breast cancer type, age, and menopause status using multivariate modeling.
In the untargeted urinary metabolomics analysis, Model C showed that, in the subcohort of the BCS group who had received chemotherapy treatment, there was a set of 48 identifiable metabolites that were significantly lower (q < 0.1).Upon review of the available literature, we found only a small number of studies in breast cancer that have investigated urinary metabolomics and even fewer have included untargeted measures [82][83][84][85][86][87][88][89][90][91][92][93].However, to our knowledge at the time of submission, this is the first of its kind study to investigate the untargeted urinary metabolome in a breast cancer survivor cohort that had received chemotherapy treatment.Given this fact, there are no datasets to compare the findings with and, thus, these findings warrant further investigation in a larger BCS cohort.This analysis was limited by the lack of power due to the small sample size (n: 10, 20% of BCS cohort).

Metagenomics
The microbiome sequencing put forth in this study provides valuable insight into the microbial composition of a BCS cohort.This gut microbial sequencing gives us the opportunity to assess the landscape of the gut metagenome of a relatively unassessed microbial population.
There was little association that could be drawn between our findings and those of other studies in the literature.This was mainly due to the depth of microbial analysis conducted in the present study and the scarcity of research in similar BCS cohorts.A study conducted by Smith et al. on the fecal microbial composition of BCS and their quality of life found that physical function, vitality, and mental health were negatively associated with Ruminococcus and Dorea.Additionally, they found that non-obese BCS had higher relative abundance of Ruminococcus, Streptococcus, Roseburia, and Dorea [12].The present study found a slight increase in Roseburia and a decrease in Streptococcus lutetiensis, while the other results were not replicated.
BrCa patients who undergo chemotherapy treatment are at increased risk of developing metabolic disease and weight gain.This can potentially lead to increased morbidity and reduced quality of life in BCS.A study in BrCa patients with early-stage breast cancer compared the metagenome of patients who underwent adjuvant chemotherapy, adjuvant endocrine therapy, or both.They found that patients treated with chemotherapy only experienced clinically and statistically significant weight gain and fat distribution.Colonic inflammatory markers were also observed to have increased two-fold.Additionally, the researchers noted that these changes were accompanied by a reduction in the abundance and variety of microbial species in the gut microbiome [16].The present study did not replicate this finding and, by most metrics, the cohorts were similar in terms of abundance and variety at a high level, with reported genus and species level differences.
In the study presented here, a simple analysis comparing BCS to HC showed multiple differences on each of the levels (species, genus, family, etc.) we investigated.This specific set of findings casts a wide net and can be leveraged in future research for translations into interventions that can be applied to breast cancer survivors.When controlling for age and menopause status as well as grouping by BrCa stage at diagnosis and BrCa subtype, a second, more resolved set of findings was illuminated, giving further resolution to these differential findings and their potential generalizability.Similarly, multiple correlations were found with other diseases known to affect the microbiome (such as irritable bowel syndrome, Parkinson's disease, colorectal cancer, etc.).Utilizing these correlations could further open avenues to understanding significant changes in microbiome community structure and lead to novel interventions.

Conclusions, Future Directions, and Limitations
The present study was established as a hypothesis-seeking and hypothesis-generating investigation.When such multiomics studies are conducted, patterns of variance frequently emerge from the high dimensional data.These novel patterns of variance then form the basis of new hypotheses that can be followed up by future investigations.It is these novel hypotheses that frequently lead to treatment innovations.
It was acknowledged at the outset of the present study that treatment effects would be a confounding factor in the interpretation of how the biochemical characteristics being measured may be associated with disease development or treatment outcome.This is because treatment and time will naturally change the biochemical landscape.Nevertheless, this study has revealed several future research questions and has laid the groundwork for specific investigations.These can be considered for both newly diagnosed and breast cancer survivors.
First, this deep phenotyping study has led to a data set that can live as a molecular atlas of breast cancer survivors.This can serve as a frame of reference for future multiomics studies in a larger cohort of BCS, where the study will be more strongly powered by larger subject numbers.This may lead to the ability to detect signals that did not emerge from the current cohort size.
Second, the finding of altered omega-3 fatty acids warrants the exploration of fully quantitative omega-3 fatty acid status in the newly diagnosed and in BCS as a research effort.Since omega-3 status is clinically actionable, there is merit in examining omega-3 fatty acids status in the clinic.The provision of omega-3 dietary supplements could become a clinically justifiable intervention.
Third, examination of bile acids in the newly diagnosed and in a larger cohort of BCS is warranted.In particular, secondary bile acids are products of bioconversion by gut bacteria.When bile acids are released from the gut and distributed systemically, secondary bile acids may have clinical effects in tissues like the breast.Since secondary bile acids are modifiable by dietary inputs, such as prebiotics and probiotics, this path of investigation may one day lead to prophylactic or therapeutic food or dietary supplement interventions (primary or adjunctive).A future investigation of bile acids in the newly diagnosed would be justified.
Fourth, acylcarnitines were altered in this BCS cohort.In the BCS group, total carnitine was lower.Carnitine is a carrier molecule that transports fatty acids throughout the body.Variations in the types of fatty acids bound to carnitine can be highly informative as to how metabolism is changing.Moreover, carnitine also transports fatty acids across the mitochondrial membrane so they can be oxidized as an energy source via beta-oxidation.Thus, low total carnitine may be of clinical import in BCS, as it may adversely impact the ability to use fatty acids as an energy substrate necessary for the generation of ATP.Future work should investigate the acylcarnitine signature in the newly diagnosed to explore how this molecular class might influence disease progression and treatment outcomes.Additionally, carnitine status should be examined, as carnitine supplementation is widely used and can favorably impact energy metabolism.Further research should also explore whether this carnitine deficit is a result of declining endogenous carnitine synthesis, increased carnitine degradation, or reduced carnitine intake.
Fifth, the present study showed a strong genetic component of individual risk, evidenced by the divergence of polygenic risk scores between the HC and BCS cohorts.A future study in the newly diagnosed will have the ability to associate molecular characteristics with disease emergence and progression.Additionally, deep phenotyping study (using multiomics) in those with genetic risk (PRS), including asymptomatically, may reveal very early metabolic signatures (in those with genetic risk) that can lead to new treatment targets and preventive approaches.What this means is that molecular features like the transcriptome, proteome, metabolome, and microbiome (measured in this study) may reveal early metabolic signatures that influence the genetic risk to become manifest as a clinical disease.
Sixth, this study in BCS has allowed for the refinement of methods in multiomics sample collection, preanalytical processing, analysis, and interpretation.This refinement of multiomic methods in BCS is of notable value that can be deployed in larger studies of the newly diagnosed.A prospective multiomic study of the newly diagnosed can be a very important next step since this level of deep phenotyping has not been conducted in the newly diagnosed as of this writing.This would include assessment at the time of diagnosis and longitudinally during treatment, a study which has not to date been completed in a large cohort.
Seventh, it bears noting that a rudimentary analysis was performed on the time from treatment in this BCS cohort.In short, we looked for potential divergent signals between those who most recently completed treatment and those who were further out from completion of treatment.A limitation of this approach is that we did not have molecular data at the various time points in the time from treatment analysis.Future studies would seek to examine patients during the course of treatment in the newly diagnosed and follow them with multiomics analytics for a select number of years after treatment.This type of analysis would provide multiomic data at a time proximal to the time of diagnosis, during treatment, and post-treatment.
Eighth, the current study included a deep statistical analysis of the gut metagenome.However, due to the extensive amount of metagenomic data, future studies could expand the statistical analysis of the metagenome data from the existing study to examine additional signals.This may further clarify the biological meaning of the BCS cohort findings.
Finally, future investigations should pay careful attention to comorbidities, as the molecular and clinical phenotypes may stratify differently in accordance with these comorbidities.

Figure 1 .
Figure 1.Data map showing classes of data with a number of subjects, detailed metrics assessed within each class, and a number of features measured within each metric.

Figure 1 .
Figure 1.Data map showing classes of data with a number of subjects, detailed metrics assessed within each class, and a number of features measured within each metric.

Figure 2 .
Figure 2. Selected Polygenic Risk Scores.These scores emphasize the significant predisposition for breast cancer risk and the relative homogeneity of the population in regard to other PRSs assessed.BrCA: breast cancer, INS: insulin, BP: blood pressure, BMI: body mass index, LDL: low-density lipoprotein [38].

Figure 2 .
Figure 2. Selected Polygenic Risk Scores.These scores emphasize the significant predisposition for breast cancer risk and the relative homogeneity of the population in regard to other PRSs assessed.BrCA: breast cancer, INS: insulin, BP: blood pressure, BMI: body mass index, LDL: low-density lipoprotein [38].

Figure 4 .
Figure 4. Untargeted Proteomics.(a) PCA of BCS Serum Proteomics (lognormal filtered, by type, SD = 0.5); (b) Univariate and Multivariate Models (q < 0.1): Model A-BCS Status; Model B-BCS status, age, and menopause status; Model C-BCS status, age, menopause status, and treatment type; (c) BCS vs. HC: significantly different proteins (after multiple hypothesis correction; log10 normalized); (d) Volcano plot of Model C by BCS status; (e) Volcano plot of Model C by age; ** and *** Wilcoxon Rank-Sum Test p-value < 0.01 and p-value < 0.001, respectively.Note: Gene names are used for brevity, however, all findings in this figure refer to gene products (proteins).

Figure 4 .
Figure 4. Untargeted Proteomics.(a) PCA of BCS Serum Proteomics (lognormal filtered, by type, SD = 0.5); (b) Univariate and Multivariate Models (q < 0.1): Model A-BCS Status; Model B-BCS status, age, and menopause status; Model C-BCS status, age, menopause status, and treatment type; (c) BCS vs. HC: significantly different proteins (after multiple hypothesis correction; log10 normalized); (d) Volcano plot of Model C by BCS status; (e) Volcano plot of Model C by age; ** and *** Wilcoxon Rank-Sum Test p-value < 0.01 and p-value < 0.001, respectively.Note: Gene names are used for brevity, however, all findings in this figure refer to gene products (proteins).

Figure 5 .
Figure 5. Untargeted Metabolomics.(a) Model B of BCS in plasma, (b) Model C of ChemoTX in urine, (c) VIP scores for enriched metabolites by mass spectrometry model for three sample types (plasma, urine, stool).

Metabolites 2024 ,
14, 396 16 of 37the volcano plots for family, order, class, and phylum revealed separation in a number of features that varied between plots (FigureA5b-e).

Figure 7 .
Figure 7. Gut Microbiome Metagenomics II.(a) genus volcano plot of BCS vs. HC; (b) species volcano plot of BCS vs. HC; (c) significant genus grouped by stage at diagnosis and BrCa subtype; (d) genus corrected for menopause and age.

Table 1 .
Metadata: Description of the Cohort.