Machine Learning Reveals Lipidome Remodeling Dynamics in a Mouse Model of Ovarian Cancer

Ovarian cancer (OC) is one of the deadliest cancers affecting the female reproductive system. It may present few or no symptoms at the early stages and typically nonspecific symptoms at later stages. High-grade serous ovarian cancer (HGSC) is the subtype responsible for most ovarian cancer deaths. However, very little is known about the metabolic course of this disease, particularly in its early stages. In this longitudinal study, we examined the temporal course of serum lipidome changes using a robust HGSC mouse model and machine learning data analysis. Early progression of HGSC was marked by increased levels of phosphatidylcholines and phosphatidylethanolamines. In contrast, later stages featured more diverse lipid alterations, including fatty acids and their derivatives, triglycerides, ceramides, hexosylceramides, sphingomyelins, lysophosphatidylcholines, and phosphatidylinositols. These alterations underscored unique perturbations in cell membrane stability, proliferation, and survival during cancer development and progression, offering potential targets for early detection and prognosis of human ovarian cancer.


■ INTRODUCTION
In the absence of reliable noninvasive diagnostics, ovarian cancer (OC) causes more deaths than any other cancer of the female reproductive system, with 419,085 deaths from 1990 to 2019 in the United States alone. 1 It is the fifth leading cause of cancer-related death in women. 2 Failure of early detection remains the most daunting challenge in OC diagnosis. 3 In the United States, the 5-year survival rate is 93.1% for localized OC, but it drops drastically to only 30.8% for metastatic OC. 4 High-grade serous ovarian cancer (HGSC) is the most frequent subtype, accounting for 70−80% of all OC deaths. 5,6 Early diagnosis is therefore imperative for reducing OC mortality. However, OC often eludes detection until an advanced stage, 6 and the molecular pathogenesis underlying early-stage OC remains poorly understood. To study the biochemical underpinnings of early-stage OC pathogenesis, we conducted in-depth lipidomic analyses in a Dicer1-Pten double-knock-out (DKO) mouse model as a function of time. These mice faithfully recapitulate human HGSC with phenotypic, histopathologic, and molecular similarities 7,8 and exhibit stepwise development and progression of HGSC, beginning with a premalignant phase, tumor initiation, and malignant growth in the primary tissue before advancing to early metastases, widespread metastases, and ultimately death.
It is now widely accepted that cancer is a metabolic disease. 9 As such, metabolomics and lipidomics are central to cancer biology. Metabolomics and lipidomics allow for measuring and identifying small-molecule metabolites or lipids in complex clinical specimens such as serum and tissue samples. 10 Two basic types of metabolomics experiments exist: targeted and nontargeted. 11 These experiments are typically conducted using nuclear magnetic resonance (NMR) spectroscopy and/or mass spectrometry (MS). Nontargeted metabolomics/lipidomics allows for the unbiased detection of thousands of metabolites/lipids, while targeted approaches focus on a known set of target species. For an unbiased discovery investigation of a specific disease, as in this work, nontargeted approaches are typically the first step. Nontargeted workflows generate big data, necessitating mining methods such as machine learning. These methods are a subset of artificial intelligence that involves developing systems that can learn and improve with experience without being explicitly programmed to do so. 12 Combining machine learning with metabolomics and lipidomics is a powerful approach to learn about cancer biology, 13 providing a unique opportunity for the discovery of candidate prognostic and predictive biomarkers.
Multiple studies have attempted to find metabolome or lipidome alterations associated with ovarian cancer in biofluids. 14−18 In the paper by Gaul et al., using serum metabolomics, serous epithelial ovarian cancer (EOC) was discriminated from healthy controls (HC) (HC n = 49, EOC n = 46) using 16 metabolites including numerous lipids. 14 The discrimination achieved 100% accuracy in the cohort studied using support vector machines (SVM). 14 Braicu and coworkers conducted a serum metabolomics study detailing profound lipid metabolism alterations. 15 Serum samples of 147 OC patients were compared with 98 control subjects with benign ovarian tumors and nonneoplastic diseases. Improved predictive values were achieved when cancer antigen 125, the current OC clinical biomarker, was used alongside some lipid species identified in the study. 15 Metabolomics investigations on ovarian cancer mouse models have also been conducted. Jones et al. performed metabolomic serum profiling for the detection of early-stage HGSC in DKO mice, identifying 18 discriminatory metabolites, including lipids in the phosphatidylethanolamine (PE), triglyceride (TG), lysophosphatidylethanolamine (LysoPE), and phosphatidylinositol (PI) classes. 19 Here, we present the first in-depth machine learning longitudinal analysis of the serum lipidome of a DKO HGSC mouse model using a four-pronged approach: (1) unsupervised machine learning methods and univariate statistical analyses to map global lipidome alterations, (2) hierarchical clustering analysis to identify lipidome changes in response to HGSC progression, (3) multiple machine learning algorithms with varying inductive biases to identify time-resolved HGSC evolution, and (4) Kaplan−Meier estimates and restricted mean survival time analyses to find prognostic circulating lipid marker candidates.

■ MATERIALS AND METHODS

Experimental Design
Dicer flox/flox Pten flox/flox Amhr2 cre/+ DKO females and Dicer flox/flox Pten flox/flox control females that do not carry Amhr2 cre/+ were generated, with the genotypes confirmed by PCR amplification of DNA. Mice were housed in the Baylor College of Medicine vivarium in dedicated mouse rooms in microisolator cages. When the animals reached 8 weeks of age, serum samples were collected from mice every 2 weeks until the end of the study or the humane endpoint for sacrifice. When a DKO mouse with an advanced-stage cancer was determined to be severely sick, the mouse was anesthetized for the last blood collection via cardiac puncture and euthanized. The submandibular vein was chosen for the serial blood collection by alternating cheek sides following a valid animal protocol (AN-716). A total of 100−200 μL of the blood sample was collected into a BD serum separator and allowed to clot for 30 min, followed by centrifugation (14,000 rpm for 10 min at room temperature) and serum collection. The collected serum samples were stored at −80°C for further metabolomics analysis. DKO mice were sacrificed for this study in accordance with the animal protocol approved by the institutional animal care and use committee (IACUC) at Baylor College of Medicine. Samples from 15 DKO mice (n = 231) and 15 control mice (n = 238) were used for lipidomics analyses. Prior to data analysis, the time point of each collected sample was converted into a percentage lifetime metric with the following formula:

$$\%\,\text{lifetime} = \frac{\text{age of mouse (weeks)}}{\text{total lifespan of mouse (weeks)}} \times 100$$

The % lifetimes were then binned into five categories: 0−30% (stage I), 30−45% (stage II), 45−60% (stage III), 60−75% (stage IV), and 75−100% (stage V).
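The percentage lifetime conversion and binning can be illustrated with a minimal Python sketch. The function names are hypothetical, and the handling of values that fall exactly on a bin boundary is an assumption, since the text does not state which stage such values belong to.

```python
# Upper edges (in % lifetime) of the five stages listed in the text.
STAGE_EDGES = [(30.0, "I"), (45.0, "II"), (60.0, "III"), (75.0, "IV"), (100.0, "V")]


def percent_lifetime(age_weeks, lifespan_weeks):
    """Age at sampling expressed as a percentage of the animal's total lifespan."""
    if lifespan_weeks <= 0:
        raise ValueError("lifespan must be positive")
    return 100.0 * age_weeks / lifespan_weeks


def lifetime_stage(pct):
    """Map a % lifetime value to stage I-V.

    Assumption: each bin includes its upper edge (exactly 30% is stage I).
    """
    if pct < 0:
        raise ValueError("percentage lifetime cannot be negative")
    for upper, stage in STAGE_EDGES:
        if pct <= upper:
            return stage
    raise ValueError("percentage lifetime cannot exceed 100")


# Hypothetical mouse that lived 40 weeks, sampled every 2 weeks from week 8.
for age in range(8, 41, 2):
    pct = percent_lifetime(age, 40)
    print(age, round(pct, 1), lifetime_stage(pct))
```

Expressing every sample on a 0−100% scale is what allows animals with different lifespans to be compared stage by stage.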

Reagents
Optima liquid chromatography−mass spectrometry (LC−MS)-grade water, 2-propanol, acetonitrile, formic acid (99.5+%), ammonium formate, and ammonium acetate were purchased from Fisher Chemical (Fisher Scientific International, Inc., Pittsburgh, PA) and used to prepare chromatographic mobile phases and solvents for extraction. Isotopically labeled lipid standards (Table S5) were purchased from Avanti Polar Lipids (Alabaster, AL) and used to prepare the lipid internal standard mixture.

Sample Preparation
The lipid extraction solvent was prepared by adding 700 μL of the isotopically labeled lipid standard mixture (Table S5) to 42 mL of 2-propanol. Serum samples were thawed on ice, followed by the extraction of nonpolar metabolites. The extraction procedure was carried out by adding the prepared extraction solvent to 10−25 μL of the serum sample in a 3:1 ratio. Following this step, samples were vortex-mixed for 30 s and centrifuged at 13,000 rpm (20,784 rcf) for 7 min. The supernatant was transferred to LC vials and stored at −80°C until analysis, which was performed within a week. A blank sample, prepared with LC−MS-grade water, underwent the same sample preparation process as the serum samples. A pooled quality control (QC) sample was prepared by combining a 2−5 μL aliquot of supernatant from each serum sample. This QC sample was analyzed every 10 injections to assess LC−MS instrument stability through the course of the experiment. Samples were run in a randomized order on consecutive days.

Ultra-High Performance Liquid Chromatography−Mass Spectrometry (UHPLC−MS) Analysis
Reverse-phase (RP) ultra-high performance liquid chromatography−mass spectrometry (UHPLC−MS) analysis was performed with a Thermo Accucore C30 column (150 × 2.1 mm, 2.6 μm particle size) mounted in a Vanquish LC coupled to an Orbitrap ID-X Tribrid mass spectrometer (ThermoFisher Scientific). The mobile phases and chromatographic gradients used are described in Table S6. MS data were acquired in positive and negative ion modes in the 150−2000 m/z range with a 120,000 mass resolution setting. The most relevant MS parameters are provided in Table S7. Samples were kept at 4°C in the autosampler during LC−MS analysis, while the column temperature was set to 50°C. An injection volume of 2 μL was used for all runs. For lipid annotation, MS/MS experiments were performed using the Thermo Scientific AcquireX data acquisition workflow. Tandem MS data were acquired at a resolution of 30,000 and an isolation window of 0.4 m/z. Precursor ions were fragmented with HCD and CID activation methods. For HCD, stepped normalized collision energies (NCE) of 15, 30, and 45 were used; for CID, a collision energy of 40 was used.

UHPLC−MS Data Processing
Spectral features (described as m/z, retention time pairs) were extracted with Compound Discoverer v3.2 (ThermoFisher Scientific) from the raw files. This procedure included retention time alignment of chromatographic peaks, peak picking, peak area integration, and compound area correction using a QC-based regression curve. A cubic spline regression model was applied for the QC-based regression curve as the data acquired were not linear. 20 The sample blank injection was used to remove background peaks: features with less than five times the peak area of the corresponding features in the sample blank were marked as background signals and removed from the dataset. Additionally, features that were not present in at least 50% of the QC sample injections or had a relative standard deviation (RSD) of more than 30% in the QC injections were removed from the dataset.
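The blank and QC filtering rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a missing QC peak is represented by `None`, and which sample area is compared against the blank (for example a mean or a maximum across samples) is not specified in the text, so a single representative value is taken as input.

```python
from statistics import mean, stdev


def passes_blank_filter(sample_area, blank_area, min_ratio=5.0):
    """Keep a feature whose peak area is at least `min_ratio` times the blank's."""
    return sample_area >= min_ratio * blank_area


def passes_qc_filter(qc_areas, min_presence=0.5, max_rsd=30.0):
    """Check presence and repeatability of one feature across QC injections.

    qc_areas: peak areas over all QC injections; None marks a missing peak.
    """
    detected = [area for area in qc_areas if area is not None]
    # Features seen in fewer than 50% of QC injections are removed.
    if len(detected) / len(qc_areas) < min_presence or len(detected) < 2:
        return False
    # Relative standard deviation in percent; features above 30% are removed.
    rsd = 100.0 * stdev(detected) / mean(detected)
    return rsd <= max_rsd


# Hypothetical feature: stable in the QC injections and well above the blank.
keep = passes_blank_filter(6.0e5, 1.0e5) and passes_qc_filter([100, 102, 98, 101])
print(keep)
```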

Lipid Annotation
Lipid annotation was conducted for selected spectral features detected following filtering. The exact masses and MS/MS spectra of all features were first matched against a curated in-house lipid spectral database, and the proposed annotations were manually inspected. For features of interest that did not have matches in the local database, the generated elemental formulas, exact masses, and MS/MS spectra were matched against databases such as Lipid Maps 21 and mzCloud. 22 A total of 1070 species, which included fatty acids, glycerophospholipids, sphingolipids, and glycerolipids, were successfully annotated with this approach and used for further analysis. The complete dataset of annotated species is available through the Metabolomics Workbench, as described above.

Global Lipidome Analysis
To investigate alterations at the lipidome level, fold changes were computed as the base-two logarithm of the ratio of lipid abundances in DKO mice to those in DKO control mice (log2[DKO/control]). Statistically significant lipids were identified via Welch's t-test (DKO n = 221, DKO control n = 238) followed by a Benjamini−Hochberg correction using the statsmodels library (v. 0.12.2). Eighty-seven lipids with q < 0.05 were identified as significant. These lipid features were log-transformed (log2 X) and autoscaled prior to unsupervised machine learning. Principal component analysis (PCA), kernel PCA (kPCA), and t-distributed stochastic neighbor embedding (t-SNE) were performed with the scikit-learn library (v. 0.24.1). In addition, uniform manifold approximation and projection (UMAP) was performed using the umap library (v. 0.5.1). A two-step pipeline was set up to identify the best hyperparameters for kPCA: a kPCA dimensionality reduction to the first two components was followed by a logistic regression classifier, and GridSearchCV in the scikit-learn library was then used to select the best kernel and gamma value for the algorithm. The gamma value selected was 0.03, while the kernel used was the radial basis function (RBF). For t-SNE, the following hyperparameters were used: perplexity = 4, early exaggeration = 10. Perplexity controls the balance between the local and global structures of the data, while early exaggeration is the factor that increases the attractive forces between data points. Time-resolved lipid changes were computed by comparing the five lifetime stages of DKO and DKO control mice with Welch's t-test. Lipids with p < 0.05 were identified as significant. In addition, overlapping significant features in the time-resolved univariate test were identified using an upset plot library (v. 0.6.0). Significant lipids that appeared in at least three lifetime stages were screened as potential prognostic circulating lipids for ovarian cancer.
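As a rough illustration of the fold change and multiple-testing steps, the sketch below computes a log2 fold change from group means and applies the Benjamini−Hochberg procedure to a list of p-values in plain Python. It is not the statsmodels implementation used in the study; the function names are hypothetical, and taking the ratio of group means is an assumption about how abundances were summarized.

```python
import math


def log2_fold_change(dko_values, control_values):
    """log2 of the ratio of mean abundances (DKO over control)."""
    mean_dko = sum(dko_values) / len(dko_values)
    mean_control = sum(control_values) / len(control_values)
    return math.log2(mean_dko / mean_control)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four lipid features.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```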

Lipidome Longitudinal Analysis
Fold changes, as described above, were computed for the 87 lipids with q < 0.05, and hierarchical clustering analysis (HCA) was then used to identify clusters of lipidomic trajectories from those fold changes. Each row of the dataset holds the fold change values over the five lifetime stages for a given lipid feature. The goal of this analysis was to cluster lipids that have a similar trend over time. HCA was performed using the SciPy library (v. 1.6.2). The distance hyperparameter, that is, the distance between two observations (lipids), was the correlation metric, which is defined as follows:

$$d(x, y) = 1 - \frac{(x - \bar{x}) \cdot (y - \bar{y})}{\lVert x - \bar{x} \rVert_{2}\, \lVert y - \bar{y} \rVert_{2}}$$

where x and y are two lipid features and $\bar{x}$ and $\bar{y}$ are the means of their elements. The second hyperparameter, the linkage hyperparameter, is the measure of the distance between two clusters to be merged. Complete linkage was used; this method computes the maximum distance between any single data point in the first cluster and any single data point in the second cluster, which is defined as follows:

$$D(X, Y) = \max_{x \in X,\; y \in Y} d(x, y)$$

where d(x, y) is the distance between lipids x ∈ X and y ∈ Y, and X and Y are two lipid clusters. The algorithm then fuses the clusters that have the shortest distance between each other. Four lipid clusters were identified to have biologically meaningful trends over time. The longitudinal lipid changes of the four lipid clusters were visualized using the HoloViews Python library (v. 1.14.6). The correlation network graphs of the four clusters were plotted using Plotly (v. 5.3.1) and NetworkX (v. 2.5). Lipids with r ≥ 0.5 (Pearson's correlation coefficient) are displayed with a link on the network graphs.
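The two clustering hyperparameters can be made concrete with a small plain-Python sketch. It assumes the correlation distance follows the usual definition (one minus the Pearson correlation, as in SciPy's `correlation` metric); the function names and trajectories are hypothetical and the code is not the SciPy routine used in the study.

```python
import math


def correlation_distance(x, y):
    """1 - Pearson correlation between two fold-change trajectories.

    Note: undefined (division by zero) for a perfectly flat trajectory.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(cx, cy))
    denominator = math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy))
    return 1.0 - numerator / denominator


def complete_linkage(cluster_x, cluster_y):
    """Largest pairwise distance between members of two clusters."""
    return max(correlation_distance(x, y) for x in cluster_x for y in cluster_y)


# Hypothetical log2 fold changes over the five lifetime stages.
rising = [0.1, 0.2, 0.3, 0.4, 0.5]
rising_steeper = [0.2, 0.4, 0.6, 0.8, 1.0]
falling = [0.5, 0.4, 0.3, 0.2, 0.1]
print(correlation_distance(rising, rising_steeper))
print(complete_linkage([rising], [rising_steeper, falling]))
```

Because the correlation distance ignores magnitude, lipids with the same temporal shape cluster together even when their fold changes differ in size, which matches the stated goal of grouping lipids by trend over time.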

Machine Learning Classification Methods
Feature Selection. For each lifetime stage, only lipid features with p-values < 0.05 (Welch's t-test) were retained. Furthermore, one feature was retained for every two highly correlated lipid features (Pearson's correlation, r > 0.8). Samples were divided into a training set (70% of total samples) and a test set (30% of total samples). Lipid features were selected by fitting the training datasets with a meta-transformer for selecting features based on importance weights. In this case, random forests were used, and features were ranked via their Gini index feature importance score. The features with a Gini index greater than or equal to the mean of all Gini indices were the final lipid features selected for classification purposes. The number of trees used for the random forest classifiers was one hundred, and all samples were autoscaled prior to feature selection with random forests. Feature selection was carried out with the SelectFromModel function and the random forest classifier in the scikit-learn library (v. 0.24.1).
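The mean-importance cutoff described above amounts to a simple thresholding rule, sketched here in plain Python. The importance scores and lipid names are hypothetical placeholders; in the study the scores come from a fitted random forest, and this sketch only mirrors the thresholding step, not the scikit-learn transformer itself.

```python
def select_by_mean_importance(importances):
    """Keep features whose importance is at least the mean importance.

    importances: dict mapping feature name -> importance score.
    """
    threshold = sum(importances.values()) / len(importances)
    return [name for name, score in importances.items() if score >= threshold]


# Hypothetical Gini importances for three lipid features.
scores = {"lipid_a": 0.5, "lipid_b": 0.1, "lipid_c": 0.4}
print(select_by_mean_importance(scores))
```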
Machine Learning (ML) Algorithms. Classification tasks were performed by training machine learning models to discriminate DKO from DKO control mice using the features selected as described above. The machine learning algorithms used included logistic regression, random forest, k-nearest-neighbors, support vector machines, and a voting ensemble classifier. The default parameters of Python's scikit-learn machine learning library (v. 0.24.1) were used. As indicated above, 70% of samples were used for training purposes, with a 5-fold cross-validation method, while the remaining 30% were used as the test set. The classifiers were evaluated using the area under the receiver operating characteristic curve (ROC AUC) metric. The ROC is a probability curve that plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold values. Because it summarizes performance over all thresholds, it is a useful metric for unbalanced datasets in particular. To further validate the best-performing classifier models, a permutation test was used to generate a null distribution by evaluating the classifier's accuracy on 1000 distinct permutations of the dataset. In these permutations, feature values remain unchanged, while label assignments are varied. This null distribution represents the hypothesis that there is no relationship between the features and labels. Subsequently, an empirical p-value is computed as the proportion of permutations in which the accuracy score surpasses the accuracy obtained with the original, entire dataset. Five-fold cross-validation was used for this test, and all other default parameters of the scikit-learn (v. 0.24.1) permutation_test_score function were used.
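To make the two evaluation ideas concrete, the sketch below computes a ROC AUC from pairwise score comparisons and an empirical p-value from a set of permuted scores. Both functions are hypothetical helpers written in plain Python rather than the scikit-learn routines used in the study. The p-value uses a "greater than or equal" comparison with a +1 correction in numerator and denominator, which is the convention documented for scikit-learn's permutation_test_score; whether ties were handled this way in the study is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for DKO, 0 for control; ties between scores count as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def empirical_p_value(observed_score, permuted_scores):
    """Share of label permutations scoring at least as well as the real labels."""
    at_least_as_good = sum(1 for s in permuted_scores if s >= observed_score)
    return (at_least_as_good + 1) / (len(permuted_scores) + 1)


# Hypothetical example: 1000 permutations, one of which beats the observed score.
null_scores = [0.5] * 999 + [0.95]
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(empirical_p_value(0.9, null_scores))
```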
Logistic Regression. Logistic regression is a regression algorithm used for classification purposes, in this case, binary classification (DKO vs DKO control mice). It is an extension of linear regression, as it computes a weighted sum of input features plus a bias term. However, instead of outputting that numeric value directly as in linear regression, the value is passed through a sigmoid function that returns an estimated probability (p̂) between 0 and 1. In our case, samples with p̂ < 0.5 were classified as control animals, while samples with p̂ ≥ 0.5 were classified as DKO animals.
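The decision rule just described can be written out in a few lines. This is a minimal sketch with hypothetical weights and feature values, not the fitted scikit-learn model; it only shows how a weighted sum is mapped to a probability and then to a class label.

```python
import math


def predict_probability(features, weights, bias):
    """Sigmoid of the weighted feature sum plus bias."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(features, weights, bias, threshold=0.5):
    """Label a sample as DKO when the estimated probability reaches the threshold."""
    p_hat = predict_probability(features, weights, bias)
    return "DKO" if p_hat >= threshold else "control"


# Hypothetical autoscaled abundances of two lipids and hypothetical coefficients.
print(classify([1.2, -0.3], weights=[0.8, 0.5], bias=-0.1))
```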
Random Forest Classification. Random forests are an ensemble of decision trees. A decision tree takes the form of an inverted tree, starting with a root node at the top, with each node split by lipid features into internal nodes, culminating in the leaf nodes. While lipid features split each node, the leaf nodes give the final classification of either DKO or DKO control mice. Decision trees are assembled to form the random forest via bootstrap aggregation, which reduces prediction variance by random sampling of training samples with replacement. The algorithm also introduces additional randomness during tree construction by using a random subset of features to search for the best feature to split each node, resulting in greater tree diversity. For this work, the number of trees in the forest was one hundred, and the quality of a node split was measured by the Gini impurity.
Support Vector Machines. The goal of support vector machines (SVM) is to identify a separating hyperplane $b + w^{T}x = 0$ that discriminates two classes of samples with the widest possible margin, where w is the weight (coefficient) vector, b is the bias term, and x is the feature vector. This goal is accomplished by learning the w and b terms during training with the following equation:

$$\min_{w,\, b,\, \xi}\ \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \xi_{i} \quad \text{subject to} \quad y_{i}\,(b + w^{T} x_{i}) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0$$

where $y_{i} \in \{-1, +1\}$ is the class label of training sample $x_{i}$, n is the number of training samples, C is a regularization parameter that penalizes or accommodates ξ, and ξ is the slack variable that allows for a soft margin classification, allowing some training data to fall within the SVM margin. Therefore, the goal is to minimize the weights and slack variables, subject to a correct prediction while accommodating the slack variables. In this work, C was set to 1. A kernelized SVM was used to transform datasets that are not linearly separable to a higher-dimensional space, where they may be linearly separable. The kernel used in this work is the radial basis function kernel, defined as

$$K(p, q) = \exp\left(-\gamma\, \lVert p - q \rVert^{2}\right)$$

where p and q represent data points and γ is the kernel coefficient. After training, given a test sample x, its prediction score can be obtained as OV score = $b + w^{T}x$. If the ovarian cancer (OV) score is ≤ 0, the sample is classified as a control mouse; otherwise, it is classified as a DKO mouse.
k-Nearest-Neighbors (k-NN). k-NN is a nonparametric supervised learning algorithm using an instance-based learning method. It simply stores training data instances and assigns a class based on the majority class of the k nearest neighbors. The number of neighbors selected was five in this work, and a uniform weight function was used. That is, all points in each neighborhood were weighted equally.
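A uniform-weight k-NN vote with five neighbors can be sketched as follows. The Euclidean distance is an assumption (it is the scikit-learn default, but the text does not name the metric), and the training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training samples closest to `query`.

    Uniform weights: every neighbor contributes one vote.
    """
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-lipid feature vectors for three control and four DKO samples.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
y = ["control", "control", "control", "DKO", "DKO", "DKO", "DKO"]
print(knn_predict(X, y, [5.5, 5.5]))
```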
Voting Classifier. Because we selected machine learning models with different inductive biases, we explored an ensemble method voting classifier. The estimators for the voting classifier include all of the ML models described previously: logistic regression, random forests, SVM, and k-NN. In addition, soft voting was performed using average predicted probabilities to predict class labels.
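Soft voting reduces to averaging the class probabilities returned by the individual models, as in this minimal sketch. The probabilities are hypothetical, and resolving an exact 0.5 average in favor of DKO is an assumption made for the illustration.

```python
def soft_vote(dko_probabilities, threshold=0.5):
    """Average each model's predicted P(DKO) for one sample and assign a label."""
    mean_probability = sum(dko_probabilities) / len(dko_probabilities)
    label = "DKO" if mean_probability >= threshold else "control"
    return mean_probability, label


# Hypothetical P(DKO) from logistic regression, random forest, SVM, and k-NN.
print(soft_vote([0.9, 0.6, 0.4, 0.3]))
```

Averaging probabilities lets a confident model outweigh several uncertain ones, which is the practical difference from a hard majority vote.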

Prognostic Lipid Discovery Methods and Survival Analysis
Feature selection was performed by a lifetime stage-resolved volcano plot analysis. This involves plotting the −log10 p-value (Welch's t-test, DKO lifetime stages II−V vs DKO stage I) against the log2 FC (fold change, DKO lifetime stages II−V vs DKO stage I). Lipid features with p-values < 0.05 and an absolute log2 FC of at least 1 for each comparison pair were identified as significant. Volcano plot analysis was performed using the Bioinfokit library (v. 2.0.8). Overlapping significant features in the DKO volcano plot analysis were identified using an upset plot via the Upset python library (v. 0.6.0). Lipids that were significant in at least three of the four DKO lifetime stage comparisons were screened as potential prognostic circulating lipids for ovarian cancer. In addition, lipids that were significant in at least three of the lifetime stage comparisons of DKO vs control mice were also screened.
The selected lipids were used to split the DKO samples into two groups using the median split method. For the last serum collection before death or the end of the study, the DKO samples with a relative abundance of the lipid less than or equal to the median were designated the "low metabolite level" group. In contrast, the DKO samples with a relative abundance greater than the median were designated the "high metabolite level" group. Furthermore, the survival function S(t) = P(T > t), which is the probability that a mouse survives longer than some specified time t, was computed using the Kaplan−Meier (KM) estimate:

$$\hat{S}(t) = \prod_{i:\, t_{i} \le t} \left(1 - \frac{d_{i}}{n_{i}}\right)$$

where $d_{i}$ is the number of death events at time $t_{i}$, while $n_{i}$ is the number of mice at risk of death just prior to time $t_{i}$. The log-rank test (p < 0.05) was used to determine if the differences between KM curves were statistically significant. In addition, the restricted mean survival time (RMST) is defined as

$$\mathrm{RMST}(\tau) = \int_{0}^{\tau} S(t)\, dt$$

where τ is the time horizon up to which the survival curve is integrated. This metric was used to compare two survival curves by measuring the area under the survival curve, which is a measure of "time lost". Kaplan−Meier estimates and the RMST were also used to compute and compare the survival curves of DKO vs control mice. Finally, the hazard curves were computed using the Nelson−Aalen estimate, and all survival analysis methods in this work were performed using the Python lifelines library (v. 0.26.3).
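The volcano thresholds and the "at least three comparisons" screen can be expressed as two small helper functions. This sketch is illustrative only: the function and lipid names are hypothetical, and reading "at least one log2 FC" as an absolute log2 fold change of 1 or more is an assumption.

```python
from collections import Counter


def is_significant(p_value, log2_fc, p_cut=0.05, fc_cut=1.0):
    """Volcano-plot rule: small p-value and a large fold change in either direction."""
    return p_value < p_cut and abs(log2_fc) >= fc_cut


def recurrent_hits(hits_by_comparison, min_comparisons=3):
    """Lipids that are significant in at least `min_comparisons` comparisons."""
    counts = Counter(
        lipid for hits in hits_by_comparison.values() for lipid in set(hits)
    )
    return sorted(lipid for lipid, n in counts.items() if n >= min_comparisons)


# Hypothetical significant lipids for the four stage comparisons.
hits = {
    "I vs II": ["lipid_a"],
    "I vs III": ["lipid_a", "lipid_b"],
    "I vs IV": ["lipid_a", "lipid_b", "lipid_c"],
    "I vs V": ["lipid_b", "lipid_c"],
}
print(recurrent_hits(hits))
```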
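For readers who want to see the survival calculations spelled out, the sketch below implements a Kaplan−Meier estimate and the area under its step curve (RMST) in plain Python. It is an illustration with hypothetical survival times, not the lifelines code used in the study; the time horizon `tau` is a parameter introduced here for the example.

```python
def kaplan_meier(durations, events):
    """Return [(t_i, S(t_i))] at each distinct death time.

    durations: time to death or censoring (weeks); events: True if death observed.
    """
    curve = []
    survival = 1.0
    death_times = sorted({d for d, e in zip(durations, events) if e})
    for t in death_times:
        at_risk = sum(1 for d in durations if d >= t)  # alive just before t
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def rmst(curve, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Hypothetical "low" and "high" lipid groups after a median split.
low = kaplan_meier([30, 36, 40, 44], [True, True, True, False])
high = kaplan_meier([20, 24, 28, 30], [True, True, True, True])
print(rmst(low, 46) - rmst(high, 46))
```

The difference between the two RMST values plays the role of the ΔRMST used to rank prognostic lipids.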

Statistical Analysis
Computational analysis was carried out as indicated in the respective sections above using the Python 3.8.8 programming language. NumPy (v. 1.20.1) was used for numerical computations, the Pandas (v. 1.2.4) library was used for data handling and manipulation, Matplotlib (v.

■ RESULTS

Research Design and Computational Pipeline
To study HGSC development and progression, we employed DKO mice (Dicer1 flox/flox Pten flox/flox Amhr2 cre/+) and DKO control mice (Dicer1 flox/flox Pten flox/flox Amhr2 +/+) with high-density blood sampling (Figure 1a). Fifteen mice in each group were used for analysis. Starting from the 2-month mark, blood samples were collected every 2 weeks until humane sacrifice of the animals or the end of the study at 46 weeks. This longitudinal design resulted in 221 and 238 blood samples collected for DKO and DKO control mice, respectively. As expected, DKO mice had a shorter lifespan than DKO control mice, as shown by Kaplan−Meier (Figure S1a) and Nelson−Aalen (Figure S1b) estimate curves. Furthermore, the restricted mean survival time difference (ΔRMST) between DKO and DKO control mice was about 3 weeks (Figure S1c). Given the time-course data misalignment, each time point was converted to a "percentage lifetime" variable to align the dataset (Figure 1b). The percentage lifetime was computed as the age of each mouse in weeks divided by the total lifespan of the mouse (its age at the last time point of blood collection), expressed as a percentage (see the Materials and Methods section). Percent lifetimes were binned into five stages, which we named the "lifetime stages": 0−30% (stage I), 30−45% (stage II), 45−60% (stage III), 60−75% (stage IV), and 75−100% (stage V) […]
To investigate global differences between DKO and DKO control mice, the 87 significant lipids were used to conduct unsupervised learning for all combined time points. PCA (Figure S2a) […] (Figure 2c). There was a progressive increase in the number of significantly altered lipids as HGSC advanced, except for the observed decrease from lifetime stage II to III. This overall temporal trend seems to mimic HGSC evolution in humans, where the disease evolves from an asymptomatic early stage with only minimal metabolic changes to being more easily detectable at more advanced stages, where profound metabolic changes are expected.
A breakdown of the significant lipids common across stages is presented in the upset plot in Figure 2d. A total of 71.4% of the lipids were unique to lifetime stage I, 48.8% to stage II, 46.4% to stage III, 44.8% to stage IV, and 68.1% to stage V. Furthermore, a total of 19 serum lipids were found to be significantly altered in at least three of the five lifetime stages (Table S2). Of these, 68.4% were PC or PC-O, making these the most upregulated lipid classes based on univariate time-resolved analysis.

Lipidome Alterations in Response to Ovarian Cancer Progression
Taking advantage of the granularity of our longitudinal RP UHPLC−MS dataset, we investigated lipidome changes associated with OC progression by identifying lipid trajectory clusters and calculating pairwise correlations between lipids in each cluster (Figure 3 and Table 1). The dataset consisting of 87 significant lipids (Welch's t-test, BH q-value < 0.05, DKO vs DKO control mice) was used for this analysis. To study the temporal evolution of these lipid alterations, time-resolved average lipid abundances in DKO and DKO control mice were computed. Using fold changes between the average lipid abundances (log2[DKO/control]), hierarchical clustering was used to identify four main lipid trajectory clusters (A−D). In cluster A, the lipid fold changes increased in DKO mice from lifetime stage I to II, decreased from II to III, and then spiked back up in V. Similar temporal trends were observed for cluster B lipids. However, in cluster C, lipids increased from lifetime stage I to II, decreased from II to III, and increased again from III to IV, followed by a mostly slight downward trend from lifetime stage IV to V. Finally, cluster D lipids had a relatively mild temporal change from lifetime stage I to IV, with a sharp increase from IV to V (Figure 3a,b). A correlation network graph for these clusters is presented in Figure 3c, showing the connectivity of related and same lipid classes. A common characteristic of clusters A−C was an increase of the specific lipids in DKO mice from lifetime stage I to II, followed by a decrease from stage II to III. These clusters were mostly composed of ether-linked and ester phospholipids such as PC,

Time-Resolved Machine Learning Discriminates Tumor Stages of HGSC in DKO Mice
We subsequently employed in-depth machine learning (ML) to further characterize the five lifetime stages. The feature selection strategy in the ML computational pipeline (Figure 4a) led to the selection of five lipid features for lifetime stage I, 25 for lifetime stage II, 18 for lifetime stage III, 24 for lifetime stage IV, and 42 for lifetime stage V (Table S3). After feature selection, five ML algorithms, including logistic regression, random forests (RF), k-nearest-neighbors (k-NN), support vector machine (SVM), and a voting classifier composed of the four prior ML algorithms, were used to discriminate DKO from DKO control mice within each of the lifetime stages (Figure 4a). ML algorithms were trained under 5-fold cross-validation conditions, while a separate test set was used for testing purposes. Detailed ML prediction results are presented in […] (Figure 4b,g). For lifetime stage II (training set n = 60, test set n = 26), RF gave the highest ROC-AUC test set score of 0.70 (Figure 4c,g). For lifetime stage III (training set n = 59, test set n = 26), logistic regression and the voting classifier had the highest ROC-AUC test set score of 0.85 (Figure 4d,g). For lifetime stage IV (training set n = 60, test set n = 26), RF gave the highest ROC-AUC test set score of 0.66 (Figure 4e,g), and finally, for lifetime stage V (training set n = 98, test set n = 42), SVM gave the highest score of 0.75 (Figure 4f,g).
In addition, a different model validation procedure was conducted via permutation tests with the entire dataset under 5-fold cross-validation conditions. The results are presented in Figure S3. […] (Figure 5c and Table S3). Other selected lipid markers for lifetime stages II, IV, and V are shown in Figure 5b,d,e and Table S3. A summary of the lipid categories represented in each of the ML discriminant panels is given in Figure 5f. Phospholipids were the most represented category in all five lipid discriminant panels. Of all of the phospholipid classes, PC and PC-O were the most abundant species. The least represented lipid category was steroid lipids, with just one cholesterol derivative selected in the lifetime stage V (75−100%) panel. Furthermore, of all of the lipids selected as markers, only phospholipids and fatty acyls (composed mostly of fatty acids) were selected in all of the lifetime stages. In summary, the early progression of OC was marked by increased levels of phospholipids, notably PC and PC-O, while later stages were marked by more diverse lipid alterations, including sphingolipids, fatty acyls, glycerolipids, steroid lipids, and phospholipids. Apart from phospholipids, sphingolipids were the most represented lipid category at stages IV and V, consisting mostly of HexCer, Cer, and SM (Figure 5f). These results agree with the lipid trajectory clustering results discussed earlier.

Prognostic Circulating Lipids in DKO Mice
Because prognostic markers are useful in providing information on the likely health outcome of cancer patients, we employed survival analysis methods to investigate lipid species predictive of the course of OC in DKO mice. First, candidate lipids were selected by comparing all 1070 lipid features in DKO lifetime stages II−V with DKO lifetime stage I. Lipid features with p-values < 0.05 (Welch's t-test) and an absolute log2 fold change of at least 1 (log2 FC, DKO lifetime stages II−V vs DKO stage I) were selected, resulting in a set of 10 different lipids in DKO lifetime stages I vs II (Figure 6a), 56 in I vs III (Figure 6b), 68 in I vs IV (Figure 6c), and 29 in I vs V (Figure 6d). A breakdown of overlapping and unique lipid features in these subsets is given in the upset plot in Figure 6e. A total of 12 lipids were present in at least three sets from the various lifetime pair comparisons. These lipids were selected as prognostic candidates (Figure 6e). Furthermore, the 19 lipid features found to be differential in at least three of the five lifetime stages (Figure 2d) were also selected as candidate prognostic lipids. All 15 DKO animals were binned into two groups based on a median split for each of the 31 candidate prognostic lipids. A DKO "low" group was built from mice with lipid abundances lower than or equal to the median of the relative abundances of the selected lipid, while mice with abundances greater than the median were placed in a DKO "high" group. Three lipid species of the 31 lipid candidates had a statistically significant difference in their Kaplan−Meier (KM) curves via the log-rank test. These included PC(39:4) (p-value = 0.003, Figure 6f), PC(37:2) (p-value = 0.02, Figure 6g), and PC(40:7) (p-value = 0.008, Figure 6h). Of the three prognostic lipids, PC(39:4) had the strongest prognostic effect with a ΔRMST of 10.96, followed by PC(40:7) (ΔRMST = 9.35) and then PC(37:2) (ΔRMST = 7.75) (Figure S4).
All of the prognostic circulating lipids had elevated levels in DKO mice compared to DKO control mice for all time points combined (Figure 6h).
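The median split, log-rank comparison, and ΔRMST calculation described above can be sketched for a single candidate lipid as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abundances, survival times, and the truncation time `tau` are hypothetical, every animal is assumed to have an observed death (no censoring), and the function names are introduced here only for illustration. The log-rank p-value uses the chi-square distribution with one degree of freedom.

```python
import math
from statistics import median


def median_split(abundance):
    """Label each animal 'low' (<= median abundance) or 'high' (> median)."""
    cut = median(abundance)
    return ["low" if a <= cut else "high" for a in abundance]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    ref = sorted(set(groups))[0]  # reference group used for observed - expected
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), at_risk.count(ref)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if e and ti == t and g == ref)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    surv, area, prev_t = 1.0, 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e and ti <= tau}):
        area += surv * (t - prev_t)
        n = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        surv *= 1 - d / n  # Kaplan-Meier step at each event time
        prev_t = t
    return area + surv * (tau - prev_t)


# Hypothetical relative abundances of one lipid and survival times (weeks).
abundance = [0.8, 1.1, 0.9, 1.0, 2.1, 2.4, 1.9, 2.2]
survival = [30, 32, 35, 36, 20, 22, 25, 26]
died = [1] * len(survival)  # assumption: no censored animals

groups = median_split(abundance)
chi2, p_value = logrank_test(survival, died, groups)

tau = max(survival)  # assumption: truncate at the longest observed survival
low_t = [t for t, g in zip(survival, groups) if g == "low"]
high_t = [t for t, g in zip(survival, groups) if g == "high"]
delta_rmst = rmst(low_t, [1] * len(low_t), tau) - rmst(high_t, [1] * len(high_t), tau)
print(f"log-rank p = {p_value:.4f}, delta RMST = {delta_rmst:.2f} weeks")
```

In a full analysis, this procedure would be repeated for each candidate lipid, and ΔRMST would serve as an effect size that complements the log-rank p-value.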

■ DISCUSSION
Given that most metabolomic cancer studies are based on a snapshot of the metabolic process, 14−18 it is not surprising that an understanding of the metabolic pathogenesis of HGSC remains elusive. In this study, we performed nontargeted serum lipidomics of DKO mice, an ovarian HGSC mouse model, and examined the temporal interplay of serum lipids in ovarian HGSC progression. Ovarian HGSC originates in the fallopian tube, where fallopian tube epithelial (FTE) cells may be transformed into serous tubal intraepithelial carcinoma (STIC) lesions. STIC lesions metastasize to the ovary and then to the omentum. 23 The omentum, an extensive network of adipose tissue, provides a secondary metastasis hub, 24,25 further underscoring the importance of investigating ovarian HGSC pathogenesis through lipidome alterations. Reassuringly, our study identified altered lipids similar to those reported in a previous study at a fixed time point, 19 validating the experimental approach applied here. As expected, and given the pathogenesis of HGSC, 23 significant lipid alterations were evident from the data analysis performed when all time points were combined. The most altered lipid classes at a global level included sphingolipids and phospholipids, with the general trend showing that the number of significant lipids for each lifetime stage increased as ovarian HGSC progressed. PC and PC-O were the most perturbed lipid classes, in line with perturbations shown in previous metabolomic studies. 26

Phospholipids
Phospholipids, specifically ether and ester phospholipids, are by far the predominant lipid classes present in clusters A−C of the temporal trend analyses conducted in this study, with PC and PC-O being the key lipid families. This finding is not surprising, as PC comprises approximately 40−50% of total cellular phospholipids. 27 Furthermore, cancer cells require increased generation and maintenance of cellular membranes, largely composed of phospholipids. 28 Iorio et al. reported the activation of phosphatidylcholine-cycle enzymes in human epithelial ovarian cancer (EOC) cells. 29 In that study, the authors reported increased phosphocholine (PCho) levels and upregulation of choline kinase (ChoK)-mediated phosphorylation, providing a plausible explanation for the observed increase in PC levels, particularly for the progression from lifetime stage I to II in clusters A−C. These data strongly suggest upregulation of the Kennedy pathway, 30 with a predominance of PC generation. Altered PC levels in ovarian cancer have been previously reported in human studies 31 and in an ovarian cancer mouse model. 26 This temporal trend for phospholipids agrees with the discriminant lipids selected for DKO classification tasks for all lifetime stages (Figure 5f). PCs and PC-Os comprise most of the lipids selected for classification within lifetime stage II. In addition, phospholipids have the highest percentage of discriminant lipids at all lifetime stages, and three PC species (PC(39:4), PC(37:2), and PC(40:7)) were identified as potential prognostic circulating lipids. Of all discriminant lipids identified, most phospholipid species increased, while a few decreased, such as LPE and LPC. LPC perturbations have been reported in an ovarian cancer human study, 31 and LPE species have been suggested as early-stage ovarian cancer biomarkers in another human study. 14 In a study of the triple-knock-out (TKO) HGSC mouse model, LPE and LPC were likewise altered. 26
In our study, LPE(18:1), LPC(20:4/0:0), and LPC(20:5/0:0) were selected as discriminant lipids for lifetime stage V, with decreased levels in DKO mice. LPC and LPE are formed in the first step of the Lands cycle, the biochemical pathway involved in the remodeling of PC and PE. 32 LPC and LPE are mainly derived from partial hydrolysis of PC and PE, respectively, via phospholipase A1 and A2 (PLA1 and PLA2). 33 Decreased relative abundances of these lipid classes at lifetime stage V can be explained by the sustained upregulation of PC and PE. Indeed, longitudinal lipidome analysis of the TKO mouse model showed that most LPC species were lower in abundance and most PC species much higher in HGSC. 26 Furthermore, in a large-scale profiling study of metabolic dysregulation in human ovarian cancer, LPC and LPE were reported to be elevated in localized epithelial ovarian cancer (EOC) and downregulated in metastatic EOC. 34 These results align with our findings for LPE and LPC at lifetime stage V.
Another class of phospholipids that emerged as important was the phosphatidylinositols (PI). These lipids are the central actors in the PI and PIP2 cycles, underpinning several mammalian cell signaling pathways. 35 There, PI is converted into phosphatidylinositol-4-phosphate (PI4P), which is further converted into phosphatidylinositol-4,5-bisphosphate (PIP2) via various phosphokinases. PIP2, in turn, is a component of the phosphatidylinositol 3-kinase (PI3K) pathway that has been extensively implicated in cancer. 36 PI3Ks are lipid kinases that phosphorylate PIP2 at the 3-OH inositol group to yield phosphatidylinositol 3,4,5-trisphosphate (PIP3). PIP3 activates the serine/threonine protein kinase AKT, which plays a key role in carcinogenesis. 36 The perturbation of PI levels in HGSC can be rationalized by increased PI3K activity, due to the increased copy numbers of the p110α catalytic subunit of the enzyme in ovarian cancer. 37 This altered signaling pathway has been linked to cell proliferation, 38 glucose metabolism, 39 and various types of oncogenic transformations. 40 In addition, alteration of PI levels has been reported in a DKO lipidomic study 19 and proposed as a potential trait of early-stage OC in humans. 14

Sphingolipids
Cluster D in the hierarchical clustering temporal analysis results (Figure 3) consists mainly of ceramides (Cer) and hexosylceramides (HexCer) with a characteristic abundance spike from lifetime stage IV to V (i.e., toward the end of the animal's life cycle). Ceramides are essential intermediates in sphingolipid metabolism, acting as substrates for more complex sphingolipids or degradation products. For example, HexCer and sphingomyelins (SM) are derived from Cer, while SM and HexCer can be degraded to Cer by sphingomyelinases (SMase) and cerebrosidases, respectively. Altered sphingolipid metabolism has been implicated in leukemia, 41 hepatocellular, 42 colorectal, 43 and ovarian cancers. 44 Long-chain ceramides have been identified as possible diagnostic biomarkers of human epithelial ovarian cancer. 44 Sphingolipid metabolism has also been implicated in regulating autophagy. 45 Autophagy's primary role is to regulate cellular homeostasis by removing damaged organelles and aggregated proteins; however, under high-stress conditions, such as nutrient starvation, autophagy contributes to maintaining cellular functions by supplying energy to the cell. 46 As such, in the early cancer stages, autophagy possesses an anti-carcinogenic function by attempting to maintain normal cellular operations. 46 On the other hand, at the late stages of cancer development, autophagy confers tumor cell survival functions to counteract metabolic stress, 47 directly explaining the temporal trends of lipids in cluster D. The role of autophagy in cancer can therefore be said to be paradoxical. Furthermore, ceramide glycosyltransferases, an enzyme class that catalyzes the formation of hexosylceramides, have been implicated in tumor progression. 48 Overexpression of uridine diphosphate-glucose ceramide glucosyltransferase (UGCG), the gene involved in the synthesis of glucosylceramide, has also been reported in ovarian cancer cells. 48
The highest abundance increase for a discriminant lipid was for HexCer(d34:1) in lifetime stage V. Finally, six SM species were selected in the lifetime stage V classification task, all having low relative abundances in DKO mice vs DKO controls. In contrast, cluster D lipids showed overwhelmingly increased levels of Cer and HexCer at the late stages. This metabolic trend suggests a conversion of SM to Cer via SMase to sustain the continued proliferative effects of Cer in tumor cells.

Fatty Acids, Triglycerides, and Other Derivatives
Cancer cells can shunt energy from glucose into fatty acid synthesis, 49 and the resulting metabolic rearrangements are pivotal in cell signaling and tumor growth. 50 The observed alterations in fatty acid abundances at every lifetime stage examined are a result of this metabolic shift. Enzymes associated with lipid synthesis, such as acetyl-CoA carboxylase (ACC) and ATP-citrate lyase (ACL), are overexpressed and involved in tumorigenesis in various tumor cell types. 51−53 Fatty acid synthase (FAS), a multienzyme protein whose main role is to synthesize palmitate from acetyl-CoA and malonyl-CoA, has also been found to be upregulated in ovarian cancer tissues and associated with poor disease prognosis. 54 Furthermore, stearoyl-CoA desaturase-1 (SCD1), the enzyme that catalyzes the production of monounsaturated fatty acids from saturated fatty acids, is upregulated in ovarian cancer stem cells. 55 Exogenous fatty acid metabolism also plays a role in ovarian cancer development. 49 For instance, fatty acid binding protein 4 (FABP4) has been identified at the interface of adipocytes and ovarian tumor cells in omental metastases. 56 Furthermore, CD36, a member of the fatty acid transport proteins (FATP), a transmembrane transport protein that allows long-chain fatty acids into the cells, has also been implicated in breast cancer progression and metastasis. 57 Our ML algorithm selected FA species as discriminant across all lifetime stages. Five of these were decreased in DKO mice relative to controls. These species included 3-hydroxyphenyl-valerate, FA(26:1), and FA(18:3). Changes in FA levels during tumor development most likely indicate the interplay between FA synthesis and FA cell uptake, concomitant with FA metabolism associated with the synthesis of complex lipids.
Estrogens, whose significant roles in the development and metastasis of ovarian cancer are well-documented, 58 have been linked to increased levels of TG in mice 59 and humans. 60,61 This provides a biological link between estrogens and TG in ovarian cancer pathogenesis. Furthermore, in a metabolic study involving over a hundred thousand subjects and a 10-year follow-up period, serum TG were shown to positively correlate with gynecological (ovarian, endometrial, cervical) cancer risk. 62 In our study, TG(60:12) was selected as one of the cluster A lipids, with levels spiking from lifetime stage I to II, decreasing from II to III, and then increasing in stages IV and V. In addition, two triglycerides, TG(56:9) and TG(58:9), belong to cluster D lipids, which have a characteristic spike from lifetime stage IV to V. For ML classification tasks, most TG played a discriminatory role in lifetime stage V, with eight out of nine having higher relative abundance in DKO mice. A serum metabolomics study comparing DKO mice with controls also found a triglyceride, TG(55:7), that increased in DKO mice. 19 Triglycerides are used for energy storage, which is very much needed to support cell growth as cancer progresses. This suggests the upregulation of the monoacylglycerol and glycerol phosphate pathways.
Other selected discriminant lipids included prostaglandin A1 (PGA1), an eicosanoid. This lipid was lower in DKO mice in the third lifetime stage. Higher abundances of prostaglandin and prostaglandin D2 have been found to inhibit human ovarian cancer cell growth both in vitro and in mice. 63 Similarly, A-class prostaglandins are known to have antiproliferative effects by blocking the cell cycle and activating apoptotic cascades. 64 A cholesterol derivative was also selected as a discriminant lipid in lifetime stage V, with an increased abundance in DKO mice. Cholesterol metabolites have been linked to the promotion of tumorigenesis. 65 Furthermore, high serum cholesterol levels have been linked to increased ovarian cancer risk in a prospective study. 66

■ CONCLUSIONS
Here, we present a deep temporal lipidomic study of an HGSC ovarian cancer mouse model. The main findings are summarized in Figure 7, pointing at numerous alterations in a variety of lipid pathways. Phospholipids were the most perturbed lipid class. They also represented the highest number of altered species at the early stages of HGSC development, pointing to cell integrity fortification processes associated with cancer progression. We also found that ceramide and hexosylceramide levels predominantly increased in DKO mice at the later stages of OC progression. It is well known that sphingolipid metabolism is linked to cancer development and progression via autophagy. In the early stages, autophagy acts to inhibit tumorigenesis; however, at later stages, it supports tumor cell survival and proliferation. Furthermore, we identified sets of lipids that discriminate between DKO and DKO control mice, even at the earliest stages of disease progression. In addition, three phospholipid species were identified as circulating prognostic markers in DKO mice. These findings underscore the potential for early-stage diagnostic or prognostic lipid biomarker panels for human ovarian cancer. Given that the Dicer1-Pten double-knock-out (DKO) mouse model faithfully recapitulates human HGSC with phenotypic, histopathologic, and molecular similarities, the next research step would be to validate our findings in human populations.
Survival analysis comparison of DKO and DKO control mice (Figure S1); DKO and DKO control mice comparison via unsupervised learning methods (Figure S2); permutation test plots for validating the best-performing ML models used for discriminating DKO from DKO control mice (Figure S3); restricted mean survival times (RMST) plots for all prognostic lipid candidates (Figure S4); 87 statistically significant (q < 0.05) lipids for the DKO vs DKO control comparison, all time points combined (Table S1); statistically significant lipid features for the comparison between DKO and DKO control mice that were present in at least three lifetime stages (Table S2); lipids selected via machine learning for each percentage lifetime stage (Table S3); machine learning results for DKO classification (Table S4); composition of the stable isotope-labeled chemical standards mixture used in UHPLC-MS (Table S5); chromatographic gradient for the RP UHPLC−MS method (