A living biobank of patient-derived ductal carcinoma in situ mouse-intraductal xenografts identifies risk factors for invasive progression

Summary Ductal carcinoma in situ (DCIS) is a non-obligate precursor of invasive breast cancer (IBC). Due to a lack of biomarkers able to distinguish high- from low-risk cases, DCIS is treated similar to early IBC even though the minority of untreated cases eventually become invasive. Here, we characterized 115 patient-derived mouse-intraductal (MIND) DCIS models reflecting the full spectrum of DCIS observed in patients. Utilizing the possibility to follow the natural progression of DCIS combined with omics and imaging data, we reveal multiple prognostic factors for high-risk DCIS including high grade, HER2 amplification, expansive 3D growth, and high burden of copy number aberrations. In addition, sequential transplantation of xenografts showed minimal phenotypic and genotypic changes over time, indicating that invasive behavior is an intrinsic phenotype of DCIS and supporting a multiclonal evolution model. Moreover, this study provides a collection of 19 distributable DCIS-MIND models spanning all molecular subtypes.


INTRODUCTION
Each year, over 70,000 women are diagnosed with ductal carcinoma in situ (DCIS) in the USA, UK, and the Netherlands alone. 1-3 DCIS is a non-invasive lesion that respects the natural tissue barriers of the breast and is, therefore, not life threatening. However, a subset of DCIS lesions eventually overrules the natural tissue barriers imposed by the healthy breast tissue, leading to invasive breast cancer (IBC), a potentially life-threatening disease. Therefore, all women diagnosed with DCIS undergo breast-conserving surgery followed by radiotherapy or mastectomy, with a subset also receiving endocrine therapy. 4 Currently, it is impossible to predict whether a DCIS lesion will stay indolent or progress into invasive disease. This poses a major clinical challenge in determining which DCIS patients to treat, with the risk of overtreating patients with a DCIS that will never progress into invasive disease. To reduce the negative impact of overtreatment in low-risk DCIS patients, yet assure proper treatment for high-risk DCIS patients, it is crucial to identify which factors determine the invasive progression of DCIS.
Most studies investigating the genetic progression of DCIS to IBC focused on synchronous DCIS-IBC, i.e., DCIS with proven potential to progress to IBC, and are therefore not informative for what drives progression of pure DCIS to IBC. To date, these studies were unable to identify a single genomic alteration predictive of progression into invasive disease, [5][6][7] indicating that progression of DCIS is likely a complex process involving various tumor cell-intrinsic and microenvironmental factors that ultimately determine whether or not a DCIS lesion remains indolent. Mouse models of DCIS are limited and do not recapitulate hormone dependence and progression from atypical cell growth into DCIS and subsequent IBC. 8 Furthermore, the study of human DCIS tissue from patients is limited by the fact that the natural progression cannot be followed over time within the same patient. As a result, the dynamics and natural course of DCIS progression remain poorly understood. Since human DCIS lesions initiate inside the mammary ducts, a mouse-intraductal (MIND) injection technique was developed to inject human cell lines and patient-derived DCIS samples into the mammary ducts of immunocompromised mice. 9,10 This method results in DCIS lesions retaining the sample-specific estrogen receptor (ER), progesterone receptor (PR), and HER2 expression, and can be used to follow the progression of DCIS lesions over time. 8,[11][12][13] Here, we utilize the MIND injection technique to generate an extensively characterized living biobank of 115 patient-derived DCIS xenograft models representing the full spectrum of human DCIS. Similar to DCIS patients, a subset of DCIS-MIND models showed progression into IBC over time, enabling us to link the natural progression of DCIS to histopathological and molecular data. Using this approach, we identified HER2, MYC, and PTK6 amplification, high copy number aberration (CNA) burden, solid growth pattern, grade 3, high Ki67 level, and distinct 3D growth pattern as risk factors, whereas a luminal A subtype or columnar growth correlate with low-risk DCIS. Sequential transplantation resulted in a collection of 19 distributable DCIS-MIND models, including 2 luminal A, 4 luminal B, 2 ER + HER2 + , and 11 ER À HER2 + models. Our DCIS-MIND biobank provides a useful resource to study the natural progression of human DCIS and to identify factors associated with invasive progression.

Establishing a living biobank of DCIS-MIND models
To study the natural progression of DCIS, we set out to create a living biobank of patient-derived MIND models of DCIS that reflects the full histopathological spectrum of DCIS, including pure DCIS, DCIS with adjacent IBC, and DCIS with micro-invasion. To this end, we collected fresh DCIS samples from patients who underwent surgery at the Netherlands Cancer Institute. Fresh DCIS tissue was immediately processed into single cells by overnight enzymatic digestion and intraductally injected into immunodeficient NOD-scid;Il2rg null (NSG) mice. In total, we obtained 130 DCIS samples, which were intraductally injected into 1,956 mammary glands of 767 NSG mice supplemented with estradiol (E2) ( Figure 1A; Table S1). The 130 patient samples included 85 pure DCIS (65%), 6 DCIS with micro-invasion (5%), and 39 DCIS with adjacent IBC (30%) ( Figure S1A). Twelve months after intraductal injection, the injected mammary glands were dissected and subsequently analyzed using immunohistochemistry (IHC), whole-gland 3D imaging, and genomic and transcriptomic analysis ( Figures 1A and S1B). The collection of samples included grade I (12%), II (48%), and III (39%) DCIS, with an overall take rate of 88% (115 out of 130) ( Figures 1B  and 1C).
DCIS models retain histological and mutational features of the primary lesions DCIS can be classified into subgroups based on their histological features, including growth pattern, hormone receptor status, and grade. 14 The collected DCIS-MIND models reflected four types of growth patterns described in patients: solid, cribriform, micropapillary, and columnar ( Figures 1D and S1C). The growth patterns of the primary DCIS lesions were retained in 76% (87 out of 115) of the MIND models. In most cases where the growth pattern did not match, the DCIS-MIND lesions showed columnar growth instead of a solid, cribriform, or micropapillary growth pattern observed in the primary DCIS ( Figure 1E). Time point analyses revealed that columnar lesions observed at early time points frequently developed into a different growth pattern at later time points, indicating that these DCIS lesions may require a longer time to progress into their final growth pattern ( Figure 1F).
We also checked whether the DCIS-MIND models recapitulated the expression of established biomarkers in breast cancer, i.e., ER, PR, HER2, and Ki67. Indeed, we identified a high concordance between the original patient sample and the DCIS-MIND models for the expression of hormone receptors, Ki67, and the surrogate molecular DCIS subtypes: luminal A, luminal B, HER2 + , and basal DCIS ( Figures 1G and 1H). Expression was similar in 97 out of 105 models (92%) for ER, 79/105 models (75%) for PR, 96/107 models (90%) for HER2, and 85/ 100 models (85%) for Ki67 ( Figure S1D). The discordance observed in certain models might be caused by heterogeneity in the patient sample or progression of the DCIS cells after engraftment.
Comparative genomic analyses of the primary DCIS lesions and the corresponding MIND models using whole-exome sequencing (WES) (n = 60), whole-genome sequencing (WGS) (n = 11), or panel sequencing (n = 18) revealed that our collection  15 See also Figure S2 and Tables S2 and S3. Article of DCIS-MIND models reflects the heterogeneity of human DCIS with respect to mutations in known cancer genes and genomic aberrations (Figure 2; Tables S2 and S3). 16 The most common alterations in our DCIS biobank include ERBB2 amplifications (34.8%) and mutations in TP53 (24.7%), PIK3CA (24.7%), GATA3 (18.0%), and AKT1 (12.4%) (Figure 2A). RAD51 foci analyses on a subset of MIND models (n = 20) showed that DCIS accumulate high levels of dsDNA damage (yH2AX foci) and 20% of DCIS PDX are predicted to be homologous recombination deficient (HRD), comparable with the proportion of IBC with genetic HRD signatures and/or genomic scars ( Figure S2A). 17 To further validate the concordance between primary DCIS lesions and the MIND models we performed DNA CNA analysis for 34 matched pairs of primary and DCIS-MIND samples (Figure S2C), as well as WES or targeted sequencing for 12 pairs ( Figure 2B). Both analyses showed high concordance between the primary lesion and MIND model on a global level ( Figure 2B). At the local level, small variations were detected, such as absence or addition of a mutation, which could be explained by variant allele frequencies ( Figure S2B), or an additional gain or loss ( Figure S2C), which might indicate evolution of the DCIS-MIND model or heterogeneity of the primary lesion.
In addition, we performed transcriptomic analyses of matched primary and DCIS-MIND samples. Single sample gene set enrichment analysis (GSEA) showed correlation at the hallmark gene set level ( Figure S2D), and analyses of PAM50 genes 18,19 revealed high concordance with the IHC subtype (83%, 111 out of 134) ( Figure 2C). Moreover, primary and DCIS-MIND samples clustered together based on the PAM50 signature (Figure 2C). Finally, we performed DCIS-specific subtyping as published by Strand et al., 15 which showed clear separation of DCIS-MIND samples into ER-high, ER-low, and quiescent subtypes ( Figure 2D).
Overall, the high concordance for growth patterns, molecular subtype, mutational landscape, CNA profiles, and gene expression patterns indicates that our DCIS-MIND models faithfully recapitulate the original patient lesions and reflect the heterogeneity observed in the DCIS patient population, and are therefore a representative collection of models to study DCIS progression.
HER2 overexpression, solid growth pattern, high proliferative index, and grade 3 correlate to invasive progression We utilized our DCIS-MIND biobank to follow the natural progression of DCIS over 12 months and identify which cases represent high-risk DCIS or low-risk DCIS. Progression of DCIS lesions was scored using both 2D H&E analysis and a 3D whole-gland imaging technique, which enabled us to identify all human-derived DCIS lesions using human-specific Ku80 immunolabeling. In both 2D sections and 3D images, we scored the number of progressed samples showing protrusion of DCIS cells through the murine myoepithelial cell layer (identified by alpha-smooth muscle actin labeling). Lesions were classified into non-invasive (no protrusions), micro-invasive (<2 mm protrusion), or invasive (>2 mm protrusion) subtypes (Figures 3A and  S3A). Overall, 54% of DCIS samples remained non-invasive, while 39% and 7% of DCIS samples showed micro-invasion or invasion, respectively ( Figure 3B).
We next sought to identify differences between the progressed and non-progressed DCIS samples. Weight of the tissue received from the surgical material did not influence the likelihood of DCIS invasive progression, although the total number of collected DCIS cells was somewhat higher in DCIS samples showing invasive progression compared with the nonprogressive DCIS samples ( Figures S3B and S3C). In addition, no difference in invasive progression was observed between pure DCIS samples and DCIS samples with adjacent IBC (Figures 3C and S3D). Hence, the natural evolution of DCIS in MIND models is not affected by the size of the obtained resection material or the presence of adjacent IBC.
The ability to follow the natural progression of DCIS-MIND lesions provides a unique opportunity to identify clinical biomarkers associated with invasive progression. We therefore performed univariate linear regression analysis to calculate odds ratio (OR) related to invasive progression for clinical parameters, such as molecular subtype, growth pattern, grade, menopausal state, age, and BMI, as well as the new DCIS subtyping. We found significantly higher OR scores for DCIS lesions with a HER2 + molecular subtype (OR = 16.49; 95% CI, 3.62-75.14), ER negativity (OR = 11.55; 95% CI, 2. 50 To further test the clinical validity of our dataset, we used the onco-type DX DCIS, 20 COX2 + P16 + Ki67, 21,22 and 812-gene 15 classifiers for risk of DCIS recurrence. On primary DCIS data, the COX2 + P16 + Ki67 + classifier showed a significant OR for DCIS progression, while the onco-type DX DCIS showed a trend for predicting low-, intermediate-, or high-risk DCIS. On the PDX data, both the onco-type DX DCIS and 812-gene classifiers showed significant prediction of DCIS progression, whereas the COX2 + P16 + Ki67 + showed a trend toward a higher risk of invasive progression ( Figures 3C and S3F).
Overall CNAs as well as specific aberrations in MYC, ERBB2, and PTK6 correlate to invasive progression We next used genomics data from the patient samples and DCIS-MIND models to identify additional biomarkers. CNA-seq analysis revealed a significant higher percentage of overall (C) Odds ratio table showing risk scores (univariate and multivariate linear regression models) for association between common clinical parameters, DCIS subtypes as proposed by Strand et al., 15 and multiple invasive recurrence classifiers for invasive progression of DCIS-MIND lesions, identifying HER2 overexpression and solid growth patterns as independent risk factors. Molecular subtype and growth pattern parameters are based on DCIS-MIND characteristics while the other parameters are based on patient characteristics. See also Figure S3.

Article
CNAs in DCIS samples with invasive progression compared with non-invasive DCIS samples in the MIND models ( Figure 4A), whereas this trend was not visible in the patient samples (Figure S4A). Notably, the overall lower CNA percentage in the primary DCIS samples could be caused by quenching of the CNA signals due to the presence of normal DNA from myoepithelial and stromal cells, resulting in CNAs with smaller amplitude that are more difficult to pick up.
To identify CNAs that are more common in DCIS with invasive progression, we performed both KCSmart and GISTIC2 analyses (Table S4). KCSmart identified 1q gains and 16q losses as common aberrations in indolent DCIS and DCIS with invasion, suggesting that these aberrations represent early events during development of DCIS. In addition, both KCSmart and GISTIC2 identified gains/amplifications of FGFR1, MYC, CCND1, ERBB2, and PTK6, as well as losses as CDKN2A/B, BRCA1, and MAP3K1 to occur more frequently in DCIS-MIND models with invasive progression (Figures 4B and S4C). These findings were confirmed in the CNA-seq analysis of the primary DCIS lesions, with the exception of CCND1 ( Figures S4B and S4C). Mutational analysis did not reveal significant differences between non-invasive DCIS and DCIS with invasive progression except for mutations in KMT2D, which were more common in non-invasive DCIS ( Figures S4D and S4E).  Table S5). Interestingly, we only found limited significant differences in the gene expression profiles from the matched primary DCIS lesions when comparing progressed with non-progressed DCIS-MIND models. However, analysis of transcriptomic data from the MIND models did reveal more significant differences in gene expression between indolent DCIS and DCIS with invasive progression, suggesting that stromal cells might obscure significant changes in the primary patient samples. Transcriptomic analysis of the MIND models again identified high expression of ERBB2 and Ki67 as risk factors, as well as additional factors such as S100A8/A9 and FOXD1, which are described to drive breast cancer proliferation. [23][24][25] GSEA of primary DCIS lesions and the corresponding PDX lesions identified upregulation of hallmark gene sets associated with proliferation (E2F targets, G2M checkpoint, MYC targets) ( Figure 4D, Table S5).
As DCIS is a heterogeneous disease, we did not expect one molecular marker to explain all cases with invasive progression. Therefore, we used detection of imbalanced differential signal (DIDS) scores 26 to identify genes that are differentially expressed in subgroups of the DCIS models. DIDS analysis on DCIS-MIND samples identified previously mentioned genes such as high expression of ERBB2 in DCIS with invasive progression or high expression of ESR1 in non-invasive DCIS ( Figure 4E; Table S5). DIDS scores that were found in both DCIS-MIND samples and primary samples with a score in the same direction were NSD3 and PTK6 (Figures 4F and S4F). Increased copy number gains of PTK6 were also identified in DCIS models with invasive progression ( Figure 4B). PTK6 is a tyrosine kinase, which was previously associated with breast cancer progression and may therefore be a candidate biomarker for high-risk DCIS, as well as a potential therapeutic target. 27,28 Whole-mount 3D analysis reveals two distinct growth patterns with strong correlation to invasive progression To assess whether growth characteristics of DCIS are associated with indolent or invasive growth, we analyzed the 3D wholemount images. Interestingly, based on their 3D morphology, DCIS lesions could be classified into two distinct growth patterns: replacement growth and expansive growth. In DCIS with replacement growth, the human DCIS cells replace the existing mouse luminal epithelium and populate the ducts, but do not severely disturb the architecture of the ducts. DCIS with expansive growth is characterized by tumor cells that do not grow within the ducts, but rather expand perpendicularly to the ducts ( Figures 5A and  S5A). The expansive lesions showed a higher volume and more spherical morphology when compared with replacement lesions, which had a smaller volume and a more elongated shape ( Figures 5B and 5C). Expansive lesions remained more localized within the ductal tree and caused loss of contacts between the myoepithelial cells resulting in a discontinuous myoepithelial cell layer ( Figures 5A and S5A).
As a result of the potential mechanical pressure on the myoepithelial cells and the basement membrane, we identified two types of morphological aberrations. On the one hand, lesion outgrowth led to bulging and hyperbranching of the ductal epithelium ( Figure S5E). Not only did the epithelium containing the lesion show this hyperbranching phenotype, but the healthy neighboring ducts also adopted a hyperbranched morphology ( Figure S5E). This suggests that hyperbranching might be induced by a paracrine signal from either the stressed myoepithelial cells or the tumor cells. On the other hand, expansive lesion growth resulted in breakage of the myoepithelial cell layer and the basement membrane and subsequent invasive growth ( Figures 3A, 5A, S3A, and S5B).
Importantly, invasive events almost exclusively occurred in DCIS models with expansive growth, whereas the vast majority of the models with replacement growth stayed indolent ( Figures 5D, S5B, and S5C). Analysis of early and late time points (6 vs. 12 months after injection) showed that lesions with replacement growth remained indolent, while most expansively growing lesions that were indolent at the early time point progressed to invasive growth at the late time point ( Figure S5D).

Article
Interestingly, the hyperbranched expansive lesions did not result in a significant difference in invasive growth compared with lesions without hyperbranching ( Figures S5F and S5G). To assess whether these 3D growth patterns represent DCIS growth in patients we used a modified FLASH tissue-clearing protocol 29 to perform 3D imaging of FFPE DCIS patient samples, which revealed two similar growth patterns ( Figures 5E-5G).
To find an explanation for the difference in 3D growth pattern, we compared gene expression profiles of replacement and expansive samples and found extracellular matrix protein 1 (ECM1) to be significantly upregulated in expansive lesions (Figure 5H). ECM1 is a secreted glycoprotein reported to be a marker of poor prognosis in multiple cancer types, including breast cancer. 30 We validated this result using immunofluorescence staining of ECM1 in thick tissue sections of DCIS-MIND lesions ( Figure 5I). Interestingly, the overall ECM1 signal did not differ between lesions with replacement and expansive growth (Figure S5H). However, extracellular ECM1 was significantly more present in expansive lesions compared with replacement lesions and normal ducts ( Figure 5J). Together, these results indicate that the 3D growth pattern is a potential predictor of invasive progression of DCIS, which could be driven by increased deposition of extracellular ECM1.

DCIS-MIND models show phenotypic and genotypic stability over multiple passages
To study whether DCIS samples evolve and acquire a more aggressive phenotype over time, we sequentially transplanted DCIS cells over multiple generations in MIND models. Six to 12 months after transplantation, the human DCIS cells were isolated from the injected mammary glands using magnetic bead sorting. This leads to an enrichment of human EpCAM + cells that were subsequently re-injected into NSG mice ( Figures 6A  and S6A). Of the 115 first-generation DCIS models (P0), we were able to successfully re-transplant 42 models (36%) to a second-generation (P1) ( Figure 6B). For 16 of these models, a third-generation (P2) could be generated ( Figure S6B). Importantly, throughout these serial transplantations, intrinsic phenotypes such as molecular subtype, growth pattern, and invasive potential were maintained ( Figures 6C-6E). Also, analysis of CNA and RNA-seq data from a subset of models revealed high stability between successive transplantations ( Figures 6F   and 6G). Only the expression profile of DCIS123 P0 and P1 did not show a good correlation, potentially caused by a heterogeneous lesion with the presence of both DCIS and LCIS. The limited phenotypic and genotypic changes over multiple generations are in line with the fact that synchronous DCIS-IBC have little genomic differences. 31,32 Therefore, our data strongly suggest that the propensity to become invasive is a DCIS-intrinsic phenotype that does not evolve over time.
In addition, we created a biobank of 19 DCIS-MIND models that can be distributed to other labs as we obtained more than 1 million cells after each generation of xenografts. This collection includes 2 luminal A, 4 luminal B, 2 ER + HER2 + , and 11 ER À HER2 + DCIS models ( Figures 6H and S6C). These distributable models are suited for future research into biomarkers and therapeutics for prevention of DCIS invasive progression. As an example, we tested estrogen dependence of four ER + models by transplanting them in mice that were either supplemented with E2 in the drinking water or deprived of estrogen by ovariectomy. For all four ER + models, DCIS lesions grew significantly faster in the estrogen-supplemented mice than in the ovariectomized mice, indicating that the DCIS-MIND models retained their estrogen sensitivity ( Figures 6I and S6D-S6G). Intraductal growth can be difficult to assess via palpation and caliper measurements. Therefore, we introduced AkaLuciferase (AkaLuc) in the DCIS118 and DCIS088 models and could measure intraductal growth using bioluminescence imaging as early as 4 weeks after intraductal injection ( Figures S6H and S6I).

HER2 expression promotes invasive progression of DCIS
The distributable DCIS-MIND models also permit experimental validation of candidate drivers of invasive progression of DCIS. HER2 expression was consistently found to be correlated with invasive progression in multiple analyses, including CNA, IHC, and RNA expression ( Figures 3C, 4, and S4). To test whether HER2 overexpression also leads to downstream pathway activation, we performed IHC stainings for p-ERK and p-AKT (Figure 7A). The vast majority of HER2 + DCIS (18 out of 19) showed high expression of p-ERK and/or p-AKT, while most HER2 À controls (3 out of 4) did not ( Figure 7B).
To investigate the effect of HER2 expression on DCIS progression, we tested the effect of HER2 inhibition in HER2 + MIND (B) Volumes of DCIS lesions for replacement and expansive growth. Data are represented as mean ± SEM. Two-tailed Mann-Whitney test. **p < 0.01. (C) Width to length ratios of DCIS lesions for replacement and expansive growth, revealing a more elongated shape for lesions with replacement growth vs. a more spherical shape for lesions with expansive growth. Data are represented as mean ± SEM. Two tailed Mann Whitney test. **p < 0.01. (J) Quantification of extracellular ECM1 expression in DCIS lesions for expansive or replacement growth and normal mammary ducts. Lines indicate Q 1 , median, and Q 3 . Two-tailed Mann-Whitney test. ***p < 0.001, *p < 0.05, ns, not significant. See also Figure S5. Article models and conversely overexpressed HER2 in a HER2 À MIND model. Treatment of seven invasive HER2 + DCIS models with herceptin effectively prevented outgrowth of palpable tumors in 5 out of 7 models ( Figure 7C and S7A-S7E). Only two models (DCIS156 and DCIS169) did not respond to herceptin treatment ( Figures 7D and S7F-S7G). Whole-mount analyses revealed significant growth with invasion in the control group, while the responsive treatment groups showed few non-invasive and small lesions ( Figure 7E). In addition, we overexpressed HER2 by lentiviral transduction in a non-invasive HER2 À DCIS model (DCIS098). Stable and functional overexpression of HER2 upon transduction was validated in HEK293T cells prior to transduction of DCIS098 cells (Figures 7F, 7G, and S7B). Six months after injection, whole-mount analyses revealed that the mock-treated cells grew as non-invasive replacement lesions, whereas the HER2-overexpressing cells grew as an invasive expansive lesion ( Figure 7H). Together, these data demonstrate the causal role of HER2 expression in the invasive progression of DCIS and underscore the utility of DCIS-MIND models for functional validation studies.

DISCUSSION
Although DCIS accounts for 20%-25% of all newly diagnosed breast cancers, its natural progression is still poorly understood and consensus biomarkers for DCIS progression are lacking, prompting the need for preclinical in vivo models that permit longitudinal monitoring of DCIS growth and progression. Previous studies using cell lines such as MCF10DCIS.com and SUM225 or small collections of DCIS xenografts have not led to reliable biomarkers for DCIS progression, mainly because cell lines only model ER À DCIS and because the small number of available xenografts did not account for the heterogeneity of the disease. 13,33,34 Here, we report the generation of a large biobank of 115 orthotopic DCIS-MIND models recapitulating the molecular and histological heterogeneity of the patient population. 35,36 Monitoring of the natural progression of DCIS of our biobank showed that 46% of DCIS cases progress into invasive disease, suggesting that around half of the DCIS lesions in patients would stay indolent if left untreated. Moreover, we provide a collection of 19 distributable DCIS-MIND models, including luminal A, luminal B, ER + HER2 + , and ER À HER2 + models. This diverse and phenotypically stable platform vastly increases the number and diversity of in vivo models of DCIS and fulfills the need for better models to study the natural evolution of DCIS. Finally, this extensively characterized biobank vastly increases the amount of omics data available for DCIS and offers the opportu-nity to identify and validate biomarkers related to invasive progression of DCIS.
Commonly studied biomarkers in DCIS include nuclear grade, comedonecrosis, tumor size, and expression of ER, PR, HER2, and Ki67, but this has so far not led to any consensus. [37][38][39] For example, there are conflicting reports regarding the role of HER2 expression in DCIS progression. 7,[40][41][42][43] These studies focus on different outcomes including recurrence of DCIS or IBC and the samples analyzed have a large variation in treatment history, such as extent of surgery and adjuvant treatment. Our models do not have these limitations, and thus enabled us to identify multiple prognostic factors for invasive progression of DCIS, including solid growth, comedonecrosis, grade 3, ER and PR negativity, HER2 positivity, and high Ki67, whereas columnar growth or a luminal A subtype are indicative of lowrisk DCIS. These findings may aid stratification of DCIS patients into high-and low-risk subgroups, as is currently being done in clinical trials such as LORIS, LORD, COMET, and LORETTA. [44][45][46][47] Indeed, our findings provide strong support for the decision to exclude patients with grade 3, comedonecrosis, or a HER2 + subtype DCIS from active surveillance in these trials.
Comprehensive molecular profiling of all DCIS-MIND models suggests that 1q gain and 16q loss are necessary for DCIS initiation, while additional aberrations are needed for promoting invasive progression. These findings are in line with Geyer et al., 48 who propose that low-grade DCIS have 1q gains and 16q losses, whereas more aggressive high-grade lesions have additional aberrations in for example 8q (MYC), 17q (ERBB2) or 20q (PTK6). Interestingly, our study identified significant CNAs and genes that strongly overlap with the non-significant CNAs identified by Strand et al. 15 (i.e., MYC (8q24), ERBB2 (17q12), PTK6 (20q13), FGFR1 (8p11.23)). RNA-seq analyses further confirmed the correlation between risk of invasive progression and high expression of ERBB2 (HER2), PTK6 and Ki67, which have all been described previously as potential risk factors for tumor progression. 27,28,37,41,42 RNA-seq analyses also highlighted the relevance of DCIS recurrence classifiers (onco-type DX DCIS-, COX2 + P16 + Ki67 + and 812 gene classifier) for prediction of invasive progression in untreated DCIS. Altogether, this demonstrates the potential of our in vivo DCIS-MIND platform to identify significant associations for non-significant trends observed in human studies.
Beyond classical genomic and transcriptomic biomarkers, lesion morphology may provide another source of predictive information, as demonstrated by our observation that two distinct 3D growth patterns, i.e., replacement or expansive growth, correlate strongly with invasive progression and are able to predict invasive progression better than any other marker.  Article Importantly, we show that similar growth patterns also occur in DCIS specimens from patients, indicating that 3D pathology of human breast cancers could yield prognostic biomarkers which cannot be uncovered with 2D pathology. 49,50 The capability to monitor in vivo evolution of DCIS for prolonged periods of time via serial transplantation enabled us to evaluate the different evolutionary models that have been proposed. The independent evolutionary model postulates that DCIS and IBC evolve in parallel and do not necessarily share genomic aberrations, suggesting an independent origin of DCIS and IBC. 51,52 In contrast, the evolutionary bottleneck model postulates that multiple DCIS clones co-exist, after which a single clone acquires the propensity to break through the basement membrane and expand into an invasive tumor mass. This model is supported by several studies showing overlapping mutations and CNAs between DCIS and adjacent IBC. 31,[53][54][55] Single-cell sequencing studies found evidence for a multiclonal invasion model, in which multiple cell lineages are able to invade into the stroma after the degradation of the basement membrane. 32,56 Our MIND models show remarkable genetic and phenotypic stability during serial passaging, which is concordant with previous data showing that synchronous DCIS-IBC have marginal genomic differences. 31,32 Importantly, also the invasive behavior of our DCIS-MIND models remains stable during serial passaging, showing that the ability of DCIS lesions to become invasive is an intrinsic phenotype that does not evolve over time, thus lending strong support to the multiclonal evolutionary model 32 In conclusion, this work provides a large, well-characterized resource of patient-derived in vivo models recapitulating the full heterogeneity of DCIS. These models may facilitate the identification and validation of biomarkers related to invasive progression of DCIS, and thereby foster the development of more tailored treatment strategies for DCIS patients.
Limitations of the study An important limitation of our DCIS-MIND models is the lack of immune cells and human stroma as our model requires immunodeficient mice. Multiple studies have stressed the importance of the tumor stroma and immune infiltrate and have shown correlations between the tumor microenvironment and the progression of DCIS. 57,58 Efforts have been made to introduce human immune cells to immune deficient mouse models, but these do not fully recapitulate the human immune system. 59,60 In addition, we are limited to the 1-to 2-year lifespan of our mouse models, whereas the progression of DCIS in women can take years or even decades.
Finally, the slow growth kinetics of specifically non-invasive DCIS models limits the possibilities of growing large numbers of cells for additional experiments and multi-omics analyses.
These analyses are therefore most likely slightly biased toward faster-growing and larger DCIS lesions.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

DECLARATION OF INTERESTS
The authors declare no competing interests. Mice As a mouse model we used 6-8 weeks old NOD-scid IL2Rgammanull (NSG) female mice obtained from Jackson Laboratories and bred in-house for experimental cohorts. All animal experiments were approved by the Animal Welfare Committee of The Netherlands Cancer Institute (NKI) in accordance with national guidelines. Animals were maintained in the animal department of the NKI, housed in individually ventilated cages (IVC) under specific pathogen-free (SPF) conditions, and received food and water ad libitum.

Human samples
Freshly resected DCIS material for establishment of PDX models was provided by the gross room (Antoni van Leeuwenhoek Hospital) and dissociated to single cells for intraductal implementation into immunocompromised mice. The study was approved by the institutional review board (NKI) and all subjects provided informed consent.

METHOD DETAILS
Patient derived DCIS xenograft models Freshly resected DCIS material was provided by the gross room and mechanically dissociated. Next an overnight digestion at 37⁰ C with continuous movement was performed containing 5 mg Collagenase Type IV (Thermo Fischer), 0,24 mg Hyaluronidase (Sigma), 200 mg BSA (Sigma) and 5 ml Gentamycin (Invitrogen) in 10 ml Advanced DMEM/F12 (Gibco) supplemented with 1% Penicillin-Streptomycin (Invitrogen), 1% L-Glutamine (Gibco) and 1mM HEPES (Sigma) per 100 mg of tumor tissue. Cells were further digested with 0.25% Trypsin-EDTA (Invitrogen) for 1 minute followed by incubation in 10U/ml DNAse (Sigma) in 4 ml Advanced DMEM/F12 while being vortexed for 3-5 min. Finally, the solution was filtered through a 70 mm cell strainer (BD Falcon) and counted before being intraductally injected into 6-8 weeks old NOD-scid IL2Rgammanull (NSG) female mice. Intraductal injections were performed under anesthesia (2,5% Isoflurane) using a 50 ml Hamilton syringe and a 34-gauge needle (Point style 4) to deliver cells into the mammary gland as previously described Behbod et al, 10 only without snipping the nipple and surgically opening the mouse. 20 ml of PBS (with 2ul of trypan blue) containing 25.000 cells were injected. Mice received 17b-estradiol (E2) supplementation either by slow-release pellets (0.18 mg, 90-day release pellets) or E2 supplementation (Sigma, E2758) in the drinking water (4 ug/ml). After 6-or 12-months mice were sacrificed and mammary tissues were fixed and embedded or taken on PBS for further processing.

Magnetic bead sorting and re-transplantation
In order to sequentially transplant DCIS-MIND models, intraductally injected mammary glands were excised at 12 months after intraductal injection and digested overnight as described before. The obtained single-cell solution is dissolved in 300 ml PBS with 0.5% BSA plus 100 ml of human Fcr blocking reagent (Miltenyi Biotec). Mouse cells were then magnetically labeled with mouse MHCI and MHCII antibodies (BD Parmingen) followed by MACS Anti-Biotin MicroBeads UltraPure (Miltenyi Biotec), and negatively sorted for human DCIS cells using LD columns (Milteny Biotec). A small part of the sorted cells was then stained with anti-human CD326 (EpCAM)-eFluor660 (eBioscience) and analyzed by fluorescence-activated cell sorting (FACS) for purity, while the rest of the cells was intraductally injected with 25.000 DCIS cells per mammary gland.
Estrogen sensitivity of ER + DCIS-MIND models Female NSG mice were ovariectomized or sham operated at 6-8 weeks of age, one week before intraductal injection. Sham operated mice received E2 supplementation (4mg/ml, Sigma) in the drinking water starting 1 week before surgery until the end of experiment. Mice were monitored weekly and tumor volume was measured with a caliper; Volume was calculated using the formula: V= (width^2)*length/2. Mice were killed by CO2 asphyxiation, 6-12 months after intraductal injection or when cumulative tumor volume exceeded 2000 mm3. Mammary glands were fixed and embedded and if no palpable tumor was present, tumor area was measured from FFPE slides using a human specific anti-Ku80 antibody (Cell Signaling).

Herceptin (traztuzumab) treatment of mice
Mice, intraductally injected with HER2 + DCIS (DCIS063 or DCIS088), were treated weekly with 10 mg/kg Herceptin (Trazimera, Pfizer in 0.9% NaCl) or vehicle one month after intraductal injection by intraperitoneal injection (I.P.). Mice were monitored weekly and tumor volume was measured with a caliper; Volume was calculated using the formula: V= (width^2)*length/2. Mice were sacrificed by CO2 asphyxiation, 12 months after intraductal injection or when cumulative tumor volume exceeded 2000 mm3.

Lentiviral expression in human DCIS of ERBB2 or Akaluciferase
Primary or PDX DCIS tissue was digested to a single-cell solution as described before and was transfected with a lentivirus (ERBB2-GFP lentivirus, Origene, RC222909L4, or Akaluciferase, courtesy of the lab of Shinae Kizaka-Kondo cloned into the pRRL vector backbone, Addgene plasmid #31485) or vehicle by putting the cells and the virus in a falcon tube and spin for 2 hours at 2500 rpm at RT. Transfected cells were then directly intraductally injected into 6-8 weeks old female NSG mice. 6 -12 months after injection mice were sacrificed and mammary glands were fixed and embedded or processed for 3D whole-mount imaging. The ERBB2-GFP lentivirus functionality was tested by transfecting HEK-293T cells (ATCC) and performing a Western Blot with a HER2/ErbB2 and pHER2/ErbB2 antibody (Cell signaling) blocked in milk powder.

In vivo bioluminescence imaging
In vivo bioluminescence imaging was performed with Akalumine-HCl (Courtesy of Leiden university). Akalumine was dissolved at 2 mg/ml in sterile H2O and stored at À20 C. Akalumine-HCl solution was injected i.p. (0.01 mL/g body weight) and animals were anesthetized with 2-3% isoflurane. Light emission was measured 15 min after Akalumine administration by using a cooled CCD camera (IVIS; Xenogen), coupled to Living Image acquisition and analysis software (Living image 4.3 PerkinElmer) over an integration time of 1 min. Signal intensity was quantified as the Flux (photons per second) measured over the region of interest.

Immunohistochemistry (IHC)
Immunohistochemistry for ER, PR, HER2, p-ERK, p-AKT and Ki67 of the FFPE tumor samples was performed on a BenchMark Ultra autostainer (Ventana Medical Systems). Briefly, paraffin sections were cut at 3 mm, heated at 75⁰C for 28 minutes and deparaffinized in the instrument with EZ prep solution (Ventana Medical Systems). Heat-induced antigen retrieval was carried out using Cell

RAD51 assay (immunofluorescence staining and scoring)
The following primary antibodies were used for immunofluorescence: rabbit anti-RAD51 (Abcam ab133534, 1:1000), mouse antigeminin (NovoCastra NCL-L, 1:60), rabbit anti-geminin (ProteinTech 10802-1-AP, 1:400), mouse anti-y-H2AX (Millipore #05-636, 1:200). Goat anti-rabbit Alexa fluor 568 (Invitrogen; 1:500), goat anti-mouse Alexa fluor 488 (Invitrogen; 1:500), donkey anti-mouse Alexa fluor 568 (Invitrogen; 1:500), and goat anti-rabbit Alexa fluor 488 (Invitrogen; 1:500) were used as secondary antibodies. For target antigen retrieval, FFPE sections were microwaved for 20 min at 110 C in DAKO Antigen Retrieval Buffer pH 9.0. Sections were cooled down in distilled water for 5 min, then permeabilized with DAKO Wash Buffer (contains Tween-20) for 5 min, followed by incubation in blocking buffer (DAKO Wash Buffer with 1% bovine serum albumin) for 5 min. Primary antibodies were diluted in DAKO Antibody Diluent and incubated at room temperature for 1 h. Sections were washed for 5 min in DAKO Wash Buffer followed by 5 min in blocking buffer. Secondary antibodies were diluted in blocking buffer and incubated for 30 min at room temperature. The 2-step washing was repeated followed by 5-min incubation in distilled water. Dehydration was performed with increasing concentrations of ethanol (70%, 96% and 100%). Sections were mounted with DAPI ProLong Gold antifading reagent and stored at -20 C. Biomarkers were quantified on FFPE PDX by scoring. Biomarker scoring was performed onto life images using a 60x-immersion oil lens. The RAD51 score represents the percentage of geminin + tumor cells with five or more RAD51 nuclear foci and the pre-defined cut-off of 10% was used to call homologous recombination (HR) proficient (HRP) and deficient (HRD). Samples with low yH2AX (<25% of geminin + cells with yH2AX foci) or with <40 geminin + cells were not included in the analyses, due to insufficient endogenous DNA damage or tumor cells in the S/G2-phase of the cell cycle, respectively. [76][77][78] Whole-mount immunofluorescence staining of mammary glands Intraductally injected mammary glands were dissected and incubated in a mixture of collagenase A (1 mg/ml, 10103586001, Merck) and hyaluronidase (50 mg/ml, H3506-1G, Merck) at 37⁰C for optical clearance, fixed in periodate-lysine-paraformaldehyde (PLP) buffer (1% paraformaldehyde (PFA; Electron Microscopy Science), 0.01M sodium periodate, 0.075M L-lysine and 0.0375M P-Buffer (0.081M Na2HPO4 and 0.019M NaH2PO4;pH 7.4) for 2h at room temperature, and incubated for at least 3h in blocking buffer containing 1% bovine serum albumin (Roche Diagnostics), 5% normal goat serum (Monosan) and 0.8% Triton X-100 (Merck) in PBS. Primary antibodies were diluted in blocking buffer and incubated overnight at room temperature. Secondary antibodies diluted in blocking buffer were incubated for at least 6h. Nuclei were stained with DAPI (0.1 mg/ml; Sigma-Aldrich) in PBS. Glands were washed with PBS and mounted on a microscopy slide with Vectashield hard set (H-1400, Vector Laboratories). Primary Pathology All IHC and H&E slides were scored blindly by a pathologist in Slide Score (www.slidescore.com) (both from DCIS-MIND models and primary DCIS). ER, PR was scored on a scale from 0-100% (R10% is positive), HER2 was scored as 0, 1, 2 or 3+ (3+ is positive), Ki67 was scored on a scale from 0-100%. Growth of human DCIS cells was assessed by Ku80 stainings as either negative or positive. Ku80 stainings and H&E stainings were scored blindly by a pathologist for grade, calcifications, periductal fibrosis, comedonecrosis, invasion and growth pattern. Further patient data was obtained from NKI patient records following institute guide lines.

Microdissection
The pathologist scored the tumor percentage and indicated the DCIS regions for isolation on a H&E slide.
From 5 to 15 (depending on the DCIS area size) FFPE 10 mm slides, the DCIS regions were dissected by scraping the areas off under a stereomicroscope using a needle. The scraped off tissue was stored in PKD digestion buffer (Qiagen, #80234) and stored at 4 ⁰C for up to a week. DNA and RNA was isolated simultaneously with the Allprep DNA/RNA FFPE isolation kit (Qiagen, #80234) by using the QIAcube, according to manufacturer's protocol.

CNA-seq
The total amount of DNA was quantified on the Nanodrop 2000 (Thermofisher). The amount of double stranded DNA in the genomic DNA samples was quantified by using the Qubit dsDNA HS Assay Kit (Invitrogen, cat no Q32851). A max of 2000 ng of double stranded genomic DNA were fragmented by Covaris shearing to obtain fragment sizes of 160-200 bp. Samples were purified using 2X Agencourt AMPure XP PCR Purification beads according to manufacturer's instructions (Beckman Coulter, cat no A63881). The sheared DNA samples were quantified and qualified on a BioAnalyzer system using the DNA7500 assay kit (Agilent Technologies cat no. 5067-1506). With an input of maximum 1 mg sheared DNA, library preparation for Illumina sequencing was performed using the KAPA Hyper Prep Kit (KAPA Biosystems, KK8504). During library amplification 6-8 PCR cycles were used to obtain enough yield for the exome capture. After library preparation, the libraries were cleaned up using 1X AMPure XP beads. All DNA libraries are analyzed on a BioAnalyzer system using the DNA7500 chips for determining the molarity.
Up to 13 uniquely indexed samples are mixed together by equimolar pooling. The pools are analyzed on the Agilent Technologies 2100 Bioanalyzer. Pools are diluted to 10 nM, and measured on the qPCR. The pool is subjected to sequencing on an Illlumina Hi-Seq2500 machine, each pool in one lane of a single read 65 bp run, according to manufacturer's instructions. The resulting reads were trimmed using Cutadapt 61 to remove any remaining adapter sequences. The trimmed reads were aligned to the GRCh38 version 97 and GRCm38 version 89 reference genome using BWA aln. 62 Mouse reads were filtered out by AstraZeneca's tool disambiguate. 63 The resulting alignments were sorted and marked for duplicates using Picard tools. QC statistics from Fastqc, 64 Samtools 65 and the above-mentioned tools were collected and summarized using Multiqc. 66 The copy-number data was segmented using QDNAseq (version 1.22.0) 67 from Bioconductor. The entire analysis was implemented by Julian de Ruiter using Snakemake (snakemake version 7.2.1; wrapper version 0.60.0) 68 and is freely available on GitHub (https://github.com/jrderuiter/snakemakecnvseq). Unsupervised clustering was performed on the segmented copy-number data. Copy-number instability was scored by calculating the fraction of bins with copy-number values above or below a threshold of respectively 2.5 and 1.5 in the segmented copy-number data. KCsmart R-package (version 2.48.0) 79 and GISTIC2.0 69 were used to determine focal copy number groupwise aberrations. For the oncoprint, genes were selected that were in the CGC-list (version August 2019; https://cancer.sanger.ac.uk/ census).

Whole exome sequencing & panel sequencing
Genomescan prepared the samples according to the procedure for Hybridization Capture using an Agilent SureSelect custom 0.5-2.9Mb kit for the panelseq and the Agilent SureSelectXT Human All Exon V7 kit for the WES samples. The prepared libraries were sequenced with Illumina sequencing technology and prepared according to manufacturer's protocols. The reads were trimmed using Cutadapt 61 to remove any remaining adapter sequences, filtering reads shorter than 60 bp after trimming to ensure good mappability. The trimmed reads were aligned to the human (GRCh38) and mouse (GRCm38) reference genome using BWA. The human alignment was processed for duplicate marking, indel realignment, and base recalibration using Picard Tools and GATK, as recommended by GATK best practices, and filtered to remove contaminating mouse reads using AstraZeneca's tool disambiguate. 63 QC statistics from Fastqc. 64 FastQC: a quality control tool for high throughput sequence data. Available online at: http://www. bioinformatics.babraham.ac.uk/projects/fastqc) and the above-mentioned tools were collected and summarized using Multiqc. 66 Mutect2 was used for SNP calling followed by the LearnReadOrientationModel and FilterMutectCalls commands. SNPs that had a TLOD of <10, a coverage of less than 15, an alternative frequency of less than 0.2, had a different function than exonic or splicing, were classified as synonymous-SNV, and/or had a population frequency of more than 0.01 in one of the following databases downloaded with ANNOVAR (1) (1000g, Kaviar, hrcr1, gnomad_genome, gnomad_exome, esp6500siv2, exac_03, gme) were excluded. For the oncoprint, genes were selected that were in the PanelSeq-list (See supplemental informtion).

Whole genome sequencing
Whole genome sequencing was carried out by Illumina Cambridge Ltd, UK. Whole genome short insert 450-500bp libraries were prepared in accordance with Illumina protocols using either Illumina Truseq PCR free protocol or Truseq Nano kit with 5 PCR cycles, depending on the quantity of starting DNA available. 150bp paired-end sequencing was performed using a Hiseq X to achieve an average sequence coverage of 113X in tumors and 38X in matched normal controls from the same individual. The resultant reads were aligned to the reference human genome (GRCh38) using a Burrows-Wheeler Aligner, bwa mem (version 0.7.17-r1188). Paired tumor-normal bam files were interrogated for somatic substitution mutations using Caveman (Cancer Variants through Expectation Maximisation) (1.13.15) https://github.com/cancerit/CaVEMan. 74 Small somatic insertions and deletions were called using split-read mapping using a modified Pindel version 3.2.0 https://github.com/cancerit/cgpPindel. 75 Annotation to Ensembl build 91 was used to identify mutations falling in coding regions of the genome. 5 0 UTR variants, silent, intronic, and upstream mutations were excluded. For the oncoprint, genes were selected that were in the PanelSeq-list (See Table S3).

RNA-seq
Quality and quantity of the total RNA from was assessed by the 2100 Bioanalyzer using a Nano chip (Agilent, Santa Clara, CA). The percentage of RNA fragments > 200nt fragment distribution values (DV200) were determined using the region analysis method according to the manufacturer's instructions manual (Illumina, technical-note-470-2014-001). Strand-specific libraries were generated using the TruSeq RNA Exome Library Prep Kit (Illumina Inc., San Diego) according to the manufacturer's instructions (Illumina, # 1000000039582v01). Briefly, total RNA was random primed and reverse transcribed using SuperScript II Reverse Transcriptase (Invitrogen, part # 18064-014) with the addition of Actinomycin D. Second strand synthesis was performed using Polymerase I and RNaseH with replacement of dTTP for dUTP. The generated cDNA fragments were 3 0 end adenylated and ligated to Illumina Paired-end sequencing adapters and subsequently amplified by 15 cycles of PCR. The libraries were validated on a 2100 Bioanalyzer using a 7500 chip (Agilent, Santa Clara, CA) followed by a 1-4 plex library pooling containing up to 200ng of each sample.
The pooled libraries were enriched for target regions using the probe Coding Exome Oligos set (CEX, 45MB) according to the manufacturer's instruction (Illumina, # 1000000039582v01). Briefly, cDNA libraries and biotin-labeled capture probes were combined and hybridized using a denaturation step of 95 C for 10 minutes and an incubation step from 94 C to 58 C having a ramp of 18 cycles with 1 minute incubation and 2 C per cycle. The hybridized target regions were captured using streptavidin magnetic beads and subjected to two stringency washes, an elution step and a second round of enrichment followed by a cleanup using AMPure XP beads (Beckman, A63881) and PCR amplification of 10 cycles. The target enriched pools were analyzed on a 2100 Bioanalyzer using a 7500 chip (Agilent, Santa Clara, CA), diluted and subsequently pooled equimolar into a multi-plex sequencing pool. The libraries were sequenced with 65 base single reads on a HiSeq2500 using V4 chemistry (Illumina Inc., San Diego).
The reads were trimmed using Cutadapt 61 to remove any remaining adapter sequences, filtering reads shorter than 20 bp after trimming to ensure good mappability. The trimmed reads were aligned to the GRCh38 reference genome version 101 and GRCm38 reference genome version 100 using STAR. 73 Mouse reads were filtered out by AstraZeneca's tool disambiguate. 63 QC statistics from Fastqc 64 and the above-mentioned tools were collected and summarized using Multiqc. 66 Gene expression counts were generated by featureCounts 70 using gene definitions from Ensembl GRCh38 version 101. Normalized expression values were obtained by correcting for differences in sequencing depth between samples using DESeqs median-of-ratios approach 80 and then log-transforming the normalized counts. The entire analysis was implemented by Julian de Ruiter using Snakemake (snakemake version 7.2.1; wrapper version 0.63.0 68 ; and is freely available on GitHub (https://github.com/jrderuiter/snakemake-rnaseq). DESeq2 71 version 1.30.1 was used for differential expression analysis followed by fgsea 72 version 1.16.0 for pathway analysis and DIDS 26 version 0.9.1 to identify subgroup markers in heterogeneous populations.
To analyze the 90 informative genes contributing to the three DCIS subtypes in the Translational Breast Cancer Research Consortium 15 we clustered the genes and the samples based on their normalized, log-transformed counts using hierarchical clustering algorithm (hclust function from the hclust R package). The distance measure was taken to be 1 -correlation (Pearson for genes and Spearman for samples) and the clustering method is ward.D2. The resulted heatmap with the dendograms was plotted using the ComplexHeatmap R package.
Single-sample gene set enrichment analysis (ssGSEA) has been done using GSVA package with the ssGSEA method and the default parameters. 81 The analysis has been done separately for the DCIS-MIND and primary DCIS samples (which were normalized separately in this case). After this we clustered the genes and the samples based on their normalized, log-transformed counts using hierarchical clustering algorithm (hclust function from the hclust R package). The distance measure was taken to be 1 -correlation (Pearson for genes and Spearman for samples) and the distance between DCIS-MIND and primary DCIS sample from the same patient were all set to zero, forcing them to be next to each other in the dendogram and the heatmap plot. The clustering method is ward.D2 and the resulted heatmap with the dendograms was plotted using the ComplexHeatmap R package.
To calculate the Cox2, P16 and Ki67 classifier RNAseq data was used for COX2 and P16 with an expression level cut-off of 6 or 5 respectively for patient samples and 2 or 4 for PDX samples. For Ki67 IHC data was used, with a cut-off at 10%.
To calculate the 812-gene classifier, we performed the pipeline provided by Strand et al. 15

QUANTIFICATION AND STATISTICAL ANALYSIS
Linear regression analyses have been performed using IBM SPSS Statistics (IBM, Version 27). All quantifications were analyzed using a Student's t test, while survival curves were analyzed using a log-rank (Mantel-Cox) test performed in GraphPad Prism (GraphPad Software Inc, version 9). In all cases, the p values are represented as follows: ***p < 0.001, **p < 0.01, *p < 0.05, and not statistically significant when p > 0.05.

OPEN ACCESS
Article