Deterministic evolution and stringent selection during preneoplasia

The earliest events during human tumour initiation, although poorly characterized, may hold clues to malignancy detection and prevention 1 . Here we model occult preneoplasia by biallelic inactivation of TP53, a common early event in gastric cancer, in human gastric organoids. Causal relationships between this initiating genetic lesion and resulting phenotypes were established using experimental evolution in multiple clonally derived cultures over 2 years. TP53 loss elicited progressive aneuploidy, including copy number alterations and structural variants prevalent in gastric cancers, with evident preferred orders. Longitudinal single-cell sequencing of TP53-deficient gastric organoids similarly indicates progression towards malignant transcriptional programmes. Moreover, high-throughput lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programmes repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a marked degree of phenotypic convergence in premalignant epithelial organoids. These data imply predictability in the earliest stages of tumorigenesis and show evolutionary constraints and barriers to malignant transformation, with implications for earlier detection and interception of aggressive, genome-instable tumours.
In rapidly adapting asexual populations, including microorganisms and tumours, multiple mutant lineages often compete for dominance 2 . These complex dynamics determine the outcomes of evolutionary adaptation but are difficult to observe in vivo. Experimental evolution has yielded fundamental insights into clonal dynamics in microorganisms, enabling characterization of mutant clones and their fitness benefits 3,4 . The same forces of mutation and selection fuel clonal expansions in somatic cells during ageing, contributing to malignancy, but their dynamics are poorly understood [5][6][7] .
Cancers arise from a mutated cell that undergoes premalignant clonal expansion while accruing additional mutations. These mutations can spread in phenotypically normal tissues before apparent morphological changes, with aneuploidy and driver mutations preceding cancer diagnosis by years 5,8,9 . Identification of the causes of, and barriers to, malignant transformation requires characterization of the molecular phenotypes that precede this event in a tissue-specific manner. However, repeated sampling of healthy or preneoplastic tissue is impractical and thus evolutionary dynamics have been inferred from sequencing data 5,6,10 . For example, we inferred stringent subclonal selection in premalignant Barrett's oesophagus, whereas matched adenocarcinomas largely exhibited neutral evolution 6 , presumably due to rapid growth after transformation and diminishing returns epistasis 11 . Despite these insights, the order of somatic alterations and patterns of clonal expansion that precede transformation are obscured in established cancers 5,12 , necessitating new approaches to empirically measure premalignant evolution.
Gastric cancer (GC), the fourth-leading cause of cancer mortality worldwide, lacks routine screening despite its long lead times, contributing to late diagnoses, poor prognosis and limited treatment options 13,14 . Therefore, it is crucial to identify the molecular determinants of GC and its non-obligate precursor, intestinal metaplasia, which is poorly characterized compared with precursor lesions in the adjacent oesophagus (Barrett's oesophagus) [15][16][17] . Although the utility of forward-genetic and GC organoids as preclinical models has been established [18][19][20][21] , in the former, combinatorial hits were engineered to bypass nascent progression and accelerate transformation 19,21 .
Here we model tumorigenesis from the 'bottom up' using CRISPR-Cas9-engineered human gastric organoids (HGOs) to identify causal relationships between initiating genetic insults and resultant genotypes and phenotypes. Because TP53 inactivation is a common early event preceding numerical and structural chromosomal abnormalities (aneuploidy) in chromosomally unstable (CIN) GC 19,22,23 , we use non-malignant HGOs as a tabula rasa to study preneoplasia induced by TP53 deficiency over a 2-year time span. HGOs are ideal for this task because they recapitulate the cellular attributes of in vivo models, including three-dimensional tissue structure, multilineage differentiation and disease pathology 21 .
Whereas TP53 is altered in over 70% of CIN GCs 22,23 , its ability to elicit aneuploidy, a hallmark of most solid cancers, has been controversial and appears tissue dependent [24][25][26][27] . Moreover, the extent to which specific copy number alterations (CNAs) are selectively advantageous, and their tumorigenic impact, are largely unknown 28,29 . We chart genotype-to-phenotype maps of gastric preneoplasia following TP53 inactivation in multiple HGO cultures and demonstrate that these models recapitulate genomic hallmarks of gastro-oesophageal tumorigenesis, including the multi-hit, temporal and repeated acquisition of CNAs and structural variants (SVs), accompanied by progression towards malignant transcriptional states. Prospective lineage tracing with linked single-cell expression profiles delineates early clonal dynamics, showing extensive clonal interference, stringent selection and rapid adaptation, underpinned by temporal genomic contingencies and phenotypic convergence. Our findings highlight the power of experimental evolution in human organoids to investigate occult preneoplastic processes and the repeatability of somatic evolution.

TP53 -/- induces CNAs in defined orders
To model tumour initiation in CIN GC, we established HGOs from non-malignant tissue from three human donors undergoing gastrectomy and introduced biallelic TP53 frameshift mutations via CRISPR-Cas9, resulting in an inactive gene product (Fig. 1a, Extended Data Fig. 1, Supplementary Tables 1 and 2 and Methods). From each donor (D1-3), three independent, clonally derived TP53 -/- cultures (C1-3) were established, yielding nine cultures for long-term propagation, five of which were each split into three replicates (R1-3) for cellular barcoding studies (n = 24 cultures). Another 'hit' in the APC tumour suppressor, a Wnt pathway negative regulator altered in 20% of CIN GC (Extended Data Fig. 2a), was concurrently engineered in C2 and C3 from D3 (referred to as D3C2 and D3C3, respectively; Supplementary Figs. 1 and 3) to examine the evolutionary consequence of dual tumour suppressor inactivation. The clonal status of CRISPR-edited sites was verified via Sanger sequencing and confirmed by whole-genome sequencing (WGS) at multiple time points. Throughout, we refer to time as days after TP53 deficiency was engineered and we group TP53 -/- and TP53 -/-/APC -/- cultures unless otherwise specified.
Across all cultures the fraction of genome altered (FGA), a measure of aneuploidy, increased over time at varying rates and plateaued around day 600 (Fig. 1d and Methods). For example, D1C1, which accrued early arm-level alterations, exhibited over 20% FGA by day 260 compared with a median FGA of about 5% across all cultures at similar time points. TP53 -/- and TP53 -/-/APC -/- cultures exhibited comparable FGA at final time points (average 11.3 and 10.7%, respectively), consistent with the expectation that APC loss does not fuel gastric cell aneuploidy. In several cultures FGA decreased over an interval due to clonal extinction (D3C3 day 190 versus day 442; D2C2 day 428 versus day 609) (Fig. 1d and Supplementary Figs. 5 and 6). As expected, FGA was lower in TP53 -/- HGOs than in CIN GCs (median FGA 34.5% in TCGA, according to cBioPortal).
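As an illustration of how a quantity such as FGA can be computed, the following minimal Python sketch treats FGA as the length-weighted fraction of copy number segments whose total copy number differs from a diploid baseline. The segment layout, the function name and the baseline of two copies are assumptions made for this sketch; the exact procedure used for Fig. 1d is described in the Methods and may differ.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start, end, total_copy_number) in base pairs.
Segment = Tuple[int, int, int]


def fraction_genome_altered(segments: List[Segment], baseline: int = 2) -> float:
    """Length-weighted fraction of segments whose copy number differs from baseline."""
    total = sum(end - start for start, end, _ in segments)
    if total <= 0:
        raise ValueError("segments must cover a positive total length")
    altered = sum(end - start for start, end, cn in segments if cn != baseline)
    return altered / total


# Toy profile (illustrative values only): one loss and one gain on a 100 Mb genome.
toy_profile = [
    (0, 70_000_000, 2),            # copy-neutral
    (70_000_000, 85_000_000, 1),   # single-copy loss
    (85_000_000, 100_000_000, 3),  # single-copy gain
]
toy_fga = fraction_genome_altered(toy_profile)  # 30 Mb altered out of 100 Mb
```

Under this definition, clonal extinction of an aneuploid subclone lowers FGA simply because the altered segments are no longer called in the bulk profile.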
Investigation of the temporal onset of arm-level and focal CNAs in TP53 -/- HGOs showed preferred orders (Fig. 1e and Supplementary Table 3). Specifically, loss of chr9p and chr3p repeatedly occurred (across donors and cultures) within 200 days but seldom later, suggesting a period during which these alterations were particularly advantageous. Chr9p deletion spans the CDKN2A tumour suppressor commonly altered in the CIN subgroup of gastric (roughly 41%; Extended Data Fig. 2a) and oesophageal (roughly 74%) adenocarcinomas, and co-occurs with TP53 alterations 19,22 . Indeed, CDKN2A loss signals the initiation of Barrett's oesophagus progression to dysplasia and oesophageal adenocarcinoma 30 and GC premalignancy 19 Table 3). A genome caretaker, FHIT, is lost early during tumour progression leading to deoxythymidine triphosphate depletion, replication stress and DNA breaks 31 . Notably, 12% of CIN GCs harbour FHIT alterations (Extended Data Fig. 2a). Although CDKN2A and FHIT deletions are insufficient for malignant transformation 15,32 , their recurrent early loss during in vitro evolution and in GCs implies a role in tumour initiation. Additional GC-associated CNAs include loss of chr18q and gain of chr20q, which consistently occurred late (around 600 days). Such late alterations may reflect dynamic selective pressures from increased fitness or new evolutionary paths enabled by earlier alterations 4 . These data demonstrate that TP53 loss facilitates aneuploidy in gastric cells and accrual of tissue-specific CNAs in a defined order.

Selection and clonal interference
We next sequenced (WGS, mean coverage 26×) five TP53 -/- cultures at multiple time points (Fig. 2a, Extended Data Fig. 3a and Supplementary Table 4). This confirmed biallelic TP53 and APC inactivation at CRISPR target sites (Supplementary Fig. 2) and showed an increase in the weighted genome instability index (wGII), the fraction of genome with loss of heterozygosity (LOH), as well as focal deletions and amplifications during prolonged culture (  Table 3). WT cultures exhibited simple focal FHIT deletions at late passages, probably due to clonal expansion of an initially rare event, and suggestive of somatic mosaicism (Supplementary Figs. 11a,b and 12a,b). Single-base substitutions 1, 5 and 40, which are ubiquitous and implicated in ageing and cancer, were the most prevalent mutational signatures 33 . However, by the late time point D3C2 developed single-base substitution 17a/b (Extended Data Fig. 3b and Supplementary Table 4), which is prevalent in gastro-oesophageal carcinomas and progressive Barrett's oesophagus lesions 32 .
All classes of alterations accumulated in evolved TP53 -/- HGOs but SVs were particularly notable, with non-clustered and simple clustered rearrangements dominating at early time points followed by complex clusters (ten or more rearrangements) involving deletions, inversions and translocations over time (Fig. 2a and Extended Data Fig. 3c,d). This is exemplified by D3C1, which accrued multiple interchromosomal rearrangements (Fig. 2b). Although such complex SVs are seldom reported in normal tissues, they are prevalent in progressive Barrett's oesophagus 16 . SV burden increased markedly in TP53 -/- HGOs between early and late time points (median change of 148%), exceeding by over threefold the change in SV burden (45%) between endoscopies in patients with Barrett's oesophagus harbouring biallelic TP53 inactivation and who subsequently progressed to oesophageal adenocarcinoma (average 2.2 years, range 0.65-6.16 years) 16 . Non-progressors (lacking TP53 biallelic inactivation) had a low and stable SV burden between endoscopies (Extended Data Fig. 3e).
The FHIT locus frequently harboured complex SVs, including deletion chasms at fragile sites (rigma), as reported in GC and Barrett's oesophagus 34 (Fig. 2c, Extended Data Fig. 4a and Methods). We traced the genesis of rearrangements at the FHIT locus in D3C1, starting from a small deletion at day 115 and culminating in rigma by day 264. The subclone harbouring this rearrangement was lost (Fig. 2d,e, yellow subclone) but a separate subclone (blue) with a distinct FHIT rigma emerged and persisted, suggesting convergent evolution. Thus, rearrangements with multiple junctions evolve over several generations, not as a single event as previously proposed 35 . Similar events evolved in other cultures, including a chr3 and chr9 translocation (Extended Data Fig. 4b-f and Supplementary Fig. 14a,b). Despite these rearrangements, overall genomic content remained diploid as confirmed by flow cytometry (Supplementary Fig. 14c and Supplementary Table 4).
Clonal competition and extinction were investigated by determination of subclonal populations from CNA profiles (via bulk WGS) across five time points for D3C1 and D2C2. By day 115, D3C1 had acquired numerous deletions (9p, FHIT) and several SVs, including a persistent chr11-chr14 translocation. Over 600 days, multiple CNA-defined subclones increased in frequency before extinction (Fig. 2d,e and Supplementary Table 5). For example, a chr4-, 9q+ subclone arose early but disappeared by day 264, outcompeted by a chr19p- subclone that later acquired chr8p-, 9q.2+ and 16p- alterations and remained dominant until day 404. This subclone was ultimately outcompeted by one with chr18q loss that acquired gain of chr20q, both recurrent late events in multiple cultures. Thus, some clones fix and achieve dominance whereas others reach substantial frequencies before going extinct, presumably due to clonal interference. Distinct CNA subclones coexisted for extended durations (around 140 days), suggesting comparable fitness (for example, chr8p-, 9q.2+, 16p- and 18q- subclones) and intermittent periods of clonal competition and stasis as seen in other cultures (Extended Data Fig. 4c,d). These data demonstrate stringent selection and pervasive clonal interference in premalignant epithelial populations.

Transcriptional changes following TP53 -/-
Phenotypic and transcriptional changes during in vitro evolution were evaluated based on growth dynamics of TP53 -/- HGO cultures and single-cell RNA sequencing (scRNA-seq) at early, mid and late time points (Fig. 3a, Extended Data Figs. 1 and 3a, Supplementary Table 2 and Methods). We investigated changes in cell proliferation by fitting a Loess regression model to cell numbers at each passage, using growth derivative and fold change as surrogates for fitness. Higher growth derivatives were observed at late and mid versus early time points (Fig. 3b and Supplementary Fig. 15a); the use of raw cell numbers yielded similar results (P = 0.003, two-way repeated-measures analysis  Table 6).
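To make the idea of a growth derivative concrete, the sketch below estimates the local slope of a growth curve with a degree-one local regression using tricube weights, which is the basic building block of Loess. It is a simplified illustration only: the function name, the `frac` smoothing parameter and the input values are assumptions, no robustness iterations are performed, and the authors' actual fitting procedure is given in the Methods.

```python
import math
from typing import Sequence


def loess_slope(xs: Sequence[float], ys: Sequence[float], x0: float,
                frac: float = 0.75) -> float:
    """Slope at x0 of a tricube-weighted local linear fit (degree-1 Loess)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    k = min(n, max(3, math.ceil(frac * n)))        # neighbourhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1]     # bandwidth: k-th nearest distance
    if h == 0:
        raise ValueError("bandwidth is zero; increase frac")
    # Tricube weights; points at or beyond the bandwidth receive zero weight.
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if sxx == 0:
        raise ValueError("too few distinct points in the neighbourhood")
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return sxy / sxx


# Hypothetical passage days and log2 cumulative cell numbers (illustrative only).
days = [0.0, 7.0, 14.0, 21.0, 28.0, 35.0]
log2_cells = [10.0, 11.0, 12.1, 13.4, 14.9, 16.6]
early_rate = loess_slope(days, log2_cells, 7.0)
late_rate = loess_slope(days, log2_cells, 28.0)
```

Comparing the estimated slope at early and late passages gives a simple surrogate for a change in proliferative capacity, in the spirit of the comparison described above.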
The mucosal-like phenotype in WT cultures, defined by mucin and TFF gene expression, was lost following TP53 -/- in D1 and D3. Additionally, in D1 intestinal goblet cell-specific markers, including TFF3, WFDC2 and MUC5B, were upregulated at the late time point, as commonly seen in intestinal metaplasia 36 . GC-associated genes, including claudins (CLDN3, CLDN4, CLDN7) and the carcinoembryonic antigen (CEA) family (CEACAM5, CEACAM6), increased in expression over time in D1 and D3. The inverse was observed in D2 cultures, plausibly due to an inflamed biopsy and the predominance of enterocytes in WT culture 14 (Extended Data Fig. 5a,c). The absence of MUC5AC following TP53 -/- and the increase in CEACAM6 expression were verified by immunofluorescence staining in D3C2 (Supplementary Fig. 15c).
We investigated the overlap in transcriptional features across TP53 -/- HGOs by intersection of significantly differentially expressed genes (DEGs) from early to late time points across the six cultures with scRNA-seq data. In total, 13 consistently upregulated and 40 downregulated genes were identified (Bonferroni corrected P < 0.05, Wilcoxon rank-sum test; Fig. 3e Table 6). Several pathways were enriched across multiple cultures and donors, including upregulation of tumour necrosis factor (TNF) signalling via nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB), as reported in CIN tumours 39 and comparisons of GC versus normal tissue 40 (four of six cultures), apoptosis (five of six cultures) and hypoxia (five of six cultures). Downregulated pathways included MYC, E2F targets and G2M checkpoints, although these were more variable and probably reflect survival programmes. Thus, despite heterogeneous single-gene trajectories, pathways implicated in malignancy were shared across cultures and donors.
We next projected batch-corrected scRNA-seq data from early, mid and late time points individually onto the reference embedding to identify gastric cell types that were most similar (Fig. 4c, Extended Data Fig. 5e-h and Methods). Because the reference atlas lacked preneoplastic populations, cells were projected onto either normal or tumour cell states, with the majority of HGO cells mapping onto the latter. Shifts in cell states over time were evident for all cultures, some of which have been implicated in the normal-to-gastritis transition that can lead to intestinal metaplasia and ultimately malignancy (shown schematically in Fig. 4d). Changes in cell type frequencies were quantified for each HGO culture by identification of the 25 nearest neighbours (NNs) in the reference population (Fig. 4e). An increase in mucosal-like malignant cells was observed in three of seven cultures at the late time point, with 68.7, 80.1 and 37.3% of NNs being mucosal-like malignant cells for D3C2, D3C3 and D1C1, respectively. By contrast, for D2, mucosal-like malignant cells decreased whereas non-mucosal-like malignant cells increased from WT to the late time point (D2C2, 45.6%; D2C3, 64.4%, NNs) (Fig. 4e), explaining transcriptional differences relative to D1 and D3 (Fig. 3). Notably, approximately 30% of cells in D2WT projected near enterocytes, potentially contributing to gastritis-like features and underlining the transcriptional similarity between enterocytes and malignant cells 42 . WT cultures from D1 and D3 exhibited predominantly mucosal phenotypes. The decrease in mucosal gene expression suggests that the evolved TP53-deficient HGOs were en route towards intestinal metaplasia and malignancy, albeit at different rates, corroborating the supervised analyses based on specific marker genes. Although our HGO cultures harbour hallmarks of CIN GC, they do not exhibit evidence of histologic transformation (Supplementary Fig. 15d).
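The nearest-neighbour quantification described above can be sketched as follows. This Python example assumes that query and reference cells already share a low-dimensional embedding, that similarity is measured by Euclidean distance and that neighbour labels are pooled across all query cells of a culture; these choices, along with the function name and toy coordinates, are assumptions for illustration rather than the procedure documented in the Methods.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def nn_label_fractions(query: List[Sequence[float]],
                       ref_points: List[Sequence[float]],
                       ref_labels: List[str],
                       k: int = 25) -> Dict[str, float]:
    """Pool the labels of each query cell's k nearest reference cells."""
    if len(ref_points) != len(ref_labels):
        raise ValueError("one label is required per reference cell")
    if not 0 < k <= len(ref_points):
        raise ValueError("k must be between 1 and the number of reference cells")
    counts: Counter = Counter()
    for q in query:
        # Brute-force search: rank reference cells by Euclidean distance to q.
        order = sorted(range(len(ref_points)),
                       key=lambda i: math.dist(q, ref_points[i]))
        counts.update(ref_labels[i] for i in order[:k])
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


# Toy two-dimensional embedding with hypothetical labels and coordinates.
reference = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = ["normal", "normal", "malignant", "malignant"]
culture_cells = [(0.0, 0.4), (10.0, 10.4)]
fractions = nn_label_fractions(culture_cells, reference, labels, k=2)
```

For atlases with many cells, a spatial index or approximate nearest-neighbour search would replace the brute-force ranking used here.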

Deterministic growth of rare subclones
We next leveraged our HGO models to characterize preneoplastic subclonal dynamics at cellular resolution via prospective lineage tracing with high-complexity cellular barcodes. To jointly recover lineage and transcriptional states we developed expressed cellular barcodes (ECBs), which uniquely label each cell (Supplementary Fig. 17a,b and Methods). (Supplementary Fig. 17c,d). Each ECB parental line was split into three replicates to evaluate the reproducibility of clonal dynamics, in which outgrowth of the same subclone is assumed to reflect an intrinsic fitness advantage and divergent subclone dominance suggests acquired fitness differences (Fig. 5a). Longitudinal sWGS of these long-term ECB cultures demonstrated marked reproducibility at the genomic level, with recurrent CNAs shared across replicate cultures (Fig. 5b, Extended Data Fig. 6 and Supplementary Fig. 18). For example, in D2C2 new CNAs emerged around day 258 (loss of chr4q and chr13, gain of chr20q) across all three replicates. In D2C1, R2 (D2C1R2) gain of chr8q was detected by day 258 and persisted (Fig. 5b) but was mutually exclusive with gains of chr3q in R1 and R3. By contrast, CNAs in different cultures from the same donor were more variable (Fig. 1c).
Fig. 5 (caption fragment): green asterisks denote CNAs unique to one replicate. Only chromosomes harbouring newly arisen CNAs (not present in the parental population) are numbered, for simplicity. c, Muller plots depicting ECB frequencies (assessed by barcode sequencing) over time, where each colour represents a distinct subclone in each replicate. Note that, for D3C2, R2 (D3C2R2), the barcode was lost around day 273. d, Dot-plots indicating ECB subclone frequency (indicated by size) and estimated growth curve derivative per subclone (indicated by colour). Image of stomach in a is from Servier Medical Art, CC BY 3.0.
Fig. 6 | Genotype-to-phenotype mapping defines molecular determinants of winning subclones. a, Inferred CNA heatmap from scRNA-seq data for D2C2R2 at day 173, where each row represents a cell. Colour bar at the left indicates the ECB to which each cell maps. Numbered barcodes were selected for further investigation. Inset shows a subpopulation within ECB-0 with additional CNAs, termed 0a, and the ECB-0 parent subclone is termed 0b. b, CNA profile for the D2C2 parental population (also shown in Fig. 5b).
Through DNA sequencing of ECBs at regular intervals, we estimated the relative abundances of subclones over time and constructed Muller plots to visualize clonal dynamics (Fig. 5c). Colours were assigned to barcodes based on subclone frequencies across replicates within a culture, with the highest-frequency subclone coloured red. For example, the red band in D2C1R1 represents the same barcoded subclone as in D2C1R2 and D2C1R3. For each culture (except D2C1R2) the same (red) subclone became dominant across all replicates (Fig. 5c), consistent with an intrinsic fitness advantage and deterministic outgrowth (Fig. 5a). For D2C1 replicates R1 and R3 the red subclone became dominant in line with their shared CNA profiles whereas in R2 the green subclone, which acquired a chr8q gain (spanning the MYC oncogene), overtook the population. Intriguingly, brown and green clones expanded concomitantly before going extinct, suggesting their mutual dependence.
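The step from barcode sequencing to subclone frequencies can be illustrated with a short Python sketch. It assumes that each time point is summarized as a table of read counts per barcode and that relative abundance is simply a barcode's share of reads at that time point; the counts, function names and the absence of any error correction or normalization are assumptions for illustration, not the authors' pipeline.

```python
from typing import Dict, List


def barcode_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts per barcode into relative abundances for one time point."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no barcode reads at this time point")
    return {barcode: n / total for barcode, n in counts.items()}


def dominant_barcode(timepoints: List[Dict[str, int]]) -> str:
    """Return the barcode with the highest relative abundance at the final time point."""
    final = barcode_frequencies(timepoints[-1])
    return max(final, key=final.get)


# Hypothetical read counts for one replicate at three successive time points.
replicate = [
    {"ECB-0": 10, "ECB-1": 50, "ECB-2": 40},
    {"ECB-0": 60, "ECB-1": 30, "ECB-2": 10},
    {"ECB-0": 90, "ECB-1": 10, "ECB-2": 0},
]
trajectory = [barcode_frequencies(tp) for tp in replicate]
winner = dominant_barcode(replicate)
```

Stacking the per-time-point frequencies in `trajectory` gives the kind of input from which a Muller-style plot can be drawn, and comparing `winner` across replicates is one simple way to ask whether the same subclone repeatedly attains dominance.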

Of note, the correlation of subclone frequencies over time across replicates was generally high, reflecting similar subclonal dynamics within a culture and similar patterns across cultures (Extended Data Fig. 6). Especially striking was the re-emergence of the blue and purple subclones in D2C2R2 and D2C2R3 at around 200 days (Fig. 5c). By construction of subclone-specific growth curves and estimation of their derivatives, we found that 'winning' subclones had high initial fitness and increased in proliferative capacity over time (Fig. 5d and Methods). Thus lineage tracing shows reproducible dynamics across replicate cultures, with adaptive lineages sweeping rapidly to fixation and dominant clones comprising 75% (median across cultures) of the population by day 144 post ECB transduction (Supplementary Table 7). These patterns are reminiscent of rapid adaptation in isogenic microbial populations attributable to standing variation in the initial population 43,44 .

Molecular features of winning subclones
To investigate the targets of selection and how they change over time and across populations, we leveraged ECBs jointly capturing lineage and transcriptional states in individual cells. Specifically, we sought to characterize the molecular features of winning subclones that dominated the population after prolonged evolution by performing scRNA-seq for several donors and replicates at selected time points when the population was heterogeneous. For D2C2R2, which was sampled at day 173, 1,284 cells passed quality control and we identified 20 subclones with at least ten cells, all of which were among the top 38 most frequent ECBs based on barcode sequencing. Arm-level CNAs were inferred from the scRNA-seq data using inferCNV (Methods), showing numerous subclone-specific CNAs (Figs. 5b and 6a and Methods). Reassuringly, aggregate CNA landscapes were concordant with WGS data and scDNA-seq showed profiles and frequencies similar to subclone-specific CNAs inferred from scRNA-seq (Supplementary Fig. 19 and Methods). A detailed examination of this replicate (D2C2R2) showed complex evolutionary dynamics amongst coexisting subclones. Most cells comprising the winning subclone (ECB-0, red) acquired chromosome 3p-, 3q+, 9p- and 9q+ alterations early because these events were clonal or nearly clonal in the parent population at day 143 (Fig. 6a,b). A subpopulation within ECB-0 (termed 0a) additionally acquired chr4q- and chr20q+ and ultimately became dominant, with these alterations present in roughly 90% of the population at day 315 (Fig. 5b and Supplementary Fig. 20). Similar dynamics were seen across all replicate cultures in which winning subclones contained a nested CNA-defined subclone (Extended Data Figs. 7-9). These patterns may reflect a 'rich-get-richer' effect whereby fitness advantages acquired early drive clonal expansions, thereby increasing the likelihood of additional alterations that fuel growth 45 .
Because successful subclones consistently acquired additional genetic diversity, we sought to investigate the functional relevance of these events, focusing on a subset of subclones with divergent CNAs. As an example, D2C2R2 consisted of at least five different CNA clones at the time of barcode insertion (Fig. 6c and Supplementary Table 8).
Multiple instances of convergent evolution were evident within this culture, in which subclones acquired the same CNA independently, implying stringent selection. For example, ECB-0a, ECB-11 and ECB-56 each lost variable-sized regions of chr4q. ECB-9 lacked common early alterations including chr3p-, but subsequently acquired chr9p/q alterations. Despite the incomplete set of CNAs, the growth of ECB-9 closely trailed that of the winning subclone (ECB-0; Fig. 6d). Convergent CNA evolution was also evident across cultures, in which chr15 and chr20 amplifications were present in the majority of cells in D3C2R1 at day 441, and these events plus chr11 amplification were present in R2 and R3 subclones by day 259.
Although highly fit subclones differed in genomic landscapes, we reasoned that they would share transcriptional programmes. Indeed, in D2C2R2, the winning subclone 0a (but not its parent, 0b) exhibited high expression of several GC genes including CEACAM5, CEACAM6, CLDN3, CLDN4 and CLDN7 (Fig. 6e). These genes were also highly expressed in winning subclones of all other replicate cultures, except for ECB-1a (green) in D2C1R2, which acquired 8q gain (Extended Data Figs. 7e, 8d and 9d). The winning subclone, 0a (versus 0b), also upregulated GC genes, including RNF186, which regulates intestinal homeostasis and is associated with ulcerative colitis 46 ; MUC13, which encodes a transmembrane mucin glycoprotein 47 ; CCL20, a chemokine and candidate biomarker 48 ; and LGALS1 (galectin-1), which promotes epithelial-mesenchymal transition, invasion and vascular mimicry 49 (Fig. 6f and Supplementary Table 9). Gene set enrichment analysis (GSEA), comparing the winning subclone in D2C2 with all other cells, showed upregulation of several pathways including TNF signalling via NF-κB, as well as hypoxia, apoptosis and p53 (Fig. 6g, Extended Data Fig. 10a and Supplementary Table 9). These same pathways were upregulated in three barcoded replicates for D2C2 at the final time point (day 315), as well as in the non-barcoded D2C2 culture at mid and late time points (relative to early) (Fig. 6f, right) and in other donors/cultures (D1C1, D1C2, D1C3, D2C3, D3C2) (Fig. 3f). Moreover, these pathways were upregulated in winning subclones from independent barcoded donors/cultures (Extended Data Figs. 7-9), including the divergent subclone (ECB-1a, green) in D2C1R2 (Fig. 5c and Extended Data Fig. 8f), emphasizing their reproducibility. More generally, strong concordance between winning subclone and non-barcoded late subclones was observed across the top ten altered gene sets irrespective of mycoplasma levels, antibiotic treatment and other sources of biological and technical variation (Supplementary Fig. 21 and Methods). Similarly, winning subclones clustered with late cultures, which exhibited malignant transcriptional states based on unsupervised LSI projection (D1C1, D2C2, D2C3 and D3C2) (Fig. 6h and Extended Data Fig. 10a,b). Notably, there was a significant difference in the activation of p53, apoptosis and TNF signalling via NF-κB pathways (Fisher's exact test, Bonferroni corrected P < 0.05) between late (relative to early) and winning subclones compared with all other subclones (Extended Data Fig. 10c). These data highlight convergent phenotypic evolution in which the early activation of specific pathways is selectively advantageous, canalizing cells towards malignancy.
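For readers unfamiliar with the test named above, the following Python sketch computes a two-sided Fisher's exact P value for a 2 × 2 table and applies a Bonferroni correction. The table layout (rows as winning or late populations versus all other subclones, columns as pathway activated or not), the counts and the number of tests are hypothetical and chosen only for illustration; they are not taken from Extended Data Fig. 10c.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: pathway activated / not activated in two groups.
p_raw = fisher_exact_two_sided(9, 1, 2, 8)
p_adj = bonferroni(p_raw, n_tests=3)  # e.g. three pathways tested
```

The small relative tolerance in the summation guards against floating-point ties between equally probable tables.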

Discussion
Through multiyear experimental evolution of TP53-deficient HGO cultures, we model preneoplastic evolution and genotype-phenotype relationships following this common initiating insult. Remarkably, TP53 deficiency was sufficient to recapitulate multiple hallmarks of CIN GC including aneuploidy, specific CNAs, SVs and transcriptional programmes, emphasizing the importance of cell-intrinsic processes during premalignant evolution. Although aneuploidy propagates heterogeneous evolution, our data show preferred orders in the acquisition of CNAs, with early loss of chr3p and 9p frequently followed by biallelic inactivation of CDKN2A and/or FHIT and relatively late gain of 20q. Such preferred mutational orders have been described during tumorigenesis, most notably in the colon, but the resolution of inferences from cross-sectional data or established tumours is inherently limited 12,50 . Evolutionary phases in which deletions preceded whole-genome doubling and subsequent amplifications were recently reported in a murine model of KrasG12D, Trp53-deficient pancreatic cancer, but neither gene nor chromosome level orderings were seen in this system 51 .
Our TP53 -/- HGOs exhibited transcriptional and genomic hallmarks of premalignant gastro-oesophageal lesions despite remaining histologically normal. This is consistent with the requirement for genomic perturbation for even the earliest stages of gastro-oesophageal carcinogenesis and the accrual of complex rearrangements years before cancer diagnosis 15,16,23 .
TP53 -/- HGOs appear to be on a trajectory similar to TP53-deficient Barrett's oesophagus, for which the presumed cell of origin is gastric cardia 17 , and proposed biomarkers of progression to oesophageal cancer include CNA acquisition and SV burden 16,52 . These in vitro models thus recapitulate occult preneoplasia and mirror the latency of human tumorigenesis, with additional time or in vivo selective pressures evidently required for malignant transformation and further features of invasive disease such as whole-genome doubling or ERBB2 amplification 22 .
The finding that TP53 deficiency elicits a temporally defined order of genomic aberrations raises the possibility that these features may similarly predict progression to CIN GC. Future evaluation of this hypothesis will require annotated intestinal metaplasia tissue collection with long-term follow-up. Although TP53 deficiency elicits tissue-specific alterations that may aid in the detection of high-risk lesions, this constrained evolutionary state is unlikely to persist indefinitely given ensuing genome instability, emphasizing the need for earlier detection.
By joint measurement of lineage, CNAs and transcriptional states in individual cells, we investigated the molecular basis of clonal expansions and fitness. This showed stringent selection and reproducible subclonal dynamics across replicate cultures in which the same, initially rare, subclone fixed in the population. Pervasive clonal interference was evident amongst subclones, accompanied by intermittent periods of relative stasis, suggesting that an optimal karyotype has yet to be achieved, as reported in colorectal adenoma 53 . Furthermore, we observed a marked degree of phenotypic convergence on common dominant pathways across cultures and donors, irrespective of mycoplasma infection and antibiotic treatment. This evolutionary reproducibility is particularly notable given these and other potential sources of technical and biological variation, and implies that any such effects are evidently modest relative to the overwhelmingly dominant effect of TP53 inactivation.
These first-in-kind measurements address open questions concerning selection and determinism in clonal evolution and are extendable to other tissues. In the vast space of initiating insults, recurrent tissue-specific alterations can be prioritized to identify selectively advantageous alterations, temporal order constraints and convergent phenotypes. Such constraints, due to epistasis, can reveal barriers to malignant transformation and potential therapeutic targets. We anticipate that our results will advance empirical and theoretical investigations of mutation, selection and genome instability in human cells, much as the long-term evolution experiments pioneered by Lenski and colleagues decades ago continue to yield fundamental insights into microbial adaptation 3,4 .

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-023-06102-8.