Free-living human cells reconfigure their chromosomes in the evolution back to uni-cellularity

Cells of multi-cellular organisms evolve toward uni-cellularity in the form of cancer and, if humans intervene, continue to evolve in cell culture. During this process, gene dosage relationships may evolve in novel ways to cope with the new environment and may regress back to the ancestral uni-cellular state. In this context, the evolution of sex chromosomes vis-a-vis autosomes is of particular interest. Here, we report the chromosomal evolution in ~600 cancer cell lines. Many of them jettisoned either the Y or the inactive X; thus, free-living male and female cells converge by becoming ‘de-sexualized’. Surprisingly, the active X often doubled, accompanied by the addition of one haploid complement of autosomes, leading to an X:A ratio of 2:3 from the extant ratio of 1:2. Theoretical modeling of the frequency distribution of X:A karyotypes suggests that the 2:3 ratio confers a higher fitness and may reflect aspects of sex chromosome evolution.

to evolve. Much like the diversity unleashed by domestication, cultured cell lines, which can be considered "domesticated", may be informative about the evolutionary potentials at the cellular level.

In this quasi-unicellular state, gene dosage has been observed to change extensively, as polyploidy, aneuploidy (full or partial) and various copy number variations (CNVs) are common in cancer cell lines [6]. Since these cell lines are derived from somatic tissues of men or women (referred to as male and female cells, for simplicity), they should differ in their sex chromosomes in relation to the autosomes (A's). Nevertheless, the possibility of separate evolutionary paths has not been raised before. Somatic cells have an inactive X chromosome in females and a Y chromosome in males [7]. Since cell lines presumably do not need sexual characters, we ask how the X:A relationship might have evolved in both male and female cells. More generally, we ask whether the evolution of this relationship may shed light on the emergence of mammalian sex chromosomes and their subsequent evolution.

In this study, we analyze 620 cancer cell lines that have been genotyped using SNP arrays [8]. Among them, 279 are derived from female tissues and 341 from male tissues. We observed the elimination of the Y and the inactive X chromosome, followed by evolution toward a new equilibrium with 2 active X chromosomes and 3 sets of autosomes (2X:3A). We discuss the implications of these findings for the evolution of sex chromosomes, the transition between uni- and multi-cellularity, and cancer biology.

Convergent sex chromosome evolution between sexes

A most common form of genomic change in cell lines is the loss of heterozygosity (LOH), in which one of the two homologous chromosomes is eliminated [6]. We therefore examine single nucleotide polymorphisms (SNPs) across the 620 cell lines for occurrences of LOH on each autosome and the X chromosome. Male and female cell lines are analyzed separately.

Figure 1A shows the LOH frequency for each autosome (black dots); the red dot represents the sex chromosomes (X in female and Y in male). For autosomes, the percentages of LOH are remarkably similar between sexes, with a correlation coefficient of 0.94 among 620 cell lines. There is a slight tendency for the smaller autosomes to have a higher LOH rate (R = ~-0.4, p = ~0.046, Figure S1). The median percentage of LOH is about 13% for autosomes. However, the losses of X (36% in females) and Y (40% in males) stand out. Given its rank as the 7th largest chromosome, the X is not expected to be lost in more than 15% of cell lines. Since X expression is not lost, we infer that it is the inactive X (or Xi) that is eliminated.

Female lines lose the inactive X (Xi) and male lines lose the Y chromosome at a higher rate than other chromosomes. The two sexes may thus be expected to converge toward having a single sex chromosome. Furthermore, given that spontaneous LOH is not infrequent and the lost heterozygosity cannot be regained, long-term cultures might evolve to complete LOH for sex chromosomes as well as autosomes. The genome-wide low rate of LOH suggests that selection holds back such changes. The strong correlation between sexes further reflects a balance between the production and elimination of LOH's, likely involving natural selection.

A most unexpected finding is that, accompanying the loss of the Y or Xi, an extra X chromosome is often gained. Figure 1B shows approximately equal numbers of male cell lines with one or two X chromosomes (partial X aneuploidy not counted). This extra X is active because the inactivating XIST lncRNA is silenced in male cell lines (Figure 1C), consistent with previous findings [9]. XIST does not become activated in free-living cells that do not already express it. The expression of X-linked genes is higher in male lines with two X's than in those with one X, and the up-regulation occurs along the length of the X chromosome (Figure 1D).

The pattern is more complex in female lines which, in their original state, contain an Xa and an Xi, the latter expressing XIST [10][11][12]. We focus on female lines that experienced LOH of the X, which should be genetically equivalent to male lines that have lost the Y. These female lines indeed evolve in a manner identical to the male lines. First, female lines that have gained an X are almost as frequent as lines with one X, much like the male lines with one vs. two Xa's (Figure 1E). Second, female lines with an additional X do not express XIST and all X's can thus be presumed active (Figure 1F) [13]. As in male lines, the X does not switch its state after chromosome duplication.

Cancer cell lines usually have a high rate of aneuploidy and could be heterogeneous within a line, making the karyotypic status difficult to assess. To assess the level of within-line heterogeneity, we chose two representative cell lines and counted the X chromosomes in individual cells using fluorescence in situ hybridization (FISH). The two lines are A549 (a male cell line from adenocarcinomic alveolar basal epithelium) and HeLa (a female cervical cancer cell line). Neither line expresses XIST (Table S2), suggesting that all X chromosomes are active. Figure 2A-B shows results from individual A549 and HeLa cells with two and three X's. Figure 2C-D shows the X karyotype distributions. While there is a modest degree of heterogeneity within each line, almost all cells have two or more active X chromosomes. Although the labor intensity of the assays and cell availability limited our sample size, we can nevertheless conclude that within-line heterogeneity does not seem to undermine our conclusions.

Evolution toward a new X:A expression ratio (E_X/A)

With an extra copy of the active X, the "expression phenotype" is expected to change. The ratio of the median gene expression on the X to that on the autosomes (E_X/A) is of particular interest. E_X/A has been reported to be around 0.5-0.8 for normal mammalian tissues [14][15][16]. We assayed E_X/A separately in lines derived from cancerous and normal tissues. Figure 3A shows that E_X/A distributions center on ~0.84 in normal cell lines and on 1 in cancerous cell lines. Given the controversy over the assay of E_X/A, we also varied the threshold for counting expressed transcripts (see Materials and Methods). As the threshold varies (Figure 3B), E_X/A ranges from 0.78 to 1.05 in normal cell lines but is consistently higher by approximately 15% in cancer cell lines. The same pattern is seen in the RNA-seq data (Figure S2).

The concerted evolution of autosomes as a set

While sex chromosomes evolve, autosomes should also evolve. Since aneuploidy may arise independently for each autosome, a key question is whether selection operates on the autosomes as a set. Does natural selection favor cells that have full sets of autosomes?

Figure 4A shows the distribution of chromosome number across the 620 cell lines we studied. Apparently, cancerous cell lines acquire autosomes during evolution. The distribution of the ploidy number (n = 22) shows peaks at 2 and 3, indicating that many cell lines appear to be in transition between full diploidy (44 autosomes) and full triploidy (66 autosomes). Similarly, the majority of sublines of HeLa cells we examined have 55-75 chromosomes, centering on the triploid count of 69 (Figure S4A).

Indeed, autosomes appear to exist as a full complement with n = 22. Although autosomes may evolve as a set, cells most likely add one autosome at a time. It is hence desirable to track each chromosome individually. Single cells were individually isolated from a HeLa cell line and subsequently grown into sub-lines of 10^6 cells. We subjected 6 such sub-lines to whole genome sequencing so that each chromosome can be tracked individually. Smaller chromosomes are indeed more erratic in their numbers in cell lines. Only the largest 14 chromosomes (13 autosomes and X), which together account for ~75% of the genome, are used to test the convergence of autosomes. The cutoff is based on the observation that chromosome 13 is the largest autosome yielding viable trisomic newborns [17][18][19]. We reason that, if whole organisms can survive the trisomy, the fitness consequence of that particular aneuploidy would probably be very small at the cellular level.

In all 6 lines, each of the 13 autosomes has 2-4 copies, with the average per line ranging from 2.62 to 3.23 (Table S1). If each autosome behaves independently, the number of autosomes that increase by x copies (x = 0, 1, 2 etc.) should follow a Poisson distribution with a mean of λ. Two different lines, with λ = 10/13 and λ = 16/13, are shown in Figure 4B and C. In the former, all autosomes have x = 0 or x = 1 and, in the latter, all have x = 1 or x = 2 (Table S1). The data suggest that each autosome increases by one copy and only after all of the 13 autosomes have gained an extra copy do further increases continue. Figure S4B shows the composite distribution of the five lines with λ < 1. The pattern, like that of Figure 4B, is statistically significant (P = 0.0021 by the χ² test) with an excess at x = 1. These results suggest that the larger autosomes evolve cohesively as a set. With autosomes evolving as a cohesive unit, X:A can be represented by whole numbers such as 1:2, 2:3 etc.
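The independence argument above can be illustrated with a short sketch. It computes the Poisson expectation for 13 autosomes at a mean of 10/13 extra copies and compares it with the counts implied by the text for that line (3 autosomes with x = 0 and 10 with x = 1). The function name, the pooling of x >= 2 into one bin, and the single-line comparison are assumptions made for illustration; the reported P value refers to the composite of five lines and is not reproduced here.

```python
import math


def poisson_expected_counts(lam, n_chrom, max_x):
    """Expected number of autosomes gaining x = 0..max_x extra copies
    if each of n_chrom autosomes gained copies independently
    (Poisson with mean lam). The last bin pools x >= max_x."""
    pmf = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_x)]
    pmf.append(1.0 - sum(pmf))  # tail probability P(x >= max_x)
    return [n_chrom * p for p in pmf]


N_AUTOSOMES = 13          # the 13 largest autosomes used in the text
lam = 10 / 13             # mean number of extra copies in one sub-line

# Counts implied by "x = 0 or x = 1" with a mean of 10/13 (bins: 0, 1, >=2).
observed = [3, 10, 0]
expected = poisson_expected_counts(lam, N_AUTOSOMES, max_x=2)

# Pearson chi-square statistic for this single line (illustrative only).
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for x, (o, e) in enumerate(zip(observed, expected)):
    label = f"x = {x}" if x < 2 else "x >= 2"
    print(f"{label}: observed {o}, expected {e:.2f}")
print(f"chi-square statistic: {chi2:.2f}")
```

Under independence, fewer autosomes are expected at x = 1 and more at x = 0 and x >= 2 than are observed, which is the excess at x = 1 described above.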

Evolution of the C(Xa:A) ratio underlying E_X/A

We now summarize the evolution of cell lines by their C(Xa:A) genotypes. C(Xa:A) denotes the number of active X chromosomes and the ploidy number of autosomes (in multiples of 22) and is equal to C(1:2) in normal cells. For the purpose of counting active X's, data from most male lines are usable. For female lines, only data from lines with LOH of the X can be used. Between the two sexes, C(Xa:A) distributions are very similar and the combined distribution is used in the analysis (Figure S4C).

As shown in Figure 4D, most lines have the C(1:2) or C(2:3) genotype, which together account for 2/3 of the lines. Given that C(1:2) is the starting genotype, its common occurrence at 37.4% is not surprising. The high frequency of C(2:3), however, is unexpected. To reach C(2:3) from the starting point of C(1:2), cells should evolve to either C(2:2) or C(1:3) first, but neither genotype is commonly seen in these cell lines. In contrast, C(2:3) at 29.2% is the second most common genotype. If we include the two genotypes, C(2:4) and C(3:3), that are derivatives of C(2:3), this inclusive C(2:3) cluster is the most common genotype. The model of the next section helps to interpret the observation.

A model for the evolution of free-living cells

The pathways of chromosomal evolution can be diagrammed as a series of steps in Figure 5A. Each node represents a C(Xa:A) genotype, the abundance of which is reflected in the size of the node. Thicker arrows indicate the faster transitions that add/delete one X, while the thinner arrows denote the slower transitions of adding/deleting a whole set of autosomes. The fitness of each genotype, W, is assumed to be determined by the Xa/A ratio. In general, one would expect the wild type (W1) to be the fittest genotype and we particularly wish to know whether that is indeed the case here.

We first model the evolution under strict neutrality where all nodes have the same fitness. For simplicity, genotypes are grouped into 3 clusters centering around the 3 dominant genotypes, C(1:2), C(2:2) and C(2:3), the frequencies of which are x1, x2 and x3, respectively. Each cluster consists of the dominant genotype as well as the less common ones adjacent to it (see Figure 5A). For instance, x2 is the sum of the frequencies of C(2:2) and C(3:2) and x1 is the sum of those of C(1:2), C(1:1) and half of C(1:3). The frequency of the last one, being adjacent to both C(1:2) and C(2:3), is split between the two clusters. Tallying up the numbers in Figure 4D, we obtain x1 = 0.41, x2 = 0.092 and x3 = 0.482 with a total of 0.984, excluding the marginal genotypes. The analysis below can be expanded to account for each genotype separately. The transitions between clusters are defined as follows:

(Eq. 1)

where u and v are the transition rates and x_i(T) is the frequency of cluster i at time T. Let X(T) be the vector [x1(T), x2(T), x3(T)]. When T >> 0,

X(T) → [ab, b, 1]/z (Eq. 2)

where z = ab + b + 1. The genotype frequencies evolve toward the equilibrium, [ab, b, 1]/z, which depends on a and b, but not u and v. We posit that a > 1 and b > 1 because, as the chromosome number increases, the probability of chromosome gain/loss increases as well. By Eq. 2, x1(T) > x2(T) > x3(T) when T >> 0. In short, the relative frequencies should be in the descending order of C(1:2), C(2:2) and C(2:3) if there is no fitness difference among genotypes. This predicted inequality at T >> 0 is very different from the observed trend.

Eq. 2 assumes that cell lines have been evolving long enough to approach this equilibrium. A more appropriate representation should be X(T), where T reflects the time a cell line has been in culture. It is algebraically simpler if T is measured in units of the rate of chromosomal changes, u or v, rather than in actual cell generations (Eq. 1, Figure 5B and legends). We also assume u > v as u involves only the X but v involves the whole set of autosomes. With the initial condition of X(0) = [1, 0, 0], Figure 5B shows that the C(2:3) cluster approaches the equilibrium more slowly than the other two clusters. Therefore, the observed high frequency of the C(2:3) cluster (x3 = 0.482 vs. x1 = 0.41 and x2 = 0.092) is incompatible with a neutrally evolving model of chromosome numbers. The discrepancy holds at all time points and is more pronounced at smaller T's.
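The neutral dynamics can be explored numerically with a minimal sketch. The transition structure used here is an assumption, not the authors' Eq. 1: cluster 1 moves to cluster 2 at rate u and back at rate a*u, and cluster 2 moves to cluster 3 at rate v and back at rate b*v. This parameterization was chosen only because its stationary distribution is [ab, b, 1]/z with z = ab + b + 1, as stated in the text. The numerical values of u, v, a and b are hypothetical.

```python
def step(x, u, v, a, b):
    """One time step of the assumed neutral three-cluster chain.

    Assumed (hypothetical) transitions:
      cluster 1 -> 2 at rate u,  cluster 2 -> 1 at rate a*u
      cluster 2 -> 3 at rate v,  cluster 3 -> 2 at rate b*v
    """
    x1, x2, x3 = x
    return [
        x1 * (1 - u) + x2 * a * u,
        x1 * u + x2 * (1 - a * u - v) + x3 * b * v,
        x2 * v + x3 * (1 - b * v),
    ]


def evolve(x0, steps, u, v, a, b):
    """Iterate the chain for a number of steps from the frequencies x0."""
    x = list(x0)
    for _ in range(steps):
        x = step(x, u, v, a, b)
    return x


def equilibrium(a, b):
    """Stationary frequencies [ab, b, 1]/z with z = ab + b + 1."""
    z = a * b + b + 1
    return [a * b / z, b / z, 1 / z]


# Hypothetical parameter values with u > v and a, b > 1.
U, V, A, B = 0.02, 0.01, 1.5, 1.5
START = [1.0, 0.0, 0.0]  # all lines begin in the C(1:2) cluster

for t in (0, 50, 200, 5000):
    x = evolve(START, t, U, V, A, B)
    print(t, [round(f, 4) for f in x])
print("equilibrium", [round(f, 4) for f in equilibrium(A, B)])
```

Under these assumptions the third cluster rises from zero toward its equilibrium value without exceeding it, and the equilibrium itself keeps the order x1 > x2 > x3, so a neutral chain of this form does not produce x3 > x1.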

Rejecting the neutral evolution model, we now incorporate fitness differences into Figure [...] can either be positive or negative. Here, we add a fourth genotype, C(2:4). In the supplement, we model 4 genotypes with x1-x4 for the frequencies of C(1:2), C(2:2), C(2:3) and C(2:4), respectively. An expanded transition matrix is used to model selection, followed by a normalization step (Eq. S1). The solution in the form of X(T) = X(0) M^T is given in Eq. S2 and the equilibrium X(T) is given in Eq. S3.

Eq. S3 shows that s < 0 is necessary for x2 to be smaller than x3, and t > 0 is necessary for x3 to be close to x1 (see Supplement). Figure 5C is an example in which s = -0.5 and t = 0.5. The equilibrium at T >> 0 is indeed close to the observed values.

In conclusion, it appears that the extant state in multicellular organisms, C(1:2), is not the fittest genotype for free-living mammalian cells. The observed genotypic distributions suggest that C(2:3) may have a higher fitness than the wild type, C(1:2).

Discussion

Free-living mammalian cells, like all living things, evolve faster when the environment changes. The practice of cell culturing, however, aims to slow down the evolution in order to preserve cell lines' usefulness as proxies for the source tissues. Nevertheless, changes are inevitable and the evolution of sex chromosomes is but one example. It should be noted that cell lines derived from cancerous tissues and normal tissues differ in one important aspect. Cell lines derived from normal tissues generally do not undergo karyotypic changes at an appreciable rate [20][21][22]. They are therefore much less responsive to selection in culture conditions that favor new karyotypes. Cancer cell lines, having been through more rounds of passage, have generally experienced stronger selection more frequently than normal cell lines.

Our observations suggest that the extant X:A relationship, C(1:2), may not be optimal for free-living mammalian cells. The highest fitness peak, instead, appears to be closer to the karyotype of C(2:3), as free-living cells reproducibly evolve toward this new karyotype. That the fitness peak in free-living cells differs from that of multi-cellular organisms is not unexpected. With many possible conflicts between individual cells and the community of cells (i.e., the organism), the interest of the community may lie in its ability to regulate the growth potential of its constituents. Free-living cells, on the other hand, are driven by selection to realize their individual proliferative capacity relative to other cells.

The convergence among these many cell lines to C(2:3) is unexpected in the context of cancer evolution. The TCGA project (reference) has shown that cancer evolution is a process of divergence, not convergence. Indeed, only 2 genes have been mutated in more than 10% of all cancer cases, and tumors of the same tissue origin from two different patients may often share no mutated genes at all [5][23]. Therefore, the karyotypic convergence reported here is rather unusual.

We note that C(2:3), toward which cultured cells evolved, happens to be the smallest possible increase in the X/A ratio from C(1:2) [...] can be obtained.

Materials and Methods

Chromosome number estimation of HeLa sub-lines

The processing of clonal expansion and whole genome sequencing of HeLa [...]

We used the genotype information and absolute allelic copy number estimation generated from PICNIC to infer LOH, as well as the copy number, of a specific chromosome. For a given chromosome, if ≥ 95% of SNP sites were homozygous, we considered that there was an LOH (loss of heterozygosity) event for this chromosome. Similarly, if ≥ 95% of detected alleles on the chromosome had a constant copy number of 0, 1, 2, 3 or 4, that value was taken as the copy number of the chromosome. The copy number of the Y chromosome was estimated separately. In females, although all sites on the Y chromosome should have yielded 0 copies, only ~60% of sites detected by the Y chromosome probes showed a copy number of 0. This result indicated that several X-homologous regions on the Y were covered by ~30% of Y probes. Therefore, Y chromosome loss was defined as more than 60% of SNP probes from the Y chromosome showing a copy number of 0.
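The three calling rules described above can be written as a minimal sketch. The input representation (per-site 'hom'/'het' calls and per-site integer copy numbers) and the function names are hypothetical and chosen for illustration; they do not reproduce the PICNIC output format.

```python
from collections import Counter


def call_loh(genotypes, threshold=0.95):
    """Return True if at least `threshold` of SNP sites are homozygous.

    genotypes: list of 'hom' / 'het' calls for one chromosome
    (a hypothetical encoding).
    """
    if not genotypes:
        return False
    hom = sum(1 for g in genotypes if g == "hom")
    return hom / len(genotypes) >= threshold


def call_copy_number(site_copy_numbers, threshold=0.95):
    """Return the chromosome copy number (0-4) if at least `threshold`
    of sites share that value; otherwise return None (no whole-chromosome call)."""
    if not site_copy_numbers:
        return None
    value, count = Counter(site_copy_numbers).most_common(1)[0]
    if value in (0, 1, 2, 3, 4) and count / len(site_copy_numbers) >= threshold:
        return value
    return None


def call_y_loss(y_probe_copy_numbers, threshold=0.60):
    """Return True if more than `threshold` of Y probes show copy number 0."""
    if not y_probe_copy_numbers:
        return False
    zero = sum(1 for c in y_probe_copy_numbers if c == 0)
    return zero / len(y_probe_copy_numbers) > threshold


# Toy example: 96% homozygous sites and 97% of sites at copy number 3.
print(call_loh(["hom"] * 96 + ["het"] * 4))
print(call_copy_number([3] * 97 + [2] * 3))
print(call_y_loss([0] * 61 + [1] * 39))
```

Chromosomes that fail the 95% rule return None in this sketch, which corresponds to partial aneuploidy that is not assigned a whole-chromosome copy number.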

Sex chromosome genotype inference

The expression level of XIST can be used as a proxy to distinguish the active X chromosome from the silent one because this gene is expressed from the inactive X chromosome and functions in cis [13]. According to Greenman's and Barretina's studies, 496 cancer cell lines have both copy number and expression data. As expected, XIST was silenced in male cell lines, as well as in female lines with whole X chromosome LOH (Fig. 1C). Based on X chromosome LOH and copy number information, we identified five genotypes: XaO (female lines with one X; 20 lines), XaXa (female lines with isodisomy of the X; 17 lines), XaXb (female lines heterozygous for the X; 28 lines), Xa[Y] (male lines with one X; 53 lines) and XaXa[Y] (male lines with two X's; 69 lines).

C(Xa:A) (ratio of active X's to autosomes) calculation

All male cell lines (341 lines) and female cell lines with whole X chromosome LOH (103 lines) were employed for the C(Xa:A) calculation. C(Xa:A) was defined as the ratio of the absolute X copy number to that of all autosomes.

E_X/A (ratio of X to autosomal expression) calculation

E_X/A was defined as the ratio of the expression of X-linked genes to that of autosomal ones. The median values of expressed X-linked and autosomal genes were used to calculate E_X/A in both cancerous and normal cell lines. For the datasets from the Affymetrix U133 + 2.0 array, genes with signal intensities ≥ 32 (log2 > 5) were considered to be expressed. For RNA-seq data, genes with RPKM values ≥ 1 were considered to be expressed.

Previous studies have shown that the E_X/A value may be affected by the gene set used [15]. In addition, several genes that are silent in normal tissues have been shown to be expressed in tumor tissues [31]. Those genes were predominantly located on the X chromosome, which could result in an increase of E_X/A. To exclude the possibility that E_X/A ratios may be biased in cancerous cell lines, gene sets for the E_X/A calculation were first selected in normal cell lines by three criteria, with the same sets then used in cancerous cell lines. The three filtering criteria for gene set selection were RPKM > 0, 1, and 5 in normal cell lines (Fig. 2C).
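The expression ratio defined above can be sketched in a few lines. The function name, the list-based input and the toy values are hypothetical; the only elements taken from the text are the use of medians over expressed genes and an expression threshold (for example RPKM ≥ 1 for RNA-seq data).

```python
from statistics import median


def e_x_over_a(x_expr, auto_expr, threshold):
    """Ratio of the median expression of expressed X-linked genes to the
    median expression of expressed autosomal genes.

    A gene counts as expressed when its value is >= threshold.
    """
    x_on = [v for v in x_expr if v >= threshold]
    a_on = [v for v in auto_expr if v >= threshold]
    if not x_on or not a_on:
        raise ValueError("no expressed genes at this threshold")
    return median(x_on) / median(a_on)


# Toy RPKM values for one cell line (hypothetical numbers).
x_linked = [0.2, 2.0, 4.0, 6.0]
autosomal = [0.5, 3.0, 5.0, 8.0]

# The ratio can shift with the expression threshold, which is why the
# threshold is varied in the analysis.
for thr in (0.1, 1.0, 5.0):
    print(thr, round(e_x_over_a(x_linked, autosomal, thr), 3))
```

Because the ratio depends on which genes pass the threshold, comparing cancerous and normal lines on a gene set fixed in the normal lines, as described above, avoids attributing a threshold effect to the karyotype.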

Differences in X-linked gene expression between Xa[Y] and XaXa[Y] lines

To explore the impact of an extra X chromosome on the expression levels of X-linked genes, 118 cell lines with Xa[Y] and 109 cell lines with XaXa[Y] were used. A t-test with the Benjamini-Hochberg adjustment was employed to identify genes whose expression changed significantly due to an extra X copy. The 648 detected X-linked genes are plotted in Fig. 2A.
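The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. The analysis itself was carried out in R; this Python version is only an illustration of the step-up procedure, and the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), and
    monotonicity is enforced by taking a running minimum from the
    largest p-value downward.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values from per-gene t-tests (hypothetical numbers).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Genes whose adjusted p-value falls below the chosen false discovery rate would then be reported as significantly changed.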

The free statistical programming language R (version 3.0.1) was used for the statistical analysis.

X chromosome fluorescence in situ hybridization

HeLa cells (from the Culture Collection of the Chinese Academy of Sciences, Shanghai, China) were cultured in DMEM (Life Technologies) supplemented with 10% fetal bovine serum (FBS), 100 U/ml of penicillin, and 100 μg/ml of streptomycin. A549 cells (from Mi-lab) were cultured in RPMI-1640 (Life Technologies) with 10% FBS, 100 U/ml of penicillin, and 100 μg/ml of streptomycin at 37°C with 5% CO2. Approximately 2 × 10^6 cells were seeded and cultured in 10 cm dishes with 10 ml of growth medium as described above. To synchronize the cells, 200 μl of thymidine (100 mM) was added to the cells. After incubation for 14 hours, the cells were washed twice with 10 ml of PBS and then supplemented with 10 ml of growth medium containing deoxycytidine (24 μM). After incubation for 2 hours, 10 μl of nocodazole (100 μg/ml) was added to the cells. The cells were incubated for an additional 10 hours.

After synchronization, cells were harvested and treated with 4 ml of hypotonic solution (75 mM KCl) pre-warmed to 37°C for 30 min. The cells were then fixed via three immersions in fresh fixative solution (3:1 methanol:acetic acid), 15 min each time. The fixed cell suspension was spotted onto a clean microscope slide and allowed to air dry. We used the "XCyting Chromosome Paints" and "Xcyting Centromere Enumeration Probe" (MetaSystems, Germany) for FISH analysis of whole X chromosomes and of the X chromosome centromere, respectively. Following the manufacturer's instructions, 10 μl of probe mixture was added to the prepared slide. The slide was then covered with a 22 × 22 mm^2 coverslip and sealed with rubber cement. Next, the slide was heated at 75°C for 2 min on a hotplate to denature the sample and probes simultaneously, followed by incubation in a humidified chamber at 37°C overnight for hybridization. After hybridization, the slide was washed in 0.4 × SSC (pH 7.0) at 72°C for 2 min, then in 2 × SSC with 0.05% Tween-20 (pH 7.0) at room temperature for 30 seconds, before being rinsed briefly in distilled water to avoid crystal formation. The slide was drained and allowed to air dry. Finally, 5 μl of DAPI (MetaSystems) was applied to the hybridization region and covered with a coverslip. The slide was processed and imaged using fluorescence microscopy as recommended (Olympus FV1000, 100× objective).

Figure S1: [...] slight tendency for the smaller autosomes to have higher LOH rates than the larger ones (R = ~-0.4, p = ~0.046). The X chromosome deviates significantly from the regression line.

Figure S2: E_X/A ratio in cancerous and normal cell lines by RNA-seq. The gene expression information (gtf files) by RNA-seq was downloaded from UCSC (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/). There are 7 cancerous and 11 normal cell lines, respectively. E_X/A was calculated from the median values of expressed X-linked and autosomal genes. Expressed genes were selected as RPKM > 1.

Figure S3: The frequency spectrum of E_X/A in male and female cancerous cell lines compared to normal male and female cell lines. The median expression values of X-linked and autosomal genes were used to compute E_X/A for each cell line. The proportion of cell lines within a bin (0.1) was plotted on the Y-axis. The E_X/A of cancerous cell lines shows a strong right shift compared to that of normal cell lines.

Figure S4: (A) Both the ancestral and sub-clonal HeLa populations have 55-75 chromosomes centering around the triploid count of 69. (B) The composite distribution of the five lines with λ < 1, and the comparison to the Poisson distribution. (C) The frequency spectrum of C(Xa:A) in male and female cancerous cell lines. The median values for copy numbers on autosomes and the X chromosome were used to calculate C(Xa:A). The proportion of cell lines within a bin (0.1) was plotted on the Y-axis. The discrete peaks denote the four major genotypes (X:3A; X:2A; 2X:3A; 2X:2A).