The human inactive X chromosome modulates expression of the active X chromosome

Summary The “inactive” X chromosome (Xi) has been assumed to have little impact, in trans, on the “active” X (Xa). To test this, we quantified Xi and Xa gene expression in individuals with one Xa and zero to three Xis. Our linear modeling revealed modular Xi and Xa transcriptomes and significant Xi-driven expression changes for 38% (162/423) of expressed X chromosome genes. By integrating allele-specific analyses, we found that modulation of Xa transcript levels by Xi contributes to many of these Xi-driven changes (≥121 genes). By incorporating metrics of evolutionary constraint, we identified 10 X chromosome genes most likely to drive sex differences in common disease and sex chromosome aneuploidy syndromes. We conclude that human X chromosomes are regulated both in cis, through Xi-wide transcriptional attenuation, and in trans, through positive or negative modulation of individual Xa genes by Xi. The sum of these cis and trans effects differs widely among genes.


In brief
Through RNA sequencing of individuals with sex chromosome aneuploidy, San Roman et al. identify modular ''active'' (Xa) and ''inactive'' (Xi) X chromosome transcriptomes. Looking beyond classical X inactivation, which acts in cis, they find that Xi modulates Xa transcript levels in trans. They identify 10 X chromosome genes most likely to contribute to male-female differences in common disease.

INTRODUCTION
The X chromosome of eutherian mammals exists in two distinct epigenetic states that are referred to as ''active'' (Xa) and ''inactive'' (Xi). [1][2][3] The ''nÀ1'' rule (where n is the number of X chromosomes per cell) states that all diploid human somatic cells possess one X chromosome in the active state (Xa), while all other (i.e., nÀ1) copies of chromosome (Chr) X 4 are transcriptionally repressed through a mechanism known as X chromosome inactivation (XCI). Despite the name, Xi is functionally active, making critical contributions to human fitness and viability. For example, 99% of fetuses with only one sex chromosome (45,X) abort spontaneously, suggesting that viability hinges on gene expression from a second sex chromosome-either Xi or Y. 5,6 The rare survivors likely have a mixture of 45,X cells and cells with a second sex chromosome, and they display a constellation of anatomic features known as Turner syndrome. 7,8 methods that made it difficult to separate the effect of Xi copy number from the potentially confounding effects of correlated factors such as Chr Y copy number, hormonal differences, or tissue composition. More importantly-as underscored by this study-previous work assumed, without directly testing, the independence and additivity of Xi and Xa expression. In particular, these studies assumed that any increase in expression observed with additional copies of Xi was due to expression from Xi, which may not always be the case. Given these limitations, we hypothesized that revisiting Xi gene expression with alternative experimental and analytic methods would reveal new insights.
Here, we used a series of quantitative approaches to investigate gene expression from Xi and Xa. Inspired by previous studies, we took advantage of the natural occurrence of diverse sex chromosome aneuploidies in the human population. We performed RNA sequencing (RNA-seq) on two cell types (lymphoblastoid cell lines and primary skin fibroblasts) from 176 individuals spanning 11 different sex chromosome constitutions-from 45,X (Turner syndrome) to 49,XXXXY. We analyzed the resulting data from these 176 individuals using linear regression models to identify significant changes in Chr X gene expression in identically cultured cells with zero, one, two, or three copies of Xi. 38% of Chr X genes displayed significant Xi-driven expression changes, which we quantified on a gene-by-gene basis using a novel metric that we developed called DE X . By combining DE X findings with allele-specific analyses performed in the same cell lines and comparing our results with published, independent annotations of genes subject to XCI, we found that Xi positively or negatively modulates steady-state levels of transcripts of at least 121 genes on Xa, in trans. Thus, Xi and Xa expression are highly interdependent. By combining DE X with published gene-wise metrics of evolutionary constraint, we identified a set of 10 Chr X genes most likely to drive phenotypes that are associated with natural variation in Xi copy number. These 10 candidate ''drivers'' can now be prioritized in studies of sex differences in common disease and in explorations of sex chromosome aneuploidy syndromes.

Sampling gene expression across sex chromosome constitutions
To conduct a robust, quantitative analysis of Xi's impacts on X-linked gene expression, we recruited individuals with a wide range of sex chromosome constitutions to provide blood samples and/or skin biopsies ( Figure 1A). We generated or received Epstein Barr virus-transformed B cell lines (lymphoblastoid cell lines [LCLs]) and/or primary dermal fibroblast cultures from 176 individuals with one to four X chromosomes and zero to four Y chromosomes. After culturing cells under identical conditions, we profiled gene expression by RNA-seq in LCLs from 106 individuals and fibroblast cultures from 99 individuals (some individuals contributed both blood and skin samples; Tables 1 and S1). To enable analysis at both the gene and transcript isoform levels, we generated 100-bp paired-end RNA-seq reads to a median depth of 74 million reads per sample. A resampling (bootstrapping) analysis of our dataset indicated that including more individuals with sex chromosome aneuploidy would only marginally increase the number of differentially expressed genes detected in our analyses ( Figure S1; STAR Methods).
A metric for the impact of Xi on gene expression To leverage the full power of our datasets, we compiled all RNAseq data for each cell type into a single analysis. We included protein-coding and well-characterized long non-coding RNA (lncRNA) genes with a median expression in either 46,XX or 46,XY samples of at least 1 transcript per million (TPM). This resulted in 357 Chr X genes expressed in LCLs and 393 expressed in fibroblasts. Combining these two gene lists, we analyzed 423 Chr X genes in all. These genes reside within structurally and evolutionarily distinct regions ( Figure 1B): two pseudoautosomal regions (PAR1 and PAR2), which are identical in sequence between Chr X and Y, and the non-pseudoautosomal region of the X (NPX), which has diverged in structure and gene content from the non-pseudoautosomal region of the Y (NPY). 27  (C) Linear modeling strategy for analyzing RNA-seq data from individuals with one to four copies of Chr X (zero to three copies of Xi). See also  27,29 We hypothesized that each copy of Xi would incrementally increase expression of some Chr X genes, and therefore, for each gene, we modeled expression as a linear function of Xi copy number, controlling for Chr Y copy number and batch ( Figure 1C; STAR Methods). To assess whether expression of each Chr X gene changed linearly per Xi, we fit non-linear least square regression models to the expression data using power functions (STAR Methods). Most NPX and PAR1 genes previously annotated as escaping XCI were best fit by linear models in which expression increases by a fixed amount per Xi, while most genes previously annotated as subject to XCI were best fit by models with no change in expression per Xi ( Figure S2; see STAR Methods for the derivation of XCI status annotations from published studies). These results validate the ''nÀ1'' rule at the transcriptomic level, indicating that each cell has a single copy of Xa and nÀ1 copies of Xi. Moreover, linear modeling revealed that contributions by Xi to Chr X gene expression are strikingly modular, meaning that each Xi is more or less equivalent.
Linear models allowed us to identify genes whose expression changed significantly with additional copies of Xi and to quantify the absolute changes in expression (i.e., changes in read counts per Xi). To compare genes expressed at different levels, we also quantified the relative changes in expression per Xi. Specifically, we divided the change in expression per Xi (slope of regression, b X ) by the expression from the single Xa (average intercept across batches, b 0 )-a metric we refer to as DE X (Figure 2A). DE X = 0 indicates that adding one or more copies of Xi does not affect the level of expression (e.g., PRPS2; Figures 2B and S3); DE X > 0 indicates that expression increases under these circumstances (e.g., KDM5C; Figures 2C and S4), DE X = 1 indicates that Xa and Xi contribute equally; and DE X < 0 indicates that expression decreases (e.g., F8; Figures 2D and S5). XIST, the lncRNA that acts in cis to transcriptionally repress X chromosomes from which it is expressed, 30,31 was the only gene without detectable expression in cells with one copy of Chr X (Xa) that was expressed robustly in cells with one or more inactive copies (Xi). Considering samples with two or more X chromosomes, we found that XIST expression increased linearly with each additional copy of Xi ( Figures 2E and S4).
DE X values vary widely among Chr X genes but not between cell types Analyzing DE X values across Chr X genes revealed that Xi contributions to expression varied widely. Of 357 Chr X genes expressed in LCLs, 235 (66%) showed no significant change in expression level with additional copies of Xi (DE X z 0), and the same was true for 304 (77%) of 393 Chr X genes expressed in fibroblasts ( Figures 2F and 2G; full results in Table S2). This is consistent with these genes being expressed-in the respective cell types-from a cell's first X chromosome (Xa) and silencing on all others (Xi). The remaining 122 (34%) Chr X genes expressed in LCLs and 89 (23%) Chr X genes expressed in fibroblasts had significantly negative or positive DE X values. Combining the results in LCLs and fibroblasts, Xi copy number significantly impacts gene expression levels for 162 of 423 (38%) Chr X genes expressed in one or both cell types. NPX genes' DE X values ranged from À0.39 to 1.2 ( Figures 2F and 2G). PAR1 genes had DE X values near one, while PAR2 genes had values near zero (Table S2). The stark difference between PAR1 and PAR2 likely reflects their evolutionary origins: PAR1 was preserved on Chr X and Y through sex chromosome evolution and retains autosome-like features, while PAR2 evolved later through a transposition from Chr X to Chr Y. 32 For nearly all Chr X genes, the change in expression per Xi falls short of that contributed by Xa (DE X <1), similar to previous studies using allelic ratio analysis. 13,14 Only two NPX genes-XIST and PUDP-and three PAR1 genes-DHSRX, PLCXD1, and PPP2R3B-showed DE X values approaching or exceeding one in both LCLs and fibroblasts.
We assessed whether these Chr X expression dynamics were influenced by factors apart from Xi count. We found few differences between cell types; genes expressed in both LCLs and fibroblasts displayed concordant DE X values ( Figure 2H). This is consistent with studies of differential expression between 46,XY and 46,XX tissues that found correlated expression changes for Chr X genes across diverse tissues. 15 To control for any effects of gonadal sex or Y chromosome copy number on our results, we reanalyzed the data from samples with zero (45,X; 46,XX; 47,XXX; 48,XXXX) or one copy of Chr Y (46,XY; 47,XXY; 48,XXXY; 49,XXXXY), modeling expression as a function of Xi copy number and batch. DE X values were unaffected by the presence or absence of a Y chromosome ( Figure S6). Because of its design, our study reveals that these consistent Chr X expression dynamics derive from direct, cell-autonomous contributions of Xi rather than systemic effects of hormones or environmental factors.
For genes with multiple transcript isoforms (alternative transcripts), we asked whether DE X values were consistent between isoforms (STAR Methods). For most genes, transcript isoforms displayed concordant DE X values. However, for genes with multiple transcript isoforms, 33 (19%) in LCLs and 25 (13%) in fibroblasts had discordant DE X values: at least one isoform's DE X differed significantly from zero (false discovery rate [FDR] < 0.05), while another isoform's DE X did not ( Figure S7; Table S3). The most striking case is that of UBA1, where    Figure S8). To assess reproducibility, we compared our results with those from an independent dataset that used microarrays to assay gene expression in LCLs across diverse sex chromosome constitutions. 24 Reanalyzing this dataset using linear models, we found that the resulting microarray DE X values correlated well with the DE X values calculated from our RNA-seq data ( Figure S9; STAR Methods).

Supernumerary copies of Chr Y and 21 show little attenuation of gene expression
To determine whether the attenuated expression observed with extra copies of Chr X also occurs with additional copies of other chromosomes, we analyzed cells from individuals with additional copies of Chr Y or with trisomy 21, a common autosomal aneuploidy and the cause of Down syndrome. 33 For Chr Y, we used the same linear model as for Chr X: modeling expression as a function of Chr Y copy number, Xi copy number, and batch ( Figure 3A; STAR Methods). We calculated DE Y values separately for NPY and PAR genes because NPY genes are not expressed in samples with zero Y chromosomes, while PAR genes are expressed in all samples.
For NPY genes, we analyzed samples with one to four Y chromosomes to quantify expression differences, if any, between the first and additional Y chromosomes. Expression of all NPY genes increased significantly, with DE Y values close to 1, consistent with near-equal expression from each copy of Chr Y (e.g., KDM5D; Figures 3B-3D and S10; full results in Table S4).
For PAR genes, we analyzed samples with zero to four Y chromosomes. As with Chr X, PAR1 gene expression increased with additional copies of Chr Y, yielding DE Y values close to one, whereas PAR2 genes had DE Y values near zero ( Figures 3C and  3D). This implies, first, that PAR1 genes are expressed on each copy of Chr X or Y, while PAR2 genes are only expressed on the first copy of Chr X (Xa), and, second, that PAR1 gene expression from each additional Chr X or Y is roughly equal to expression from the first.
Finally, we examined Chr 21 gene expression as a function of Chr 21 copy number (Figures 3E and S11; STAR Methods). Nearly three-quarters of expressed Chr 21 genes significantly (FDR < 0.05) increased in expression with an additional copy of Chr 21 (e.g., CCT8; Figure 3F), and none significantly decreased ( Figure 3G; Table S5). These results align well with independent studies of Chr 21 gene expression. 34 In sum, unlike genes on Chr X, our analysis reveals that most genes on Chr Y and Chr 21 are expressed similarly on each copy of their respective chromosomes. The median DE values for Chr Y (including NPY and PAR1) and Chr 21 genes range from 0.74 to 1. By comparison, NPX genes without NPY homologs had median DE X z 0, while NPX genes with NPY homologs had modestly higher median DE X values (LCLs: 0.26, fibroblasts: 0.17; Figure 3H). Even PAR1 genes, which as a group had the highest median DE X values, were modestly attenuated on Xi compared with Chr Y, especially in LCLs ( Figure 3H). This Y-vs.-X effect was most pronounced for CD99, located near the PAR1-NPX/Y boundary (Tables S2 and S4), consistent with suggestions that PAR1 gene expression on Xi is modestly attenuated by spreading of heterochromatin. 15 These differences highlight the absence of a chromosome-wide mechanism attenuating (or otherwise altering) gene expression on supernumerary copies of Chr Y and Chr 21, in contrast to Chr X.
Xi modulation of Xa transcript levels revealed by divergence of DE X and allelic ratio DE X conveys the change in a gene's expression due to an additional Xi regardless of the mechanism(s) responsible for this change. We hypothesized that a Chr X gene's DE X value could reflect the combined effects of two mechanisms: (1) transcription of Xi allele(s) and (2) modulation of the Xa allele by Xi in trans.
Seeking evidence of these mechanisms, we searched gene by gene for agreements and disagreements between our calculated DE X values and published descriptions of the genes as ''escaping'' XCI (being expressed from Xi) or being subject to it (silenced on Xi). For this purpose, we curated annotations of each expressed gene's XCI status from studies of allele-specific expression [13][14][15][16]18 (STAR Methods; Table S6). Many genes with DE X > 0 were classified as expressed from Xi (50/74 in LCLs and 43/61 in fibroblasts), indicating that transcription from Xi alleles underlies their DE X values ( Figure 4A; Table S6). Genes with these characteristics overlapped significantly between fibroblasts and LCLs ( Figure 4B).
For 102 (24%) of the 423 Chr X genes we evaluated in LCLs or fibroblasts, our calculated DE X values were at odds with expectations arising from the published annotations of XCI status (Figures 4A and 4C; Table S6). For example, among genes with DE X > 0, 22 in LCLs and 14 in fibroblasts were described as silenced on Xi. Additionally, previous models offer no explanation for the 48 genes in LCLs and 28 genes in fibroblasts with DE X < 0, most of which were described as silenced on Xi (Figures 4A and 4C; Table S6). Genes with these characteristics did not overlap significantly between LCLs and fibroblasts, even though most are expressed in both cell types, indicating that this regulation is largely cell-type specific ( Figure 4C). These Article ll OPEN ACCESS unanticipated findings are unlikely to reflect experimental error in the previous or current studies. Instead, they suggest that, for many Chr X genes whose Xi allele(s) are silent, the Xa allele is nonetheless upregulated (DE X > 0) or downregulated (DE X < 0) by Xi.
To corroborate these findings in our own dataset, we performed an allele-specific analysis in our LCL and fibroblast samples with two X chromosomes. To distinguish between Chr X alleles, we identified heterozygous SNPs in expressed genes (STAR Methods). We then identified samples in our dataset with skewed XCI (21 LCL and 10 fibroblast samples; Figures S12-S16) and used these samples to compute the average ratio of Xi to Xa expression (the allelic ratio [AR]) for each gene. To calculate the AR, we required heterozygous SNPs in at least three samples, resulting in AR values for 151 genes in LCLs and 119 in fibroblasts (Table S6; Figure S17). In LCLs and fibroblasts, respectively, 38 (25%) and 22 (18%) of these genes had AR values significantly greater than zero, indicating that they are expressed from Xi ( Figures 4D and 4E); these results agreed well with published AR values ( Figure S18).
We next compared each gene's AR and DE X values. If Xi and Xa expression are fully independent of each other, and therefore additive, we would expect the AR for a given gene to approxi-mate its DE X value. However, if Xi modulates the gene's Xa transcript levels in trans, then independence and additivity will not be observed, and instead the gene's DE X and AR values will differ. Most X-linked genes, e.g., EIF2S3, had AR values that approximate their DE X values, and AR and DE X were highly correlated among many informative genes in both LCLs and fibroblasts ( Figures 4F and 4G). For these genes, the DE X value may directly reflect the level of transcription from Xi. However, for 33 informative genes in LCLs or fibroblasts, AR and DE X were significantly different, indicating that Xi modulated Xa transcript levels upward or downward in trans (Figures 4F-4H; STAR Methods). Some genes, like MPP1, were not expressed from Xi (AR z 0) but nonetheless had DE X values significantly different from zero: 0.24 in LCLs and 0.21 in fibroblasts, indicating that levels of Xa-derived transcripts are positively regulated by Xi. Other genes, like DDX3X and PUDP, had significant expression from Xi (AR > 0, FDR < 0.05) and evidence of Xi regulation of steady-state expression levels. DDX3X had an AR (LCLs: 0.55, fibroblasts: 0.42) that is significantly higher than its DE X value (LCLs: 0.26, fibroblasts: 0.16) in both LCLs and fibroblasts, indicating both that DDX3X is expressed on Xi and that its steady-state transcript levels are negatively regulated by Xi. Conversely, PUDP had an AR (LCLs: 0.71, fibroblasts: 0.73)    Figure 4I). This represents the union of the 102 genes for which the public AR data cannot explain the DE X values ( Figure 4C) and the 33 genes with AR values significantly different from DE X ( Figure 4H). The observed modulation of steady-state transcript levels suggests that Xi regulates the expression of genes on Xa in trans.
Combining DE X and expression constraint metrics identifies likely drivers of Xi-associated phenotypes While the somatic cells of all diploid individuals have one Xa, the number of Xis varies in the human population from zero to four. This variation is associated with many important differences in phenotypes and disease predispositions, for example those observed between 45,X (Turner syndrome) and 46,XX individuals, between 47,XXY (Klinefelter syndrome) and 46,XY individuals, or even between 46,XY males and 46,XX females. We hypothesized that phenotypes and predispositions associated with Xi copy number are due to changes in the copy numbers of some of the Chr X genes where we found positive or negative DE X values. We reasoned that phenotypically critical genes would be ''dosage sensitive'', i.e., their expression levels would be tightly constrained by natural selection, while the expression levels of genes whose dosage is not phenotypically critical could vary with little consequence.
To gauge the constraints that selection has imposed on each gene's expression level, we turned to metrics derived from population and evolutionary genetic studies. We assessed tolerance of under-expression using (1) loss of function observed/expected upper fraction (LOEUF), the ratio of observed to expected lossof-function (LoF) variants in human populations, 35 (2) RVIS, the residual variation intolerance score, 36 and (3) pHI, the probability of haploinsufficiency. 37 Both LOEUF and RVIS use large-scale human genomic sequencing data to evaluate selection against LoF variants, while pHI is based on evolutionary and functional metrics. LoF variants should be culled from the population in genes whose under-expression is deleterious, while they may accumulate in genes whose under-expression has little effect on fitness.
To assess tolerance of over-expression, we examined conservation of targeting by microRNAs (miRNAs; P CT score 38 ), which repress expression by binding to a gene's 3 0 untranslated region. 39 Genes sensitive to over-expression have maintained their miRNA binding sites across vertebrate evolution, while genes whose over-expression has little or no effect on fitness show less conservation of these sites. 40 To weigh these four metrics simultaneously, we calculated each gene's percentile rank for each metric, from most constrained (high percentile) to least constrained (low percentile). We calculated percentiles separately for autosomal (including PAR1) and NPX genes and then, for each gene, averaged percentile rankings across the four metrics.
We first examined expression constraints for PAR1 genes whose high DE X values suggested that they may drive pheno-types associated with Xi copy number. Compared with autosomal genes, PAR1 genes are less constrained on average (p = 5.5eÀ5, Wilcoxon rank-sum test), with most ranking in the least-constrained quartile (Figures 5A and 5B; Table S7). This indicates that altering their expression levels has little impact on human fitness. Indeed, homozygous LoF mutations have been reported for 3 of 15 PAR1 genes, demonstrating dispensability. 35 Only two PAR1 genes, SHOX and SLC25A6, rank in the more constrained half of the comparison group (Table 2). SHOX copy number contributes to variation in height in individuals with sex chromosome anomalies, 41-45 while SLC25A6 has not yet been linked to any phenotype. Apart from these two genes, the high tolerance of under-and over-expression for most PAR1 genes argues against prominent roles in phenotypes associated with Chr X (or X + Y) copy number.
Turning to the much larger set of NPX genes, we found that their widely ranging constraint metrics correlated poorly with their DE X values ( Figures 5C and 5D; Table S7). Thus, DE X alone does not predict dosage sensitivity among NPX genes. To identify the NPX genes most likely to drive Xi-copy-numberdependent phenotypes, we selected those with |DE X | R 0.1 (FDR < 0.05) in LCLs or fibroblasts and ranked these by their average constraint metrics. Five of the top 10 genes by these criteria (Table 2) had NPY homologs, a significant enrichment (p = 7.0eÀ4, hypergeometric test), and all five had DE X values > 0.1 in both cell types. Of the five genes without NPY homologs, only two had DE X values significantly greater than zero in both cell types: SMC1A and CDK16. The remaining three genes had DE X values significantly different from zero in only one of the two cell types analyzed, and they tended to have lower absolute DE X values.
If these 10 genes are dosage-sensitive drivers of Xi-dependent phenotypes-even when harboring no mutations-then one might expect mutant phenotypes to be pronounced and to display distinctive modes of inheritance. Accordingly, we searched OMIM for disease annotations. Germline mutations in seven of the 10 genes are reported to cause severe developmental disorders, including well-characterized childhood syndromes for five of the genes (Tables S2 Table S7). Indeed, five of the seven mutation-bearing genes are reported to display dominant inheritance (affected heterozygous females)-a significant enrichment among X-linked genes (p = 0.0037, hypergeometric test) and consistent with extraordinary dosage sensitivity. Extrapolating from these findings, we speculate that some of the genes without annotated OMIM phenotypes may have important roles in disease; in the case of ZFX, no LoF mutations are reported in gnomAD even though the gene's roles in regulating stem cell self-renewal and cancer cell proliferation are well documented. [46][47][48] Taken together, these 10 genes represent good candidates for driving Xi-dependent phenotypes characteristic of individuals with sex chromosome aneuploidies-as well as differences in disease risks between ordinary (euploid) females and males.

DISCUSSION
We analyzed Chr X gene expression quantitatively in two types of cells cultured from individuals with one to four X chromosomes, e.g., 45,X to 49,XXXXY (Figure 1). Folding this diversity of sex chromosome constitutions into a single linear model ( Figure 2) yielded advantages over previous studies, which compared sex chromosome constitutions in pairwise fashion, most frequently 45,X vs. 46,XX; 46,XY vs. 47,XXY; or 46,XX vs. 46,XY. First, our linear model embodied, tested, and confirmed-at the level of the X transcriptome-the ''nÀ1'' rule, 4 whereby diploid somatic cells with a given number (n = 1, 2, 3, 4) of X chromosomes have a single Xa and nÀ1 Xi's. Second, linear modeling provided the power needed to detect and precisely quantify increases or decreases in expression of individual Chr X genes as a function of Chr X copy number. Third, linear modeling revealed that the expression contributions made by each copy of Xi are modular, indicating that each copy of Xi is equivalent, or nearly so, even among unrelated individuals. Fourth, by comparing samples that vary in Xi copy number with and without a Y chromosome, we found that expression from Xa is quantitatively indistinguishable in phenotypic males and females-as is expression from Xi (Figure S6). Thus, both Xi and Xa make modular contributions to Chr X gene expression-contributions independent of and unaffected by the presence of the NPY or the gonadal sex of the individual.
Finally, linear modeling of gene expression as a function of Chr X copy number yielded the metric DE X , which captures the positive or negative impact of Xi(s) on steady-state transcript levels for each gene, normalized to account for gene-to-gene variation in expression level (Figure 2A). Fully 38% (162/423) of expressed Chr X genes in LCLs or fibroblasts displayed a statistically significant positive or negative DE X value, indicating that their expression is impacted by the presence of one or more copies of Xi. This is nearly double what would be expected based on the prior literature's estimates of escape, which-based upon our reanalysis using the broadest definition of escape-includes only 20% (86/423) of expressed Chr X genes in these cell types (Table S6). DE X values varied widely among Chr X genes, from À0.39 to 1.2 ( Figures 2F and 2G), but showed much less variation between the two cell types studied ( Figure 2H), suggesting the possibility that the DE X ''settings'' for each gene were established prior to the embryonic divergence of the hematopoietic and skin fibroblast lineages and subsequently maintained through development.
We extended the utility of the DE X metric by cross-referencing and comparing it, one gene at a time, with an orthogonal metric: the AR of Xi and Xa transcripts in cells with skewed XCI and SNP heterozygosity. AR values significantly greater than zero unambiguously identify Chr X genes that are expressed from both Xi and Xa (and therefore ''escape'' XCI). 49    expression of at least 121 genes, or nearly 29% of the 423 Chr X genes that are demonstrably expressed in either LCLs or fibroblasts. This modulation is manifest whenever a gene's AR and DE X values differ significantly, and it is most starkly apparent when the gene is not expressed from Xi (i.e., when AR approximates zero) but nonetheless displays a significantly positive or negative DE X value. While ''escape'' from XCI has been well documented over the past four decades, 12-15 the novel combination of the AR and DE X metrics reported here was required to observe modulation, explaining why it has previously been unappreciated.
Thus, combined analysis of DE X and AR reveals a nuanced, gene-by-gene tapestry of Xi-driven changes in expression of Chr X genes. For some genes, DE X was explained entirely by expression from the Xi allele, while for others, DE X was explained entirely by modulation-positive or negative-of steady-state RNA levels derived from the Xa allele. For a third set of genes, DE X was explained by the combined effects of expression from Xi and modulation of steady-state RNA levels. Proposals of uniform, chromosome-wide ''X chromosome upregulation'' (XCU) during mammalian development or evolution 50,51 will need to be revisited in light of this unforeseen diversity of gene-by-gene responses to variation in Chr X copy number.
Finally, we paired the DE X metric with population and evolutionary measures of constraint on expression levels to identify 10 NPX genes that are most likely (among the 423 Chr X genes expressed in LCLs and/or fibroblasts) to drive Xi-associated phenotypes (Figures 5C and 5D; Table 2). Despite their high DE X values, most PAR genes did not exhibit the constraints on expression levels that we required for inclusion in this select group of candidate drivers ( Figures 5A and 5B). We propose the 10 NPX genes-five of which have divergent NPY homologs-as potential drivers of (1) differences in health and disease between 46,XY and 46,XX cohorts and (2) the distinctive phenotypes associated with sex chromosome aneuploidies, including Turner syndrome (45,X) and Klinefelter syndrome (47,XXY). We speculate that one or more of these 10 NPX genes, which include transcriptional and epigenetic regulators, may also drive the modulation of Xa genes by Xi.

Limitations of the study
The human individuals sampled here are mostly of European ancestry; it will be important to validate these findings in a more ancestrally diverse set of individuals. Our findings in LCLs and fibroblasts were largely concordant, but they may not generalize to all somatic tissues and cell types. Our study focused on 423 Chr X genes that are expressed in LCLs and/or fibroblasts; our conclusions may not generalize to Chr X genes that are not expressed in these cell types. Our list of Chr X genes likely to drive Xi-dependent phenotypes is incomplete, as it is biased toward genes expressed in LCLs and fibroblasts and toward genes with long open reading frames well suited to expression constraint analysis; future studies will add to this list. In addition to these caveats regarding our current findings, several topics remain unexplored in this article and should be addressed in future studies; these include the molecular mechanisms by which Xi modulates gene expression on Xa, whether these mechanisms are direct or indirect, and whether these mechanisms also affect gene expression on autosomes.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

RESOURCE AVAILABILITY
Lead contact Further information and request for resources and reagents should be directed to and will be fulfilled by lead contact, David C. Page (dcpage@wi.mit.edu).

Materials availability
Cell lines are available upon request to the lead contact.
Data and code availability d Raw, de-identified RNA-sequencing data from human cell cultures has been deposited to dbGaP under accession number phs002481.v2.p1, and processed data has been deposited at Zenodo under accession number https://doi.org/10.5281/ zenodo.7504743. d This paper analyzes existing, publicly available data. Accession numbers for these datasets are listed in the key resources table. d Original code has been deposited at Zenodo under accession number https://doi.org/10.5281/zenodo.7504743 and is publicly available as of the date of publication. d Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Human subjects Adults (18+ years of age) with sex chromosome aneuploidies or euploid controls were recruited through an IRB-approved study at the NIH Clinical Center (12-HG-0181) and Whitehead Institute/MIT (Protocol #1706013503). Informed consent was obtained from all study participants. Individuals with a previous karyotype showing non-mosaic sex chromosome aneuploidy were included in the study. From these individuals, blood samples and skin biopsies were collected at the NIH Clinical Center and shipped to the Page lab for derivation of cell lines. In addition, blood samples from individuals with sex chromosome aneuploidies, and euploid family members, ranging in age from 4-44 years were contributed by the Focus Foundation. Additional LCLs and fibroblast cultures were obtained from the Colorado Children's Hospital Biobank and Coriell Research Institute, and cultured in the Page laboratory for at least two passages prior to collection for RNA-sequencing. Karyotyping of peripheral blood and fibroblast cell cultures was performed at the National Human Genome Research Institute Cytogenetics and Microscopy Core. To reduce the impact of sex chromosome mosaicism on our sex chromosome aneuploidy analysis, we excluded individuals with >15% mosaicism for other karyotypes. Metadata for cell lines represented in the RNA-sequencing dataset are provided in Table S1.

METHOD DETAILS
Cell culture Lymphoblastoid cell lines Blood was collected in BD Vacutainer ACD tubes and shipped at room temperature to the Page Lab for processing 1-3 days after collection. The buffy coat was resolved by centrifuging blood at 3300 rpm for 10 min, transferred to a new tube with PBS, and subjected to density gradient centrifugation in 50% Percoll (Cytiva) at 3300 rpm for 10 min. Lymphocytes were transferred to a new tube and washed twice with PBS. Lymphocytes were resuspended in 3 mL complete RPMI medium (RPMI 1640 (Gibco), 25mM HEPES (SAFC), 15% FBS (Hyclone), Fungizone (Amphotericin B, Gibco), Gentamicin (Gibco), Penicillin/Streptomycin (Lonza), pH 7.2) per tube of blood and transferred to a T25 flask, supplemented with 0.25mL EBV (produced by B95-8 marmoset lymphoblasts), and 0.2 mL of 1 mg/mL cyclosporine (LC Laboratories). They were incubated for one week at 37 C, fed 1-2 mL complete RPMI, and incubated for another week at 37 C. Once the media began to turn yellow (acidified), cultures were ''half-fed'' by removing half of the media and replacing it with double the volume. When cultures reached 15 mL, they were transferred to T75 flasks, and gradually expanded to 30 mL, while maintaining a concentration of <1 million cells/mL to ensure viability. Cells were viably frozen for future use by mixing with freezing media (LCL culture media + 5% DMSO), 1 million cells per vial. Cells were also preserved for RNA, DNA, and protein extraction (see below).

Primary fibroblast cultures
Our protocol for generating primary skin fibroblast cultures from a skin biopsy is based on Vangipuram et al. 59 From adults (18+ years of age) at the NIH Clinical Center we obtained two 4-mm skin punch biopsies from the upper arm, which were immediately placed into a 15 mL conical tube with 10 mL of media (DMEM/F12 (Gibco), 20% FBS, and 100 IU/mL Penicillin-Streptomycin. Tubes were shipped to the Page lab overnight on ice for processing. Each biopsy was used to generate a separate skin fibroblast culture. Biopsies were cut into 18 pieces of equal size and placed 3/well in gelatinized 6-well plates with 1 mL media (High Glucose DMEM (Gibco), Cell Genomics 3, 100259, February 8, 2023 e3 Article ll OPEN ACCESS 20% FBS, L-Glutamine (MP Biomedicals), MEM Non-Essential Amino Acids (Gibco), 100 IU/mL Penicillin/Streptomycin (Lonza)). Plates were gelatinized by incubating 1 mL sterile 0.1% gelatin (Sigma) solution per well for 30 min at room temperature.
Plates were incubated for 1 week at 37 C without disturbance to allow biopsies to attach to the plate and begin to grow out. During week 2, we added 200 mL of fresh media per well every 2-3 days, being careful not to disturb the biopsies. The following week (week 3), we aspirated the media and replaced with 1 mL fresh media per well every 2-3 days. During week 4, we aspirated the media and replaced with 2 mL fresh media per well every 2-3 days. At this point, the fibroblasts generally reached the edges of the wells and were expanded to two T75 gelatinized flasks per 6 well plate. After two days, we combined the cells from the two T75 flasks and split them to three T175 gelatinized flasks. After two days, cells were viably frozen with 1 million cells per vial in freezing media (fibroblast culture media + 5% DMSO). Cells were also preserved for RNA extraction (see below). During optimization of the protocol, cell culture purity was confirmed by immunofluorescence of SERPINH1, a fibroblast marker.
Cell collection for subsequent analysis Cells were collected when LCL cultures reached 30mL, and fibroblasts were $80% confluent in three T175 plates. All cell counting was performed using the Countess II cell counter (Life Technologies) and Trypan Blue exclusion. Cultures with >85% cell viability were used in subsequent experiments. To preserve cells for subsequent RNA extraction, 1 million cells were washed in PBS, pelleted, and resuspended in 500 mL TRIzol (Invitrogen) or 200 mL RNAprotect Cell Reagent (Qiagen). Cell suspensions were then frozen at À80 C. Cell cultures were maintained at low passage number; RNA-sequencing experiments were performed on samples at or below passage 4.
Periodically, and on each passage used for experiments, cell cultures were confirmed negative for mycoplasma contamination using either the MycoAlert Kit (Lonza) following the manufacturer's instructions, or PCR using SapphireAmp Fast PCR Master Mix (Takara) and the following primers: Myco2(cb): 5 0 CTTCWTCGACTTYCAGACCCAAGGCAT-3 0 Myco11(cb): 5 0 ACACCATGGGAGYTGGTAAT-3 0 PCR for GAPDH was performed on the same sample, using the following primers: hGAPDH-F: TGT CGC TGT TGA AGT CAG AGG AGA hGAPDH-R: AGA ACA TCA TCC CTG CCT CTA CTG. Known mycoplasma positive and negative samples were used as a reference.
RNA extraction, library preparation, and sequencing RNA was extracted from 1 million cells per experiment using the RNeasy Plus Mini Kit (Qiagen) following the manufacturer's instructions, with the following modifications: Cells in RNAprotect Cell Reagent were thawed on ice, pelleted, and lysed in buffer RLT supplemented with 10mL b-mercaptoethanol per mL. For most samples, ERCC control RNAs were added to the lysate based on the number of cells: 10mL of 1:100 dilution of ERCC control RNAs was added per 1 million cells. The lysate was then homogenized using QIAshredder columns (Qiagen), and transferred to a gDNA eliminator column. All subsequent optional steps in the protocol were performed, and RNA was eluted in 30 mL RNase-free water. RNA levels were measured using a Qubit fluorometer and the Qubit RNA HS Assay Kit (ThermoFisher). Before we switched to the per-cell spike-in protocol, we prepared 18 samples in which ERCC control RNAs were added based on amount of RNA after isolation: 2 mL of a 1:100 dilution of ERCC control RNAs was added per 1 mg of RNA. These samples are: #2237, 2245, 6312, 711, 4032, 706, 3429, 3430, 3442, 2690, 2703, 3107, 5297, 5566, 5755, 6029, 2547, and 525. RNA quality control was performed using the 5200 Fragment Analyzer System (Agilent); we consistently purified high-quality RNA with RNA integrity numbers (RIN) near 10. We randomized the samples by karyotype into batches for RNA extraction, library preparation, and sequencing. RNA sequencing libraries were prepared using the TruSeq RNA Library Preparation Kit v2 (Illumina) with modifications as detailed in Naqvi et al, 60 or using the KAPA mRNA Hyper-Prep Kit V2 (Roche). In both cases, libraries were size selected using the PippinHT system (Sage Science) and 2% agarose gels with a capture window of 300-600 bp. Paired-end 100x100 bp sequencing was performed on a HiSeq 2500 or NovaSeq 6000 (Illumina). Table S1 lists the library preparation kit and sequencing platform for each sample.
RNA-seq data processing and analysis All analyses were performed using human genome build hg38, and a custom version of the comprehensive GENCODE v24 transcriptome annotation. 52 This annotation represents the union of the ''GENCODE Basic'' annotation and transcripts recognized by the Consensus Coding Sequence project. 61 Importantly, the GENCODE annotation lists the PAR gene annotations twice -once on Chr X and once on Chr Y-which complicates analysis. We removed these annotations from Chr Y so the PAR genes are only listed once in our annotation, on Chr X. To analyze samples in which ERCC spike-ins were added, we merged our custom transcript annotation with the ERCC Control annotation.
Reads were pseudoaligned to the transcriptome annotation, and expression levels of each transcript were estimated using kallisto software v0.42.5. 54 We included the ''-bias'' flag to correct for sequence bias. The resulting count data (abundance.tsv file) were imported into R with the tximport package v1.14.0 62 for normalization using DESeq2 v1.26.0. 55 For downstream analysis, we used only protein-coding genes (as annotated in ensembl v104) with the following exceptions: we included genes annotated as pseudogenes on Chr Y that are members of X-Y pairs (TXLNGY, PRKY) and well-characterized long non-coding RNAs (lncRNAs) involved in X-inactivation or other processes (XIST, JPX, FTX, XACT, FIRRE, TSIX). We annotated genes distal to XG, which spans the e4 Cell Genomics 3, 100259, February 8, 2023 Article ll OPEN ACCESS pseudoautosomal boundary on Xp and is truncated on Chr Y, as part of PAR1 -15 genes in total. PAR2 comprised the four most distal genes on Xq and Yq. Annotations of non-pseudoautosomal region of the X (NPX) genes with homologs on the non-pseudoautosomal region of the Y (NPY) were derived from Bellott et al. 63 224 protein-coding genes on Chr 21 (ensembl v104) were used as a starting point for our analyses. We excluded 21 annotated genes in several regions with high homology between the long and short arms of Chr 21 because the assembly was not fully validated in these regions (https://www.ncbi.nlm.nih.gov/grc/human/issues? filters=chr:21).
Identifying genes affected by changes in chr X, Y, or 21 copy number We first defined lists of expressed NPX, NPY, PAR, or Chr 21 genes as those with median TPM of at least 1 in 46,XX or 46,XY samples. To ensure that no genes with robust expression were excluded, we also analyzed LCL and fibroblast expression data from GTEx, 53 and included several genes that were just below our TPM cutoff but had median TPM of at least 1 in those datasets.
For each expressed NPX, NPY, or PAR gene we performed linear modeling using the lm() function in R. These calculations suppose that each additional chromosome adds a consistent and equal increment to the total expression level of the gene in question.
For NPX and PAR genes we used the following equation: E represents the expression (read counts) per gene, b 0 represents the intercept, b X and b Y are the coefficients of the effect of additional copies of Chr Xi or Y, respectively, and˛is an error term. For this equation, the intercept represents the 45,X samples.
For NPY genes we employed the following equation, analyzing only those samples with one or more copies of Chr Y: The resulting p values were adjusted for multiple hypothesis testing using the p.adjust() function in R, specifying the Benjamini Hochberg method. Genes with a false discovery rate (FDR) < 0.05 were considered significant. To compute the normalized expression change per Chr Xi (DE X ) or Y (DE Y ), we divided the coefficient of interest (b X or b Y ) by the average intercept across batches, which corresponds to the baseline expression of the gene in samples with only one X chromosome (for NPX and PAR genes) or one Y chromosome (in the case of NPY genes). For Chr 21, we computed DE 21 by dividing the coefficient (b 21 ) by the average intercept across batches divided by two to obtain the average expression from one copy of Chr 21.
Assessing linearity of sex-chromosome gene expression changes To assess whether sex-chromosome gene expression changed linearly (i.e., by a fixed amount) with additional X or Y chromosomes, their expression levels across the LCL or fibroblast samples were fit by non-linear least squares to the power curves shown below, using the ''nlsLM'' function from the R package ''minpack.lm''.
NPX genes: where y Ã j = y j y k PAR genes: NPY genes: In each of the equations above, y Ã j is the normalized RNA-seq read count for a given gene in sample j, given by the raw read count in sample j divided by the average read count in the set of samples S f$g with only one chromosome of the relevant type: for NPX genes, 1 copy of Chr X (and any number of Y chromosomes); for PAR genes, 45,X samples; for NPY genes, 1 copy of Chr Y (and any number of X chromosomes). b = 0:5 and a = 1 were used as initial parameter values. Fitted values of az1 indicate a linear relationship between expression and sex-chromosome count. Fitted values of az0 or bz0 indicate no change in expression with X or Y count.

DE X calculations in samples with 0 Y chromosomes (females) and 1 Y chromosome (males)
We took subsets of the samples with either zero Y chromosomes (females) and with one Y chromosome (males) and performed the same linear modeling and DE X calculations as above. We removed MAP7D2 in female LCLs, IL13RA2 in female fibroblasts, and FHL1 in male fibroblasts because their error values (likely due to smaller sample size) were much higher than those of other genes. To compare the linear modeling results, we performed Pearson correlations between the results using all samples, and those from male-only or female-only samples.
Reanalysis of array data and comparison to RNA-seq data A previous study performed gene expression analysis, using Illumina oligonucleotide BeadArrays, of LCLs from 68 individuals of the following karyotypes: 45,X; 46,XX; 46,XY; 47,XXX; 47,XXY; 47,XYY; and 48,XXYY. 24 Since this microarray dataset was generated from an independent set of samples, we sought to validate our results through a reanalysis of the data.
The raw data from the microarrays was not publicly available, but the authors provided us pre-processed data upon request, which we used to perform our analysis. To identify genes that cleared a minimum signal threshold to be considered expressed in the microarray data, we assessed the median signal in 46,XY samples for all Chr Y genes annotated on the microarray. We focused on Chr Y genes in this analysis because many are known to be expressed exclusively in testes, and therefore could provide us with an appropriate sense of the background signal expected for genes not expressed in LCLs. From this analysis, we concluded that a signal threshold of 111 would be appropriate for identifying expressed genes ( Figure S9A). We used this threshold to identify 278 expressed Chr X genes (including PAR and NPX genes) in the microarray dataset. This was fewer than the 341 expressed Chr X genes identified in our LCL RNA-seq data, but more than double the 121 expressed Chr X genes reported in Raznahan et al. (Table S4 in Raznahan et al.). 24 This discrepancy could not be resolved by simply increasing the signal threshold in our analysis, as TMSB4X, one of the most highly expressed genes in LCLs, was excluded from the previously reported list of expressed genes.
Using our list of 278 expressed genes from the Raznahan et al. dataset, we analyzed the microarray signal values (in place of RNAseq read counts) using linear models as a function of Xi copy number, controlling for Chr Y copy number. We calculated DE X values from the microarray data and compared these to our RNA-seq dataset using a Pearson correlation, which revealed that the results were generally concordant ( Figure S9B). For genes that were lowly expressed, however, DE X values tended to be much lower in the microarray dataset, consistent with the higher sensitivity of RNA-seq data.
Isoform-specific analysis of RNA-seq data After estimating counts for each transcript using kallisto software (as described above, with 100 bootstraps), we used sleuth v0.30.0 to normalize those transcript counts. 56 X chromosome transcripts were called expressed if their corresponding gene was on the list of expressed genes (above) and median transcript counts were >200. Linear regressions and DE X calculations for transcripts were performed as for genes (above) to identify transcripts whose abundance changes significantly with additional copies of Chr X.
The following ENCODE datasets were used for visualization in IGV software 58 at the UBA1 locus.  36 including the ExAC dataset were downloaded from http://genic-intolerance.org/data/RVIS_Unpublished_ExAC_May2015.txt. Updated probability of haploinsufficiency (pHI) scores 37 were downloaded from https://www.deciphergenomics.org/files/downloads/ HI_Predictions_Version3.bed.gz. To complement these data, we obtained a list of genes with observed homozygous loss-of-function variants. 35 For sensitivity to an increase in gene dosage, we used the per-gene average probability of conserved miRNA targeting scores (P CT ). 38 For each metric, we computed a percentile rank score, ranking from most-to least-constrained. Because several of the metrics calculated scores separately for autosomal (including PAR genes) and NPX genes, we ranked autosomal (and PAR) genes separately from NPX genes. All annotated genes, regardless of expression status in LCLs or fibroblasts, were included in the rankings, with the following exceptions: 1) NPX genes previously annotated as ''ampliconic'', 64,65 since constraint metrics cannot be accurately applied to these highly similar genes, and 2) genes with <2 annotations across all metrics.
To obtain an aggregate sense of a gene's expression constraint across multiple metrics, we calculated the average. Among NPX genes, we considered those with |DE X R 0.1| (FDR < 0.05) to be most likely to contribute to phenotypes mediated by Xi copy number, prioritizing the top ten genes by the average gene-constraint metric. For PAR1 genes, we prioritized genes with an average gene constraint percentile ranking of at least 50%. To assess the phenotypic roles of highly constrained genes, we annotated them for disease phenotypes with known molecular basis from Online Mendelian Inheritance in Man (OMIM). 66 Comparisons to published annotations of X-inactivation status We re-compiled XCI status annotations of individual genes from four studies of Chr X allelic ratios. [14][15][16]18 Previous XCI status compilations 9 incorporated DNA methylation data, which we excluded because it does not directly measure Xi transcription. Previous compilations also incorporated information about expression in human-rodent hybrid cell lines carrying a human Xi 13 ; we incorporated this information only where allelic ratios in human cells were not available. Our final XCI status annotations are listed in Table S6, with the workflow for generating these annotations explained below.
The first dataset that we incorporated was derived from paired genomic and cDNA SNP-chips in skewed LCL and fibroblast cell cultures (Additional file 7 from Cotton et al.). 14 We used the AR values provided (average Xi expression column) for genes informative in at least 5 samples, resulting in AR values for 424 genes. Using the provided numbers of informative samples and standard deviations of AR values, we computed 95% confidence intervals for the AR values. We considered a gene ''Subject'' to XCI if the AR 95% confidence interval included zero or the AR value was <0.1; otherwise we considered the gene to ''Escape''.
The second dataset that we incorporated was derived from bulk or single cell RNA-seq of LCLs. 15 The bulk RNA-seq was from an individual in the GTEx dataset with 100% skewed XCI across the body (Table S5 from Tukiainen et al.). 15 The single-cell RNA-seq in LCLs was from three individuals (Table S8 from Tukiainen et al.; we excluded data from one dendritic cell sample). 15 For each dataset, we calculated an AR for each gene using read counts from the more lowly and highly expressed alleles in each sample, and used the provided adjusted p-values to identify genes with significant Xi expression (padj < 0.05). For a gene to be considered informative, we required data from at least two individuals in the single cell dataset, or one individual in the single cell dataset and informative in the bulk RNA-seq dataset, resulting in 82 informative genes. We called a gene as ''Subject'' to XCI if there was no significant expression from Xi in either the bulk or single-cell datasets, and ''Escape'' if one or both of the datasets showed evidence of Xi expression.
The third dataset that we incorporated was derived from single-cell allelic expression in fibroblasts. 16 The dataset includes five individuals (Dataset 3 from Garieri et al.) 16 and we required data from at least two samples to be considered informative for a given gene, resulting in 203 genes. We converted their reported values (Xa reads/total reads) to AR values using the following formula: AR = 1 Xa reads total reads À 1. We used the AR threshold calculated in the previous study 16 to consider a gene significantly expressed from Xi in each sample (AR > 0.0526). If a gene had no samples with significant expression from Xi or a mean AR value < 0.1 across samples, we considered it ''Subject'' to XCI; otherwise, it was judged to ''Escape'' XCI.
The fourth dataset that we incorporated was derived from allele-specific bulk RNA-seq performed on 136 samples with skewed XCI from the set of GEUVADIS LCLs (Tables S4 and S5  Article ll OPEN ACCESS that it be informative in at least 10 samples, resulting in 215 genes. We calculated an AR for each gene in each sample using the read counts from the more lowly and highly expressed alleles in each sample, adjusting for the level of skewing in each sample. To identify genes that were significantly expressed from Xi across samples, we performed paired, two-sample, one-sided t tests using the t.test function in R, asking whether the raw (pre-adjusted for skewing) AR values were greater than the baseline AR given the level of skewing in each sample (baseline AR = 1 À skewing coefficient skewing coefficient ); we corrected the resulting p values for multiple comparisons with the p.adjust function in R using the Benjamini-Hochberg method. Genes were considered ''Escape'' if they had padj < 0.01, and ''Subject'' otherwise.
Next, we synthesized the calls from these four datasets. We assigned a gene as ''Subject'' if: 1) all studies were ''Subject'' or 2) most (>50%) studies were ''Subject'' and the average AR across all studies was <0.1. We assigned ''Escape'' if 1) most (>50%) studies were ''Escape'' or 2) 50% or fewer (but more than 0) studies were ''Escape'' and either i) there was more than one study with evidence of escape or ii) the average AR across all studies was R1. Finally, we assigned ''No call'' if the gene was not informative in any of the four datasets. For these genes, we investigated whether there were any calls using hybrids from Carrel et al. 13 as compiled in Balaton et al. 9 If a gene had no call in any of the four AR datasets, but had a proportion of expression in Xi hybrids <0.22, we considered the gene ''Subject''; genes with a greater proportion were called ''Escape.'' To compare our calls with previous XCI consensus calls, we made the following modifications to the Balaton list: XG had been listed as a PAR gene, but we excluded it from our list of PAR genes because it is located at the PAR boundary and truncated on the Y chromosome. We updated its annotation to escape (''E'') since the Balaton table lists evidence for escape. The Balaton table lists XIST as ''mostly subject'' to XCI, but given its exclusive expression from Xi, we updated its status to escape (''E''). We manually examined all genes on our list that were not found in the Balaton list to make sure that genes were not misclassified due to differences in official gene names. For those genes still not present in the Balaton list after this correction, we list ''No call''. To compare with our annotations, we grouped the Balaton calls into ''Escape'' if they were annotated as ''PAR'', ''Escape'', ''Mostly escape'', ''Variable Escape'', ''Mostly Variable Escape'', or ''Discordant''. We grouped Balaton calls into ''Subject'' if they were annotated as ''Mostly subject'' or ''Subject''.
We compared our new calls with the Balaton calls for the 423 genes expressed in fibroblasts or LCLs, finding 48 where they differed. Of these, nine had a call in Balaton, but no call in the newer datasets. For two genes (TCEAL3, TMSB4X), there was no call in Balaton, but newer data enabled a call to be made (both ''Subject''). Nineteen genes were called ''Subject'' in Balaton, but new data indicates that they have expression from Xi and we categorize them as ''Escape.'' The final eighteen genes were called ''Escape'' in Balaton, but new data suggested they have no expression from Xi. In total, our classification found 86 genes that ''Escape'', 315 genes that are ''Subject'' to XCI, and 22 genes with ''No call.''

Allele-specific expression analysis
This workflow is diagrammed in Figure S13.

SNP calling
We called SNPs in each RNA-seq sample with two X chromosomes (46,XX, 46,XX,+21, 47,XXY, 48,XXYY) following the Broad Institute's ''Best Practices'' workflow for identifying short variants in RNA-seq data (https://gatk.broadinstitute.org/hc/en-us/articles/ 360035531192-RNAseq-short-variant-discovery-SNPs-Indels-). To perform our skewing analysis, we filtered for SNPs with the following properties: 1) annotated in the dbSNP database, 2) located in an exon of an expressed gene, 3) displaying a minimum coverage of 10 reads, and 4) heterozygous with at least three reads mapping to each of the reference and alternative alleles. We excluded SNPs where the presence of two alleles likely represented technical artifacts rather than biallelic expression, including in WASH6P (SNPs map to multiple near-identical autosomal paralogs), ATRX (SNP in a mutation-prone stretch of Ts), and APOOL (SNPs within an inverted repeat). For samples with a copy of Chr Y, we excluded SNPs mapping to PAR genes, to avoid measuring allelic contributions of Chr Y. Identifying cell lines with skewed X chromosome inactivation We classified genes as ''Xa only'' (only expressed from Xa) if previously characterized as ''silenced'' and found here to have DE X < 0.05 (FDR > 0.5); see Table S6. We expect that in skewed cell lines, reads from Xa-only genes should be near or completely monoallelic ( Figure S12). For each SNP in Xa-only genes, we calculated the ''skewing coefficient'' by dividing the number of reads from the dominant allele by the total number of reads covering the SNP ( Figure S14). These coefficients range from 0.5 (equal expression of two alleles) to 1 (expression from a single allele). For each sample, we computed the median skewing coefficient across all SNPs in Xa-only genes, requiring a threshold of 0.8 to classify as skewed. Using simulations, we find that this level of skewing is unlikely to occur by chance (P < 1x10 À6 ), and we do not find evidence of such skewing for SNPs on Chr 8, an autosome with a similar number of expressed genes ( Figure S16).
Several samples had few (%5) informative SNPs in Xa-only genes, but many SNPs in other genes ( Figure S15). We interpret this to mean that these samples are highly skewed and that we do not observe enough RNA reads covering both alleles to count SNPs in Xaonly genes as informative. Between these highly-skewed samples and the samples with skewing coefficients of at least 0.8, we identified 21 LCLs and 10 fibroblast cultures with skewed XCI. Determining allelic ratios for X chromosome genes After identifying the skewed cell lines, we identified genes with informative SNPs values in at least three skewed samples of a given cell type. We then computed the allelic ratio (AR) at each informative SNP by dividing the number of reads from the more lowly Article ll OPEN ACCESS expressed allele by the number of reads from the more highly expressed allele. In cell cultures that are partially skewed, genes will appear more biallelic than in completely skewed cell cultures since there are two populations of cells with different active X chromosomes present -the ''major'' and ''minor'' cell populations. Using our skewing estimates, we adjusted the AR on a per-sample basis using the following formula: Where t is the estimated percentage of cells in the ''minor'' population (i.e., with the other X chromosome active compared to the ''major'' cell population), calculated by: 1 -skewing coefficient. For highly-skewed samples, we were unable to calculate a stringent skewing coefficient due to too few SNPs, so we set skewing coefficient = 1. As a result, it is possible that allelic ratios in these samples may be slightly overestimated if the skewing coefficients are in fact <1. Within each sample, we obtained the average AR for each gene by averaging across all informative SNPs in that gene's exons ( Figure S17; Table S6) and then calculated the mean AR across skewed samples to obtain a final per-gene AR estimate (Table S6).
To assess whether AR values for each gene were significantly greater than zero, we performed one-sided t tests using the t.test function in R, asking whether the AR values were greater than zero; we corrected the resulting p values for multiple comparisons with the p.adjust function in R using the Benjamini-Hochberg method (Table S6). We also repeated this analysis excluding highly skewed samples, since the skewing coefficients cannot be stringently determined. This removed some informative genes but did not significantly affect the AR values ( Figure S19; Table S6).
To identify genes whose AR and DE X values differ significantly, we performed one-sample, two-sided t tests for the AR values across samples, setting mu = DE X (Table S6). We selected genes with Benjamini-Hochberg adjusted p values < 0.1 as having significantly different AR and DE X values. From this list we excluded genes for which the 95% confidence interval of DE X values (1.96*SE) included the mean AR value, and those for which both DE X and AR were not significantly different from zero (FDR R 0.05).
We compared our AR values derived from LCLs or fibroblasts with the four published allelic-ratio datasets described in the above methods on generating XCI status calls ( Figure S18).