Definition of germ layer cell lineage alternative splicing programs reveals a critical role for Quaking in specifying cardiac cell fate

Abstract Alternative splicing is critical for development; however, its role in the specification of the three embryonic germ layers is poorly understood. By performing RNA-Seq on human embryonic stem cells (hESCs) and derived definitive endoderm, cardiac mesoderm, and ectoderm cell lineages, we detect distinct alternative splicing programs associated with each lineage. The most prominent splicing program differences are observed between definitive endoderm and cardiac mesoderm. Integrative multi-omics analyses link each program with lineage-enriched RNA binding protein regulators, and further suggest a widespread role for Quaking (QKI) in the specification of cardiac mesoderm. Remarkably, knockout of QKI disrupts the cardiac mesoderm-associated alternative splicing program and formation of myocytes. These changes arise in part through reduced expression of BIN1 splice variants linked to cardiac development. Mechanistically, we find that QKI represses inclusion of exon 7 in BIN1 pre-mRNA via an exonic ACUAA motif, and this is concomitant with intron removal and cleavage from chromatin. Collectively, our results uncover alternative splicing programs associated with the three germ lineages and demonstrate an important role for QKI in the formation of cardiac mesoderm.


INTRODUCTION
Pluripotent stem cells (PSCs), such as embryonic stem cells (ESCs) and induced pluripotent stem cells, can differentiate into any adult somatic cell (1). They are also valuable models for the study of development (2), regenerative medicine applications (3,4) and drug discovery/toxicity testing (5). Recent advances in human PSC differentiation have enabled the production of large numbers of nearly pure germ layer committed progenitors including definitive endoderm, mesoderm, ectoderm, and their progeny (6)(7)(8)(9)(10). This has enabled the detailed characterization of the transcriptional and epigenetic/chromatin states of these lineages in comparison with hPSCs, underscoring the diversity of their transcriptional outputs (7,8,(11)(12)(13)(14). However, the use of these approaches to investigate post-transcriptional regulatory programs underlying germ layer specification remains largely unexplored.
Alternative splicing enables the widespread quantitative and qualitative regulation of gene outputs (15,16). Accordingly, different cell and tissue types have unique splicing patterns that contribute to their identity, such as those observed in muscle and brain (17,18). Studies have also identified splicing patterns that are unique to stem and precursor cells, including embryonic stem cells (ESCs) compared to cardiac precursor cells (19,20), other late mesoderm cell types (21) and neural stem cells (22,23). Tissue-and developmental stage-specific alternative splicing, and indeed most RNA processing, are controlled in large part by RNA binding proteins (RBPs) (24,25).
Quaking (QKI) proteins represent one such family of RBPs. From the single Quaking locus in human and mouse at least three major RNA and protein isoforms are produced (26,27). Like other RBPs, QKI is multifunctional, regulating both nuclear and cytoplasmic RNA processing (28)(29)(30)(31)(32), with the QKI5 isoform being necessary and sufficient for splicing function (33). Although there is a clear role for QKI in the transition from myoblasts to myotubes (29), in smooth muscle (34,35) and cardiac muscle (36), it is not known whether QKI regulates earlier cell fate decisions. Other RBPs, including RBFOX2, hnRNPF/H and MBNL family members, are known to regulate alternative splicing programs that control pluripotency, self-renewal, and differentiation (22,37,38). However, the extent to which splicing programs and regulatory RBPs diverge during early cell fate lineage commitment, and their influence on downstream differentiation programs is unknown.
Here, we used RNA-sequencing (RNA-seq) to address these questions by quantitatively profiling both developmentally-regulated and lineage-specific alternative splicing, as well as RBP abundance dynamics, in H9 hESCs differentiated in parallel to definitive endoderm (DE), cardiac mesoderm (CM) or neuroectoderm (ECT) cells. These analyses provide evidence for distinct lineage-specific splicing programs, with a particularly marked difference between DE and CM cells. In fact, many alternative exons are spliced in an opposite manner in DE and CM cells. We further show that many of these differentially spliced exons contain -or are flanked by -motifs that are bound and regulated by QKI. Consistent with these observations, QKI levels increase in CM and decrease in DE, compared to undifferentiated hESCs, and disruption of QKI expression reveals that it is required for a substantial fraction of CM-specific splicing changes. One such target is exon 7 of Bridging INtegrator 1/Amphiphysin 2 (BIN1) transcripts. Through an unusual mechanism, inhibition of splicing of this exon by QKI is coupled to the promotion of adjacent intron removal, transcript maturation, and subsequent accumulation of BIN1 protein. Moreover, knockout of QKI prevents the transition from CM to cardiomyocytes, which likely is mediated in part by a requirement of QKI for expression of BIN1. Collectively, these findings uncover a key role for alternative splicing in defining early lineage-specific gene expression patterns, and implicate QKI as a cardiac mesoderm-enriched RBP that is required for the control of lineage-specific splicing and differentiation of cardiac myocytes.

Cell culture and differentiation
All cells were cultured at 37 • C. H9-hrGFP NLS (a generous gift from Seigo Hatada), NKX2.5→EGFP HES3 (a generous gift from David Elliot, Andrew Elefanty and Ed Stanley (Murdoch Children's Research Institute), and NKX2.5→EGFP QKI KO human embryonic stem cells were cultured on hESC-qualified Matrigel (Thermo Fisher/Corning) and supplemented with mTeSR1 (Stemcell Technologies) supplemented with 1% penicillin/streptomycin (Thermo Fisher/Gibco). They were routinely passaged using ReLeSR or Gentle Cell Dissociation Reagent (both from Stemcell Technologies) following the manufacturer's instructions and at times (such as recovery after cryopreservation) with the addition of Y27632 ROCK inhibitor (Thermo Fisher) according to the manufacturer's instructions. Each of the hESC lines were derived from a female, were confirmed to have a normal karyotype during the course of the study, and routinely tested negative for mycoplasma contamination. hESC transfections were performed using Lipofectamine Stem Reagent (Thermo Fisher) according to the manufacturer's instructions.
For endoderm differentiation cells were plated as above then first differentiated to anterior primitive streak for 24 h in CDM2 basal media supplemented with 100 ng/ml Activin A (R&D Systems or Stemcell Technologies), 2 m CHIR99021 (Tocris) and 50 nM PI-103 (Tocris), For the following 48 h, cells were cultlured in CDM2 basal media supplemented with 100 ng/ml Activin A (R&D Systems or Stemcell Technologies) and 250 nM LDN-193189 (Tocris) with one media change after the initial 24 h incubation.

Plasmids and tranfections/transductions
NKX2.5→GFP hESCs were co-transfected with pX458-sgQKI as described (39) or control empty vector along with pCEP-puro which contains a puromycin-resistance gene to allow for antibiotic selection and elimination of untransfected cells. One day later 1 g/ml puromycin was added and cells were selected for 3 days, and then expanded in the absence of puromycin. Upon verification of reduced QKI protein in expanded cells, then directly sequencing PCR products amplified with PrimeSTAR Premix HS (Clontech) from total population-level genomic DNA (extracted using the DNeasy Blood and Tissue kit (Qiagen) by following the manufacturer's directions) and analyzing with TIDE (40), limiting dilutions of 0.5 cells per well were performed and delivered to 96-well plates. These were monitored visually to confirm only single cells were present in each well, then expanded. Potential clonal KO lines were initially screened by western blot, then genomic DNA (extracted and amplified as above) was cloned into pCR4Blunt-TOPO (Thermo Fisher) then six different colonies derived from independent hESC clones were sequenced to confirm two copy disruption of the reading frame.

Western blotting
Proteins were extracted using RSB100 buffer (100 mM Tris-HCl pH 7.4, 0.5% NP-40, 0.5% Triton X-100, 0.1% SDS, 100 mM NaCl and 2.5 mM MgCl 2 ) plus cOmplete EDTA-Free Protease Inhibitor Cocktail (Sigma), by direct addition of this buffer to wells/plates containing cells and scraped with cell scrapers and collect by pipet. Lysates were then vortexed intermittently while kept on ice for approximately 30 minutes, then centrifuged at 14 000 RPM at 4C for 15 min. Clarified lysate was transferred to a clean tube and either used immediately or store at −80C. Protein quantification was performed using the Bradford protein assay (BioRad). Equal protein amounts were loaded and separated on 10 or 12% SDS-PAGE then transferred to 0.4 m nitrocellulose. Blocking was performed with 5% nonfat milk (NFM) in tris-buffered saline (TBS) for 1 h at room temperature (RT). Primary antibody incubations (including anti-panQKI, anti-QKI5, anti-QKI6, anti-QKI7 (all four from Antibodies Incorporated), anti-QKI (Bethyl), antialpha/beta-Tubulin, anti-NKX2.5 (both from Cell Signaling Technologies), anti-SOX17 (R&D Systems), anti-OCT4, anti-cTNT, anti-BIN1 (all three from abcam) and anti-GAPDH (from Sigma) were performed overnight at 4 • C in NFM in TBS containing 0.01% tween-20 (TBST), then membranes were washed at least 3 times with TBST at RT. Secondary infrared-conjugated antibodies from Li-Cor were added to NFM in TBST at a dilution of 1:20 000 or 1:15 000, and incubated for 1 hour at RT protected from light. Membranes were then washed 3 times for 5 min and then 1-2 times for 15 min each before visualization on Odyssey CLx imager (Li-Cor).

Immunofluorescence
Cells were fixed with 4% paraformalydehyde diluted in phosphate-buffered saline (PBS) for 15-30 min then washed with PBS 2-3 times and stored in PBS until stained. Cells were blocked/permeabilized with 1% BSA and 0.3% Triton X100 in PBS for 1 h at RT, and were then incubated overnight at 4 • C in primary antibody (these included anti-panQKI (Antibodies Incorporated), anti-QKI (Bethyl), anti-NKX2.5, anti-SOX2 (both from Cell Signaling Technologies), anti-OCT4, anti-cTNT, anti-BIN1 (all three from abcam), anti-SSEA3, anti-SOX17 (both from R&D Systems), and anti-GAPDH (Sigma)) diluted in the above mentioned buffer. The following day, they were washed 3 times with PBS, then incubated in species-appropriate secondary antibody AlexaFluor 568 or AlexaFluor 647 (Thermo Scientific) at 1:1000 dilution in the above blocking buffer with the addition of 1 g/ml DAPI) for 1 h at RT, and then washed 3 times with PBS. Cells were then imaged immediately or stored for later analysis. Imaging was performed with Opera Phenix High Content Screening microscope (Perkin Elmer). Fluorescence intensity values were calculated for each channel on a per-cell and per-well basis using Harmony software (PerkinElmber), and recorded then subsequently analyzed in GraphPad Prism. Low magnification live cell imaging for NKX2.5→EGFP shown in Supplementary Figure S5F were performed on an IX71 inverted fluorescence microscope using the FITC filter (Olympus). Image analyses were done using Fiji (v2.0.0-rc-69/1.52p).

RNA extraction, RT-qPCR and RT-PCR
RNA was extracted from cells by directly adding TRIzol (Thermo Fisher) to wells containing cells, or using the ReliaPrep RNA MiniPrep system (Promega), each according to the manufacturer's instructions. In the case of TRIzol-based extractions, nucleic acid was treated with Turbo DNase (Thermo Fisher) then RNA re-extracted with phenol/chloroform/isoamyl alcohol and chloroform extraction. Reverse transcription was performed with Super Script III Reverse Transcriptase (Thermo Fisher) using a combination of anchored oligo dT and random hexamers. Depending on the input material amount, cDNA was then diluted within a range of 1:10 to 1:50 then used for qPCR analysis with PowerUp SYBR Green Master Mix (Thermo Fisher) on a StepOnePlus Real-Time PCR machine (Thermo Fisher). For standard PCR, cDNA was added and amplified using Taq 2X Master Mix (NEB) with cycle number determined empirically to ensure no overamplification. PCR products were analyzed by either agarose gel or DNA1000 BioAnalyzer chip (Agilent) run on the 2100 BioAnalyzer (Agilent).

RNA sequencing and analysis
RNA was extracted with TRIzol (see above; Thermo Fisher) and rRNA was depleted using the RiboCop rRNA Depletion Kit (Lexogen). This was used as input to generate paired-end libraries using the TruSeq Stranded Library Kit (Illumina) which were then pooled and sequenced on two High Output NextSeq 550 flow cells (Illumina) with a 75 base paired-end protocol. Reads were mapped using STAR version 2.6.1c onto the human GRCh38 reference build, with the parameters recommended by the ENCODE consortium. DESeq2, version 1.20.0, was used to determine changes in gene expression following the authors' vignette.
For splicing analyses, we used VAST-tools (version 2.0.2, database has.16.02.18) and rMATS (version 4.0.2, GEN-CODE v30 annotation) using the default parameters for each. Raw output was filtered based on quality scores, requiring at least 10 total reads per event in at least one replicate for a condition. Differential splicing was assessed with the VAST-tools diff module. Reads mapped onto the UCSC Genome Browser (hg38) were used to make coverage tracks and screen shots of these were used to visualize read levels for transcript abundance and alternative splicing. The RNAseq data was deposited at Gene Expression Omnibus (GSE162649).

Computational analysis of CLIP and RNA-seq datasets
We collected functional genomic datasets from a variety of sources assaying RNA binding proteins, including EN-CODE (42), CLIPdb (43) and starBase (44). All datasets were indexed by their genomic coordinates, which were used to compare to the genomic coordinates of alternative exons and 250 bp of upstream or downstream introns using the RELI tool (45). In brief, RELI estimates the significance of the observed number of intersections between an input set of regions (e.g. alternative exon sites) and each member of a library of functional genomics datasets (e.g. CLIPseq peaks). Significance is estimated by comparing to a null model formed by randomly sampling non-significant sites in each corresponding group used in this study (e.g. exons that are expressed (>50 base mean of average normalized read counts) but do not significantly change in CM cell compared to DE cell dataset (P ≥ 0.5), and their flanking 250 bp of upstream and downstream intron sequence). The distribution of the expected intersection values from the null model resembles, and therefore is modeled as, a normal distribution, with the model parameters estimated from a series of 2000 sampling procedures. To measure the frequency of RBP binding to alternative CE events, we merged the coordinates of each region of interest (250 bp upstream intron, cassette exon, and 250 bp of downstream intron) and performed RELI analysis as above. The 'ratio' value indicates the binding frequency observed for a given RBP CLIP dataset. See below for statistical analysis used for RELI and RNA-seq.

Flow cytometry and fluorescence-activated cell sorting (FACS)
Cells were prepared for FACS by removal from plates/wells with Accumax (Innovative Cell Technologies Inc.) according to the manufacturer's instructions. Staining for CXCR4 surface marker was performed using APC anti-human CD184 (CXCR4; BioLegend) or with APC Mouse IgG 2a , k isotype control (BioLegend) following the instructions from the manufacturer. The cells were finally passed through a 40 m strainer then kept on ice before sorting with BD FACS Aria IIU (Beckton Dickinson) with assistance from core facility operator Mark Griffin. For analysis of EGFP levels in NKX2.5→EGFP cells, cells were detached with 1x Trypsin/EDTA (0.25%; Thermo Fisher) according to the manufacturer's instructions and either measured immediately or fixed with 2% paraformaldehyde dissolved in PBS if Nucleic Acids Research, 2022, Vol. 50, No. 9 5317 measured later on the Guava EasyCyte HT (Luminex) flow cytometer. For all flow/FACS applications, gating on relevant forward and side scatter parameters was performed to restrict any events not identified as single cells. FlowJo software was used for data analysis.

Liquid chromatography with tandem mass spectrometry (LC-MS/MS)
Sample digestion: The samples were prepared similar to as described (46). Briefly, the agarose bead bound purified proteins were washed several times with 50mM TEAB pH 7.1, before being solubilized with 40 ul of 5% SDS, 50mM TEAB, pH 7.55 and incubated at RT for 30 min. The supernatant containing proteins of interest was then transferred to a new tube and reduced by making the solution 10mM TCEP (Thermo, #77720) and incubated at 65 • C for 10 min. The sample is then cooled to RT and 3.75 ul of 1M iodoacetamide acid added and allowed to react for 20 min in the dark after which 0.5 ul of 2M DTT is added to quench the reaction. 5 ul of 12% phosphoric acid is added to the 50 ul protein solution. 350 ul of binding buffer (90% methanol, 100 mM TEAB final; pH 7.1) is then added to the solution. The resulting solution is added to S-Trap spin column (protifi.com) and passed through the column using a bench top centrifuge (30 s spin at 4000g). The spin column is washed with 400 ul of binding buffer and centrifuged. This is repeated three times. Trypsin is added to the protein mixture in a ratio of 1:25 in 50 mM TEAB, pH 8, and incubated at 37 • C for 4 h. Peptides were eluted with 80 ul of 50 mM TEAB, followed by 80 ul of 0.2% formic acid, and finally 80 ul of 50% acetonitrile, 0.2% formic acid. The combined peptide solution is then dried in a speed vac and resuspended in 2% acetonitrile, 0.1% formic acid, 97.9% water and placed in an autosampler vial.
All LC-MS/MS data were acquired using XCalibur, version 2.1.0 (Thermo Fisher Scientific) in positive ion mode using a top speed data-independent acquisition (DIA) method with a 3 s cycle time. The survey scans (m/z 400-1000) were acquired in the Orbitrap at 60 000 resolution (at m/z = 400) in profile mode, with a maximum injection time of 50 ms and an AGC target of 4 000 000 ions. The S-lens RF level is set to 60. DIA windows were staggered at 12 Da Isolation as described (47). HCD MS/MS acquisition is performed in profile mode using 15 000 resolution in the orbitrap, with the following settings: collision energy = 30%; maximum injection time 22 ms; AGC target 500 000 ions and scan range of 145-1450.
Database searching: Each sample had three bioreps and the DIA data were analyzed using Scaffold DIA (3.0.1). Raw data files were converted to mzML format using Pro-teoWizard (3.0.21054) (48). Analytic samples were aligned based on retention times and individually searched against pan human library.dlib with a peptide mass tolerance of 10.0 ppm and a fragment mass tolerance of 10.0 ppm. The digestion enzyme was trypsin with a maximum of 1 missed cleavage site(s) allowed. Only peptides with charges of +2 or +3 were considered. Peptides identified in each sample were filtered by Percolator (3.01.nightly-13-655e4c7-dirty) (49)(50)(51) to achieve a maximum FDR of 0.01. Individual search results were combined and peptide identifications were assigned posterior error probabilities and filtered to an FDR threshold of 0.01 by Percolator (3.01.nightly-13-655e4c7-dirty).
Peptide quantification was performed by Encyclopedia (1.2.2). For each peptide, the five highest quality fragment ions were selected for quantitation. Only peptides exclusive to each protein or cluster were used for quantification. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis were grouped to satisfy the principles of parsimony. Proteins with a minimum of two identified peptides were thresholded to achieve a protein FDR threshold of 1.0%.

Real-time bioenergetic analysis in intact cells
The XF24 Extracellular Flux Analyzer (Agilent, Santa Clara, USA) was used to measure bioenergetic function in intact, day 8 or day 15 cardiac-differentiated WT or QKI-KO (clone 4-2) HES3 NKX2.5→EGFP as described previously (52,53). In preliminary studies, we determined that 1.5 × 10 4 cells was the optimal number of cells/well to allow detection of changes in oxygen consumption rate (OCR) for subsequent experiments. The XF24 Extracellular Flux Analyzer records the changes in oxygen levels in real-time by utilizing specific fluorescent dyes. Prior to the bioenergetic measurements, the culture medium was changed to unbuffered DMEM without serum. Next, we measured indices of mitochondrial function. Oligomycin, FCCP (carbonyl cyanide 4-(trifluoromethoxy) phenylhydrazone), and antimycin A/rotenone (AA + Rot) were injected sequentially through ports of the Seahorse Flux Pak cartridges to reach 2 M, 1 M, 2 g/ml and 2 M, respectively. Key bioenergetic parameters such as basal respiration (resting cell respiration), ATP production (calculated from the drop in OCR, in response to the ATP-synthase inhibitor, oligomycin), proton leak (migrated protons to the matrix without producing ATP), maximal respiratory capacity (maximal oxygen consumption achievable by using the uncoupling agent FCCP), spare respiratory capacity (accessible mitochondrial reserve capacity under high bioenergetic demands), and non-mitochondrial O 2 consumption (a small amount of residual OCR related to O 2 consumption due to the formation of reactive oxygen species or function of extramitochondrial enzymes) were measured. All measurements were normalized to protein content, determined in each individual well.
For quantification of protein abundances by western blot, after scanning infrared signals on the Odyssey CLx (Li-Cor), the region of interest for each band was selected and the value recorded using Image Studio software (Li-Cor). The values measured were normalized to a 'loading control' value (either Tubulin or GAPDH), at which time the Student's t-test was used to determine if experimental treatments were statistically significant between groups. In some cases, these normalized protein abundances were converted to values relative to control by dividing each measured experimental replicate value by the mean control value to determine the ratio of experimental to control; mean control values were then displayed at a value of 1.0 with experimental mean values shown relative to control, each ±SD.
RT-qPCR Data were analyzed using the delta-delta CT method and reported as relative to HMBS 'housekeeping gene'; we also confirmed similar results using values obtained from EEF1A1 and GAPDH as 'housekeeping genes'. Statistical significance was measured using the Student's ttest with dCT values from biological replicates as input. In most cases data were displayed ddCT (2 -dCT ) values as these convey dynamic changes in various cellular states, e.g. during differentiation between lineages or progeny cell types. In some cases, data were displayed as relative to control, as described above.
RT-PCR/BioAnalyzer quantification was determined by calculating the ratio of PCR products' concentration (nM) and reporting mean percent included or product represented of the band(s) of interest ± SD.
Quantification of immunofluorescence data was performed with Harmony software (PerkinElmer). No correction was performed for very low signal as some is detectable in any given channel due to background, and as such, this was included in reported values. Population-level quantification (signal per well) was measured for statistical significance by directly comparing WT to KO signal (experimental signal (e.g. QKI, BIN1, etc.) was normalized to either DAPI or GAPDH signal) output with the Student's t-test. Quantification of fluorescence intensity signals on the per cell level was normalized as above (using DAPI or GAPDH) for each event, then these values were binned and a histogram of distribution was generated. Statistical significance was measured by the Mann-Whitney U test.
Splicing events calculated by VAST-tools were deemed significantly differential if they had a 95% likelihood of being differential (i.e. MV[abs(dPSI) at 0.95] > 0) and the mean PSI/PIR difference was greater than 10. Changes between lineages were displayed in violin plots and statistical significance between distributions of positive/negative events were measured by Mann-Whitney U test. GO analysis was done separately for CE/MIC and IR event types with a background comprising all genes that contain a splicing event of the type being tested that survived filtering, requiring a minimum three-fold enrichment. If categories mutually overlapped by >70%, only the more significantly enriched category was kept.
For RELI analysis the significance of the observed number of intersections, e.g. a Z-score and the corresponding P-value, along with an enrichment score (observed intersection count divided by expected intersection count) were calculated, and are provided in Supplementary Table S4. We enlisted a significance cutoff of adjusted P-value < 0.05, ratio > 0.05 and enrichment ≥ 2 to ensure any given RBP's binding was well-represented in our splicing datasets.
Quantification of EGFP levels by flow cytometry were determined by analysis using FlowJo software and the histogram output was also generated by this program. Statistical significance between mean values were measured between independent biological replicates and calculated with the Student's t-test.

Data availability
The RNAseq data was deposited at Gene Expression Omnibus (GSE162649); proteomics datasets were deposited at MassIVE (MSV000088762).

Identification of developmentally-regulated and lineagespecific alternative splicing
To identify alternatively spliced events in human pluripotent stem cells and their germ layer-committed progenitor cells, we differentiated H9 hESCs (undifferentiated, UD) in parallel to DE (7), CM (8) and ECT ( Figure 1A; (9)), and performed bulk RNA-sequencing (RNA-seq) to analyze their transcriptomes. We first measured known pluripotency-or lineage-specific biomarkers at the level of RNA (POU5F1/OCT4 for UD, SOX17 for DE, HAND1 for CM, and PAX6 for ECT; see Figure 1B  analysis of transcript level changes (57) shows a close clustering of the three sample replicates from each lineage, indicating a high level of reproducibility ( Supplementary Figure S1C). As noted previously (13) most protein-coding transcript abundance changes relative to UD (P ≤ 0.01 by Benjamini-Hochberg) were shared between germ layercommitted progenitor cells (Supplementary Figure S1D; Supplementary Table S1).
We used VAST-tools (58,59) to measure splicing changes in each lineage relative to UD hESCs and detected 588 events in ECT, 977 in DE, and 1144 in CM (dPSI ≥ 10, see Materials and Methods); Supplementary Figure S1E and Supplementary Table S2). The majority of these are alternative cassette exons or 3-27 nt microexons (CE/MIC), or retained introns (Supplementary Figure S1E). We observe a similar frequency of increased inclusion vs skipping among differentially spliced CE/MIC events in DE, CM and ECT, compared to UD cells ( Figure 1C). The majority of alternatively spliced exons detected in DE or CM cells are specific to either lineage (unlike for transcript abundance changes (Supplementary Figure S1D)), but more than half of the events detected in ECT cells overlap with the other lineages ( Figure 1C) irrespective of the cutoffs used (i.e. lower or higher dPSI values (Supplementary Table S2)). We observe a similar trend for intron retention (IR), however there are significantly more retained introns in CM cells and fewer in DE or ECT cells (P < 0.0001 by Mann-Whitney U); additionally the frequency of unique IR events in ECT cells is higher compared to CE/MIC events in ECT cells (Supplementary Figure S1F). Furthermore, hierarchical clustering of CE/MIC ( Figure 1D) or IR (Supplementary Figure  S1G) events revealed that CM and ECT splicing patterns are more similar to each other than to DE. These observations reveal that most alternatively spliced exons detected in DE and CM cells are spliced in a lineage-specific manner.
We validated exons strongly differentially spliced in either CM cells (RAI14 exon 11, SLK exon 13, and MARK3 exons 17 and 18) or DE cells (PICALM exon 17 and CASK exon 19) by RT-PCR ( Figure 1E). Interestingly, alternative exons in FOXP1 (exons 18A or 18B; (60)) and DNMT3B (exons 21 and 22; (61-63)), previously defined for their roles in controlling transitions between pluripotent and differentiated states, show relatively large inclusion level changes in CM cells, and to a lesser extent in ECT cells or DE cells relative to UD hESCs ( Figure 1E). An additional class of exons shows opposite splicing patterns between DE and CM cells; this class includes BIN1 exon 7, which, relative to UD cells is more included in DE cells, more skipped in CM cells, and relatively unaffected in ECT cells. Other examples such as CLSTN1 exon 11 and NF1 exon 23 show the opposite trend ( Figure 1E). Validation of these specific events further corroborates that DE and CM splicing patterns are the most divergent groups compared here ( Figure 1D). Direct comparison using VAST-tools of CM relative to DE revealed 1382 total alternatively spliced events ( Supplementary Figure S1E), including 144 additional unique alternative exons ( Figure 1F, compare to Figure 1C) and 682 distinct retained introns (Supplementary Figure S1H) not detected when compared to UD hESCs. Many of these are due to a modest alteration in inclusion in DE relative to UD, and an additional but opposite modest change in inclusion in CM relative to UD (Supplementary Table S2). Moreover, specific CE/MIC events detected by RNA-Seq were validated at a high rate by RT-PCR assays (dPSI correlation: DE R 2 = 0.87, CM R 2 = 0.94, ECT R 2 = 0.74, DE/CM R 2 = 0.98), supporting the robustness of the measurement (Supplementary Figure S1I). We also observed significant enrichment of functional terms associated with genes containing these differentially spliced exons that are consistent with the biology of each lineage: i.e. the term synapse part is enriched for ECT cells, various cell adhesion/junctionbased terms are enriched in genes with more skipping in DE cells, and contractile fiber part and related terms are associated with genes displaying increased exon inclusion in CM cells, ( Figure 1G). Finally, we used rMATS (64) to independently identify CEs in each of the above comparisons and intersected these with exons significantly alternatively spliced by VAST-tools analysis. This indicated a positive correlation between the two dataset outputs, and a similar pattern of overlap between lineages that we observed with VAST-tools CEs (Supplementary Figure S1J; Supplementary Table S3), comparable to previous observations (65). Collectively, these data suggest that alternative splicing in germ layer-committed progenitor cells represents a lineagespecific regulatory program that is orthogonal to transcriptional regulation.

Binding and motif enrichment analyses reveal a role for QKI in CM-specific alternative splicing
We hypothesized that lineage-specific alternative splicing is regulated by distinct RBPs, and tested this with multiple approaches. First, we re-purposed a transcription factor/chromatin data analysis method called Regulatory Element Locus Intersection or RELI (45) to measure RBPs binding in or around lineage-specific CE/MIC events (CM compared to DE) identified as significantly differentially spliced by both VAST-tools and rMATS outputs (Supplementary Figure S1H and Supplementary Tables S2 and S3). This generated a list of 68 skipped and 62 included exons. In brief, RBP-RELI uses a simulation-based procedure to systematically gauge the significance of the intersection between a set of input regions (e.g. and in this case, exons alternatively-spliced between CM and DE cells, together with 250bp of flanking upstream and downstream intron sequence) and each member of a large library of functional genomics experiments, in this case, peaks from over 450 RBP crosslinking immunoprecipitation (CLIP)-seq experiments ( (43,44,66). This measures the binding frequency (fraction of splicing events), enrichment (fold-change over background), and significance (P-value) of RBPs to the lineage-specific CEs and their flanking intronic regions, compared to a background set of expressed exons and introns whose splicing is unchanged. This approach limits experimental bias arising from focusing on a single RBP, and facilitates discovery-based analysis.
RBP-RELI analysis yielded a list of 55 total unique CLIP datasets representing 17 individual RBPs with significant binding in or around exons that are alternatively spliced in CM cells compared to DE cells (cutoff: P-adj < 0.05, ratio > 0.05, enrichment ≥ 2 (see Materials and Methods); Supplementary Table S4). We plotted the resulting Z-scores of the top 10 most significant (or fewer if observed) RBPs relative to their transcript binding region (upstream intron, exon, downstream intron) and whether higher inclusion or skipping was observed (Figure 2A). This analysis indicated that Quaking (QKI) is the most frequently observed RBP with CLIP peaks mapping in or around these lineage-specific CEs (14 occurrences out of 55 total significant RBP datasets; Figure 2A and Supplementary Table  S4). Moreover, QKI may directly bind up to 45% of transcripts alternatively spliced between CM cells and DE cells (measured by at least one CLIP peak mapping within the genomic coordinates encompassing the CE and its flanking intron sequences (Supplementary Figure S2A and Supplementary Table S4)). The most significant binding was observed in the exon or upstream intron associated with skipping, or in the downstream intron when associated with inclusion, consistent with previously described positional influences of QKI on alternative splicing (Figure 2A; (29,67)). We also used the output comparing alternatively spliced exons in CM cells to DE cells by rMATS to measure motif enrichment with rMAPS2 (56). Consistent with the RBP-RELI results, this showed significant (by Wilcoxon's rank sum test) position-dependent enrichment of the QKI binding motif ACUAA that correlated with either exon inclusion or skipping ( Figure 2B and Supplementary Table S5) (68)(69)(70). Specific examples are: CM cell-activated exons in NF1, CLSTN1, and MARK3 ( Figure 1E), which show QKI binding in the downstream intron ( Figure 2C and Supplementary Figure S2B); CM cell-repressed exons in RAI14 and SLK ( Figure 1E) show QKI binding in the upstream intron and/or exon ( Figure 2C and Supplementary Figure  S2B); the DE cell-repressed exon in PICALM ( Figure 1E) shows QKI binding in the downstream intron (Supplementary Figure S2B). Additional observations suggest a role for TIA1 and TIAL1 in promoting inclusion by binding introns downstream of these lineage-specific CEs, and for RBFOX2 acting as a repressor of these exons (Figure 2A). This was also apparent from the rMAPS2 analysis; additionally, we observed enrichment of the CU-rich PTB family binding motif, which was particularly associated with exon skipping (Supplementary Table S5). Collectively, these results highlight the use of RBP-RELI to effectively identify candidate splicing regulatory RBPs of alternative splicing programs and suggest a prominent role for QKI in the regulation of CM-specific alternative splicing.

QKI5 increases in CM cells and decreases in DE cells
Given the above results, we hypothesized that QKI and possibly other splicing regulatory RBPs' levels increase during CM-and decrease during DE-lineage commitment. We identified RBPs (71) whose abundance significantly change (base mean of average normalized read counts > 50; P ≤ 0.01 by Benjamini-Hochberg; log 2 fold change > 0.4) (57) in our RNA-seq data (Supplementary Table S1) and found that each germ layer representative cell type has a unique network of RBPs, with comparison of CM to DE cells representing the most divergent set (Supplementary Figure S2C). Likewise, comparing the proteomes of DE, CM and UD cell extracts by liquid chromatography/tandem mass spectrometry (LC- Figure  S2D; Supplementary Table S6). This provides further evidence suggesting that lineage-enriched RBPs direct lineagespecific splicing, and allowed for rapid identification of RBPs: (i) whose levels change in CM compared to DE and (ii) that show significant binding and/or motif enrichment in the CM cell compared to DE cell splicing datasets. We focused on this short list of candidates and found both QKI and PTBP2 increase in CM and decrease in DE cells at both the RNA (Supplementary Table S1) and protein (Figure 2D) level. Of these, QKI exhibited significant binding (PTB family members failed to pass statistical cutoff; Figure  2A) and more binding (45% of significant CEs for QKI in HepG2 replicate 1 compared to 37% for PTBP1 in HepG2 replicate 2; Supplementary Table S4 and Supplementary Figure S2A) by RELI analysis, and more significant and widespread motif enrichment (lowest P = 1.02 -6 for QKI compared to 7.72 -6 for PTBP1 by Wilcoxon's rank sum test; Figure 2B and Supplementary Table S5) to CM-specific alternatively spliced exons. Given these observations we chose to focus on QKI-regulated splicing during the exit from pluripotency and commitment to DE or CM. Western blot analysis of pluripotency (OCT4), DE (SOX17), or CM (NKX2.5) differentiation markers indicate specific and efficient differentiation of each cell type, and validated the changes observed for QKI protein by LC-MS/MS analysis ( Figure 2E). Specifically, QKI significantly decreased in DE and increased in CM cell extracts, relative to UD hESCs (P < 0.05 by Student's t-test; Figure 2E). Consistent with a greater magnitude of change in QKI splicing targets, comparison of its levels from CM cell lysates to DE cell lysates indicated an 81% increase in CM cells (**P < 0.01 by Student's t-test; Figure 2E). We also measured the abundance of QKI5, QKI6, and QKI7 isoforms relative to total QKI protein as previously described (33), and found that the most abundant (>95%) is the nuclear isoform QKI5 (Supplementary Figure S2E). Additionally, we verified that QKI protein is predominantly nuclear, and co-expressed with lineage-specific biomarkers (SSEA3 for UD, SOX17 for DE and NKX2.5 for CM; Supplementary Figure S2F). Collectively, these data support a prominent and direct role for QKI in the regulation of an alternative splicing program specifically linked to formation of the CM lineage.

Cardiac mesoderm-specific exons are responsive to the levels of QKI
To determine whether QKI promotes CM-specific alternative splicing, we used CRISPR/Cas9 editing to generate hESC QKI knockout (QKI-KO) cells expressing EGFP under the control of the endogenous NKX2.5 promoter, which is specifically activated in cardiac mesoderm/myocytes (55), and then measured the inclusion levels of several CMspecific alternatively spliced exons in this line. Disruption of the QKI ORF was performed using a guide RNA targeting exon one as previously described (39) and confirmed by analysis of PCR products derived from genomic DNA (Supplementary Figure S3A). We also selected and expanded single cells from which three clonal NKX2.5→GFP knockout hESCs were assessed. Clones Figure 2A) and/or rMAPS2 (see Supplementary Table S5) analyses in DE cells or CM cells relative to UD hESCs, or CM cells relative to DE cells; relative abundance is shown as row Z-score. (E) Western blot of protein extracted from triplicate cultures of UD hESCs, DE cells and CM cells probed with antibodies for (top) OCT4 (magenta), SOX17 (green), NKX2.5 (magenta), and tubulin (gray), or (bottom) probed with antibodies for tubulin (magenta) and panQKI (green), showing mean infrared signal ± standard deviation for panQKI relative to tubulin (rel IR signal ± SD) below (*P ≤ 0.05, **P ≤ 0.01 by Student's t-test).
the same two lesions, while clone 4-5 shares one of the same lesions and an additional unique lesion (Supplementary Figure S3A). QKI protein is undetectable in all three clonal and the mixed population of cells ( Figure 3A). Correspondingly, the inclusion of predicted QKI target exons changes. RAI14 exon 11 and BIN1 exon 7 cassette exons are skipped more in wild-type (WT) CM cells ( Figure 1E), consistent with the prediction that they are QKI-repressed ( Figure 2C), are more included in the QKI knock-out cells ( Figure 3B). Conversely, NF1 exon 23 and CLSTN1 exon 11, which are more included in CM cells ( Figure 1E), consistent with the prediction that they are QKI-activated ( Figure  2C and Supplementary Figure S2B respectively), are more skipped in the QKI knock-out lines ( Figure 3B). Also, PI-CALM exon 17, which is repressed in DE cells ( Figure 1E), likely via QKI binding its downstream intron (Supplementary Figure S2B) and a reduction in QKI5 protein levels ( Figure 2E), is less included in the absence of QKI (Supplementary Figure S3B). The CM cell-specific alternative exon observed in DNMT3B that was not predicted to be regulated by QKI is unaffected by QKI loss (Supplementary Figure S3C). These findings were independently validated following knockdown with two independent shRNAs targeting the QKI5 isoform (Supplementary Figure S3D-F and G). We used a CRISPR activation (CRISPRa) approach (72) to increase the transcription of endogenous QKI (also in NKX2.5→EGFP hESCs) and asked if this would produce opposite effects as those observed in the QKI-KO hESCs for the above splicing events. We confirmed transgene targeting to the AAVS1 loci by genomic DNA PCR (data not shown), and doxycycline-inducible expression of dCas9:VPR and QKI protein isoforms by western blot (Figure 3C). A negative control consisting of H9 hESC lysate showed no reactivity with the Cas9 antibody, and the no guide RNA control NKX2.5→EGFP + dCas9:VPR cell lysates showed basal levels of QKI protein ( Figure 3C; compare to Supplementary Figure S2D also). Induction of QKI lead to a gain of function corresponding to the predicted splicing outcome: QKI-repressed events in RAI14 and BIN1 showed more skipping, QKI-activated events in NF1 and CLSTN1 showed more inclusion ( Figure 3D), and the QKIindependent event in DNMT3B showed no change (Supplementary Figure S3H). Comparing the dPSI values for QKI-KO relative to WT hESCs, QKI overexpression (QKI-OE) relative to WT hESCs, CM cells relative to UD hESCs, DE cells relative to UD hESCs, and CM cells relative to DE cells for these events indicates a similar pattern for QKI-KO relative to WT hESCs and DE cells relative to UD hESCs, and also a distinct pattern in which CM cells compared to UD hESCs, CM cells compared to DE cells, and QKI-OE compared to WT hESCs are more similar ( Figure 3E). Therefore, these CM cell-specific exons bound by QKI are responsive to both increases and decreases in QKI protein levels.

QKI is required for efficient early cardiomyocyte differentiation
Our data provide evidence that QKI5 may directly promote up to 45% of CM lineage-specific CE splicing patterns (binding 59 out of 130 events; Supplementary Figure   S2A and Supplementary Table S4). We used the QKI null hESCs to compare the efficiency at which these and WT control cells form CM and early day 8 cardiomyocytes (8). QKI protein increases when WT hESCs differentiate to CM ( Figure 2E) and cardiomyocytes (∼6-fold increase relative to tubulin), but as expected is undetectable in UD and day 8 polyclonal QKI-KO cells subjected to the cardiomyocyte differentiation protocol ( Figure 4A). Interestingly, QKI KO does not affect NKX2.5→EGFP levels during CM differentiation ( Figure 4B) but the expression of this marker is significantly reduced upon further maturation to day 8 cardiomyocytes ( Figure 4B and C; *P < 0.05, **P < 0.01 and ****P < 0.001 respectively, by Student's t-test), suggesting that QKI may be required for early cardiomyocyte differentiation.
To further investigate this, we measured endogenous biomarkers for CM and cardiomyocyte cell fate in QKI-KO and WT NKX2.5→EGFP hESCs subjected to these differentiation protocols. Consistent with reporter EGFP levels, endogenous NKX2.5 mRNA is unchanged in day 4 CM, but is significantly lower in the 4-2 clone KO day 8 cardiomyocyte cells (P < 0.05 by Student's t-test; Figure  4D); cardiac Troponin T (cTNT; TNNT2 gene), however, is lower in both CM and cardiomyocyte QKI KO cells (Figure 4D). Additional measurement of mRNA biomarkers for CM progenitor cells showed ISL1 and GATA4 abundance is lower in QKI KO cells, but HAND1 and MEF2C do not significantly change (Supplementary Figure S4A). Thus, QKI loss may moderately disrupt CM cell differentiation or specifically perturb certain sub-populations of cardiac progenitor cells. We also used high content imaging to analyze these developmental transitions in WT and QKI-KO NKX2.5→GFP hESCs, and observed no gross morphological perturbations, nor significant changes in OCT4 protein levels in the QKI-KO cells (Supplementary Figure  S4B). However, on day 4 of differentiation to CM, endogenous NKX2.5 protein is significantly lower on a per-cell basis in the absence of QKI (P < 0.05 by Mann-Whitney U test; Supplementary Figure S4C) further supporting the notion of a moderate perturbation of CM differentiation in the absence of QKI. Interestingly however, by day 8, both NKX2.5 and cTNT protein levels are markedly lower in QKI-KO cells by western blot analysis ( Figure 4E). High content imaging analysis (on both a per cell ( Figure 4F) and population basis (Supplementary Figure S4D)) also indicate significantly reduced cTNT protein levels in day 8 QKI-KO cells (****P < 0.0001 by Mann-Whitney U and ***P < 0.001 by Student's t-test, respectively). Likewise, analysis of the morphology of QKI-KO cells indicates a reduction of branched cardiac muscle-like structures at day 8 ( Figure 4F and Supplementary Figure S4E). QKI is required for late (days 15 and 30) hESC-derived cardiomyocyte cell contractile function and proper expression of structural and molecular markers (73), however it is an open question as to whether cardiomyocyte bioenergetic function is affected in the absence of QKI. Reduction of cardiac mitochondrial respiration can lead to embryonic or neonatal lethality, and cardiomyopathy (74)(75)(76), and thus is a wellestablished metric for measuring cardiomyocyte function (77). We measured this in both WT and clone 4-2 QKI-KO hESCs subject to day 8 or day 15 cardiomyocyte differen-  tiation. Intriguingly, we observed significantly lower mitochondrial function in the absence of QKI in both day 8 and day 15 cardiomyocyte cells ( Figure 4G). Specifically, basal respiration, ATP production-coupled respiration, maximal respiration, and spare respiratory capacity were all significantly reduced in QKI-KO hESCs at day 8 and to a greater magnitude at day 15 (upper bound *P < 0.05 by Student's t test; Figure 4F). Therefore, QKI is required for proper mitochondrial function during both earlier and later cardiomyocyte differentiation. We also tested if RAI14 exon 11, NF1 exon 23 and BIN1 exon 7 inclusion levels were altered at various time points in these cells subjected to cardiac lineage differentiation. QKI is absolutely required for skipping of RAI14 exon 11; NF1 exon 23 inclusion increases during differentiation, but much less so in the absence of QKI; BIN1 exon 7 skipping increases during differentiation, but is little changed in the absence of QKI ( Figure 4H and Supplementary Figure S4F).Taken together, we conclude that QKI is required for proper cardiac differentiation.

Complex regulation of BIN1 by QKI during cardiac cell lineage commitment
Interestingly, Bin1 is required for cardiac development (78): specifically, either the absence or loss of expression of its cardiac-specific isoform results in improper calcium signaling or defective myofibril formation, respectively (79,80). Given its importance in cardiac development and homeostasis, evidence of direct QKI binding (Supplementary Figure S5A), and its tissue-specific splicing patterns, we explored the consequences of QKI loss on its overall expression ( Figure 5A). Accordingly, we cloned and sequenced full-length BIN1 cDNAs from both UD and day 8 cardiomyocyte-differentiated WT and QKI-KO cells, and observed the expected increase in exon 7 inclusion in the absence of QKI, but also more inclusion of BIN1 exon 13, the clathrin and AP (CLAP) binding domain-coding exons, and exon 17 ( Supplementary Figures S5A and 5A). Total BIN1 mRNA levels in WT cells increases dramatically replicates; ns = not significant, *P < 0.05, and **P < 0.01 relative to WT, determined by Student's t-test). (C) Flow cytometry analysis of NKX2.5→EGFP UD (gray), WT, or polyKO day 8 c.myo (light or dark red, respectively); mean percent GFP positive is shown above each histogram (± standard deviation; biological replicate n = 3) and ****P < 0.0001 measured by Student's t-test; y-axis denotes number of events measured. (D) RT-qPCR measuring abundance of RNA extracted from UD hESCs, d4 CM cells, and d8 c.myo cells for endogenous NKX2.5 and TNNT2 (cTNT) mRNAs relative to HMBS mRNA in WT, polyKO or 4-2KO cells (biological replicate n = 2 or 3 each; ns = not significant, *P < 0.05, **P < 0.01, measured by Student's t-test). (E) Western blot of protein extracted from NKX2.5→EGFP UD hESCs and d8 c.myo cells either WT or pKO for QKI and probed with antibodies for tubulin (green, top) cTNT (magenta, middle (TNNT2)) and endogenous NKX2.5 (green, bottom). (F) Indirect immunofluorescence with high content imaging analysis of NKX2.5→EGFP WT (left) or QKI polyclonal KO (right) d8 cardiomyocyte cells showing QKI5 (magenta), cTNT (yellow), EGFP (green), brightfield (gray), and DAPI (blue) with each fluorescent channel overlaid (scale bar indicates 100 m). On the right, the fluorescence intensity values per cell (relative to DAPI) are shown for QKI5 and cTNT with P values calculated by Mann-Whitney U test. (G) Real-time mitochondrial bioenergetic measurement of intact day 8 or 15 cardiac differentiated WT or QKI-KO 4-2 cells measuring basal respiration (BR), ATP production (ATP prod), proton leak (PL), maximal respiration (Max resp), spare respiratory capacity (SRC), and non-mitochondrial O 2 consumption (NMOC); the y-axes denote oxygen consumption rate (OCR) normalized to total protein content in pMoles/min/g protein (n = 9 or 10 independent replicates each; ns = not significant, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 by Student's t-test). (H) RT-PCR and BioAnalyzer gel-like image showing quantification of RAI14, NF1, and BIN1 alternative exons using RNA extracted from WT or polyKO NKX2.5→EGFP cells either UD, or subjected to the following differentiation protocols: day 2 lateral mesoderm (d2 lat meso), day 4 CM, day 6 cardiomyocyte (d6 c.myo), and day 8 cardiomyocyte (d8 c.myo) (biological replicate n = 3; mean percent included ± is shown below by heatmap; see Supplementary Figure S4G for mean numerical values and standard deviation from the mean). See also Supplementary Figure S4.      19 and 20); value shown is ddCt relative to HMBS and normalized to WT; n = 2 or 3 independent replicates, ns = not significant, *P < 0.05, **P < 0.01, ***P < 0.001 by Student's t-test compared to WT. (C) Western blot of protein extracted from UD WT, polyKO, and 4-2KO NKX2.5→EGFP hESCs probed with a BIN1 antibody that recognizes all isoforms (green) and GAPDH antibody (magenta); three independent biological replicates are shown, with mean protein levels relative to WT and normalized to GAPDH shown below ± standard deviation from the mean (ns denotes not significant by Student's t-test). during cardiac differentiation. In the absence of QKI, the level is lower in UD hESCs, but slightly higher than WT in day 8 QKI-KO cardiomyocyte cells ( Supplementary Figure S5B). Its protein levels in WT cells also increase during cardiac differentiation: a three-fold increase is observed by both western blot (Supplementary Figure S5C) and dataindependent acquisition (DIA) LC-MS/MS (Supplementary Figure S5D and Supplementary Table S7) when comparing UD hESCs to d8 cardiomyocyte cells. In QKI-KO cells, however, BIN1 protein levels are significantly lower in both CM (*P < 0.05 by Student's t-test) and cardiomyocyte cells (***P < 0.001 by Student's t-test; Supplementary Figure S5C). Moreover, the use of high-content imaging supported these findings when BIN1 protein was measured relative to GAPDH per well (Supplementary Figure S5E) and per cell in WT and QKI-KO day 8 cardiomyocytes (Supplementary Figure S5F). Therefore, during cardiac differentiation we observe complex changes in BIN1 expression in the absence of QKI.
To specifically determine how QKI regulates BIN1 transcript maturation during cardiac differentiation, we measured RNA and protein abundance, and splicing of exon 7 and the region encompassing exons 12 through 18 in WT and QKI-KO UD hESCs, CM cells, and day 8 cardiomyocyte cells. In UD QKI-KO hESCs, BIN1 mRNA is significantly reduced (upper bound *P < 0.05 by Student's ttest, see Figure 5B). Its protein levels do not significantly change, but there is a clear shift to a prominent band of a lower molecular weight ( Figure 5C and Supplementary Figure S5C). It is unknown if this represents an isoform change or gain/loss of a post-translational modification. As measured previously ( Figures 3B and 4G), exon 7 is significantly more included in the absence of QKI (****P < 0.0001 by Student's t-test; Figure 5D), and we observed more inclusion of exon 17 in the 4-2KO clonal cell line ( Figure 5E and Supplementary Figure S5G). We conclude that QKI is required for maintaining BIN1 transcript levels and exon 7 skipping in UD hESCs.
In QKI-KO cells subject to CM differentiation, there is a slight increase in BIN1 mRNA compared to WT (Figure 5F), but a significant reduction in its protein levels (*P < 0.05 by Student's t-test; Figure 5G). We also observe more inclusion of exon 7 in QKI-KO CM cells (Figure 5H), and a more dramatic change in its downstream exons: the CLAP domain exons and exons 13 + 17 are included more in KO, and exon 17 is skipped more ( Figure 5I and Supplementary Figure S5G). Therefore, in CM cells, QKI is required for BIN1 protein accumulation and proper splicing, but total BIN1 mRNA levels are unchanged in the absence of QKI.
We measured the 5 end of BIN1 mRNA and found a significant increase in its level (*P < 0.05 by Student's ttest) in QKI-KO cells subject to day 8 cardiomyocyte differentiation, but this was not apparent when we measured the 3 end ( Figure 5J). In the absence of QKI, we observed large magnitude reductions in BIN1 protein levels by western blotting (***P < 0.001 by Student's t-test; Figure 5K). Consistent with the above results ( Figure 4G), exon 7 is included more in the absence of QKI ( Figure 5L). Finally, the downstream alternative exons are more impacted in QKI-KO cardiomyocyte cells than in UD or CM cells: inclu-sion of CLAP domain-encoding exons increased, inclusion of exons 13 + 17 are significantly higher (***P < 0.001 or ****P < 0.0001 by Student's t-test for polyKO or 4-2KO, respectively) with a concomitant increase in exon 17 skipping and the 'skip all' isoform ( Figure 5M and Supplementary Figure S5G). It is unlikely that mis-processing of BIN1 alone is sufficient for the early human cardiomyocyte defects we observed in QKI-KO cells (Figure 4), since QKI loss results in over 1000 transcript-level and 461 splicing changes in QKI-KO day 15 cardiomyocytes (73). However, our data do indicate a loss of 'muscle-specific' (81) and a gain of 'neuron-specific' BIN1 isoforms (82)(83)(84)(85), in the absence of QKI (see also Discussion). Taken together, we conclude that QKI is required for cardiac muscle-specific BIN1 transcript processing and protein accumulation.

QKI promotes BIN1 exon 7 skipping and nuclear export
To explore the potential molecular mechanisms through which QKI regulates BIN1 exon 7 splicing, we generated a splicing reporter containing this exon, along with 126 bp of upstream and 184 bp of downstream intron sequence (41). Transfection of this reporter into HuH7 WT or QKI-KO cells (39,86) and analysis by RT-PCR showed a modest increase in exon 7 inclusion in the absence of QKI ( Figure  6A), consistent with QKI promoting skipping of this exon. Strikingly, the largest magnitude changes observed were an increase in unspliced reporter RNA and a decrease in exon 7 skipping ( Figure 6A). Transfection of HuH7 QKI-KO cells with myc:Qk5 fully repressed inclusion, and partially rescued unspliced RNA accumulation, despite its expression level being below that observed in WT HuH7 cells (Figure 6A and B). Interestingly, adding back an RNA-binding mutant version of Qk5 (K120A/R124A (87)) did not affect reporter splicing by a large magnitude, compared to vectortransfected control QKI-KO HuH7 cells ( Figure 6A and B). We conclude that QKI is necessary and sufficient to promote BIN1 exon 7 skipping and reduce accumulation of unspliced reporter RNA.
Closer inspection of BIN1 exon 7 and surrounding sequences revealed a potential exonic QKI motif ACUAAC and putative 'half-site' in the downstream intron CUAAC ( Figure 6C). Analysis of iCLIP reads from C2C12 myoblasts using the panQk antibody (33) indicated direct binding to this region (Supplementary Figure S6A). We hypothesized that these motifs are required for proper splicing, and tested this by generating exonic or intronic deletion or substitution mutants, and double mutants to ablate the respective motifs in the reporter then measured their splicing in WT HuH7 cells ( Figure 6C). Overall, deletion or substitution of the exonic motif resulted in more inclusion of exon 7, unspliced RNA, and mis-spliced RNA; the intronic mutants alone affected little to no, or very small magnitude changes ( Figure 6C and Supplementary Figure S6B). We next asked whether QKI levels impact both exon 7 skipping and intron retention of endogenous BIN1 transcripts using QKI-KO hESCs subjected to 8 day cardiomyocyte differentiation. We measured total BIN1 RNA levels (exon 6 or exon 8 internal primers), intron 6, exon 7 and intron 7 inclusion by RT-qPCR (see schematic and figure legend for more information; Figure 6D). Importantly, the concomi- , under each experimental condition described (error bars denote standard deviation from the mean; ns denotes not significant, *P < 0.05, **P < 0.01; note that in the absence of minus reverse transcription controls we cannot exclude the possibility that reporter DNA might have been amplified and co-migrate with 'unspliced'). (C) Overview graphic of pDUP-BIN1 exon 7 splicing reporter (top) with WT and mutant sequences shown in inset below. Representative RT-PCR and BioAnalyzer gel-like image (bottom) showing quantification of WT or mutant splicing reporter products derived from RNA extracts from WT HuH7 cells transfected in triplicate; the mean percentage observed of each product(s) is shown below the gel-like image represented by intensity plot (*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 by Student's t test; see also Supplementary Figure S6B for mean values ± standard deviation and minus reverse transcription controls). (D) RT-qPCR of RNA extracted from d8 c.myo NKX2.5→EGFP WT, polyKO, or 4-2KO cells measuring (from left to right) total exon 6 ('total'; primers internal to exon 6 including common exon 6 forward), intron 6 included ('int6+'; common forward primer in exon 6 with reverse primer in intron 6), exon 6-7 spliced ('ex7+'; common forward primer in exon 6 with reverse primer spanning exon 6:7 junction), exon 6-8 spliced ('ex7-'; common forward primer in exon 6 with reverse primer spanning exon 6:8 junction), exon 6-8 spliced ('ex7-'; forward primer spanning exon 6:8 junction with common reverse primer in exon 8) exon 7-8 spliced ('ex7+'; forward primer spanning exon 7:8 junction with common reverse primer in exon 8), intron 7 included (int7+; forward primer in intron 7 with the common reverse primer in exon 8), and total exon 8 levels ('total'; forward primer in exon 8 with the common reverse primer in exon 8) from the endogenous BIN1 transcript; each value is shown normalized to HMBS mRNA and relative to WT; graphic below bars indicates approximate location of forward and reverse primer pairs; n = 3 biological replicates, ns = not significant, *P < 0.05, **P < 0.01, ***P < 0.001, relative to WT by Student's t-test). (E) RT-qPCR of RNA extracted from d8 c.myo NKX2.5→EGFP WT or polyKO cells' cytoplasmic (cyto), nucleoplasmic (nucl), or chromatin (chr) fractions, measuring XIST (top) or 7SK (bottom) RNAs. Their levels are shown normalized to HMBS mRNA and relative to total (unfractionated) cell extract RNA levels (**P < 0.01, ***P < 0.001, ****P < 0.0001 by Student's t test). (F) Western blot of protein extracted from subcellular fractions of WT (left) or polyKO (right) d8 c.myo NKX2.5→EGFP cells as described above and probed with anti-GAPDH (magenta) and anti-histone H3 (green) antibodies. (G) RT-qPCR of RNA extracted from subcellular fractions of WT or polyKO d8 c.myo NKX2.5→EGFP cells as described above and measuring HMBS (left) or BIN1 (right) RNA, shown as the percentage of RNA observed by compartment, relative to total RNA from unfractionated extracts; ns denotes not significant, *P < 0.05, **P < 0.01, ***P < 0.001 by Student's t test.
tant increases in both exon and intron 7 inclusion observed in the reporter assays were also apparent in endogenous BIN1 transcripts in the absence of QKI ( Figure 6D). Therefore, QKI and the exonic motif in BIN1 exon 7 promote exon skipping and intron removal. We hypothesized that un-or mis-processed BIN1 RNA might be retained on chromatin in the absence of QKI, which could explain the increase in total RNA accumulation but the observed sharp reduction in BIN1 protein. To test this, we subjected WT or QKI-KO hESCs to an 8 day cardiomyocyte differentiation protocol, performed subcellular fractionation, and measured protein and RNA levels. The chromatin-associated lncRNA XIST was significantly enriched in the chromatin fraction (****P < 0.0001 or **P < 0.01 by Student's t-test), and the nucleoplasmic RNA 7SK was enriched in the nucleoplasmic fraction (Figure 6E). Western blotting analysis indicated the presence of GAPDH within the cytoplasmic fraction, and histone H3 in the chromatin fraction ( Figure 6F), therefore fractionspecific components are enriched in their cognate compartments. Next, we measured a transcript not predicted to be directly regulated by QKI (HMBS) in WT and KO cells, and observed no significant difference between their distributions ( Figure 6G). In contrast, total BIN1 RNA is significantly enriched in the chromatin fraction and depleted from the nucleoplasmic and cytoplasmic fractions (upper bound is P ≤ 0.05 by Student's t-test; Figure 6G). These observations support a model in which QKI promotes the processing of BIN1 transcripts to facilitate chromatin release and subsequent nucleocytoplasmic export, thereby facilitating BIN1-dependent functions during cardiogenesis.

Studies in model organisms indicate
Quaking is required for embryonic development (26,88,89), and particularly so in mesoderm/muscle formation (89)(90)(91)(92)(93)(94) and neural cell specification (88,95). However, its roles at the earliest stages of lineage specification and cardiomyocyte formation had not previously been investigated. Here, we uncovered that this RBP significantly increases or decreases during the exit from pluripotency and commitment to either mesoderm or endoderm, respectively. We also found extensive evidence of QKI direct binding to, and motif enrichment in and around, cardiac lineage-specific alternatively spliced exons. Additionally, we observed many differentiation-and lineage-specific changes in the levels of other RBPs, and certainly there are important regulatory interactions between some of these and QKI. However, given the significant representation of QKI described above, we propose that it is a mesoderm-enriched splicing regulator that critically shapes the transcriptome during lineage-commitment.
We also find that QKI is required for efficient early cardiomyocyte differentiation, and regulates multiple exons in genes linked to cardiac tissue development (78,96). Among these exons we focused on the QKI-dependent repression of exon 7 of BIN1. Interestingly, our data suggest that QKI regulates BIN1 splicing through a set of unique molecular mechanisms, in which QKI binds to an exonic sequence motif to coordinately promote skipping of exon 7, removal of intron 7, and then ligation of exons 6 and 8, which licenses nuclear BIN1 mRNA export; these regulatory events are required for BIN1 protein accumulation in hESC-derived cardiomyocytes ( Figures 5 and 6). An exonic ACUAA motif is critical for exon 7 skipping, downstream intron removal, and proper 3 splice site usage in our splicing reporter (Figure 6), suggesting that characterizing this motif simply as a 'splicing silencer' is inadequate. QKI and this exonic motif cooperate to ensure proper quantitative and qualitative expression of BIN1 in human cardiomyocytes, which is required for their differentiation and homeostasis. We do note it is unlikely that rescuing BIN1 expression alone in QKI-KO cells would be sufficient to restore cardiomyocyte differentiation, as QKI regulates thousands of transcripts during this process (73). Furthermore the Bin1 knockdown cardiac phenotype is more nuanced (80) than the one we have observed here, in qki -/- (89) and in other qk mutant mice (97). However, these BIN1-specific regulatory events are critically important: inclusion of BIN1 exon 7 in skeletal muscle is associated with centronuclear myopathy in myotonic dystrophy patients (98), and functions in controlling endocytosis via promoting an interaction with Dnm2 in neural cells, where it is positively regulated by SRRM4 (82). In post mitotic neurons, QKI is absent (95). Other BIN1 exons, 13, 13-16 (CLAP domain) and exon 17, included in neurons but skipped in cardiomyocytes, are also repressed by QKI ( Figure 5), suggesting that cardiac versus neuronal BIN1 splicing patterns are determined by QKI. Dysregulation of total BIN1 levels and splicing of these exons in the brain is correlated with Alzheimer's disease (99,100). Therefore, the tight regulation of BIN1 levels and isoform output is required for proper tissue homeostasis and differentiation in cardiac muscle, skeletal muscle and the brain, and we establish that QKI directly regulates these via RNA splicing.
The gene expression process is highly regulated, and subject to dynamic on/off states as well as finely tuned rheostat-like adjustments to properly respond to various inputs (101)(102)(103). Extensive mapping of chromatin/histone and promoter states, and transcription factor and RNA polymerase II occupancy during commitment of hESCs to each of the three germ layers has identified both unique and shared states between these different progenitor cell types (11,13). Consistent with previous findings (13), we observe quantitative changes in gene expression that are shared between the three germ layer derivatives (Supplementary Figure S1D). This implies that most transcriptional on/off states are developmentally regulated, and are not lineage-specific. In turn, these are targets of alternative splicing, which can have profound effects on the ultimate gene product repertoire. Interestingly, the splicing patterns of DE and CM cells are uniquely different from hESCs, and even more different from one another (Figure 1). Of particular interest is that several exons in genes coding for transcription/chromatin modifying factors that promote differentiation are markedly different between DE and CM cells. These resemble examples of splicing level regulation, the products of which can profoundly alter the transcriptional response, either directly (i.e. splicing of the transcription factor FOXP1, which can control transitions between pluripotency and differentiation (60)) or indirectly by altering chromatin states (i.e. splicing of the DNA methylase DNMT3B, which can control DNA methyltransferase ac-