PABPN1 loss-of-function causes APA-shift in oculopharyngeal muscular dystrophy

Summary Alternative polyadenylation (APA) at the 3′ UTR of transcripts contributes to the cell transcriptome. APA is suppressed by the nuclear RNA-binding protein PABPN1. Aging-associated reduced PABPN1 levels in skeletal muscles lead to muscle wasting. Muscle weakness in oculopharyngeal muscular dystrophy (OPMD) is caused by short alanine expansion in PABPN1 exon1. The expanded PABPN1 forms nuclear aggregates, an OPMD hallmark. Whether the expanded PABPN1 affects APA and how it contributes to muscle pathology is unresolved. To investigate these questions, we developed a procedure including RNA library preparation and a simple pipeline calculating the APA-shift ratio as a readout for PABPN1 activity. Comparing APA-shift results to previously published PAS utilization and APA-shift results, we validated this procedure. The procedure was then applied on the OPMD cell model and on RNA from OPMD muscles. APA-shift was genome-wide in the mouse OPMD model, primarily affecting muscle transcripts. In OPMD individuals, APA-shift was enriched with muscle transcripts. In an OPMD cell model APA-shift was not significant. APA-shift correlated with reduced expression levels of a subset of PABPN1 isoforms, whereas the expression of the expanded PABPN1 did not correlate with APA-shift. PABPN1 activity is not affected by the expression of expanded PABPN1, but rather by reduced PABPN1 expression levels. In muscles, PABPN1 activity initially affects muscle transcripts. We suggest that muscle weakness in OPMD is caused by PABPN1 loss-of-function leading to APA-shift that primarily affects in muscle transcripts.


Introduction
The landscape of mRNAs in eukaryote cells is highly affected by the 3 0 UTR of transcripts.The length of the 3 0 UTR affects multiple layers of mRNA processing, including nuclear export, translation efficiency, and stability or decay. 1,2The 3 0 UTR length is determined by cleavage of the polyadenylation site (PAS); a long 3 0 UTR is generated from a distal PAS, whereas utilization of proximal PAS generates the short isoform.Approximately 70% of genes have two or more PASs at the 3 0 UTR. 3 PASs can be discriminated between canonical and alternative PASs (APAs), and from the latter, alternative transcripts are formed. 3,4In general, longer 3 0 UTR transcripts have higher translation efficiency, 5 whereas short 3 0 UTR have reduced abundance in the heavy polysomes and reduced translation efficiency. 6APA generates alternative transcripts that augment the cell transcriptome and protein landscape.
APA is regulated by RNA processing and RNA-binding proteins. 7,8Some are ubiquitously expressed and others have a cell-or tissue-specific expression. 9,10Consequently, alternative transcripts generated from APA can show a celltype-specific pattern and can affect biological processes in health and disease. 7,11mong the RNA-binding proteins regulating APA site utilization, the polyadenosine (poly(A)) RNA-binding protein nuclear1 (PABPN1, OMIM: 3602279) suppresses the APA.PABPN1 depletion causes genome-wide APA site utilization and shorter transcripts. 3,12These shorter transcripts have a lower abundance in polyribosomes, affecting translation efficiency. 6ABPN1 is a vital protein and is ubiquitously expressed in all cells.PABPN1 levels are reduced during normal aging in skeletal muscles. 13Reduced PABPN1 levels were also reported in oculopharyngeal muscular dystrophy (OPMD), an adult-onset autosomal dominant myopathy.OPMD is caused by a trinucleotide alanine expansion mutation in the first exon of the gene encoding PABPN1. 14The expanded PABPN1 forms nuclear aggregates and entraps other nuclear proteins and mRNAs, a histopathological mark of OPMD. 15A mouse model overexpressing the expanded PABPN1 in the skeletal muscles exhibits muscle wasting, 16 concomitant with genome-wide APA alterations. 3,12This mouse model was instrumental to demonstrate that PABPN1 aggregation leads to muscle pathology concurrently with APA.So far, genome-wide APA has not been reported in OPMD individuals, and it is unclear to what extent PABPN1 activity contributes to OPMD pathology.
To address this question, we developed a wet lab and computational procedure reporting changes in 3 0 UTR length of mRNA isoforms called APA-shift.We show that this method is highly robust and specifically beneficial to detect APA-shift in muscle transcripts.Applying the

Mouse muscle cells
Stable cells expressing Ala10 or Ala17 PABPN1 under muscle actin promoter were reported. 17Bulk RNA was extracted from fused cultures as detailed by Raz et al. 17

Human
Vastus lateralis muscle was collected from control and OPMD subjects, genetically confirmed, and previously detailed. 18OPMD individuals were recruited via the Dutch neuromuscular database (Computer Registry of All Myopathies and Polyneuropathies).Twelve controls and 10 OPMD subjects were included in this study.Four controls were excluded due to poor sequencing quality.Age range and gender of OPMD and controls subjects are listed in Table S1.
All of the RNAs were stored at À80 C and RNA integrity was determined before cDNA library preparation.The samples and models that were used for RNA sequencing (RNA-seq) and downstream analysis are listed in Table S2A.

Library preparation and RNA-seq
The generation of cDNA for the RNA-seq data was carried out with the mcSCRB-seq protocol, 19 with the following modifications to the workflow: instead of single cells, we used 1-100 ng isolated total RNA, 7.5% PEG8000 was replaced with 0.5% polyvinylpyrrolidone, and the oligo-dT primer was replaced with the SCIFI_ LIG384_001 primer (Table S2C, also described in Datlinger et al. 20 ).
All samples were barcoded during reverse transcription using a well barcode (Table S2C).After reverse transcription, the fulllength cDNA was either unamplified (1C) or underwent 7 cycles of amplification (7C) using 23 Kapa HiFi HotStart mix (Roche).cDNA was then pooled, and sequencing libraries were generated using the Kapa Hyper Plus kit (Roche) using a standard protocol.Because the 1C libraries were not preamplified, we added more cycles, namely 14 cycles, and the p7 primer was replaced with i7 Illumina barcode per pool.After an additional double solid-phase reversible immobilization size selection, the Illumina library was quantified using Qubit and checked on an Agilent Lab-on-a-Chip for the size distribution.Illumina paired-end sequencing of the library was done on a NovaSeq 6000 using paired-end 150-bp sequencing and v1.5 chemistry.After sequencing, read 1 will contain the cDNA information and read 2 will contain only the unique molecular identifier (UMI).
For the pipeline to run effortlessly, the UMI in the read 2 FASTQ file was added to the beginning of the same read identifier in the read 1 FASTQ file.The modified read 1 FASTQ file starts with the UMI and is followed by the cDNA sequence, which was used in the RNA-seq pipeline.For the demultiplexing the pooled samples we used the well barcode from the oligodT primer using Ultraplex (https://github.com/ulelab/ultraplex). 21A summary of quality control analysis of all of the samples is found in Table S2B.

RNA-seq analysis and APA-shift calculation
To process the RNA-seq data to establish the APA-shift calculation in both our human and mouse studies, we established a data preprocessing pipeline (Figure S1).All of the mouse and human reads in FASTQ format are first filtered using Cutadapt (version 2.10) to remove all of the remaining adapter sequences.Then, the reads were aligned to the Ensembl transcriptome version 104 using STAR (version 2.7.5a) including UMI-based deduplication using UMI-Tools (version 1.1.1)to generate transcriptome-based alignment in binary alignment map (BAM) format so that we could visualize using the Integrative Genomics Viewer (IGV).By inspecting these alignment files in 3 0 UTR using IGV, we could observe clear enriched signals indicating both proximal and distal APA sites (Figure 1A).Confirmed by the visual inspection in IGV, we used the same Ensembl gene annotation version 104 to create a customized 3 0 UTR annotation gene transfer format (GTF) file for all Ensembl transcripts with annotated 3 0 UTR.In this GTF file, 3 0 UTRs are split in the middle to create a proximal and a distal region.With this APA proximal and distal annotation file and the human and mouse transcriptome-based alignment file as input, we quantified the APA enrichment signal at both proximal and distal regions using featureCounts (version 2.0.1) with option ''-M'' to include multiple mapped reads.
Transcripts were excluded according to the exclusion criteria in Table S2.Raw counts were used to calculate the proximal to distal ratio (P/D) in R, using the following equation: log (1 þ P) -log (1 þ D).APA-shift, in a log scale, was calculated between OPMD and control as P/D [OPMD] À P/D [control].A ratio larger than 0 indicates a shift to proximal and smaller than 0 a shift to distal.Statistical significance for APA-shift was considered with a p < 0.05 corrected for false discovery rate (FDR).Codes for the bioinformatics pipeline are found in https://github.com/lumc-sasc/pub-PABPN1_APA-shift_Raz.
For human data, we incorporated age and gender as covariates in a model matrix.The P/D ratios were corrected on data corrected for age and gender using a linear model in 'limma,' and APA-shift was calculated on the corrected data.A list of transcripts with APA-shift in mouse, cell, and human samples is found in Table S3).Visualization of results was carried out with ggvolcanoR. 22o assess whether splitting the 3 0 UTR in the middle is a robust choice, we examined APA-shift results by extending or shortening the annotated 3 0 UTR.Four different shortening or extension factors were examined.A shortening refers to moving the starting position of 3 0 UTR toward the 3 0 end, whereas extension refers to moving the starting position of 3 0 UTR toward the 5 0 end.The 3 0 end position of a transcript was unchanged.The resulted shortened or extended 3 0 UTR were split half and half, defining the proximal or distal region.Per direction, 10% and 1% moving factors were analyzed.Counts normalized to the mean were used to calculate expression levels.

qRT-PCR
qRT-PCR was conducted on RNA extracted from the control and OPMD muscles.For cDNA synthesis, 500 ng RNA was reverse transcribed using the QuantiTect Reverse Transcription Kit (QIAGEN) and random primers, following the manufacturer's instructions.Subsequently, qPCR amplification was performed with the QuantiNova SYBR Green kit (QIAGEN) using 5 ng RNA, with technical duplicates, using a standard amplification protocol at a melting temperature of 60 C. Samples with Ct values above 35 were excluded from the analysis to eliminate potential noise.The average Ct values from the technical duplicates was used for ddCt calculation.PABPN1 Exon-4-5 (ENSG00000100836) was normalized to hypoxanthine ribosyltransferase, and isoforms 201 (ENST00000216727), 202 (ENST00000397276), and 207 (ENST00000556821) to exon 3-4.Isoforms 001 and 004 were targeted with specific primer sets, designed to cover different exons, thereby amplifying only mRNA molecules.Primer sets were designed with the NCBI Primer design tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/), and the primers sequence is in Table S2D.

Other resources
PAS-seq from A17.1 (5 þ 5) and OPMD (4 OPMD individuals and 4 control subjects) muscles was generated as detailed by deKlerk et al. 3 BAM files of PAS-seq were used to visualize the genomic localization of reads peak.Transcripts with APA site utilization in A17.1 are from Table S2. 3

Statistical and enrichment analyses
Statistical analyses were conducted in R 4.2.3 and GraphPad Prism 9.3.1.The statistical significance of APA-shift was assessed by applying the Benjamini-Hochberg method to adjust the p values for controlling the FDR.
The list of transcripts encoding for muscle proteins (called here muscle transcripts; Table S4) was downloaded from AmiGO2 Gene Ontology (GO) using the GO database (AmiGO 2 -GO Wiki, www.geneontology.org).Gene set enrichment analysis (GSEA) 23 version 4.3.2 was used for gene set enrichment analysis and DAVID 24 version 2023q1 for GO analysis considering the Bonferroni-corrected p value in human and A17.1 mice.
The visualization of reads at the 3 0 UTR was carried out with the IGV version 2.16.0.

APA-shift calculation captures real-time APA dynamics in muscle genes
Aiming to detect APA-shift in OPMD conditions, we developed a computational pipeline that quantifies global changes in PAS utilization at the 3 0 UTR of transcripts from a standard poly(A) library preparation of the 3 0 UTR.We generated libraries from the A17.1 mouse model, which overexpresses the expanded PABPN1 in skeletal muscles, and APA was previously demonstrated by others and us. 3,12RNA-seq from these libraries was used to develop and optimize our pipeline.Since long transcripts are generated from the distal PAS and short transcripts from a proximal PAS at the 3 0 UTR, we defined proximal and distal regions based on a split in the middle of the 3 0 UTR and calculated the ratio between the sum of reads from proximal and distal regions (Figure 1A).The P/D ratio (Figure 1A) and the ratio between OPMD conditions and control resulted in a calculated value we called APA-shift.We considered significant APA-shift with p < 0.05, FDR.Since multiple transcripts could belong to proximal and/ or distal regions, and a split in the middle is an arbitrary choice, we assessed whether moving the 3 0 UTR start point toward 5 0 of the transcript (called extending) or toward 3 0 of the transcript (called shortening) will affect global APA-shift results (Figure 1B).For the shortening by 1% or 10%, and the extension by 1%, after APA-shift recalculation in all 3 0 UTR transcripts, the percentage of transcripts with APA-shift was unchanged (Figure 1B).In contrast, the extension of the 3 0 UTR by 10% included longer non-3 0 UTR sequences in the proximal region resulted in a reduced proportion of transcripts with APA-shift (Figure 1B).This is expected, as our library preparation was focused on short sequences starting from the poly(A) tail.This analysis shows that shifting the 3 0 UTR split point around the middle of the 3 0 UTR did not affect the percentage of transcripts with APA-shift on a transcriptome scale.Therefore, we continued with a split of the middle of the annotated 3 0 UTR.
In the A17.1 most transcripts showed a shift to proximal (Figure 1C), agreeing with enhanced proximal APA in A17.1. 3,12In a previous study, we used a difference protocol, using PAS-seq, to determine APA site utilization in A17.1. 3The overlap between significant transcripts showing APA site utilization and transcripts with APA-shift was highly significant-77.3% of the APA-shift transcripts overlapped with PAS-seq (Figure 1D).We then also compared signal reads at the proximal and distal 3 0 UTR regions in our study to the PAS-seq.We selected two transcripts for which PAS-seq was validated 3,25 : Arih2 (ENSMUST00000013338) has two PAS at the 3 0 UTR, and Arih1 (ENSMUST00000171975) has multiple PAS at the 3 0 UTR (Figure 1E).For both the localization of signal reads at proximal and distal regions was highly similar between the two procedures (Figure 1E).Thus, APA-shift can reproduce results obtained by a PAS-seq sequencing platform.
This confirms that our simple APA-shift protocol is valid to report changes in PAS utilization.We continued with the APA-shift analysis using the option where the 3 0 UTR is divided in the middle into the proximal and distal regions.
To capture real-time changes in APA, we optimized the library preparation protocol and compared the standard protocol using 7 amplification cycles (called here 7C) to a protocol using 1 amplification cycle (called here 1C).As expected, the percentage of mapped reads was higher with the 7C protocol; however, the percentage of reads after UMI-based deduplication was higher with the 1C protocol (Table S2B; 64% 5 8.4%, and 84% 5 4.1%, respectively).
The 1C protocol resulted in a library depth $2-fold smaller than the 7C, but the average transcript size was 13% larger (Figure 2A).A principal-component analysis (PCA) showed similar differences between A17.1 and FVB samples (Figure S2A) in 7C and 1C protocols.However, with the 7C protocol, the average transcript size did not differ between FVB and A17.1, but with the 1C protocol the average transcript size was significantly smaller in A17.1 (Figure 2A).The same as with the 7C (standard) protocol also with the 1C protocol, an APA-shift to proximal was prominent (Figure 2B).Results from permuted half-split by shortening the extension of the 3 0 UTR were similar between 7C and 1C (Figure S3).Since our APA-shift calculation depends on a split of the 3 0 UTR in two, we expect a correlation with the 3 0 UTR length, which was confirmed by a linear regression analysis (Figure S2D).Strikingly, a stronger correlation was found with the 1C (Figure S2D).Thus, the 1C protocol could be more beneficial for APAshift calculation.
To further elucidate the differences between 1C and 7C protocols, we compared APA-shift directions.The proportion of transcripts with a shift to distal was significantly smaller in 1C compared with the 7C protocol (Figure 2C).Moreover, the overlap of transcripts between 1C and 7C was very high; the shift to proximal was 91%, but only 53% for the transcripts with a shift to distal (Figure 2D).To further investigate differences between 1C and 7C protocols, we focused on the prominent shift to proximal transcripts, defined as APA-shift >1 (log2), because some of the shift to distal transcripts could be false positives.The mean APA-shift value and the percentage of a prominent shift to proximal transcripts were higher in 1C compared to 7C (Figure 2E).
Because the A17.1 shows muscle pathology, we investigated whether the APA-shift of muscle transcripts differed between 7C and 1C.The proportion of muscle transcripts with 3 0 UTR was higher in 1C; moreover, muscle transcripts with APA-shift were more abundant in the 1C protocol (Figure 2F).This indicates that the 1C protocol is beneficial for muscle transcripts.All of the subsequent analyses were carried out with the 1C dataset.
Reduced levels of Pabpn1 isoforms correlate with APAshift in A17.1 In A17.1 the expanded PABPN1 cDNA is overexpressed, suggesting that expression of the expanded PABPN1 leads to APA-shift.PABPN1 overexpression was confirmed by analysis of sequencing reads in the last exon (Figure 3A).However, sequencing reads at the 3 0 UTR was slightly lower in the A17.1 (Figure 3A).The last exon is common between the endogenous Pabpn1 and the expanded PABPN1 transgene, but reads from the 3 0 UTR are only from endogenous Pabpn1 because the transgene lacks Pabpn1 3 0 UTR.In mice, Pabpn1 has eight annotated transcript isoforms, but five isoforms were found in the 1C dataset (Table S5).Levels of isoforms 201, 208, and 209 were significantly lower in A17.1 (Figure 3B), but expression levels of isoforms 202 and 207 were higher in A17.1 (Figure 3B; Table S5).The isoforms 201, 202, 207, and 208 encode for a protein containing RNA-binding motifs, but the isoform 208 lacks the first exon, containing the poly(alanine) tract (Figure 3C).The isoform 209 is a noncoding transcript.Considering the differences in expression levels of the five isoforms, we assessed a correlation between each isoform and APA-shift values.A negative correlation between isoforms 201, 208, and 209 expression levels and proximal APA-shift indicated that reduced levels in A17.1 correlate with APA-shift to proximal (Figure 3D).Over 70% of the transcripts with proximal APA-shift showed a significant correlation with isoforms 201 and 208, whereas the correlation with a shift to distal was lower (Figure 3D).In contrast to isoforms 201 and 208, a weaker positive correlation was found between isoform 207 and an APA-shift to proximal (Figure 3C).Isoform 202 showed a low correlation with an APA-shift to proximal (Figure 3D).Interestingly, reduced levels of isoform 209 showed the highest correlation with APA-shift, because over 90% of APA-shift transcripts showed a significant correlation (Figure 3E).The correlation between reduced levels of PABPN1 iso-forms and APA-shift agrees with APA site utilization in PABPN1 knockout cells. 12Together, our studies here suggest that reduced Pabpn1 level is the cause for APA-shift in A17.1.However, with this model we could not resolve the question of whether the expanded PABPN1 regulates APA-shifts the same as the normal PABPN1.

Low expression levels of expanded PABPN1 does not induce APA-shift
To investigate whether expression of the expanded PABPN1 causes APA-shift, we used a previously reported mouse muscle cell model for OPMD. 17In this cell model, the expanded PABPN1 (A17) and the normal allele (A10) were overexpressed at similar levels in differentiated muscle cells. 17PABPN1 overexpression was at low levels, driving only limited aggregation in myonuclei. 17Therefore, this cell model is a relevant model to assess the effect of the expanded PABPN1 on PABPN1 activity, independent of its aggregation.Using the 1C protocol on RNA from A10 and A17, similar expression levels of PABPN1 transgene alleles were found (Figure 4A).In contrast to the A17.prominent shift to proximal was not found in A17 cells (Figure 4B).To validate these results, we repeated the entire procedure using the 7C protocol, which resulted in similar low APA-shift as in C1 (Figure S4).Moreover, with both 1C and 7C protocols, the expression of A17 did not lead to a significant APA-shift (Figure S4).This demonstrates that in the A17 cell model, APA is not activated genome-wide.Interestingly, the expression of the expanded PABPN1 did not cause a reduction in endogenous Pabpn1 expression levels (Figure 4A).Isoforms 201 and 208 showed higher levels in A17, but the fold change was small (1.7fold) (Figure 4C).A correlation between isoforms 201 or 208 expression levels and APA-shift (p < 0.05, no FDR) was minimal, with only 10% of transcripts with APA-shift showing a significant correlation (Figure 4D).In contrast to the results in A17.1, isoform 209 did not correlate with APA-shift (Figure 4D).
Since APA-shift in muscle transcripts was prominent in A17.1 tibialis anterior muscle, we assessed whether in the cell model muscle transcripts were enriched among the APA-shift (p < 0.05, no FDR).The percentage of muscle transcripts in the 1C dataset was similar in mice and muscle cell lines (Figure 4E).However, in the A17 muscle cell culture, the proportion of muscle transcripts with APA-shift was significantly lower than the proportion of muscle transcripts in the entire dataset (Figure 4E).In agreement with little similarities between A17.1 and A17 effect on APA-shift, an overlap between A17 and A17.1 of transcripts with APA-shift was insignificant (Figure 4F).Together, the results here indicate that expression of the expanded PABPN1 does not cause APA-shift.
APA-shift in OPMD is enriched in muscle-associated pathways Thus far, APA as a readout for PABPN1 activity was investigated in OPMD models, but not in OPMD individuals.We applied the 1C protocol with the APA-shift analysis on RNA from vastus lateralis muscle of OPMD (N ¼ 10) and control (N ¼ 8) subjects.To our surprise, significant APAshift (p < 0.05, FDR) was not massive in OPMD, and a prominent shift to proximal was not found (Figure 5A).Visualization of reads peak at the 3 0 UTR, agreeing with genomic localization of reads peak in using human samples PAS-seq or the poly(A) database (Figure S4).A PCA of APA-shift values in A17.1 showed a clear separation between control and OPMD models, whereas in humans and the cell model, a separation based on genotype was not found (Figure S4).Despite the low number of transcripts with a significant APA-shift in OPMD, enrichment analysis showed an exclusive enrichment for myogenesis (Figure 5B), with most of the muscle transcripts showing a shift to proximal (Table S5).The overlap of muscle genes with APA-shift between OPMD and A17.1 was significant, and most of the genes showed an APA-shift to proximal (Figure 5C).The gene list is depicted in Figure 5A.The similarity between OPMD and A17.1 APA-shift was also found in GO enrichment analysis.Similar terms related to muscle pathology were enriched in OPMD and A17.1 (Figure 5D).
In a previous study, we showed reduced PABPN1 levels in OPMD. 13Here, we investigated whether PABPN1 isoforms may affect the APA-shift in OPMD.In humans, the three isoforms 201 (ENST00000216727), 202 (ENST00000397276), and 207 (ENST00000556821) are the most abundant in skeletal muscles (https://www.proteinatlas.org/ and Uhle ´n et al. 26 ).The isoforms 201 and 002 encode for the full-length PABPN1 but differ in UTR sequences (Figure 5E).Isoform 207 lacks the majority of the first exon, including the poly(alanine) expansion (Figure 5E).Consistent with our previous study, also in this study we detected reduced PABPN1 levels in OPMD.Using primers specific to isoform 201 or isoform 207, reduced PABPN1 levels was found for isoform 207 (Figure 5F).This suggests that APA-shift in OPMD is driven by reduced isoform 207 levels.

Discussion
Here, we describe a procedure that assesses APA-shift in an optimized library preparation protocol, and we demonstrate that this procedure robustly reports APA-shift in OPMD.Genome-wide alterations in transcripts expression levels due to APA have been linked to a range of pathological conditions. 8Cell types can be distinguished by patterns of APA marking cell specificity. 10Also in pathological conditions, APA has been found in a wide range of tissues, including different cancer types, [27][28][29][30] heart, 31 and brain 32 cells of the immune response. 33Here, we show APA-shift in skeletal muscles from OPMD individuals.Thus, our study corroborates other studies suggesting that APA is an important regulatory layer in human pathologies.
Identification of APA site utilization requires specific RNA-seq protocols, such as PAS-seq, which were instrumental in distinguishing between the proximal and distal PASs. 3,34Proximal PASs represent APA sites in the A17.1 OPMD mouse model. 3,12However, PAS-seq is costly and hence less suitable for patient material.6][37] Since APA is cell-type or species-specific, 10 application of this procedure in less studied cells, such as muscle cells, could be tricky. 6The C1 protocol in combination with an APA-shift computational pipeline reports PABPN1 function as a quantitative readout.Compared to the standard protocol, in the 1C protocol, RNA dynamics are closer to the in vivo situation.We show that the 1C protocol is more suitable for muscle transcripts and detecting APA-shifts.However, the 1C sequencing depth is limited and not as deep as the standard protocol, and is more dependent on a similar amount and quality of RNA input across samples.The latter is a specific limitation for human tissues that are stored for a long time.We also acknowledge the limitation of the APA-shift pipeline; since it is independent of PAS genomic information it provides an estimation for PAS utilization.However, our results with APA-shift in the mouse OPMD model showed nearly 80% transcript overlap to our previous in-dependent study, demonstrating APA-shift reliability.Despite these limitations, the APA-shift pipeline is simple and can be applied to any organism without prior knowledge of PAS or APA sequence.This pipeline results in a single value for significant APA-shift direction and can be applied to existing RNA-seq data.
The functional consequences of APA at the 3 0 UTR are multilayered: (1) APA resulting in shorter 3 0 UTR leads to higher mRNA stability 3,12 ; (2) transcripts with longer 3 0 UTR seem to retain in the nucleus as compared to transcripts with shorter 3 0 UTR 38 ; (3) translation efficiency of transcripts with APA at the 3 0 UTR is lower compared to transcripts with long 3'UTR.In agreement with higher nuclear export of short transcripts, we recently showed that transcripts with APA at the 3 0 UTR show higher abundance in heavy polysomes. 6lthough APA site utilization is regulated by multiple nuclear RNA-binding proteins, reduced PABPN1 levels have been implicated in APA site utilization in different cancers. 30,39Reduced PABPN1 levels lead to muscle wasting, 40,41 and have been also found in OPMD individuals. 13,42Although genome-wide APA has been previously demonstrated in a mouse model for OPMD that overexpresses the expanded PABPN1, 3,12 in OPMD muscles we found APA-shift mainly in muscle transcripts.The limited APA-shift we found in OPMD could be attributed to the C1 protocol that was not genome-wide in RNA from human muscles.Also, in lung cancer cells, APA was found in a small number of genes regulating the cell cycle. 39Together, it is possible that APA is initially found in a small subset of transcripts that may drive cell damage.In skeletal muscles, the muscle transcripts with APA-shift could contribute to muscle weakness and atrophy.
Our results here show that the expression of the pathogenic, alanine-expanded PABPN1 does not cause APAshift.Instead, reduced expression levels of a subset of PABPN1 isoforms correlate with APA-shift in both the A17.1 mouse model and OPMD.It is possible that (over) expression of the expanded PABPN1 leads to reduced expression of the normal transcript isoforms.It was previously suggested by others and us that PABPN1 aggregation leads reduced levels of the functional PABPN1, 43,44 but the exact molecular mechanism is unknown.In the mouse, the noncoding Pabpn1 isoform 209 was expressed at levels close to the protein-coding isoforms and showed the strongest correlation with APA-shift.Noncoding RNAs can regulate RNA expression levels. 45A model for the PABPN1 self-regulation mechanism has been proposed, in which high PABPN1 protein expression levels lead to reduced expression via mRNA decay. 46The human noncoding PABPN1 isoforms are not abundant in skeletal muscles (https://www.proteinatlas.org).OPMD is mostly considered a gain-offunction disease due to the presence of nuclear aggregates. 47Our study suggests that PABPN1 loss of function also contributes to OPMD.Interestingly, we identify that levels of the isoform lacking the alanine expansion region are reduced in OPMD.The role of the truncated isoform is unknown, but potentially it could open specific therapeutical options for OPMD.

Conclusion
The combination of 1C protocol and APA-shift quantification allowed us to identify and report APA-shift in OPMD individuals.APA-shift in OPMD is predominantly enriched in muscle genes and genes affecting neurodegeneration pathways.The mutated PABPN1 does not affect APA-shift, suggesting that its expression does not compromise PABPN1 activity.Instead, lower expression of PABPN1 isoforms seems to compromise PABPN1 activity that in skeletal muscles muscle transcripts are initially affected.Considering the involvement of APA in a broad range of human pathologies, investigating the potential role of RNA-binding proteins regulating APA in muscle transcripts could offer valuable insights into the molecular mechanisms underlying muscle wasting, not only in OPMD but also in other related pathologies.A. Summary of data sets that were used in this study, transcripts' number that was used for APA-shift calculation and the number of significant APA-shift transcripts in each experiment.
B. RNAseq quality control in 7C and 1C protocols in OPMD mouse and cell line models, and in human muscles (controls and OPMD).
C. Primers used for library preparation.

D. Primers used for RT-qPCR
Table S3: Transcripts with APA-shift in mouse, cell line and human samples.

Figure 1 .
Figure1.APA-shift calculation (A) (Left) IGV from a 3 0 UTR transcript example for the RNA-seq.The split point between proximal and distal regions is denoted with a dashed line.(Right) A summary of major steps in the APA-shift pipeline calculation.(B) Shortening and extending split point 3 0 UTR regions effect on APA-shift.A schematic presentation of 3 0 UTR splitting into proximal and distal regions: in the middle (depicted by dashed black line), or extension toward 5 0 sequences (in green) or shortening toward 3 0 sequences (in red) by 1% and 10% relative to the middle split, are depicted under the dot plot.Dot plot shows the percentage of transcripts with APA-shift.(C) Volcano plot of APA-shift in A17.1 versus FVB mouse using a generic protocol.Mean log fold change and p values are from n ¼ 5 per genotype.Transcripts with a switch to proximal are highlighted in red and a switch to distal in blue.(D) IGV plots of the sum of reads at the 3 0 UTR for 2 example significant genes, Arih2 and Arih1.3 0 UTR from PAS-seq (de Klerk et al.3 ) and APA-seq in FVB and A17.1.The 3 0 UTR is depicted by the black horizontal bar, and the vertical dashed line indicates the 0 position.The average APA-shift fold change for Arih1 ENSMUST00000171975 and Arih2 ENSMUST00000013338 are depicted.(E) Venn diagram shows the overlap of transcripts with significant APA-shift between PAS-and APA-seq.p value for overlap significance was determined with a chi-square test.

Figure 4 .
Figure 4. APA-shift in differentiated (fused) A17 muscle cell model is not associated with PABPN1 levels (A) Bar charts show Pabpn1 levels measured from 1C RNA-seq last axon, representing both endogenous Pabpn1 and the transgenes Ala17 or Ala10 PABPN1.Read counts (CPM) from the 3 0 UTR (sum of reads in all isoforms) represent the endogenous Pabpn1.Statistical difference was assessed with a Student's t test.(B) Volcano plot of APA-shift in A17 versus A10 muscle cells in the 1C RNAseq.Mean log fold change and p values are from N ¼ 3 per cell line.Dashed horizonal lines denote 0.05 or 0.05 FDR p value cutoff.(C) Bar chart shows levels of Pabpn1 transcripts isoform expression levels in A10 and A17 muscle cell model.Read counts are from the 3 0 UTR.Statistical analysis was assessed with ANOVA, and did not show statistical difference.(D) Heatmap of Spearman correlation (r) between APA-shift ratio (log 2) in the APA-shift transcripts (p < 0.05) and Pabpn1 isoforms (Iso) expression levels (log2).Correlation analysis was carried out on transcripts with significant APAshift for 3 Pabpn1 isoforms.The percentage of transcripts with significant correlation (p < 0.05) is depicted at the right of the heatmap.(E) Bar chart shows the percentage of the transcripts encoding for muscle genes in the entire library (all transcripts) and the APA-shift transcripts in the OPMD mouse and cell datasets.Numbers of transcripts are depicted in each bar.Statistical difference was assessed with a chi-square test.(F) Venn diagram shows transcript overlap for APA-shift transcripts in cell and mouse OPMD models.Statistical significance of overlap was assessed with a chi-square test.NS, not significant.

Figure 5 .
Figure 5. APA shift in OPMD muscles (A) Volcano plot of APA-shift in subjects with OPMD versus control's vastus lateralis in 1C RNA-seq.Mean log fold change and p values are from N ¼ 10 and 8 per genotype, respectively.Significant transcripts (p < 0.05 FDR) with a switch to proximal are highlighted in red and a switch to distal in blue.Genes overlapping with A17.1 are highlighted in dark red or dark blue, and gene names are depicted.(B) Enrichment analysis of OPMD APAshift transcripts.Visualization of GSEA enrichment.Myogenesis was found significant, and most transcripts showed positive correlation with OPMD.(C) Venn diagram shows the overlap of genes with significant APA-shift between OPMD and A17.1 in yellow and green, respectively, and among genes with proximal (red) or distal (blue) APA-shift.The overlap significance was assessed with a chi-square test.OPMD APA-shift genes overlapping with A17.1 and enriched in skeletal muscles are depicted.(D) Dot plot visualization of functional term enrichment for the proximal APAshift genes in OPMD and in A17.1.Redundancy was removed using annotation clustering; the most significant term within an enrichment cluster is depicted.Similar terms are marked with the same color.(E) A schematic presentation of PABPN1 isoforms 201, 202, and 207.Exons are depicted in full boxes and introns in dashed lines.The first exon encoding for poly(alanine) is depicted in red, exon encoding for the coil-coil domain is in yellow, and the exons encoding for the RNA-binding domain are in blue.UTR regions are depicted in white.Primer sets that were used for qRT-PCR are denoted with arrows: exon 4-5 in black arrows; isoform 001 in blue; isoform 002 in green, and isoform 004 in orange.(F) Bar chart shows normalized ddCt values of PABPN1 isoforms in control and OPMD.Statistical difference was assessed with ANOVA; *p < 0.05; **p < 0.0005.

Figure S1 :
Figure S1: A flow-chat summary of the bioinformatic steps from RNA-seq to APA-shift calculation 1 mouse model, APA-shift was not genome-wide, and a ****p < 0.00001).(F) Bar chart shows the percentage of muscle genes encoded transcripts in the entire library (all transcripts) and the significant APA-shift transcripts in 7C and 1C protocols.The numbers of muscle transcripts is depicted in the bar.A statistical difference was assessed with a chi-square test and is depicted above the bars.