Dissecting the heterogeneity of the alternative polyadenylation profiles in triple-negative breast cancers

Background: Triple-negative breast cancer (TNBC) is an aggressive malignancy with high heterogeneity. However, the alternative polyadenylation (APA) profiles of TNBC remain unknown. Here, we aimed to define the characteristics of the APA events at post-transcription level among TNBCs. Methods: Using transcriptome microarray data, we analyzed APA profiles of 165 TNBC samples and 33 paired normal tissues. A pooled short hairpin RNA screen targeting 23 core cleavage and polyadenylation (C/P) genes was used to identify key C/P factors. Results: We established an unconventional APA subtyping system composed of four stable subtypes: 1) luminal androgen receptor (LAR), 2) mesenchymal-like immune-activated (MLIA), 3) basal-like (BL), 4) suppressed (S) subtypes. Patients in the S subtype had the worst disease-free survival comparing to other patients (log-rank p = 0.021). Enriched clinically actionable pathways and putative therapeutic APA events were analyzed among each APA subtype. Furthermore, CPSF1 and PABPN1 were identified as the master C/P factors in regulating APA events and TNBC proliferation. The depletion of CPSF1 or PABPN1 weakened cell proliferation, enhanced apoptosis, resulted in cell cycle redistribution and a reversion of APA events of genes associated with tumorigenesis, proliferation, metastasis and chemosensitivity in breast cancer. Conclusions: Our findings advance the understanding of tumor heterogeneity regulation in APA and yield new insights into therapeutic target identification in TNBC.


2
To investigate interactions between tandem 3′UTRs and mRNAs, we constructed coexpression networks [5]. We screened for differentially expressed tandem 3′UTRs and mRNAs among subtypes. We computed the Pearson correlation and chose pairs (only 3′UTR-mRNA) with significant correlation to build the network (adjusted p < 0.05). Only those with Pearson correlation coefficient γ > 0.75 or γ < −0.75 are represented. The co-expression networks were visualized using Cytoscape 2.8.2 [6]. The Bioconductor package 'limma' [7] was used to examine differential expression of co-expressed mRNAs. μg/ml puromycin. After cells were selected for 72 hours with puromycin, a minimum of 2 × 10 7 cells were harvested (Day 0) and the remaining cells were split into triplicate flasks (at least 2 × 10 7 cells per flask) and cultured for an additional 7, 14 and 21 days before genomic DNA extraction and analysis, respectively. Cells were collected to obtain genomic DNA. The shRNAs encoded in the genomic DNA were amplified, and adaptors with indexes for deep sequencing were incorporated into PCR primers. Sample quantification was performed on a Qubit fluorometer (Life Technologies) to ensure that samples were pooled at the same quantity. Deep sequencing was performed using the 3 MiSeq Personal Sequencer (Illumina). shRNA barcodes were retrieved and deconvoluted from each sequencing read. Then, the number of reads for each unique shRNA for a given sample was normalized as follows: For each gene G with k barcodes (shRNA), each with average shRNA count c in the Day 0 group and d in the Day 7, 14 or 21 group, an enrichment score (ES) was computed as the second lowest ranked value of Subsequently, the p-value for each gene G was computed on the basis of ES as and s(k) was the second lowest ranked value of k randomly chosen values from all barcodes in all genes and N was the number of permutation trials performed. N was set at 10,000.

Cell culture
The human breast cancer cell lines MDA-MB-231 and MDA-MB-468 and the human embryonic kidney cell line HEK293T were obtained from the Shanghai Cell Bank Type Culture Collection Committee (CBTCCC, Shanghai, China) and cultured in DMEM with 10% fetal bovine serum (FBS). The identities of the cell lines were confirmed by the CBTCCC using DNA profiling (short tandem repeat, STR). All cell lines were maintained in our laboratory and subjected to routine quality control (e.g., mycoplasma, morphology) by HD Biosciences every 3 months. Cells were 4 passaged for less than 6 months.

Stable cell line construction
pLKO.1 lentiviral plasmids encoding shRNAs targeting CPSF1 and PABPN1 and nontargeting control were used. We used the following shRNA sequences: CPSF1 shRNA1: with puromycin (1 μg/ml) for subsequent use. Untreated cells were used as "mock" to provide a reference for the treated cell.

Western blotting
Whole-cell extracts were obtained using SDS lysis buffer (Beyotime) with protease and phosphatase inhibitors (Bimake). The cell lysates were boiled in 5× SDS-PAGE loading buffer for 5 minutes.

Cell proliferation
The cells of interest (2 × 10 3 cells per well) were seeded in 96-well, clear-bottomed plates with 100 μl of complete culture medium for 7 days. The IncuCyte ® ZOOM Live-Cell Analysis System (Essen) was used to monitor proliferation and determine cell confluence.

Apoptosis assay and cell cycle analyses
Cell apoptosis was assessed using the FITC Annexin V Apoptosis Detection Kit I (BD Pharmingen) followed by flow cytometry (FACStation, BD Biosciences) according to the manual. For the cell cycle assay, cells were stained with propidium iodide (Beyotime) and analyzed by flow cytometry as described [8].

RNA-seq data analysis
Total RNA was purified using TRIzol reagent (Invitrogen). RNA integrity was evaluated using Agilent 2100 Bioanalyzer. Samples with an RNA Integrity Number (RIN) greater than 9 were used 6 for cDNA library construction. RNA-seq libraries were constructed using the TruSeq Stranded mRNA LTSample Prep Kit (Illumina) according to the manufacturer's instructions. The paired-end reads with 150 nt at each end, sequenced using the Illumina HiSeq X-Ten platform, were aligned to the human genome (hg19) using HISAT2 [9]. The fragments per kilobase of transcript sequence per million mapped paired-end reads (FPKM) value of each gene was calculated using cufflinks [10].
Differentially expressed genes (DEGs) were computed using the Bioconductor package 'DESeq' (version 3.8). P-value < 0.05 and fold change > 2 or < 0.5 were set as the threshold for significantly different expression. KEGG pathway analysis [11][12][13] of DEGs was performed using R based on hypergeometric distribution.

Statistical analysis
The Pearson χ 2 test was used to compare qualitative variables, and Fisher's exact test was performed when necessary. We used linear models for microarray and RNA-seq data (LIMMA) [17,18]  Cox analyses. Median follow-up was estimated using the reverse Kalplan-Meier method [19].
We constructed a linear regression to examine the correlation between the expression of core cleavage and polyadenylation (C/P) factors and SUI for each transcript. In this model, the SUI was employed as a response variable, whereas the expression levels of core C/P factors were considered as predictors. We conducted model selection based on the Akaike Information Criterion (AIC), for which the maximal model is Bayesian change point analysis of (A) FOXA1, (B) EIF4A2, (C) ERBB2 and (D) TEKT4. 11

Figure S3
Flow chart of tandem 3′UTR generation.

Figure S9
The enrichment analysis of genes with shortening tandem 3'UTRs in the Suppressed (S) subtype.   The heatmap of gene expression fold change of core 3′ processing factors across alternative polyadenylation (APA) subtype. Each rectangle represents the mean log 2 fold change between cancer and paired normal tissue of one factor in one APA subtype. A factor is considered differentially expressed if the false-discovery rate from 'limma' [17,18]