Global proteomics dataset of miR-126 overexpression in acute myeloid leukemia

A deep proteomics analysis was conducted on a primary acute myeloid leukemia (AML) culture system to identify potential protein targets regulated by miR-126. Leukemia cells were transduced with either an empty control lentivirus or one carrying the miR-126 sequence, and the resulting cells were analyzed by ultra-high performance liquid chromatography (UHPLC) coupled with high-resolution mass spectrometry. The mass spectrometry data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PRIDE: PXD001994. The proteomics data and statistical analysis described in this article are associated with the research article “miR-126 regulates distinct self-renewal outcomes in normal and malignant hematopoietic stem cells” (Lechman et al., 2016) [1], and serve as a resource for researchers studying microRNAs and their regulation of protein levels.


Value of the data
First global proteomics dataset of miR-126 overexpression in primary human leukemic cells.
The enforced-expression data shed first light on miR-126-driven protein regulation for use by leukemia researchers.
Proteins highlighted by the proteomics data provide the community with candidate targets under (direct) control of miR-126.

Data
The dataset described in this article is the first global proteomics dataset investigating the biological impact of enforced miR-126 expression in human AML cells. The data files shared here document the computational workflow used to filter the data in Perseus [2] and to determine significantly regulated proteins using Limma [6]. The experimental workflow and an overview of the technical and biological reproducibility of the analyses are also presented.

Experimental design, materials and methods
To assess protein-level regulation of direct miR-126 targets, we used proteomics to compare AML cells transduced with either a miR-126 overexpression (126OE) or a control (CTRL) vector (Fig. 1A and B). A primary AML culture system, 8227 (described in [1]), was virally transduced, and the cells were subsequently analyzed for global protein expression by mass spectrometry. Deep proteome coverage was obtained by strong cation exchange (SCX) fractionation, and proteins were quantified with a label-free quantitation (LFQ) approach [3].
Two weeks after viral transduction, three biologically independent sets of 8227 cells transduced with either the 126OE or the CTRL vector (both also carrying the mOrange gene to enable detection of transduced cells) were flow sorted for mOrange+ cells, counted and prepared as described in [1]. Briefly, cells were lysed, boiled at 95 °C and sonicated, and then digested in a two-step protocol with Lysyl Endopeptidase C (MS grade, Wako) and trypsin (MS grade, Promega). The resulting peptide samples were simultaneously desalted and fractionated using in-house packed Strong Cation Exchange StageTips (2251, Empore 3M) [4]. Five fractions were eluted with 50, 75, 125, 200 and 300 mM ammonium acetate, respectively, each in 20% acetonitrile, 0.5% formic acid, and a final fraction was eluted with 5% ammonium hydroxide, 80% acetonitrile. After concentration in an Eppendorf SpeedVac, the eluted fractions were reconstituted in 1% TFA, 2% acetonitrile for mass spectrometry (MS) analysis.
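The step-elution scheme above can be written down as a small data structure, which also makes the sample-by-fraction design arithmetic explicit. This is a minimal sketch, not part of the deposited workflow. It assumes that the ammonium hydroxide step is a sixth fraction following the five ammonium acetate steps. The names `ElutionStep`, `scx_elution_scheme` and `sample_fractions` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

# Ammonium acetate steps (mM) as listed in the protocol text.
AMMONIUM_ACETATE_MM = (50, 75, 125, 200, 300)


@dataclass(frozen=True)
class ElutionStep:
    fraction: int  # 1-based fraction index
    buffer: str    # elution buffer composition


def scx_elution_scheme() -> list[ElutionStep]:
    """Return the SCX step-elution scheme for one peptide sample.

    Assumption (hypothetical reading of the protocol): the ammonium
    hydroxide step is counted as a sixth fraction after the five
    ammonium acetate steps.
    """
    steps = [
        ElutionStep(i + 1, f"{mm} mM ammonium acetate, 20% ACN, 0.5% formic acid")
        for i, mm in enumerate(AMMONIUM_ACETATE_MM)
    ]
    steps.append(ElutionStep(len(steps) + 1, "5% ammonium hydroxide, 80% ACN"))
    return steps


def sample_fractions(conditions=("CTRL", "126OE"), n_bio_reps=3):
    """Enumerate (condition, biological replicate, fraction) combinations."""
    return [
        (cond, rep, step.fraction)
        for cond, rep, step in product(
            conditions, range(1, n_bio_reps + 1), scx_elution_scheme()
        )
    ]
```

Under this six-fraction assumption, `len(sample_fractions())` is 36 (2 conditions × 3 biological replicates × 6 fractions). Running each fraction in technical duplicate would then give 72 LC-MS runs.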

Mass spectrometry acquisition
Each SCX fraction was analyzed on an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific) connected to a Thermo EASY-nLC 1000 UHPLC system in a single-column setup. Peptides were eluted over a 140 min gradient on a 50 cm C18 reverse-phase analytical column (Thermo Fisher EASY-Spray ES803). Detailed MS settings are described in [1]. Mass spectrometry performance was monitored for consistency throughout the analysis using standard QC samples generated from complex HEK293T lysates. Each sample was run in technical duplicate, and the reproducibility of the analyses is depicted in Fig. 2. All raw files were deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PRIDE: PXD001994 [5].
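A common way to summarize technical reproducibility is the Pearson correlation of log2 intensities between two replicate runs. The calculation uses only proteins quantified in both runs. The sketch below illustrates this under that assumption. It does not claim to reproduce how Fig. 2 was generated, and `log2_pairs` and `pearson` are hypothetical helper names.

```python
import math


def log2_pairs(run_a, run_b):
    """Return log2 intensities for proteins quantified (> 0) in both runs.

    run_a, run_b: dicts mapping protein ID -> intensity (0 = not quantified).
    """
    shared = sorted(
        k for k in run_a.keys() & run_b.keys() if run_a[k] > 0 and run_b[k] > 0
    )
    xs = [math.log2(run_a[k]) for k in shared]
    ys = [math.log2(run_b[k]) for k in shared]
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two shared proteins")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Working on the log2 scale keeps the highly abundant proteins from dominating the coefficient. A constant intensity offset between runs, for example from loading differences, still yields a correlation of 1.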

Label-free quantitative proteomics analysis
MaxQuant version 1.5.2.8 [3] was used to analyze the resulting .raw files and to generate the label-free quantitation (LFQ) values. A minimum of 3 unique peptides per protein was required, and Oxidation (M), Acetyl (protein N-term), Gln->pyro-Glu and Glu->pyro-Glu were set as variable modifications. The false discovery rate was set to 1%, and "match between runs" was enabled.
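The 1% false discovery rate relies on the target-decoy principle. Matches against a reversed ("decoy") database, which later appear as reverse hits, are used to estimate how many target matches are false. The following is a generic, minimal sketch of score-ranked q-value estimation. It is not MaxQuant's internal implementation, and the function names are hypothetical. Score ties are not treated specially here.

```python
def q_values(psms):
    """Estimate target-decoy q-values.

    psms: list of (score, is_decoy) tuples, higher score = better match.
    Returns (score, is_decoy, q) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this match
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: the lowest FDR at which this match would still be accepted
    qs = [0.0] * len(fdrs)
    running = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def accept_targets(psms, fdr=0.01):
    """Scores of target (non-decoy) matches passing the FDR threshold."""
    return [s for s, d, q in q_values(psms) if not d and q <= fdr]
```

For example, with `[(10, False), (9, False), (8, True), (7, False), (6, True)]`, only the scores 10 and 9 pass at 1% FDR. The first decoy already raises the estimated FDR to one third for everything below it.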
The resulting table, containing all identified proteins and their LFQ values, was processed in Perseus (version 1.5.0.9; workflow attached in the Supplementary materials) [2]. After removal of contaminants and reverse hits, 8848 proteins remained, of which 4837 were quantified in all samples. Protein ratios (126OE vs. CTRL) were calculated for each biological replicate. This table was then analyzed with the R/Bioconductor package Limma [6], and significantly regulated proteins were identified with its moderated t-test. The Limma input, the R script and the results are attached to this article, and the final results used for downstream analysis are available as Table S4 in [1].
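The filtering and testing steps above can be sketched in a few lines. This is a minimal illustration, not the attached Perseus workflow or R script. It assumes that rows are dicts with hypothetical column names, with "+" flags for reverse and contaminant hits and LFQ intensities of 0 for proteins that were not quantified. It also assumes that ratios are taken as log2(126OE/CTRL) per biological replicate. The moderated t-statistic follows limma's shrunken-variance formula for a one-sample (intercept-only) test. Here, the prior variance `s0_sq` and prior degrees of freedom `d0` are supplied by hand, whereas limma estimates them from all proteins by empirical Bayes.

```python
import math
from statistics import mean, variance

# Hypothetical column names; MaxQuant's proteinGroups.txt naming
# differs between versions.
FLAG_COLUMNS = ("Reverse", "Potential contaminant")


def filter_proteins(rows, lfq_columns, flag_columns=FLAG_COLUMNS):
    """Drop reverse/contaminant hits; keep proteins with LFQ > 0 in every sample."""
    kept = []
    for row in rows:
        if any(row.get(col) == "+" for col in flag_columns):
            continue
        if all(row[col] > 0 for col in lfq_columns):
            kept.append(row)
    return kept


def log2_ratios(row, pairs):
    """log2(126OE / CTRL) for each (126OE column, CTRL column) replicate pair."""
    return [math.log2(row[oe]) - math.log2(row[ctrl]) for oe, ctrl in pairs]


def moderated_t(ratios, s0_sq, d0):
    """Limma-style one-sample moderated t-statistic for H0: mean log2 ratio = 0.

    s0_sq (prior variance) and d0 (prior degrees of freedom) are given by hand
    in this sketch. Returns (t, total degrees of freedom d0 + d).
    """
    n = len(ratios)
    d = n - 1                   # residual degrees of freedom
    s_sq = variance(ratios)     # sample variance of the log2 ratios
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrunken variance
    t = mean(ratios) / math.sqrt(s_tilde_sq / n)
    return t, d0 + d
```

With `d0 = 0` the statistic reduces to the ordinary one-sample t-test. Larger `d0` pulls each protein's variance towards `s0_sq`, which stabilizes the test when only three biological replicates are available. A two-sided p-value could then be obtained from a t distribution with `d0 + d` degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t), df)`.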