Transcriptomic data of MCF-7 breast cancer cells treated with G1, a G-protein coupled estrogen receptor (GPER) agonist

Besides short-term non-genomic effects, the G-protein coupled estrogen receptor (GPER) also mediates long-term genomic effects of estrogen. The genomic effects of GPER activation are not completely understood. G1 is a selective GPER agonist that is widely used to study the effects of GPER activation. Here, we present transcriptomic (RNA-seq) data on MCF-7 cells treated with 100 nM or 1 µM G1 for 48 h. The data are available from GEO (accession number GSE188706).


Specifications
Subject: Biological sciences
Specific subject area: Transcriptomics
Type of data:

Value of the Data
• This dataset captures the transcriptomic profiles of MCF-7 cells treated with vehicle, or the GPER agonist G1.
• It will contribute to a better understanding of the genomic effects of G1 in breast cancer cells.
• Investigators interested in GPER biology can use, and independently analyze, the data for insights into the genomic effects of GPER activation.

Data Description
Total RNA was isolated from MCF-7 cells treated with vehicle (0.1% ethanol), 100 nM G1, or 1 μM G1 for 48 h, and RNA-seq was performed. Quality assessment data for the RNA samples, including RNA concentration, RIN value, and fragment length distribution, are provided as Supplementary data 1 and 2. The raw FASTQ files have been deposited in GEO under accession number GSE188706. HISAT2 was used to map the raw reads. Table 1 summarizes the read quality data: reads that passed filtering, reads that failed due to low quality, reads that failed due to too many Ns, reads with adapters trimmed by fastp, bases trimmed due to adapters by fastp, and the overall HISAT2 alignment rate (in %).

Treatment
2 × 10⁵ MCF-7 cells were seeded in 35 mm dishes in M1 medium. After 48 h, the spent M1 medium was removed. The monolayer of cells was washed with DPBS, and treated with M1 medium containing 0.1% ethanol (vehicle control), 100 nM G1, or 1 μM G1. After 48 h of treatment, the cells were washed with DPBS before isolation of total RNA.

RNA samples
Cells were lysed in RNA extraction reagent prepared in-house. Lysates from two replicate dishes were pooled for purification of each RNA sample using the method described by Chomczynski and Sacchi [1] with modifications. Quality assessment data for the RNA samples are provided as Supplementary data 1 and 2. The RNA samples were then subjected to library preparation and sequencing on the Illumina platform.

Library preparation
mRNA was enriched from 1 μg of total RNA using the NEB Magnetic mRNA Isolation Kit (NEB, USA). The enriched mRNA was fragmented (to approximately 200 bp) using fragmentation buffer and reverse transcribed into double-stranded cDNA using random primers. The double-stranded cDNA fragments were purified using 1.8X Ampure beads (Beckman Coulter, USA). The purified cDNA was ligated to the adaptor. The adaptor-ligated DNA was purified using Ampure beads and enriched with specific primers compatible with Illumina platforms. The transcriptome library was constructed with the NEB Ultra II RNA Library Prep Kit (NEB, USA). The final enriched library was purified, quantified by Qubit (Thermo Fisher Scientific, USA), and analyzed on a 2100 Bioanalyzer (Agilent, USA). The library was sequenced on the Illumina NextSeq 500 using paired-end sequencing.

Reads quality check and trimming
The raw reads were in FASTQ format. The quality of the reads was assessed using the FastQC tool [2]. Adapters and low-quality reads were filtered out of the FASTQ files using the fastp tool [3]. Three criteria were applied: removal of reads with a Phred quality score below Q30, removal of reads shorter than 15 bp, and clipping of Illumina adapters. FastQC was used to re-assess the filtered reads prior to mapping. The FASTQ files obtained after quality trimming and assessment were used for mapping. The numbers of read pairs that passed quality trimming and assessment are shown in Table 1.
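
The length and quality criteria can be illustrated with a minimal Python sketch. It is not the fastp implementation: the function names (`phred_scores`, `passes_filters`), the Phred+33 encoding, and the use of the mean read quality as the Q30 test are assumptions made for illustration. fastp applies its own per-base rules, and adapter clipping is not reproduced here.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filters(sequence, quality_string, min_quality=30, min_length=15):
    """Return True if a read passes the length and quality thresholds.

    Assumption: a read counts as 'below Q30' when its mean Phred score
    is below min_quality. fastp itself uses per-base criteria.
    """
    if len(sequence) < min_length:
        return False
    scores = phred_scores(quality_string)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_quality


# Hypothetical reads as (sequence, quality string) pairs
reads = [
    ("ACGTACGTACGTACGTACGT", "I" * 20),  # 20 bp, Q40 at every base
    ("ACGTACGTAC", "I" * 10),            # 10 bp, shorter than 15 bp
    ("ACGTACGTACGTACGTACGT", "5" * 20),  # 20 bp, Q20 at every base
]
kept = [seq for seq, qual in reads if passes_filters(seq, qual)]
```

In this toy input only the first read is retained; the second fails the length threshold and the third fails the quality threshold.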

Mapping reads against the human genome
The Ensembl Homo sapiens GRCh38 genome was used as the reference genome for mapping the clipped reads (https://asia.ensembl.org/Homo_sapiens/Info/Index). Prior to mapping, the reference genome was indexed using the HISAT2 indexing scheme, which is based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index. Subsequently, the clean reads were mapped against the index using HISAT2 [4]. The mapped output files (SAM files) were converted into binary files (BAM files) using SAMtools [5].
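
The indexing, mapping, and conversion steps could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact commands used for this dataset: all file names, the index prefix, and the thread count are hypothetical, and only basic `hisat2-build`, `hisat2`, and `samtools view` options are shown.

```python
import shlex


def build_index_cmd(genome_fasta, index_prefix):
    """Command that builds a HISAT2 index from a reference FASTA file."""
    return ["hisat2-build", genome_fasta, index_prefix]


def build_align_cmd(index_prefix, reads_1, reads_2, sam_out, threads=4):
    """Command that maps paired-end reads against the index and writes SAM."""
    return [
        "hisat2", "-p", str(threads),
        "-x", index_prefix,
        "-1", reads_1, "-2", reads_2,
        "-S", sam_out,
    ]


def build_sam_to_bam_cmd(sam_in, bam_out):
    """Command that converts a SAM file to BAM with samtools view."""
    return ["samtools", "view", "-b", "-o", bam_out, sam_in]


# Hypothetical file names, for illustration only
commands = [
    build_index_cmd("GRCh38.fa", "grch38_index"),
    build_align_cmd("grch38_index", "sample_R1.fastq.gz",
                    "sample_R2.fastq.gz", "sample.sam"),
    build_sam_to_bam_cmd("sample.sam", "sample.bam"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

The sketch only prints the commands. With HISAT2 and SAMtools installed, each list could be passed to `subprocess.run(cmd, check=True)` to execute it.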

Quantification of mapped reads
The featureCounts tool from the Subread package was used to quantify the mapped reads [6]. Mapped reads were counted at the feature (gene) level using the Homo sapiens GRCh38 annotation file (GTF).
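
Gene-level counting and reading of the resulting count table might look like the sketch below. The file names, gene identifiers, and count values are hypothetical. The command uses `exon` features grouped by `gene_id`, which is an assumption about the settings. The Subread documentation for the installed version should be consulted for paired-end options.

```python
import csv
import io


def build_featurecounts_cmd(gtf, bam_files, counts_out):
    """Command that counts paired-end reads per gene with featureCounts."""
    return [
        "featureCounts", "-p",
        "-t", "exon", "-g", "gene_id",  # assumed feature type and grouping
        "-a", gtf, "-o", counts_out,
        *bam_files,
    ]


def read_counts(handle):
    """Parse a featureCounts table into (sample names, {gene: counts})."""
    # Skip the leading comment line(s) that start with '#'
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    samples = header[6:]  # the first six columns are gene annotation
    counts = {row[0]: [int(v) for v in row[6:]] for row in rows if row}
    return samples, counts


# Hypothetical featureCounts output, for illustration only
example = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tvehicle.bam\tG1_100nM.bam\n"
    "GENE_A\t1\t100\t900\t+\t801\t120\t95\n"
    "GENE_B\t2\t500\t1500\t-\t1001\t0\t7\n"
)
samples, counts = read_counts(example)
```

The parser relies on the standard featureCounts layout, in which six annotation columns (Geneid, Chr, Start, End, Strand, Length) precede one count column per BAM file.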

Ethics Statements
The work was carried out on a cell line, and did not involve animals or human subjects.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.