A transcriptomic dataset comparing two methods of hepatocyte differentiation from human induced pluripotent stem cells

A variety of methods have been reported for the differentiation of hepatocyte-like cells (HLCs) from human induced pluripotent stem cells (iPSCs) using various growth factors or small molecules. However, direct comparison of the differentiation efficiency and the quality of the final HLCs between different methods has rarely been reported. To fill this data gap, we compared two hepatocyte differentiation methods, termed Method 1 and Method 2, and published the major findings in a research article entitled “Phenotypical, functional and transcriptomic comparison of two modified methods of hepatocyte differentiation from human induced pluripotent stem cells” (Li et al., 2022). The current data article describes the transcriptomic dataset comparing the two methods. HLCs were collected at the early maturation (day 17) and late maturation (day 21) stages of differentiation, and total RNA was isolated. Global gene expression profiling of the HLCs was conducted using Affymetrix GeneChip PrimeView Human Gene Expression Arrays. Primary human hepatocytes (PHHs) were also included for comparison. The microarray dataset has been deposited in the Gene Expression Omnibus of the National Center for Biotechnology Information under accession number GSE187011. Detailed interpretation and discussion of the data can be found in the corresponding research article (Li et al., 2022). This dataset provides a molecular basis for the differences observed between the two differentiation methods, offers new insights into gene regulation during hepatogenesis in vitro, and suggests ways to further improve hepatocyte differentiation in order to obtain more mature HLCs for biomedical applications.


Keywords: Induced pluripotent stem cells; Hepatocyte differentiation; Hepatocyte-like cells; Transcriptomics; Microarray
Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Value of the Data
• The data reflect a direct transcriptomic comparison of different methods of hepatocyte differentiation from human iPSCs [1].
• The data are of interest to the general biomedical research community and are especially beneficial to researchers in the fields of developmental biology and regenerative medicine.
• The data may offer new insights into gene regulation in hepatogenesis and suggest ways to further improve hepatocyte differentiation in order to obtain more mature HLCs for biomedical applications.

Data Description
The dataset consists of four sample types collected from two hepatocyte differentiation methods, Method 1 and Method 2, at two time points, day 17 and day 21. An additional sample type, primary human hepatocytes (PHHs), was included for comparison.
The raw image DAT files generated by the scanner, which contain pixel intensity values, were processed using the Affymetrix GeneChip Command Console software v.4.0 to generate cell intensity CEL files, which contain probe-level intensity values. The values of individual probes belonging to each probe set in the CEL files were then summarized using the robust multi-array average (RMA) algorithm [2] embedded in the Affymetrix Expression Console software v.1.3 to generate CHP files, which contain probe set-level (gene-level) intensity values. RMA comprises convolution background correction, quantile normalization, and median polish summarization.
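The normalization and summarization steps of RMA can be illustrated with a small sketch. The Python below is a simplified illustration only, not the Expression Console implementation: it omits the convolution background correction, breaks ties by position, runs a fixed number of polish iterations, and uses hypothetical function names and toy values.

```python
from statistics import mean, median


def quantile_normalize(columns):
    """Give every array (column) the same intensity distribution.

    `columns` is a list of equal-length lists, one list per array.
    Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, n_iter=10):
    """Summarize one probe set from a probes x arrays matrix of log2 values.

    Returns one value per array (overall effect + array effect).
    Assumption: a fixed iteration count instead of a convergence check.
    """
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the median out of each row (probe effect).
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the median out of each column (array effect).
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return [overall + c for c in col_eff]


# Toy values only; these are not taken from GSE187011.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
summary = median_polish([[8, 10], [9, 11], [7, 9]])
```

Because quantile normalization forces all arrays onto a common distribution, it is one reason array-to-array intensity shifts can shrink between the CEL and CHP levels.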
For each sample, both the CEL and the CHP files are included in the dataset. All samples were assessed for data quality using the Affymetrix Expression Console software v.1.3, and all quality control metrics (including spike-in controls for target preparation and hybridization, and housekeeping gene controls) were found to be within boundaries. Metrics of the labeling and hybridization controls are shown in Fig. 1.
The data for the HLCs and for the PHHs were collected at different times; therefore, a batch effect was observed at the cell intensity (CEL) level. However, the batch effect was largely eliminated at the probe set (CHP) level after normalization and summarization using RMA (Fig. 2).
Results of the unsupervised exploratory data analysis are shown in Fig. 3. There is clear separation between the different cell types, between the different differentiation methods, and, to a lesser extent, between the different time points, as shown in the principal component analysis (PCA) plot (Fig. 3A) and the hierarchical clustering analysis (HCA) dendrogram (Fig. 3B). A total of 1807 differentially expressed genes (DEGs) were identified between the two differentiation methods (Fig. 3C); in comparison, 13,497 DEGs were found between the two cell types (Fig. 3D). Details of the supervised data analysis can be found in the related research article [1].
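As a rough illustration of how a dendrogram such as the one described above groups samples, the following Python sketch performs agglomerative clustering on toy expression profiles. The distance metric and linkage used by ArrayTrack are not stated here, so the Pearson correlation distance and average linkage below are assumptions, and the function names, sample labels, and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    `profiles` maps a sample label to its list of log2 expression values.
    Each merge is a pair of tuples holding the labels of the joined clusters.
    """
    dist = {
        frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters = [(label,) for label in profiles]
    merges = []

    def cluster_distance(c1, c2):
        # Mean of all sample-to-sample distances between the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters and record the merge.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy profiles only (hypothetical labels, not values from GSE187011).
toy_profiles = {
    "A1": [1.0, 2.0, 3.0, 4.0],
    "A2": [1.1, 2.1, 2.9, 4.2],
    "B1": [4.0, 3.0, 2.0, 1.5],
    "B2": [4.5, 2.5, 2.4, 1.0],
}
merges = average_linkage(toy_profiles)
```

In this toy case the two "A" samples join first, then the two "B" samples, and the two groups join last, which is the kind of structure a dendrogram makes visible.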

Experimental Design
HLCs were differentiated from a human iPSC line using two methods, Method 1 and Method 2. The resultant HLCs were collected at early maturation (day 17) and late maturation (day 21) stages of the differentiation. Cryopreserved PHHs were included in the study for comparison with the HLCs. Total RNA was isolated from the cells and global gene expression profiling of the samples was conducted using Affymetrix GeneChip PrimeView Human Gene Expression Arrays.

Human iPSC Line and Single-Cell Culture of iPSCs
The human iPSC line OARSAi002-A was previously generated in our laboratory from the cord blood of a healthy non-Hispanic white male using self-replicative RNA reprogramming technology [3]. Cells were maintained as a single-cell culture (no colony formation) on COAT-1 pre-coated 6-well tissue culture plates in the Cellartis DEF-CS Culture System (Takara Bio USA, Mountain View, CA) in an incubator at 37 °C with 5% CO2. Details of cell culture and passaging were reported previously [1].

Hepatocyte Differentiation from iPSCs
iPSCs were first induced to definitive endoderm using the STEMdiff Definitive Endoderm Kit from STEMCELL Technologies (Vancouver, Canada) following the manufacturer's instructions. Hepatic specification and maturation were then carried out using the two methods, Method 1 and Method 2, modified from previously published methods. Details of the differentiation methods were reported previously [1] .

RNA Extraction and Quality Assurance
Cells were harvested on day 17 and day 21 of hepatocyte differentiation and stored at −80 °C before RNA extraction. Cells were lysed in RLT buffer (Qiagen, Valencia, CA) and homogenized using QIAshredder (Qiagen). Total RNA was extracted from the cell lysates using the EZ1 RNA Cell Mini Kit (Qiagen) on the EZ1 Advanced XL automated RNA purification instrument (Qiagen) following the manufacturer's instructions. An on-column DNase digestion step was included to remove any contaminating DNA. Total RNA concentration and purity (A260/A280 ratio) were subsequently measured using a NanoDrop 2000 UV-Vis spectrophotometer (NanoDrop Products, Wilmington, DE). The integrity of the RNA samples was further assessed using the Agilent 2100 Bioanalyzer with the RNA 6000 Nano Reagent Kit (Agilent Technologies) to obtain the RNA integrity number.

RNA Processing and Microarray Experiment
All reagents and instruments used in the microarray experiment were obtained from Affymetrix (Santa Clara, CA). Total RNA samples were processed using the GeneChip 3′ IVT PLUS Reagent Kit and hybridized onto GeneChip PrimeView Human Gene Expression Arrays following the manufacturer's protocols. Briefly, single-stranded complementary DNA (cDNA) was generated from 100 ng of total RNA using reverse transcriptase and a T7-linked oligo(dT) primer and was then converted to double-stranded cDNA using DNA polymerase and RNase H. Subsequently, complementary RNA (cRNA) was synthesized through in vitro transcription (IVT) with biotinylated UTP and CTP, using T7 RNA polymerase as the enzyme and the second strand of the double-stranded cDNA as the template.
The biotin-labeled cRNA was then purified and a fraction of 12 μg was fragmented by Mg2+ at 94 °C. Fragmented cRNA was then hybridized onto the microarray chips in the GeneChip Hybridization Oven 645 at 45 °C for 16 h. After hybridization, the microarray chips were stained and washed on the GeneChip Fluidics Station 450. Finally, the chips were scanned using the GeneChip Scanner 3000 7G, and the scanned image (DAT) files were further preprocessed using Affymetrix GeneChip Command Console software (v.4.0) to produce cell intensity (CEL) files. Arrays were quality-checked using the Affymetrix Expression Console software (v.1.3) prior to data processing and analysis in the next step.

Microarray Data Processing and Unsupervised Data Analysis
The robust multi-array average (RMA) algorithm integrated in the Affymetrix Expression Console software (v.1.3) was used to summarize the values of individual probes belonging to each probe set in the CEL files to generate the CHP files. Unsupervised principal component analysis (PCA) and hierarchical cluster analysis (HCA) on normalized data from all samples were then performed using the ArrayTrack software system developed by the U.S. FDA [4] to explore similarities and differences among the samples. Subsequently, one-way analysis of variance (ANOVA) was conducted using the Affymetrix Transcriptome Analysis Console (TAC) software (v.4.0) to identify differentially expressed genes (DEGs). For each comparison between two experimental groups, the selection of DEGs was based on the fold change (FC) of each annotated gene coupled with its corresponding p-value, using the cutoff values of |FC| > 2.0 and p < 0.05.
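The DEG selection rule above can be expressed as a simple filter. The Python sketch below is illustrative only and is not the TAC implementation: it assumes that fold change is reported as a signed linear value derived from mean log2 signals (so that −4.0 means four-fold lower in the first group), that p-values are supplied from a separate test, and the function names and gene labels are hypothetical.

```python
def signed_fold_change(mean_log2_a, mean_log2_b):
    """Convert a difference of mean log2 signals to a signed linear fold change.

    Positive values mean higher in group A; negative values mean higher in B.
    """
    diff = mean_log2_a - mean_log2_b
    return 2.0 ** diff if diff >= 0 else -(2.0 ** -diff)


def select_degs(results, fc_cutoff=2.0, p_cutoff=0.05):
    """Keep genes with |FC| > fc_cutoff and p < p_cutoff.

    `results` maps a gene label to a (signed_fold_change, p_value) pair.
    """
    return {
        gene: (fc, p)
        for gene, (fc, p) in results.items()
        if abs(fc) > fc_cutoff and p < p_cutoff
    }


# Toy input only (hypothetical gene labels and values, not from GSE187011).
toy_results = {
    "GENE_A": (signed_fold_change(10.0, 8.0), 0.01),  # FC = 4.0, passes both
    "GENE_B": (-3.0, 0.2),                            # fails the p cutoff
    "GENE_C": (1.5, 0.001),                           # fails the FC cutoff
    "GENE_D": (-2.5, 0.04),                           # passes both
}
degs = select_degs(toy_results)
```

Both conditions must hold at once, so a gene with a large fold change but p ≥ 0.05, or a small p-value but |FC| ≤ 2.0, is excluded.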

Ethics Statements
This manuscript adheres to the Elsevier Ethics in publishing standards.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Expression data from hepatocyte-like cells (HLCs) differentiated from human induced pluripotent stem cells (iPSCs) (Original data) (Gene Expression Omnibus).