Transcriptome datasets of neural progenitors and neurons differentiated from induced pluripotent stem cells of healthy donors and Parkinson's disease patients with mutations in the PARK2 gene

Parkinson's disease (PD) is a complex systemic disorder caused by neurodegenerative processes in the brain that are mainly characterized by progressive loss of dopaminergic neurons in the substantia nigra. About 10% of PD cases have been linked to specific gene mutations (Zafar and Yaddanapudi, 2022), including mutations in the PARK2 gene, which encodes the RING domain-containing E3 ubiquitin ligase Parkin. PD-Parkin patients have a younger age of onset, longer disease duration, and more severe clinical symptoms than PD patients with unknown causative mutations (Zhou et al., 2020). Induced pluripotent stem cells (iPSCs) are considered a powerful tool for disease modeling. To evaluate how mutations in PARK2 contribute to PD development, iPSC lines were obtained from three healthy donors and three PD patients with different mutations in the PARK2 gene. The iPSC lines were differentiated sequentially into neural progenitors (NPs) and then into terminally differentiated neurons (DNs). The data presented in this article were generated on a NextSeq 500 System (Illumina) and include transcriptome profiles of NPs and DNs from healthy donors and PD patients with mutations in the PARK2 gene. The top 10 up- and down-regulated differentially expressed genes in NPs and DNs of PD patients compared to healthy donors are also presented. Comparative transcriptome analysis of neuronal derivatives of healthy donors and PD patients makes it possible to examine the contribution of PARK2 mutations to PD pathogenesis.

Value of the Data
• The transcriptomic datasets generated are useful for identifying genes involved in PD pathogenesis and for determining the mechanisms of PD onset associated with mutations in PARK2. PARK2 was selected for analysis because mutations in this gene have been shown to cause autosomal recessive early-onset PD [2,3].
• These data are valuable for researchers investigating gene networks in neuronal differentiation and the molecular mechanisms involved in PD development. Analysis of healthy NPs and DNs vs. PD NPs and DNs can be important for research in disease modeling, autologous iPSC implantation [4], and genetic methods of disease correction.
• These data may be used to perform multilevel comparative transcriptomic analysis of NPs and neurons from healthy donors and PD patients with mutations in various PARK2 exons, as well as to evaluate the transcriptional features that healthy or PD NPs acquire during differentiation into mature neurons.

Data Description
PD affects at least 1% of the world population over 60. About 10% of PD cases have been linked to specific gene mutations, mainly in young people [1]. Research on the contribution of gene mutations to PD pathogenesis is thus highly significant. One of the PD-associated genes, PARK2, is located in a region prone to gaps, breaks, and rearrangements [5], so the different mutations in this gene are a focus of interest. To date, there are few reports on the transcriptome of neural derivatives differentiated from PD patients' iPSCs. We obtained iPSC-derived neural cells at two differentiation stages, neural progenitors (NPs) and terminally differentiated neurons (DNs), from both healthy donors (HD) and PD patients with PARK2 mutations. According to publications, PARK2 mutations most commonly occur in exons 3-6 [6]. Our data cover PD patients with deletions of the 2nd and 8th exons.

iPSCs from three healthy donors and three PD patients with different mutations in PARK2 were differentiated into uncommitted NPs, and then into mature DNs (Table 1, Supplementary Figs. S1-S4). Whole transcriptome profiles of these cell populations were generated using the NextSeq 500 System (Illumina, USA). The datasets contain raw sequence data converted into the FASTQ format. Raw transcriptome sequence reads were deposited in the NCBI GEO database (accession number GSE181029).

Reads were trimmed for quality (Supplementary Table S1); the first and last bases of paired reads were removed using Trimmomatic (v. 0.35) [7]. Trimmed RNA-seq reads were quantified against the Homo sapiens GRCh38.13 genome annotation at the transcript level using Salmon (v. 1.4) [8]. Results were aggregated to gene level using the R package tximport [9]. The R packages FactoMineR [10] and rgl (0.108.3) [11] were used for PCA and data visualization, respectively. Fig. 1 (A and B) visualizes the principal component analysis (PCA) of NP and DN transcriptome profiles, respectively. Fig. 1A demonstrates that "healthy" NPs (blue dots) form a separate compact cluster. Differentially expressed genes were identified using the R package DESeq2 [12]. Table 2 presents the top 10 up- and down-regulated differentially expressed genes in NPs of PD patients as compared to HD. Fig. 1B shows that DNs from HD and PD patients form two different clusters. Table 3 presents the top 10 up- and down-regulated differentially expressed genes in DNs of PD patients as compared to HD.
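
As an illustration of how a top 10 list such as those in Tables 2 and 3 can be derived from a differential expression table, the following is a minimal Python sketch. It is not the authors' script (their analysis used DESeq2 in R). The field names `gene`, `log2fc`, and `padj`, the 0.05 significance cutoff, the ranking by log2 fold change, and the toy values are all assumptions made for the example.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    """One row of a differential expression table (hypothetical field names)."""
    gene: str
    log2fc: float  # log2 fold change, PD vs. healthy donors
    padj: float    # multiple-testing adjusted p-value


def top_regulated(results, n=10, alpha=0.05):
    """Return (up, down): the n most up- and down-regulated significant genes.

    Assumption: genes are kept if padj < alpha and ranked by log2 fold change.
    """
    significant = [r for r in results if r.padj < alpha]
    up = sorted((r for r in significant if r.log2fc > 0),
                key=lambda r: r.log2fc, reverse=True)[:n]
    down = sorted((r for r in significant if r.log2fc < 0),
                  key=lambda r: r.log2fc)[:n]
    return up, down


# Toy data with placeholder gene names; not results from this dataset.
toy = [
    DEResult("GENE_A", 3.2, 0.001),
    DEResult("GENE_B", -2.5, 0.010),
    DEResult("GENE_C", 5.0, 0.200),   # large change but not significant
    DEResult("GENE_D", 1.1, 0.030),
    DEResult("GENE_E", -4.0, 0.002),
]

up, down = top_regulated(toy, n=2)
print([r.gene for r in up])    # ['GENE_A', 'GENE_D']
print([r.gene for r in down])  # ['GENE_E', 'GENE_B']
```

Other ranking criteria (for example, adjusted p-value or a shrunken fold change) are equally plausible; the source does not state which one was used.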

Ethics statement
The study complies with the Declaration of Helsinki and was performed following approval by the Ethics Committee of the Research Center of Neurology. Written informed consent was obtained from every patient and healthy donor.

Derivation of IPSPDPS8 and IPSPDPS2d cell lines
iPSCs were derived from human skin fibroblasts of patients carrying mutations in the PARK2 gene using the CytoTune™-iPS 2.0 Sendai Reprogramming Kit (Thermo Fisher, USA). The mutations were localized using the MLPA method with subsequent sequencing. The obtained iPSCs expressed the pluripotency-associated markers SSEA-4 and Oct-4 (Supplementary Fig. S1) and possessed a normal karyotype. The iPSCs could produce derivatives of the three embryonic germ layers: spontaneously differentiated iPSCs were stained with antibodies against germ layer markers (ectoderm: TUBB3, mesoderm: Desmin, endoderm: AFP) (Supplementary Fig. S2). iPSC lines were cultured in XF Medium (Sartorius, Germany) on Matrigel-coated substrates (Corning, USA). Cells were examined with an AxioImager Z1 fluorescence microscope equipped with an AxioCam HRM camera using AxioVision 4.8 software (Zeiss, Germany). Immunofluorescence staining was performed according to a previously described method [13].

Transcriptome data profiling
Total RNA from NP and DN cultures was extracted in triplicate using the RNeasy Micro Kit (Qiagen, USA), followed by treatment with DNase I (Qiagen, USA). The reaction was purified with the PureLink RNA Mini Kit (ThermoFisher, USA). RNA sample quality was checked using a 2100 Bioanalyzer (Agilent, USA). Enrichment of polyadenylated RNA and library preparation were performed with the NEBNext Ultra II Directional RNA Library Prep kit (NEB, USA) according to the manufacturer's protocol. Samples were sequenced on the NextSeq 500 System (Illumina, USA) with the NextSeq 500/550 High Output Kit v2.5 (75 Cycles).

RNAseq data analysis
Raw sequence data were converted to the FASTQ format using the bcl2fastq software (Illumina). Reads were trimmed for quality (Supplementary Table S1); the first and last bases of paired reads were removed using Trimmomatic (v. 0.35) [7]. Trimmed RNA-seq reads were quantified against the Homo sapiens GRCh38.13 genome annotation at the transcript level using Salmon (v. 1.4) [8]. Results were aggregated to gene level using the R package tximport [9]. Differentially expressed genes were identified using the R package DESeq2 [12]. The R packages FactoMineR [10] and rgl (0.108.3) [11] were used for PCA and data visualization, respectively. All software packages and libraries used can be accessed via the GitHub repository: https://github.com/ksenia1602/scripts_for_articles/tree/main/scripts_Data_in_Brief_artcicle_NP_ND_from_IPS.
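
Two of the steps above, end trimming and transcript-to-gene aggregation, can be sketched in a few lines of Python. This is an illustration only: the authors' pipeline used Trimmomatic, Salmon, and tximport, and the function names, the transcript-to-gene map, and the toy values below are hypothetical. The aggregation shown is a plain sum of transcript-level counts per gene, which is a simplification of what tximport does.

```python
from collections import defaultdict


def trim_ends(sequence, quality, head=1, tail=1):
    """Drop `head` bases from the start and `tail` bases from the end of a read.

    The quality string is trimmed identically so it stays aligned with the bases.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    end = len(sequence) - tail
    return sequence[head:end], quality[head:end]


def aggregate_to_gene(transcript_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts.

    Transcripts missing from `tx2gene` are skipped.
    """
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        gene = tx2gene.get(transcript)
        if gene is not None:
            gene_counts[gene] += count
    return dict(gene_counts)


# Toy inputs with placeholder identifiers; not taken from the dataset.
seq, qual = trim_ends("ACGTACGT", "IIIIHHHH")
print(seq, qual)  # CGTACG IIIHHH

tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
counts = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0, "tx4": 7.0}
print(aggregate_to_gene(counts, tx2gene))  # {'geneA': 15.5, 'geneB': 2.0}
```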

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.