Whole transcriptome data analysis of zebrafish mutants affecting muscle development

Formation of the contractile myofibril of the skeletal muscle is a complex process which, when perturbed, leads to muscular dystrophy. Herein, we provide an mRNAseq dataset on three different zebrafish mutants affecting muscle organization during embryogenesis. These comprise mutants of the myosin folding chaperone unc45b (unc45b−/−), the heat shock protein 90aa1.1 (hsp90aa1.1−/−) and the acetylcholinesterase (ache−/−) genes. The transcriptome analysis was performed in duplicate experiments at 72 h post-fertilization (hpf) for all three mutants, with two additional developmental stages (24 hpf and 48 hpf) for unc45b−/−. A total of 20 samples were analyzed by hierarchical clustering and for differential gene expression. The data from this study support the observation made in Etard et al. (2015) [1] (http://dx.doi.org/10.1186/s13059-015-0825-8) that a failure to fold myosin activates a unique transcriptional program in the skeletal muscles that is different from that induced in stressed muscle cells.


Value of the data
This dataset comprises the transcriptome analysis of the zebrafish mutants unc45b−/−, hsp90aa1.1−/− and ache−/− during embryonic development.
It provides the list of regulated genes and the associated Gene Ontology analysis of skeletal muscle cells under cellular stress and defective chaperoning activity.
It is anticipated that this dataset can serve as a reference point for other analyses of myopathies.

Table 1 Description of the genotype and stage of development of the zebrafish embryos collected in the study. The mutants ache−/−, unc45b−/− and hsp90a−/− are caused by recessive mutations. The genotype of each sample is indicated. Mutant samples denoted by (−/−) are homozygous mutants. Wild-type siblings are annotated as WT and consist of embryos without any overt phenotype and with a mixed genotype (+/+ or +/−). Biological duplicates are indicated by the numbers 1 and 2.

Production of mutants and experimental design
Homozygous mutant zebrafish embryos were produced from incrosses of identified heterozygous mutant carriers in the AB genetic background for the lines unc45b+/− [2], hsp90aa1.1+/− (hereafter referred to as hsp90a) [3] and ache+/− [4]. Wild-type and mutant siblings were identified by their morphology under a binocular microscope at 72 hpf for hsp90a−/− and ache−/−, and at 24 hpf, 48 hpf and 72 hpf for unc45b−/−. Embryos were collected from several incrosses in two independent collections. About 20-50 manually dechorionated embryos of each genotype were collected in fish water and, after the fish water had been removed with a pipette, homogenized in 200 µL Trizol (Thermo Fisher). Total RNA was extracted as described in the manufacturer's protocol, with the modification that an additional chloroform extraction was performed before precipitation with isopropanol. Total RNA pellets were resuspended in 50 µL RNase-free water (Ambion). RNA integrity was checked by loading about 100 ng total RNA on an RNA6000 Nanochip using an Agilent 2100 Bioanalyser (Agilent Technologies). Samples showed no sign of degradation (RNA integrity number >9). The genotypes and stages of the samples collected in the study are listed in Table 1.
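The experimental design described above can be enumerated programmatically as a quick consistency check on the number of libraries. The following is a minimal sketch in Python; the label format and the field names are illustrative assumptions and do not reproduce the sample names used in the deposited data.

```python
from itertools import product

# Lines and the stages (hpf) at which each line was sampled, as stated in the text.
STAGES_BY_LINE = {
    "unc45b": (24, 48, 72),
    "hsp90a": (72,),
    "ache": (72,),
}
GENOTYPES = ("-/-", "WT")  # homozygous mutants and wild-type siblings
REPLICATES = (1, 2)        # two independent collections


def build_sample_sheet():
    """Return one record per library; the label format is hypothetical."""
    samples = []
    for line, stages in STAGES_BY_LINE.items():
        for stage, genotype, replicate in product(stages, GENOTYPES, REPLICATES):
            samples.append({
                "line": line,
                "genotype": genotype,
                "stage_hpf": stage,
                "replicate": replicate,
                "label": f"{line}{genotype}{replicate}_{stage}hpf",
            })
    return samples


samples = build_sample_sheet()
print(len(samples))  # 20
```

The count of 20 follows from 12 unc45b libraries (three stages, two genotypes, two replicates) plus four libraries each for hsp90a and ache.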

Library preparation, quality control and data analysis
Sequencing libraries were prepared with the TruSeq RNA Library Prep Kit v2 (Illumina), following the manufacturer's protocol. Briefly, total RNA (1 µg) from each sample was used for poly(A) RNA selection with poly-dT-coated magnetic beads, followed by fragmentation. First-strand cDNA synthesis was performed with SuperScript II (Thermo Fisher) using random hexamer primers. The cDNA fragments were subjected to end repair and dA-tailing, and finally ligated to specific double-stranded barcoded adapters. Libraries were amplified by 12 cycles of PCR. The quality and concentration of the resulting sequencing libraries were determined on a DNA1000 chip using an Agilent 2100 Bioanalyser (Agilent Technologies). The mRNAseq libraries were sequenced at 7 pM on a HiSeq 1000 device (Illumina) to generate 50 bp paired-end reads. Cluster detection and base calling were performed using RTA v.1.13, and the quality of reads was assessed with CASAVA v.1.8.1 (Illumina). Mapping was performed with TopHat version 1.4.1 [5], setting the distance between mates to 180 bp with a standard deviation of 80 bp. Other TopHat options were --butterfly-search --coverage-search --microexon-search -a 5 -p 5 --library-type fr-unstranded, together with the known exon-exon junctions from Ensembl release 75. Mapped reads were quantified with HTSeq version 0.5.3p3 [6] using the options --stranded=no --mode=union and the GTF file from Ensembl release 75.
The principal component analysis of the regularized log-transformed (rlog) data from DESeq2 [7] shows that the biological duplicates are consistent and that the variance is mainly explained by stage and genotype (Fig. 1).
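For readers who wish to reproduce the mapping and counting steps, the options listed above can be assembled into command lines as sketched below in Python. This is an illustration under stated assumptions, not the authors' exact invocation: all file and directory names are hypothetical, the use of -G to supply the Ensembl annotation is an assumption (the text only states that known exon-exon junctions were used), and the alignments are assumed to be available in SAM format for counting.

```python
import shlex


def tophat_command(index_prefix, reads_1, reads_2, gtf, out_dir):
    """Build a TopHat argument list from the options reported in the text."""
    return [
        "tophat",
        "-r", "180",              # distance between mates (bp)
        "--mate-std-dev", "80",   # standard deviation of that distance (bp)
        "--butterfly-search",
        "--coverage-search",
        "--microexon-search",
        "-a", "5",                # minimum anchor length
        "-p", "5",                # number of threads
        "--library-type", "fr-unstranded",
        "-G", gtf,                # ASSUMPTION: junctions supplied as a GTF file
        "-o", out_dir,
        index_prefix, reads_1, reads_2,
    ]


def htseq_count_command(sam_file, gtf):
    """Build an htseq-count argument list with the reported options."""
    return ["htseq-count", "--stranded=no", "--mode=union", sam_file, gtf]


# Hypothetical file names, for illustration only.
mapping = tophat_command("zebrafish_index", "sample_R1.fastq", "sample_R2.fastq",
                         "ensembl_release75.gtf", "tophat_out")
counting = htseq_count_command("accepted_hits.sam", "ensembl_release75.gtf")
print(shlex.join(mapping))
print(shlex.join(counting))
```

Building the commands as argument lists keeps them suitable for subprocess.run without shell quoting issues; the sketch only prints them and does not run either tool.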
The 9453 genes with rlog expression consistently >9 in at least one set of duplicates were subjected to hierarchical clustering with Pearson's correlation and the complete-linkage method using the R function hclust and the R package gplots (Fig. 2). The hierarchical clustering of 24 selected genes involved in various biological processes, such as muscle development and neurogenesis, shows specific patterns of expression depending on the genotype (Fig. 3).
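The filtering and clustering logic described above can be illustrated with a small, dependency-free Python sketch. It is not the R code used in the study. Two assumptions are made: "consistently >9" is read as both replicates of at least one condition exceeding the threshold, and the distance between two genes is taken as 1 minus Pearson's r. The gene names and values are toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def passes_filter(duplicate_pairs, threshold=9.0):
    """Keep a gene if both replicates of at least one condition exceed the threshold."""
    return any(min(pair) > threshold for pair in duplicate_pairs)


def complete_linkage(profiles):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, height)."""
    def dist(a, b):
        return 1.0 - pearson(profiles[a], profiles[b])

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy rlog-like profiles across four samples (hypothetical values).
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
merges = complete_linkage(toy)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

In this toy example geneA and geneB merge first because their profiles are almost perfectly correlated, while the anti-correlated geneC joins last at a height of 2.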
The differential gene expression analysis of the binary comparisons between each mutant and its respective control was performed with the R package DESeq2. An MA plot for the analysis of the differential gene expression between control and unc45b−/− embryos at 72 hpf is shown (Fig. 4). For each gene, the mean expression was plotted against the logarithm of the fold change. Genes with significant deregulation are shown in red (adjusted p-value <0.01).
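The two coordinates of an MA plot can be sketched as follows in Python. This is a simplified illustration only: DESeq2 estimates fold changes and adjusted p-values from a fitted model, whereas the sketch computes a plain ratio of group means from already normalized counts. The pseudocount parameter, the function names and the example values are assumptions.

```python
from math import log2


def ma_values(control, mutant, pseudocount=0.0):
    """Return (A, M): mean normalized expression and log2 fold change mutant/control."""
    mean_control = sum(control) / len(control)
    mean_mutant = sum(mutant) / len(mutant)
    a = (mean_control + mean_mutant) / 2.0
    m = log2((mean_mutant + pseudocount) / (mean_control + pseudocount))
    return a, m


def is_significant(padj, alpha=0.01):
    """Flag a gene for highlighting; a missing adjusted p-value is never significant."""
    return padj is not None and padj < alpha


# Hypothetical normalized counts for one gene in duplicate samples.
print(ma_values([100.0, 100.0], [400.0, 400.0]))  # (250.0, 2.0)
print(is_significant(0.003), is_significant(0.2), is_significant(None))
```

A non-zero pseudocount avoids a division by zero or a logarithm of zero for genes that have no counts in one of the groups.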
Zebrafish genes up-regulated or down-regulated in the comparative analysis between the unc45b−/− and control larvae at 72 hpf (adjusted p-value <0.01) were subjected to pathway enrichment analysis. First, known human gene orthologues with at least 30% identity or an orthology confidence score of 1 were obtained from Ensembl Compara [8] via BioMart (release 75) [9]. Then, unique Ensembl human or zebrafish identifiers were used to obtain their associated Gene Ontology (GO) terms using the R package biomaRt [10]. Finally, enrichment of GO terms was assessed by computing p-values from a Fisher's exact test (gene universe set to 18,000). A comparison between the top enriched GO terms obtained with the human and zebrafish genes up-regulated in unc45b−/− compared to control at 72 hpf is given in Tables 2 and 3. The data show that pathways involved in muscle physiology and hypoxia are enriched in the set of genes up-regulated in unc45b−/− compared to the control at 72 hpf.
Table 3 Zebrafish GO term enrichment. Zebrafish genes up-regulated in unc45b−/− at 72 hpf were directly used to query GO terms and assess pathway enrichment with a Fisher's exact test.
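The enrichment test described above can be sketched in Python as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The text does not state whether a one-sided or two-sided test was used, so the one-sided form is an assumption, as are the function name and the example counts. Only the gene universe of 18,000 comes from the text.

```python
from math import comb


def enrichment_p_value(overlap, term_size, list_size, universe=18000):
    """P(X >= overlap) when list_size genes are drawn from the universe,
    of which term_size genes carry the GO term."""
    upper = min(term_size, list_size)
    # Sum the exact integer numerators first and divide once to limit rounding error.
    numerator = sum(
        comb(term_size, i) * comb(universe - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(universe, list_size)


# Small case that can be checked by hand: (3*21 + 1*7) / 210 = 70/210 = 1/3.
print(enrichment_p_value(2, 3, 4, universe=10))

# Hypothetical counts: 30 of 500 up-regulated genes carry a term annotated to 200 genes.
print(enrichment_p_value(30, 200, 500))
```

With a universe of 18,000 genes, a list of 500 genes is expected to contain about 5.6 genes from a term of size 200 by chance, so an overlap of 30 yields a very small p-value.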

Ethics approval
All experiments were made in accordance with the German animal protection standards and were approved by the Government of Baden-Württemberg, Regierungspräsidium Karlsruhe, Germany (Aktenzeichen 35-9185.81/G-137/10).