Global analysis of the haematopoietic and endothelial transcriptome during zebrafish development

Highlights ► Transcriptome of developing blood and vascular endothelial cells in zebrafish is described. ► 388 Novel genes expressed by blood and endothelial cells are identified. ► tmem88a and trim2a are novel genes required for primitive erythropoiesis and myelopoiesis.


Introduction
Zebrafish are widely used in studies investigating haematopoietic and vascular development. They have several advantages over other vertebrate model systems, including access to hundreds of externally fertilised, transparent embryos that allow the visualisation of developmental processes in vivo. There are also a number of haematopoietic and vascular mutants previously found in large-scale ENU mutagenesis screens (reviewed by Baldessari and Mione, 2008), and transgenic lines are available, including the Tg(fli1a:egfp) y1 line used in this study (Baldessari and Mione, 2008). For genes where mutants are not available, antisense morpholino oligonucleotides (morpholinos) can be used to knock down genes of interest. Finally, and importantly, there is a high degree of conservation of genes known to be important for vascular and haematopoietic development between zebrafish and higher organisms (Jing and Zon, 2011).
During early vertebrate embryo development, blood and endothelial cells are found closely associated. In mammals they are initially found in the blood islands of the extraembryonic yolk sac (Park et al., 2005), while during segmentation in zebrafish they are found intra-embryonically in the intermediate cell mass (ICM) of the ventral mesoderm (Detrich et al., 1995). In view of this close relationship it has been suggested that blood and endothelial cells have a common precursor cell, the haemangioblast (Sabin, 1920). Although there has been evidence to support this hypothesis from in vitro studies, it has only recently been shown that the haemangioblast exists in vivo (Park et al., 2005; Vogeli et al., 2006).
0925-4773 © 2012 Elsevier Ireland Ltd. http://dx.doi.org/10.1016/j.mod.2012.10.002
The factors controlling haemangioblast formation and the development of angioblasts (vascular endothelial cell precursors) and haematopoietic stem cells are incompletely understood. Several transcription factors are important for the formation of the haemangioblast. Stem cell leukaemia (scl, also known as tal1) null mice die in utero due to the complete absence of blood (Shivdasani et al., 1995). In zebrafish, morpholino knockdown of scl phenocopies the null mouse, but these embryos also have impaired vascular gene expression in the dorsal aorta and loss of intersegmental vessel (ISV) formation (Patterson et al., 2005). The Ets-1 related protein (etsrp, also known as etv2) was identified in a screen for novel genes affected in the cloche mutant, which lacks both blood and endothelial cells (Sumanas et al., 2005). Morpholino knockdown of etsrp leads to impaired vasculogenesis and myelopoiesis (Sumanas et al., 2008; Sumanas and Lin, 2006). Fli1, like etsrp, is an ETS transcription factor that is also important for haemangioblast formation. It has been suggested to act at the top of a transcriptional network driving blood and endothelial development by regulating other genes required for haemangioblast formation, including scl and etsrp. The VEGF signalling pathway is also critical for vascular development. Loss of Vegf or its receptor Flk1 in mice leads to death in utero due to failure to form the vasculature (Carmeliet et al., 1996; Shalaby et al., 1995). Gata1 is a master regulator of erythrocyte development: Gata1−/− mice die in utero due to the failure of pro-erythrocytes to differentiate into mature erythrocytes (Fujiwara et al., 1996).
The identification of genes involved in blood and endothelial development is of significant therapeutic interest. We therefore sought to determine the transcriptome of developing haematopoietic and vascular endothelial cells in vivo. Previous studies have attempted to answer this question using microarrays (Covassin et al., 2006; Kalén et al., 2009; Sumanas et al., 2005; Wallgard et al., 2008; Weber et al., 2005; Wong et al., 2009). Here, because fli1 is one of the earliest factors involved in haemangioblast formation, we have used a fluorescence-activated cell sorting (FACS) technique (Covassin et al., 2006) to isolate gfp positive (gfp+) and negative (gfp−) cells from transgenic Tg(fli1a:egfp) y1 embryos prior to high-throughput sequencing. This transgenic line utilises the fli1a promoter to drive gfp expression in blood and vascular endothelial cells, pharyngeal arch and neural crest derivatives. Using this technique we have identified 388 novel genes expressed in the enriched population of blood and endothelial cells. Using morpholino knockdown we confirm that two of the genes identified, tmem88a and trim2a, are novel genes required for erythropoiesis and myelopoiesis in zebrafish.

Isolation of vascular and haematopoietic cells from whole embryos
To identify genes involved in the development of endothelial and blood cells, we isolated gfp+ cells from dissociated 26-28 hpf Tg(fli1a:egfp) y1 transgenic zebrafish, where the fli1a promoter drives gfp expression in endothelial and haematopoietic cells and pharyngeal arch tissue. This time-point was chosen because the intersegmental vessels are forming by angiogenesis and the haematopoietic stem cells are starting to arise from the ventral floor of the aorta (Bertrand et al., 2010; Isogai et al., 2003).

Global analysis of genes enriched in gfp positive cells by massively parallel sequencing
The transcriptome of developing zebrafish blood and vascular endothelial cells was defined by high-throughput sequencing of cDNA made from sorted gfp+ and gfp− cell populations derived from about 3000 Tg(fli1a:egfp) y1 embryos. We found that 754 protein-coding genes were enriched threefold or greater in the gfp+ compared to the gfp− population of cells in both biological replicates (Fig. 1 and Supplementary Table 1). This group includes genes expected to be enriched, such as scl, etsrp, fli1a, gata1a, haemoglobins and vegf receptors (Supplementary Fig. 2 and Table 1). Some genes known to be important for vascular development and/or haematopoiesis (such as ephrinb2, ephB4, jag2, notch1, notch3 and unc5b) were not enriched in the gfp+ libraries (Adams et al., 1999; Hadland et al., 2004; Krebs et al., 2000; Lu et al., 2004; Van de Walle et al., 2011; Wang et al., 1998). This is probably because these genes are also strongly expressed in other tissues, including neural tissues (ZFIN).
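The selection rule described above (at least threefold enrichment in gfp+ over gfp− cells, in both biological replicates) can be illustrated with a minimal sketch. The gene names, expression values and function name below are hypothetical and are not taken from the study; the sketch only shows the shape of the filter.

```python
from typing import Dict, List, Tuple

# Hypothetical expression values: gene -> [(gfp_pos, gfp_neg) for each biological replicate].
# The numbers are invented for illustration and are not data from the study.
Replicates = List[Tuple[float, float]]


def enriched_in_all_replicates(expr: Dict[str, Replicates], min_fold: float = 3.0) -> List[str]:
    """Return genes whose gfp+/gfp- ratio is >= min_fold in every replicate."""
    selected = []
    for gene, replicates in expr.items():
        # Genes with a zero gfp- value are not selected by this sketch;
        # they need separate handling before a ratio can be taken.
        if all(neg > 0 and pos / neg >= min_fold for pos, neg in replicates):
            selected.append(gene)
    return sorted(selected)


example = {
    "gene_a": [(30.0, 5.0), (45.0, 9.0)],  # 6.0x and 5.0x: kept
    "gene_b": [(30.0, 5.0), (10.0, 9.0)],  # enriched in only one replicate: dropped
    "gene_c": [(4.0, 8.0), (3.0, 9.0)],    # depleted in gfp+: dropped
}
print(enriched_in_all_replicates(example))
```

Requiring the threshold in every replicate, rather than on an averaged ratio, keeps a single outlying replicate from carrying a gene into the enriched set.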
To identify the biological functions of the 754 genes we used the Panther classification system (Thomas et al., 2003). Genes involved in blood circulation and gas exchange (2.77-fold, p = 5.14E−04), immunity and defence (1.45-fold, p = 5.27E−05), transport (1.42-fold, p = 1.91E−04) and intracellular protein traffic (1.4-fold, p = 1.82E−03) were found to be the most significantly enriched compared to the whole zebrafish genome, whereas neuronal activities (−2.28-fold, p = 8.57E−06), nucleoside, nucleotide and nucleic acid metabolism (−1.31-fold, p = 2.42E−04) and sensory perception (−1.59-fold, p = 0.024) were significantly under-represented (Table 1). One third of the genes, however, had an unclassified biological function (Supplementary Fig. 3). Use of ZFIN and PubMed revealed that 43% of the genes were already known to be expressed in either blood or endothelial cells in zebrafish, Xenopus, mouse, chick or humans, 6% in other tissues and organs such as pharyngeal arch, pronephric duct, neural crest or heart, while 51% had an unknown expression pattern (Supplementary Table 1).

Validation of massively parallel sequencing genes
Eighty-five genes without a known role in blood or endothelial cell development or angiogenesis were chosen for validation. Using the remaining total RNA that was used to make the initial libraries, eighty-one of the eighty-five genes could be validated by qRT-PCR (Table 2 and Supplementary Table 4). Because gfp expression in Tg(fli1a:egfp) y1 zebrafish is found in pharyngeal arch tissue and neural crest derivatives as well as in blood and endothelial cells, we also selected forty-one genes to screen by whole mount in situ hybridisation. Of these, seventeen had restricted expression in both vascular endothelial and blood cells (Fig. 2 and Supplementary Fig. 4), ten in blood alone (Fig. 3), five in endothelial cells alone, one in pronephric duct and one in pharyngeal arch, endothelial cells and tailbud (Fig. 4 and Supplementary Table 2). The remaining seven genes had widespread expression (data not shown).
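The counts reported above can be checked with a short tally. The category labels are paraphrased from the text, and treating every class other than "widespread" as restricted expression is an assumption made for this sketch.

```python
# Counts as reported in the text for the genes screened by in situ hybridisation.
in_situ_counts = {
    "blood and endothelial": 17,
    "blood only": 10,
    "endothelial only": 5,
    "pronephric duct": 1,
    "pharyngeal arch, endothelial cells and tailbud": 1,
    "widespread": 7,
}

total = sum(in_situ_counts.values())
# Assumption: "restricted" means every class except widespread expression.
restricted = total - in_situ_counts["widespread"]
print(total, restricted, f"{restricted / total:.0%}")  # 41 34 83%

# qRT-PCR validation: genes validated out of genes tested, as reported in the text.
validated, tested = 81, 85
print(f"{validated / tested:.0%}")  # 95%
```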

2.4. Tmem88a and trim2a morphants have reduced erythrocyte and myeloid cell formation
Finally, to confirm the usefulness of this approach for identifying genes required for the development of blood or endothelial cells, we inhibited gene function using antisense morpholino oligonucleotides. Ten genes with strong localised expression by in situ hybridisation were selected for study (Table 2). Loss of function of eight of these genes had no morphological effect and produced no vascular or blood phenotype when examined at 24 and 48 hpf, despite gene knockdown (data not shown and Supplementary Fig. 5). Embryos lacking tmem88a or trim2a had normal vascular patterning at 24 and 48 hpf but reduced numbers of erythrocytes and myeloid cells, as judged by O-dianisidine and peroxidase staining respectively at 48 hpf (Fig. 5). For each gene this phenotype was observed with both a translation-blocking and a splice-blocking morpholino. Surprisingly, qPCR showed normal expression of gata1a at 24 hpf in the tmem88a and trim2a morphants but reduced expression of embryonic haemoglobin hbbe1 and of the myeloid markers mpx and lyz (Supplementary Fig. 6). This suggests that the effect of these genes on erythrocyte development is downstream of gata1a. The identification of two novel genes involved in primitive blood cell development confirms the validity of our approach.

Discussion
Our experiments have identified 754 protein-coding genes that are enriched at least threefold in the gfp+ population of Tg(fli1a:egfp) y1 zebrafish. The increased sensitivity of high-throughput sequencing is emphasised by the fact that a previous study using microarray technology and a twofold enrichment criterion identified only one-quarter the number of genes enriched in this study (Covassin et al., 2006). Of the 754 protein-coding genes enriched threefold in the gfp+ population of cells, 43% were already known to be expressed in blood or endothelial cells, as determined by cross-referencing the genes with ZFIN and PubMed. A number of previous studies have been performed to isolate and characterise blood and/or vascular endothelial cells (Covassin et al., 2006; Kalén et al., 2009; Sumanas et al., 2005; Wallgard et al., 2008; Weber et al., 2005; Wong et al., 2009). The data from the current study complement these and provide a catalogue of the transcriptome of developing blood and vascular endothelial cells. Some genes expressed in blood or vascular endothelial cells, such as ephrinb2, ephB4, jag2, notch1, notch3 and unc5b (Adams et al., 1999; Hadland et al., 2004; Krebs et al., 2000; Lu et al., 2004; Van de Walle et al., 2011; Wang et al., 1998), were not enriched in our study, possibly because they are also highly expressed in other tissues, particularly neural tissues.
Eighty-five genes without a known role in the development of blood or vascular endothelial cells or angiogenesis were chosen for validation by qRT-PCR. Eighty-one (95%) of these were validated in both biological replicates. As a key regulator of the transcriptional network driving blood and endothelial development, fli1a would be expected to be a good marker for isolating a pure population of blood and endothelial cells, but it is also expressed in pharyngeal arch and neural crest derivatives. Our experiments therefore yield an enriched population of blood and endothelial cells rather than a pure one. In view of this, the expression patterns of forty-one of these novel genes were determined by whole mount in situ hybridisation. Thirty-four (83%) had restricted expression in blood and/or endothelial cells, confirming the strength of our approach for identifying novel blood and vascular endothelial genes. Finally, ten of these genes were knocked down using antisense morpholino oligonucleotides. This combined approach of sequencing and then knocking down selected genes after validation has identified two novel genes, trim2a and tmem88a, that are required for primitive erythropoiesis and myelopoiesis. Both these cell types are initially derived from the posterior blood island (Bertrand et al., 2007), where we have shown by in situ hybridisation that both tmem88a and trim2a are expressed. The exact underlying mechanism is still to be determined, but our data suggest that the effect is downstream of gata1a because gata1a expression, measured by qPCR, is normal in both trim2a and tmem88a morphants.

Trim2a
Trim2a is a member of the TRIM (tripartite motif) family of proteins. Trim2 was first identified in a screen of genes up-regulated after induced seizure activity in the hippocampus of mice (Ohkawa et al., 2001). One function of this protein family is to promote ubiquitination of certain proteins via a RING domain. A gene trap screen in mice has recently reported that Trim2 deficiency causes accumulation of neurofilament light chain and neurodegeneration (Balastik et al., 2008), with no mention of a haematopoietic defect. It is possible that the mouse mutation functions as a hypomorph, because the gene trap integration occurs in intron 6 while the RING domain is found in exon 2 (Balastik et al., 2008); in contrast, our splice-blocking morpholino induces loss of exon 2.

Tmem88a
The second novel haematopoietic gene identified in this study is tmem88a. TMEM88 was originally identified in a screen for proteins that bind to dishevelled (Lee et al., 2010). This interaction negatively regulates the canonical Wnt signalling pathway, so loss of tmem88a should increase Wnt signalling in the zebrafish embryo, and this in turn would be expected to increase numbers of haematopoietic stem cells rather than cause the observed phenotype (Staal and Clevers, 2005). Future work will investigate the mode of action of tmem88a.

Conclusion
We have used high-throughput sequencing to catalogue the transcriptome of an enriched population of blood and vascular endothelial cells in the developing zebrafish embryo, and verified our approach by showing that two novel genes thus identified, trim2a and tmem88a, are required for primitive erythropoiesis and myelopoiesis. This provides a valuable resource in efforts to understand endothelial cell development and haematopoiesis in vertebrate embryos.

Zebrafish were maintained under standard conditions (Nusslein-Volhard and Dahm, 2002) and staged according to Kimmel et al. (1995). All procedures complied with the UK Home Office requirements.
The embryos obtained from in-crossing transgenic Tg(fli1a:egfp) y1 zebrafish were grown to 26-28 h post fertilisation (hpf) prior to dechorionating with pronase. They were then washed in calcium-free Ringer's solution for 15 min, during which time the yolks were removed by gently pipetting up and down. The embryos were transferred to a 50 mm petri dish (Sterilin) and incubated at 28.5 °C in 1× PBS (pH 8), 0.25% trypsin and 1 mM EDTA (Invitrogen) until the embryos were a single cell suspension (approximately 30-40 min). To aid dissociation the solution was agitated by pipetting up and down every 10 min. The digest was stopped by adding CaCl2 to a final concentration of 2 mM and foetal calf serum (FCS) to 10%. The cells were centrifuged at 400 g for 5 min and washed once in PBS before resuspending in Leibovitz's L15 medium without phenol red (Invitrogen), 1% FCS and 0.8 mM CaCl2.
Single cell suspensions were sorted at room temperature using the 488 nm laser on a Cytomation MoFlow high performance cell sorter (Dako). The separated cells were collected in Leibovitz's L15 medium without phenol red, 20% FBS and 0.8 mM CaCl2. The sorted cells were centrifuged at 400 g for 5 min before re-suspending in 1 ml Trizol (Invitrogen) and storing at −80 °C until RNA extraction was performed according to the manufacturer's protocol. The maximum time from starting the dissociation until the cells were re-suspended in Trizol was 2 h.

Gene knockdown by morpholino oligonucleotide injection
One-cell Tg(fli1a:egfp) y1 embryos were injected with 1 nl of custom (translation- or splice-blocking) morpholino oligonucleotide plus zebrafish p53 morpholino, or standard control morpholino (5 or 10 µg/µl each, Gene Tools, USA). See Supplementary Table 3. Embryos were examined under brightfield and epifluorescence microscopy at 24 and 48 hpf for defects in morphology and the development of the vascular system, heart or haematopoietic cells. Morphologically abnormal embryos were excluded from analysis. The effectiveness of knockdown for splice-blocking morpholinos was confirmed by RT-PCR. The primer sequences can be found in Supplementary Table 4.

4.4. Illumina RNA-Seq library preparation, sequencing and analysis
Illumina RNA-Seq libraries were made with 3 µg of total RNA according to the manufacturer's protocol. The only deviation from the protocol was to use the E-gel clone well system (Invitrogen) for fragment size selection. There were two biological replicates for both groups (gfp+ and gfp− sorted cells). 36 base pair single end sequencing was undertaken using an Illumina GA IIx DNA sequencer and the reads were mapped to the Zv8 zebrafish genome and visualised on the UCSC genome browser (http://genome.ucsc.edu/) (Fujita et al., 2011). Fragments per kilobase of transcript per million mapped reads (FPKM) and differential expression levels between experimental groups were determined using Cufflinks (Trapnell et al., 2010). Fold changes were calculated from the FPKM values. For genes where there were no reads in the GFP negative library, a value was added to both the GFP positive and negative FPKM values prior to calculating the fold change. This value was calculated using the following formula: $1 + 2\sqrt{(\text{average GFP positive FPKM value} + \text{GFP negative FPKM value})}$. The predicted molecular and biological functions of genes expressed at threefold greater levels in GFP positive cells compared with GFP negative cells in both biological replicates were determined using the Panther Ontology (Thomas et al., 2003). Sequencing results were validated by quantitative RT-PCR (qRT-PCR) using the remaining total RNA used to make the libraries and by whole mount in situ hybridisation. The raw data (fastq files) have been submitted to the Sequence Read Archive (SRA059568).
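The fold-change calculation with an added value for genes lacking reads in the GFP negative library can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: it assumes the added value is 1 plus twice the square root of the bracketed sum in the formula above, that "average GFP positive" refers to a mean supplied by the caller, and that "no reads" corresponds to a GFP negative value of zero. The function names are hypothetical.

```python
import math


def zero_offset(mean_pos: float, neg: float) -> float:
    """Added value under the assumed reading: 1 + 2 * sqrt(mean GFP+ value + GFP- value)."""
    return 1.0 + 2.0 * math.sqrt(mean_pos + neg)


def fold_change(pos: float, neg: float, mean_pos: float) -> float:
    """GFP+/GFP- fold change; the offset is applied only when GFP- has no signal."""
    if neg > 0:
        return pos / neg
    offset = zero_offset(mean_pos, neg)
    # The same value is added to numerator and denominator, as described in the text.
    return (pos + offset) / (neg + offset)


# Invented values for illustration only.
print(fold_change(12.0, 4.0, mean_pos=12.0))  # ordinary ratio
print(fold_change(16.0, 0.0, mean_pos=16.0))  # offset applied because GFP- is zero
```

Adding the same value to both libraries avoids division by zero while damping the fold change of weakly expressed genes, since the offset grows only with the square root of expression.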

4.5. Quantitative RT-PCR
cDNA was transcribed from 0.5 µg of total RNA using Superscript II (Invitrogen) according to the manufacturer's instructions and diluted to 50 µl for RT-PCR. Quantitative RT-PCR was performed in duplicate in 10 µl reactions using 2.5 µl of cDNA, 1× Lightcycler Mastermix (Roche) and 0.5 mM forward and reverse specific primers on a Lightcycler LC480 (Roche) according to the manufacturer's instructions. Primer pairs were designed using NCBI primer-BLAST and are found in Supplementary Table 5. Expression levels were compared to a standard curve and values normalised to ef1a and expressed as the fold change relative to the GFP negative group.
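The quantification described above (standard curve, normalisation to ef1a, fold change relative to the GFP negative group) can be sketched as follows. The log-linear form of the standard curve, the averaging of duplicate crossing-point (Cp) values, the use of one curve for both genes, and all numeric values are assumptions made for illustration; none are taken from the study.

```python
from statistics import mean
from typing import Sequence


def quantity_from_cp(cp_values: Sequence[float], slope: float, intercept: float) -> float:
    """Invert an assumed log-linear standard curve: Cp = slope * log10(quantity) + intercept."""
    cp = mean(cp_values)  # reactions were run in duplicate, so average the technical replicates
    return 10 ** ((cp - intercept) / slope)


def fold_vs_gfp_negative(target_pos: float, ef1a_pos: float,
                         target_neg: float, ef1a_neg: float) -> float:
    """Normalise each target quantity to ef1a, then express GFP+ relative to GFP-."""
    return (target_pos / ef1a_pos) / (target_neg / ef1a_neg)


# Hypothetical curve parameters and Cp values, invented for illustration only.
# In practice each primer pair would have its own standard curve.
SLOPE, INTERCEPT = -3.32, 35.0
target_pos = quantity_from_cp([25.00, 25.08], SLOPE, INTERCEPT)
target_neg = quantity_from_cp([28.30, 28.42], SLOPE, INTERCEPT)
ef1a_pos = quantity_from_cp([21.70, 21.74], SLOPE, INTERCEPT)
ef1a_neg = quantity_from_cp([21.70, 21.74], SLOPE, INTERCEPT)

fold = fold_vs_gfp_negative(target_pos, ef1a_pos, target_neg, ef1a_neg)
print(round(fold, 2))
```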

Whole mount in situ hybridisation
Probes were made by PCR amplifying either the whole open reading frame (ORF, if less than 2 kb) or about 1 kb of the 3′ end of the ORF (if greater than 2 kb) of the genes of interest using Sahara mix (Bioline) according to the protocol provided. The primer pairs used for each gene are in Supplementary Table 6. PCR-amplified transcripts were TOPO cloned (Invitrogen) and Sanger sequenced to determine the orientation of the transcript. Sense and anti-sense in situ probes were made using the appropriate DIG RNA labelling kit (Roche). Whole mount in situ hybridisation was performed as described (Nusslein-Volhard and Dahm, 2002).
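The rule for choosing the probe template region can be written as a small helper. This is a sketch only: the function name and coordinate convention are hypothetical, "about 1 kb" is fixed at exactly 1000 bp, and an ORF of exactly 2 kb (a case the text does not specify) is treated as a long ORF.

```python
from typing import Tuple


def probe_region(orf_length_bp: int,
                 max_full_length_bp: int = 2000,
                 probe_length_bp: int = 1000) -> Tuple[int, int]:
    """Return (start, end) of the probe template within the ORF (0-based, end-exclusive)."""
    if orf_length_bp <= 0:
        raise ValueError("ORF length must be positive")
    if orf_length_bp < max_full_length_bp:
        # Short ORF: amplify the whole open reading frame.
        return (0, orf_length_bp)
    # Long ORF: take the last ~1 kb at the 3' end.
    return (orf_length_bp - probe_length_bp, orf_length_bp)


print(probe_region(1500))  # (0, 1500)
print(probe_region(3200))  # (2200, 3200)
```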

Imaging
A Leica APO dissecting microscope mounted with a Coolpix 4500 camera (Nikon) was used to image and photograph embryos after in situ hybridisation and O-dianisidine staining.

Authorship
JEC, EP & AE designed and performed experiments and analysed data, JEC wrote the paper, CRB analysed data, AS performed the high-throughput sequencing, NWM and JCS designed experiments, analysed data and wrote the paper.