Genome-wide mining seed-specific candidate genes from peanut for promoter cloning

  • Cuiling Yuan,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft

    Affiliations Key Laboratory for Tobacco Gene Resources, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, Shandong, China, Shandong Peanut Research Institute, Qingdao, Shandong, China, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China

  • Quanxi Sun,

    Roles Conceptualization, Data curation, Funding acquisition, Project administration, Writing – original draft, Writing – review & editing

    kongyingzhen@caas.cn (YK); squanxi@163.com (QS)

    Affiliation Shandong Peanut Research Institute, Qingdao, Shandong, China

  • Yingzhen Kong

    Roles Conceptualization, Funding acquisition, Project administration, Writing – review & editing

    kongyingzhen@caas.cn (YK); squanxi@163.com (QS)

    Affiliation Key Laboratory for Tobacco Gene Resources, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, Shandong, China

Abstract

Peanut seeds are ideal bioreactors for the production of foreign recombinant proteins and/or nutrient metabolites. Seed-Specific Promoters (SSPs) are important molecular tools for bioreactor research. However, few SSPs have been characterized in peanut seeds. The mining of Seed-Specific Candidate Genes (SSCGs) is a prerequisite for promoter cloning. Here, we described an approach for the genome-wide mining of SSCGs via comparative gene expression between seed and nonseed tissues. Three hundred thirty-seven SSCGs were ultimately identified, and the top 108 SSCGs were characterized. Gene Ontology (GO) analysis revealed that some SSCGs were involved in seed development, allergens, seed storage and fatty acid metabolism. RY REPEAT and GCN4 motifs, which are commonly found in SSPs, were dispersed throughout most of the promoters of SSCGs. Expression pattern analysis revealed that all 108 SSCGs were expressed specifically or preferentially in the seed. These results indicated that the promoters of the 108 SSCGs may perform functions in a seed-specific and/or seed-preferential manner. Moreover, a novel SSP was cloned and characterized from a paralogous gene of SSCG29 from cultivated peanut. Together with the previously characterized SSP of the SSCG5 paralogous gene in cultivated peanut, these results implied that the method for SSCG identification in this study was feasible and accurate. The SSCGs identified in this work could be widely applied to SSP cloning by other researchers. Additionally, this study identified a low-cost, high-throughput approach for exploring tissue-specific genes in other crop species.

Introduction

Peanut (Arachis hypogaea L., which is also referred to as groundnut) is one of the most important oil crop species worldwide and plays important roles in human nutrition [1]. Peanut seeds, which are rich in oleic acid, linoleic acid, proteins and other nutrients, are ideal bioreactors for the production of foreign recombinant proteins or other beneficial metabolites.

As important molecular tools, promoters are usually used in gene functional analysis [2–4] and are also widely used for plant quality improvement [5–8]. Seed-specific promoters (SSPs), which can drive the expression of foreign genes specifically in seeds, are of great importance for genetic engineering of seeds. SSPs have been widely applied in plant molecular pharming, such as that involving golden rice [8], purple endosperm rice [9], purple embryo maize [7] and fish oil canola [10]. The use of SSPs can avoid constitutive expression, which can harm plants [11–13]. Moreover, repetitive use of the same promoter when expressing multiple foreign proteins simultaneously is considered inadvisable owing to the likelihood of transcriptional silencing [14–16]. Therefore, additional peanut SSPs are needed to overexpress or knock down specific genes, regulate seed development, and modify seed content, especially to produce foreign recombinant proteins or secondary metabolites.

To date, few SSPs from peanut are available, and those that are available were identified from known genes expressed specifically in the seed [17–19]. Tissue-specific gene expression provides fundamental information for SSP mining. Several methods have been developed to analyze gene expression differences, such as subtractive hybridization [20], suppression subtractive hybridization [21], differential display reverse transcription PCR [22], and cDNA microarrays [23,24]. However, these methods are limited by their specific shortcomings; for example, only known genes can be recognized by microarray chips [23]. With the decreasing cost of transcriptome sequencing, comparative transcriptome sequencing has been widely used to analyze differences in gene expression [25–28]. The diploid peanut ancestors Arachis duranensis (AA) and Arachis ipaensis (BB) are considered the donors of the A and B subgenomes of the allotetraploid cultivated peanut Arachis hypogaea [1]. The release of A. duranensis and A. ipaensis genome sequences [1] made it convenient to obtain genetic information from cultivated peanut. Comparative transcriptome sequencing combined with peanut genome information is a powerful means of genome-wide mining of SSCGs for promoter cloning.

In this study, we described a genome-wide comparative transcriptome sequencing-based approach to identify SSCGs for SSP cloning in peanut. A total of 337 SSCGs were identified from peanut, and the top 108 SSCGs according to their Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) were characterized. On the basis of semiquantitative RT-PCR analysis, 94 SSCGs were expressed in a seed-specific manner, and 14 SSCGs were expressed in a seed-preferential manner. One novel SSP was cloned and characterized to verify its seed specificity in transgenic Arabidopsis. Our results could be widely used in the identification of future peanut SSPs.

Materials and methods

Plant materials and RNA extraction

Plants of the cultivated peanut ‘Shitouqi’ were grown at the Laixi experimental station of the Shandong Peanut Research Institute during the summer of 2016. Leaves, roots, stems, pegs and pod shells were collected at the pod-maturing stage. Developing seeds were collected between 20 and 80 days after flowering. All tissues were flash frozen in liquid nitrogen and then stored at -80°C for transcriptome sequencing.

Total RNA was isolated from different tissues using TRIzol (Life Technologies, Carlsbad, CA, USA) reagent. The quality and quantity of each RNA sample were assayed using a NanoDrop device (Thermo Fisher, MA, USA).

Illumina sequencing and in silico analysis

The RNA extracted from seeds at different developmental stages was mixed together as Sample I (seed), while the RNA from the leaves, roots, stems, pegs and pod shells was pooled in equimolar amounts as Sample II (nonseed). Both samples were processed and sequenced using an Illumina HiSeq™ 2500 instrument at Gene Denovo Biotechnology Company (Guangzhou, China). Reads containing adaptor sequences were cleaned, and low-quality reads were filtered out. The reads of each sample were then mapped to the A. duranensis and A. ipaensis reference genomes [1] with TopHat2 [29].

Gene expression levels were normalized as FPKM values. To mine SSCGs, the FPKM value of each transcript in Sample I was divided by its value in Sample II using Excel software. Transcripts with an FPKM value less than 10 in Sample I or greater than 10 in Sample II were excluded, and the remaining transcripts with a Sample I/Sample II ratio greater than 50 were considered SSCGs. The SSCGs were subsequently ranked according to their FPKM value.
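The screening step above can be sketched in a few lines of Python. This is only an illustration, not the authors' Excel workflow: it assumes the thresholds are read as a seed FPKM of at least 10, a nonseed FPKM of at most 10 and a seed/nonseed ratio greater than 50. The gene names and FPKM values are invented, and the handling of a nonseed FPKM of zero is an assumption, since the text does not say how zeros were treated.

```python
# Thresholds under one reading of the Methods text (an assumption).
MIN_SEED_FPKM = 10.0
MAX_NONSEED_FPKM = 10.0
MIN_RATIO = 50.0


def select_sscgs(fpkm):
    """Return (gene, seed_fpkm, ratio) tuples ranked by seed FPKM, highest first.

    fpkm maps a gene ID to a (seed_fpkm, nonseed_fpkm) pair.
    """
    selected = []
    for gene, (seed, nonseed) in fpkm.items():
        if seed < MIN_SEED_FPKM or nonseed > MAX_NONSEED_FPKM:
            continue
        # Assumption: a gene absent from the nonseed sample gets an infinite ratio.
        ratio = float("inf") if nonseed == 0 else seed / nonseed
        if ratio > MIN_RATIO:
            selected.append((gene, seed, ratio))
    return sorted(selected, key=lambda row: row[1], reverse=True)


# Hypothetical input: gene -> (Sample I FPKM, Sample II FPKM).
example = {
    "geneA": (5200.0, 3.1),  # kept: high in seed, ratio far above 50
    "geneB": (800.0, 40.0),  # dropped: nonseed FPKM above 10
    "geneC": (9.0, 0.0),     # dropped: seed FPKM below 10
    "geneD": (120.0, 0.0),   # kept: not detected in nonseed tissue
    "geneE": (300.0, 9.0),   # dropped: ratio is about 33
}
ranked = select_sscgs(example)
for gene, seed, ratio in ranked:
    print(gene, seed, ratio)
```

Ranking by the seed FPKM mirrors the way the SSCGs were numbered, so the first rows of the output correspond to the most abundant candidates.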

GO annotation, chromosomal location and cis-acting element analysis

Functional annotation and Gene Ontology (GO) analyses of the SSCGs were carried out using BLAST2GO (http://www.geneontology.org/). All SSCG sequences and chromosomal location information were obtained from the PeanutBase database (www.peanutbase.org). These genes were mapped onto the chromosome using the MapInspect software program (http://mapinspect.software.informer.com). To identify cis-acting elements, the 2500 bp promoter regions upstream of the ATG initiation codon of the SSCGs were identified using the New PLACE server (https://sogo.dna.affrc.go.jp/cgi_bin/sogo.cgi?lang=en&pj=640&action=page&page=newplace) [30].
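As an illustration of how a 2500 bp upstream region might be cut out before it is submitted to a motif server, the following Python sketch takes a chromosome sequence, the position of the start codon and the gene strand. The function names and the coordinate convention (0-based index of the forward-strand base that pairs with the A of the ATG) are assumptions made for this example; PeanutBase and New PLACE provide their own tools and formats.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def upstream_region(chrom_seq, atg_pos, strand, length=2500):
    """Return up to `length` bp upstream of the ATG, read 5'->3' on the gene strand.

    atg_pos is the 0-based forward-strand index of the base pairing with the
    A of the start codon (hypothetical convention for this sketch).
    """
    chrom_seq = chrom_seq.upper()
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the chromosome start.
        return chrom_seq[max(0, atg_pos - length):atg_pos]
    if strand == "-":
        # Upstream lies at higher coordinates on the forward strand.
        region = chrom_seq[atg_pos + 1:atg_pos + 1 + length]
        return reverse_complement(region)
    raise ValueError("strand must be '+' or '-'")


# Toy sequences: the same gene placed on the forward and the reverse strand.
forward_gene = "GGGCCCATGAAA"  # ATG starts at index 6
reverse_gene = "TTTCATGGGCC"   # CAT at indices 3-5 encodes ATG on the reverse strand
print(upstream_region(forward_gene, 6, "+", length=4))
print(upstream_region(reverse_gene, 5, "-", length=4))
```

Both calls return the same four bases, because the second toy sequence is the reverse complement of the first with the leading base dropped.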

Phylogenetic analysis

To study the phylogenetic relationships of the selected SSCGs, multiple alignments of their DNA sequences were performed using ClustalW. Unrooted phylogenetic trees were constructed with the neighbor-joining (NJ) method using MEGA 6.0 software, and the bootstrap test was carried out with 1000 replicates.

Expression analysis of SSCGs in A. duranensis and A. ipaensis

The FPKM data of the 108 selected SSCGs within 20 distinct tissues were retrieved from the work of Clevenger et al. [31]. The FPKM normalized read count data of the SSCGs were log2-transformed and displayed in the form of heat maps via HemI [32].
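The transformation applied before plotting can be sketched as follows. The text states only that the FPKM values were log2-transformed; the pseudocount of 1, which keeps zero FPKM values defined, is an assumption of this example, as are the gene labels and numbers.

```python
import math


def log2_transform(fpkm_by_gene, pseudocount=1.0):
    """Return log2(FPKM + pseudocount) for every gene and tissue.

    The pseudocount is an assumption; the source does not state how
    zero values were handled before the log2 transformation.
    """
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in fpkm_by_gene.items()
    }


# Hypothetical FPKM values for two genes across four tissues.
raw = {
    "gene1": [0.0, 1.0, 3.0, 1023.0],
    "gene2": [7.0, 0.0, 0.0, 255.0],
}
transformed = log2_transform(raw)
print(transformed["gene1"])  # [0.0, 1.0, 2.0, 10.0]
```

The resulting matrix is what a heat map tool would color; compressing the range this way keeps a few extremely abundant storage-protein transcripts from washing out the rest of the map.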

Semiquantitative RT-PCR analysis in cultivated peanut

To further confirm tissue expression specificity in cultivated peanut, RNA was extracted from leaves, roots, stems, pegs, pod shells and seeds collected at the pod-maturing stage. Three independent RNA preparations were used for semiquantitative RT-PCR. Twenty-six amplification cycles were used to evaluate the differences among transcript levels. RT-PCR was performed using the peanut Actin gene as an internal control [33] and 2× Easy Taq PCR SuperMix (TransGen Biotech, Beijing, China). The PCR conditions were as follows: one initial denaturation step of 94°C for 3 min; 26 cycles of 94°C for 30 s, 58°C for 30 s and 72°C for 30 s; and one final extension step of 72°C for 10 min. The primers used for these experiments are listed in S3 Table.

Isolation of an SSP

Peanut genomic DNA was isolated from young leaves of the ‘Shitouqi’ cultivar using a DNAquick Plant System Kit (Tiangen, Beijing, China). Using AHSSP29-specific primers (S3 Table), we performed PCR with PrimeSTAR GXL DNA Polymerase (Takara, Dalian, China). The PCR products were separated by electrophoresis through a 1.5% agarose gel and purified using a gel extraction kit (TransGen Biotech, Beijing, China). All purified PCR products were subcloned into a pEASY-blunt simple vector (TransGen Biotech, Beijing, China). The DNA sequences were sequenced by the Shanghai Sangon Biotechnology Company (Shanghai, China).

The promoter fragment AHSSP29 of SSCG29 was excised from the pEASY-blunt simple vector with the restriction enzymes HindIII and BamHI (Thermo Fisher, MA, USA) and ligated into the corresponding restriction sites of the plant transformation vector pBI121 to produce an AHSSP29::β-glucuronidase (GUS) construct.

Generation of transgenic Arabidopsis plants

The recombinant binary plasmid was transferred to Agrobacterium tumefaciens strain GV3101, and kanamycin-resistant colonies were selected on medium containing 50 μg ml⁻¹ kanamycin. A selected colony was grown to stationary phase at 28°C, and the cells were concentrated by centrifugation and then resuspended in a dipping solution that comprised 5% sucrose, 0.03% Silwet L-77, and 10 mM MgCl2 [34]. The seeds were harvested and subsequently stored at room temperature. For screening, the seeds were sterilized in 75% (v/v) ethanol for 3 min and then 2.6% NaClO for 10 min, followed by several washes with sterile water. The transformants were screened on one-half-strength Murashige and Skoog (MS) medium that contained 50 μg ml⁻¹ kanamycin.

Transgene detection in the transgenic progeny of Arabidopsis and GUS histochemical staining

Kanamycin-resistant transgenic Arabidopsis plants were identified using GUS gene-specific primers (S3 Table). The positive transgenic plants were then selfed, after which homozygous T2 progeny were obtained.

GUS activity was assayed as described previously [35]. The samples were incubated with GUS staining buffer (0.1% Triton X-100 and 2 mM 5-bromo-4-chloro-3-indolyl-β-D-glucuronide (X-Gluc) cyclohexylammonium salt in 100 mM sodium phosphate buffer, pH 7.0) at 37°C overnight and then decolorized with 70% ethanol.

Results

Genome-wide mining of SSCGs via comparative transcriptome sequencing

To mine SSCGs, two samples of the cultivated peanut ‘Shitouqi’ (Sample I for seed samples and Sample II for nonseed samples) were used for transcriptome sequencing via an Illumina HiSeq™ 2500 system. Approximately 10 Gb of sequence data (approximately 76.79 million reads from Sample I and 78.93 million reads from Sample II, each 300 bp in length) were obtained; after filtering the adaptor sequences and low-quality reads, approximately 75.37 and 77.81 million reads, respectively, were used for transcriptome assembly (S1 Table). All of the reconstructed genes were aligned to the reference genomes of A. duranensis and A. ipaensis [1] and were subsequently annotated. A comparative transcript profile was established based on the FPKM values of the assembled transcripts. Three hundred thirty-seven SSCGs were ultimately identified and designated sequentially as SSCG1 to SSCG337 according to their FPKM value. The detailed information of these SSCGs, including their gene symbol, chromosomal location, FPKM value and putative function(s), is listed in Table 1 and S2 Table. GO annotation was performed using BLAST2GO, and the 337 SSCGs were categorized with particular GO annotations (S1 Fig, Table 2). As expected, these SSCGs were enriched in the metabolic process (120) and catalytic activity (108) GO terms, which suggested the presence of vigorous metabolic activity in the seed, in which fatty acids such as oleic acid are converted into linoleic acid by fatty acid desaturase [36]. To identify promoters that are strongly or specifically expressed in the seed, the 108 most abundant SSCGs were chosen for further analysis. With the decreasing cost of transcriptome sequencing and the release of the peanut ancestor genomes, comparative transcriptome sequencing has become an efficient approach for mining tissue-specific genes from peanut and other less studied crop species.

Table 1. List of 108 SSCGs identified from A. duranensis and A. ipaensis by comparative transcriptome sequencing.

https://doi.org/10.1371/journal.pone.0214025.t001

Table 2. GO classification of 337 SSCGs from A. duranensis and A. ipaensis.

https://doi.org/10.1371/journal.pone.0214025.t002

Characterization of the top 108 SSCGs from A. duranensis and A. ipaensis

SSPs are usually isolated from seed storage proteins and/or other proteins related to seed development, such as Brassica napus Napin, which was isolated from a 2S storage protein [37], indicating that gene characterization may reflect the specificity of its promoter. To predict the activity of their promoters, we therefore characterized the 108 SSCGs. Among the top 108 SSCGs, 96 had putative functions, and 12 had unknown functions. The 96 SSCGs were classified into 14 groups according to their annotations, and 54 of those SSCGs were involved in lipid metabolism and seed maturation or coded for nutrient reservoir proteins, allergens, and seed storage proteins (Fig 1C), which revealed that these top 108 SSCGs might perform functions within peanut seeds.

Fig 1. Characterization of the top 108 SSCGs from A. duranensis and A. ipaensis.

(A) Chromosomal distribution of the 108 SSCGs. The chromosome numbers are shown at the top of each chromosome (black bars). The names on the left of each chromosome correspond to the approximate location of each SSCG. (B) Numbers of SSCGs on each chromosome of A. duranensis and A. ipaensis. (C) Functional classification of the 108 SSCGs.

https://doi.org/10.1371/journal.pone.0214025.g001

As shown in Fig 1A, SSCGs were randomly dispersed across 10 chromosomes. In A. duranensis, chromosome A6 contained the greatest number of SSCGs (15), while chromosome A4 contained the fewest SSCGs (1). In A. ipaensis, 13 SSCGs were distributed on chromosome B6, whereas only 3 SSCGs were found on chromosomes B1 and B3 (Fig 1B). Several SSCGs were located on the chromosomes in clusters; for example, 6 SSCGs (SSCG2, SSCG3, SSCG7, SSCG36, SSCG42, SSCG75) were within the 1.26–1.8 cM region on chromosome A6 (Fig 1A); functional prediction revealed that these SSCGs encoded nutrient reservoir proteins (Table 1). SSCG14 and SSCG23, both of which coded for seed linoleate 9S-lipoxygenase, were located at the same locus of chromosome B8. These results suggested that these clustered genes might function together in coordination.

In this study, we identified 39 orthologous gene pairs between A. duranensis and A. ipaensis based on phylogenetic relationships (S2 Fig, Table 3), among which 36 orthologous gene pairs were found at the syntenic locus on the A. duranensis and A. ipaensis chromosomes (Fig 1A, Table 3). The orthologous genes from A. duranensis and A. ipaensis exhibited similar functions; for example, both SSCG63 (A9) and SSCG103 (B9) encode the AWPM-19-like family protein, and both SSCG87 (B9) and SSCG100 (A9) encode the papain family cysteine protease (Tables 1 and 3). Although the sequences of some orthologous gene pairs are highly similar, their promoter sequences were sometimes quite different. For example, SSCG43 (Araip.213GN) and SSCG94 (Aradu.440M4) had the same sequence, but their promoter sequences were quite different. Whether the promoters of orthologous gene pairs displayed the same specificity needs to be further determined. The location of 2 SSCGs in the A genome (SSCG21 and SSCG93) did not correspond to the same location of their orthologous genes in the B genome (SSCG12 and SSCG89). Interestingly, SSCG53, located on chromosome B7, had the same sequence as its orthologous gene, SSCG54, on chromosome B10.

Table 3. Orthologous gene pairs of the top 108 SSCGs from A. duranensis and A. ipaensis.

https://doi.org/10.1371/journal.pone.0214025.t003

Expression patterns of the top 108 SSCGs

To confirm the tissue expression specificity of the top 108 SSCGs, we first analyzed their expression profiles using the expression information provided by Clevenger et al. [31]. The heat map results showed that all of the top 108 genes were expressed in the seed; most were expressed only in the seed, whereas the rest were preferentially expressed in the seed (Fig 2). The expression patterns of the orthologous genes from the A and B genomes were similar. For example, SSCG12 and SSCG21 were highly expressed during the Pt6, Pt7, Pt8 and Pt10 seed stages but weakly expressed in other tissues, such as mainstem leaves, the reproductive shoot tip, nodule roots, stamens and the aerial gynophore tip. SSCG78 was expressed in the early seed development stage (SeedPt5-7), while SSCG106 was expressed in the late seed development stage (SeedPt7, 8, 10). Their promoters could be used to express genes at different seed development stages. Notably, SSCG1–12 were extremely highly expressed in the seeds; in particular, SSCG1 and SSCG6 were abundantly expressed during all five seed development stages (Fig 2). Functional prediction analysis revealed that these SSCGs encoded nutrient reservoir proteins or allergen proteins (Table 1), whose transcripts are considered to be expressed specifically in mature peanut seed [38,39].

Fig 2. Expression profiles of the top 108 SSCGs in 20 different tissues of A. duranensis and A. ipaensis.

The FPKM data of 20 distinct tissues for the top 108 SSCGs were retrieved from the work of Clevenger et al. [31]. The FPKM value of each gene was log2-transformed and displayed in the form of heat maps by HemI. The color scale in the lower right represents the relative expression level: green represents a low level, and red indicates a high level. Twenty different tissues are shown on top of the heat map. The SSCGs are listed on the right of the heat map.

https://doi.org/10.1371/journal.pone.0214025.g002

We further examined the tissue expression specificity of the SSCGs in cultivated peanut via semiquantitative RT-PCR. Because the orthologous gene pairs had similar sequences, each pair was treated as a single gene, and primers were designed based on the sequence shared by both members. As shown in Fig 3, similar to the heat map results, most of these 108 SSCGs were expressed specifically and/or preferentially in the seed. Ninety-four of the 108 SSCGs (87%) were expressed exclusively in the seed. Only a few SSCGs (SSCG13, 25, 41, 44, 51, 52, 58, 70, 75, 83, 84, 86, 88, 98) were also weakly expressed in other tissues, such as the roots, stems, pegs, pod shells and leaves.

Fig 3. Semiquantitative RT-PCR analysis of the top 108 SSCGs in different tissues of cultivated peanut.

Orthologous genes were considered a single gene. Expression patterns were detected in the roots (Rt), stems (St), leaves (Lf), pegs (Pg), pod shells (Ps) and seeds (Sd) using the Actin gene as an internal control.

https://doi.org/10.1371/journal.pone.0214025.g003

Overall, based on the expression pattern analysis above, the SSCGs described in this study are potential resources for seed-specific and/or preferential promoter cloning.

Cis-acting elements in the promoter regions of the top 108 SSCGs

Gene expression specificity is mediated by cis-elements in the promoter region [40,41]. To identify the regulatory cis-elements in the promoter regions of the SSCGs, we extracted the 2500 bp promoter sequence upstream of the start codon of each of the top 108 SSCGs. The results showed that 92 promoters contained RY REPEAT motifs and 33 promoters contained GCN4 motifs. Thirty-seven promoters contained more than three RY REPEAT motifs, and there were five motifs in SSCG28 (Aradu.DWL7L) and SSCG99 (Aradu.UJ6Z9) and six in SSCG74 (Aradu.9S6MI). Twenty-nine promoter sequences contained both motifs (Table 4). The RY REPEAT (CATGCA) [42] and GCN4 (TGAGTCA) [43,44] motifs are commonly located within seed- and/or embryo-specific promoter sequences. These results implied that most of the promoters of the top 108 SSCGs were seed specific.
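The motif tally described above can be approximated with a short Python sketch. It uses the two core sequences given in the text, RY REPEAT (CATGCA) and GCN4 (TGAGTCA), and counts exact matches on both strands. Scanning both strands and counting overlapping hits are assumptions of this example, the toy promoter is invented, and the New PLACE server may score matches differently.

```python
MOTIFS = {"RY REPEAT": "CATGCA", "GCN4": "TGAGTCA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_motif(seq, motif):
    """Count exact, possibly overlapping occurrences of motif on both strands."""
    seq = seq.upper()
    # A palindromic motif would match the same site twice, so use a set.
    targets = {motif, reverse_complement(motif)}
    count = 0
    for target in targets:
        start = seq.find(target)
        while start != -1:
            count += 1
            start = seq.find(target, start + 1)
    return count


def count_all(seq):
    """Return a dict of motif name -> number of hits in seq."""
    return {name: count_motif(seq, core) for name, core in MOTIFS.items()}


# Hypothetical 21 bp stretch: CATGCATG holds one RY core on each strand,
# followed by a single forward-strand GCN4 core.
toy_promoter = "AACATGCATGTTTGAGTCAAA"
counts = count_all(toy_promoter)
print(counts)  # {'RY REPEAT': 2, 'GCN4': 1}
```

Applied to each 2500 bp upstream sequence, a tally of this kind would yield per-promoter counts comparable in form to those summarized in Table 4.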

Table 4. Numbers of two elements, RY REPEAT and GCN4 elements, in the promoter region of the top 108 SSCGs from A. duranensis and A. ipaensis.

https://doi.org/10.1371/journal.pone.0214025.t004

Characterization of an SSP

To verify promoter tissue specificity, we isolated a 2771 bp promoter fragment (Arachis hypogaea Seed-Specific Promoter 29, AHSSP29) from the cultivated peanut cultivar ‘Shitouqi’ according to the reference sequence of SSCG29 (Aradu.YC8MH) in its ancestor A. duranensis. SSCG29 encodes a vicilin-like seed storage protein. Several cis-acting elements that commonly exist in SSPs, including one GCN4 motif [43,44], two RY REPEATs [42], and three 2SSEEDPROTBANAPAs [45], were detected in the AHSSP29 sequence (Table 5). The CaMV 35S promoter in a pBI121 vector was then replaced with AHSSP29 to produce an AHSSP29::GUS construct, which was subsequently transformed into Arabidopsis. GUS histochemical assays revealed GUS staining in all parts of the seed (Fig 4A–4C), with the exception of the seed testa. GUS staining was hard to observe in seed wrapped in a testa (Fig 4A), while GUS activity was clearly visible in the germinating seed that lacked a testa (Fig 4B and 4C). Definitive staining was also observed in the cotyledons and hypocotyls of the seedlings (Fig 4D), which are components of the seed. No GUS activity was detected in the leaves, stems, flowers, roots or siliques at any time during the plant life cycle (Fig 4E–4H). Nontransformed Arabidopsis plants did not display GUS activity in their mature seeds or in any other parts of the plants. These results suggested that the AHSSP29 promoter was an SSP.

Fig 4. Functional characterization of the putative promoter AHSSP29 in transgenic Arabidopsis.

(A) Mature seed wrapped in a testa. (B-C) Germinating seed without a testa. (D) Young seedlings with two true leaves. (E) Adult plant. (F) Stem and flower of adult plants. (G-H) Siliques.

https://doi.org/10.1371/journal.pone.0214025.g004

Table 5. Putative cis-acting elements in the AHSSP29 promoter sequence.

https://doi.org/10.1371/journal.pone.0214025.t005

Discussion

SSPs are valuable tools for the genetic engineering of seed, especially for seed bioreactor research. Peanut seeds are ideal bioreactors for the production of foreign recombinant proteins and other nutrient metabolites. However, only a few seed-specific and/or seed-preferential promoters have been identified from peanut [17–19,46]. Expressing multiple foreign genes using the same promoters is ill advised [14–16]. Therefore, additional SSPs are urgently needed. In this study, we established an effective method for the genome-scale mining of SSCGs via comparative transcriptome sequencing of a mixture of nonseed tissue and seed tissue. A total of 337 SSCGs were identified, and 108 SSCGs in A. duranensis and A. ipaensis were further characterized. At least 94 SSCGs were confirmed via semiquantitative RT-PCR to be expressed specifically in the seed in cultivated peanut, and the rest were preferentially expressed in the seed. This study provided a valuable resource for seed-specific and/or seed-preferential promoter cloning.

Among the 108 identified SSCGs, most functioned in relation to seed development or coded for allergen proteins or storage proteins (Fig 1C, Table 1). For example, SSCG1-7 and SSCG9, which encoded allergen proteins, were homologous genes and were extremely highly expressed according to their FPKM values (Table 1), heat map results (Fig 2) and semiquantitative RT-PCR analysis (Fig 3). Peanut allergen proteins were reported to be expressed exclusively in the seed [39] and accounted for a considerable amount of the total seed protein in peanut [47]. This finding is in accordance with the abundant expression of SSCG1-7 and SSCG9 in the peanut seed. These results indicated that these SSCGs were expressed specifically in the seed, and these SSCGs that were most abundantly expressed were the focus of our subsequent promoter cloning.

Studies have shown that several cis-acting elements in promoter sequences are responsible for mediating gene expression specificity. For example, the cis-acting elements RY REPEAT and GCN4 are conserved among many SSPs [42,43]. These cis-acting elements were also present throughout most of the SSCGs in this study, which implied that the promoters in most of the SSCGs might drive gene expression in a seed-specific manner. Several promoters of these SSCGs have been characterized as SSPs. For example, the promoter of an SSCG5 paralogous gene, which encodes an allergen protein, was isolated and characterized as an SSP [19]. Together with the novel SSP AHSSP29 of SSCG29 (Aradu.YC8MH) identified in this study, which contained 2 RY REPEAT and 1 GCN4 elements, the results indicate that the SSCG mining strategy in this study seemed effective and accurate. Once these promoters are isolated and characterized, they could be widely used for allergen reduction via gene editing technologies and for other research on seed quality improvement.

Geng et al. [48] introduced a method for tissue-specific promoter cloning by comparing expression levels among three tissues: leaves, roots, and seeds. A total of 316 seed-specific candidate transcript assembly contigs (TACs) were identified. In addition, 64.6% of select TACs were expressed exclusively in the seed and not in the leaves, stems, or roots [48]. However, to date, no SSPs have been identified based on these data, which may be attributed to insufficient transcriptome data and the lack of reference genome information. In our study, only two samples were chosen for transcriptome sequencing: seeds from different development stages and a mixture of nonseed tissue from six tissues (including roots, stems, leaves, flowers, pegs, and pod shells). It is much less expensive to sequence the transcriptome of nonseed tissue mixtures than to sequence each individual tissue. Moreover, it becomes simpler and more accurate to screen SSCGs by comparing two samples rather than by comparing numerous samples. Consequently, 337 SSCGs were identified, and 87% of the top 108 SSCGs were expressed exclusively in the seed and not in the five measured tissues (roots, stems, leaves, pegs, and pod shells). These results indicated that additional tissues were necessary as part of the nonseed sample to compare gene expression differences with seed samples. This SSCG information, such as the gene symbols, can be obtained conveniently from Table 1 and S2 Table. Researchers could easily download SSCGs of interest from the PeanutBase website according to this information. With the decreasing transcriptome sequencing cost and the release of the peanut genome, mining tissue-specific genes from peanut via comparative transcriptome sequencing has become a robust approach. For example, contamination with aflatoxin, which is produced in infected peanut seeds by Aspergillus flavus, is one of the major problems in peanut production. Given that peanut pericarps are barriers against A. flavus, pericarp-specific promoters are a good choice for expressing A. flavus-resistant genes specifically in the pericarp to prevent aflatoxin contamination. Pericarp-specific promoters could be identified by the strategy presented in this study.

Conclusions

We identified 337 SSCGs by comparative RNA sequencing (RNA-seq) between seed and nonseed tissues. The top 108 SSCGs, according to their FPKM, were characterized, among which 94 were expressed specifically in the seed, and 14 were preferentially expressed in the seed. In addition, a novel SSP, AHSSP29, was functionally characterized. The strategy presented in this study could facilitate the future exploration of tissue-specific promoters in other crop species. Additionally, the SSCGs identified in this work could be widely applied for SSP cloning by other researchers.

Supporting information

S1 Fig. GO annotation of 337 SSCGs identified from A. duranensis and A. ipaensis.

The Y-axis represents the number of genes in a category.

https://doi.org/10.1371/journal.pone.0214025.s001

(JPG)

S2 Fig. Phylogenetic relationships of the 108 SSCGs.

The phylogenetic tree was constructed with MEGA 6.0 using the NJ method with 1000 bootstrap replicates based on a multiple alignment of 108 SSCGs from A. duranensis and A. ipaensis.

https://doi.org/10.1371/journal.pone.0214025.s002

(JPG)

S1 Table. Summary of the sequence data from Illumina sequencing.

https://doi.org/10.1371/journal.pone.0214025.s003

(DOCX)

S2 Table. List of SSCGs (SSCG109-337) identified from A. duranensis and A. ipaensis by comparative transcriptome sequencing.

https://doi.org/10.1371/journal.pone.0214025.s004

(DOCX)

References

  1. Bertioli DJ, Cannon SB, Froenicke L, Huang G, Farmer AD, Cannon EK, et al. The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet. 2016;48(4):438–46. pmid:26901068
  2. Xu N, Wang R, Zhao L, Zhang C, Li Z, Lei Z, et al. The Arabidopsis NRG2 Protein Mediates Nitrate Signaling and Interacts with and Regulates Key Nitrate Regulators. Plant Cell. 2016;28(2):485–504. pmid:26744214
  3. Pessina S, Angeli D, Martens S, Visser RG, Bai Y, Salamini F, et al. The knock-down of the expression of MdMLO19 reduces susceptibility to powdery mildew (Podosphaera leucotricha) in apple (Malus domestica). Plant Biotechnol J. 2016;14(10):2033–44. pmid:26997489
  4. Chang Y, Shen E, Wen L, Yu J, Zhu D, Zhao Q. Seed-Specific Expression of the Arabidopsis AtMAP18 Gene Increases both Lysine and Total Protein Content in Maize. PLoS One. 2015;10(11):e0142952. pmid:26580206
  5. Wei ZY, Zhang YY, Wang YP, Fan MX, Zhong XF, Xu N, et al. Production of Bioactive Recombinant Bovine Chymosin in Tobacco Plants. Int J Mol Sci. 2016;17(5). pii: E624. pmid:27136529
  6. Aluru M, Xu Y, Guo R, Wang Z, Li S, White W, et al. Generation of transgenic maize with enhanced provitamin A content. J Exp Bot. 2008;59(13):3551–62. pmid:18723758
  7. Liu X, Yang W, Mu B, Li S, Li Y, Zhou X, et al. Engineering of "Purple Embryo Maize" with a multigene expression system derived from a bidirectional promoter and self-cleaving 2A peptides. Plant Biotechnol J. 2018;16(6):1107–1109. pmid:29337409
  8. Paine J, Shipton C, Chaggar S, Howells R, Kennedy M, Vernon G, et al. Improving the nutritional value of Golden Rice through increased pro-vitamin A content. Nat Biotechnol. 2005;23(4):482–7. pmid:15793573
  9. Zhu Q, Yu S, Zeng D, Liu H, Wang H, Yang Z, et al. Development of "Purple Endosperm Rice" by Engineering Anthocyanin Biosynthesis in the Endosperm with a High-Efficiency Transgene Stacking System. Mol Plant. 2017;10(7):918–929. pmid:28666688
  10. Napier JA, Olsen RE, Tocher DR. Update on GM canola crops as novel sources of omega-3 fish oils. Plant Biotechnol J. 2018. pmid:30485634
  11. Hsieh TH, Lee JT, Charng YY, Chan MT. Tomato plants ectopically expressing Arabidopsis CBF1 show enhanced resistance to water deficit stress. Plant Physiol. 2002;130:618–626. pmid:12376629
  12. Zhong R, Demura T, Ye ZH. SND1, a NAC domain transcription factor, is a key regulator of secondary wall synthesis in fibers of Arabidopsis. Plant Cell. 2006;18:3158–3170. pmid:17114348
  13. Hood EE, Bailey MR, Beifuss K, Magallanes-Lundback M, Horn ME, Callaway E, et al. Criteria for high-level expression of a fungal laccase gene in transgenic maize. Plant Biotechnol J. 2003;1(2):129–40. pmid:17147750
  14. 14. De WC, Van HH, De BS, Angenon G, De JG, Depicker A. Plants as bioreactors for protein production: avoiding the problem of transgene silencing. Plant Mol Biol. 2000;43(2–3):347–59. pmid:10999415
  15. 15. Naqvi S, Farré G, Sanahuja G, Capell T, Zhu C, Christou P. When more is better: multigene engineering in plants. Trends Plant Sci. 2010;15(1):48–56. pmid:19853493
  16. 16. Abbadi A, Domergue F, Bauer J, Napier JA, Welti R, Zahringer U, et al. Biosynthesis of very-long-chain polyunsaturated fatty acids in transgenic oilseeds: constraints on their accumulation. Plant Cell. 16: 2734–2748. pmid:15377762
  17. 17. Yang P, Zhang F, Luo X, Zhou Y, Xie J. Histone deacetylation modification participates in the repression of peanut (Arachis hypogaea L.) seed storage protein gene Ara h 2.02 during germination. Plant Biol. 2015;17(2):522–7. pmid:25262939
  18. 18. Sunkara S, Bhatnagar-Mathur P, Sharma KK. Isolation and functional characterization of a novel seed-specific promoter region from peanut. Appl Biochem Biotechnol. 2014;172(1):325–39. pmid:24078220
  19. 19. Fu G, Zhong Y, Li C, Yin L, Lin X, Liao B, et al. Epigenetic regulation of peanut allergen gene Ara h 3 in developing embryos. Planta. 2010;231(5):1049–60. pmid:20157727
  20. 20. Zimmermann CR, Orr WC, Leclerc RF, Barnard EC, Timberlake WE. Molecular cloning and selection of genes regulated in Aspergillus development. Cell. 21: 709–715. pmid:6449291
  21. 21. Abid G, Sassi K, Muhovski Y, Jacquemin JM, Mingeot D, Tarchoun N, et al. Identification and Analysis of Differentially Expressed Genes During Seed Development Using Suppression Subtractive Hybridization (SSH) in Phaseolus vulgaris. Plant Mol Biol Rep. 2012;30:719–730.
  22. 22. Park JS, Kim IS, Cho MS, Park S, Sang GP. Identification of differentially expressed genes involved in spine formation on seeds of Daucus carota L. (carrot), using annealing control primer (ACP) system. J Plant Biol. 2006;49: 133–140.
  23. 23. Liu X, Tian J, Zhou X, Chen R, Wang L, Zhang C, et al. Identification and characterization of promoters specifically and strongly expressed in maize embryos. Plant Biotechnol J. 2014;12(9):1286–96. pmid:25052028
  24. 24. Nie DM, Ouyang YD, Wang X, Zhou W, Hu CG, Yao JL. Genome-wide analysis of endosperm-specific genes in rice. Gene. 2013;530(2):236–47. pmid:23948082
  25. 25. Li MY, Wang F, Jiang Q, Ma J, Xiong AS. Identification of SSRs and differentially expressed genes in two cultivars of celery (Apium graveolens L.) by deep transcriptome sequencing. Hortic Res. 2014;1:10. pmid:26504532
  26. 26. Buchananwollaston V, Page T, Harrison E, Breeze E, Lim PO, Nam HG, et al. Comparative transcriptome analysis reveals significant differences in gene expression and signalling pathways between developmental and dark/starvation-induced senescence in Arabidopsis. Plant J. 2005;42(4):567–85. pmid:15860015
  27. 27. Zhao X, Li C, Wan S, Zhang T, Yan C, Shan S. Transcriptomic analysis and discovery of genes in the response of Arachis hypogaea to drought stress. Mol Biol Rep. 2018;45(2):119–131. pmid:29330721
  28. 28. Ezura K, Jiseong K, Mori K, Suzuki Y, Kuhara S, Ariizumi T, et al. Genome-wide identification of pistil-specific genes expressed during fruit set initiation in tomato (Solanum lycopersicum). PLoS One. 2017;12(7):e0180003. pmid:28683065
  29. 29. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg S. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. pmid:23618408
  30. 30. Higo K, Ugawa Y, Iwamoto M, Korenaga T. Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res. 1999;27(1):297–300. pmid:9847208
  31. 31. Clevenger J, Chu Y, Scheffler B, Oziasakins P. A Developmental Transcriptome Map for Allotetraploid Arachis hypogaea. Front Plant Sci. 2016;7:1446. eCollection 2016. pmid:27746793
  32. 32. Deng W, Wang Y, Liu Z, Cheng H, Xue Y. HemI: A Toolkit for Illustrating Heatmaps. PLoS One. 2014;9(11):e111988. pmid:25372567
  33. 33. Chi X, Hu R, Yang Q, Zhang X, Pan L, Chen N, et al. Validation of reference genes for gene expression studies in peanut by quantitative real-time RT-PCR. Mol Genet Genomics. 2012;287(2):167–76. pmid:22203160
  34. 34. Clough SJ, Bent AF. Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 1998;16: 735–743. pmid:10069079
  35. 35. Jefferson RA, Kavanagh TA, Bevan MW. GUS fusions: beta-glucuronidase as a sensitive and versatile gene fusion marker in higher plants. EMBO J. 1987; 6: 3901–3907. pmid:3327686
  36. 36. Chi X, Yang Q, Pan L, Chen M, He Y, Yang Z, et al. Isolation and characterization of fatty acid desaturase genes from peanut (Arachis hypogaea L.). Plant Cell Rep. 2011;30(8):1393–404. pmid:21409552
  37. 37. Stålberg K, Ellerström M, Josefsson LG, Rask L. Deletion analysis of a 2S seed storage protein promoter of Brassica napus in transgenic tobacco. Plant Mol Biol. 1993;23(4):671–83. pmid:8251622
  38. 38. Jiang S, Wang S, Sun Y, Zhou Z, Wang G. Molecular characterization of major allergens Ara h 1, 2, 3 in peanut seed. Plant Cell Rep. 2011;1135–43. pmid:21305299
  39. 39. Kang IH, Srivastava P, Ozias-Akins P, Gallo M. Temporal and Spatial Expression of the Major Allergens in Developing and Germinating Peanut Seed. Plant Physiol. 144: 836–845. pmid:17468222
  40. 40. Zhang Y, Sun T, Liu S, Dong L, Liu C, Song W, et al. MYC cis-Elements in PsMPT Promoter Is Involved in Chilling Response of Paeonia suffruticosa. PLoS One. 2016;11(5):e0155780. pmid:27228117
  41. 41. Li N, Chen J, Yang F, Wei S, Kong L, Ding X, et al. Identification of two novel Rhizoctonia solani-inducible cis-acting elements in the promoter of the maize gene, GRMZM2G315431. Sci Rep. 2017; 7:42059. pmid:28163300
  42. 42. Ezcurra I, Ellerström M, Wycliffe P, Stålberg K, Rask L. Interaction between composite elements in the napA promoter: both the B-box ABA-responsive complex and the RY/G complex are necessary for seed-specific expression. Plant Mol Biol. 1999;40(4):699–709. pmid:10480393
  43. 43. Onodera Y, Suzuki A, Wu CY, Washida H, Takaiwa F. A rice functional transcriptional activator, RISBZ1, responsible for endosperm-specific expression of storage protein genes through GCN4 motif. J Biol Chem. 2001;276(17):14139–52. pmid:11133985
  44. 44. Washida H, Wu CY, Suzuki A, Yamanouchi U, Akihama T, Harada K, et al. Identification of cis-regulatory elements required for endosperm expression of the rice storage protein glutelin gene GluB-1. Plant Mol Biol. 1999;40(1):1–12. pmid:10394940
  45. 45. Stålberg K, Ellerstöm M, Ezcurra I, Ablov S, Rask L. Disruption of an overlapping E-box/ABRE motif abolished high transcription of the napA storage-protein promoter in transgenic Brassica napus seeds. Planta. 1996;199(4):515–9. pmid:8818291
  46. 46. Tang G, Xu P, Liu W, Liu Z, Shan L. Cloning and Characterization of 5' Flanking Regulatory Sequences of AhLEC1B Gene from Arachis Hypogaea L. PLoS One. 2015;10(10):e0139213. pmid:26426444
  47. 47. Knoll JE, Ramos ML, Zeng Y, Holbrook CC, Chow M, Chen S, et al. TILLING for allergen reduction and improvement of quality traits in peanut (Arachis hypogaea L.). BMC Plant Biol. 2011;11:81. pmid:21569438
  48. 48. Geng L, Duan X, Liang C, Shu C, Song F, Zhang J. Mining Tissue-specific Contigs from Peanut (Arachis hypogaea L.) for Promoter Cloning by Deep Transcriptome Sequencing. Plant Cell Physiol. 2014;55(10):1793–801. pmid:25231965