De novo transcriptome assembly of the Chinese pearl barley, adlay, by full-length isoform and short-read RNA sequencing

Adlay (Coix lacryma-jobi) is a tropical grass that has long been used in traditional Chinese medicine and is known for its nutritional benefits. Recent studies have shown that vitamin E compounds in adlay protect against chronic diseases such as cancer and heart disease. However, the molecular basis of adlay's health benefits remains unknown. Here, we generated adlay gene sets by de novo transcriptome assembly using long-read isoform sequencing (Iso-Seq) and short-read RNA sequencing (RNA-Seq). The gene sets obtained from Iso-Seq and RNA-Seq contained 31,177 genes and 57,901 genes, respectively. We confirmed the validity of the assembled gene sets by experimentally analyzing the levels of prolamin and vitamin E biosynthesis-associated proteins in adlay plant tissues and seeds. We compared the screened adlay genes with known gene families from closely related plant species, such as rice, sorghum and maize. We also identified tissue-specific genes from the adlay leaf, root, and young and mature seed, and experimentally validated the differential expression of 12 randomly selected genes. Our study of the adlay transcriptome will provide a valuable resource for genetic studies that can enhance adlay breeding programs in the future.

Gene expression studies help select significant candidate genes for transcriptome profiling to gain a better understanding of cellular pathways and metabolism in plants [42]. Genome-wide transcriptome studies for adlay are currently lacking. In this study, we combined Illumina short-read sequencing with single-molecule real-time (SMRT) sequencing to generate a complete and full-length transcriptome of adlay. Transcriptome analysis revealed several tissue-specific and stage-specific gene expression patterns. Comparative analysis of gene families between species revealed that adlay was closest in relation to sorghum. Our comprehensive transcriptome data set provides a valuable resource for further genetic research on adlay.

Plant material and RNA preparation
Coix lacryma-jobi L. (voucher # IT101241, var. ma-yuen Stapf) was obtained from the National Institute of Agricultural Sciences (NAS, Wanju, Korea). Ninety-eight-day-old leaves, roots, and young seeds and 159-day-old mature seeds were harvested from adlay plants and stored at -80˚C until used. RNA-sequencing (RNA-Seq) and isoform sequencing (Iso-Seq) were performed on leaf, root, young seed, and mature seed tissues (S1 Fig). Three biological replicates were used for RNA-seq. Total RNA was extracted from leaf, root, and the two seed stages (young and mature) using RNeasy Plant Mini kit (Qiagen, Inc., USA). RNA purity was determined by assaying 1 μl total RNA on a NanoDrop8000 Spectrophotometer (ThermoFisher, USA). Total RNA integrity was checked using an Agilent Technologies 2100 Bioanalyzer with minimum integrity option.

Long-read sequencing for Iso-Seq
To produce full-length isoforms, long-read sequencing was performed using Pacific Biosciences (PacBio, USA) SMRT sequencing. Libraries were prepared from cDNAs, and cycle optimization was performed to determine the optimal number of cycles for large-scale PCR. We prepared three fractions of cDNAs (1-2 Kbp, 2-3 Kbp, and 3-6 Kbp) using the BluePippin size selection system (Sage Science, USA). The SMRTbell library was constructed using the SMRTbell™ Template Prep Kit (PN 100-259-100; PacBio). The DNA/Polymerase Binding Kit P6 (PacBio) was used for DNA synthesis after the sequencing primer annealed to the SMRTbell template. MagBead Kit (PacBio) was used to attach the cDNA library to MagBeads before sequencing. MagBead-bound cDNA complexes result in increased number of reads per SMRT cell. The polymerase-SMRTbell-adaptor complex was then loaded into zero-mode waveguides (ZMWs). The SMRTbell library was sequenced using 8 SMRT cells using C4 chemistry (DNA sequencing Reagent 4.0; PacBio) and 1 × 240-minute movies were captured for each SMRT cell using the PacBio RS II sequencing platform.
with random hexamers were reverse transcribed into first-strand cDNAs using reverse transcriptase, random hexamer primers, and dUTPs in place of dTTPs. A single A-base was added to the cDNA fragments and the adapter was subsequently ligated to the cDNA. The products were purified and enriched by polymerase chain reaction (PCR) to create the final strand-specific cDNA library. The quality of the amplified libraries was verified by capillary electrophoresis using Bioanalyzer 2100 (Agilent, Germany). Quantitative PCR (qPCR) was performed using SYBR Green PCR Master Mix (Applied Biosystems, ThermoFisher, USA). We pooled equimolar amounts of the index-tagged libraries. The cBot2 system (Illumina Inc., USA) was used for automated cluster generation in the flow cell. The flow cell was loaded on the HiSeq 2500 sequencing system (Illumina Inc.) and cDNA was sequenced at a read length of 2 × 100 bp.

De novo transcriptome assembly
All transcriptome data were processed using the next generation sequencing quality control toolkit [43] to remove uninformative sequences such as adaptors, primers, and low quality sequences (Phred quality score of less than 20). Raw data were processed to remove ribosomal RNA using riboPicker v0.4.3 [44]. The processed reads were then assembled using Trinity [45]. Assembly statistics were calculated using in-house Perl scripts. Assembled transcripts were clustered using CD-HIT-EST v4.6.1 [46] to reduce sequence redundancy. Sequence identity threshold and alignment coverage (for shorter sequences) were both set at 90% to generate clusters. Such clustered transcripts were defined as reference transcripts in this work. For functional annotation, RNA-Seq unigenes were screened using a fragments per kilobase of transcript per million mapped reads (FPKM) criterion of ≥ 1 for each unigene, and nonredundant Iso-Seq unigenes were screened using different libraries (<2, 2-3, and >3 kb) with considerably larger average lengths (≥340 bp).
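The FPKM screen described above can be illustrated with a minimal Python sketch. It uses the standard FPKM definition and assumes the criterion means an FPKM of at least 1. The function names, unigene names, and counts are hypothetical and are not taken from the study; the actual quantification in this work was done with RSEM.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def filter_by_fpkm(fpkm_by_unigene, threshold=1.0):
    """Keep unigenes whose FPKM is at least `threshold` (assumed cutoff of 1)."""
    return {name: value for name, value in fpkm_by_unigene.items() if value >= threshold}


# Hypothetical input: (fragment count, transcript length in bp) per unigene,
# with 20 million mapped fragments in the library.
counts = {"unigene_a": (400, 2000), "unigene_b": (10, 1000)}
total_mapped = 20_000_000

values = {name: fpkm(n, length, total_mapped) for name, (n, length) in counts.items()}
kept = filter_by_fpkm(values)
```

In this toy input, only the more highly expressed unigene passes the screen.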

Functional annotation
All the assembled genes were annotated by the Basic Local Alignment Search Tool (BLAST) program [47] using databases such as NCBI non-redundant protein database (NR, https://www.ncbi.nlm.nih.gov/), UniProtKB/Swiss-Prot (Swissprot, https://web.expasy.org/), Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/), and Gene Ontology Consortium (GO, http://www.geneontology.org/) with an Expect Value (E-value) cutoff of 10^-5. The best-aligned sequences were selected to annotate genes. In the event of alignment conflicts between databases, Swissprot alignments were preferentially selected. For functional categorization, GO terms were produced by the Blast2GO program [48] with an E-value threshold of 10^-5, and GO terms were classified into three major categories using the Web Gene Ontology Annotation Plot (WEGO) [49].

Gene expression analysis
Expressed genes from each tissue were aligned with the adlay transcriptome assembly using Bowtie2 [50]. Genes that aligned with non-redundant assembled transcript sequences were quantified as FPKMs (at 90% sequence similarity by CD-HIT-EST). Quantification was performed using RNA-Seq by Expectation Maximization v1.2.25 (RSEM) [51]. Differential expression analysis was performed using the DESeq2 package (Bioconductor) [52]. Differentially expressed genes during seed development (young and mature seed stages) were screened from three biological replicate experiments based on log fold change values > 1 (p values < 0.001). For specificity classification, each tissue/organ was classified into two categories based on the FPKM values. A ten-fold increase in FPKM values was used as a criterion for tissue specificity and a five-fold increase was used as a criterion for tissue enrichment. The ClueGo plug-in tool [53] in Cytoscape v3.3.0 was used to identify over-representation of GO categories, such as biological processes. We also screened for potential transcription factor families in the Plant TF Database [54] (PlantTFDB 4.0; http://planttfdb.cbi.pku.edu) using BLASTX with an E-value threshold of 10^-5.
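The ten-fold and five-fold criteria can be sketched as a small classifier. The text does not state which value the fold increase is measured against, so this sketch assumes it is the next-highest tissue. That baseline, the function name, and the example FPKM values are all assumptions for illustration, not the authors' implementation.

```python
def classify_tissue_expression(fpkm_by_tissue, specific_fold=10.0, enriched_fold=5.0):
    """Return (top_tissue, label); label is 'specific', 'enriched', or None.

    Assumption: fold change is the top tissue's FPKM divided by the
    next-highest tissue's FPKM. At least two tissues are required.
    """
    ranked = sorted(fpkm_by_tissue.items(), key=lambda item: item[1], reverse=True)
    (top_tissue, top_value), (_, runner_up) = ranked[0], ranked[1]
    if top_value <= 0:
        return top_tissue, None
    # A zero runner-up means expression was detected in a single tissue only.
    fold = float("inf") if runner_up == 0 else top_value / runner_up
    if fold >= specific_fold:
        return top_tissue, "specific"
    if fold >= enriched_fold:
        return top_tissue, "enriched"
    return top_tissue, None


# Hypothetical FPKM profiles across the four sampled tissues.
leaf_gene = {"leaf": 120.0, "root": 8.0, "young_seed": 3.0, "mature_seed": 1.0}
root_gene = {"leaf": 10.0, "root": 60.0, "young_seed": 2.0, "mature_seed": 2.0}
broad_gene = {"leaf": 20.0, "root": 10.0, "young_seed": 15.0, "mature_seed": 12.0}

leaf_call = classify_tissue_expression(leaf_gene)
root_call = classify_tissue_expression(root_gene)
broad_call = classify_tissue_expression(broad_gene)
```

A stricter variant could compare the top tissue against the mean of the remaining tissues; the choice of baseline changes which genes are called tissue-specific.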

Gene family identification
OrthoMCL [55] (version 2.0.3) was used to identify orthologous and paralogous gene clusters of adlay with three other members of the Poaceae family, rice (O. sativa, NCBI accession number GCF_001433935), sorghum (S. bicolor, GCF_000003195), and maize (Z. mays, GCF_000005005). Protein sequences that were shorter than 10 amino acids or contained >20% stop codons were removed. All-versus-all analysis was performed with BLASTP version 2.2.25+ (E-value threshold 10^-5). The protein pairs obtained from BLASTP analysis were processed using the OrthoMCL program (http://orthomcl.org/orthomcl/).
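The protein pre-filter described above (minimum length of 10 amino acids, at most 20% stop codons) can be sketched in a few lines of Python. This is an illustrative stand-in rather than the OrthoMCL code itself. It assumes stop codons appear as `*` in the translated sequences, and the function name and example sequences are hypothetical.

```python
def passes_length_and_stop_filter(protein, min_length=10, max_stop_fraction=0.20):
    """Return True if the protein is long enough and has few enough stop codons.

    Assumption: stop codons are written as '*' in the protein sequence.
    """
    if not protein or len(protein) < min_length:
        return False
    stop_fraction = protein.count("*") / len(protein)
    return stop_fraction <= max_stop_fraction


# Hypothetical sequences for illustration only.
proteins = {
    "ok_protein": "MKTAYIAKQR",   # 10 residues, no stops
    "too_short": "MKT",           # fewer than 10 residues
    "many_stops": "MK**A**K*L",   # 5 of 10 positions are stops
}
retained = [name for name, seq in proteins.items() if passes_length_and_stop_filter(seq)]
```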

Quantitative real-time PCR analysis
The quality of RNA isolated from C. lacryma-jobi was checked on ethidium bromide-stained agarose gels, and RNA concentration was calculated based on the measured optical density of the samples at 260 and 280 nm using a DropSense96C Spectrophotometer (Trinean, Belgium). One microgram of total RNA was used for cDNA synthesis, which was performed using a SuperScript™ III first strand RT-PCR kit (Invitrogen, USA) with an oligo(dT)20 primer. Quantitative real-time PCR (qRT-PCR) was performed on synthesized cDNA using gene-specific primers (S1 Table). PCR was optimized and performed using the Roche LightCycler 480 II and SYBR Green Real-Time PCR Master Mix (Bio-Rad, Inc., CA). Reaction conditions included an initial denaturation at 95˚C for 30 s, followed by 40 cycles of denaturation at 95˚C for 10 s and annealing and extension at 55˚C for 15 s. The relative expression of specific genes was quantified using the 2^-ΔΔCt method [56].
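As a worked illustration of the 2^-ΔΔCt calculation, the sketch below computes relative expression from four Ct values. The reference gene and calibrator sample are not named in this section, so the argument names and the Ct values are hypothetical. The method itself assumes amplification efficiencies close to 100% for both target and reference genes.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed per sample
    ddCt = dCt(sample of interest) - dCt(calibrator sample)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target gene crosses threshold three cycles
# earlier (relative to the reference gene) in the sample than in the calibrator.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_calibrator=25.0, ct_ref_calibrator=18.0,
)
```

Here ΔCt is 4 for the sample and 7 for the calibrator, so ΔΔCt is -3 and the target is estimated to be eight-fold more abundant in the sample.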

Verification and identification of prolamin-coding genes
Genes that encoded for prolamin proteins (coixins in Coix lacryma-jobi) were analyzed by BLAST using the NCBI non-redundant database. Longest open reading frames (ORFs) and amino acid sequences of selected genes that showed similarity to prolamins were predicted by TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki). The theoretical isoelectric point (pI) and molecular weight of the predicted protein sequences were calculated using the pI/Mw computing tool in ExPASy (https://web.expasy.org/compute_pi/). For phylogenetic analysis, multiple sequence alignments of the predicted amino acid sequences were performed using MUSCLE tool (MEGA 7 software; https://www.megasoftware.net/). Based on the alignments, the phylogenetic tree was generated using the Neighbor-joining method (MEGA 7) with the following parameters: Poisson model, pair-wise gap deletion and 1,000 bootstraps. Protein sequences of the previously reported 31 prolamin genes in C. lacryma-jobi [28,29] were also included in the tree for comparison.
Adlay prolamin was extracted by the method described in a previous study [29]. Young and mature seeds were finely pulverized using liquid nitrogen. One hundred milligrams of pulverized adlay seeds were mixed with 1 ml of 0.5 M NaCl at room temperature for 1 hour and centrifuged at 4˚C, 12,000 × g for 5 minutes. The supernatant was removed and the pellet containing prolamin was washed three times with cold, sterilized water and dried at room temperature. The pellet was mixed with 500 μl of 10 mM 1,4-dithiothreitol (DTT) in 55% v/v isopropanol for 1 hour at room temperature. After centrifugation, 200 μl supernatant and 800 μl cold acetone were mixed and stored at -20˚C for more than 3 hours to precipitate prolamin. For two-dimensional gel electrophoresis (2-DGE), the precipitated prolamin was centrifuged at 12,000 × g for 5 minutes at 4˚C and the supernatant was removed. The pellet was mixed with 50 μl rehydration buffer (7 M urea, 2 M thiourea, 2% CHAPS detergent, 0.5% IPG buffer) containing 1 M DTT and prolamin was quantified using the Bradford protein assay [57]. The rehydration buffer (350 μl) containing 110 mg of prolamin was loaded onto an immobilized pH gradient (IPG) strip (pI 6-11, 18 cm, GE Healthcare Life Sciences, USA). The IPG strip was loaded into the IPGphor system (GE Healthcare Life Sciences) for 25 hours for isoelectric focusing. Subsequently, the IPG strip was placed on a 15% sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) gel and the second-dimension electrophoresis was run for 23 hours. The gel was stained with Coomassie Brilliant Blue R250 for 3 hours and de-stained for 3 hours. Gels were analyzed and spot volumes were measured using Image Master Platinum 6.0 (GE Healthcare Life Sciences).

Verification and identification of vitamin E-coding genes
Genes that encoded vitamin E-related proteins were screened by BLAST searches of the 31,177 genes against the NCBI non-redundant database. A heat map was generated using DNAStar's ArrayStar 4 (http://www.dnastar.com). Plant samples were lyophilized at -80˚C for 4 days and pulverized into a very fine powder using a planetary mono mill (Pulverisette 6; Fritsch, Germany). Tocopherols and tocotrienols were identified by gas chromatography-time-of-flight mass spectrometry according to a previously described method [58]. Lipophilic compounds were extracted from 0.1 g of samples by adding 3 ml of ethanol containing 0.1% ascorbic acid (w/v); 0.05 ml of 5α-cholestane (10 μg/mL) was used as an internal standard (IS). The extracts were lyophilized and then derivatized with 30 μl N-methyl-N-trimethylsilyltrifluoroacetamide (Sigma, USA) and 30 μl pyridine. The derivatized extracts were analyzed by a 7890A gas chromatograph (Agilent, USA) with a Pegasus HT TOF mass spectrometer (LECO, USA). Tocopherols and tocotrienols were quantified using calibration curves that plotted five concentrations of the commercial standards ranging from 0.01 to 10.0 μg and a fixed amount (0.5 μg each) of IS.

De novo assembly of the adlay transcriptome
We performed RNA-Seq and Iso-Seq on tissues obtained from adlay leaf, root, and young and mature seeds. RNA-Seq yielded 17.3 million to 21.5 million reads per tissue sample. In total, more than 230 million reads showed high quality read rates (Q30 values) of over 90% (S2 Table). The high-quality reads were assembled de novo using the Trinity assembler, which generated 111,850 unigenes that were more than 300 base pairs (bp) long. After removal of redundant transcripts by the CD-HIT-EST program [46], the contig length statistics were as follows: N10 (3,675 bp), N20 (2,760 bp), N30 (2,203 bp), N40 (1,761 bp), and N50 (1,367 bp).
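For readers unfamiliar with the Nx statistics reported above, the following sketch shows the conventional definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches x% of the total assembly length. The study computed its statistics with in-house Perl scripts, so this Python function and its toy input are illustrative assumptions only.

```python
def nx_length(lengths, fraction=0.5):
    """Return the Nx contig length (N50 when fraction=0.5).

    Contigs are sorted from longest to shortest; the result is the length of
    the contig at which the running total first reaches `fraction` of the
    summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (bp), not data from this study.
toy_lengths = [100, 200, 300, 400, 500]
n50 = nx_length(toy_lengths, 0.5)
n10 = nx_length(toy_lengths, 0.1)
```

Because longer contigs are counted first, N10 is always at least as large as N50, which matches the decreasing series reported for the adlay assembly.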
A unigene is a uniquely assembled transcript that denotes a hypothetical gene, which may be represented by multiple isoforms as several different forms of the same protein. Iso-Seq data produced 110,645 high-quality isoforms from three different libraries (S3 Table), which generated 31,177 non-redundant unigenes that were more than 300 bp in length. In total, our analysis generated two unigene sets: 111,850 from RNA-Seq and 31,177 from Iso-Seq. Comparison of the two unigene sets revealed similar GC contents (S4 Table). However, the proportion of unigenes at longer lengths was higher for Iso-Seq than for RNA-Seq (S2 Fig). Although the number of unigenes and their total size were higher for RNA-Seq, unigenes obtained from Iso-Seq were better in terms of minimum length, average length, and N50 length (S4 Table). Similar results have been reported for Salvia miltiorrhiza [59].
In our subsequent analyses, we used the Iso-Seq unigene set mainly as a reference for RNA-Seq data. We did not construct an integrated unigene set in this study because of dissimilar characteristics between RNA-Seq and Iso-Seq gene sets, such as transcript lengths and mRNA abundance. We plan to create the integrated gene set using the reference-guided method when the adlay genome sequencing is completed.

Functional annotation
For functional annotation, the assembled 111,850 unigenes obtained from RNA-Seq of leaf, root, young seed, and mature seed tissue samples were screened using an FPKM criterion of ≥ 1, which resulted in 57,901 unigenes. The RNA-Seq unigene set of 57,901 and the Iso-Seq unigene set of 31,177 were annotated using BLAST searches against NCBI/NR, Swissprot, KEGG, and GO databases. The annotated RNA-Seq genes were distributed in NR (59.4%), Swissprot (42.4%), KEGG (17.3%), and GO (36.6%) with duplicates. A total of 60.6% of genes showed significant protein matches in at least one of the four databases (S3A Fig). The annotated Iso-Seq genes were distributed in NR (90.8%), Swissprot (72.5%), KEGG (15.0%), and GO (60.4%). A total of 91.3% of genes showed significant protein matches in at least one of these databases (S3B Fig). The annotation patterns showed that the Iso-Seq unigene set contained more full-length genes than the RNA-Seq unigene set. Non-significant genes that may represent novel genes, long non-coding RNAs, or less conserved 5' or 3' untranslated regions [60,61] were not evaluated in this annotation and require further analysis.
For further functional categorization, GO terms were classified into three major categories: biological process, molecular function, and cellular components (S4 Fig). In the cellular components category, most genes were assigned to the cell and cell part sub-categories. In the molecular function category, the predominant sub-categories were binding and catalytic, and in the biological process category, the predominant sub-categories were cellular process and metabolic process. This gene distribution pattern was similar to that seen in other medicinal plants [62,63]. Of the 78 transcription factor families that were screened, the largest numbers of genes were assigned to three families: MYB, bHLH, and AP2-EREBP (S5 Fig). BLAST analysis of adlay transcripts against different species revealed that adlay's sequences mostly matched the sequences of Sorghum bicolor (S6 Fig). This relationship has been previously observed during analysis of specialized metabolic pathways and repetitive sequences [64,65].

Gene expression in RNA-Seq
To investigate tissue-specific gene expression in adlay plants, we studied the expression of genes in leaf, root, and young and mature seed tissues. Based on three experimental replicates, we classified the expressed genes into two categories, tissue-specific and tissue-enriched (S5 Table). Ten tissue-specific genes were identified in each of the four tissues (S7 Fig). To validate tissue-specific genes, we randomly selected three genes from each tissue to perform qRT-PCR. The results were consistent with our tissue-specific gene expression data except for photosystem I subunit and defensin-like protein, which appeared to show expression in multiple tissues (Fig 1).
To identify the putative biological processes involved during seed ripening, we selected 25 genes that showed the highest upregulation and downregulation in expression during each of the seed development stages (S8 Fig). GO studies identified eight biological processes from the up-regulated gene set and 32 biological processes from the down-regulated gene set during the seed development process (S9 Fig). Up-regulated genes were prominent in organic biosynthetic processes or amino acid metabolic processes and a high number of down-regulated genes were seen in photosynthesis-related metabolic processes. We identified three heat shock-related proteins (HSPs) and two late embryogenesis-abundant (LEA) proteins that were increased in relative transcript abundance in mature seeds compared with young seeds. This may be the reason for increased desiccation and stress tolerance seen in mature seeds, which is consistent with previous studies that have shown a role for LEA and HSP proteins in stress tolerance in plants [66,67]. Amino acid metabolism has been shown to be the major pathway during kernel development in maize [68]. Our results show an increased number of putative genes involved in amino acid metabolism during seed maturation in adlay. Up-regulated unigenes were largely involved in organic or amino acid metabolic (or biosynthetic) processes and downregulated unigenes were mostly involved in photosynthesis-related metabolic (or biosynthetic) processes. Therefore, mature adlay seeds may be rich in amino acids, which may explain why they have long been used as a nourishing food for humans. Our study showed that RNA-Seq is a valuable tool to understand tissue-specific pathways in plants.
To study the relationship between adlay and other plant species, we categorized adlay transcripts into gene families and compared them with gene families from three other monocot plant species, rice (O. sativa), sorghum (S. bicolor), and maize (Z. mays).
We found that a total of 8,747 gene families were shared by all four species, but 2,419 gene families were specific to adlay (Fig 2A). These adlay-specific genes included transposons, transposon elements, 19 kDa alpha coixin-like, and alpha-coixin family of proteins. Using the 2,419 adlay-specific genes, we showed that adlay's sequences matched the sequences of Sorghum bicolor (49%), Zea mays (35%), and Oryza sativa (12%). Therefore, adlay was most closely related to S. bicolor in the Poaceae family (S6 Fig). GO analysis of these specific gene families revealed that a majority of them were involved in metabolic processes and biological regulation (Fig 2B and S4 Fig). Gene family classification showed that adlay was closest in relation to sorghum compared with the other two plant species because they shared the largest number of gene families. This is consistent with our BLAST results that also showed maximum sequence similarity between adlay and sorghum (S6 Fig). Phylogenetic and Ka/Ks analysis using several common transcription factor genes, such as heat shock factor and AP2-EREBP, also confirmed that adlay is evolutionarily closer to sorghum than to maize and rice (S10 Fig and S6 Table), which is consistent with other studies based on chloroplast DNA sequences [69,70]. The identified gene families in this study will be valuable for understanding the biological response mechanism as well as facilitating molecular breeding in adlay.

Assembly verification using prolamin genes
Adlay is known to contain large amounts of seed-storage proteins known as prolamins [25]. Therefore, we selected prolamin-associated genes from our assembled gene set and compared them with known prolamin genes. From a total of 39 genes that were identified to encode prolamin proteins based on BLAST results (S7 Table), 33 genes contained full length ORFs and six genes had 5' partial ORFs without a start codon. The identified genes were similar to the previously reported 31 adlay prolamins (https://www.ncbi.nlm.nih.gov/genbank/) in nucleotide length, isoelectric point (pI), and molecular weight (S7 and S8 Tables). Most of the putative prolamin-encoding genes showed high similarity with known adlay prolamin genes at the amino acid sequence level except for three genes (S9 Table).
To analyze the storage protein content during adlay seed development, we performed two-dimensional gel electrophoresis (2-DGE) on young (Fig 3A) and mature seed (Fig 3B) tissues. Two experimental replicates showed that prolamin spots were mainly expressed in the molecular weight range of 14.4-31.0 kDa (Fig 3). We found that 11 spots were two times larger in volume in mature seeds compared with young seeds. Spots 17, 18, 19, and 20 were only seen in mature seeds (Fig 3 and S10 Table). These spots may represent putative prolamins involved in seed development. Prolamins are the major seed storage proteins in maize, sorghum, sugarcane, foxtail millet, and adlay [28,29]. A recent study has shown that adlay has two main α-prolamin subclasses with molecular weights of 19 and 22 kDa [29]. We predict that spot 21 represents the 19 kDa subclass and spot 24 potentially represents the 22 kDa α-prolamin subclass. Detailed comparisons of our experimental results with the screened prolamin genes will be conducted in a future study.
Phylogenetic analysis was performed using deduced amino acid sequences of the 39 prolamin-encoding genes from this study with the reported 31 adlay prolamin genes. The phylogenetic tree showed that our prolamin-encoding gene set contained most of the adlay prolamin genes, including both reported and novel genes. A total of 17 prolamin-encoding genes were classified into three groups, Group I to III, which were located separately from other genes (Fig 4 and S11 Fig).

Assembly verification using vitamin E genes
The adlay plant is enriched in various valuable compounds such as vitamin E, squalene, and phytosterols, which may contribute to its nourishing properties [30]. We analyzed the expression patterns of genes that coded for the two main vitamin E subclasses, tocopherols and tocotrienols. Based on FPKM values, we detected 9 differentially-expressed vitamin E pathway enzymes from adlay leaf, root, and seed tissues. BLAST results identified a total of 22 genes associated with the 9 vitamin E pathway enzymes (Fig 5).
Generally, vitamin E consists of eight major isoforms: four saturated tocopherols (α, β, γ, and δ) and four unsaturated tocotrienols (α, β, γ, and δ), and vitamin E synthesis is largely divided into the tocotrienol and tocopherol synthesis pathways. However, we detected six isomers of vitamin E (four tocopherols and two tocotrienols) by gas chromatography-time-of-flight mass spectrometry. Adlay leaf tissues showed the highest amounts of α-, β-, and γ-tocopherols. These results were consistent with the higher levels of homogentisate phytyltransferase (HPT) in leaf tissues compared with those in the other tissues. We detected α-tocotrienol in root and mature seed tissues and γ-tocotrienol in leaf and mature seed tissues. The most prominent vitamin E isoforms in each organ were the α- and γ-tocopherols and tocotrienols, possibly due to high enzyme activity of 2-methyl-6-geranylgeranyl-plastoquinol methyltransferase (MGGBQ-MT) and 2-methyl-6-phytyl benzoquinone methyltransferase (MPBQ-MT) in all tissues (S11 Table). Our results suggested that the adlay leaf and seed tissues were important sources for genes that can potentially be manipulated for the purposes of breeding and genetic engineering.
In conclusion, we performed a de novo assembly of the transcriptome of adlay (C. lacryma-jobi) using full-length Iso-Seq and short-read RNA-Seq. Deep transcriptome analysis generated 57,901 genes from short-read sequencing and 31,177 genes from SMRT long-read sequencing. We validated the assembled gene sets via gene expression analyses, gene family studies, qRT-PCR, and quantitative experiments. We also screened the adlay transcriptome for prolamin- and vitamin E biosynthesis-related genes and performed a comparative analysis of gene families between adlay and other members of the Poaceae family, such as rice, sorghum, and maize. The new adlay gene sequences identified in our study will provide a valuable resource for future genetic and molecular experimentation in adlay.
Supporting information
S1 Table. Unigene-specific primers used for tissue-specific qRT-PCR. (PDF)
S2 Table. General properties of the reads produced by short-read sequencing using the Illumina HiSeq 2500 sequencing platform. (PDF)

Fig 5. Genes predicted to be involved in the vitamin E synthesis pathway.
Vitamin E synthesis is largely divided into two pathways that are regulated by homogentisate geranylgeranyl transferase (HGGT) and homogentisate phytyltransferase (HPT). The 9 differentially expressed enzymes in vitamin E synthesis are p-hydroxyphenylpyruvic acid dioxygenase (HPPD), 2-methyl-6-phytyl benzoquinone methyltransferase (MPBQ-MT), 2-methyl-6-geranylgeranyl-plastoquinol methyltransferase (MGGBQ-MT), tocopherol cyclase (TC), γ-tocopherol methyltransferase (γ-TMT), geranylgeranyl diphosphate reductase (GGDR), homogentisate geranylgeranyl transferase (HGGT), homogentisate phytyltransferase (HPT), and S-adenosylmethionine (SAM). Heat map analysis shows the differential tissue-specific expression of 22 genes associated with the vitamin E pathways. The color bar indicates the value of expression in a sample, based on the color key at the upper left corner. Black indicates upregulation and green, downregulation.

Table. Amino acid sequence similarity between 39 prolamin-encoding genes and 31 known prolamin (coixin) genes. (PDF)
S10 Table. The calculated spot volume values for prolamin storage protein content during seed development (young vs. mature seeds) obtained from 2-DGE. (PDF)
S11 Table. Tocopherol and tocotrienol contents in the adlay leaf, root, and young and mature seed tissues. (PDF)
S1 Fig. Medicinal plant adlay (Coix lacryma-jobi). The leaf, root, young and mature seeds.
heat shock factor (HSF) genes. Among the 8,747 common gene families, 37 HSF-encoding genes were selected, and the tree is based on amino acid sequence similarity of the HSF genes. Multiple sequence alignments of the amino acid sequences were performed using MUSCLE (MEGA 7 software) and the phylogenetic tree was generated using the Maximum Likelihood (ML) method. The scale bar represents the number of amino acid substitutions per site. The bootstrap support values (> 50%) are shown near the branches of the tree. The three classes of HSFs, A, B and C, are indicated on the right side of the tree. Genes belonging to each of six groups of common gene families are marked by blue boxes on the tree. Among the 12 HSF genes of adlay, six were located closer to sorghum HSF genes than to those of maize and rice, whereas the remaining six were closer to maize or to both sorghum and maize. (JPG)
S11 Fig. Phylogenetic tree of prolamin protein genes. The tree was based on amino acid sequence similarity between predicted genes (red circles) from this study and known adlay genes (blue circles). Multiple sequence alignments of the amino acid sequences were performed using MUSCLE (MEGA 7 software) and the phylogenetic tree was generated using the Neighbor-joining (NJ) method. The bootstrap support values are shown near the branches of the tree. Three groups of prolamin genes, Group I to III, located apart from other genes are indicated. (JPG)