Genome-wide discovery and characterization of long noncoding RNAs in African oil palm (Elaeis guineensis Jacq.)

Long noncoding RNAs (lncRNAs) are an important class of genes and play important roles in a range of biological processes. However, few reports have described the identification of lncRNAs in oil palm. In this study, we applied strand specific RNA-seq with rRNA removal to identify 1,363 lncRNAs from the equally mixed tissues of oil palm spear leaf and six different developmental stages of mesocarp (8–24 weeks). Based on strand specific RNA-seq data and 18 released oil palm transcriptomes, we systematically characterized the expression patterns of lncRNA loci and their target genes. A total of 875 uniq target genes for natural antisense lncRNAs (NAT-lncRNA, 712), long intergenic noncoding RNAs (lincRNAs, 92), intronic-lncRNAs (33), and sense-lncRNAs (52) were predicted. A majority of lncRNA loci (77.8%–89.6%) had low expression in 18 transcriptomes, while only 89 lncRNA loci had medium to high expression in at least one transcriptome. Coexpression analysis between lncRNAs and their target genes indicated that 6% of lncRNAs had expression patterns positively correlated with those of target genes. Based on single nucleotide polymorphism (SNP) markers derived from our previous research, 6,882 SNPs were detected for lncRNAs and 28 SNPs belonging to 21 lncRNAs were associated with the variation of fatty acid contents. Moreover, seven lncRNAs showed expression patterns positively correlated expression pattern with those of genes in de novo fatty acid synthesis pathways. Our study identified a collection of lncRNAs for oil palm and provided clues for further research into lncRNAs that may regulate mesocarp development and lipid metabolism.


INTRODUCTION
Noncoding RNAs (ncRNAs) constitute a critical part of the eukaryotic transcriptome and play a vital role in gene regulation. Long noncoding RNAs (lncRNAs), an important class of noncoding RNAs, are non-protein coding RNAs longer than 200 bp and function as key regulators of diverse mechanisms in a range of biological processes (Geisler & Coller, 2013;Rinn & Chang, 2012;Wang et al., 2011). Researchers have focused on the identification and characterization of lncRNAs responsible for biological process regulation over the past several decades. LncRNAs were classified into several groups based on their genomic location, including long intergenic noncoding RNAs (lincRNAs), intronic lncRNAs, natural antisense lncRNAs (NAT-lncRNAs) and sense lncRNAs (Kung, Colognori & Lee, 2013;Rinn & Chang, 2012). Researchers have obtained numerous lncRNAs in plants, including NAT-lncRNAs in Arabidopsis (Zhao et al., 2018), lincRNAs in black cottonwood (Shuai et al., 2014), soybean and wheat (Golicz, Singh & Bhalla, 2017;Zhang et al., 2014), and lncRNAs associated with stress responses in cotton, grapevine, and Chinese cabbage (Jain et al., 2017;Wang et al., 2019a;Wang et al., 2019b;Wang et al., 2015;Xu et al., 2017;Yao et al., 2019). Oil palm (Elaeis guineensis) is an important oil crop in tropical and subtropical areas and its complete genome was released in 2013 (Singh et al., 2013). However, no reports on genome-wide lncRNA identification in oil palm are available.
LncRNA loci serve as important regulatory mediators in gene expression and regulate the expression of target genes with cis-acting or trans-acting mechanisms (Kung, Colognori & Lee, 2013;. LncRNAs can interact with multiple protein partners, serve as molecular scaffolds that help assemble and target the chromatin-modifying complex, and interact with miRNA as target mimics (Chu et al., 2011;Liu et al., 2012;Rinn & Chang, 2012;Vance & Ponting, 2014). For lncRNAs with cis-functions, their regulated genes are located close to the lncRNA loci (Herzog et al., 2014;Kim & Shiekhattar, 2016;Li, Zhu & Luo, 2016). NAT-lncRNA loci may regulate their sense genes and influence their expression via diverse transcriptional or post-transcriptional mechanisms (Zhao et al., 2018). NAT-lncRNA loci may compete for RNA polymerase II and regulatory transcription factors with their sense genes and cause transcriptional interference (Faghihi & Wahlestedt, 2009;Wight & Werner, 2013). All these results provided information about screening for lncRNA targets in silico. Adjacent genes of lncRNA loci, target lncRNAs of miRNA and complementary sequences between lncRNAs and target genes were used as criteria to select candidate target genes of lncRNAs (Jiang et al., 2015;Li et al., 2015;Wu et al., 2013;Zhao et al., 2018).
With the advance of next-generation sequencing (NGS) technology, RNA-seq has become the technical platform of choice to identify lncRNAs. Using bioinformatics tools, lncRNA transcripts can be directly assembled from RNA-seq reads. Recent advances in DNA sequencing and transcriptome analysis have provided gigabases of data and genome-wide analysis of lncRNAs has been conducted in many species (Jain et al., 2017;Ma et al., 2018;Wang et al., 2019a;Wang et al., 2019b;Xu et al., 2017;Yao et al., 2019). Approximately 40,000 putative lncRNAs were identified in Arabidopsis by EST, tiling array analyses and RNA-seq data sets (Jin et al., 2013;Liu, Wang & Chua, 2015;Wang et al., 2014). Oil palm is one of the most important oil crops in the world. Many QTL mapping for oil yield, fatty acid composition and vegetative traits in oil palm were reported (Jeennor & Volkaert, 2014;Montoya et al., 2013;Singh et al., 2009), A good reference of oil palm genome, transcriptome profile and lncRNA profile will assist in identifying the causality for phenotypic variations. In this study, we applied strand-specific RNA-seq (ssRNA-seq) technology to identify lncRNA transcripts from an equally pooled RNA sample of oil palm spear leaf and mesocarps of six developmental stages. We also predicted target genes of lncRNAs and analysed the expression patterns of lncRNAs and their target genes based on 18 oil palm transcriptomes downloaded from the National Center for Biotechnology Information (NCBI) website. LncRNAs expression patterns in different tissues and different developmental stages of mesocarp that stores oil were characterized. We also applied association mapping to identify lncRNA loci related to the variation in fatty acid content. Our study provides a resource for studying lncRNAs in oil palm, including NAT-lncRNAs and their target genes, as well as lncRNAs related to fatty acid content.

Plant materials
One African oil palm plant (hybrid of dura and pisifera) sourced from Malaysia, and grown in the oil palm germplasm resources garden at Wenchang, Hainan, China were used for all experiments. The mesocarp for oil palm fruit at six developmental stages (8-week-old, 12-week-old, 16-week-old, 18-week-old, 20-week-old and 24-week-old), kernel, female flower, and spear leaves were collected from a ten-year-old oil palm individual (accession: CRI-005). The mesocarp at six developing stages and spear leaves were used for the following ssRNA-seq. Roots were collected from one-year-old oil palm seedlings that derived from the ten-year-old oil palm seeds, which were easier to sample and not lignified as the ten-year-old oil palm plant. Each sample collection was done with three biological replicates. All samples collected were immediately frozen in liquid nitrogen and stored −80 • C until needed for RNA extraction. All samples (mesocarp, kernel, female flower, spear leaves, and root) were used for the following quantitative reverse transcription PCR (RT-qPCR) and RT-PCR.

RNA extraction, RNA-seq libary construction and strand-specific RNA sequencing
Total RNA for oil palm mesocarps (six developing stages described above, with three biological replicates for each stage) and spear leaves used for ssRNA-seq were extracted by a modified CTAB method (Xiao et al., 2012). RNA degradation and contamination was detected by 1.5% agarose gel electrophoresis. Total RNA concentration and purity was determined using a Nanodrop ND-2000 spectrophotometer (Nanodrop Technologies, USA). RNA integrity was verified using RNA Nano 6000 Kit for Agilent Bioanalyzer 2100.
RNA samples without degradation and contamination were used for rRNA removal. The RNA samples were equally pooled and a total amount of 1.5 ug RNA was treated with Ribo-Zero rRNA Removal Kit (epicentre, USA), and then fragmented by fragmentation buffer. First-strand complementary DNA was synthesized using random hexamers and reverse transcriptase. Second-strand cDNA was prepared using RNase H, DNA polymerase I and dNTPs. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities and 3 ends of DNA fragments were treated with adenylation. After that, NEBNext adaptors were ligated to the ends of the prepared double-stranded cDNA and the library fragments were purified by AMPure XP beads. The fragment size ranged from 150-200 bp were selected and PCR amplified, and then sequenced with an Illumina Hiseq 2000 platform by Biomarker Technology Co., Ltd (Beijing, China). The clean reads from the above lncRNA-seq data were deposited in the European Bioinformatics Institute (EMBL-EBI) at the European Nucleotide Archive (accession number: ERR3412516).

RNA transcript assembly and novel transcriptional unit identification
We used the FastQC software (http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/) to check the sequence quality of raw reads. Base quality value (Q) was estimated by the following formula: Q = −10 ×log 10 P (P represents the error probability during Illumina sequencing). Raw reads were pretreated to remove adaptor sequences and low quality sequences via the Trimmomatic software (Bolger, Lohse & Usadel, 2014). The parameters for Trimmomatic were set as: (1) remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10); (2) remove leading low quality (LEADING:3), (3) remove trailing low quality (TRAILING:3); (4) scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15); (5) drop reads below the 36 bases long (MINLEN:36). Clean reads were obtained by removing reads containing adapter, reads containing poly-N and low quality reads from raw data. After raw reads filtering, Q30 value of clean reads was above 95.88% and all reliable reads were mapped to the oil palm reference genome (Version EG5, https://www.ncbi.nlm.nih.gov/genome/2669) using hisat2 (version 2.1.0) with default parameters (Kim, Langmead & Salzberg, 2015) and de novo assembled using the StringTie software (version 2.0) with default parameters (Pertea et al., 2015). The protein-coding transcripts identified in this study were annotated based on the oil palm gene prediction in NCBI (Gene models based on file downloaded from the NCBI website,https://www.ncbi.nlm.nih.gov/genome/?term=Elaeis+guineensis). After removing all annotated protein-coding genes, transcripts that had more than 200 bp in length, more than one exon and FPKM ≥ 0.1 were selected for further identification of lncRNAs.

lncRNAs identification, classification and characterization
We removed transcripts that were likely to be assembly artifacts according to class code annotated by the gffcompare program and retained transcripts annotated by ''u'', ''i'', ''o'' and ''x'', which represent novel intergenic, intronic, sense-overlapped and cis-antisense transcripts, respectively (Trapnell et al., 2012). The transcripts of candidate lncRNAs must not contain open reading frame encoding more than 50 amino acids and must not encode any transposable elements. Coding Potential Calculator (CPC) (Kong et al., 2007), Coding-Non-Coding Index (CNCI) (Sun et al., 2013), Coding Potential Assessment Tool (CPAT) (Wang et al., 2013) analysis and PfamScan (Pfam 32 database) were applied to analyze transcripts. The assembled transcripts that did not pass the protein-coding-score test (score < 0) of CPC and CNCI analysis were as noncoding sequences. CPAT analyzed the open reading frame length/coverage, Fickett score, and hexamer usage bias of the transcripts and determined the noncoding sequences with default parameter. Based on PfamScan analysis, we eliminated transcript with potential protein-coding ability (E-value cutoff ≤ 0.001). The predicted long noncoding transcripts shared from the four analyses were considered as candidate oil palm lncRNAs. We filtered out transposable elements for the transcripts through PfamScan analysis and comparing with Dfam database (Hubley et al., 2015) through BLAST analysis (E-value cutoff <1e−5).
The different types of lncRNAs include lincRNA, intronic lncRNA, anti-sense lncRNA, sense lncRNA were selected using cuffcompare (Trapnell et al., 2012). The protein-coding genes were derived from protein-coding transcripts in this study, as well as oil palm predicted gene models downloaded from NCBI (Version: EG5). We used StringTie to calculate Fragments Per Million Fragments (FPKM) values of lncRNAs and other proteincoding genes. For gene with different isoforms, total FPKM values of all isoforms were used to represent the gene's FPKM value.

Prediction of lncRNA target genes
LncRNAs participated in regulatory pathways through two ways -in cis and in trans. Target genes for lncRNAs acting in cis were predicted by protein-coding genes overlapped within 2 kb flanking sequences of lncRNAs or overlapped with lncRNAs. The genomic positions of lncRNAs and protein-coding genes were compared to identify cis-acting target genes for lncRNA. We used the LncTar software to predict target genes in trans of lncRNA loci based on mRNA sequence complementary and RNA duplex energy prediction (Li et al., 2014).

LncRNA expression profiles in different oil palm tissues and co-expression analysis
A set of 18 oil palm RNA libraries were downloaded from the NCBI website (https: //www.ncbi.nlm.nih.gov/) and used to calculate the fragments per kilobase of transcript per million fragments mapped (FPKM) for lncRNAs and their corresponding target genes (Table S1). Bowtie2 (Langmead & Salzberg, 2012) were used to map the reads to oil palm genomes and FPKM values were calculated by cufflinks (Trapnell et al., 2012).
Based on the reference of Zhao et al. (2018), Pearson Correlation Coefficients (p.c.c.) were calculated between the expression levels of adjacent protein-coding genes and between the expression levels of lncRNAs and their closest protein-coding genes. LncRNA/proteincoding gene pairs with low abundance (FPKM max <1) were excluded from our analysis. LncRNA/protein-coding gene pairs with Pearson correlation coefficients greater than 0.468 were presented in the heat map (n = 16, P ≤ 0.05). The 18 transcriptome datasets described above were used for analysis of p.c.c. between lncRNAs and genes that belong to the pathways of plastid fatty acid synthesis from pyruvate and triacylglycerol (TAG) synthesis from the reference Xiao et al. (2019).

Validation of the expression of lncRNAs and their target genes by RT-PCR and RT-qPCR
The total RNAs for oil palm root, stem, spear leaf, female flower, kernel, and mesocarp tissues at six developing stages (8-week-old, 12-week-old, 16-week-old, 18-week-old, 20-week-old and 24-week-old) were extracted by a modified CTAB method as the above. The complementary DNA for each sample were synthesized using All-in-One First-Strand Synthesis MasterMix kit (NOVA, Jiangshu, China) with random hexamers, which was used for RT-PCR and RT-qPCR assays to validate the expression of lncRNAs and their target genes. We used Primer 5.0 to design the primers for these genes and listed the primer information in Table S2.
The RT-qPCR mixture contained 1 µl diluted cDNA, 5 µl of 2×FastStart Universal SYBR Green Master (NOVA, Jiangshu, China), and 0.5 µl of each gene-specific primer (10 µM) in a final volume of 10 µl. All PCR reactions were performed using ABI 7900HT machine under following conditions: 2 min at 95 • C, and 40 cycles of 5 s at 95 • C and 30 s at 60 • C in 384-well clear optical reaction plates (Applied Biosystems, USA). The procedure ended by a melt-curve ramping from 60 to 95 • C for 20 minutes to check the PCR specificity. All RT-qPCR reactions were carried out in biological and technical triplicate. A non-template control was also included in each run for each gene. The final Ct values were the means of nine values (biological triplicate, each in technical triplicate).

Single nucleotide polymorphisms (SNPs) detection for lncRNAs and identification of lncRNAs related to variation in fatty acid composition by association mapping
Based on 1,261,501 reliable SNPs markers (minor allele frequency >0.05 and more than 80% oil palm individuals had sequences information for each SNP marker) derived from SLAF sequencing in a diversity panel of 200 oil palm individuals in our previous research (Xia et al., 2019b), SNPs within lncRNA regions were screened by a Perl script. The contents of lauric acid (12:0), myristic acid (C14:0) , palmitic acid (16:0), tripalmitelaidin acid (16:1), Hexadecadienoic acid (16:2), stearic acid (18:0), oleic acid (18:1), linoleic aicd (18:2), and oil for 160 individual out of 200 oil palm used in this study were the same set of data from our previous research (Xia et al., 2019a). Fatty acid composition was examined and measured using gas chromatography (Agilent DB-23, 30 m ×250). The nine values (three biological replicates × three technical replicates) obtained per oil palm individual were averaged for subsequent association mapping. The average values along with standard deviations of different fatty acids and oil contents for the 160 oil palm individuals were listed in Table S3. Since five subgroups for the 200 oil palm individuals were estimated based on cross-validation errors, mixed linear models (MLM) were used. Fixed effects were computed using a Q (population) value matrix, and random effects were computed using a K (kinship) matrix. The Q+K value matrix was added to the MLM model. The Q matrix was obtained using STRUCTURE software (version 2.3.4) (Pritchard, Stephens & Donnelly, 2000), and the K matrix (genetic relationships among the 200 oil palm individuals) was obtained using SPAGeDi software (version 1.5) (Hardy & Vekemans, 2002). Association analysis was performed using Tassel 5.0 (Bradbury et al., 2007). P-values for associations between SNP markers and fatty acid content were computed according to Yu et al. (2006).

Identification and characterization of oil palm lncRNAs
A total of 166,288,038 raw reads were generated from an equally pooled RNA sample of oil palm spear leaf, 8-week-old, 12-week-old, 16-week-old, 18-week-old, 20-week-old and 24-week-old mesocarp tissues, and 165,257,052 clean reads (24.69 Gbp) were obtained after adaptor trimming and sequence quality filtering. Approximately 86.98% clean reads were mapped to the oil palm reference genome, and 96,009 transcripts were assembled. To identify lncRNAs, we filtered out the assembled transcripts shorter than 200 bp and transcripts with protein-coding potential via protein-coding-score test (CPC, CNCI, CPAT) and Pfam protein domain analysis, which identified transcripts with potential protein-coding ability (cutoff E-value ≤ 0.001). Finally, 1,663 transcripts were tested as having no protein-coding potential and were considered candidate lncRNAs. These transcripts were mapped to 1,363 lncRNA loci and 289 lncRNA loci had more than one transcript (Table S4).
The 1,363 lncRNA loci were distributed across all 16 chromosomes (930) and unlinked scaffolds (433) of the oil palm genome (Table 1). Chr1 (129), Chr2 (96) and Chr3 (95) had the largest number of lncRNA loci, while Chr10 (31), Chr13 (32) and Chr15 (30) had the least number of lncRNA loci. Based on the present annotation version of the oil palm genome, the 1,363 lncRNA loci were classified into four classes. LncRNA loci located in intergenic regions (lincRNAs, 703) are the top number of lncRNA classes, followed by lncRNA loci located on the antisense of protein coding genes (NAT-lncRNAs, 581), lncRNA loci overlapping with pseudogene regions (sense-lncRNAs, 47) and lncRNA loci belonging to genic intronic regions (intronic, 32). Based on the genomic locations for NAT-lncRNA loci and their corresponding sense genes, we further classified them into six types (Fig. 1). NAT-lncRNA loci for type I, II, III, IV overlapped in their genic regions with genes on their opposite strands, while type V and type VI had promoter regions and 3 UTR regions overlapped with sense strand genes, respectively.

Target gene identification and coexpression analysis
A total of 875 unique target genes were identified, including natural antisense lncRNAs (NAT-lncRNAs, 712), long intergenic noncoding RNAs (lincRNAs, 92), intronic-lncRNAs (33), and sense-lncRNAs (52) ( Table S6). The flanking genes within a 2-kb distance and/or antisense overlapping genes (865) were identified as candidate targets in cis for lncRNAs, while 11 predicted target genes were predicted to be in trans. One gene LOC105051313, was also identified as a target gene, both in cis (MSTRG.23935) and in trans (MSTRG.23116 ). For NAT-lncRNAs, a total of 712 target genes were found, including target genes overlapping with NAT-lncRNAs on the opposite strand (657), located on the same strand of NAT-lncRNA within the 2-kb region (48), and mRNA sequences complementary to NAT-lncRNA via LncTar analysis (7) ( Table S6). Of these target genes, 72% (514) of target genes had genic regions overlapping with NAT-lncRNAs (class I, II, III, IV), and 144 target genes were located within the 2-kb flanking distance of NAT-lncRNAs, including 44 genes overlapping upstream promoter regions with NAT-lncRNAs (class V) and 100 genes overlapping for on the 3 downstream (class VI). For target genes overlapping with NAT-lncRNA (class I, II, III, IV), comparison of transcripts between NAT-lncRNAs and their target genes indicated that 263 pairs were overlapped and 287 pairs did not overlap ( Table S5). The lincRNAs (703) were the top number type of identified lncRNAs, and we identified 87 target genes for 82 lincRNAs located within 2-kb regions (Table S6). Among these target genes, eight genes belong to lipid metabolism pathways by comparison with the genes in the pathways identified in our previous research (Table S6) (Xiao et al., 2019).
To explore the function of lncRNAs in the regulation of their target genes, we calculated the Pearson correlation coefficients (p.c.c.) between lncRNAs and their target genes. We identified 585 pairs of lncRNA loci and target genes that did not overlap in their transcripts (Table S7). Among these lncRNA/target pairs, 505 lncRNAs had one target gene and 66 lncRNAs had more than one target gene. For these lncRNAs in Table S7, a majority of 312 lncRNAs (250) had low expression (0 < FPKM max < 1) and 84 lncRNAs had no detectable expression (FPKM max = 0) in the 18 transcriptomes, while 47 target genes had low expression and 16 target genes with no detectable expression. Since the existence of low expression levels or no expression among the 585 lncRNAs/target pair, 380 pairs of lncRNA loci and target genes were filtered out. The p.c.c. for the remaining 205 pairs of lncRNAs/target genes were examined and 10% (21) of the gene pairs showed positively correlated expression patterns (p.c.c. score ≥ 0.468, P value < 0.05) are shown in Table S7.

DISCUSSION
LncRNAs play important roles in mediating biological process regulation. In this study, we applied ssRNA-seq to identify lncRNAs in oil palm from mixed tissues of leaves and six different developmental stages of mesocarp. Based on ssRNA-seq data in this study and 18 transcriptomes from other studies, we found that the majority lncRNA loci had low expression. Coexpression analysis between lncRNAs and predicted target genes in cis indicated that 21 lncRNA loci had positive correlations with target genes, while only one lncRNA locus displayed a negative correlation. Based on the data of SNPs and fatty acid content data from our previous research, 28 SNPs belonging to 21 lncRNAs were associated with fatty acid composition. Our study identified a collection of lncRNAs for oil palm and provided clues for further investigations into lncRNAs that may regulate mesocarp development and lipid metabolism.
Data mining for lncRNAs was feasible since a large amount of transcription data was available. In model plants, such as Arabidopsis, maize and rice, a large quantity of lncRNA loci were identified (Li et al., 2014;Zhang et al., 2014;Zhao et al., 2018), some lncRNAs were validated and play critical roles in flowering controls, grain yield, stress response and other biological processes (Jain et al., 2017;Jiang et al., 2019;Kindgren et al., 2018;Ma et al., 2018;Wang et al., 2018). LincRNAs are the most abundant lncRNA types (Table 1), while the majority of lncRNAs had a relatively lower expression than did protein-coding genes (Figs. 2 and 3). These are common phenomena for lncRNAs in many species (Wang et al., 2015;Yao et al., 2019). We used mixed samples to conduct ssRNA-seq, which could cover more lncRNA transcripts. The downloaded 18 transcriptomes showed similar low expression levels for most lncRNAs, while lncRNAs highly expressed in more samples tended to be detected in different transcriptomes. The RT-PCR results fit the transcriptome data, 22 out 71 selected lncRNAs were expressed in the tested tissues and four lncRNAs were expressed specifically in the mesocarp. LncRNAs regulate target genes by serving as target mimics of miRNAs and regulators of transcripion and chromatin modification (De Lucia & Dean, 2011;Jiang et al., 2019;Magistri et al., 2012;Wang et al., 2018;Wu et al., 2013;Zhao et al., 2018). Research in Arabidopsis on NAT-lncRNAs demonstrated that NAT-lncRNAs often positively correlate with their cognate sense genes (Zhao et al., 2018). Coexpression analysis in this study also suggested that lncRNAs may cotranscribed with adjacent target genes and 6% of lncRNA/target pairs showed a positive correlation in expression (Table S7). The positive correlation results for four lncRNAs and their target genes were also validated by RT-qPCR (Fig. 6). However, since a majority of lncRNA loci had low expression levels, our results suggested that a small proportion of target genes may be cotranscribed with lncRNAs in cis.
Oil palm has 90% oil in its mesocarp, the highest level observed in the plant kingdom. Bourgis et al. (2011) used RNA-seq to examine transcriptional changes during oil palm mesocarp development, and found that synthesis of fatty acids and supply of pyruvate in the plastid were the major factors controlling oil storage in the oil palm mesocarp (Bourgis et al., 2011). Researchers have used QTL mapping and association mapping to identify key loci related to the content or composition of fatty acids in oil palm composition (Jeennor & Volkaert, 2014;Montoya et al., 2013;Singh et al., 2009). In our study, we used association mapping and found 21 lncRNAs related to the variation of fatty acid composition and oil content of oil palm mesocarp ( Table 2). The coexpression analysis between the 21 lncRNAs and genes belonging to fatty acid synthesis and TAG synthesis pathways showed that nine lncRNAs had a similar expression patterns as genes in the two lipid metabolism pathways. MSTRG.15295 had similar expression pattern with the majority of correlated genes, including a key transcription factor -WRI1. The WRI1 gene is considered to play an important role in oil accumulation (Bourgis et al., 2011;Cernac & Benning, 2004;Dussert et al., 2013;Focks & Benning, 1998;Troncoso-Ponce et al., 2011). Our results provide candidate lncRNA loci related to the development and oil storage of oil palm mesocarp.

CONCLUSIONS
This study provided an important collection of lncRNAs in oil palm and a set of 581 NAT-lncRNAs and 712 targets were predicted. Based on 18 oil palm transciptomes, we found that A 77.8%-89.6% lncRNA loci had low expression, while only 89 lncRNA loci had medium to high expression in at least one transcriptome. Coexpression analysis between lncRNAs and their target genes indicated that 10% of lncRNAs had positively correlated expression patterns with target genes. Based on SNP markers derived from our previous research, 28 SNPs belonging to 21 lncRNAs were associated with the variation of fatty acid contents. Moreover, twelve lncRNAs showed positively correlated expression patterns with genes in de novo fatty acid synthesis pathways. Our study identified a collection of lncRNAs for oil palm and provided clues for further research investigating lncRNAs that may regulate mesocarp development and lipid metabolism.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by grants from the National Natural Science Foundation of China (No. 31301358), the Scientific and Technological Cooperation Projects of Hainan province (No. KJHZ2015-06) and the Fundamental Scientific Research Funds for Chinese Academy of Tropical Agricultural Sciences (No. 1630152019006). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: National Natural Science Foundation of China: 31301358. Scientific and Technological Cooperation Projects of Hainan province: KJHZ2015-06. Fundamental Scientific Research Funds for Chinese Academy of Tropical Agricultural Sciences: 1630152019006.