Genomic and Transcriptomic Analysis Identified Novel Putative Cassava lncRNAs Involved in Cold and Drought Stress.

Long non-coding RNAs (lncRNAs) play important roles in the regulation of complex cellular processes, including the transcriptional and post-transcriptional regulation of gene expression relevant to development and stress response. Compared with other important crops, little is known about cassava lncRNAs and their roles in abiotic stress adaptation. In this study, we performed a genome-wide survey of ncRNAs in cassava, integrating genomics- and transcriptomics-based approaches. In total, 56,840 putative ncRNAs were identified, approximately half of which were supported by expression data or matched previously known ncRNAs. Among these were 2229 potential novel lncRNA transcripts with unmatched sequences, 250 of which were differentially expressed under cold or drought conditions relative to controls. Our results suggest that lncRNAs might be involved in the post-transcriptional regulation of stress-induced transcription factors (TFs), such as members of the zinc-finger, WRKY, and nuclear factor Y gene families. These findings deepen our knowledge of cassava lncRNAs and shed light on their stress-responsive roles.

: Comparison of expression levels of unmatched ncRNAs, known ncRNAs, and protein-coding genes in cassava RNA-seq data from Li [15]. The y-axis represents expression level normalized by GeTMM. The left graph is a boxplot of expression levels in coding genes and unmatched ncRNAs. The middle graph is a scatterplot of the expression distribution in coding genes. The right graphs are scatterplots of the expression distribution in short and long unmatched ncRNAs, respectively; their x-axis indicates the confidence (probability of being an ncRNA) of each unmatched ncRNA according to the RNAz tool. The black line at the y-intercept denotes the 95th percentile rank of expression.

Figure S3: Comparison of expression levels of unmatched ncRNAs, known ncRNAs, and protein-coding genes in cassava RNA-seq data from Wang [32]. The y-axis represents expression level normalized by GeTMM. The left graph is a boxplot of expression levels in coding genes and unmatched ncRNAs. The middle graph is a scatterplot of the expression distribution in coding genes. The right graphs are scatterplots of the expression distribution in short and long unmatched ncRNAs, respectively; their x-axis indicates the confidence (probability of being an ncRNA) of each unmatched ncRNA according to the RNAz tool. The black line at the y-intercept denotes the 95th percentile rank of expression.

Figure S4: Comparison of expression levels of unmatched ncRNAs, known ncRNAs, and protein-coding genes in cassava RNA-seq data from Wilson [39]. The y-axis represents expression level normalized by GeTMM. The left graph is a boxplot of expression levels in coding genes and unmatched ncRNAs. The middle graph is a scatterplot of the expression distribution in coding genes. The right graphs are scatterplots of the expression distribution in short and long unmatched ncRNAs, respectively; their x-axis indicates the confidence (probability of being an ncRNA) of each unmatched ncRNA according to the RNAz tool. The black line at the y-intercept denotes the 95th percentile rank of expression.

Figure S5: Comparison of expression levels of unmatched ncRNAs, known ncRNAs, and protein-coding genes in cassava RNA-seq data from Pootakham [41]. The y-axis represents expression level normalized by GeTMM. The left graph is a boxplot of expression levels in coding genes and unmatched ncRNAs. The middle graph is a scatterplot of the expression distribution in coding genes. The right graphs are scatterplots of the expression distribution in short and long unmatched ncRNAs, respectively; their x-axis indicates the confidence (probability of being an ncRNA) of each unmatched ncRNA according to the RNAz tool. The black line at the y-intercept denotes the 95th percentile rank of expression.

Figure S6: Comparison of expression levels of unmatched ncRNAs, known ncRNAs, and protein-coding genes in CBSV-resistant cassava RNA-seq data from Amuge [40]. The y-axis represents expression level normalized by GeTMM. The left graph is a boxplot of expression levels in coding genes and unmatched ncRNAs. The middle graph is a scatterplot of the expression distribution in coding genes. The right graphs are scatterplots of the expression distribution in short and long unmatched ncRNAs, respectively; their x-axis indicates the confidence (probability of being an ncRNA) of each unmatched ncRNA according to the RNAz tool. The black line at the y-intercept denotes the 95th percentile rank of expression.

Figure S7: Comparison of expression levels of unmatched ncRNAs, known ncRNAs, and protein-coding genes in CBSV-susceptible cassava RNA-seq data from Amuge [40]. The y-axis represents expression level normalized by GeTMM. The left graph is a boxplot of expression levels in coding genes and unmatched ncRNAs. The middle graph is a scatterplot of the expression distribution in coding genes. The right graphs are scatterplots of the expression distribution in short and long unmatched ncRNAs, respectively; their x-axis indicates the confidence (probability of being an ncRNA) of each unmatched ncRNA according to the RNAz tool. The black line at the y-intercept denotes the 95th percentile rank of expression.
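The captions above refer to a 95th-percentile expression line drawn on length-normalized values. As a rough illustration only, the sketch below computes reads per kilobase (the length-normalization step that GeTMM starts from) and a 95th-percentile cut-off. It is not the authors' pipeline: the TMM scaling step of GeTMM is deliberately omitted, the choice of coding genes as the reference distribution is an assumption, and all counts, lengths, and function names are hypothetical.

```python
def reads_per_kilobase(counts, lengths_bp):
    """Length-normalize raw read counts to reads per kilobase (RPK).

    This is only the first step of GeTMM; the TMM library-size scaling
    that GeTMM applies afterwards is NOT performed here.
    """
    return [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0  # fractional rank
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical toy data: raw counts and transcript lengths (bp).
coding_counts = [100, 200, 300, 400, 500]
coding_lengths = [1000, 2000, 1000, 2000, 1000]
coding_rpk = reads_per_kilobase(coding_counts, coding_lengths)

# Assumption: the cut-off is taken from the coding-gene distribution.
threshold = percentile(coding_rpk, 95)

# Hypothetical ncRNA candidates: name -> (raw count, length in bp).
candidates = {"ncA": (90, 150), "ncB": (20, 400)}
highly_expressed = [
    name
    for name, (count, length) in candidates.items()
    if reads_per_kilobase([count], [length])[0] > threshold
]
print(threshold, highly_expressed)
```

In practice the TMM step would come from an established implementation (for example edgeR), and the percentile would be computed per dataset, since each figure draws its own line.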

Figure S8: Expression support and confidence of potential novel lncRNAs. The scatter plot shows the 2,229 novel lncRNAs (Me-lncRNAs), comprising lincRNAs, lncNATs, and sense/intronic lncRNAs. The x-axis indicates the confidence (probability of being an ncRNA) of each putative lncRNA according to the RNAz prediction; the y-axis represents the number of supporting datasets.

Figure S21: Direct binding of ncM17949 and its target. Sequence and position of the direct binding predicted between the lncNAT (ncM17949) and the mRNA of the target gene Manes.09G025200, which encodes nuclear factor Y, subunit A9 (NF-YA9), using LncTar with normalized free energy (ndG) < -0.1, which reflects the relative stability of internal base pairs. The top sequence is the lncNAT (ncM17949). [Figure: sequence panel labelled NF-YA9 and lncNAT (ncM17949), with the 5' end marked; nucleotide sequence omitted.]

Figure S29: RNA-seq read alignment on ncM12154 and its target. Read alignment of lncRNA ncM12154 and its predicted trans-regulatory target gene, Manes.01G160800, in control and cold conditions from the RNA-seq dataset of Li et al. [15]. The black peaks represent read coverage and abundance.

Figure S30: RNA-seq read alignment on ncM32367 and its targets. Read alignment of lncRNA ncM32367 and its predicted trans-regulatory target genes, Manes.12G010200, Manes.12G159400, Manes.08G073400, and Manes.13G029700, in control, cold, and drought conditions from the RNA-seq dataset of Li et al. [15]. The black peaks represent read coverage and abundance.