Elsevier

Genomics

Volume 112, Issue 3, May 2020, Pages 2166-2172
Original Article
Identifying suitable tools for variant detection and differential gene expression using RNA-seq data

https://doi.org/10.1016/j.ygeno.2019.12.011
Under an Elsevier user license
open archive

Highlights

  • We evaluated RNA-seq pipelines based on the hg38 genome assembly.

  • In spliced alignment, the rate of multi-mapped reads is higher with hg38 than with hg19.

  • The aligners are not able to distinguish the origin of these reads and tend to map them to multiple locations.

  • The GATK/STAR variant calling protocol yields a larger number of GWAS variants from RNA-seq data.

  • Transcriptome-based quantification outperforms genome-based quantification methods.

Abstract

Neurodegenerative diseases are the most predominant brain disorders worldwide, and the affected population is rapidly increasing. Recently, these diseases have been studied using RNA-sequencing data to reveal changes in gene/transcript expression, the effects of variants, and the pathways involved in disease mechanisms. However, such observations depend heavily on the aligners and tools used, and the performance of existing RNA-seq tools on the hg38 genome assembly has not yet been documented. In this study, we performed a systematic analysis of various spliced aligners, transcript assembly tools, and variant calling tools on both genome assemblies (hg19 and hg38) using data from hippocampal brain tissue. This helps to identify the best possible combination of tools for the hg38 annotation. To evaluate the variants identified by the various pipelines, we compared them with expression Quantitative Trait Loci (eQTL) and Genome-Wide Association Study (GWAS) data. In addition, the identified differentially expressed genes (DG) were compared with those from microarray studies. In our analysis of variant calling, the combination of the GATK (Genome Analysis Toolkit) and STAR (Spliced Transcripts Alignment to a Reference) protocol yields a larger number of GWAS/eQTL variants than SAMtools (Sequence Alignment Map). We also identified a higher number of non-coding variants in hg38 than in hg19 owing to enhanced annotation. Among the various DG pipelines, Salmon-based hg38 transcriptomic quantification yields a higher number of reported DG than the genome-based quantification methods. This study revealed that a higher number of reads map to multiple locations of the genome with hg38 than with hg19, and these spurious multi-mapped reads may affect gene quantification. We suggest that it is necessary to develop efficient algorithms that can handle multi-mapped reads and improve the performance of quantification based on genome alignment.
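The multi-mapping rate discussed above can be estimated directly from an alignment file. The following is a minimal sketch, not the authors' procedure: it assumes the aligner writes the optional SAM tag NH (number of reported alignments for the read), as STAR does with its standard attribute set, and it treats a read without an NH tag as uniquely mapped. The function name `multimap_rate` and the toy records are hypothetical and serve only as illustration.

```python
def multimap_rate(sam_lines):
    """Return the fraction of mapped primary records with NH > 1.

    Assumptions (not taken from the article): alignments are in SAM text
    format and carry the optional NH:i tag; a record without NH is
    counted as uniquely mapped. Each mate of a pair is counted separately.
    """
    mapped = 0
    multi = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records so that every mapped read is counted exactly once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        mapped += 1
        nh = 1
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh > 1:
            multi += 1
    return multi / mapped if mapped else 0.0


# Toy records (hypothetical): two unique reads, one multi-mapped read with a
# secondary alignment, and one unmapped read.
TOY_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr3\t50\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
]

if __name__ == "__main__":
    print(f"multi-mapped fraction: {multimap_rate(TOY_SAM):.3f}")
```

Running the same function on alignments of one sample against hg19 and against hg38 would give two fractions that can be compared directly; for real BAM files the records would first need to be streamed as SAM text.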

Keywords

Variant calling
Differential gene expression
hg38
Multi-mapped reads
Brain tissue

Abbreviations

RNA-seq: RNA sequencing
DG: Differentially expressed gene
GATK: Genome Analysis Toolkit
GWAS: Genome-wide association study
eQTL: Expression quantitative trait loci
STAR: Spliced Transcripts Alignment to a Reference
SAM: Sequence Alignment Map
PCR: Polymerase chain reaction
GEO: Gene Expression Omnibus
