Abstract
Protein-coding genes often contain long overlapping open reading frames (ORFs), which may or may not be functional. Current methods that use the signature of purifying selection to detect functional overlapping genes are limited to the analysis of sequences from divergent species and are therefore inapplicable to genes found only in closely related sequences. Here, we present a method for detecting selection signatures on overlapping reading frames using closely related sequences. We apply the method to several known overlapping genes and to an overlapping ORF on the negative strand of segment 8 of influenza A virus (NEG8), which has been suggested to be functional. We find no evidence that NEG8 is under selection, suggesting that the intact reading frame might be non-functional, although we cannot fully exclude the possibility that the method is not sensitive enough to detect the signature of selection acting on this gene. We use known overlapping genes to demonstrate the limitations of the method and suggest several approaches to improve it in future studies. Finally, we examine alternative explanations for the conservation of the NEG8 sequence in the absence of selection. We show that overlap type and genomic context affect the conservation of intact overlapping ORFs and should therefore be considered in any attempt to estimate the signature of selection in overlapping genes.
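What makes selection hard to read in an overlapping region is that every nucleotide belongs to a codon in two frames at once, so a single substitution can be silent in one frame and change an amino acid in the other. The Python sketch below illustrates only that point. It is not the method described in this article. It classifies one point substitution as synonymous, nonsynonymous or nonsense in a chosen positive-strand frame and in a chosen negative-strand frame, which is the situation of an antisense ORF such as NEG8. The function names, the frame convention and the toy sequence are illustrative assumptions, and only the standard genetic code is used.

```python
"""Classify one point substitution in two overlapping reading frames (illustrative sketch)."""

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the negative-strand sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def substitution_effect(seq: str, pos: int, new_base: str, frame: int, strand: str = "+") -> str:
    """Classify a substitution given in positive-strand coordinates.

    seq, pos (0-based) and new_base always refer to the positive strand.
    frame (0, 1 or 2) is the offset of the first codon from the 5' end of the
    chosen strand; strand is "+" or "-". Only uppercase A/C/G/T is supported
    (an ambiguous base such as N would raise a KeyError in the codon lookup).
    """
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if frame not in (0, 1, 2) or strand not in ("+", "-"):
        raise ValueError("frame must be 0, 1 or 2 and strand '+' or '-'")
    if not 0 <= pos < len(seq):
        raise ValueError("pos is outside the sequence")

    mutant = seq[:pos] + new_base + seq[pos + 1:]
    if strand == "-":
        # Read the antisense frame on the reverse complement; the site moves to the mirrored index.
        seq, mutant = reverse_complement(seq), reverse_complement(mutant)
        pos = len(seq) - 1 - pos

    if pos < frame:
        return "outside frame"
    start = frame + 3 * ((pos - frame) // 3)  # first base of the codon containing the site
    if start + 3 > len(seq):
        return "outside frame"

    old_aa = CODON_TABLE[seq[start:start + 3]]
    new_aa = CODON_TABLE[mutant[start:start + 3]]
    if old_aa == new_aa:
        return "synonymous"
    if new_aa == "*":
        return "nonsense"
    return "nonsynonymous"


if __name__ == "__main__":
    # Toy sequence (not influenza data): ATG GCC AAA GCT TAA in positive-strand frame 0.
    seq = "ATGGCCAAAGCTTAA"
    for pos, base in [(5, "T"), (6, "T")]:
        plus = substitution_effect(seq, pos, base, frame=0, strand="+")
        minus = substitution_effect(seq, pos, base, frame=0, strand="-")
        print(pos, base, plus, minus)
```

Running the script prints `5 T synonymous nonsynonymous` and `6 T nonsense nonsynonymous`. The C to T change at position 5 sits at the third position of GCC (Ala, unchanged) on the positive strand but at the first position of GGC (Gly to Ser) on the negative strand. Selection acting on one frame therefore constrains which substitutions are observable in the other, which is one reason why overlap type matters when estimating selection on an overlapping ORF.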
Acknowledgments
We thank Dr. Chris Upton for suggesting this problem and providing valuable information. DG and NS were supported by a Small Grant Award from the University of Houston and by the US National Library of Medicine grant LM010009-01 to DG and Giddy Landan.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Sabath, N., Morris, J.S. & Graur, D. Is There a Twelfth Protein-Coding Gene in the Genome of Influenza A? A Selection-Based Approach to the Detection of Overlapping Genes in Closely Related Sequences. J Mol Evol 73, 305–315 (2011). https://doi.org/10.1007/s00239-011-9477-9