Joseph Hughes, Stuart J. Longhorn, Anna Papadopoulou, Kosmas Theodorides, Alessandra de Riva, Monica Mejia-Chang, Peter G. Foster, Alfried P. Vogler, Dense Taxonomic EST Sampling and Its Applications for Molecular Systematics of the Coleoptera (Beetles), Molecular Biology and Evolution, Volume 23, Issue 2, February 2006, Pages 268–278, https://doi.org/10.1093/molbev/msj041
Abstract
Expressed sequence tag (EST) sequences can provide a wealth of data for phylogenetic and genomic studies, but the utility of these resources is restricted by poor taxonomic sampling. Here, we use small EST libraries (<1,000 clones) to generate phylogenetic markers across a broad sample of insects, focusing on the species-rich Coleoptera (beetles). We sequenced over 23,000 ESTs from 34 taxa, which produced 8,728 unique sequences after clustering redundant sequences. Between taxa, the sequences could be grouped into 731 gene clusters, with the largest corresponding to mitochondrial DNA transcripts and the gene families chymotrypsin, actin, troponin, and tubulin. While levels of paralogy were high in most gene clusters, several midsized clusters including many ribosomal protein (RP) genes appeared to be free of expressed paralogs. To evaluate the utility of EST data for molecular systematics, we curated available transcripts for 66 RP genes from representatives of the major groups of Coleoptera. Using supertree and supermatrix approaches for phylogenetic analysis, the results were consistent with the emerging phylogenetic conclusions about basal relationships in Coleoptera. Numerous small EST libraries from a taxonomically densely sampled lineage can provide a core set of genes that together act as a scaffold in phylogenetic reconstruction, comparative genomics, and studies of gene evolution.
Introduction
Current molecular systematics depends on polymerase chain reaction (PCR) amplification of a few “universal” genes to provide phylogenetic data. However, as the need for sequencing further genes is increasingly evident (Murphy et al. 2001; Wheeler et al. 2001; Philippe et al. 2004; Teeling et al. 2005), expanding PCR approaches to a wider selection of genes becomes difficult because of the need to develop new degenerate primers for the amplification of single-copy loci. While the growing number of genome sequences may eventually be used for phylogenetic inferences across a broad sample of taxa, expressed sequence tags (ESTs) provide a more immediately available source of genomic data (Rudd 2003).
Most publicly available ESTs have been generated for gene discovery or to complement genome sequencing efforts. Some ESTs have been compiled into sets of nonredundant clusters in public databases such as TIGR (http://www.tigr.org/), PartiGeneDB (http://www.partigenedb.org/), and UniGene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene). However, species for EST analyses have rarely been selected based on taxonomic criteria, which limits their use for phylogenetic analyses and comparative genomics (but see the recent study of Parkinson et al. 2004b). A concerted effort to enlarge EST databases to encompass disparate taxa should alleviate these problems (Bapteste et al. 2002; Theodorides et al. 2002), and recent compilations of large multigene data sets combined from genome sequences and EST data have demonstrated the power of molecular sequences for resolving deep relationships in eukaryotes (Philippe, Lartillot, and Brinkmann 2005; Rodriguez-Ezpeleta et al. 2005). Here we explore the possibility of generating small EST databases for taxa specifically selected to obtain comprehensive coverage of target groups. We apply this approach to the Coleoptera (beetles), a group that includes nearly one-third of all known species of animals (Erwin 1982; Hammond 1992; Beutel and Haas 2000; Caterino et al. 2002) but for which existing EST data are limited.
A critical problem for comparative studies is that ESTs from different taxa may not contain overlapping sets of genes. For example, given a conserved core of 6,089 orthologous genes in the genomes of Drosophila melanogaster and Anopheles gambiae (Zdobnov et al. 2002), the probability that 250 ESTs from each species retrieve a matching ortholog is only 1.68 × 10⁻³ (250/6,089 × 250/6,089) if all genes are equally represented. The challenge of matching orthologous genes between taxa is amplified by the low expression of many transcripts; sequencing of tens of thousands of ESTs in D. melanogaster (Rubin et al. 2000) or Bombyx mori (Mita et al. 2003) fell short of reaching a full complement of predicted genes. However, even relatively small EST data sets consistently recover a subset of genes with conserved roles in core biological processes such as DNA replication, transcription, and cell metabolism (Hsiao et al. 2001). These genes should be suitable for phylogenetic analysis across a broad sample of taxa.
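The arithmetic in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes, as a simplification, that each of the 250 ESTs in a library represents a different gene and that all genes are equally likely to be sampled. The expected number of shared genes is an extra quantity derived under the same assumptions and is not reported in the text.

```python
# Illustrative arithmetic only; function names are hypothetical.
# Assumption: every EST in a library hits a distinct gene, and all
# genes are equally represented (the simplification stated in the text).

CORE_ORTHOLOGS = 6089    # conserved core of orthologous genes (Zdobnov et al. 2002)
ESTS_PER_LIBRARY = 250   # ESTs sampled from each species


def shared_gene_probability(n_ests: int, n_genes: int) -> float:
    """Probability that one given gene is sampled in both libraries."""
    per_library = n_ests / n_genes
    return per_library * per_library


def expected_shared_genes(n_ests: int, n_genes: int) -> float:
    """Expected number of genes found in both libraries (same assumptions)."""
    return n_genes * shared_gene_probability(n_ests, n_genes)


p = shared_gene_probability(ESTS_PER_LIBRARY, CORE_ORTHOLOGS)
shared = expected_shared_genes(ESTS_PER_LIBRARY, CORE_ORTHOLOGS)
print(f"per-gene probability of a match: {p:.3e}")
print(f"expected number of shared genes: {shared:.1f}")
```

Under these assumptions only a handful of genes would be expected to overlap between two libraries of this size, which is why unequal expression, with a few highly expressed genes recurring across libraries, matters for the approach.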
The use of nuclear genes as a source of phylogenetic data requires an appreciation of the complex nature of genome evolution, involving gene loss, duplications, expansion of gene families, and functional diversification. Assignment of gene orthology is difficult even between fairly closely related groups such as the dipteran A. gambiae and D. melanogaster, where genes diversified independently in each lineage (Zdobnov et al. 2002). Increased taxon sampling can improve the confidence of orthology assignments by identifying the origin of gene copies, facilitating inferences on gene duplications, and clarifying the relationship between gene content and the diversity of lineages (Parkinson et al. 2004b).
Here, we test the utility of dense taxonomic EST sampling, generating relatively small numbers of ESTs (<1,000 clones) for each major group in the focal Coleoptera and several related groups of insects. Existing studies of basal relationships in the Coleoptera to date were based on the mitochondrial cox1 (Howland and Hewitt 1995) and the nuclear small subunit rRNA genes (Caterino et al. 2002), but the use of a single locus in these cases was insufficient to resolve the main phylogenetic questions. Novel sources of phylogenetic information are highly desirable and should preferentially rely on multiple single-copy nuclear genes. Using EST-based approaches that do not rely on degenerate PCR would be a great advantage in this diverse group of insects. We therefore used the Coleoptera to test critical questions about the feasibility of dense EST sampling for molecular systematics. Specifically, we investigated the minimum size of EST libraries necessary to produce sufficient overlap in gene representation between libraries and assessed what kind of genes show the widest representation across small EST libraries. Further, the degree of paralogy in EST data remains insufficiently known but is a critical issue if genes from different species libraries are used for phylogenetic reconstruction. The utility of the approach is shown here by producing phylogenetic trees for the basal groups of Coleoptera from 66 genes coding for ribosomal proteins (RP).
Materials and Methods
Insect Specimens, RNA Extraction, and cDNA Library Construction
Twenty-five species of insects, of which 14 were Coleoptera, and two outgroups were used for library construction (table 1). RNA was obtained from entire adult specimens, except for the use of larval wing discs in the butterfly Papilio dardanus (A. Cieslak and A. P. Vogler, unpublished data) and testes in the tiger beetles Cicindela litorea and Cicindela littoralis (J. Galian and A. P. Vogler, unpublished data). Seven published coleopteran EST libraries (Theodorides et al. 2002) were also included in the analysis. Molecular procedures followed Theodorides et al. (2002) using the SMART method (Clontech Laboratories, Mountain View, Calif.) and cloning of cDNA with the Topo TA cloning kit (Invitrogen, Carlsbad, Calif.). In total, over 31,000 clones were screened, and all plasmid inserts >600 bp were sequenced using BigDye technology on an ABI 3700 automated sequencer.
| Class: Order / Suborder: Series: Family | Species | Accession Number (a) | Contigs, automated (b) | Singletons, automated (b) | Sequences, automated (b) | Contigs, manual | Singletons, manual | Sequences, manual |
|---|---|---|---|---|---|---|---|---|
| Insecta: Coleoptera | | | | | | | | |
| Archostemata: Micromalthidae | Micromalthus debilis | CV155742–CV155959 | 24 | 108 | 132 | 26 | 124 | 150 |
| Myxophaga: Sphaeriusidae | Sphaerius sp. | CV155960–CV156656 | 159 | 181 | 340 | 193 | 165 | 358 |
| Adephaga: Carabidae | Carabus granulatus | BQ474802–BQ475107 | 77 | 90 | 167 | 72 | 89 | 161 |
| Adephaga: Cicindelidae | Cicindela campestris | BQ475108–BG475778 | 301 | 64 | 365 | 278 | 58 | 336 |
| | Cicindela litorea | CV156657–CV157115 | 150 | 72 | 222 | 158 | 63 | 221 |
| | Cicindela littoralis | CV157116–CV157483 | 86 | 106 | 192 | 106 | 107 | 213 |
| Adephaga: Dytiscidae | Meladema coriacea | BQ476741–BQ477288 | 123 | 166 | 289 | 122 | 164 | 286 |
| Pol.: Staphyliniformia: Georissidae | Georissus sp. | CV157484–CV158376 | 224 | 161 | 385 | 258 | 133 | 391 |
| Pol.: Staphyliniformia: Silphidae | Silpha atrata | CV158377–CV158395 | 5 | 9 | 14 | 7 | 10 | 17 |
| Pol.: Staphyliniformia: Histeridae | Hister sp. | CV158396–CV159219 | 185 | 141 | 326 | 192 | 130 | 322 |
| Pol.: Scarabaeiformia: Scarabaeidae | Scarabaeus laticollis | CV159220–CV160155 | 261 | 119 | 380 | 226 | 75 | 301 |
| Pol.: Elateriformia: Elateridae | Agriotes lineatus | CV160156–CV160927 | 171 | 208 | 379 | 203 | 198 | 401 |
| Pol.: Elateriformia: Buprestidae | Julodis onopordi | CV152433–CV153501 | 291 | 96 | 387 | 262 | 65 | 327 |
| Pol.: Elateriformia: Eucinetidae | Eucinetus sp. | CV153502–CV154310 | 179 | 150 | 329 | 203 | 114 | 317 |
| Pol.: Elateriformia: Dascillidae | Dascillus cervinus | CV154311–CV154939 | 194 | 135 | 329 | 197 | 128 | 325 |
| Pol.: Cucujiformia: Biphyllidae | Biphyllus lunatus | BQ474131–BQ474801 | 186 | 63 | 249 | 185 | 49 | 234 |
| Pol.: Cucujiformia: Mycetophagidae | Mycetophagus quadripustulatus | CV154940–CV155674 | 193 | 188 | 381 | 191 | 210 | 401 |
| Pol.: Cucujiformia: Tenebrionidae | Tribolium confusum | CV155675–CV155741 | 4 | 54 | 58 | 5 | 59 | 64 |
| Pol.: Cucujiformia: Chrysomelidae | Timarcha balearica | AJ537611–AJ538039 | 55 | 210 | 265 | 170 | 97 | 267 |
| Pol.: Cucujiformia: Curculionidae | Curculio glandium | BQ476162–BQ476740 | 142 | 86 | 228 | 162 | 60 | 222 |
| Pol.: Cucujiformia: Anthribidae | Platystomos albinus | BQ476142–BQ476161 | 108 | 34 | 142 | 99 | 31 | 130 |
| Insecta: Lepidoptera | | | | | | | | |
| Noctuidae | Euclidea glyphica | CV174082–CV174651 | 186 | 80 | 266 | 197 | 66 | 263 |
| Papilionidae | Papilio dardanus | CV174652–CV175351 | 163 | 243 | 406 | 219 | 115 | 334 |
| Insecta: Strepsiptera | | | | | | | | |
| Mengenillidae | Mengenilla chobauti | CD485368–CD485367 | 51 | 280 | 331 | 57 | 288 | 345 |
| Mengenillidae | Eoxenos laboulbenei | CD492361–CD492706 | 54 | 321 | 375 | 54 | 335 | 389 |
| Insecta: Raphidiodea | | | | | | | | |
| Raphidiidae | Phaeostigma major | CV176478–CV176535 | 3 | 51 | 54 | 8 | 47 | 55 |
| Insecta: Trichoptera | | | | | | | | |
| Limnephilidae | Limnephilus flavicornis | CV176536–CV176696 | 23 | 100 | 123 | 25 | 95 | 120 |
| Insecta: Mecoptera | | | | | | | | |
| Panorpidae | Panorpa cf. vulgaris | CV176697–CV177401 | 240 | 100 | 340 | 246 | 72 | 318 |
| Insecta: Orthoptera | | | | | | | | |
| Gryllidae | Gryllus bimaculatus | CV175352–CV175963 | 223 | 90 | 313 | 238 | 72 | 310 |
| Insecta: Dictyoptera | | | | | | | | |
| Mantidae | Sphodromantis centralis | CV175964–CV176136 | 14 | 86 | 100 | 17 | 94 | 111 |
| Insecta: Hemiptera | | | | | | | | |
| Aleyrodidae | Aleurothrixus sp. | CV176137–CV176477 | 67 | 165 | 232 | 59 | 173 | 232 |
| Insecta: Thysanura | | | | | | | | |
| Lepismatidae | Lepisma aurea | CV177402–CV177826 | 63 | 273 | 336 | 65 | 275 | 340 |
| Outgroups | | | | | | | | |
| Arachnida: Araneae: Dysderidae | Dysdera erythrina | CV177827–CV178552 | 197 | 65 | 262 | 213 | 44 | 257 |
| Diplopoda: Julida | Julida sp. | CV178553–CV179005 | 122 | 91 | 213 | 142 | 68 | 210 |
| Total | 34 | | 4,524 | 4,386 | 8,910 | 4,855 | 3,873 | 8,728 |
(a) Mitochondrial sequences have been removed from the submitted sequences but have been used in all analyses.
(b) Number of TUGs, singletons, and unique sequences after EST vector trimming and sequence quality control for the automated and manual approaches.
For most libraries, ESTs were sequenced in both directions to provide longer and more accurate sequences, which is critical for phylogenetic analysis. Sequencher 4.1 (Gene Codes Corp., Ann Arbor, Mich.) was used for sequence editing, including the automated removal of vector sequences and poor-quality data. Sequences were further edited manually to recall ambiguities and resolve conflicting base calls in forward and reverse reads where multiple clones were available. Edited sequences were clustered into contigs in Sequencher at high stringency to obtain “tentative unique genes” (TUGs) for each library and exported for further analysis. We also used a fully automated method for sequence editing with the Trace2dbest Perl script (Parkinson and Blaxter 2004), based on the Phred base-calling software (Ewing and Green 1998; Ewing et al. 1998). The PartiGene script (Parkinson et al. 2004a) was used to cluster redundant sequences using the CLOBB EST software (Parkinson, Guiliano, and Blaxter 2002) and Phrap (P. Green, personal communication). The manually edited EST sequences were submitted to the National Center for Biotechnology Information EST database (table 1, lineage and accession numbers). Mitochondrial and rRNA transcripts were excluded from GenBank EST submissions, but the full data are available from http://www.bio.ic.ac.uk/research/apvogler/vogler.htm.
Sequence Clustering and Phylogenetic Analysis
EST sequences were subjected to Blast comparisons against GenBank using BlastN (nucleotide–nucleotide searches) and TBlastX (conceptual protein translations) (Altschul et al. 1990). Where significant matches were found (E value <10⁻⁵) and putative gene identity was established by these sequence comparisons, TUGs were assigned gene ontology (GO) classifications by comparing deduced amino acids with the Uniprot database and parsing the Uniprot GO table (http://www.ebi.ac.uk/uniprot/index.html). Gene classifications were accepted if our data had at least 30% similarity over >100 amino acids with curated data and a significant E value (<10⁻⁵) in a TBlastX search. When parsing for GO classification, we accepted identity from lower-ranked TBlastX matches if the top matches did not contain GO classifications. TBlastX searches were also used to calculate the proportion of TUGs that matched sets of proteins from D. melanogaster, Homo sapiens, and Caenorhabditis elegans with E values <10⁻⁵.
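The acceptance rules described here can be sketched as a small filter over already-parsed search hits. This is a hypothetical illustration, not the authors' pipeline: the `Hit` record, its field names, and the function names are assumptions, and "significant" is read as an E value below 10⁻⁵. Parsing of actual TBlastX output is not shown.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all names below are hypothetical.
E_VALUE_CUTOFF = 1e-5       # hits must have an E value below this
MIN_SIMILARITY_PCT = 30.0   # minimum percent similarity
MIN_ALIGNED_AA = 100        # alignment must span more than this many residues


@dataclass
class Hit:
    subject_id: str
    e_value: float
    similarity_pct: float
    aligned_aa: int
    go_terms: tuple = ()    # empty when the matched entry has no GO annotation


def passes_thresholds(hit: Hit) -> bool:
    """Apply the similarity, length, and E-value criteria to one hit."""
    return (
        hit.e_value < E_VALUE_CUTOFF
        and hit.similarity_pct >= MIN_SIMILARITY_PCT
        and hit.aligned_aa > MIN_ALIGNED_AA
    )


def assign_go(hits: list) -> tuple:
    """Return GO terms from the best-ranked acceptable hit that carries any.

    Hits are ranked by E value; a lower-ranked hit is used when better
    hits pass the thresholds but have no GO classification.
    """
    for hit in sorted(hits, key=lambda h: h.e_value):
        if passes_thresholds(hit) and hit.go_terms:
            return hit.go_terms
    return ()


example_hits = [
    Hit("top_hit_without_go", 1e-30, 55.0, 210),
    Hit("second_hit_with_go", 1e-12, 42.0, 150, ("GO:0003735",)),
    Hit("weak_hit", 1e-3, 90.0, 300, ("GO:0005840",)),
]
print(assign_go(example_hits))
```

In this toy input the top hit is acceptable but unannotated, so the GO term is taken from the second hit, while the weak hit is ignored because its E value exceeds the cutoff.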
For clustering, similarity between TUGs within and between libraries was determined using TBlastX searches. For each TUG, its TBlastX hits were examined, and if the similarity was above a specified threshold, then a cluster was made. These first-pass clusters contained many TUGs in more than one cluster, so these clusters were themselves iteratively merged and redundant sequences removed, until there were no sequences contained in more than one cluster. The Python scripts used for clustering are available from PGF on request. TUGs clustered in searches were translated in Sequencher and aligned with ClustalX (Thompson et al. 1997).
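The two-stage procedure described above (first-pass clusters per TUG, then iterative merging until no sequence belongs to more than one cluster) can be sketched as follows. This is not the authors' script, which is available only on request; the input layout, the `score` values, and the function names are assumptions made for illustration.

```python
# Hypothetical sketch of threshold clustering followed by iterative merging.
# `hits` maps each TUG to a list of (other_tug, score) pairs, where `score`
# stands in for whatever TBlastX similarity measure is thresholded.

def first_pass_clusters(hits: dict, threshold: float) -> list:
    """Build one cluster per query TUG from its hits above the threshold."""
    return [
        {tug} | {other for other, score in tug_hits if score > threshold}
        for tug, tug_hits in hits.items()
    ]


def merge_overlapping(clusters: list) -> list:
    """Merge clusters that share any member until all are disjoint."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        result = []
        for cluster in merged:
            for existing in result:
                if existing & cluster:       # shared TUG: fold into existing cluster
                    existing |= cluster
                    changed = True
                    break
            else:
                result.append(set(cluster))  # no overlap with clusters kept so far
        merged = result
    return merged


toy_hits = {
    "A": [("B", 90.0)],
    "B": [("C", 80.0)],
    "C": [],
    "D": [("E", 10.0)],
    "E": [],
}
clusters = merge_overlapping(first_pass_clusters(toy_hits, threshold=50.0))
print(sorted(sorted(c) for c in clusters))
```

Merging on any shared member makes the result a single-linkage grouping: A and C end up together through B even though they have no direct hit, which is one way gene families with divergent members can be pulled into one cluster.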
For phylogenetic analysis from these clusters, we focused specifically on the RP genes. After minor sequence editing and verification of transcript fidelity, the most complete amino acid sequences were used for conceptual translations using ClustalX and submitted to European Molecular Biology Laboratory nr databases (Supplementary Material A, see Supplementary Material online). Three further Coleoptera with public ESTs in GenBank, Tribolium castaneum (J. Savard and D. Tautz, personal communication), Callosobruchus maculatus (J. H. F. Pedra, A. Brandt, R. Westerman, H.-M. Li, J. Romero-Severson, L. L. Murdock, and B. R. Pittendrigh, personal communication), and Ips pini (Eigenheer et al. 2003), were also searched for RPs and used in the phylogenetic analysis. After excluding the smallest EST libraries (Silpha atrata and Tribolium confusum), we concatenated data from 66 RPs found in four or more species of Coleoptera, which correspond to minimal phylogenetic clusters (sensu Driskell et al. 2004). Regions of uncertain amino acid alignment homology were removed using Gblocks 0.91b (Castresana 2000). Phylogenetic analysis was conducted with parsimony, with a heuristic search strategy (random taxon addition, 100 replicates; tree bisection-reconnection branch swapping). We used PAUP* to calculate nonparametric bootstrap scores (1,000 replicates) and Bremer support, facilitated by TreeRot 2.0 (Sorenson 1999). Phyml v2.4.4 (Guindon and Gascuel 2003) was used for maximum likelihood (ML) analyses with 100 bootstraps, using both the WAG substitution model, suitable for soluble proteins such as RPs, and the Dayhoff model selected with ModelGenerator (http://bioinf.nuim.ie/software/modelgenerator). With both models, we accounted for among-site rate variation using a gamma distribution and a proportion of invariant sites (pInvar).
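The gene-selection and concatenation step of the supermatrix approach can be illustrated with a short sketch. It is an assumption-laden example rather than the procedure actually used: the data layout (one dictionary of aligned sequences per gene), the function names, and the use of `?` for taxa lacking a gene are all hypothetical choices.

```python
# Hypothetical sketch: keep genes found in at least `min_taxa` ingroup taxa,
# then concatenate per-gene alignments into one supermatrix.
# `alignments` maps gene name -> {taxon: aligned amino acid sequence}.

def select_genes(alignments: dict, ingroup: set, min_taxa: int = 4) -> dict:
    """Keep genes represented in at least `min_taxa` ingroup taxa."""
    return {
        gene: aln
        for gene, aln in alignments.items()
        if len(ingroup.intersection(aln)) >= min_taxa
    }


def concatenate(alignments: dict, missing: str = "?") -> dict:
    """Join gene alignments per taxon, padding absent genes with `missing`."""
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):
        aln = alignments[gene]
        # Assumes all sequences within one gene alignment have equal length.
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


toy_alignments = {
    "geneX": {"t1": "MKV", "t2": "MRV", "t3": "MKV", "t4": "MKI"},
    "geneY": {"t1": "AG", "t2": "AG"},
}
kept = select_genes(toy_alignments, ingroup={"t1", "t2", "t3", "t4"})
supermatrix = concatenate(toy_alignments)
print(sorted(kept))
print(supermatrix)
```

With small libraries most taxa lack most genes, so a matrix built this way contains a large share of missing-data cells; the four-taxon minimum is what keeps each retained gene phylogenetically informative.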
Bayesian analyses were also conducted using the latter model on the concatenated multigene data set with MrBayes v3.1.1 (Huelsenbeck and Ronquist 2001). Nodal support was assessed as posterior probability from two independent runs, each with four chains of 1,000,000 generations in the Markov chain Monte Carlo procedure (the first 500,000 generations were discarded as “burn-in”). In an alternative supertree approach, the same amino acid alignments from each RP gene were first used individually for parsimony analysis using branch and bound searches. For each RP, the strict consensus tree was saved to file, and resolved nodes were recoded as binary characters using matrix representation with parsimony coding (Baum 1992; Ragan 1992) with Clann 2.0.1 (Creevey and McInerney 2005).
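Matrix representation with parsimony turns each resolved node of a source tree into one binary character: taxa inside the clade are scored 1, other taxa present in that tree are scored 0, and taxa absent from the tree are scored as missing. The sketch below shows this standard coding on toy input; it is not Clann's implementation, and the input format (each tree given as its taxon set plus a list of clades) and the function name are assumptions.

```python
# Hypothetical sketch of MRP coding (Baum 1992; Ragan 1992).
# Each source tree is given as (taxa_in_tree, clades), where `clades` is a
# list of taxon sets, one per resolved internal node of that tree.

def mrp_matrix(trees: list, missing: str = "?") -> dict:
    """Return {taxon: character string}, one character per clade."""
    all_taxa = sorted(set().union(*(taxa for taxa, _ in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for taxa, clades in trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in taxa:
                    rows[taxon].append(missing)  # taxon not sampled for this gene
                elif taxon in clade:
                    rows[taxon].append("1")      # member of the clade
                else:
                    rows[taxon].append("0")      # in the tree but outside the clade
    return {taxon: "".join(states) for taxon, states in rows.items()}


toy_trees = [
    ({"A", "B", "C", "D"}, [{"A", "B"}]),   # gene 1 tree: one resolved node
    ({"A", "C", "E"}, [{"C", "E"}]),        # gene 2 tree: one resolved node
]
matrix = mrp_matrix(toy_trees)
for taxon, states in matrix.items():
    print(taxon, states)
```

Because only resolved nodes contribute characters, taking the strict consensus per gene first means that conflicting or unresolved parts of a gene tree add no signal to the supertree matrix.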
Results
Characteristics of the Libraries
Among the EST libraries for 32 insect species, plus two arthropod outgroups (a spider and a millipede), we sampled 20 species of Coleoptera, with representatives from each of the four suborders, and a selection of all major groups (Series) in the large suborder Polyphaga. Together, the libraries contained 23,026 EST sequences with high-quality base calls, ranging from 29 to 1,341 ESTs per taxon (table 1). In total, 8,728 TUGs were obtained after semimanual editing (Materials and Methods). Automated editing of the same data produced 8,910 unique sequences, with ∼7% fewer sequences in redundant groups and 12% more singletons (table 1). Overall sequence similarity and statistical analysis (below) produced results similar to those from the manually edited sequences, and hence, the automated EST clustering appears sufficiently reliable for the initial compilation of large data sets, in particular as sequence quality increases with a greater number of ESTs in a TUG.
According to the GO categorization of the 34 EST libraries (table 2), the nuclear genes most frequently detected were “housekeeping” genes, including RPs and enzymes. Transcripts from mitochondrial genes were also prevalent, with an average of six mitochondrial transcripts per taxon. Although mitochondrial sequences present in EST libraries are an artifact of the reverse transcriptase–PCR procedure, they provide valuable phylogenetic markers. In contrast, relatively few developmental proteins, transcription factors, and elongation factors (EFs) were detected among ESTs. A large number of ESTs showed significant similarity to genes of unknown function in the Uniprot database (5%–37% depending on the library), and in each taxon, a large proportion of the sequences (35%–80%) did not have any significant public database matches within the search parameters.
. | GO Classification . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Percent Matchesc . | . | . | . | . | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Library . | Enzyme . | RP . | Mitochondrial Gene . | Transport . | Nucleic acid Bindinga . | Chaperone/Heat Shock . | Motor . | Protein Kinase/Phosphatase . | Developmental Protein . | Axon/Neurotransmitter . | EF . | Signal Transduction . | Cell Cycle . | Proteasome . | Translation Initiation Factor . | Actin Binding . | Transcription Factor . | Cell Adhesion . | Unknown . | No Matchb . | Total . | To D. melanogaster . | To H. sapiens . | To C. elegans . | To the other libraries . | Intralibrary . | ||||||||||||||||||||||||
Micromalthus debillis | 12 | 7 | 6 | 2 | 3 | 1 | 1 | 2 | 33 | 83 | 150 | 46 | 36 | 31 | 24 | 15 | ||||||||||||||||||||||||||||||||||
Sphaerius sp. | 14 | 21 | 5 | 9 | 2 | 1 | 1 | 1 | 1 | 1 | 75 | 227 | 358 | 45 | 36 | 31 | 18 | 0 | ||||||||||||||||||||||||||||||||
Carabus granulatus | 7 | 18 | 6 | 7 | 3 | 1 | 3 | 1 | 3 | 4 | 1 | 1 | 50 | 56 | 161 | 63 | 58 | 54 | 47 | 13 | ||||||||||||||||||||||||||||||
Cicindela campestris | 22 | 13 | 7 | 15 | 1 | 1 | 2 | 2 | 1 | 1 | 3 | 1 | 77 | 190 | 336 | 48 | 42 | 36 | 28 | 9 | ||||||||||||||||||||||||||||||
Cicindela litorea | 7 | 2 | 2 | 8 | 5 | 5 | 1 | 40 | 151 | 221 | 38 | 37 | 25 | 16 | 8 | |||||||||||||||||||||||||||||||||||
Cicindela littoralis | 10 | 4 | 8 | 5 | 6 | 1 | 1 | 1 | 2 | 47 | 128 | 213 | 46 | 42 | 33 | 30 | 16 | |||||||||||||||||||||||||||||||||
Meladema coriacea | 25 | 8 | 10 | 6 | 4 | 4 | 1 | 5 | 3 | 4 | 1 | 84 | 131 | 286 | 52 | 49 | 42 | 25 | 14 | |||||||||||||||||||||||||||||||
Georissus sp. | 28 | 18 | 10 | 10 | 4 | 2 | 1 | 2 | 2 | 2 | 2 | 1 | 98 | 211 | 391 | 52 | 45 | 37 | 27 | 15 | ||||||||||||||||||||||||||||||
Silpha atrata | 1 | 3 | 1 | 3 | 9 | 17 | 29 | 35 | 18 | 53 | 0 | |||||||||||||||||||||||||||||||||||||||
Hister sp. | 25 | 16 | 5 | 7 | 6 | 3 | 3 | 1 | 1 | 2 | 1 | 2 | 2 | 84 | 164 | 322 | 58 | 54 | 47 | 25 | 10 | |||||||||||||||||||||||||||||
Scarabaeus laticollis | 21 | 15 | 8 | 12 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 113 | 121 | 301 | 64 | 53 | 48 | 26 | 19 | |||||||||||||||||||||||||||||
Agriotes lineatus | 22 | 12 | 5 | 5 | 3 | 1 | 6 | 4 | 4 | 4 | 1 | 3 | 1 | 83 | 247 | 401 | 45 | 39 | 35 | 23 | 20 | |||||||||||||||||||||||||||||
Julodis onopordi | 15 | 3 | 11 | 4 | 2 | 1 | 1 | 3 | 3 | 4 | 1 | 100 | 179 | 327 | 46 | 42 | 33 | 19 | 14 | |||||||||||||||||||||||||||||||
Eucinetus sp. | 13 | 13 | 11 | 14 | 2 | 3 | 1 | 3 | 4 | 1 | 1 | 1 | 2 | 78 | 170 | 317 | 60 | 50 | 41 | 24 | 13 | |||||||||||||||||||||||||||||
Dascillus cervinus | 15 | 11 | 5 | 6 | 10 | 1 | 2 | 2 | 3 | 1 | 1 | 2 | 104 | 162 | 325 | 61 | 35 | 23 | 23 | 11 | ||||||||||||||||||||||||||||||
Biphyllus lunatus | 14 | 19 | 4 | 4 | 3 | 4 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 81 | 97 | 234 | 67 | 58 | 51 | 37 | 15 | |||||||||||||||||||||||||||||
Mycetophagus 4-pustulatus | 36 | 9 | 5 | 11 | 4 | 7 | 2 | 1 | 3 | 1 | 1 | 107 | 214 | 401 | 57 | 50 | 42 | 25 | 13 | |||||||||||||||||||||||||||||||
Tribolium confusum | 3 | 2 | 2 | 1 | 1 | 2 | 22 | 31 | 64 | 56 | 50 | 38 | 25 | 6 | ||||||||||||||||||||||||||||||||||||
Timarcha balearica | 27 | 21 | 6 | 6 | 5 | 1 | 2 | 1 | 1 | 1 | 1 | 77 | 118 | 267 | 60 | 54 | 49 | 27 | 14 | |||||||||||||||||||||||||||||||
Curculio glandium | 9 | 17 | 8 | 4 | 2 | 3 | 1 | 1 | 2 | 1 | 2 | 76 | 95 | 221 | 66 | 56 | 39 | 37 | 10 | |||||||||||||||||||||||||||||||
Platystomos albinus | 5 | 4 | 10 | 4 | 1 | 1 | 1 | 2 | 1 | 1 | 35 | 65 | 130 | 47 | 44 | 42 | 25 | 18 | ||||||||||||||||||||||||||||||||
Euclidia glyphica | 18 | 10 | 3 | 6 | 1 | 4 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 38 | 165 | 255 | 44 | 34 | 29 | 23 | 17 | |||||||||||||||||||||||||||||
Papilio dardanus | 18 | 43 | 15 | 10 | 3 | 3 | 2 | 2 | 4 | 1 | 1 | 2 | 99 | 131 | 334 | 63 | 58 | 52 | 36 | 23 | ||||||||||||||||||||||||||||||
Mengenilla chobauti | 20 | 12 | 6 | 6 | 5 | 7 | 3 | 3 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 87 | 187 | 346 | 56 | 48 | 41 | 23 | 6 | |||||||||||||||||||||||||||
Eoxenos laboulbenei | 23 | 18 | 4 | 8 | 10 | 1 | 3 | 3 | 1 | 3 | 2 | 3 | 2 | 1 | 121 | 186 | 389 | 60 | 54 | 45 | 21 | 11 | ||||||||||||||||||||||||||||
Gryllus bimaculatus | 11 | 7 | 9 | 2 | 5 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 37 | 230 | 310 | 32 | 28 | 24 | 16 | 12 | ||||||||||||||||||||||||||||||
Sphrodromantis centralis | 4 | 7 | 2 | 1 | 1 | 1 | 6 | 89 | 111 | 21 | 22 | 14 | 17 | 8 | ||||||||||||||||||||||||||||||||||||
Aleurothrixus sp. | 9 | 14 | 10 | 7 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 52 | 166 | 270 | 45 | 42 | 38 | 21 | 5 | |||||||||||||||||||||||||||||
Phaeostigma major | 2 | 7 | 2 | 2 | 1 | 10 | 31 | 55 | 47 | 40 | 36 | 38 | 0 | |||||||||||||||||||||||||||||||||||||
Limnephilus flavicornis | 7 | 1 | 9 | 2 | 1 | 13 | 87 | 120 | 29 | 25 | 22 | 19 | 11 | |||||||||||||||||||||||||||||||||||||
Panorpa cf. vulgaris | 17 | 8 | 4 | 1 | 2 | 1 | 3 | 4 | 1 | 1 | 4 | 88 | 184 | 318 | 54 | 45 | 40 | 18 | 12 | |||||||||||||||||||||||||||||||
Lepisma aurea | 15 | 25 | 10 | 9 | 4 | 5 | 4 | 3 | 6 | 5 | 2 | 1 | 4 | 2 | 3 | 1 | 81 | 160 | 340 | 57 | 53 | 46 | 29 | 13 | ||||||||||||||||||||||||||
Dysdera erythrina | 13 | 10 | 6 | 15 | 5 | 1 | 1 | 4 | 1 | 1 | 2 | 1 | 3 | 1 | 45 | 148 | 257 | 45 | 44 | 39 | 14 | 22 | ||||||||||||||||||||||||||||
Julida sp. | 14 | 6 | 3 | 8 | 4 | 4 | 1 | 1 | 1 | 1 | 27 | 140 | 210 | 36 | 38 | 31 | 10 | 14 | ||||||||||||||||||||||||||||||||
Total | 502 | 401 | 220 | 216 | 109 | 41 | 56 | 53 | 43 | 40 | 35 | 29 | 25 | 17 | 17 | 11 | 12 | 7 | 2,147 | 4,753 | 8,737 | |||||||||||||||||||||||||||||
Average | 15 | 12 | 6 | 7 | 4 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 2 | 1 | 64 | 140 | 258 | 50 | 44 | 37 | 26 | 38 |
a Nucleic acid binding: includes RNA processing and small nuclear ribonucleoprotein complex.
b No match: no Blast hits found within the selected parameters of >30% similarity, >100 amino acids, and E value lower than 10−5.
c Percent matches: the percentage of sequences in a given library with matches to complete databases (known and predicted proteins) of Drosophila melanogaster, Homo sapiens, and Caenorhabditis elegans (BlastX, E value < 10−5); to the nucleotide sequences of other libraries in this study (BlastN, E value < 10−5); and to translated sequences within the library (TBlastX, E value < 10−5).
When our ESTs were compared against the genes of D. melanogaster, 50% of sequences had significant matches with E values <10−5 (ranging from 21% to 67% depending on probe species; table 2). Overall, our insect ESTs had significantly more matches with D. melanogaster sequences than with H. sapiens or C. elegans (df = 33, t = 6.5 and 5.6, respectively, P < 0.001). The insect ESTs showed fewer matches with C. elegans (df = 33, t = 8.7, P < 0.001) than with H. sapiens, despite the presumed closer relationships of nematodes with insects based on rRNA (Aguinaldo et al. 1997), protein-encoding genes (Philippe, Lartillot, and Brinkmann 2005), and genome-scale evidence (H. Dopazo and J. Dopazo 2005) but in accordance with analyses of genomic sequences and ESTs (Blair et al. 2002; Theodorides et al. 2002; Hedges et al. 2004; Philip, Creevey, and McInerney 2005). It is now increasingly well established that this affinity of insects with humans is an artifact of poor taxon sampling (Philippe, Lartillot, and Brinkmann 2005; Telford and Copley 2005). We present here the distribution of matches of each organism to the complete genomes of D. melanogaster, H. sapiens, and C. elegans as Venn diagrams using SimiTri (Parkinson and Blaxter 2003). Compared to ESTs of Coleoptera, levels of sequence similarity between nonholometabolan insect species and D. melanogaster were somewhat reduced (table 2 and SimiTri graphics at http://darwin.zoology.gla.ac.uk/~jhughes/SimiTri/), as expected with decreased phylogenetic proximity.
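The reported 33 degrees of freedom are consistent with a paired comparison across the 34 libraries, although the exact test is not stated here and this reading is an assumption. Under that assumption, the statistic can be sketched as follows. The function name `paired_t` is hypothetical, and the four value pairs are the D. melanogaster and H. sapiens match percentages as read from the first four rows of table 2, used only to show the calculation.

```python
import math


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the per-library differences (n - 1 denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("differences have zero variance")
    t = mean / math.sqrt(var / n)
    return t, n - 1


# Percent matches for four libraries only (illustrative subset).
to_dmel = [46, 45, 63, 48]
to_hsap = [36, 36, 58, 42]
t_stat, df = paired_t(to_dmel, to_hsap)
```

Pairing by library removes the large between-library variation in overall match rate, so the test addresses only the within-library difference between reference genomes.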
Clustering Between Libraries
The presence of putative orthologs across libraries is critical for EST data to be useful in molecular systematics. Using the BlastN algorithm, we found that between 10% and 53% of unique sequences in a given library had matches (E value < 10−5) with the data set containing all the other libraries (table 2). After conceptual translation, pairwise sequence matches (TBlastX E < 10−5) ranged from 1% to 29% of unique sequences shared between any two libraries, with an average of 12% (Supplementary Material B, see Supplementary Material online). The number of intralibrary matches was slightly lower, with 0% to 23% of sequences showing significant matches within the same library in a protein-level search (table 2), which nonetheless indicates a high proportion of paralogy in each library. Manual editing of primary sequences increased the between-library matches and the size of clusters at stringent cutoff values when compared to the automated approach (10−80: t = 2.2, df = 34, P < 0.05; 10−100: t = 2.4, df = 34, P < 0.05; 10−150: t = 2.1, df = 34, P < 0.05; Supplementary Material C, see Supplementary Material online).
When sequences with significant similarity were clustered across all libraries, up to 731 clusters included TUGs from two or more taxa, although no TUG had representatives in more than 28 of the 34 libraries. A total of 154 TUGs showed significant Blast matches within a single taxon only (Supplementary Material C, see Supplementary Material online). Most of the largest clusters, with TUGs in more than eight species at an E value < 10−10 (table 3; Supplementary Material D, see Supplementary Material online), included genes for which exceptional levels of mRNA expression have been established (Hsiao et al. 2001). The largest clusters included RPs and mitochondrial genes. Sequences from known protein families were also present in the clusters, such as tubulins, myosins, and troponin I. Three clusters contained EF genes (EF-1 alpha homologs, EF-1 beta, and EF-2). Interestingly, there were four clusters of genes that did not have any Blast matches, and six clusters that only showed matches to D. melanogaster and A. gambiae genes of unknown function (Supplementary Material D, see Supplementary Material online). Along with mitochondrial genes, several nuclear genes detected in multiple EST libraries have been used widely in insect molecular systematic studies (Caterino, Cho, and Sperling 2000). These included 28S rRNA (represented in EST libraries of 14 species), EF-1 alpha (13 species), H3 histone (7 species), and Cu, Zn–superoxide dismutase (7 species).
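Grouping TUGs into clusters at a given E value cutoff can be illustrated with a single-linkage sketch. The clustering rule used in this study is not restated here, so single linkage is an assumption, as are the function names and the representation of each TUG as a (taxon, identifier) pair. For each cluster, the sketch returns the number of taxa and the number of TUGs, the two counts that the paired columns of table 3 presumably report.

```python
from collections import defaultdict


def cluster_hits(hits, cutoff):
    """Single-linkage clustering of TUGs joined by hits with E < cutoff.

    hits: iterable of (tug_a, tug_b, evalue), where each TUG is a
    (taxon, identifier) tuple. TUGs with no hit below the cutoff are
    not returned.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue < cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = defaultdict(set)
    for tug in list(parent):
        clusters[find(tug)].add(tug)
    return list(clusters.values())


def summarize(cluster):
    """Return (number of taxa, number of TUGs) for one cluster."""
    taxa = {taxon for taxon, _ in cluster}
    return len(taxa), len(cluster)


# Hypothetical hits between TUGs of three libraries.
hits = [
    (("Carabus", "t1"), ("Hister", "t7"), 1e-30),
    (("Hister", "t7"), ("Hister", "t9"), 1e-12),
    (("Carabus", "t2"), ("Timarcha", "t4"), 1e-8),
]
relaxed = [summarize(c) for c in cluster_hits(hits, 1e-10)]
strict = [summarize(c) for c in cluster_hits(hits, 1e-20)]
```

Tightening the cutoff can only remove links, so clusters shrink or split as stringency increases, in line with the declining counts across the columns of table 3.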
Cutoff E value:

| Top Clusters | 10−10 | | 10−15 | | 10−20 | | 10−40 | | 10−50 | | 10−60 | | 10−80 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16S Ribosomal RNA gene | 28 | 113 | 28 | 109 | 28 | 100 | 26 | 50 | 24 | 47 | 8 | 10 | 7 | 7 |
| Cytochrome oxidase subunit I | 27 | 57 | 27 | 55 | 27 | 55 | 26 | 52 | 26 | 49 | 25 | 48 | 24 | 40 |
| Troponin/myosin family | 25 | 85 | 25 | 83 | 25 | 81 | 16 | 24 | 15 | 18 | 13 | 16 | 10 | 11 |
| Cytochrome c oxidase subunit III | 24 | 46 | 24 | 45 | 24 | 44 | 24 | 30 | 24 | 29 | 24 | 28 | 18 | 22 |
| Cytochrome oxidase subunit II | 23 | 31 | 23 | 30 | 23 | 29 | 22 | 28 | 20 | 26 | 19 | 24 | 17 | 20 |
| Cytochrome b | 23 | 26 | 23 | 26 | 23 | 26 | 22 | 23 | 22 | 23 | 21 | 22 | 20 | 21 |
| Chymotrypsin family | 20 | 69 | 17 | 44 | 16 | 38 | 11 | 14 | 3 | 4 | 2 | 2 | | |
| Adenosine triphosphatase 6 | 20 | 25 | 20 | 25 | 20 | 24 | 15 | 19 | 15 | 17 | 10 | 11 | | |
| Actin | 19 | 39 | 19 | 38 | 19 | 37 | 19 | 35 | 19 | 33 | 19 | 33 | 18 | 30 |
| Ubiquitin family | 18 | 31 | 18 | 31 | 18 | 31 | 17 | 27 | 9 | 10 | 9 | 10 | 7 | 7 |
| Chemosensory protein | 18 | 24 | 17 | 22 | 16 | 20 | 6 | 6 | 3 | 3 | | | | |
| Troponin I family | 17 | 26 | 17 | 26 | 17 | 26 | 14 | 23 | 13 | 21 | 10 | 18 | 3 | 4 |
| Cathepsin family | 15 | 34 | 12 | 19 | 11 | 16 | 8 | 10 | 5 | 7 | 5 | 6 | 4 | 4 |
| NADH dehydrogenase subunit 2 | 15 | 19 | 14 | 18 | 13 | 16 | 3 | 3 | 2 | 2 | | | | |
| NADH dehydrogenase subunit 4 | 15 | 17 | 15 | 16 | 15 | 15 | 14 | 14 | 13 | 13 | 11 | 11 | 6 | 6 |
| Tubulin family | 14 | 34 | 13 | 31 | 13 | 30 | 11 | 24 | 7 | 9 | 7 | 9 | 6 | 6 |
| RAS oncogene family | 14 | 26 | 11 | 17 | 9 | 13 | 8 | 11 | 3 | 3 | 3 | 3 | 3 | 3 |
| Heat shock protein family | 14 | 23 | 10 | 18 | 10 | 18 | 6 | 12 | 3 | 7 | 3 | 7 | 2 | 3 |
| Disulfide isomerase/thioredoxin family | 14 | 22 | 12 | 15 | 7 | 9 | 6 | 8 | 2 | 2 | | | | |
| 28S Ribosomal RNA gene/gamma-aminobutyric acid A receptor–associated protein | 14 | 21 | 14 | 20 | 9 | 11 | 8 | 10 | 7 | 8 | 7 | 7 | | |
| Ferritin 1 family | 14 | 18 | 14 | 18 | 14 | 17 | 12 | 14 | 11 | 13 | 9 | 10 | 7 | 8 |
| NADH dehydrogenase subunit 1 | 14 | 18 | 14 | 18 | 13 | 16 | 11 | 12 | 9 | 9 | 8 | 8 | 4 | 4 |
| MP20/CalPoNin | 14 | 17 | 13 | 16 | 13 | 15 | 11 | 12 | 10 | 11 | 7 | 8 | 6 | 6 |
NOTE.—NADH, reduced form of nicotinamide adenine dinucleotide.
The number of clusters and their size depended strongly on the significance level of the Blast search, partly because paralogs separated at higher stringency. This was evident in tubulins (breaking up into alpha and beta superfamilies at higher stringency), myosins (separating into light chain I and regulatory light chain II), and troponin I (separating into troponin I a1 and troponin I b1). Table 4 presents those clusters in which the number of unique sequences equals the number of taxa (libraries), i.e., where each taxon contributes only one putative ortholog. Such potentially paralogy-free clusters included a maximum of 14 taxa. Many of these were identified as RP genes and were used to test the phylogenetic utility of the EST database.
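The selection criterion described above (one unique sequence per taxon in a cluster) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the cluster names, the sequence identifiers, and the function names `is_single_copy` and `summarize` are hypothetical, and each cluster is assumed to be a list of distinct (taxon, sequence) pairs.

```python
from collections import Counter

def is_single_copy(cluster):
    """Return True if every taxon in the cluster contributes exactly one sequence.

    cluster: list of (taxon, sequence_id) pairs, assumed to be distinct.
    """
    per_taxon = Counter(taxon for taxon, _ in cluster)
    return all(count == 1 for count in per_taxon.values())

def summarize(cluster):
    """Return (number of taxa, number of sequences), the two counts compared in the text."""
    taxa = {taxon for taxon, _ in cluster}
    return len(taxa), len(cluster)

# Hypothetical toy clusters; the sequence identifiers are invented for illustration.
clusters = {
    "RP_like": [("Tribolium", "t1"), ("Ips", "i1"), ("Hister", "h1")],
    "actin_like": [("Tribolium", "t2"), ("Tribolium", "t3"), ("Ips", "i2")],
}

# Keep only clusters where the sequence count equals the taxon count.
candidates = [name for name, members in clusters.items() if is_single_copy(members)]
```

Passing this test does not establish orthology. A cluster can look single-copy simply because only one paralog happened to be sampled in each small library, which is why such clusters are described as potentially paralogy free.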
. | Cutoff E Value . | . | . | . | . | . | . | . | . | . | . | . | . | . | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Top Orthologous TUG Clusters . | 10⁻¹⁰ . | . | 10⁻¹⁵ . | . | 10⁻²⁰ . | . | 10⁻⁴⁰ . | . | 10⁻⁵⁰ . | . | 10⁻⁶⁰ . | . | 10⁻⁸⁰ . | . | |||||||||||||
Translationally controlled tumor protein | 14 | 14 | 14 | 14 | 12 | 12 | 9 | 9 | 9 | 9 | 8 | 8 | 6 | 6 | |||||||||||||
No BlastX ID, no BlastN ID | 13 | 13 | 11 | 11 | 9 | 9 | |||||||||||||||||||||
40S RP S8 | 13 | 13 | 13 | 13 | 13 | 13 | 11 | 11 | 11 | 11 | 10 | 10 | 9 | 9 | |||||||||||||
60S RP L24 | 12 | 12 | 9 | 9 | 9 | 9 | 8 | 8 | 7 | 7 | 7 | 7 | |||||||||||||||
40S RP S30 | 12 | 12 | 11 | 11 | 11 | 11 | 9 | 9 | 5 | 5 | |||||||||||||||||
60S RP L27 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 10 | 10 | 10 | 10 | 6 | 6 | |||||||||||||
Cytochrome c oxidase subunit Vb | 12 | 12 | 12 | 12 | 11 | 11 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |||||||||||||
CG32230-PA [Drosophila melanogaster] | 11 | 11 | 11 | 11 | 10 | 10 | 2 | 2 | |||||||||||||||||||
CG4692-PB [D. melanogaster] | 11 | 11 | 11 | 11 | 11 | 11 | 7 | 7 | 7 | 7 | 2 | 2 | |||||||||||||||
40S RP S23 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 10 | 10 | 9 | 9 | |||||||||||||
F1F0-type ATP synthase subunit g/CG6105-PA | 10 | 10 | 10 | 10 | 10 | 10 | 8 | 8 | 6 | 6 | |||||||||||||||||
40S RP S18 | 10 | 10 | 10 | 10 | 10 | 10 | 9 | 9 | 9 | 9 | 9 | 9 | 6 | 6 | |||||||||||||
Peroxiredoxin V protein | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 8 | 8 | 8 | 8 | 7 | 7 | |||||||||||||
60S RP L11 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 8 | 8 | |||||||||||||
60S RP L27A | 9 | 9 | 9 | 9 | 9 | 9 | 8 | 8 | 8 | 8 | 8 | 8 | 4 | 4 | |||||||||||||
60S RP L6 | 9 | 9 | 9 | 9 | 9 | 9 | 6 | 6 | 6 | 6 | 5 | 5 | 2 | 2 | |||||||||||||
60S RP L8 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | |||||||||||||
40S RP S17 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 7 | 7 | |||||||||||||
60S RP L19 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | |||||||||||||
60S RP L15 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 6 | 6 | 6 | 6 | 5 | 5 | |||||||||||||
12S Ribosomal RNA gene | 8 | 8 | 6 | 6 | 6 | 6 | 2 | 2 | |||||||||||||||||||
40S RP S10 | 8 | 8 | 8 | 8 | 8 | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 5 | 5 | |||||||||||||
60S RP L36 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 6 | 6 | |||||||||||||||||
Dynein light chain 2 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 4 | 4 | |||||||||||||||
40S RP S19 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 5 | 5 | |||||||||||||
Ribosome-associated membrane protein RAMP4 | 8 | 8 | 8 | 8 | 8 | 8 | |||||||||||||||||||||
Nitrogen fixation clusterlike | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 6 | 6 | 6 | 6 | |||||||||||||||
40S RP S20 | 8 | 8 | 8 | 8 | 7 | 7 | 5 | 5 | 5 | 5 | 4 | 4 | |||||||||||||||
Vacuolar ATP synthase subunit G | 8 | 8 | 8 | 8 | 8 | 8 | 7 | 7 |
NOTE.—ATP, adenosine triphosphate.
The Higher Coleopteran (Beetle) Phylogeny from RPs
Out of a complete set of 76 nonacidic RPs found in insects (Landais et al. 2003), our typical EST libraries (with between 200 and 500 double-strand sequenced ESTs) recovered between 10 and 30 RPs, with notably fewer detected in smaller libraries (fig. 1). We also included data from existing larger EST resources for Tribolium (1,825 ESTs) and Ips (1,671 ESTs), which yielded a much higher proportion of RP genes and higher transcript redundancy (e.g., T. castaneum had a mean of five ESTs per RP, with a standard deviation of 3). In each taxon, redundant sequences easily grouped together to generate a single transcript for each RP. The ease of grouping redundant transcripts increased the confidence that most, if not all, RPs have a single expressed copy in Coleoptera and hence that phylogenetic analyses were conducted across orthologous sequences.
Overall, these data suggest that the number of detected RP genes increases linearly with greater numbers of ESTs (fig. 1, R² = 0.7006; y = 0.0278x) and further predict that libraries of ∼2,000 ESTs obtained from whole adult specimens can yield complete sets of RPs. The linear increase is consistent with the fact that different RP genes were recovered in different organisms (fig. 2), even if a similar total number of RPs was detected. This might indicate that most RP genes have a similar chance of being cloned in our relatively small libraries, but the total number of ESTs sequenced needs to exceed a few hundred to obtain the complete set of 76 RPs.
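To make the extrapolation concrete, the short Python sketch below evaluates the reported fit, assuming that x is the number of ESTs in a library and y the number of RPs detected (the axes of fig. 1 are not reproduced here, so this reading is an assumption). The function names are hypothetical. Read literally, the line gives about 56 RPs at 2,000 ESTs and reaches 76 only near 2,700 ESTs, so the figure of ∼2,000 is best taken as an order-of-magnitude estimate. A linear trend also cannot hold beyond the full set of 76, where detection must saturate.

```python
SLOPE = 0.0278   # RPs detected per EST, from the reported fit y = 0.0278x
FULL_SET = 76    # complete set of nonacidic RPs in insects

def predicted_rps(n_ests, slope=SLOPE):
    """RPs expected from a library of n_ests ESTs under the through-origin fit."""
    return slope * n_ests

def ests_needed(n_rps, slope=SLOPE):
    """Library size at which the fitted line reaches n_rps."""
    return n_rps / slope

def slope_through_origin(xs, ys):
    """Least-squares slope for a line forced through the origin: sum(xy) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

at_2000 = predicted_rps(2000)          # about 55.6 RPs
full_set_size = ests_needed(FULL_SET)  # about 2,734 ESTs
```

`slope_through_origin` shows how a slope of this form could be estimated from per-library counts. The library data themselves are not given in this section, so no refit is attempted here.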
Phylogenetic analysis was conducted to establish basal relationships in the Coleoptera with 66 RP genes using both a “supertree” (derived from the topologies of individual gene trees) and a “supermatrix” (derived by simultaneous phylogenetic analysis of all sequence information). Ten additional RP genes were detected in fewer than 4 of the 20 Coleoptera species and could not be used for phylogenetic analysis. After removing these sequences and the alignment-sensitive regions from all other genes (Materials and Methods), the final data matrix included a total of 10,403 amino acid residues, with individual taxa represented by 447 (Platystomos) to 9,151 (Tribolium) residues, an average of 2,976 ± 1,892 residues, and an overall degree of matrix completion of 28.6%. Individual genes were represented in between 4 and 10 taxa. When all 76 RPs are considered, the mean number of taxa per gene was 5.68. All methods of tree construction (maximum parsimony, ML, Bayesian, and supertree) produced similar tree topologies (fig. 3). At the deepest nodes, when rooted with the suborder Archostemata, the remaining coleopteran suborders resolved as (Adephaga (Myxophaga, Polyphaga)), although the supertree analyses placed Myxophaga (Sphaerius sp.) within the Polyphaga. In all analyses, the Elateriformia (one of the five Series of families of Polyphaga) was a paraphyletic assemblage of basal Polyphaga, with the Eucinetidae (Eucinetus sp.) sister to the remaining Series, Staphyliniformia, Scarabaeiformia, and Cucujiformia. The close relationship of Scarabaeus laticollis (Scarabaeiformia), Georissus sp., and Hister sp. (Staphyliniformia) supported the Haplogastra uniting both Series (Crowson 1955) but rendered the Staphyliniformia paraphyletic, in accordance with recent findings (Korte et al. 2004; Caterino, Hunt, and Vogler 2005).
The monophyly of Cucujiformia, a group of derived polyphagan beetles containing about half of all beetle species, was recovered, and the well-established superfamilies Tenebrionoidea, Chrysomeloidea, and Curculionoidea each were monophyletic, with Biphyllus lunatus (Biphyllidae) placed at the base, as expected. The supertree approach yielded generally less resolution and misplaced Platystomos (the smallest library) outside of the Phytophaga, as did the Bayesian analysis.
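The supermatrix construction and the matrix completion figure reported above can be illustrated with a small Python sketch. It is a simplified stand-in, not the procedure of the study: the gene and taxon names and the toy sequences are hypothetical, each gene is assumed to be already aligned (equal-length sequences), and taxa lacking a gene are padded with `?`. The final line checks that the reported 28.6% agrees with the mean residues per taxon divided by the matrix length (2,976 / 10,403).

```python
def build_supermatrix(gene_alignments, missing="?"):
    """Concatenate per-gene alignments, padding taxa that lack a gene.

    gene_alignments: dict gene -> dict taxon -> aligned sequence,
    with all sequences of one gene having the same length.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        length = len(next(iter(aln.values())))
        for t in taxa:
            parts[t].append(aln.get(t, missing * length))
    return {t: "".join(p) for t, p in parts.items()}

def completeness(matrix, missing="?", gap="-"):
    """Fraction of cells holding a residue rather than a missing or gap symbol."""
    total = sum(len(seq) for seq in matrix.values())
    filled = sum(1 for seq in matrix.values() for c in seq if c not in (missing, gap))
    return filled / total

# Hypothetical toy data: two genes, three taxa, invented residues.
genes = {
    "geneA": {"taxon1": "MKV", "taxon2": "MKI"},
    "geneB": {"taxon1": "GR", "taxon3": "GK"},
}
matrix = build_supermatrix(genes)
toy_completeness = completeness(matrix)

# Consistency check on the figures reported in the text.
reported = round(2976 / 10403, 3)
```

In the toy case, 10 of 15 cells are filled. A supertree analysis would instead infer one tree per gene from the unpadded alignments and combine the resulting topologies.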
Discussion
EST databases are rapidly growing, with approximately 27.6 million entries in GenBank as of June 2005 (http://www.ncbi.nlm.nih.gov/dbEST/). Yet, until recently, the taxonomic coverage of the Class Insecta has been limited to 8 of the 25 or so insect orders. Within the largest order, Coleoptera, three libraries have become available recently, but taxonomically, these represent only a very limited group within one of the Series of Polyphaga. (Two further libraries were added to dbEST since our analysis was conducted.) EST representation in the insects has been severely biased toward Diptera, comprising 15 of 47 holometabolan insects as of June 2005 and ∼628,300 out of ∼919,200 EST sequences (excluding our data). Although the EST data sets presented here are small in comparison with other arthropod EST projects, we have almost doubled the taxonomic coverage of arthropod orders, including the first EST libraries for Strepsiptera, Rhaphidiodea, Trichoptera, Mecoptera, and Thysanura, and added over 11,000 ESTs from the Coleoptera, arguably the most diverse insect order, sampled from the broadest possible taxonomic diversity.
Our main aim was to test whether generating a small number of ESTs from a broad sample of taxa would be a suitable approach to phylogeny reconstruction. The findings confirm that even small libraries (<1,000 clones) show high levels of matching TUGs. Even with an average library size of 257 unique sequences, we recovered a conserved core of genes represented consistently across libraries. Many of these genes had not previously been used for phylogeny reconstruction, increasing the spectrum of molecular markers available to insect systematics. The most widely detected clusters contained mitochondrial DNA transcripts, enzymes, and RPs. However, tree construction was impeded by the large proportion of missing data entries, in particular due to several of the smaller libraries in our data set. Based on the completeness of RP representation in the libraries (fig. 1), we extrapolate that approximately 2,000 ESTs are needed to recover these highly expressed genes consistently when extracting total RNA from a whole adult specimen. Using tissues with a high rate of biosynthesis, such as embryonic tissues, may increase the proportion of RPs in the libraries and lower the number of ESTs needed to generate the complete set of RPs in each taxon.
Such a large number of sequences may appear to be a costly way to establish phylogenetic relationships between taxa. However, the success of sequencing multiple single-copy loci to resolve the deeper nodes within the Tree of Life (e.g., in mammals: Murphy et al. 2001; Teeling et al. 2005) cannot easily be extended to most groups via traditional PCR methods using degenerate primers. Our efforts to amplify even a few nonstandard single-copy genes consistently within or across different superfamilies of the Coleoptera have largely failed (unpublished data), and the best results to date were obtained when the primers have been based on the EST sequences obtained here (Pons et al. 2004). As automation advances and the cost of sequencing decreases, dense EST sampling is likely to become a more cost-effective approach for acquiring single-copy nuclear markers for the deep-level molecular systematics of many groups.
A perhaps unexpected finding was the high degree of paralogy in most clusters evident from the large number of within-library similarity hits. Paralogs can prohibit the determination of species relationships and mislead phylogenetic inferences if they are not detected. However, tentative orthologous clusters (i.e., with only a single member per taxon) were readily detected and included up to 14 of the 34 taxa (some of which were present in very small libraries). In future, some of these genes may prove not to be paralogy free, but it is reassuring that they include a number of housekeeping genes, such as RPs, which are already known to be largely paralogy free across Metazoa (Landais et al. 2003; Philippe et al. 2004). Other large clusters that were paralogy free under high clustering stringency only (table 3) will require further analyses to separate different paralogy groups.
For molecular systematics, EST sequencing exposes us to hundreds of loci for which we have no existing information about the pattern of molecular variation and phylogenetic information content. At this early stage of comparative EST sequencing, it already seems obvious that only a minority of the available genes will emerge as useful for reconstructing phylogenetic relationships at the deeper hierarchical levels, whereas most gene sequences will be shown to suffer from shallow paralogy possibly linked to functional diversity. As EST sequences tend to be short, well-supported phylogenetic trees will only emerge when several genes of overlapping resolution are combined, together enhancing the phylogenetic signal (Olmstead and Sweere 1994; Gatesy et al. 1999). However, simultaneous analysis is only justified once orthology has been established.
Clearly, the RP genes provide such a resource and were used here to provide valuable insights into the phylogeny of Coleoptera (fig. 3). The relationships among the four suborders of Coleoptera have long been controversial (Hennig 1981; Lawrence and Newton 1995; Beutel and Haas 2000), with each of the three possible arrangements supported by reputable studies (Kukalova-Peck and Lawrence 1993; Beutel and Haas 2000; Caterino et al. 2002). The supermatrix analysis based on 66 RPs suggests the placement of Myxophaga as sister to Polyphaga, which is consistent with the traditional view, going back to Crowson (1955, 1960), and with several later studies based on various morphological character systems. These results conflict with those from 18S rRNA, which place Adephaga, not Myxophaga, as the sister of Polyphaga (Caterino et al. 2002), but phylogenetic conclusions from this gene are affected by length variation and rate heterogeneity, and hence, independent evidence from RPs is very valuable. Within the Polyphaga, the EST data supported the general ideas about basal relationships of the Series (the five traditional family groups of Polyphaga), including the paraphyly of Staphyliniformia with respect to Scarabaeiformia (Korte et al. 2004; Caterino et al. 2005), the paraphyly of Elateriformia and their basal position within Polyphaga (Caterino et al. 2002), and the monophyly of the large Cucujiformia and the large phytophagous Chrysomeloidea and Curculionoidea (“Phytophaga”).
In conclusion, we used dense EST sampling for molecular systematics, to avoid difficult PCR-based methods and extend the range of gene markers for multigene phylogenetics. Comparable studies in nematodes (Parkinson et al. 2004b) and Apicomplexa (Li et al. 2003) focused on gene discovery and comparative genomics, and it will be interesting to use these much larger EST data for phylogenetic analysis in the way proposed here. It is evident from our analysis that phylogenetic inferences will suffer from the unexpectedly high level of paralogy affecting most of the highly expressed loci, unless paralogy groups whose origin precedes the separation of the focal taxa can be separated a priori (Philippe, Lartillot, and Brinkmann 2005; Rodriguez-Ezpeleta et al. 2005).
Many questions remain for the use of the broad EST approach, for example, which molecular techniques are most suitable for enriching the desired loci prior to sequencing, or whether tissue-specific libraries can reduce the recovery of paralogous sequences. For instance, libraries of P. dardanus were obtained from wing discs and included a much higher proportion of RPs than most of the other libraries, which were obtained from total adult tissue (table 2). Furthermore, for comparative studies, bidirectional sequencing of ESTs and careful curation of redundant sequences are important to mitigate problems otherwise introduced by sequencing errors and partial gene sequences. However, the best strategy might be to sequence the majority of ESTs in a single direction and to sequence the reverse direction only when the full length of specific genes is missing.
RP genes apparently were little affected by recent paralogy and provide a formidable resource for deep-level phylogenetics. With 66 genes included here in an analysis of Coleoptera, the data set represents a great advance over the existing trees from single genes (Howland and Hewitt 1995; Caterino et al. 2002). However, as the matrix contains some 71.4% missing data, support levels inevitably will be low (Wiens 2003; Hughes and Vogler 2004; Philippe et al. 2004), even if the effect may be less pronounced with a greater number of genes (Driskell et al. 2004). Yet, presenting just under 9,000 unique nuclear sequences, the current study provides a foundation for multilocus phylogenetics of Coleoptera and other insect groups. Dense taxonomic EST sampling will offer us new opportunities for phylogenetic analysis while also providing a less myopic glimpse of the functional and evolutionary diversity in the most species-rich lineage on Earth.
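As an illustration of how a missing-data proportion such as the 71.4% quoted above can be obtained, the following minimal Python sketch counts empty cells in a taxon-by-gene matrix. It assumes that missing data are tallied per gene (one cell per taxon and gene); the study may instead have counted aligned characters, which would give a different figure. The function name, taxon labels, and gene labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def missing_fraction(present: Dict[str, Set[str]], genes: Set[str]) -> float:
    """Return the fraction of taxon-by-gene cells that have no sequence.

    `present` maps each taxon to the set of genes recovered for it;
    `genes` is the full set of genes (columns) in the supermatrix.
    """
    total_cells = len(present) * len(genes)
    if total_cells == 0:
        raise ValueError("matrix has no cells")
    # Count only genes that belong to the matrix columns.
    filled_cells = sum(len(found & genes) for found in present.values())
    return 1.0 - filled_cells / total_cells


# Hypothetical toy matrix: 4 taxa x 4 genes, 8 of 16 cells filled.
genes = {"RpL3", "RpL5", "RpS2", "RpS4"}
present = {
    "taxon_A": {"RpL3", "RpL5", "RpS2", "RpS4"},
    "taxon_B": {"RpL3"},
    "taxon_C": set(),
    "taxon_D": {"RpL5", "RpS2", "RpS4"},
}

print(f"{missing_fraction(present, genes):.1%}")  # prints 50.0%
```

Applied per gene, the same count would also show which loci are recovered in most libraries and therefore contribute most to the scaffold.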
Present address: Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, Graham Kerr Building, University of Glasgow, Glasgow, United Kingdom
These authors contributed equally to the work.
Herve Philippe, Associate Editor
We are grateful to Sue Lomas and Francis Wright at the sequencing facilities at Silwood Park and Derek Huntley, James Abbott, and Gail Bartlett from the Bioinformatics Support Service at Imperial College. We thank Hans Pohl, Ignacio Ribera, Michael Balke, and Peter Hammond for contributing insect specimens. We greatly thank Herve Philippe and anonymous reviewers for useful comments, and Miquel Arnedo, Alexandra Cieslak, Jose Galián, Jesus Gómez-Zurita, Fatos Kopliku, and Nathalie Tristem for contributing additional library construction and sequencing. This project was funded by Biotechnology and Biological Sciences Research Council grant 49/G14548 to Michael Caterino, A.P.V., and P.G.F. and a Ph.D. studentship to S.J.L. Additional funding was from the Department of Trade and Industry, United Kingdom, and an Alexander S. Onassis Foundation scholarship to A.P.
References
Aguinaldo, A. M., J. M. Turbeville, L. S. Linford, M. C. Rivera, J. R. Garey, R. A. Raff, and J. A. Lake.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
Bapteste, E., H. Brinkmann, J. A. Lee et al. (11 co-authors).
Baum, B. R.
Beutel, R. G., and F. Haas.
Blair, J. E., K. Ikeo, T. Gojobori, and S. B. Hedges.
Castresana, J.
Caterino, M. S., S. Cho, and F. A. Sperling.
Caterino, M. S., T. Hunt, and A. P. Vogler.
Caterino, M. S., V. L. Shull, P. M. Hammond, and A. P. Vogler.
Creevey, C. J., and J. O. McInerney.
Crowson, R. A.
Dopazo, H., and J. Dopazo.
Driskell, A. C., C. Ane, J. G. Burleigh, M. M. McMahon, B. C. O'Meara, and M. J. Sanderson.
Eigenheer, A. L., C. I. Keeling, S. Young, and C. Tittiger.
Erwin, T. L.
Ewing, B., and P. Green.
Ewing, B., L. Hillier, M. C. Wendl, and P. Green.
Gatesy, J., M. Milinkovitch, V. Waddell, and M. Stanhope.
Guindon, S., and O. Gascuel.
Hammond, P. M.
Hedges, S. B., J. E. Blair, M. L. Venturi, and J. L. Shoe.
Howland, D. E., and G. M. Hewitt.
Hsiao, L. L., F. Dangond, T. Yoshida et al. (22 co-authors).
Huelsenbeck, J. P., and F. Ronquist.
Hughes, J., and A. P. Vogler.
Korte, A., I. Ribera, R. G. Beutel, and D. Bernhard.
Kukalova-Peck, J., and J. F. Lawrence.
Landais, I., M. Ogliastro, K. Mita, J. Nohata, M. Lopez-Ferber, M. Duonor-Cerutti, T. Shimada, P. Fournier, and G. Devauchelle.
Lawrence, J. F., and A. F. Newton Jr.
Li, L., B. P. Brunk, J. C. Kissinger et al. (20 co-authors).
Mita, K., M. Morimyo, K. Okano et al. (12 co-authors).
Murphy, W. J., E. Eizirik, S. J. O'Brien et al. (11 co-authors).
Olmstead, R. G., and J. A. Sweere.
Parkinson, J., A. Anthony, J. Wasmuth, R. Schmid, A. Hedley, and M. Blaxter.
Parkinson, J., and M. Blaxter.
Parkinson, J., D. B. Guiliano, and M. Blaxter.
Parkinson, J., M. Mitreva, C. Whitton et al. (12 co-authors).
Philip, G. K., C. J. Creevey, and J. O. McInerney.
Philippe, H., N. Lartillot, and H. Brinkmann.
Philippe, H., E. A. Snell, E. Bapteste, P. Lopez, P. W. Holland, and D. Casane.
Pons, J., T. Barraclough, K. Theodorides, A. Cardoso, and A. Vogler.
Ragan, M. A.
Rodriguez-Ezpeleta, N., H. Brinkmann, S. C. Burey, B. Roure, G. Burger, W. Loffelhardt, H. J. Bohnert, H. Philippe, and B. F. Lang.
Rubin, G. M., L. Hong, P. Brokstein, M. Evans-Holm, E. Frise, M. Stapleton, and D. A. Harvey.
Rudd, S.
Teeling, E. C., M. S. Springer, O. Madsen, P. Bates, S. J. O'Brien, and W. J. Murphy.
Theodorides, K., A. De Riva, J. Gomez-Zurita, P. G. Foster, and A. P. Vogler.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins.
Wheeler, W. C., M. Whiting, Q. D. Wheeler, and J. M. Carpenter.
Wiens, J. J.
Author notes
*Department of Entomology, The Natural History Museum, London, United Kingdom; †Department of Biological Sciences, Imperial College London, Silwood Park Campus, Ascot, United Kingdom; and ‡Department of Zoology, The Natural History Museum, London, United Kingdom