Toxicity by descent: A comparative approach for chemical hazard assessment

Toxicology is traditionally divided between human and eco-toxicology. In the shared pursuit of environmental health, this separation does not account for discoveries made in the comparative studies of animal genomes. Here, we provide evidence on the feasibility of understanding the health impact of chemicals on all animals, including ecological keystone species and humans, based on a significant number of conserved genes and their functional associations to health-related outcomes across much of animal diversity. We test four conditions to understand the value of comparative genomics data to inform mechanism-based human and environmental hazard assessment: (1) genes that are most fundamental for health evolved early during animal evolution; (2) the molecular functions of pathways are better conserved among distantly related species than the individual genes that are members of these pathways; (3) the most conserved pathways among animals are those that cause adverse health outcomes when disrupted; (4) gene sets that serve as molecular signatures of biological processes or disease-states are largely enriched by evolutionarily conserved genes across the animal phylogeny. The concept of homology is applied in a comparative analysis of gene families and pathways among invertebrate and vertebrate species compared with humans. Results show that over 70% of gene families associated with disease are shared among the greatest variety of animal species through evolution. Pathway conservation between invertebrates and humans is based on the degree of conservation within vertebrates and the number of interacting genes within the human network. Human gene sets that already serve as biomarkers are enriched by evolutionarily conserved genes across the animal phylogeny. 
By implementing a comparative method for chemical hazard assessment, human and eco-toxicology converge towards a more holistic and mechanistic understanding of how toxicity disrupts biological processes that are important for health and shared among animals (including humans).


Figures A1 to A4
Only gene sets containing more than 3 and fewer than 500 gene families were used in the gene set enrichment analysis.
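The size restriction above can be sketched as a simple filter. This is a minimal illustration only: the function name `filter_gene_sets`, the dictionary layout and the example identifiers are hypothetical, and the bounds are treated as strict because the text says "greater than 3 and less than 500".

```python
def filter_gene_sets(gene_sets, min_size=3, max_size=500):
    """Keep gene sets whose number of distinct gene families lies strictly
    between min_size and max_size (assumed reading of the stated limits)."""
    return {
        name: families
        for name, families in gene_sets.items()
        if min_size < len(set(families)) < max_size
    }


# Hypothetical gene sets keyed by name, each listing gene family identifiers.
example_sets = {
    "TOO_SMALL": ["fam1", "fam2", "fam3"],
    "KEPT": ["fam1", "fam2", "fam3", "fam4"],
    "TOO_LARGE": [f"fam{i}" for i in range(500)],
}
kept = filter_gene_sets(example_sets)
```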
Table A8. The results of the gene set enrichment analysis (enrichment scores and significance thresholds) for the fifty MSigDB hallmark gene sets and the 1266 gene sets that consist of Reactome pathways, measured against the homologous gene sets ranked according to their relative evolutionary conservation by their Shannon indices (H'). Size = number of gene families. ES = enrichment score. NES = normalized enrichment score. NOM p-val = nominal p-value. FDR q-val = false discovery rate. FWER p-val = familywise-error rate. Rank at Max = the position in the ranked list at which the maximum enrichment score occurred.

Figure A1. Venn diagram of the distribution of 7644 genes within the MSigDB Hallmark genes.

Table A9. The mapping of biomarkers for cancers, DNA damage, reproductive abnormalities, endocrine disruption and neurotoxicity to 437 human pathways retrieved from the Ingenuity Pathway Analysis (IPA, Qiagen, (Kramer et al. 2014)) commercial database.

Table A10. The results of the gene set enrichment analysis (enrichment scores and significance thresholds) for the 416 gene sets of biomarkers (for cancers, DNA damage, reproductive abnormalities, endocrine disruption and neurotoxicity) identified from the Ingenuity Pathway Analysis (IPA, Qiagen, [39]) commercial database, measured against the homologous gene sets ranked according to their relative evolutionary conservation by their Shannon indices (H'). Size = number of gene families. ES = enrichment score. NES = normalized enrichment score. NOM p-val = nominal p-value. FDR q-val = false discovery rate. FWER p-val = familywise-error rate. Rank at Max = the position in the ranked list at which the maximum enrichment score occurred.
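To make the ES and Rank at Max columns concrete, the following is a minimal sketch of an unweighted running-sum enrichment score over a ranked list of gene families. It is a simplification for illustration, not the software or settings used to produce the tables: the function name `enrichment_score` and the family identifiers are hypothetical, ranks are reported 1-based, and the permutation steps that yield NES, nominal p-values, FDR and FWER are not shown.

```python
def enrichment_score(ranked_ids, gene_set):
    """Unweighted running-sum enrichment score (illustrative simplification).

    ranked_ids: gene family identifiers, ordered from most to least conserved.
    gene_set:   identifiers belonging to the gene set being tested.
    Returns (ES, 1-based rank at which the maximum deviation occurred).
    """
    members = set(gene_set)
    n_hits = sum(1 for g in ranked_ids if g in members)
    n_miss = len(ranked_ids) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    running, best, best_rank = 0.0, 0.0, 0
    for rank, g in enumerate(ranked_ids, start=1):
        # Step up on a member of the set, step down otherwise.
        running += 1.0 / n_hits if g in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_rank = running, rank
    return best, best_rank


# Hypothetical ranking of six gene families by decreasing H'.
ranking = ["f1", "f2", "f3", "f4", "f5", "f6"]
top_es, top_rank = enrichment_score(ranking, {"f1", "f2"})        # (1.0, 2)
bottom_es, bottom_rank = enrichment_score(ranking, {"f5", "f6"})  # (-1.0, 4)
```

A positive score indicates that the set's members concentrate near the top of the ranking (here, the most conserved families), and a negative score indicates concentration near the bottom.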

Table legends

Table A1. List of human genes listed in the OMIM database with their various accession numbers.

Table A2. Number of genes within 12 animal genomes (Suite 1) belonging to gene families present in the human genome. Data were obtained from the OrthoDB database.

Table A3. Number of genes within 12 animal genomes (Suite 1) belonging to gene families that map to 18,810 loci of the human genome, which are annotated as disease, non-disease and unknown based on information obtained from the OMIM database.

Table A4. Number of genes and number of reactions found for 2180 human pathways shared among eight species (Suite 2), including gene identifiers for Mus and Drosophila and their proportions of disease-associated genes within pathways. Pathways are also annotated as lowest level and disease pathways, obtained from the Reactome pathway database.

Table A5. List of gene families within each of the fifty MSigDB hallmark gene sets.

Table A6. Calculation of the Shannon index (H') for each gene family based on the distribution of orthologs across the 12 (Suite 1) species.
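As an illustration of the kind of calculation this table reports, the sketch below computes a Shannon index for one gene family from its ortholog counts per species. It assumes the standard form H' = -Σ p_i ln(p_i), with p_i taken as the share of the family's orthologs found in species i and the natural logarithm as base; the exact formulation used for the table is not stated here, and the function name and counts are hypothetical.

```python
import math


def shannon_index(ortholog_counts):
    """Return H' = -sum(p_i * ln(p_i)) over species with a nonzero count.

    ortholog_counts: number of orthologs of one gene family in each species.
    """
    total = sum(ortholog_counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in ortholog_counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical ortholog counts for two gene families across 12 species.
even_family = [1] * 12          # one ortholog in every species
narrow_family = [3] + [0] * 11  # orthologs confined to a single species

h_even = shannon_index(even_family)      # equals ln(12), the maximum for 12 species
h_narrow = shannon_index(narrow_family)  # equals 0.0
```

Under these assumptions, a family spread evenly across all 12 species reaches the maximum value ln(12), while a family restricted to one species scores zero, so higher H' corresponds to broader conservation.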

Table A7. List of gene families within each gene set that consists of Reactome pathways.

Table A11. List of 100 sets of homolog gene sets representing random subsets of 100 genes from the United States National Toxicology Program's s1500+ panel of markers used in a gene set enrichment analysis.

Table A12. List of 100 sets of homolog gene sets representing random subsets of the human genome used in a gene set enrichment analysis, for contrast with the results obtained from the United States National Toxicology Program's s1500+ panel of markers.