research-article

Open Access

Discovering Novel Biological Traits From Images Using Phylogeny-Guided Neural Networks

Authors:
Mohannad Elhamod, Virginia Tech, Blacksburg, VA, USA (ORCID 0000-0002-2383-947X)
Mridul Khurana, Virginia Tech, Blacksburg, VA, USA (ORCID 0009-0003-9346-3206)
Harish Babu Manogaran, Virginia Tech, Blacksburg, VA, USA (ORCID 0000-0003-3709-4656)
Josef C. Uyeda, Virginia Tech, Blacksburg, VA, USA (ORCID 0000-0003-4624-9680)
Meghan A. Balk, Battelle, Columbus, OH, USA (ORCID 0000-0003-2699-3066)
Wasila Dahdul, University of California, Irvine, Irvine, CA, USA (ORCID 0000-0003-3162-7490)
Yasin Bakis, Tulane University, New Orleans, LA, USA (ORCID 0000-0001-6144-9440)
Henry L. Bart, Tulane University, New Orleans, LA, USA (ORCID 0000-0002-5662-9444)
Paula M. Mabee, Battelle, Columbus, OH, USA (ORCID 0000-0002-8455-3213)
Hilmar Lapp, Duke University, Durham, NC, USA (ORCID 0000-0001-9107-0714)
James P. Balhoff, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA (ORCID 0000-0002-8688-6599)
Caleb Charpentier, Virginia Tech, Blacksburg, VA, USA (ORCID 0000-0002-9787-7081)
David Carlyn, The Ohio State University, Columbus, OH, USA (ORCID 0000-0002-8323-0359)
Wei-Lun Chao, The Ohio State University, Columbus, OH, USA (ORCID 0000-0003-1269-7231)
Charles V. Stewart, Rensselaer Polytechnic Institute, Troy, NY, USA (ORCID 0000-0001-6532-6675)
Daniel I. Rubenstein, Princeton University, Princeton, NJ, USA (ORCID 0000-0001-9049-5219)
Tanya Berger-Wolf, The Ohio State University, Columbus, OH, USA (ORCID 0000-0001-7610-1412)
Anuj Karpatne, Virginia Tech, Blacksburg, VA, USA (ORCID 0000-0003-1647-3534)

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2023, Pages 3966–3978
https://doi.org/10.1145/3580305.3599808

Published: 4 August 2023

ABSTRACT

Discovering evolutionary traits that are heritable across species on the tree of life (also referred to as a phylogenetic tree) is of great interest to biologists seeking to understand how organisms diversify and evolve. However, the measurement of traits is often a subjective and labor-intensive process, making trait discovery a highly label-scarce problem. We present a novel approach for discovering evolutionary traits directly from images without relying on trait labels. Our proposed approach, Phylo-NN, encodes the image of an organism into a sequence of quantized feature vectors (or codes), where different segments of the sequence capture evolutionary signals at varying ancestry levels in the phylogeny. We demonstrate the effectiveness of our approach in producing biologically meaningful results in a number of downstream tasks, including species image generation and species-to-species image translation, using fish species as a target example.
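
The idea of a quantized code sequence whose segments correspond to ancestry levels can be illustrated with a minimal sketch. This is not the Phylo-NN implementation: it assumes a plain nearest-neighbor codebook lookup for the quantization step, and the function names (`quantize`, `split_by_level`), the toy codebook, and the level names and segment sizes are all hypothetical, chosen only to make the structure concrete.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each feature vector with the index of its nearest codebook entry.

    Assumption: nearest-neighbor lookup stands in for whatever quantization
    the actual model uses.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def split_by_level(
    codes: Sequence[int], level_sizes: Sequence[Tuple[str, int]]
) -> Dict[str, List[int]]:
    """Cut the code sequence into consecutive segments, one per ancestry level.

    level_sizes is a hypothetical layout: (level name, number of codes).
    """
    if sum(size for _, size in level_sizes) != len(codes):
        raise ValueError("segment sizes must add up to the sequence length")
    segments: Dict[str, List[int]] = {}
    start = 0
    for name, size in level_sizes:
        segments[name] = list(codes[start:start + size])
        start += size
    return segments


# Toy data: three 2-D codebook entries and four encoder feature vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.1, 0.1], [0.9, 0.2], [0.1, 0.8], [1.1, -0.1]]

codes = quantize(features, codebook)
segments = split_by_level(
    codes,
    [("deep_ancestor", 2), ("recent_ancestor", 1), ("species", 1)],
)
print(codes)
print(segments)
```

Under this layout, two species that share a deep ancestor would be expected to agree on the codes in the earlier segments and differ mainly in the later ones; how the actual model enforces that is not described in the abstract.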

Supplemental Material

3580305.3599808-2min-promo.mp4 (MP4, 33 MB)

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Computational biology
      1. Imaging
2. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Regularization
    2. Machine learning approaches
      1. Neural networks

Published in
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
General Chairs: Ambuj Singh (UC Santa Barbara, USA), Yizhou Sun (UC Los Angeles, USA)
Program Chairs: Leman Akoglu (Carnegie Mellon University, USA), Dimitrios Gunopulos (University of Athens, Greece), Xifeng Yan (UC Santa Barbara, USA), Ravi Kumar (Google, USA), Fatma Ozcan (Google, USA), Jieping Ye (Alibaba DAMO Academy)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
computer vision
knowledge-guided machine learning
morphology
neural networks
phylogeny
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%