Accelerating science with human-aware artificial intelligence

Sourati, Jamshid; Evans, James A.

doi:10.1038/s41562-023-01648-z

Article
Published: 13 July 2023


Nature Human Behaviour volume 7, pages 1682–1696 (2023)



Abstract

Artificial intelligence (AI) models trained on published scientific findings have been used to invent valuable materials and targeted therapies, but they typically ignore the human scientists who continually alter the landscape of discovery. Here we show that incorporating the distribution of human expertise by training unsupervised models on simulated inferences that are cognitively accessible to experts dramatically improves (by up to 400%) AI prediction of future discoveries beyond models focused on research content alone, especially when relevant literature is sparse. These models succeed by predicting human predictions and the scientists who will make them. By tuning human-aware AI to avoid the crowd, we can generate scientifically promising ‘alien’ hypotheses unlikely to be imagined or pursued without intervention until the distant future, which hold promise to punctuate scientific advance beyond questions currently pursued. By accelerating human discovery or probing its blind spots, human-aware AI enables us to move towards and beyond the contemporary scientific frontier.


**Fig. 1: Motivation and design of our approach to simulate human-accessible scientific inferences.**

**Fig. 2: Evaluating human-accessible discovery predictions against various baselines.**

**Fig. 3: A prediction example of progesterone as a COVID-19 therapy.**

**Fig. 4: The contribution of human expert awareness for predicting discoveries and discoverers.**

**Fig. 5: Motivation and design of our approach to generate complementary scientific predictions by avoiding human scientists.**

**Fig. 6: The wait time for published discoveries increases with human inaccessibility (higher β values).**

**Fig. 7: Precision in predicting human discovery falls before a comparable drop in theoretical expectations.**

**Fig. 8: Complementary AI predictions outperform human discoveries.**


Data availability

The DOIs of papers used for the electrochemical properties together with the PubMed identifiers of the MEDLINE entries used in our experiments can be found in our GitHub repository: https://github.com/jsourati/accelerate-discoveries. The abstracts of papers for electrochemical properties could not be shared due to copyright issues, but MEDLINE abstracts are accessible through their identifiers from the PubMed website. Source data are provided with this paper.

Code availability

All code for our algorithms can be found in the following GitHub repository: https://github.com/jsourati/accelerate-discoveries.

References

1. Khadherbhi, S. R. & Babu, K. S. Big data search space reduction based on user perspective using map reduce. Int. J. Adv. Technol. Innov. Res. 7, 3642–3647 (2015).
2. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
3. Smalley, E. AI-powered drug discovery captures pharma interest. Nat. Biotechnol. 35, 604–605 (2017).
4. Teruya, E., Takeuchi, T., Morita, H., Hayashi, T. & Ono, K. ARTS: autonomous research topic selection system using word embeddings and network analysis. Mach. Learn. Sci. Technol. 3, 025005 (2022).
5. Shi, F., Foster, J. G. & Evans, J. A. Weaving the fabric of science: dynamic network models of science’s unfolding structure. Soc. Netw. 43, 73–85 (2015).
6. Singer, U., Radinsky, K. & Horvitz, E. On biases of attention in scientific discovery. Bioinformatics https://doi.org/10.1093/bioinformatics/btaa1036 (2020).
7. Tversky, A. & Kahneman, D. Availability: a heuristic for judging frequency and probability. Cogn. Psychol. 5, 207–232 (1973).
8. Evans, J. S. B. T. Bias in Human Reasoning: Causes and Consequences (Psychology Press, 1989).
9. Ehrlinger, J., Readinger, W. O. & Kim, B. in Encyclopedia of Mental Health 2nd edn (ed. Friedman, H. S.) 5–12 (Academic Press, 2016).
10. Chadwick, A. T. & Segall, M. D. Overcoming psychological barriers to good discovery decisions. Drug Discov. Today 15, 561–569 (2010).
11. Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112, 14569–14574 (2015).
12. Mikolov, T., Yih, W.-T. & Zweig, G. Linguistic regularities in continuous space word representations. In Proc. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Vanderwende, L. et al.) 746–751 (Association for Computational Linguistics, 2013).
13. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Macskassy, S. et al.) 701–710 (Association for Computing Machinery, 2014).
14. Chitra, U. & Raphael, B. Random walks on hypergraphs with edge-dependent vertex weights. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 1172–1181 (PMLR, 2019).
15. Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).
16. Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).
17. Swanson, D. R. Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspect. Biol. Med. 30, 7–18 (1986).
18. Swanson, D. R. Medical literature as a potential source of new knowledge. Bull. Med. Libr. Assoc. 78, 29–37 (1990).
19. Weeber, M., Klein, H., de Jong-van den Berg, L. T. W. & Vos, R. Using concepts in literature-based discovery: simulating Swanson’s Raynaud–fish oil and migraine–magnesium discoveries. J. Am. Soc. Inf. Sci. Technol. 52, 548–557 (2001).
20. Evans, J. & Rzhetsky, A. Machine science. Science 329, 399–400 (2010).
21. Digiacomo, R. A., Kremer, J. M. & Shah, D. M. Fish-oil dietary supplementation in patients with Raynaud’s phenomenon: a double-blind, controlled, prospective study. Am. J. Med. 86, 158–164 (1989).
22. Chiu, H.-Y., Yeh, T.-H., Huang, Y.-C. & Chen, P.-Y. Effects of intravenous and oral magnesium on reducing migraine: a meta-analysis of randomized controlled trials. Pain Physician 19, E97–E112 (2016).
23. Chu, J. S. G. & Evans, J. A. Slowed canonical progress in large fields of science. Proc. Natl Acad. Sci. USA 118, e2021636118 (2021).
24. Davis, A. P. et al. The Comparative Toxicogenomics Database: update 2019. Nucleic Acids Res. 47, D948–D954 (2019).
25. Morselli Gysi, D. et al. Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proc. Natl Acad. Sci. USA 118, e2025581118 (2021).
26. Ghandehari, S. et al. Progesterone in addition to standard of care versus standard of care alone in the treatment of men hospitalized with moderate to severe COVID-19: a randomized, controlled pilot trial. Chest https://doi.org/10.1016/j.chest.2021.02.024 (2021).
27. Estradiol and progesterone in hospitalized COVID-19 patients. https://clinicaltrials.gov/ct2/show/NCT04865029 (2022).
28. Mehdizadeh Dehkordi, A., Zebarjadi, M., He, J. & Tritt, T. M. Thermoelectric power factor: enhancement mechanisms and strategies for higher performance thermoelectric materials. Mater. Sci. Eng. R Rep. 97, 1–22 (2015).
29. Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Sci. Data 4, 170085 (2017).
30. Smidt, T. E., Mack, S. A., Reyes-Lillo, S. E., Jain, A. & Neaton, J. B. An automatically curated first-principles database of ferroelectrics. Sci. Data 7, 72 (2020).
31. Belikov, A. V., Rzhetsky, A. & Evans, J. Prediction of robust scientific facts from literature. Nat. Mach. Intell. 4, 445–454 (2022).
32. Sourati, J. & Evans, J. Complementary artificial intelligence designed to augment human discovery. Preprint at arXiv https://doi.org/10.48550/arXiv.2207.00902 (2022).
33. Xu, J. et al. Building a PubMed knowledge graph. Sci. Data 7, 205 (2020).
34. Torvik, V. I. & Smalheiser, N. R. Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3, 1–29 (2009).
35. Ammar, W. et al. Construction of the literature graph in Semantic Scholar. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 3 (Industry Papers) 84–91 (Association for Computational Linguistics, 2018).
36. Ong, S. P. et al. Python Materials Genomics (pymatgen): a robust, open-source Python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
37. Sun, Y., Han, J., Yan, X., Yu, P. S. & Wu, T. PathSim: meta path-based top-K similarity search in heterogeneous information networks. Proc. VLDB Endow. 4, 992–1003 (2011).
38. Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. KDD 2016, 855–864 (2016).
39. Hamilton, W. L., Ying, R. & Leskovec, J. Inductive representation learning on large graphs. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 1025–1035 (Curran Associates, 2017).
40. Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.07308 (2016).
41. Coakley, C. W. Practical nonparametric statistics. J. Am. Stat. Assoc. 95, 332–333 (2000).
42. Schaffer, R. Study examines progesterone to reduce inflammation in COVID-19. Healio—EndocrineToday https://www.healio.com/news/endocrinology/20200507/study-examines-progesterone-to-reduce-inflammation-in-covid19 (7 May 2020).


Acknowledgements

We thank our funders for their generous support: the National Science Foundation (grant no. 1829366), the Air Force Office of Scientific Research (grant nos. FA9550-19-1-0354 and FA9550-15-1-0162) and DARPA (grant no. HR00111820006). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank L. Barabasi and D. Morselli Gysi for helpful data related to their network-based forecast of COVID-19 drugs and vaccines with protein–protein interactions²⁵, and A. Jain, V. Tshitoyan and A. Dunn for sharing data and code to help replicate their work on unsupervised word embeddings and latent knowledge about materials science¹⁵. We also thank the participants of the Santa Fe Institute workshop ‘Foundations of Intelligence in Natural and Artificial Systems’, the University of Wisconsin at Madison’s HAMLET workshop and colleagues at the Knowledge Lab for helpful comments.

Author information

Authors and Affiliations

Department of Sociology, University of Chicago, Chicago, IL, USA
Jamshid Sourati & James A. Evans
Santa Fe Institute, Santa Fe, NM, USA
James A. Evans


Contributions

J.S.: conceptualization, methodology, software, validation, investigation, writing—original draft and visualization. J.A.E.: conceptualization, methodology, writing—original draft, visualization and funding acquisition.

Corresponding author

Correspondence to James A. Evans.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Chao Min, Roger Guimerà and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Hypergraph-induced transition probabilities, and schematic of our experimental design.

(a,b) Sanity checks for our hypergraph-induced transition-probability similarity metric. (a) Between author and conceptual nodes: histogram of similarities between the conceptual node “coronavirus” and the nodes of two sets of authors, namely the authors of 5,000 randomly selected papers published in Nature Medicine (dark purple) and in Applied Optics (light purple) between 1990 and 2019. Similarities between the hypernodes are the logarithm of the average transition probabilities over one and two random-walk steps. Histograms include only non-zero transition probabilities: 92% of the Nature Medicine authors (28,396 in total) but only 51% of the selected Applied Optics authors (18,530 in total) had non-zero similarity values. The average non-zero similarity of Nature Medicine authors (red dashed line) is nearly five times that of Applied Optics authors (blue dashed line), implying that, according to our hypergraph-induced similarity metric, authors publishing in Nature Medicine write papers much more relevant to coronavirus than those publishing in Applied Optics. (b) Between two conceptual nodes: similarities between “coronavirus” and the conceptual keywords shown on the x-axis, computed as the average transition probabilities with one and two intermediate nodes. Terms and symptoms known to be more relevant to coronavirus have larger average transition probabilities. (c) Schematic of our experimental design, showing the start and end dates of the experiments. For energy-related properties and 100 human diseases, we used the beginning of 2001 as the prediction year and the end of 2018 as a single evaluation date (V1). For COVID-19, the prediction year is the beginning of 2020, and we report cumulative monthly precision values through July 2021 (V1 to V19).
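The similarity in panels (a) and (b) can be illustrated with a minimal Python sketch. It assumes a uniform random walk on an author–concept hypergraph in which each paper is a hyperedge: from a node, pick one of its papers uniformly, then pick a node of that paper uniformly. The toy `PAPERS` hypergraph and all function names are hypothetical, and the authors' implementation may weight nodes differently (for example, with the edge-dependent vertex weights of Chitra and Raphael).

```python
import math
from collections import defaultdict

# Hypothetical toy hypergraph: each hyperedge is one paper and contains its
# author nodes plus the concept (keyword) nodes it mentions.
PAPERS = [
    {"author:A", "author:B", "coronavirus", "fever"},
    {"author:B", "coronavirus", "cough"},
    {"author:C", "optics", "laser"},
    {"author:C", "author:B", "laser", "fever"},
]


def transition_matrix(hyperedges):
    """One-step transition probabilities P[u][v] of a uniform hypergraph walk.

    Assumed walk rule: from node u, choose one of its papers uniformly, then
    choose a node of that paper uniformly (u itself included).
    """
    incident = defaultdict(list)
    for e, edge in enumerate(hyperedges):
        for node in edge:
            incident[node].append(e)
    P = defaultdict(dict)
    for u, edges in incident.items():
        for e in edges:
            members = hyperedges[e]
            for v in members:
                step = 1.0 / (len(edges) * len(members))
                P[u][v] = P[u].get(v, 0.0) + step
    return P


def two_step(P, u):
    """Row u of P @ P: probability of reaching each node in exactly two steps."""
    out = defaultdict(float)
    for w, p_uw in P.get(u, {}).items():
        for v, p_wv in P.get(w, {}).items():
            out[v] += p_uw * p_wv
    return out


def similarity(P, u, v):
    """Log of the mean of the one- and two-step transition probabilities u -> v.

    Returns None when both probabilities are zero (the log is undefined);
    such pairs correspond to the zero values left out of the histograms.
    """
    one = P.get(u, {}).get(v, 0.0)
    two = two_step(P, u).get(v, 0.0)
    avg = 0.5 * (one + two)
    return math.log(avg) if avg > 0 else None


P = transition_matrix(PAPERS)
print(similarity(P, "author:A", "coronavirus") > similarity(P, "author:C", "coronavirus"))  # True
```

In the toy graph, `author:C` never writes about coronavirus yet receives a small non-zero score through coauthor `author:B` and the shared keyword `fever`, whereas `optics` has no path of length one or two to `coronavirus` and gets `None`.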


Extended Data Fig. 2 Precision-Recall (PR) curves for human-accessible predictions.

Precision–recall (PR) curves and areas under the curves (AUCs) for various human-accessible predictions: energy-related materials science properties, that is, thermoelectrics (a), ferroelectrics (b) and photovoltaics (c); therapies and vaccines for COVID-19 (d); and generic drug repurposing (e). For all cases except COVID-19, we display only the PR-AUC values for the selected prediction years and omit the PR curves themselves. Note that for receiver operating characteristic (ROC) curves, random predictions always yield an AUC of 0.5, whereas the expected PR-AUC of a random baseline equals the proportion of positive samples in the data.
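The remark about random baselines can be checked numerically. The sketch below computes average precision, one common step-wise estimator of PR-AUC (the caption does not say which estimator was used, so this choice is an assumption), and shows that randomly scored candidates land near the fraction of positives rather than near 0.5. The synthetic labels are hypothetical.

```python
import random


def average_precision(scores, labels):
    """Average precision: a step-wise estimate of the area under the PR curve.

    scores: model scores (higher means more confident); labels: 1 if the
    candidate was actually discovered after the prediction year, else 0.
    Tied scores keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined without positive samples")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at the recall level of this hit
    return total / n_pos


# Random baseline on synthetic labels with roughly 10% positives.
rng = random.Random(0)
labels = [1 if rng.random() < 0.1 else 0 for _ in range(20_000)]
random_scores = [rng.random() for _ in labels]
prevalence = sum(labels) / len(labels)
print(average_precision(random_scores, labels), prevalence)  # the two values are close
```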


Extended Data Fig. 3 Expert density calculation.

Calculation of expert density between a property (node P) and each material (node M). Density is defined as the Jaccard index between the set of authors who have published on the property (denoted by A_P) and the set of authors who have mentioned the material in their publications (denoted by A_M): the size of their intersection A_∩ = A_P ∩ A_M (that is, the number of overlapping authors) divided by the size of their union A_P ∪ A_M (that is, the total number of distinct authors), or |A_P ∩ A_M| / |A_P ∪ A_M|.
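A minimal sketch of the expert density defined above, using hypothetical author sets. Returning 0.0 when both sets are empty is an assumption that the caption does not cover.

```python
def expert_density(authors_property, authors_material):
    """Jaccard index |A_P ∩ A_M| / |A_P ∪ A_M| between two author sets."""
    a_p, a_m = set(authors_property), set(authors_material)
    union = a_p | a_m
    if not union:
        return 0.0  # assumption: no authors on either side means zero density
    return len(a_p & a_m) / len(union)


# Hypothetical author sets
A_P = {"alice", "bob", "carol"}          # authors who published on the property
A_M = {"bob", "carol", "dave", "erin"}   # authors who mentioned the material
print(expert_density(A_P, A_M))  # 2 shared / 5 distinct = 0.4
```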

Extended Data Fig. 4 Correlations between expert density and time to discovery.

Spearman correlation coefficients between the human expert density (Jaccard index) linking properties with materials and the date of discovery of those that were discovered. Negative correlations imply that materials with higher expert densities are likely to be discovered earlier than others. These results were obtained with the prediction year set to 2001 for energy-related properties and drug repurposing applications, and to the beginning of 2020 for COVID-19. Turquoise and red bars represent negative and positive correlations, respectively. For seven diseases in the Comparative Toxicogenomics Database (CTD; shown at the bottom of the figure), all discoveries were established in a single year, so no correlation coefficient could be computed, because our database did not provide accurate months or days of discovery. Energy-related properties and COVID-19 all show strong negative correlations. For the CTD, 67 of the 100 diseases (that is, properties) showed statistically significant correlations, of which only one had a positive coefficient. The mean correlation coefficient across these 67 diseases was −0.18.
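The sketch below computes Spearman's ρ as the Pearson correlation of average ranks. It also shows why the seven single-year diseases have no coefficient: when every discovery year is identical, the year ranks have zero variance and ρ is undefined. The data are hypothetical; a library routine such as `scipy.stats.spearmanr` would additionally supply the significance test the caption relies on.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Returns None when either variable is constant, e.g. when every
    discovery falls in the same year, because rho is then undefined.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


# Hypothetical example: denser expert links, earlier discovery years.
densities = [0.05, 0.02, 0.01, 0.001]
years = [2003, 2006, 2010, 2015]
print(round(spearman(densities, years), 6))  # -1.0
print(spearman(densities, [2010, 2010, 2010, 2010]))  # None
```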


Extended Data Fig. 5 Distribution of human expert densities.

Distribution of human expert densities between discovery predictions and properties: (a) drug repurposing (considering only the 67 diseases with statistically significant Spearman correlation coefficients; see Extended Data Fig. 4); (b–d) energy-related materials science properties, that is, thermoelectricity, ferroelectricity and photovoltaic capacity, respectively; and (e) therapies and vaccines for COVID-19. Curves show normalized histograms of the logarithm of human expert densities, obtained by fitting a Beta distribution to the expert densities of the predictions. Solid and dashed vertical lines mark the mean values of the corresponding densities. The expert densities of the hypergraph-induced metrics (transition probability and DeepWalk-based similarity) are concentrated around larger Jaccard index values than those of word embedding models that trace content alone. For the content models, all estimated densities peak at zero (0 < a < 1 < b, where a and b are the shape parameters of the fitted Beta distributions). CTD diseases are sorted by the average expert similarity between each disease and the complete pool of drugs.
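The caption does not state how the Beta distributions were fitted. As an assumption, the sketch below uses the method of moments (maximum likelihood, e.g. `scipy.stats.beta.fit`, is a common alternative) and checks the 0 < a < 1 < b condition under which the fitted density is unbounded at zero and decreases monotonically. The input densities are hypothetical.

```python
import statistics


def fit_beta_moments(samples):
    """Method-of-moments Beta(a, b) fit for values strictly inside (0, 1)."""
    m = statistics.fmean(samples)
    v = statistics.pvariance(samples)
    if not 0 < v < m * (1 - m):
        raise ValueError("sample moments are incompatible with a Beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def peaks_at_zero(a, b):
    """True when 0 < a < 1 < b: the density is unbounded at 0 and decreasing."""
    return 0 < a < 1 < b


# Hypothetical expert densities (Jaccard indices) heavily skewed toward zero.
a, b = fit_beta_moments([0.001, 0.002, 0.01, 0.3])
print(round(a, 3), round(b, 3), peaks_at_zero(a, b))
```

With a < 1 the factor x^(a−1) diverges at zero, and with b > 1 the factor (1−x)^(b−1) also decreases in x, so the whole density is largest near zero.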


Extended Data Fig. 6 Precision-Recall area under the curve for predicting human discoverers.

Precision–recall area under the curve (PR-AUC) for predicting the human experts who will discover materials possessing the following properties: (a) thermoelectrics, (b) ferroelectrics and (c) photovoltaics. The materials were selected from the true-positive discovery predictions of our DeepWalk-based predictor (α = 1). Our evaluation compares the scores assigned to candidate experts with the actual experts who ultimately discovered and published the property associated with each true positive. We developed a DeepWalk-based scoring function for this purpose. The expert candidates considered here are those sampled at least once in the DeepWalk trajectories produced over our five-year hypergraph. For a discovered material, scores were computed from each expert's proximity to both the property and the material: an expert is a good candidate discoverer if she is close (in cosine similarity) to both the property and the material nodes in the embedding space. Discovered associations whose discoverers did not appear in the sampled DeepWalk trajectories were ignored. To summarize the two similarities in a single set of expert predictions, we ranked experts by their proximity to the property and to the material and combined the two rankings by averaging. This combined ranking was used as the final expert score in our PR-AUC computations. We compared the log-PR-AUC of this algorithm with that of a random selection of experts and with a curve simulating a hypothetical method whose log-PR-AUC is five times higher than the random baseline. Predictions were notably better than random expert selection for all three electrochemical properties.
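The rank-averaging step described above can be sketched as follows. The two-dimensional embeddings and expert names are hypothetical stand-ins for DeepWalk vectors, and leaving ties in the averaged rank in input order is an assumption the caption does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_positions(scores):
    """Map each key to its 1-based rank, highest score first (ties keep order)."""
    ordered = sorted(scores, key=lambda k: scores[k], reverse=True)
    return {k: r for r, k in enumerate(ordered, start=1)}


def score_experts(expert_vecs, property_vec, material_vec):
    """Order experts best-first by the mean of their two proximity ranks.

    A low mean rank means the expert is close to both the property and the
    material node in the embedding space.
    """
    to_prop = {e: cosine(v, property_vec) for e, v in expert_vecs.items()}
    to_mat = {e: cosine(v, material_vec) for e, v in expert_vecs.items()}
    r_prop, r_mat = rank_positions(to_prop), rank_positions(to_mat)
    mean_rank = {e: (r_prop[e] + r_mat[e]) / 2 for e in expert_vecs}
    return sorted(mean_rank, key=mean_rank.get)


# Hypothetical embeddings
PROPERTY = (1.0, 0.0)
MATERIAL = (0.8, 0.6)
EXPERTS = {"e1": (0.9, 0.3), "e2": (0.0, 1.0), "e3": (1.0, -0.5)}
print(score_experts(EXPERTS, PROPERTY, MATERIAL))  # ['e1', 'e2', 'e3']
```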


Extended Data Fig. 7 Decaying discoverability in complementary predictions.

Illustration of the decaying discoverability of predictions as β, the parameter for human expert avoidance, increases. Discoverability is measured by precision, that is, the percentage of predictions that overlap with actual discoveries made after the prediction year. Decreasing precision curves and their strongly negative Pearson correlation coefficients (between β and precision) are shown for (a) thermoelectricity, (b) ferroelectricity, (c) photovoltaics and (d) COVID-19. For the remaining human diseases, the Pearson correlation coefficients are summarized in a scatterplot (e).
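A minimal sketch of the discoverability measure: the precision of a prediction set against later discoveries, followed by the Pearson correlation between β and precision. How predictions are generated for each β is not shown here; the prediction lists below are hypothetical placeholders.

```python
import math


def precision(predictions, discoveries):
    """Fraction of predicted candidates that were later reported as discoveries."""
    predicted = set(predictions)
    if not predicted:
        return 0.0
    return len(predicted & set(discoveries)) / len(predicted)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical: materials discovered after the prediction year, and the
# top-4 predictions produced at three beta settings.
discovered = {"m1", "m2", "m3", "m4"}
predictions_by_beta = {
    0.0: ["m1", "m2", "m3", "m9"],
    0.5: ["m1", "m2", "m8", "m9"],
    1.0: ["m4", "m7", "m8", "m9"],
}
betas = sorted(predictions_by_beta)
precisions = [precision(predictions_by_beta[b], discovered) for b in betas]
print(precisions)  # [0.75, 0.5, 0.25]
print(pearson(betas, precisions))  # negative: discoverability decays as beta grows
```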


Extended Data Fig. 8 Discoverability and scientific merit among drug repurposing predictions.

Discoverability and scientific merit of predictions made with varying β values (our parameter for human expert avoidance) in research that repurposes drugs to treat human disease. (a) Precision values for predictions generated with eight levels of β, computed for all 400 human diseases we considered (excluding COVID-19). Diseases are sorted by the number of relevant drugs. (b) Average theoretical scores, measured through protein–protein similarity between diseases and candidate drugs, for predictions generated with the same β values. Protein-based theoretical scores could be computed for 176 of the 400 diseases (44%). In both panels, horizontal lines show average values across all diseases.


Extended Data Table 1 High-frequency MeSH terms appearing in COVID-19 random walks


Extended Data Table 2 True positive predictions for our expert-aware deepwalk algorithm and the word2vec baseline for COVID-19


Supplementary information

Supplementary Information

Supplementary Discussion, Figs. 1–5 and Table 1.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 2

Precision values of predictions.

Source Data Fig. 3

Rank ratio of true positive predictions made by our deepwalk algorithm and not by the baseline.

Source Data Fig. 4

Precision shifts in discovery predictions due to adding authors; precision of predicting discoverers of materials possessing a certain property.

Source Data Fig. 6

Average discovery wait times.

Source Data Fig. 7

Overlapping percentages (precision) and average theoretical scores for predictions generated with different β values.

Source Data Fig. 8

Expectation gaps; joint probability of undiscoverability and plausibility.

Source Data Extended Data Fig. 1

Sanity checks on our hypergraph-induced transition probability similarity metric: between authors and a conceptual node, and between two conceptual nodes.

Source Data Extended Data Fig. 2

Precision–recall curves and area under the curves for predictions made for different properties.

Source Data Extended Data Fig. 4

Spearman correlation coefficients between expert density of properties and materials and their date of discovery.

Source Data Extended Data Fig. 5

Parameters of beta distributions fitted to expert densities of different properties.

Source Data Extended Data Fig. 6

Precision–recall area under the curve for predicting discoverers of a property in a particular material.

Source Data Extended Data Fig. 7

Discoverability (precision) for predictions for different β values.

Source Data Extended Data Fig. 8

Discoverability and scientific merit (plausibility) for predictions made with different β values.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Sourati, J., Evans, J.A. Accelerating science with human-aware artificial intelligence. Nat Hum Behav 7, 1682–1696 (2023). https://doi.org/10.1038/s41562-023-01648-z


Received: 17 August 2022
Accepted: 02 June 2023
Published: 13 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1038/s41562-023-01648-z

This article is cited by

Machine culture
- Levin Brinkmann
- Fabian Baumann
- Iyad Rahwan
Nature Human Behaviour (2023)
Hypotheses devised by AI could find ‘blind spots’ in research
- Matthew Hutson
Nature (2023)
