Accelerating science with human-aware artificial intelligence

Abstract

Artificial intelligence (AI) models trained on published scientific findings have been used to invent valuable materials and targeted therapies, but they typically ignore the human scientists who continually alter the landscape of discovery. Here we show that incorporating the distribution of human expertise by training unsupervised models on simulated inferences that are cognitively accessible to experts dramatically improves (by up to 400%) AI prediction of future discoveries beyond models focused on research content alone, especially when relevant literature is sparse. These models succeed by predicting human predictions and the scientists who will make them. By tuning human-aware AI to avoid the crowd, we can generate scientifically promising ‘alien’ hypotheses unlikely to be imagined or pursued without intervention until the distant future, which hold promise to punctuate scientific advance beyond questions currently pursued. By accelerating human discovery or probing its blind spots, human-aware AI enables us to move towards and beyond the contemporary scientific frontier.

Fig. 1: Motivation and design of our approach to simulate human-accessible scientific inferences.
Fig. 2: Evaluating human-accessible discovery predictions against various baselines.
Fig. 3: A prediction example of progesterone as a COVID-19 therapy.
Fig. 4: The contribution of human expert awareness for predicting discoveries and discoverers.
Fig. 5: Motivation and design of our approach to generate complementary scientific predictions by avoiding human scientists.
Fig. 6: The wait time for published discoveries increases with human inaccessibility (higher β values).
Fig. 7: Precision in predicting human discovery falls before a comparable drop in theoretical expectations.
Fig. 8: Complementary AI predictions outperform human discoveries.

Data availability

The DOIs of papers used for the electrochemical properties together with the PubMed identifiers of the MEDLINE entries used in our experiments can be found in our GitHub repository: https://github.com/jsourati/accelerate-discoveries. The abstracts of papers for electrochemical properties could not be shared due to copyright issues, but MEDLINE abstracts are accessible through their identifiers from the PubMed website. Source data are provided with this paper.

Code availability

All code for our algorithms can be found in the following GitHub repository: https://github.com/jsourati/accelerate-discoveries.

References

  1. Khadherbhi, S. R. & Babu, K. S. Big data search space reduction based on user perspective using map reduce. Int. J. Adv. Technol. Innov. Res. 7, 3642–3647 (2015).

  2. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

  3. Smalley, E. AI-powered drug discovery captures pharma interest. Nat. Biotechnol. 35, 604–605 (2017).

  4. Teruya, E., Takeuchi, T., Morita, H., Hayashi, T. & Ono, K. ARTS: autonomous research topic selection system using word embeddings and network analysis. Mach. Learn. Sci. Technol. 3, 025005 (2022).

  5. Shi, F., Foster, J. G. & Evans, J. A. Weaving the fabric of science: dynamic network models of science’s unfolding structure. Soc. Netw. 43, 73–85 (2015).

  6. Singer, U., Radinsky, K. & Horvitz, E. On biases of attention in scientific discovery. Bioinformatics https://doi.org/10.1093/bioinformatics/btaa1036 (2020).

  7. Tversky, A. & Kahneman, D. Availability: a heuristic for judging frequency and probability. Cogn. Psychol. 5, 207–232 (1973).

  8. Evans, J. S. B. T. Bias in Human Reasoning: Causes and Consequences (Psychology Press, 1989).

  9. Ehrlinger, J., Readinger, W. O. & Kim, B. in Encyclopedia of Mental Health 2nd edn (ed. Friedman, H. S.) 5–12 (Academic Press, 2016).

  10. Chadwick, A. T. & Segall, M. D. Overcoming psychological barriers to good discovery decisions. Drug Discov. Today 15, 561–569 (2010).

  11. Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112, 14569–14574 (2015).

  12. Mikolov, T., Yih, W.-T. & Zweig, G. Linguistic regularities in continuous space word representations. In Proc. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Vanderwende, L. et al.) 746–751 (Association for Computational Linguistics, 2013).

  13. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Macskassy, S. et al.) 701–710 (Association for Computing Machinery, 2014).

  14. Chitra, U. & Raphael, B. Random walks on hypergraphs with edge-dependent vertex weights. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 1172–1181 (PMLR, 2019).

  15. Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).

  16. Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).

  17. Swanson, D. R. Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspect. Biol. Med. 30, 7–18 (1986).

  18. Swanson, D. R. Medical literature as a potential source of new knowledge. Bull. Med. Libr. Assoc. 78, 29–37 (1990).

  19. Weeber, M., Klein, H., de Jong-van den Berg, L. T. W. & Vos, R. Using concepts in literature-based discovery: simulating Swanson’s Raynaud–fish oil and migraine–magnesium discoveries. J. Am. Soc. Inf. Sci. Technol. 52, 548–557 (2001).

  20. Evans, J. & Rzhetsky, A. Machine science. Science 329, 399–400 (2010).

  21. Digiacomo, R. A., Kremer, J. M. & Shah, D. M. Fish-oil dietary supplementation in patients with Raynaud’s phenomenon: a double-blind, controlled, prospective study. Am. J. Med. 86, 158–164 (1989).

  22. Chiu, H.-Y., Yeh, T.-H., Huang, Y.-C. & Chen, P.-Y. Effects of intravenous and oral magnesium on reducing migraine: a meta-analysis of randomized controlled trials. Pain. Physician 19, E97–E112 (2016).

  23. Chu, J. S. G. & Evans, J. A. Slowed canonical progress in large fields of science. Proc. Natl Acad. Sci. USA 118, e2021636118 (2021).

  24. Davis, A. P. et al. The Comparative Toxicogenomics Database: update 2019. Nucleic Acids Res. 47, D948–D954 (2019).

  25. Morselli Gysi, D. et al. Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proc. Natl Acad. Sci. USA 118, e2025581118 (2021).

  26. Ghandehari, S. et al. Progesterone in addition to standard of care versus standard of care alone in the treatment of men hospitalized with moderate to severe COVID-19: a randomized, controlled pilot trial. Chest https://doi.org/10.1016/j.chest.2021.02.024 (2021).

  27. Estradiol and progesterone in hospitalized COVID-19 patients https://clinicaltrials.gov/ct2/show/NCT04865029 (2022).

  28. Mehdizadeh Dehkordi, A., Zebarjadi, M., He, J. & Tritt, T. M. Thermoelectric power factor: enhancement mechanisms and strategies for higher performance thermoelectric materials. Mater. Sci. Eng. R. Rep. 97, 1–22 (2015).

  29. Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Sci. Data 4, 170085 (2017).

  30. Smidt, T. E., Mack, S. A., Reyes-Lillo, S. E., Jain, A. & Neaton, J. B. An automatically curated first-principles database of ferroelectrics. Sci. Data 7, 72 (2020).

  31. Belikov, A. V., Rzhetsky, A. & Evans, J. Prediction of robust scientific facts from literature. Nat. Mach. Intell. 4, 445–454 (2022).

  32. Sourati, J. & Evans, J. Complementary artificial intelligence designed to augment human discovery. Preprint at arXiv https://doi.org/10.48550/arXiv.2207.00902 (2022).

  33. Xu, J. et al. Building a PubMed knowledge graph. Sci. Data 7, 205 (2020).

  34. Torvik, V. I. & Smalheiser, N. R. Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3, 1–29 (2009).

  35. Ammar, W. et al. Construction of the literature graph in Semantic Scholar. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 3 (Industry Papers) 84–91 (Association for Computational Linguistics, 2018).

  36. Ong, S. P. et al. Python Materials Genomics (pymatgen): a robust, open-source Python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).

  37. Sun, Y., Han, J., Yan, X., Yu, P. S. & Wu, T. PathSim: meta path-based top-K similarity search in heterogeneous information networks. Proc. VLDB Endow. 4, 992–1003 (2011).

  38. Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. KDD 2016, 855–864 (2016).

  39. Hamilton, W. L., Ying, R. & Leskovec, J. Inductive representation learning on large graphs. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 1025–1035 (Curran Associates, 2017).

  40. Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.07308 (2016).

  41. Coakley, C. W. Practical nonparametric statistics. J. Am. Stat. Assoc. 95, 332–333 (2000).

  42. Schaffer, R. Study examines progesterone to reduce inflammation in COVID-19. Healio—EndocrineToday https://www.healio.com/news/endocrinology/20200507/study-examines-progesterone-to-reduce-inflammation-in-covid19 (7 May 2020).

Acknowledgements

We thank our funders for their generous support: the National Science Foundation (grant no. 1829366), the Air Force Office of Scientific Research (grant nos. FA9550-19-1-0354 and FA9550-15-1-0162) and DARPA (grant no. HR00111820006). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank L. Barabasi and D. Morselli Gysi for helpful data related to their network-based forecast of COVID-19 drugs and vaccines with protein–protein interactions (ref. 25), and A. Jain, V. Tshitoyan and A. Dunn for sharing data and code to help replicate their work on unsupervised word embeddings and latent knowledge about materials science (ref. 15). We also thank the participants of the Santa Fe Institute workshop ‘Foundations of Intelligence in Natural and Artificial Systems’, the University of Wisconsin at Madison’s HAMLET workshop and colleagues at the Knowledge Lab for helpful comments.

Author information

Contributions

J.S.: conceptualization, methodology, software, validation, investigation, writing—original draft and visualization. J.A.E.: conceptualization, methodology, writing—original draft, visualization and funding acquisition.

Corresponding author

Correspondence to James A. Evans.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Chao Min, Roger Guimerà and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Hypergraph-induced transition probabilities, and schematic of our experimental design.

(a,b) Sanity checks for our hypergraph-induced transition probability similarity metric. (a) Between author and conceptual nodes: histogram of similarities between the nodes of two sets of authors and the conceptual node “coronavirus”. The two sets comprise the authors of 5,000 randomly selected papers published between 1990 and 2019 in the journals Nature Medicine (dark purple) and Applied Optics (light purple). Similarities between the hypernodes comprise the logarithm of the average transition probabilities with one and two random walk steps. Histograms are plotted considering only non-zero transition probabilities: 92% of the Nature Medicine authors (28,396 in total) but only 51% of the selected Applied Optics authors (18,530 in total) had non-zero similarity values. The average non-zero similarity associated with Nature Medicine authors (red dashed line) is nearly 5 times larger than that of Applied Optics authors (blue dashed line), implying that, based on our hypergraph-induced similarity metric, authors publishing in Nature Medicine write papers much more relevant to coronavirus than those publishing in Applied Optics. (b) Between two conceptual nodes: similarities between the conceptual keywords shown on the x-axis and “coronavirus”. Similarities between the hypernodes are computed as the average transition probabilities with one and two intermediate nodes. Terms and symptoms known to be more relevant to coronavirus have larger average transition probabilities. (c) Schematic of our experimental design: starting and ending dates of experiments are shown. For energy-related functions and 100 human diseases, we used the beginning of 2001 as the prediction year and the end of 2018 as a single evaluation date (V1). For COVID-19, the prediction year is the beginning of 2020, and we cumulatively reported monthly precision values until July of 2021 (V1 to V19).
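The similarity used in panels (a) and (b) can be illustrated with a minimal sketch. It assumes a plain uniform random walk on the hypergraph (pick a paper containing the current node uniformly, then pick a node of that paper uniformly, self-transitions allowed). The toy papers, node labels and function names are hypothetical, and the published implementation may weight nodes or hyperedges differently.

```python
import math

# Hypothetical toy hypergraph: each paper (hyperedge) lists its authors and concepts.
papers = [
    {"author:A", "author:B", "coronavirus", "pneumonia"},
    {"author:B", "author:C", "pneumonia", "fever"},
    {"author:D", "laser", "lens"},
]

def one_step(papers):
    """One-step transition probabilities of a uniform hypergraph random walk."""
    nodes = sorted(set().union(*papers))
    probs = {u: {v: 0.0 for v in nodes} for u in nodes}
    for u in nodes:
        edges = [e for e in papers if u in e]
        for e in edges:
            for v in e:
                # Choose a paper containing u uniformly, then a node in it uniformly.
                probs[u][v] += 1.0 / (len(edges) * len(e))
    return probs

def two_step(probs):
    """Two-step transition probabilities (the one-step matrix squared)."""
    nodes = list(probs)
    return {
        u: {v: sum(probs[u][w] * probs[w][v] for w in nodes) for v in nodes}
        for u in nodes
    }

def log_similarity(u, v, p1, p2):
    """Log of the average of the one- and two-step probabilities; None if zero."""
    avg = 0.5 * (p1[u][v] + p2[u][v])
    return math.log(avg) if avg > 0 else None

p1 = one_step(papers)
p2 = two_step(p1)
for author in ("author:A", "author:C", "author:D"):
    print(author, log_similarity(author, "coronavirus", p1, p2))
```

In this toy example the author who never shares a connected paper with “coronavirus” gets no similarity value at all, which mirrors why the histograms are restricted to non-zero transition probabilities.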

Source data

Extended Data Fig. 2 Precision-Recall (PR) curves for human-accessible predictions.

Precision-Recall (PR) curves and area under the curves (AUCs) for various human-accessible predictions: energy-related materials science properties, that is, thermoelectrics (a), ferroelectrics (b) and photovoltaics (c), therapies and vaccines for COVID-19 (d), and generic drug repurposing (e). Except for COVID-19, we display only the PR-AUC values for the selected prediction years and omit the PR curves themselves. Note that for receiver operating characteristic (ROC) curves random predictions always result in an AUC of 0.5, but the PR-AUC of the random baseline depends on the ratio of positive samples in the data.
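The closing remark about random baselines can be checked with a small simulation. This is a sketch on synthetic labels, not the paper's data: it uses average precision as the PR-AUC estimate, and the sample size, seed and positive ratios are arbitrary assumptions.

```python
import random

def average_precision(labels_sorted):
    """Average precision (a PR-AUC estimate) for labels sorted by decreasing score."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels_sorted, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")

def roc_auc(labels_sorted):
    """ROC AUC: fraction of (positive, negative) pairs ranked in the right order."""
    n_pos = sum(labels_sorted)
    n_neg = len(labels_sorted) - n_pos
    negatives_below, correct = n_neg, 0
    for label in labels_sorted:
        if label:
            correct += negatives_below  # negatives ranked after this positive
        else:
            negatives_below -= 1
    return correct / (n_pos * n_neg)

rng = random.Random(0)
n = 5000
results = {}
for positive_ratio in (0.05, 0.5):
    n_pos = int(n * positive_ratio)
    labels = [1] * n_pos + [0] * (n - n_pos)
    rng.shuffle(labels)  # a random ranking carries no information about the labels
    results[positive_ratio] = (average_precision(labels), roc_auc(labels))
    print(positive_ratio, results[positive_ratio])
```

Under a random ranking the ROC AUC stays near 0.5 for both settings, while the PR-AUC tracks the share of positives, which is why PR-AUC values should be read against a property-specific baseline.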

Source data

Extended Data Fig. 3 Expert density calculation.

Calculation of expert density between a property (node P) and each material (node M). Density is defined as the Jaccard index between the set of authors who have published on the property (denoted by $A_P$) and the set of authors who have mentioned the material in their publications (denoted by $A_M$). The Jaccard formulation takes the ratio of the size of the intersection (that is, the number of overlapping authors), $|A_P \cap A_M|$, to the size of the union of the two sets (that is, the total number of authors), $|A_P \cup A_M|$.
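As a minimal sketch of this definition, the Jaccard index of two author sets can be computed as follows. The function name and the author identifiers are hypothetical, and returning 0 for two empty sets is an assumption of this example.

```python
def expert_density(authors_property, authors_material):
    """Jaccard index between the author sets of a property and a material."""
    a_p, a_m = set(authors_property), set(authors_material)
    union = a_p | a_m
    if not union:
        return 0.0  # assumption: no authors on either side means zero density
    return len(a_p & a_m) / len(union)

# Hypothetical author identifiers.
property_authors = {"a1", "a2", "a3"}
material_authors = {"a2", "a3", "a4", "a5"}
density = expert_density(property_authors, material_authors)
print(density)
```

Here two authors overlap out of five distinct authors in total, so the density is 2/5.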

Extended Data Fig. 4 Correlations between expert density and time to discovery.

Spearman correlation coefficients between the human expert density (Jaccard index) linking properties with materials and their date of discovery, if discovered. Negative correlations imply that materials with higher expert densities are likely to be discovered earlier than others. These results were obtained with the prediction year set to 2001 for energy-related properties and drug repurposing applications, and set to the beginning of 2020 for COVID-19. Turquoise and red bars represent negative and positive correlations, respectively. For seven diseases in the CTD (shown at the bottom of the figure), all discoveries were established in a single year and therefore no correlation coefficients could be obtained. This is because we did not have accurate access to the month or day of discoveries in our database. Results indicate that energy-related properties and COVID-19 all show strong negative correlations. In the case of the CTD database, 67 out of 100 diseases (that is, properties) showed statistically significant correlations, among which only one disease had a positive coefficient. The mean correlation coefficient across these 67 diseases was −0.18.
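The correlation described here can be sketched with a small rank-based implementation. The densities and discovery years below are invented for illustration, and the function names are hypothetical. The second call shows why no coefficient exists when every discovery falls in the same year.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient (Pearson on ranks); None if either side is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined, e.g. all discoveries recorded in a single year
    return cov / (vx * vy) ** 0.5

# Hypothetical expert densities and discovery years for five materials.
densities = [0.30, 0.20, 0.10, 0.05, 0.01]
years = [2002, 2005, 2005, 2011, 2016]
rho = spearman(densities, years)
print(rho)
print(spearman(densities, [2010] * 5))
```

A negative value of `rho` corresponds to the pattern reported in the caption: higher expert density goes with earlier discovery.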

Source data

Extended Data Fig. 5 Distribution of human expert densities.

Distribution of human expert densities between discovery predictions and properties: (a) drug repurposing application (considering only the 67 diseases with statistically significant Spearman correlation coefficients, see Extended Data Fig. 4); (b-d) energy-related materials science properties, that is, thermoelectricity, ferroelectricity and photovoltaic capacity, respectively; and (e) therapies and vaccines for COVID-19. Curves measure normalized histograms over the logarithm of human expert densities plotted by fitting a Beta distribution over expert densities for predictions. Solid and dashed vertical lines represent mean values for corresponding densities. The distributions of human expert densities for hypergraph-induced metrics (transition probability and deepwalk-based similarity) are concentrated around larger Jaccard index values than those of word embedding models tracing content alone. In content models, all estimated densities peak at zero (0 < a < 1 < b, where a and b are the shape parameters of the Beta distributions). CTD diseases are sorted by average expert similarity between them and the complete pool of drugs.
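The statement that densities with 0 < a < 1 < b peak at zero follows from the shape of the Beta density, which decreases monotonically on (0, 1) in that regime. The sketch below evaluates the density directly. The shape parameters are arbitrary illustrations, not the fitted values from the source data.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) probability density at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

grid = [0.001, 0.01, 0.1, 0.5, 0.9]

# Hypothetical shape parameters with 0 < a < 1 < b: the mass piles up near zero.
near_zero = [beta_pdf(x, 0.5, 3.0) for x in grid]

# Hypothetical shape parameters with a, b > 1: interior mode at (a - 1) / (a + b - 2).
interior = [beta_pdf(x, 2.0, 5.0) for x in grid]

for x, p0, p1 in zip(grid, near_zero, interior):
    print(f"x={x:<6} pdf(a=0.5, b=3)={p0:8.3f}  pdf(a=2, b=5)={p1:8.3f}")
```

With a < 1 the factor x^(a−1) grows without bound as x approaches zero, and with b > 1 the factor (1−x)^(b−1) shrinks as x grows, so the density has no interior peak.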

Source data

Extended Data Fig. 6 Precision-Recall area under the curve for predicting human discoverers.

Precision-Recall Area Under the Curve (PR-AUC) for predicting the human experts who will discover (discoverers of) materials possessing the following specific properties: (a) thermoelectrics, (b) ferroelectrics, and (c) photovoltaics. Materials selected were among True Positive discovery predictions of our deepwalk-based predictor (α=1). Our evaluation compares scores assigned to candidates and actual discovering experts who ultimately discovered and published the property associated with True Positives. We developed a deepwalk-based scoring function for this purpose. Expert candidates we considered here are those sampled at least once in deepwalk trajectories, produced over our five-year hypergraph. For a discovered material, scores were computed based on the proximity of experts to both property and material. An expert is a good candidate discoverer if she is close (in cosine similarity) to both property and material nodes in the embedding space. Discovered associations whose discoverers were not present in sampled deepwalk trajectories were ignored. In order to summarize the two similarities and generate a single set of human expert predictions, we ranked experts based on their proximity to the property and the material and combined the two rankings using average aggregation. This ranking was used as the final expert score in our PR-AUC computations. We compared the log-PR-AUC of this algorithm with a random selection of experts and also with a curve simulating an imaginary method whose log-PR-AUC is five times higher than the random baseline. Results reveal that predictions were notably superior to random expert selection for all electrochemical properties.
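The expert-scoring step described above (cosine proximity to both nodes, then average aggregation of the two rankings) can be sketched as follows. The two-dimensional embeddings, expert names and function names are hypothetical, and breaking ties by name is an assumption of this example rather than something stated in the caption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rank_positions(scores):
    """Map each expert to its rank (1 = most similar)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {expert: rank for rank, expert in enumerate(ordered, start=1)}

def rank_discoverers(expert_vecs, property_vec, material_vec):
    """Order experts by the mean of their property rank and material rank."""
    by_property = rank_positions({e: cosine(v, property_vec) for e, v in expert_vecs.items()})
    by_material = rank_positions({e: cosine(v, material_vec) for e, v in expert_vecs.items()})
    mean_rank = {e: (by_property[e] + by_material[e]) / 2 for e in expert_vecs}
    # Assumption: ties in mean rank are broken alphabetically for determinism.
    return sorted(expert_vecs, key=lambda e: (mean_rank[e], e))

# Hypothetical 2-D embeddings standing in for deepwalk vectors.
property_vec = (1.0, 0.5)
material_vec = (0.5, 1.0)
experts = {
    "x": (1.0, 1.0),    # close to both property and material
    "y": (1.0, -0.5),   # close to the property only
    "w": (-0.5, 1.0),   # close to the material only
    "z": (-1.0, -1.0),  # far from both
}
ranking = rank_discoverers(experts, property_vec, material_vec)
print(ranking)
```

The expert who is near both nodes comes out on top, ahead of experts who are near only one of them, which matches the stated notion of a good candidate discoverer.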

Source data

Extended Data Fig. 7 Decaying discoverability in complementary predictions.

Illustration of decaying discoverability for predictions as β, the parameter for human expert avoidance, increases. Discoverability of predictions is measured through computing the precision metric, that is, their overlapping percentage with respect to actual discoveries made after prediction year. Decreasing precision curves and their highly negative Pearson correlation coefficients are shown for (a) thermoelectricity, (b) ferroelectricity, (c) photovoltaics and (d) COVID-19. We also visualize these statistics for the remaining human diseases with a scatterplot of their Pearson correlation coefficients (e).
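A minimal sketch of the two quantities in this caption: precision as the share of predictions that overlap later discoveries, and the Pearson correlation between β and precision. The β values, material identifiers and function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def precision(predictions, discoveries):
    """Share of predictions that appear among the actual later discoveries."""
    predictions = list(predictions)
    if not predictions:
        return 0.0
    hits = sum(1 for p in predictions if p in discoveries)
    return hits / len(predictions)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical discoveries published after the prediction year.
discoveries = {"m1", "m2", "m3", "m4"}

# Hypothetical prediction lists for increasing expert avoidance (beta).
predictions_by_beta = {
    0.0: ["m1", "m2", "m3", "m9"],
    0.2: ["m1", "m2", "m8", "m9"],
    0.4: ["m1", "m7", "m8", "m9"],
    0.6: ["m6", "m7", "m8", "m9"],
}

betas = sorted(predictions_by_beta)
precisions = [precision(predictions_by_beta[b], discoveries) for b in betas]
r = pearson(betas, precisions)
print(precisions, r)
```

In this toy setup each step in β replaces one eventually discovered material with an undiscovered one, so precision falls linearly and the correlation is strongly negative.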

Source data

Extended Data Fig. 8 Discoverability and scientific merit among drug repurposing predictions.

Discoverability and scientific merit for predictions made with varying β values, our parameter for human expert avoidance, in research that repurposes drugs to treat human disease. (a) Precision values for predictions generated with eight levels of β and computed for all 400 human diseases we considered (except COVID-19). Diseases are sorted in terms of the number of relevant drugs. (b) Average theoretical scores measured through protein-protein similarity between diseases and candidate drugs for predictions generated with the same β values. We compute protein-based theoretical scores for 176 diseases out of 400 total cases (44%). In both subfigures, horizontal lines show average values across all diseases.

Source data

Extended Data Table 1 High-frequency MeSH terms appearing in COVID-19 random walks
Extended Data Table 2 True positive predictions for our expert-aware deepwalk algorithm and the word2vec baseline for COVID-19

Supplementary information

Supplementary Information

Supplementary Discussion, Figs. 1–5 and Table 1.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 2

Precision values of predictions.

Source Data Fig. 3

Rank ratio of true positive predictions made by our deepwalk algorithm and not by the baseline.

Source Data Fig. 4

Precision shifts in discovery predictions due to adding authors; precision of predicting discoverers of materials possessing a certain property.

Source Data Fig. 6

Average discovery wait times.

Source Data Fig. 7

Overlapping percentages (precision) and average theoretical scores for predictions generated with different β values.

Source Data Fig. 8

Expectation gaps; joint probability of undiscoverability and plausibility.

Source Data Extended Data Fig. 1

Sanity checks on our hypergraph-induced transition probability similarity metric: between authors and a conceptual node, and between two conceptual nodes.

Source Data Extended Data Fig. 2

Precision–recall curves and area under the curves for predictions made for different properties.

Source Data Extended Data Fig. 4

Spearman correlation coefficients between expert density of properties and materials and their date of discovery.

Source Data Extended Data Fig. 5

Parameters of beta distributions fitted to expert densities of different properties.

Source Data Extended Data Fig. 6

Precision–recall area under the curve for predicting discoverers of a property in a particular material.

Source Data Extended Data Fig. 7

Discoverability (precision) for predictions for different β values.

Source Data Extended Data Fig. 8

Discoverability and scientific merit (plausibility) for predictions made with different β values.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sourati, J., Evans, J.A. Accelerating science with human-aware artificial intelligence. Nat Hum Behav 7, 1682–1696 (2023). https://doi.org/10.1038/s41562-023-01648-z

  • DOI: https://doi.org/10.1038/s41562-023-01648-z
