
The Arrowsmith Project: 2005 Status Report

  • Conference paper
Discovery Science (DS 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3735)


Abstract

In the 1980s, Don Swanson proposed the concept of “undiscovered public knowledge,” and published several examples in which two disparate literatures (i.e., sets of articles having no papers in common, no authors in common, and few cross-citations) nevertheless held complementary pieces of knowledge that, when brought together, made compelling and testable predictions about potential therapies for human disorders. In the 1990s, Don and I published more predictions together and created a computer-assisted search strategy (“Arrowsmith”). At first, the so-called one-node search was emphasized, in which one begins with a single literature (e.g., that dealing with a disease) and searches for a second unknown literature having complementary knowledge (e.g., that dealing with potential therapies). However, we soon realized that the two-node search is better aligned with the information practices of most biomedical investigators: in this case, the user chooses two literatures and then seeks to identify meaningful links between them. Could typical biomedical investigators learn to carry out Arrowsmith analyses? Would they find routine occasions for using such a sophisticated tool? Would they uncover significant links that affect their experiments? Four years ago, we initiated a project to answer these questions, working with several neuroscience field testers. Initially we expected that investigators would spend several days learning how to carry out searches, and would spend several days analyzing each search. Instead, we completely re-designed the user interface, the back-end databases, and the methods of processing linking terms, so that investigators could use Arrowsmith without any tutorial at all, and requiring only minutes to carry out a search. The Arrowsmith Project now hosts a suite of free, public tools. It has launched new research spanning medical informatics, genomics and social informatics, and has, indeed, assisted investigators in formulating new experiments, with direct impact on basic science and neurological diseases.
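
The two-node search described in the abstract can be illustrated with a minimal sketch. This is not the Arrowsmith implementation: the function names, the tiny stop list, the ranking rule, and the toy titles below are all assumptions made up for illustration. The sketch takes two literatures (here, lists of article titles) and returns the terms that appear in both, which is the simplest form of a "linking term."

```python
import re

# Hypothetical, deliberately tiny stop list (an assumption for this sketch).
STOPWORDS = {"and", "in", "of", "on", "the", "with", "by", "is", "to", "for"}


def terms(title):
    """Return the set of lowercase word tokens in a title, minus stopwords."""
    words = re.findall(r"[a-z][a-z0-9-]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def doc_counts(titles):
    """Map each term to the number of titles in which it occurs."""
    counts = {}
    for title in titles:
        for w in terms(title):
            counts[w] = counts.get(w, 0) + 1
    return counts


def linking_terms(literature_a, literature_c):
    """Terms shared by both literatures, ranked by the smaller of the two
    title counts (descending), with ties broken alphabetically."""
    a = doc_counts(literature_a)
    c = doc_counts(literature_c)
    shared = a.keys() & c.keys()
    return sorted(shared, key=lambda w: (-min(a[w], c[w]), w))


# Invented toy titles (not real MEDLINE records), loosely echoing the
# fish oil / Raynaud's syndrome example.
literature_a = [
    "Fish oil reduces blood viscosity",
    "Dietary fish oil and platelet aggregation",
]
literature_c = [
    "Blood viscosity in Raynaud syndrome",
    "Platelet aggregation in Raynaud patients",
    "Raynaud syndrome and cold exposure",
]

print(linking_terms(literature_a, literature_c))
```

A one-node search would differ in that only `literature_a` is given and the second literature has to be discovered; the sketch covers only the two-node case, where the user supplies both sets.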




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smalheiser, N.R. (2005). The Arrowsmith Project: 2005 Status Report. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_5


  • DOI: https://doi.org/10.1007/11563983_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29230-2

  • Online ISBN: 978-3-540-31698-5

  • eBook Packages: Computer Science, Computer Science (R0)
