A Text Mining-Based Framework for Constructing an RDF-Compliant Biodiversity Knowledge Repository

  • Conference paper
Information Management and Big Data (SIMBig 2015, SIMBig 2016)

Abstract

To make the information contained in biodiversity literature more accessible and searchable, we have developed a text mining-based framework for automatically transforming text into a structured knowledge repository. A text mining workflow employing information extraction techniques, namely named entity recognition and relation extraction, was implemented in the Argo platform and subsequently applied to biodiversity literature to extract structured information. The resulting annotations were stored in a repository following the emerging Open Annotation standard, thus promoting interoperability with external applications. Accessible as a SPARQL endpoint, the repository facilitates knowledge discovery over a large body of biodiversity literature by retrieving annotations that match user-specified queries. We present some use cases to illustrate the types of queries that the knowledge repository currently accommodates.
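
As a rough illustration of how such a repository might be queried, the sketch below builds a SPARQL query over Open Annotation triples and encodes it as a GET request URL following the SPARQL Protocol. It is a minimal sketch under stated assumptions, not the authors' implementation: the endpoint URL is the one listed in the Notes, the `oa:` terms come from the Open Annotation vocabulary, and the helper names `build_query` and `build_request_url` are hypothetical. The repository's actual graph layout is not described in this preview, so the triple pattern may need adjusting.

```python
from urllib.parse import urlencode

# Endpoint as listed in the Notes; this sketch does not assume it is still live.
ENDPOINT = "http://nactem.ac.uk/copious-demo/annotations/sparql"


def build_query(limit: int = 10) -> str:
    """Return a SPARQL query listing annotations with their bodies and targets.

    Assumption: annotations are typed oa:Annotation and linked to their
    content through oa:hasBody and oa:hasTarget, as in Open Annotation.
    """
    return (
        "PREFIX oa: <http://www.w3.org/ns/oa#>\n"
        "SELECT ?annotation ?body ?target WHERE {\n"
        "  ?annotation a oa:Annotation ;\n"
        "              oa:hasBody ?body ;\n"
        "              oa:hasTarget ?target .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query in the 'query' parameter of a SPARQL Protocol GET URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL only; no network request is made here.
    print(build_request_url(build_query(5)))
```

The resulting URL could be passed to any HTTP client, typically with an `Accept: application/sparql-results+json` header to request JSON results.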

Notes

  1. http://www.biodiversitylibrary.org.

  2. http://argo.nactem.ac.uk.

  3. http://wiki.miningbiodiversity.org/doku.php?id=guidelines.

  4. http://www.openannotation.org/spec/core.

  5. https://jena.apache.org/documentation/tdb.

  6. https://jena.apache.org/documentation/fuseki2.

  7. http://nactem.ac.uk/copious-demo/annotations/sparql.

  8. https://www.getpostman.com.

Acknowledgments

We would like to thank Prof. Marilou Nicolas for her valuable input. This work is funded by the British Council [172722806 (COPIOUS)], and is partially supported by the Engineering and Physical Sciences Research Council [EP/1038099/1 (CDT)].

Author information

Corresponding author

Correspondence to Sophia Ananiadou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Batista-Navarro, R., Zerva, C., Nguyen, N.T.H., Ananiadou, S. (2017). A Text Mining-Based Framework for Constructing an RDF-Compliant Biodiversity Knowledge Repository. In: Lossio-Ventura, J., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2015, SIMBig 2016. Communications in Computer and Information Science, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-319-55209-5_3

  • DOI: https://doi.org/10.1007/978-3-319-55209-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55208-8

  • Online ISBN: 978-3-319-55209-5

  • eBook Packages: Computer Science, Computer Science (R0)
