Extraction and Characterization of Citations in Scientific Papers

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 475)

Abstract

We propose a hybrid method for the extraction and characterization of citations in scientific papers that combines machine learning with rule-based approaches. Our protocol consists of metadata extraction, bibliography parsing, section title processing, and fine-grained semantic annotation of texts at the sentence level. This allows us to generate Linked Open Data from a set of research papers in XML.
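To make the shape of such a pipeline concrete, here is a minimal sketch of two of the steps named above: rule-based section title processing and the generation of citation triples from XML. It is not the authors' implementation. The element names (`sec`, `title`, `xref` with `ref-type="bibr"`) assume a JATS-like input, the base URI `http://example.org/paper/` and the function names are hypothetical, and the use of the CiTO property `cito:cites` for the output is an assumption made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: citations are expressed with the CiTO "cites" property.
CITO_CITES = "http://purl.org/spar/cito/cites"
# Hypothetical base URI, used only for this illustration.
BASE = "http://example.org/paper/"

# Tiny stand-in for a research paper in a JATS-like XML format (assumed).
SAMPLE_XML = """<article>
  <body>
    <sec>
      <title>1. Introduction</title>
      <p>Prior work parsed reference strings <xref ref-type="bibr" rid="B2">[2]</xref>.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>We reuse a tagger <xref ref-type="bibr" rid="B5">[5]</xref> and rules <xref ref-type="bibr" rid="B6">[6]</xref>.</p>
    </sec>
  </body>
</article>"""


def normalize_title(title):
    """Rule-based step: drop leading numbering such as '1.' or '2.3' and lowercase."""
    return re.sub(r"^\s*\d+(\.\d+)*\.?\s*", "", title).strip().lower()


def extract_citations(xml_text, paper_id):
    """Return (section, N-Triples line) pairs, one per in-text citation marker.

    Nested sections are not handled specially in this sketch.
    """
    root = ET.fromstring(xml_text)
    results = []
    for sec in root.iter("sec"):
        title_el = sec.find("title")
        raw_title = title_el.text if title_el is not None and title_el.text else ""
        section = normalize_title(raw_title)
        for xref in sec.iter("xref"):
            if xref.get("ref-type") != "bibr":
                continue  # keep only bibliographic cross-references
            subject = f"<{BASE}{paper_id}>"
            obj = f"<{BASE}{paper_id}/ref/{xref.get('rid')}>"
            results.append((section, f"{subject} <{CITO_CITES}> {obj} ."))
    return results


if __name__ == "__main__":
    for section, triple in extract_citations(SAMPLE_XML, "p1"):
        print(section, triple)
```

In this sketch the section label is kept alongside each triple so that a later step could characterize the citation by where it occurs in the paper; sentence-level semantic annotation, which the abstract also mentions, is not attempted here.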

Notes

  1. PubMed ID Converter API: http://www.ncbi.nlm.nih.gov/pmc/tools/id-converter-api/

  2. The D2RQ Platform is a system for accessing relational databases as RDF graphs: http://d2rq.org/

  3. http://purl.org/spar/biro

Acknowledgments

We thank Angelo Di Iorio at the Department of Computer Science and Engineering (DISI) of the University of Bologna for providing the gold standard and the evaluation.

Author information

Correspondence to Marc Bertin.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bertin, M., Atanassova, I. (2014). Extraction and Characterization of Citations in Scientific Papers. In: Presutti, V., et al. Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham. https://doi.org/10.1007/978-3-319-12024-9_16

  • DOI: https://doi.org/10.1007/978-3-319-12024-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12023-2

  • Online ISBN: 978-3-319-12024-9

  • eBook Packages: Computer Science, Computer Science (R0)
