Extraction and Characterization of Citations in Scientific Papers

Bertin, Marc; Atanassova, Iana

doi:10.1007/978-3-319-12024-9_16

Marc Bertin & Iana Atanassova

Conference paper
First Online: 01 January 2014

Part of the book series: Communications in Computer and Information Science (CCIS, volume 475)

Abstract

We propose a hybrid method for the extraction and characterization of citations in scientific papers that combines machine learning with rule-based approaches. Our protocol consists of metadata extraction, bibliography parsing, section title processing, and fine-grained semantic annotation of texts at the sentence level. This allows us to generate Linked Open Data from a set of research papers in XML.
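
As a rough illustration of the sentence-level step, the sketch below walks a small XML paper, finds the sentences that contain in-text citations, and assigns each citation a property from the CiTO ontology (http://purl.org/spar/cito/). It is not the authors' implementation: the element names (`sec`, `p`, `xref`, `rid`) are assumed to follow a JATS-like schema, the cue phrases in `RULES` are hypothetical, and only a rule-based pass is shown, with no machine learning component.

```python
import re
import xml.etree.ElementTree as ET

CITO = "http://purl.org/spar/cito/"

# Hypothetical cue phrases mapped to CiTO properties (assumption for this
# sketch; the linguistic rules used in the paper are not listed here).
RULES = [
    (re.compile(r"\b(we use|using|based on|following)\b", re.I), "usesMethodIn"),
    (re.compile(r"\b(in contrast to|unlike|contrary to)\b", re.I), "disagreesWith"),
    (re.compile(r"\b(confirm|consistent with|in line with)\b", re.I), "agreesWith"),
]

# Tiny made-up input document in an assumed JATS-like layout.
SAMPLE = (
    '<article id="paper1"><sec><title>Methods</title>'
    '<p>We use the parser of <xref rid="ref2"/> for bibliography parsing. '
    'Our results are consistent with <xref rid="ref1"/>.</p>'
    "</sec></article>"
)


def flatten(paragraph):
    """Return the paragraph text with each <xref> replaced by a [[rid]] marker."""
    parts = [paragraph.text or ""]
    for child in paragraph:
        if child.tag == "xref":
            parts.append("[[%s]]" % child.get("rid"))
        parts.append(child.tail or "")
    return "".join(parts)


def classify(sentence):
    """Return the first matching CiTO property, or the generic 'cites'."""
    for pattern, prop in RULES:
        if pattern.search(sentence):
            return prop
    return "cites"


def extract_triples(xml_text):
    """Yield (citing paper, CiTO property URI, cited reference id) tuples."""
    root = ET.fromstring(xml_text)
    paper = root.get("id")
    triples = []
    for sec in root.iter("sec"):
        for paragraph in sec.findall("p"):
            text = flatten(paragraph).strip()
            # Naive sentence split on terminal punctuation followed by space.
            for sentence in re.split(r"(?<=[.!?])\s+", text):
                for rid in re.findall(r"\[\[(.+?)\]\]", sentence):
                    triples.append((paper, CITO + classify(sentence), rid))
    return triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

Each tuple can then be serialized as an RDF triple once the paper and reference identifiers are mapped to URIs; that mapping is left out of the sketch.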

Notes

1. PubMed ID Converter API: http://www.ncbi.nlm.nih.gov/pmc/tools/id-converter-api/
2. The D2RQ Platform is a system for accessing relational databases as RDF graphs: http://d2rq.org/
3. http://purl.org/spar/biro

Acknowledgments

We thank Angelo Di Iorio at the Department of Computer Science and Engineering (DISI) of the University of Bologna for providing the gold standard and the evaluation.

Author information

Authors and Affiliations

CIRST, Université du Québec à Montréal, B.P. 8888, Succ. Centre-ville, Montreal, QC, H3C 3P8, Canada
Marc Bertin & Iana Atanassova

Corresponding author

Correspondence to Marc Bertin.

Editor information

Editors and Affiliations

Semantic Technology Laboratory, ISTC-CNR, Rome, Italy
Valentina Presutti
Université Paris-Sorbonne, Paris, France
Milan Stankovic
School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
Erik Cambria
Universidad Autónoma de Madrid, Madrid, Spain
Iván Cantador
University of Bologna, Bologna, Italy
Angelo Di Iorio
Polytechnic University of Bari, Bari, Italy
Tommaso Di Noia
University of Birmingham, Birmingham, United Kingdom
Christoph Lange
ISTC-CNR, Semantic Technology Laboratory, Rome, Italy
Diego Reforgiato Recupero
Elsevier B.V., Amsterdam, The Netherlands
Anna Tordai

About this paper

Cite this paper

Bertin, M., Atanassova, I. (2014). Extraction and Characterization of Citations in Scientific Papers. In: Presutti, V., et al. Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham. https://doi.org/10.1007/978-3-319-12024-9_16

DOI: https://doi.org/10.1007/978-3-319-12024-9_16
Published: 04 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12023-2
Online ISBN: 978-3-319-12024-9
eBook Packages: Computer Science, Computer Science (R0)
