Abstract
We describe how we developed and applied two annotation schemes for journal articles in the field of chemistry. The first sets out the criteria for identifying a chemical named entity and assigning it a “type”: roughly speaking, deciding whether it is a small molecular species, a process that a small molecular species might be involved in, an enzyme, or an adjective or a prefix. The second assigns each of these chemical named entities a “subtype” that describes the nature of the reference, for example whether “imidazole” refers to the imidazole molecule itself, the imidazole motif within a larger molecule, or any of a family of molecules bearing the imidazole motif. We also describe how these guidelines and the resulting corpora and software have subsequently been used.
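As a rough illustration of the two-level scheme summarised above, an annotation can be thought of as a text span carrying a type and a subtype. The sketch below is only an assumption about how such records might be represented: the enum member names, the `ChemicalEntity` class, the `annotate` helper, and the example sentences are hypothetical and are not the tag names or data used in the chapter.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    # Hypothetical labels paraphrasing the categories listed in the abstract.
    SMALL_MOLECULE = "small molecular species"
    PROCESS = "process a small molecular species might be involved in"
    ENZYME = "enzyme"
    ADJECTIVE = "adjective"
    PREFIX = "prefix"


class ReferenceSubtype(Enum):
    # Hypothetical labels for the three kinds of reference in the abstract.
    MOLECULE_ITSELF = "the molecule itself"
    MOTIF_IN_LARGER_MOLECULE = "a motif within a larger molecule"
    FAMILY_OF_MOLECULES = "any of a family of molecules bearing the motif"


@dataclass(frozen=True)
class ChemicalEntity:
    text: str          # surface string of the mention
    start: int         # character offset of the first character
    end: int           # character offset one past the last character
    entity_type: EntityType
    subtype: ReferenceSubtype


def annotate(sentence: str, mention: str,
             entity_type: EntityType,
             subtype: ReferenceSubtype) -> ChemicalEntity:
    """Build an annotation for the first occurrence of `mention`."""
    start = sentence.find(mention)
    if start < 0:
        raise ValueError(f"{mention!r} not found in sentence")
    return ChemicalEntity(mention, start, start + len(mention),
                          entity_type, subtype)


# Invented example sentences, one per kind of reference.
sentences = [
    ("Imidazole was added to the solution.", "Imidazole",
     ReferenceSubtype.MOLECULE_ITSELF),
    ("The imidazole ring of histidine binds the metal.", "imidazole",
     ReferenceSubtype.MOTIF_IN_LARGER_MOLECULE),
    ("Several substituted imidazoles were screened.", "imidazoles",
     ReferenceSubtype.FAMILY_OF_MOLECULES),
]

examples = [
    annotate(sentence, mention, EntityType.SMALL_MOLECULE, subtype)
    for sentence, mention, subtype in sentences
]

for entity in examples:
    print(entity.text, entity.start, entity.end, entity.subtype.name)
```

Keeping the type and the subtype as separate fields mirrors the abstract's description of two distinct schemes: the same surface string can keep one type while its subtype varies with context.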
Acknowledgements
We thank the UK eScience Programme and EPSRC (EP/C010035/1) for funding.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Batchelor, C., Corbett, P., Teufel, S. (2017). Case Study: Chemistry. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_33
DOI: https://doi.org/10.1007/978-94-024-0881-2_33
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)