Information and Relation Extraction for Semantic Annotation of eBook Texts

Uddin, Ashraf; Piryani, Rajesh; Singh, Vivek Kumar

doi:10.1007/978-3-319-01778-5_22

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 235)

Abstract

This paper presents our algorithmic approach to information and relation extraction from unstructured texts (such as eBook sections or webpages), to performing other useful analytics on the text, and to automatically generating a semantically meaningful structure (RDF schema). Our algorithmic formulation parses the unstructured text of an eBook and identifies the key concepts described in it, along with the relationships between those concepts. The extracted information is then used for four purposes: (a) generating computed metadata about the text source (such as the readability of an eBook), (b) generating a concept profile for each distinct part of the text, (c) identifying and plotting relationships between the key concepts described in the text, and (d) generating an RDF representation of the text source. We ran our experiments on eBook texts from the Computer Science domain; however, the approach can be applied to other forms of text in other domains as well. The results are useful not only for concept-based tagging and navigation of unstructured text documents (such as eBooks) but also for designing a comprehensive and sophisticated learning recommendation system.
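
To make purpose (d) concrete, the following is a minimal sketch of how extracted concept relations might be written out as RDF in N-Triples form. It is an illustration under stated assumptions, not the authors' implementation: the namespace `http://example.org/ebook/`, the helper names `to_iri` and `to_ntriples`, and the sample relations are all hypothetical, and the abstract does not say which RDF serialization or vocabulary the paper uses.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; the paper does not specify one.
BASE = "http://example.org/ebook/"


def to_iri(label: str) -> str:
    """Turn a free-text label into an IRI under the assumed namespace."""
    slug = label.strip().lower().replace(" ", "_")
    return f"<{BASE}{quote(slug)}>"


def to_ntriples(triples):
    """Serialize (subject, relation, object) string tuples as N-Triples lines."""
    lines = []
    for subj, rel, obj in triples:
        lines.append(f"{to_iri(subj)} {to_iri(rel)} {to_iri(obj)} .")
    return "\n".join(lines)


# Made-up relations of the kind an extractor might find in a
# Computer Science eBook; these are not results from the paper.
extracted = [
    ("binary search tree", "is a", "data structure"),
    ("quicksort", "uses", "partitioning"),
]

print(to_ntriples(extracted))
```

Each extracted relation becomes one subject-predicate-object statement, which is the shape any RDF store or concept-navigation tool would consume. A real pipeline would more likely map relations onto an established vocabulary than mint one IRI per relation phrase.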

Author information

Authors and Affiliations

Department of Computer Science, South Asian University, New Delhi, India
Ashraf Uddin, Rajesh Piryani & Vivek Kumar Singh

Editor information

Editors and Affiliations

Technopark Campus Trivandrum, Indian Inst. of Information Technology and Management – Kerala (IIITM-K), Kerala, India
Sabu M. Thampi
Machine Intelligence Research Labs (MIR Labs), Auburn, USA
Ajith Abraham
Indian Statistical Institute, Kolkata, India
Sankar Kumar Pal
Department of Computer Science, School of Science, University of Salamanca, Salamanca, Spain
Juan Manuel Corchado Rodriguez

About this paper

Cite this paper

Uddin, A., Piryani, R., Singh, V.K. (2014). Information and Relation Extraction for Semantic Annotation of eBook Texts. In: Thampi, S., Abraham, A., Pal, S., Rodriguez, J. (eds) Recent Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 235. Springer, Cham. https://doi.org/10.1007/978-3-319-01778-5_22

DOI: https://doi.org/10.1007/978-3-319-01778-5_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01777-8
Online ISBN: 978-3-319-01778-5
eBook Packages: Engineering, Engineering (R0)
