Abstract
This paper presents our algorithmic approach for extracting information and relations from unstructured text (such as eBook sections or webpages), performing other useful analytics on the text, and automatically generating a semantically meaningful structure (an RDF schema). Our algorithmic formulation parses the unstructured text of an eBook and identifies the key concepts it describes, along with the relationships between those concepts. The extracted information is then used for four purposes: (a) generating computed metadata about the text source (such as the readability of an eBook), (b) generating a concept profile for each distinct part of the text, (c) identifying and plotting the relationships between the key concepts described in the text, and (d) generating an RDF representation of the text source. We ran our experiments on eBook texts from the Computer Science domain; however, the approach can be applied to other forms of text in other domains as well. The results are useful not only for concept-based tagging and navigation of unstructured text documents (such as eBooks) but also for designing a comprehensive and sophisticated learning recommendation system.
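As a rough illustration of purpose (d), the sketch below shows one way extracted concept relations could be written out as RDF in N-Triples form. It is not the authors' implementation: the `BASE` namespace, the helper names `to_iri` and `to_ntriples`, and the sample relations are all assumptions made for this example, and the abstract does not say which RDF serialization or vocabulary the paper uses.

```python
from urllib.parse import quote

# Hypothetical namespace for this example only; the paper does not specify one.
BASE = "http://example.org/ebook/"


def to_iri(label: str) -> str:
    """Turn a free-text label into an IRI under the assumed BASE namespace."""
    # Lowercase the label and join its words with underscores.
    slug = "_".join(label.strip().lower().split())
    # Percent-encode any character that is not safe inside an IRI path.
    return f"<{BASE}{quote(slug)}>"


def to_ntriples(relations):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return [f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in relations]


# Made-up relations of the kind a Computer Science eBook might yield.
relations = [
    ("binary search tree", "is a", "data structure"),
    ("quick sort", "uses", "divide and conquer"),
]

lines = to_ntriples(relations)
for line in lines:
    print(line)
```

Each printed line is one subject, predicate, object statement, so the output can be loaded by any RDF tool that reads N-Triples. A real pipeline would more likely map predicates onto an existing vocabulary than mint a new IRI for every relation phrase.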
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Uddin, A., Piryani, R., Singh, V.K. (2014). Information and Relation Extraction for Semantic Annotation of eBook Texts. In: Thampi, S., Abraham, A., Pal, S., Rodriguez, J. (eds) Recent Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 235. Springer, Cham. https://doi.org/10.1007/978-3-319-01778-5_22
DOI: https://doi.org/10.1007/978-3-319-01778-5_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01777-8
Online ISBN: 978-3-319-01778-5
eBook Packages: Engineering, Engineering (R0)