Abstract
The World Wide Web is a vast repository of data in the form of web pages, which are retrieved in response to a user query. These pages are often unstructured and heterogeneous. The main objective of this study is information extraction and summarization using ontology. The study proposes a new method, Structural Semantic Domain Ontology (SSDO), for effective information retrieval. The proposed system automatically extracts unstructured information from the repository and stores it in a search buffer; extraction is performed using a domain ontology. The main disadvantage of existing systems is that information extracted from various sources is not properly aligned, so a system may fail to determine where on a website the required information is located. The proposed approach overcomes this problem by adopting pair alignment, top-down alignment, and loop structure algorithms. To look up a piece of data, the user types the known detail, called a label, and the system extracts the matching information from the web page together with a proper description and additional details.
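The label-driven lookup described above can be illustrated with a minimal sketch. It is not the SSDO method itself: the toy `ONTOLOGY` dictionary, the function name `extract_by_label`, and the sample page text are all hypothetical, and a simple regular expression stands in for the alignment algorithms named in the abstract.

```python
import re

# Hypothetical toy "domain ontology": each label maps to surface forms
# that might appear on a page. These entries are illustrative only.
ONTOLOGY = {
    "price": ["price", "cost"],
    "author": ["author", "written by"],
}


def extract_by_label(label, page_text, ontology=ONTOLOGY):
    """Return the values that follow any surface form of `label` in page_text."""
    values = []
    # Fall back to the label itself when the ontology has no entry for it.
    for term in ontology.get(label.lower(), [label.lower()]):
        pattern = re.compile(
            rf"\b{re.escape(term)}\b\s*[:\-]?\s*([^\n,;]+)",
            re.IGNORECASE,
        )
        values.extend(m.group(1).strip() for m in pattern.finditer(page_text))
    return values


# Assumed sample of unstructured page text.
page = "Title: Web Mining\nWritten by: A. Kumar\nCost: 450 INR; in stock"

print(extract_by_label("author", page))
print(extract_by_label("price", page))
```

The ontology lets a single user label ("author") match differently worded fields on the page ("Written by"), which is the role the abstract assigns to the domain ontology during extraction.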
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Deepa, R., Chezian, R.M. (2016). An Involuntary Data Extraction and Information Summarization Expending Ontology. In: Dash, S., Bhaskar, M., Panigrahi, B., Das, S. (eds) Artificial Intelligence and Evolutionary Computations in Engineering Systems. Advances in Intelligent Systems and Computing, vol 394. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2656-7_6
DOI: https://doi.org/10.1007/978-81-322-2656-7_6
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2654-3
Online ISBN: 978-81-322-2656-7
eBook Packages: Engineering, Engineering (R0)