An Involuntary Data Extraction and Information Summarization Expending Ontology

Deepa, R.; Chezian, R. Manicka

doi:10.1007/978-81-322-2656-7_6

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 394)

Abstract

The World Wide Web is a vast repository of data in the form of web pages, which are retrieved in response to a user's query. These pages are often unstructured and inconsistent in format. The main objective of this study is information extraction and summarization using ontology. The study proposes a new method, Structural Semantic Domain Ontology (SSDO), for effective information retrieval. The proposed system automatically extracts unstructured information from the repository and stores it in a search buffer; extraction is performed using a domain ontology. The main disadvantage of existing systems is that information extracted from various sources is not properly aligned, and the system may fail to determine where on a website the exact information is located. The proposal addresses this problem by adopting pair alignment, top-down alignment, and loop structure algorithms. When a user needs a piece of data, the user types a term, known as a label, and the system extracts the matching information from the web pages together with a proper description and additional details.
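
The abstract does not spell out the SSDO algorithms, so the following is only a minimal illustrative sketch of the label-driven lookup it describes, not the authors' method. It assumes a tiny hand-built domain ontology held in memory and uses plain substring matching in place of the pair alignment, top-down alignment, and loop structure steps. The names `Concept`, `build_index`, and `extract_by_label`, and the sample URLs, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One entry in a toy domain ontology (hypothetical structure)."""
    label: str
    description: str
    synonyms: tuple = ()
    details: dict = field(default_factory=dict)


def build_index(concepts):
    """Map every lower-cased label and synonym to its concept."""
    index = {}
    for concept in concepts:
        for name in (concept.label, *concept.synonyms):
            index[name.lower()] = concept
    return index


def extract_by_label(index, pages, label):
    """Look up a user-typed label and return its description, details,
    and the URLs of pages whose text mentions the label or a synonym.

    Returns None when the label is not in the ontology.
    """
    concept = index.get(label.strip().lower())
    if concept is None:
        return None
    names = [n.lower() for n in (concept.label, *concept.synonyms)]
    hits = [url for url, text in pages.items()
            if any(name in text.lower() for name in names)]
    return {
        "label": concept.label,
        "description": concept.description,
        "details": concept.details,
        "pages": sorted(hits),
    }


# Assumed sample data: one concept and two already-fetched pages.
ontology = build_index([
    Concept("laptop", "A portable personal computer.",
            synonyms=("notebook",), details={"category": "electronics"}),
])
search_buffer = {
    "http://example.com/a": "Cheap Notebook deals this week",
    "http://example.com/b": "Garden tools and supplies",
}

result = extract_by_label(ontology, search_buffer, " Notebook ")
print(result)
```

Here the dictionary `search_buffer` stands in for the search buffer mentioned in the abstract. A fuller system would replace the substring test with structural alignment of the extracted records.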

Author information

Authors and Affiliations

Research Department of Computer Science, NGM College, Pollachi, Coimbatore, India
R. Deepa & R. Manicka Chezian

Corresponding author

Correspondence to R. Deepa.

Editor information

Editors and Affiliations

Electrical and Electronics Engineer, SRM Engineering College, Kattankulathur, Tamil Nadu, India
Subhransu Sekhar Dash
Electrical & Electronics Engineering, Velammal Engineering College, Chennai, Tamil Nadu, India
M. Arun Bhaskar
Dept Electrical & Electronics Engg, IIT Delhi, New Delhi, India
Bijaya Ketan Panigrahi
Indian Statistical Institute, Kolkata, India
Swagatam Das

Cite this paper

Deepa, R., Chezian, R.M. (2016). An Involuntary Data Extraction and Information Summarization Expending Ontology. In: Dash, S., Bhaskar, M., Panigrahi, B., Das, S. (eds) Artificial Intelligence and Evolutionary Computations in Engineering Systems. Advances in Intelligent Systems and Computing, vol 394. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2656-7_6

DOI: https://doi.org/10.1007/978-81-322-2656-7_6
Published: 06 February 2016
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2654-3
Online ISBN: 978-81-322-2656-7
eBook Packages: Engineering, Engineering (R0)
