An Involuntary Data Extraction and Information Summarization Expending Ontology

  • Conference paper
Artificial Intelligence and Evolutionary Computations in Engineering Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 394)

Abstract

The World Wide Web is a vast repository of data in the form of web pages, which are retrieved in response to a query given by the user. These pages are often unstructured and non-uniform. The main objective of this study is information extraction and summarization using ontology. The study proposes a new method, Structural Semantic Domain Ontology (SSDO), for effective information retrieval. The proposed system automatically extracts unstructured information from the repository and stores it in a search buffer; the extraction itself is performed using a domain ontology. The main disadvantage of existing systems is that information extracted from various sources is not properly aligned, so a system may fail to determine exactly where the required information is located on a website. The proposal addresses this problem with three techniques: pair alignment, top-down alignment, and loop structure algorithms. When the user needs a piece of data, the user types a known detail, called a label, and the system then extracts the matching information from the web pages together with a proper description and additional details.
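The label-driven lookup described in the abstract can be illustrated with a minimal sketch. This is not the SSDO method itself, whose details the abstract does not give; it only shows the general idea of using a domain ontology to align differently labelled fields from several sources. The `ONTOLOGY` table and the functions `canonical_label`, `extract_record`, and `lookup` are hypothetical names introduced here, and the sketch assumes pages have already been reduced to loose "label: value" text.

```python
import re

# Hypothetical toy domain ontology (an assumption for illustration):
# canonical label -> label variants that different web sources might use.
ONTOLOGY = {
    "title": {"title", "name", "book title"},
    "author": {"author", "written by", "by"},
    "price": {"price", "cost", "our price"},
}


def canonical_label(raw_label):
    """Map a raw label to its canonical ontology label, or None if unknown."""
    key = raw_label.strip().lower()
    for concept, variants in ONTOLOGY.items():
        if key in variants:
            return concept
    return None


def extract_record(text):
    """Collect 'label: value' pairs from loosely formatted text,
    keeping only labels the ontology recognises."""
    record = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+)", line)
        if not match:
            continue
        concept = canonical_label(match.group(1))
        if concept is not None:
            record[concept] = match.group(2).strip()
    return record


def lookup(label, pages):
    """Return the values aligned to a user-supplied label across all pages."""
    concept = canonical_label(label)
    records = (extract_record(page) for page in pages)
    return [rec[concept] for rec in records if concept in rec]
```

For example, given two pages that label the same field as "Written by" and "Author", `lookup("author", pages)` returns both values in page order, because both labels map to the same ontology concept. A label the ontology does not know yields an empty list.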

Author information

Corresponding author

Correspondence to R. Deepa.

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Deepa, R., Chezian, R.M. (2016). An Involuntary Data Extraction and Information Summarization Expending Ontology. In: Dash, S., Bhaskar, M., Panigrahi, B., Das, S. (eds) Artificial Intelligence and Evolutionary Computations in Engineering Systems. Advances in Intelligent Systems and Computing, vol 394. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2656-7_6

  • DOI: https://doi.org/10.1007/978-81-322-2656-7_6

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2654-3

  • Online ISBN: 978-81-322-2656-7

  • eBook Packages: Engineering, Engineering (R0)
