CBDIR: Fast and effective content based document Information Retrieval system | IEEE Conference Publication | IEEE Xplore

Abstract:

The continuing growth of information overload has made it hard to obtain valuable information on the web, and the need for effective Information Retrieval (IR) techniques has increased accordingly. Although documents contain far more information than their metadata, conventional web services let users retrieve information only from the title and description. To meet the demand for fast and accurate retrieval of valuable information, we propose a content-based document information retrieval system that retrieves information from the actual content of a document. The proposed method is based on the Latent Dirichlet Allocation topic model, which is used to extract the major keywords of a given document. The main contributions of our system are increased flexibility, effectiveness, and retrieval speed. Our system can easily communicate with existing web services through the standard JSON format. In addition, we increase the speed of information retrieval by using a NoSQL-based database system with inverted indexing and B-tree based indexing. We validate the performance of our system on real data collected from the SlideShare service. The proposed system shows better retrieval performance than the existing IR system.
Date of Conference: 28 June 2015 - 01 July 2015
Date Added to IEEE Xplore: 27 July 2015
Electronic ISBN: 978-1-4799-8679-8
Conference Location: Las Vegas, NV
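
The abstract describes keyword-based retrieval over an inverted index with JSON as the exchange format. The following is a minimal sketch of that general idea, not the system described in the paper: it assumes the keywords have already been extracted (for example by a topic model), uses an in-memory dictionary in place of a NoSQL store, and ranks by a simple count of matched keywords. The names `build_inverted_index`, `search`, and `DOC_KEYWORDS`, and the JSON shapes `{"keywords": [...]}` and `{"results": [...]}`, are hypothetical and chosen for illustration only.

```python
import json
from collections import defaultdict

# Hypothetical per-document keywords, standing in for topic-model output.
DOC_KEYWORDS = {
    "doc1": ["retrieval", "index", "topic"],
    "doc2": ["topic", "model", "dirichlet"],
    "doc3": ["database", "index"],
}


def build_inverted_index(doc_keywords):
    """Map each lowercased keyword to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(doc_id)
    return index


def search(index, query_json):
    """Answer a JSON query {"keywords": [...]} with a JSON body {"results": [...]}.

    Documents are ranked by how many query keywords they match (an assumed,
    deliberately simple scoring rule); ties are broken by document id.
    """
    terms = [term.lower() for term in json.loads(query_json)["keywords"]]
    scores = defaultdict(int)
    for term in terms:
        # .get avoids inserting empty entries for unknown terms.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return json.dumps({"results": ranked})


if __name__ == "__main__":
    idx = build_inverted_index(DOC_KEYWORDS)
    print(search(idx, json.dumps({"keywords": ["topic", "index"]})))
```

Because a lookup touches only the posting sets of the query keywords, its cost depends on the number of matching documents rather than on the size of the whole collection, which is the usual motivation for an inverted index.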
