
Filtering and Sophisticated Data Processing for Web Information Gathering

  • Conference paper
Rough Sets and Intelligent Systems Paradigms (RSEISP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4585)


Abstract

Mismatch and overload are two fundamental issues affecting the efficiency of Web information gathering. To address them, this paper presents a Web information gathering system with two phases: filtering and sophisticated data processing. The filtering phase quickly removes most irrelevant data in order to avoid mismatch. Because only the data that pass the filter reach it, the sophisticated data processing phase can apply more sophisticated techniques without close attention to time complexity. This second phase addresses the problem of information overload.
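The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. The term-overlap filter, the TF-IDF cosine ranking used as a stand-in for "sophisticated" processing, and every name (`cheap_filter`, `sophisticated_rank`, `gather`, `min_hits`) are hypothetical choices made for illustration. The sketch shows only the structural point: a cheap test runs over every document, and the costlier scoring runs only over the documents that survive it.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cheap_filter(docs: list[str], profile_terms: list[str], min_hits: int = 1) -> list[str]:
    # Phase 1 (hypothetical): keep a document only if it shares at least
    # `min_hits` distinct terms with the user profile. This is one set
    # intersection per document, so the cost is roughly linear in text size.
    profile = set(profile_terms)
    return [d for d in docs if len(profile & set(tokenize(d))) >= min_hits]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sophisticated_rank(docs: list[str], profile_terms: list[str]) -> list[tuple[float, str]]:
    # Phase 2 (hypothetical stand-in): TF-IDF weighting plus cosine ranking,
    # computed only over the documents that survived Phase 1.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF, the same form as scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    # Profile terms absent from every surviving document get weight 0.
    query = {t: idf.get(t, 0.0) for t in profile_terms}
    scored = []
    for doc, toks in zip(docs, tokenized):
        vec = {t: tf * idf[t] for t, tf in Counter(toks).items()}
        scored.append((cosine(query, vec), doc))
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def gather(docs: list[str], profile_terms: list[str], min_hits: int = 1) -> list[tuple[float, str]]:
    # Two-phase pipeline: fast filtering, then costlier ranking on the remainder.
    return sophisticated_rank(cheap_filter(docs, profile_terms, min_hits), profile_terms)
```

For example, with the profile `["text", "mining"]`, documents about cooking or football share no profile terms and are discarded in Phase 1 before any TF-IDF statistics are computed. Raising `min_hits` makes the filter stricter and shrinks the set that Phase 2 must process, at the risk of discarding relevant documents.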



Editor information

Marzena Kryszkiewicz James F. Peters Henryk Rybinski Andrzej Skowron


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Zhong, N., Zhou, X., Wu, ST. (2007). Filtering and Sophisticated Data Processing for Web Information Gathering. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_85


  • DOI: https://doi.org/10.1007/978-3-540-73451-2_85

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73450-5

  • Online ISBN: 978-3-540-73451-2

  • eBook Packages: Computer Science, Computer Science (R0)
