Filtering and Sophisticated Data Processing for Web Information Gathering

Li, Yuefeng; Zhong, Ning; Zhou, Xujuan; Wu, Sheng-Tang

doi:10.1007/978-3-540-73451-2_85

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4585)

Included in the following conference series:

International Conference on Rough Sets and Intelligent Systems Paradigms

Abstract

Mismatch and overload are two fundamental issues affecting the efficiency of Web information gathering. To address them, this paper presents a Web information gathering system that comprises two phases: filtering and sophisticated data processing. The objective of the filtering phase is to quickly remove most irrelevant data in order to avoid mismatch. The sophisticated data processing phase works only on the data that survives filtering, so it can use more sophisticated techniques without carefully considering time complexity. This second phase addresses the problem of information overload.
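
The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' system: the abstract does not say which techniques each phase uses, so the term-overlap filter, the cosine-similarity ranking, and every name here (`cheap_filter`, `rank`, `gather`, `top_k`) are assumptions chosen only to show a fast first pass followed by a costlier second pass on the survivors.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def cheap_filter(docs, topic_terms):
    """Phase 1 (assumed technique): keep documents sharing any topic term."""
    topic = set(topic_terms)
    return [d for d in docs if topic & set(tokenize(d))]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(docs, topic_terms):
    """Phase 2 (assumed technique): score each survivor against the topic."""
    profile = Counter(topic_terms)
    scored = [(cosine(Counter(tokenize(d)), profile), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def gather(docs, topic_terms, top_k=2):
    """Run the cheap filter first, then rank only what remains."""
    survivors = cheap_filter(docs, topic_terms)
    return [doc for _, doc in rank(survivors, topic_terms)[:top_k]]


if __name__ == "__main__":
    docs = [
        "web mining of user profiles",
        "cooking pasta at home",
        "rough sets for web mining web data",
        "football results",
    ]
    print(gather(docs, ["web", "mining"], top_k=1))
```

In this sketch the first pass discards documents with no topic term at all, and the second pass trims the remaining list to the `top_k` best matches, which is one simple way to limit how much reaches the user.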

Author information

Authors and Affiliations

School of Software Engineering and Data Communications, Queensland University of Technology, Brisbane, QLD 4001, Australia
Yuefeng Li, Xujuan Zhou & Sheng-Tang Wu
Department of Information Engineering, Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City 371-0816, Japan
Ning Zhong

Editor information

Marzena Kryszkiewicz, James F. Peters, Henryk Rybinski, Andrzej Skowron

Cite this paper

Li, Y., Zhong, N., Zhou, X., Wu, ST. (2007). Filtering and Sophisticated Data Processing for Web Information Gathering. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_85

DOI: https://doi.org/10.1007/978-3-540-73451-2_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73450-5
Online ISBN: 978-3-540-73451-2
eBook Packages: Computer Science, Computer Science (R0)
