Abstract
Mismatch and overload are two fundamental issues affecting the efficiency of Web information gathering. To address both, this paper presents a Web information gathering system that encapsulates two phases: filtering and sophisticated data processing. The filtering phase aims to quickly remove most irrelevant data in order to avoid mismatch. Because it operates only on the data that passes the filter, the sophisticated data processing phase can apply more sophisticated techniques without having to weigh their time complexity as carefully. This second phase addresses the problem of information overload.
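The two-phase design can be illustrated as a cheap, linear-time filter over the whole document stream, followed by a costlier scoring step applied only to the documents that survive it. This is a minimal sketch under stated assumptions, not the authors' method: the term-overlap filter, the windowed co-occurrence score, and the names `cheap_filter`, `expensive_score`, `gather`, `min_hits`, `window`, and `top_k` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Doc:
    doc_id: str
    text: str


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer (assumption: no stemming or stop-word removal).
    return text.lower().split()


def cheap_filter(docs: list[Doc], profile_terms: list[str], min_hits: int = 1) -> list[Doc]:
    """Phase 1 (hypothetical): keep documents containing at least `min_hits`
    distinct profile terms. One pass per document, so it is cheap enough
    to run over the entire incoming stream."""
    terms = set(profile_terms)
    return [d for d in docs if len(terms & set(tokenize(d.text))) >= min_hits]


def expensive_score(doc: Doc, profile_terms: list[str], window: int = 5) -> int:
    """Phase 2 (hypothetical): count pairs of *different* profile terms that
    occur within `window` tokens of each other. Quadratic in the number of
    matching positions, so it is applied only to the filtered set."""
    terms = set(profile_terms)
    positions = [(i, t) for i, t in enumerate(tokenize(doc.text)) if t in terms]
    score = 0
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and abs(i - j) <= window:
            score += 1
    return score


def gather(docs: list[Doc], profile_terms: list[str], min_hits: int = 1, top_k: int = 3) -> list[str]:
    # Filter first, then rank only the survivors with the costly score.
    candidates = cheap_filter(docs, profile_terms, min_hits)
    ranked = sorted(candidates, key=lambda d: expensive_score(d, profile_terms), reverse=True)
    return [d.doc_id for d in ranked[:top_k]]


if __name__ == "__main__":
    docs = [
        Doc("d1", "rough set theory for web mining"),
        Doc("d2", "cooking recipes for pasta"),
        Doc("d3", "web mining with rough sets and web rough mining"),
    ]
    profile = ["rough", "web", "mining"]
    print(gather(docs, profile, min_hits=2))  # ['d3', 'd1']
```

Here the off-topic document `d2` is discarded by the inexpensive first pass and never reaches the quadratic scorer, which is the point of splitting the work into two phases.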
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Zhong, N., Zhou, X., Wu, ST. (2007). Filtering and Sophisticated Data Processing for Web Information Gathering. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_85
DOI: https://doi.org/10.1007/978-3-540-73451-2_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73450-5
Online ISBN: 978-3-540-73451-2
eBook Packages: Computer Science; Computer Science (R0)