Abstract
In this chapter, we propose a method that applies Web search results to document classification in order to enlarge the training corpus. To build the query submitted to a Web search engine, the proposed method selects words according to the matching score between the words in the documents and the category. Experimental results show that Web queries built from higher-ranked words improve document classification performance, whereas Web queries built from lower-ranked words degrade it.
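The query-generation idea can be illustrated with a minimal sketch. The abstract does not say how the matching score is computed, so the chi-square association used here is an assumption, and the names `rank_words` and `build_query` are hypothetical. The sketch only ranks words and joins the top ones into a query string; it does not call a search engine or retrain a classifier.

```python
from collections import Counter


def rank_words(docs, labels, category):
    """Rank words by an assumed chi-square association with `category`.

    Assumption: the chapter's actual matching score is not given in the
    abstract; chi-square over document frequencies is a stand-in.
    """
    n = len(docs)
    in_cat = sum(1 for y in labels if y == category)
    df_all = Counter()  # number of documents containing the word
    df_cat = Counter()  # number of `category` documents containing the word
    for text, y in zip(docs, labels):
        words = set(text.lower().split())
        df_all.update(words)
        if y == category:
            df_cat.update(words)

    scores = {}
    for w, df in df_all.items():
        a = df_cat[w]        # in category, contains word
        b = df - a           # other categories, contains word
        c = in_cat - a       # in category, lacks word
        d = n - in_cat - b   # other categories, lacks word
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Keep the score only for words positively associated with the category.
        scores[w] = chi2 if a * d > b * c else 0.0
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))


def build_query(ranked_words, k=3):
    """Join the k highest-ranked words into a Web query string."""
    return " ".join(ranked_words[:k])


# Toy data, invented for illustration only.
docs = [
    "soccer goal match",
    "soccer team goal",
    "stock market price",
    "market bank price",
]
labels = ["sports", "sports", "finance", "finance"]

ranked = rank_words(docs, labels, "sports")
print(build_query(ranked, k=2))
```

In this toy setting the top of the ranking holds words that occur only in the target category, while the bottom holds words from the other category, which is consistent with the reported finding that queries from lower-ranked words hurt classification.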
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A3013405).
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Park, SY., Chang, J., Kihl, T. (2013). Application of Web Search Results for Document Classification. In: Jung, HK., Kim, J., Sahama, T., Yang, CH. (eds) Future Information Communication Technology and Applications. Lecture Notes in Electrical Engineering, vol 235. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6516-0_32
DOI: https://doi.org/10.1007/978-94-007-6516-0_32
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6515-3
Online ISBN: 978-94-007-6516-0
eBook Packages: Engineering, Engineering (R0)