Three New Feature Weighting Methods for Text Categorization

Xue, Wei; Xu, Xinshun

doi:10.1007/978-3-642-16515-3_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6318)

Included in the following conference series:

International Conference on Web Information Systems and Mining

Abstract

Feature weighting, which computes a weight for each feature of a document, is an important phase of text categorization. This paper proposes three new feature weighting methods for text categorization. In the first and second proposed methods, the traditional feature weighting method tf×idf is combined with "one-side" feature selection metrics (namely odds ratio and correlation coefficient) in a moderate manner, and positive and negative features are weighted separately. The resulting schemes, tf×idf+CC and tf×idf+OR, are used to calculate the feature weights. In the third method, tf is combined with feature entropy, giving a scheme that is effective and concise. Feature entropy measures the diversity of a feature's document frequency across categories. Experimental results on the Reuters-21578 corpus show that the proposed methods outperform several state-of-the-art feature weighting methods, such as tf×idf, tf×CHI, and tf×OR.
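The abstract does not give the formulas, so the following is only a minimal sketch of the quantities it names, not the authors' method. It assumes the standard contingency-table definitions of correlation coefficient and odds ratio, and Shannon entropy over a feature's per-category document frequencies. The function names, the 0.5 smoothing constant, and the way tf is scaled by entropy in `tf_entropy_weight` are hypothetical choices made for illustration.

```python
import math


def correlation_coefficient(a, b, c, d):
    """One-side correlation coefficient of a feature for a category.

    a: documents in the category that contain the feature
    b: documents outside the category that contain the feature
    c: documents in the category that lack the feature
    d: documents outside the category that lack the feature
    A positive value marks a positive feature, a negative value a negative one.
    """
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (a * d - c * b) / denom


def odds_ratio(a, b, c, d, smooth=0.5):
    """Log odds ratio with additive smoothing (the smoothing is an assumption)."""
    return math.log(((a + smooth) * (d + smooth)) / ((c + smooth) * (b + smooth)))


def feature_entropy(doc_freq_by_category):
    """Shannon entropy (base 2) of a feature's document frequency over categories."""
    total = sum(doc_freq_by_category)
    if total == 0:
        return 0.0
    entropy = 0.0
    for df in doc_freq_by_category:
        if df > 0:
            p = df / total
            entropy -= p * math.log2(p)
    return entropy


def tf_entropy_weight(tf, doc_freq_by_category):
    """Hypothetical tf-entropy combination: tf * (1 - normalized entropy).

    A feature concentrated in one category keeps its full tf; a feature
    spread evenly over all categories is weighted down to zero.
    """
    n_categories = len(doc_freq_by_category)
    if n_categories < 2:
        return float(tf)
    max_entropy = math.log2(n_categories)
    return tf * (1.0 - feature_entropy(doc_freq_by_category) / max_entropy)


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    print(correlation_coefficient(8, 2, 2, 8))
    print(odds_ratio(8, 2, 2, 8))
    print(feature_entropy([10, 0, 0]), feature_entropy([5, 5]))
    print(tf_entropy_weight(3, [10, 0, 0]), tf_entropy_weight(3, [5, 5]))
```

In this sketch the sign of the one-side metric separates positive from negative features, which is the property the first two methods rely on when weighting the two groups separately. How the paper combines these values with tf×idf is not stated in the abstract and is left out here.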

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Jinan, 250101, China
Wei Xue & Xinshun Xu

Editor information

Editors and Affiliations

Department of Business Administration, Caritas Francis Hsu College, 18 Chui Ling Road, Tseung Kwan O, Hong Kong, China
Fu Lee Wang
Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, SAR, China
Zhiguo Gong
School of Computer, Shanghai University, 99 Shangda Road, 200444, Shanghai, China
Xiangfeng Luo
School of Computer, Nanjing University of Posts and Telecommunications, 210003, Nanjing, China
Jingsheng Lei

About this paper

Cite this paper

Xue, W., Xu, X. (2010). Three New Feature Weighting Methods for Text Categorization. In: Wang, F.L., Gong, Z., Luo, X., Lei, J. (eds) Web Information Systems and Mining. WISM 2010. Lecture Notes in Computer Science, vol 6318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16515-3_44

DOI: https://doi.org/10.1007/978-3-642-16515-3_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16514-6
Online ISBN: 978-3-642-16515-3
eBook Packages: Computer Science, Computer Science (R0)