Feature Selection for Document Retrieval in the Export Control Domain

Feature Selection Based on Chi Square

  • Conference paper
Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016 (IntelliSys 2016)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 16))

Abstract

Classifying electronic documents as strategic or non-strategic technology is difficult. A document retrieval system has been developed to find similar cases efficiently. However, its performance needs to improve as more documents accumulate in the system. In this paper, we apply a feature selection method based on chi-square in an attempt to improve the performance of the system.
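As a rough illustration of chi-square feature selection, the sketch below scores each term with the standard chi-square statistic over a 2x2 term/class contingency table and keeps the highest-scoring terms. It is a minimal sketch under stated assumptions, not the authors' implementation, which this preview does not show: the function names `chi_square_scores` and `select_top_k`, the toy documents, and the binary "strategic" versus "other" labelling are all hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Return {term: chi-square score} for a binary split (positive vs. rest).

    docs is a list of token lists; labels is a parallel list of class names.
    Hypothetical helper, written only to illustrate the statistic.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequencies of each term inside and outside the positive class.
    with_term_pos = Counter()
    with_term_neg = Counter()
    for tokens, y in zip(docs, labels):
        target = with_term_pos if y == positive else with_term_neg
        target.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(with_term_pos) | set(with_term_neg):
        a = with_term_pos[term]  # positive docs containing the term
        b = with_term_neg[term]  # other docs containing the term
        c = n_pos - a            # positive docs without the term
        d = n_neg - b            # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data (assumed for illustration only).
docs = [
    ["reactor", "coolant", "pump"],
    ["reactor", "vessel"],
    ["office", "pump"],
    ["office", "memo"],
]
labels = ["strategic", "strategic", "other", "other"]

scores = chi_square_scores(docs, labels, positive="strategic")
selected = select_top_k(scores, 2)
```

In this toy data, "reactor" and "office" each appear in exactly one class and receive the highest score, while "pump" appears equally in both classes and scores zero, so it would be dropped from the retrieval vocabulary.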

This work was supported by the Nuclear Safety Research Program through the Korea Foundation of Nuclear Safety (KOFONS), with financial resources granted by the Nuclear Safety and Security Commission (NSSC), Republic of Korea (No. 1305014).

Notes

  1. Reactor Coolant System, Reactor Vessel, Safety Injection and Shutdown Cooling System, Steam Generator, In-core Instrument, Control Element Assembly, Fuel Assembly, Chemical and Volume Control System, Control System, Component Cooling Water System, Turbine and Generator System, Feedwater System, Steam System, Condenser System, Air System, Diesel Generator System, Reactor Containment Building, Radwaste System, Instrument System, Plant Protection System, Drain System, Electric Power System, Fuel Handling and Transfer System, Other Systems.

Author information

Corresponding author

Correspondence to Jae-woong Tae.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Tae, Jw., Yoon, Sh., Shin, Dh. (2018). Feature Selection for Document Retrieval in the Export Control Domain. In: Bi, Y., Kapoor, S., Bhatia, R. (eds) Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016. Lecture Notes in Networks and Systems, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-56991-8_76

  • DOI: https://doi.org/10.1007/978-3-319-56991-8_76

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56990-1

  • Online ISBN: 978-3-319-56991-8

  • eBook Packages: Engineering, Engineering (R0)
