Abstract
This paper describes our newly developed Automated Industry and Occupation Coding System (AIOCS). The main function of the system is to classify natural language responses to survey questionnaires into the equivalent numeric codes defined in the standard code book of the Korean National Statistics Office (KNSO). We implemented the system using a range of automated classification techniques, including hand-crafted rules, a maximum entropy model, and information retrieval techniques, to enhance the performance of the automated industry/occupation coding task. The result is a Web-based AIOCS that is available as a public service via the KNSO Web site. Compared with the previous system developed in 2005, the new Web-based system reduces coding cost, runs faster, and shows significant improvements in production rate and accuracy. Furthermore, its easy-to-use Web interface facilitates practical use.
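As a rough illustration of how such techniques can be combined, the sketch below tries hand-crafted rules first and falls back to an information-retrieval style lookup over previously coded responses. It is a minimal sketch under stated assumptions, not the AIOCS implementation: the ordering of the stages, the rule patterns, the example responses, the numeric codes, the similarity threshold, and all function names are hypothetical, and the maximum entropy stage is omitted for brevity.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical hand-crafted rules: regex pattern -> code (codes are made up).
RULES = [
    (re.compile(r"\bnurse\b"), "2430"),
    (re.compile(r"\btaxi driver\b"), "8731"),
]

# Hypothetical previously coded responses, used as a tiny retrieval index.
CODED_EXAMPLES = [
    ("teaches math at a high school", "2521"),
    ("drives a delivery truck", "8740"),
    ("writes software for web sites", "2223"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def code_response(text, threshold=0.3):
    """Return (code, stage); code is None when the response is left for manual coding."""
    lowered = text.lower()
    # Stage 1: the first matching hand-crafted rule decides the code.
    for pattern, code in RULES:
        if pattern.search(lowered):
            return code, "rule"
    # Stage 2: retrieve the most similar coded example (assumed fallback).
    query = Counter(tokenize(text))
    best_code, best_score = None, 0.0
    for example, code in CODED_EXAMPLES:
        score = cosine(query, Counter(tokenize(example)))
        if score > best_score:
            best_code, best_score = code, score
    if best_score >= threshold:
        return best_code, "retrieval"
    # Below the threshold the response is not coded automatically.
    return None, "manual"
```

In a cascade like this, the share of responses that do not end up in the `"manual"` branch corresponds to the production rate, and the threshold trades production rate against accuracy.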
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Jung, Y., Yoo, J., Myaeng, SH., Han, DC. (2008). A Web-Based Automated System for Industry and Occupation Coding. In: Bailey, J., Maier, D., Schewe, KD., Thalheim, B., Wang, X.S. (eds) Web Information Systems Engineering - WISE 2008. WISE 2008. Lecture Notes in Computer Science, vol 5175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85481-4_33
DOI: https://doi.org/10.1007/978-3-540-85481-4_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85480-7
Online ISBN: 978-3-540-85481-4
eBook Packages: Computer Science, Computer Science (R0)