Abstract
Large-scale annotated corpora are a prerequisite for developing high-performance age regression models. However, such corpora can be very expensive and time-consuming to obtain. In this paper, we aim to reduce the annotation effort for age regression via active learning. The key idea of our approach is to first divide the whole feature space into several disjoint feature subspaces and then use them to train a committee of regressors, one per subspace. Given this committee, we apply a query-by-committee (QBC) method to select the unlabeled samples on which the committee is least confident and send them for manual annotation. Empirical studies demonstrate the effectiveness of the proposed approach to active learning for age regression.
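The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base regressor or the confidence measure, so this sketch assumes a simple k-nearest-neighbour regressor as a stand-in and uses the variance of the committee's predictions as the disagreement score. All function names (`split_features`, `knn_predict`, `committee_disagreement`, `select_queries`) are hypothetical.

```python
import random
from statistics import mean, pvariance


def split_features(n_features, n_subspaces, seed=0):
    """Randomly partition feature indices into disjoint subspaces."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    # Striding over the shuffled indices gives disjoint, near-equal parts.
    return [sorted(idx[i::n_subspaces]) for i in range(n_subspaces)]


def knn_predict(train_X, train_y, x, features, k=3):
    """Stand-in regressor (an assumption, not the paper's model):
    average the labels of the k nearest labeled samples,
    measuring distance only over the given feature subspace."""
    def dist(a):
        return sum((a[f] - x[f]) ** 2 for f in features)

    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    return mean([train_y[i] for i in nearest])


def committee_disagreement(train_X, train_y, x, subspaces, k=3):
    """Variance of the committee's predictions for one unlabeled sample.
    Each committee member sees only its own feature subspace."""
    preds = [knn_predict(train_X, train_y, x, s, k) for s in subspaces]
    return pvariance(preds)


def select_queries(train_X, train_y, pool_X, subspaces, n_queries, k=3):
    """Return indices of the pool samples with the highest disagreement,
    i.e. the candidates to send for manual annotation."""
    scores = [
        committee_disagreement(train_X, train_y, x, subspaces, k)
        for x in pool_X
    ]
    ranked = sorted(range(len(pool_X)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_queries]
```

In an active learning loop, the selected samples would be labeled, moved from the pool to the training set, and the selection repeated. Any regressor could replace the k-nearest-neighbour stand-in, provided each committee member is trained on its own feature subspace.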
Acknowledgments
This research work has been partially supported by two NSFC grants, No. 61375073 and No. 61273320, and by one grant from the State Key Program of National Natural Science of China, No. 61331011.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Chen, J., Li, S., Dai, B., Zhou, G. (2016). Active Learning for Age Regression in Social Media. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2016, NLP-NABD 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_29
DOI: https://doi.org/10.1007/978-3-319-47674-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)