Abstract
Personality is a fundamental component of an individual’s affective behavior. Previous work on personality classification has emerged from disparate sources: the variety of algorithms and feature-selection methods applied across spoken and written data has made comparison difficult. Here, we use a large corpus of blogs to compare feature-selection choices for classification; we also use these results to identify characteristic language information relating to personality. With Support Vector Machines, the best accuracies range from 84.36% (openness to experience) to 70.51% (neuroticism). The best-performing feature set combined: (1) stemmed bigrams; (2) no exclusion of stopwords (i.e., common words); and (3) Boolean features that record the presence or absence of each feature, rather than its rate of use. We take these findings to suggest that both the structure of the text and the presence of common words are important. We also note that a common dictionary of words used for content analysis (LIWC) performs less well in this classification task, which we propose is due to its conceptual breadth. To get a better sense of how personality is expressed in the blogs, we explore the best-performing features and discuss how these can provide a deeper understanding of personality language behavior online.
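The three feature choices named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the tokenizer, and the example vocabulary are hypothetical, and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's. The sketch keeps stopwords, builds stemmed bigrams, and contrasts Boolean presence features with rate-of-use features.

```python
import re


def crude_stem(token):
    """Toy stand-in for a real stemmer (assumption): strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_bigrams(text):
    """Lowercase, tokenize, stem, and pair adjacent tokens. Stopwords are kept."""
    tokens = re.findall(r"[a-z']+", text.lower())
    stems = [crude_stem(t) for t in tokens]
    return list(zip(stems, stems[1:]))


def boolean_features(text, vocabulary):
    """1 if the bigram occurs anywhere in the text, else 0."""
    present = set(stemmed_bigrams(text))
    return [1 if bigram in present else 0 for bigram in vocabulary]


def rate_features(text, vocabulary):
    """Relative frequency of each bigram, shown for contrast with the Boolean form."""
    bigrams = stemmed_bigrams(text)
    total = len(bigrams) or 1  # avoid division by zero on very short texts
    return [bigrams.count(bigram) / total for bigram in vocabulary]


# Hypothetical vocabulary and blog snippet, for illustration only.
vocabulary = [("i", "was"), ("was", "walk"), ("the", "dog")]
post = "I was walking and I was talking"

print(boolean_features(post, vocabulary))
print(rate_features(post, vocabulary))
```

Vectors of this kind, one per blogger, would then be passed to a classifier; the abstract reports using Support Vector Machines for that step.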
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Iacobelli, F., Gill, A.J., Nowson, S., Oberlander, J. (2011). Large Scale Personality Classification of Bloggers. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24571-8_71
DOI: https://doi.org/10.1007/978-3-642-24571-8_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24570-1
Online ISBN: 978-3-642-24571-8
eBook Packages: Computer Science, Computer Science (R0)