Abstract
Email is a ubiquitous communication tool and accounts for a significant portion of social interactions. In this paper, we attempt to infer users' personality from the content of their emails. Such inference can enable valuable applications such as better personalization, recommendation, and targeted advertising. Given the private and sensitive nature of email content, we propose a privacy-preserving approach for collecting email and personality data. We then frame personality prediction in terms of the well-known Big Five personality model and train predictors on features extracted from the emails. We report the prediction performance of three generative models with different assumptions. Our results show that personality prediction is feasible and that our email feature set can predict personality with reasonable accuracy.
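To make the prediction setup concrete, the following is a minimal sketch of a generative text classifier for a single Big Five trait. It rests on several assumptions that are not taken from the paper: the abstract does not name its three generative models, so multinomial naive Bayes is used here only as a generic example of a generative model; the binary "high"/"low" labels, the whitespace tokenizer, the class name `NaiveBayesTrait`, and the toy emails are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# The five traits of the Big Five model; each would get its own predictor.
BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real feature extraction."""
    return text.lower().split()


class NaiveBayesTrait:
    """Multinomial naive Bayes for one trait, with add-one smoothing.

    Hypothetical illustration only; not the model used in the paper.
    """

    def fit(self, emails, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(emails, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data for a single trait (extraversion).
train_emails = [
    "let's all meet up for a party tonight",
    "great fun with everyone at the party",
    "please find the report attached",
    "attached is the quarterly report",
]
train_labels = ["high", "high", "low", "low"]

model = NaiveBayesTrait().fit(train_emails, train_labels)
prediction = model.predict("party with everyone tonight")
print(prediction)
```

In a setup like this, one such predictor would be trained per trait in `BIG_FIVE`, and the bag-of-words input could be swapped for whatever email features are extracted.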
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shen, J., Brdiczka, O., Liu, J. (2013). Understanding Email Writers: Personality Prediction from Email Messages. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds) User Modeling, Adaptation, and Personalization. UMAP 2013. Lecture Notes in Computer Science, vol 7899. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38844-6_29
DOI: https://doi.org/10.1007/978-3-642-38844-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38843-9
Online ISBN: 978-3-642-38844-6
eBook Packages: Computer Science, Computer Science (R0)