Using Sociolinguistic Inspired Features for Gender Classification of Web Authors

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

In this article we present a methodology for classifying text from web authors by gender, using sociolinguistically inspired text features. The proposed methodology combines a baseline feature set based on text mining with text features that quantify findings from theoretical and sociolinguistic studies. Two combination approaches were evaluated, and in both cases the results indicated a significant improvement. For the best-performing combination approach the accuracy, measured as the percentage of correctly classified web posts, was 84.36%.
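
The abstract does not say how the two feature sets are combined, so the following is only a minimal sketch under stated assumptions rather than the authors' method. It assumes the simplest possible combination (concatenating the two feature vectors for a post) and computes accuracy as the abstract defines it, the percentage of correctly classified web posts. The function names and the toy labels are hypothetical.

```python
from typing import List, Sequence


def combine_features(baseline: Sequence[float], socio: Sequence[float]) -> List[float]:
    """Concatenate two feature vectors for one post.

    ASSUMPTION: plain concatenation is used here for illustration only;
    the abstract does not specify either of the two combination approaches.
    """
    return list(baseline) + list(socio)


def accuracy_percent(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Return the percentage of posts whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one post is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Toy usage with made-up values (not data from the paper).
combined = combine_features([1.0, 2.0], [0.5])
score = accuracy_percent(["f", "m", "f", "m"], ["f", "m", "m", "m"])
```

In this toy run three of the four posts are labelled correctly, so `score` is a percentage on the same scale as the figure reported in the abstract.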

Corresponding author

Correspondence to Vasiliki Simaki.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Simaki, V., Aravantinou, C., Mporas, I., Megalooikonomou, V. (2015). Using Sociolinguistic Inspired Features for Gender Classification of Web Authors. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_66

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_66

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
