Two Simple and Domain-independent Approaches for Early Detection of Anorexia

Early Detection of Mental Health Disorders by Social Media Monitoring

Part of the book series: Studies in Computational Intelligence ((SCI,volume 1018))

Abstract

In this chapter, we describe our research team's participation in the two editions of the eRisk early anorexia detection task. We used two domain-independent approaches to address this task. The first approach is based on a temporal-aware document representation, whereas the second consists of a simple, interpretable, and novel text classification model specially designed for early risk detection scenarios. In the first edition, the first approach achieved the best ERDE\(_5\) value among all participating models, whereas the second achieved the best precision (0.91). In the second edition, using the latter approach, we achieved the best values for both ERDE\(_5\) and ERDE\(_{50}\), as well as promising results on the ranking-based metrics, consistently obtaining the best values across all four rankings.

Notes

  1. The name gv stands for “global value”.

  2. As it turns out, for text classification, people would normally direct our attention only to certain “keywords” (filtering out all the rest) and explain why these words were important in their reasoning process.

  3. Readers interested in knowing how these equations were determined are invited to read Sect. 3 of the original paper [2].

  4. Any function \(f:2^{\mathbb{R}^n}\to \mathbb{R}^n\) could be used as a summary operator. In this example, vector addition was used for \(\oplus _1\) but not for \(\oplus _0\) to highlight this fact.

  5. A live demo is provided at http://tworld.io/ss3 where interested readers can try out the model online. Along with the classification result, the demo provides an interactive visual explanation like the one suggested here. We believe explanations like these are vital when models’ predictions could affect people’s lives, since they allow human experts to inspect the reasons behind the classifications and validate them [Last accessed: April 2021].

  6. For instance, in the first edition [13] the evaluation data was released chunk by chunk whereas, in the second edition [14], user content was released post by post. Additionally, a new set of evaluation metrics was used.

  7. Note that, unlike UNSLB and UNSLC, UNSLA used FTVT with \(n=0\) and, therefore, the actual representation was identical to a standard CSA representation, with no temporal chunk-based enrichment of the positive class.

  8. Which simply consisted of a summation of word values learned from the available training data (Eq. 5).

  9. This hyperparameter configuration was originally discovered with the eRisk 2017 early depression detection dataset by applying a grid search to minimize the ERDE\(_{50}\) metric with the training set using a 4-fold cross-validation [2].

  10. Using the training data, increasing values starting from 0 were tested, and 0.3 was finally selected because it obtained the best ERDE\(_{50}\).

  11. This is probably caused by the final model heavily prioritizing recall over precision, affecting their harmonic mean which ultimately affected the \(F_{latency}\). For instance, among our 5 models, UNSL#4 obtained the best recall (0.92) but the lowest precision value (0.31).

  12. The ERDE\(_o\) measure is calculated with the cost of false positives (cfp) being considerably lower than that of false negatives (cfn). Note that giving more importance to recall than to precision is reasonable since, in early risk detection tasks, every single undetected (positive) user is a life at risk.

  13. The probability with which a particular sequence of words occurs will never be greater than the probability of each individual word.

  14. For each of the 2000 user posts, it was necessary to send not only one request to the server to obtain the post, but also 5 more requests to send the response of each model. Therefore, for teams with 5 models like UNSL, completing the task required sending a total of \(2000 + 2000 \times 5 = 12000\) requests to the server. Consequently, with a connection latency of, for instance, 3 s, approximately 10 h of the total time would be consumed by communication alone.

  15. Much of these 8 s corresponded to communication latency, since they include the latency of receiving the post, processing time, and the latency of sending the response.

  16. We coded our script in plain Python 2.7 using only built-in functions and data structures; no external library (such as NumPy) was used. Additionally, to run our script we used a laptop belonging to one of the authors, which had standard technical specifications (Intel Core i5, 8 GB of DDR4 RAM, etc.).

  17. In this task, a perfect ranking is a ranking where all 73 at-risk users are located in the first 73 positions.

  18. Which would explain why our models obtained the best ERDE values despite having classified all at-risk users, on average, after having processed only their first 2 posts.

  19. That is, if our models were not able to obtain better classification results, it was not due to a poor estimation of the risk level of the users, but due to the policy used to decide when to classify them based on that estimation.
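
Note 4 states that any function mapping a set of vectors to a single vector can act as a summary operator. The following is a minimal Python sketch of that idea. The function names `summary_add` and `summary_max` and the example vectors are hypothetical and chosen only for illustration; the chapter does not specify which alternative operator was used for \(\oplus _0\).

```python
def summary_add(vectors):
    """Summary operator based on vector addition (element-wise sum)."""
    return [sum(components) for components in zip(*vectors)]


def summary_max(vectors):
    """Alternative summary operator (assumed for illustration): element-wise maximum."""
    return [max(components) for components in zip(*vectors)]


# Hypothetical vectors, one component per class, for three text blocks.
block_vectors = [[0.25, 0.5], [0.5, 0.25], [0.0, 1.0]]

print(summary_add(block_vectors))  # [0.75, 1.75]
print(summary_max(block_vectors))  # [0.5, 1.0]
```

Both functions take a collection of equally sized vectors and return one vector of the same size, which is the only property the note requires of a summary operator.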
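
Note 11 attributes the lower \(F_{latency}\) to the harmonic mean of precision and recall. The sketch below computes that harmonic mean for the UNSL#4 values quoted in the note and then applies a latency weighting. The weighting assumes the formulation commonly associated with the latency-weighted F1 [19] as used in eRisk 2019 [15], namely a per-user penalty of \(-1 + 2/(1+e^{-p(k-1)})\) and a speed factor of one minus the median penalty; the value `p=0.0078`, the function names, and the example delays are assumptions made for illustration.

```python
import math
from statistics import median


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def penalty(k, p=0.0078):
    """Assumed latency penalty after reading k posts (0 when k == 1)."""
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def f_latency(precision, recall, delays, p=0.0078):
    """F1 scaled by speed = 1 - median penalty over the true positives' delays."""
    speed = 1.0 - median([penalty(k, p) for k in delays])
    return f1_score(precision, recall) * speed


# Precision and recall reported for UNSL#4 in note 11.
print(round(f1_score(0.31, 0.92), 2))  # 0.46

# Hypothetical delays (posts read before each correct positive decision).
print(f_latency(0.31, 0.92, delays=[1, 2, 3]))
```

With a recall of 0.92 the harmonic mean is still pulled down to roughly 0.46 by the precision of 0.31, so even a very fast model (speed close to 1) cannot obtain a high \(F_{latency}\).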
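
Note 12 depends on how ERDE\(_o\) weighs each kind of decision. A minimal Python sketch of the measure, following the definition introduced with the eRisk test collection [12], is given below. The cost values (`c_fp=0.1`, `c_fn=1.0`, `c_tp=1.0`), the function names, and the example users are illustrative assumptions and not the values used in the task.

```python
import math


def latency_cost(k, o):
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), written to avoid overflow for large k."""
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))


def erde(decision, truth, k, o, c_fp=0.1, c_fn=1.0, c_tp=1.0):
    """Error for one user; k is the number of posts read before deciding."""
    if decision and not truth:
        return c_fp                       # false positive
    if not decision and truth:
        return c_fn                       # false negative
    if decision and truth:
        return latency_cost(k, o) * c_tp  # true positive, penalized by delay
    return 0.0                            # true negative


def mean_erde(results, o, **costs):
    """Average ERDE_o over (decision, truth, k) triples."""
    return sum(erde(d, t, k, o, **costs) for d, t, k in results) / len(results)


# Hypothetical users: (decision, truth, posts read before deciding).
users = [(True, True, 2), (True, False, 10), (False, True, 50), (False, False, 50)]
print(mean_erde(users, o=5))
print(mean_erde(users, o=50))
```

Because a false positive costs `c_fp` while a missed at-risk user costs the full `c_fn`, a model that favours recall is penalized less under this measure, which matches the reasoning in the note.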
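
The request count and communication time in note 14 can be checked with a few lines of Python. The function names are hypothetical; the figures (2000 posts, 5 models, 3 s latency) are the ones given in the note.

```python
def total_requests(num_posts, num_models):
    """One request to fetch each post plus one response per model per post."""
    return num_posts + num_posts * num_models


def communication_hours(num_requests, latency_seconds):
    """Total time spent on communication, in hours."""
    return num_requests * latency_seconds / 3600.0


requests_needed = total_requests(num_posts=2000, num_models=5)
print(requests_needed)                          # 12000
print(communication_hours(requests_needed, 3))  # 10.0
```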

References

  1. Aragón, M. E., López-Monroy, A. P., & Montes-y Gómez, M. (2019). INAOE-CIMAT at eRisk 2019: Detecting signs of anorexia using fine-grained emotions. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.

  2. Burdisso, S. G., Errecalde, M., & Montes-y Gómez, M. (2019). A text classification framework for simple and effective early depression detection over social media streams. Expert Systems with Applications, 133, 182–197.

  3. Burdisso, S. G., Errecalde, M., & Montes-y Gómez, M. (2019). UNSL at eRisk 2019: a unified approach for anorexia, self-harm and depression detection in social media. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.

  4. Burdisso, S. G., Errecalde, M., & Montes-y Gómez, M. (2020). \({\tau }\)-SS3: A text classifier with dynamic n-grams for early risk detection over text streams. Pattern Recognition Letters, 138, 130–137.

  5. Errecalde, M. L., Villegas, M. P., Funez, D. G., Ucelay, M. J. G., & Cagnina, L. C. (2017). Temporal variation of terms as concept space for early risk prediction. In Working Notes of CLEF 2017, CEUR Workshop Proceedings, Dublin, Ireland.

  6. Funez, D. G., Ucelay, M. J. G., Villegas, M. P., Burdisso, S. G., Cagnina, L. C., Montes-y Gómez, M., & Errecalde, M. L. (2018). UNSL’s participation at eRisk 2018 lab. In Working Notes of CLEF 2018, CEUR Workshop Proceedings, Avignon, France.

  7. Gonzalez, A., Clarke, S., & Kohn, M. (2007). Eating disorders in adolescents. Australian Family Physician, 36, 8.

  8. Hay, P. (2020). Current approach to eating disorders: a clinical update. Internal Medicine Journal, 50, 24–29.

  9. Kakhi, S., & McCann, J. (2016). Anorexia nervosa: diagnosis, risk factors and evidence-based treatments. Progress in Neurology and Psychiatry, 20, 24–29.

  10. Li, Z., Xiong, Z., Zhang, Y., Liu, C., & Li, K. (2011). Fast text categorization using concise semantic analysis. Pattern Recognition Letters, 32(3), 441–448.

  11. López-Monroy, A. P., Montes-y Gómez, M., Escalante, H. J., Villasenor-Pineda, L., & Stamatatos, E. (2015). Discriminative subprofile-specific representations for author profiling in social media. Knowledge-Based Systems, 89, 134–147.

  12. Losada, D. E., & Crestani, F. (2016). A test collection for research on depression and language use. In N. Fuhr, P. Quaresma, T. Gonçalves, B. Larsen, K. Balog, C. Macdonald, L. Cappellato & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (Cham, 2016) (pp. 28–39). Springer International Publishing.

  13. Losada, D. E., Crestani, F., & Parapar, J. (2017). eRisk 2017: CLEF lab on early risk prediction on the internet: Experimental foundations. In G. J. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (Cham, 2017) (pp. 346–360). Springer International Publishing.

  14. Losada, D. E., Crestani, F., & Parapar, J. (2018). Overview of eRisk: Early risk prediction on the internet. In P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J. Y. Nie, L. Soulier, E. SanJuan, L. Cappellato & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (Cham, 2018) (pp. 343–361). Springer International Publishing.

  15. Losada, D. E., Crestani, F., & Parapar, J. (2019). Overview of eRisk 2019 early risk prediction on the internet. In F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. Heinatz Bürki, L. Cappellato & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (Cham, 2019) (pp. 340–357). Springer International Publishing.

  16. Mohammadi, E., Amini, H., & Kosseim, L. (2019). Quick and (maybe not so) easy detection of anorexia in social media posts. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.

  17. Ragheb, W., Azé, J., Bringay, S., & Servajean, M. (2019). Attentive multi-stage learning for early risk detection of signs of anorexia and self-harm on social media. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.

  18. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.

  19. Sadeque, F., Xu, D., & Bethard, S. (2018). Measuring the latency of depression detection in social media. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 495–503).

  20. Termorshuizen, J., Watson, H., Thornton, L. M., et al. (2020). Early impact of COVID-19 on individuals with self-reported eating disorders: A survey of 1,000 individuals in the United States and the Netherlands. International Journal of Eating Disorders, 53, 1780–1790.

  21. Trotzek, M., Koitka, S., & Friedrich, C. M. (2018). Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia. In Working Notes of CLEF 2018, CEUR Workshop Proceedings, Avignon, France.

  22. Vikram, P. (2005). Gender in Mental Health Research. Gender and Health Research Series: World Health Organization.

  23. Zhang, Y., Jin, R., & Zhou, Z.-H. (2010). Understanding bag-of-words model: A statistical framework. International Journal of Machine Learning and Cybernetics, 1(1–4), 43–52.


Corresponding author

Correspondence to Sergio Burdisso.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Burdisso, S., Cagnina, L., Errecalde, M., Montes-y-Gómez, M. (2022). Two Simple and Domain-independent Approaches for Early Detection of Anorexia. In: Crestani, F., Losada, D.E., Parapar, J. (eds) Early Detection of Mental Health Disorders by Social Media Monitoring. Studies in Computational Intelligence, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-04431-1_7
