Two Simple and Domain-independent Approaches for Early Detection of Anorexia

Burdisso, Sergio; Cagnina, Leticia; Errecalde, Marcelo; Montes-y-Gómez, Manuel

doi:10.1007/978-3-031-04431-1_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 1018)

Abstract

In this chapter, we describe our research team's participation in the two editions of the eRisk early anorexia detection task. We addressed the task with two domain-independent approaches. The first is based on a temporal-aware document representation, whereas the second is a simple, interpretable, and novel text classification model specially designed for early risk detection scenarios. In the first edition, the first approach achieved the best ERDE\(_5\) value among all participating models, while the second achieved the best precision (0.91). In the second edition, the second approach achieved the best values for both ERDE\(_5\) and ERDE\(_{50}\), and also showed promising results on the ranking-based metrics, consistently obtaining the best values across all four rankings.

Notes

1. The name gv stands for “global value”.
2. As it turns out, for text classification, people would normally direct our attention only to certain “keywords” (filtering out all the rest) and explain why these words were important in their reasoning process.
3. Readers interested in knowing how these equations were determined are invited to read Sect. 3 of the original paper [2].
4. Any function \(f:2^{\mathbb {R}^n}\mapsto \mathbb {R}^n\) could be used as a summary operator. In this example, vector addition was used for \(\oplus _1\) but not for \(\oplus _0\) to highlight this fact.
5. A live demo is provided at http://tworld.io/ss3 where interested readers can try out the model online. Along with the classification result, the demo provides an interactive visual explanation like the one suggested here. We believe explanations like these are vital when models’ predictions could affect people’s lives, since they allow human experts to inspect the reasons behind the classifications and validate them [Last accessed: April 2021].
6. For instance, in the first edition [13] the evaluation data was released chunk by chunk whereas, in the second edition [14], user content was released post by post. Additionally, a new set of evaluation metrics was used.
7. Note that, unlike UNSLB and UNSLC, UNSLA used FTVT with \(n=0\) and, therefore, the actual representation was identical to a standard CSA representation, with no temporal chunk-based enrichment of the positive class.
8. Which simply consisted of a summation of word values learned from the available training data (Eq. 5).
9. This hyperparameter configuration was originally discovered with the eRisk 2017 early depression detection dataset by applying a grid search to minimize the ERDE\(_{50}\) metric on the training set using 4-fold cross-validation [2].
10. Using the training data, different values were tested incrementally, starting from 0; 0.3 was finally selected because it obtained the best ERDE\(_{50}\).
11. This is probably caused by the final model heavily prioritizing recall over precision, affecting their harmonic mean, which ultimately affected the \(F_{latency}\). For instance, among our 5 models, UNSL#4 obtained the best recall (0.92) but the lowest precision value (0.31).
12. The ERDE\(_o\) measure is calculated with the cost of false positives (cfp) being considerably lower than that of false negatives (cfn). Note that giving more importance to recall than to precision is reasonable since, in early risk detection tasks, every single undetected (positive) user is a life at risk.
13. The probability with which a particular sequence of words occurs will never be greater than the probability of each individual word.
14. For each of the 2000 user posts, it was necessary not only to send a request to the server to obtain the post, but also to send 5 more requests with the response of each model. Therefore, for teams with 5 models like UNSL, completing the task required sending a total of \(2000 + 2000 \times 5 = 12000\) requests to the server. If the connection latency was, for instance, 3 s, approximately 10 h of the total time would be consumed by communication alone.
15. Much of these 8 s corresponded to communication latency, since they include the latency of receiving the post, processing time, and the latency of sending the response.
16. We coded our script in plain Python 2.7, using only built-in functions and data structures; no external library (such as NumPy) was used. Additionally, we ran the script on a laptop belonging to one of the authors, which had standard technical specifications (Intel Core i5, 8 GB of DDR4 RAM, etc.).
17. In this task, a perfect ranking is a ranking where all 73 at-risk users are located in the first 73 positions.
18. Which would explain why our models obtained the best ERDE values despite having classified all at-risk users, on average, after having processed only their first 2 posts.
19. That is, if our models were not able to obtain better classification results, it was not due to a poor estimation of the risk level of the users, but due to the policy used to decide when to classify them based on that estimation.
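
Notes 11 and 14 each rest on a small calculation. The Python sketch below reproduces both using only the figures quoted in those notes. The function names are illustrative and do not come from the chapter, and only the plain harmonic mean of precision and recall is computed; any latency-dependent part of \(F_{latency}\) is left out.

```python
def total_requests(posts: int, models: int) -> int:
    """One request to fetch each post, plus one response per model per post."""
    return posts + posts * models


def communication_hours(requests: int, latency_s: float) -> float:
    """Time spent on communication alone, in hours, at a fixed latency per request."""
    return requests * latency_s / 3600.0


def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Note 14: 2000 posts, 5 models, 3 s of assumed latency per request.
    n = total_requests(posts=2000, models=5)
    print(n)                                      # 12000
    print(communication_hours(n, latency_s=3.0))  # 10.0

    # Note 11: UNSL#4 had recall 0.92 but precision 0.31.
    print(round(harmonic_mean(0.31, 0.92), 3))    # 0.464
```

The harmonic mean stays close to the lower of the two values, which is why a recall of 0.92 does not compensate for a precision of 0.31.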

References

Aragón, M. E., López-Monroy, A. P., & Montes-y-Gómez, M. (2019). INAOE-CIMAT at eRisk 2019: Detecting signs of anorexia using fine-grained emotions. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.
Burdisso, S. G., Errecalde, M., & Montes-y-Gómez, M. (2019). A text classification framework for simple and effective early depression detection over social media streams. Expert Systems with Applications, 133, 182–197.
Burdisso, S. G., Errecalde, M., & Montes-y-Gómez, M. (2019). UNSL at eRisk 2019: A unified approach for anorexia, self-harm and depression detection in social media. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.
Burdisso, S. G., Errecalde, M., & Montes-y-Gómez, M. (2020). \({\tau }\)-SS3: A text classifier with dynamic n-grams for early risk detection over text streams. Pattern Recognition Letters, 138, 130–137.
Errecalde, M. L., Villegas, M. P., Funez, D. G., Ucelay, M. J. G., & Cagnina, L. C. (2017). Temporal variation of terms as concept space for early risk prediction. In Working Notes of CLEF 2017, CEUR Workshop Proceedings, Dublin, Ireland.
Funez, D. G., Ucelay, M. J. G., Villegas, M. P., Burdisso, S. G., Cagnina, L. C., Montes-y-Gómez, M., & Errecalde, M. L. (2018). UNSL’s participation at eRisk 2018 lab. In Working Notes of CLEF 2018, CEUR Workshop Proceedings, Avignon, France.
Gonzalez, A., Clarke, S., & Kohn, M. (2007). Eating disorders in adolescents. Australian Family Physician, 36(8).
Hay, P. (2020). Current approach to eating disorders: A clinical update. Internal Medicine Journal, 50, 24–29.
Kakhi, S., & McCann, J. (2016). Anorexia nervosa: Diagnosis, risk factors and evidence-based treatments. Progress in Neurology and Psychiatry, 20, 24–29.
Li, Z., Xiong, Z., Zhang, Y., Liu, C., & Li, K. (2011). Fast text categorization using concise semantic analysis. Pattern Recognition Letters, 32(3), 441–448.
López-Monroy, A. P., Montes-y-Gómez, M., Escalante, H. J., Villasenor-Pineda, L., & Stamatatos, E. (2015). Discriminative subprofile-specific representations for author profiling in social media. Knowledge-Based Systems, 89, 134–147.
Losada, D. E., & Crestani, F. (2016). A test collection for research on depression and language use. In N. Fuhr, P. Quaresma, T. Gonçalves, B. Larsen, K. Balog, C. Macdonald, L. Cappellato, & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp. 28–39). Cham: Springer International Publishing.
Losada, D. E., Crestani, F., & Parapar, J. (2017). eRisk 2017: CLEF lab on early risk prediction on the internet: Experimental foundations. In G. J. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp. 346–360). Cham: Springer International Publishing.
Losada, D. E., Crestani, F., & Parapar, J. (2018). Overview of eRisk: Early risk prediction on the internet. In P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J. Y. Nie, L. Soulier, E. SanJuan, L. Cappellato, & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp. 343–361). Cham: Springer International Publishing.
Losada, D. E., Crestani, F., & Parapar, J. (2019). Overview of eRisk 2019 early risk prediction on the internet. In F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. Heinatz Bürki, L. Cappellato, & N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp. 340–357). Cham: Springer International Publishing.
Mohammadi, E., Amini, H., & Kosseim, L. (2019). Quick and (maybe not so) easy detection of anorexia in social media posts. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.
Ragheb, W., Azé, J., Bringay, S., & Servajean, M. (2019). Attentive multi-stage learning for early risk detection of signs of anorexia and self-harm on social media. In Working Notes of CLEF 2019, CEUR Workshop Proceedings, Lugano, Switzerland.
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.
Sadeque, F., Xu, D., & Bethard, S. (2018). Measuring the latency of depression detection in social media. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 495–503).
Termorshuizen, J., Watson, H., Thornton, L. M., et al. (2020). Early impact of COVID-19 on individuals with self-reported eating disorders: A survey of 1,000 individuals in the United States and the Netherlands. International Journal of Eating Disorders, 53, 1780–1790.
Trotzek, M., Koitka, S., & Friedrich, C. M. (2018). Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia. In Working Notes of CLEF 2018, CEUR Workshop Proceedings, Avignon, France.
Vikram, P. (2005). Gender in Mental Health Research. Gender and Health Research Series: World Health Organization.
Zhang, Y., Jin, R., & Zhou, Z.-H. (2010). Understanding bag-of-words model: A statistical framework. International Journal of Machine Learning and Cybernetics, 1(1–4), 43–52.

Author information

Authors and Affiliations

Universidad Nacional de San Luis (UNSL), San Luis, Argentina
Sergio Burdisso, Leticia Cagnina & Marcelo Errecalde
Idiap Research Institute, Rue Marconi 19, 1920, Martigny, Switzerland
Sergio Burdisso
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
Leticia Cagnina
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, Mexico
Manuel Montes-y-Gómez

Corresponding author

Correspondence to Sergio Burdisso.

Editor information

Editors and Affiliations

Faculty of Informatics, University of Lugano, Lugano, Switzerland
Fabio Crestani
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
David E. Losada
Centro de Investigación en Tecnoloxías da Información e as Comunicacións (CITIC), Universidade da Coruña, A Coruña, Spain
Javier Parapar

About this chapter

Cite this chapter

Burdisso, S., Cagnina, L., Errecalde, M., Montes-y-Gómez, M. (2022). Two Simple and Domain-independent Approaches for Early Detection of Anorexia. In: Crestani, F., Losada, D.E., Parapar, J. (eds) Early Detection of Mental Health Disorders by Social Media Monitoring. Studies in Computational Intelligence, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-04431-1_7

DOI: https://doi.org/10.1007/978-3-031-04431-1_7
Published: 15 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04430-4
Online ISBN: 978-3-031-04431-1
eBook Packages: Intelligent Technologies and Robotics (R0)
