Identifying COVID-19 survivors living with post-traumatic stress disorder through machine learning on Twitter

The COVID-19 pandemic has disrupted people’s lives and caused significant economic damage around the world, but its impact on people’s mental health has not received due attention from the research community. Anecdotal evidence suggests the pandemic has raised serious mental health concerns among the general population; however, no systematic investigation has previously been conducted on mental health monitoring and, in particular, on the detection of post-traumatic stress disorder (PTSD). The goal of this study is to use classical machine learning approaches to classify tweets as COVID-PTSD positive or negative. To this end, we employed several Machine Learning (ML) classifiers, including Random Forest, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbor, to identify tweets indicative of PTSD in the context of COVID-19. The ML models are trained and tested with various combinations of feature selection strategies to find the best-performing configuration. Through experimentation on a real-world dataset, we demonstrate our model’s effectiveness, achieving a classification accuracy of 83.29% using Support Vector Machine as the classifier and unigrams as the feature pattern.


Research contributions
The present study makes the following four-fold contributions:
• A dataset of more than 3.96 million tweets has been constructed from users who mentioned on their Twitter timeline that they were COVID positive at some point between March 2020 and November 2021.
• The resulting dataset has been filtered and manually annotated following International Statistical Classification of Diseases (ICD)-11 28 guidelines.
• The proportion of users being PTSD positive or negative has been quantified based on the data filtration criteria, giving a better understanding of users' posting behavior after they were diagnosed with COVID.
• Finally, a machine learning based classification model has been proposed to effectively classify users' tweets as either PTSD positive or negative.
The rest of the paper is organized as follows. Section "Post-traumatic stress disorder (PTSD)" discusses PTSD and its diagnosis along with the guidelines adopted for data filtering and annotation. Section "Literature review" sheds light on the state of the art on the topic, while section "Methodology" explains the proposed methodology for the study along with a brief description of our chosen classification algorithms. In section "Data extraction", we discuss our approaches to data extraction, filtration and annotation along with our findings based on the data. Finally, in section "Conclusion", we conclude our findings and mention our future directions.

Post-traumatic stress disorder (PTSD)
PTSD is a type of anxiety disorder that can develop in individuals who have experienced a traumatic event, such as a car accident, war, physical, emotional, or sexual abuse, a natural disaster, or any other life-altering experience that impacts their biological or psychological state. The WHO and the American Psychiatric Association (APA) both recognize PTSD as a legitimate condition, and diagnostic criteria are provided in the International Statistical Classification of Diseases and Related Health Problems (ICD) and the Diagnostic and Statistical Manual of Mental Disorders (DSM) 29. PTSD is a conglomerate of symptoms affecting multiple domains and is described as "the complex somatic, cognitive, affective, and behavioral effects of psychological trauma" 30. Considering the lack of physical symptoms in most cases of PTSD and the stigma attached to mental illness, PTSD is often diagnosed only after months of struggle. The fact that no blood test or imaging test can diagnose PTSD right away is also a barrier to effective treatment being offered at an earlier stage. People struggling with PTSD are identified late; they mostly come to light when they start to struggle at work, have difficulties in their relationships with others, or become addicted to drugs or alcohol to self-medicate and numb their symptoms. Once contact is made with a psychiatrist, a thorough history of the traumatic event, the symptoms related to it, and in many cases collateral history is necessary to make the right diagnosis.
A cross-sectional study carried out on nurses exposed to COVID in China found the incidence of PTSD to be 16.8%, with the highest scores in avoidance symptoms 31. Our aim in this study is to shorten this lengthy process of diagnosing PTSD by recognizing, through their tweets, those who have had COVID and might be suffering from PTSD. By identifying this population and predicting that they might have PTSD, they can be offered proper evaluation and optimal treatment. We acknowledge the complexity of post-traumatic stress disorder (PTSD) and recognize that its diagnosis extends beyond language patterns alone. Factors such as context and personal history are integral to understanding and assessing PTSD. It is important to emphasize that the severity and impact of the traumatic incident also play a significant role in determining the presence of PTSD symptoms.
PTSD, classified as an anxiety disorder, often emerges following exposure to a traumatic event, which may involve actual or threatened death. We posit that COVID-19 presents a potential trigger for PTSD due to the profound trauma associated with the experience, coupled with the pervasive fear of mortality. Despite potential limitations, our computational analysis serves as a valuable screening tool to identify individuals at risk of PTSD.

Literature review
The field of mental health detection has been the focus of numerous studies utilizing various datasets and modeling techniques to develop reliable detection models. In one such study, Joshi et al. 32 used a combination of deep learning and conventional machine learning algorithms to detect mental health issues through social media posts and behavioral features. The first stage of classification considered 13 behavioral features to classify users, while in the second stage a behavioral feature called DL_score, created using a word2vec model, was used to classify tweets. The model was trained on nearly 12 million tweets for tweet classification. It achieved an accuracy of 89%, with deep learning based feature extraction helping to accurately classify users as normal or non-normal while also reducing the false positive rate.
During the current pandemic, 36.6 million users posted almost 41.3 million COVID-19 related tweets in 2020 33. In that study, tweets containing keywords like 'corona', '#Corona', 'covid19', etc., along with users' profile descriptions, were collected to look for signs of depression. Of 2575 Twitter users, 200 were randomly selected from the classified depression set and 86% were labeled as positive. Almost 1402 users with depression posted the selected tweets within a three-month time span. Transformer-based models such as BERT, RoBERTa, and XLNet were applied to identify users with depression and to monitor the depression trend during COVID-19.
A study by Sekulic et al. 34 proposed a Hierarchical Attention Network (HAN) for the detection of mental disorders. The model comprises a word sequence encoder, a word-level attention layer, a sentence encoder, and a sentence-level attention layer. Initially, users with a self-reported diagnosis of one of nine mental disorders were identified, and the model was trained on their posts, which were modeled as sentences. The HAN outperformed baseline models in detecting depression, anxiety, ADHD, and bipolar disorder, but performed inadequately for PTSD, autism, eating disorders, and schizophrenia. With the attention mechanism provided by the HAN, important words or phrases were easily identified and deemed relevant for classification.
Another study utilized lists of n-grams derived from tweets of users diagnosed with depression or PTSD to train a classifier that ranks other users' tweets as positive or negative for depression or PTSD 35. The dataset consisted of tweets from 327 random Twitter users, of whom 246 reported a PTSD diagnosis and had at least 25 tweets. Tweets related to each condition were randomly selected, and the first eight million words of tweets were used as training data. Features were selected based on their frequency of occurrence, including only n-grams that occurred 50 times more often in a single condition. This selective approach to feature selection improved the results and provided greater insight into identifying mental illness via social media posts.
To identify individuals who may be experiencing depression, a group of Twitter users who had self-reported their depression diagnosis via tweets were selected using the Twitter streaming API 36, with regular expressions and data acquisition techniques applied over a four-month period. To ensure a balanced dataset, the authors selected equal numbers of positive and negative instances, representing depressive and non-depressive tweets respectively, from 600 randomly selected users. Emotion features were extracted and assigned strength scores to create emotion-based features, while time-series analysis was applied and descriptive statistics were selected as temporal features. The resulting model achieved an accuracy of 87.27% on emotion features alone, outperforming baseline models 18,37,38. With different temporal features, accuracy improved to 89.77%, and when emotion and temporal features were combined, accuracy increased to 91.81%. These findings suggest that basic emotions can be used to identify individuals who may be experiencing depression on Twitter.
To conduct their research, the authors utilized a widely recognized dataset in the fields of computational linguistics and clinical psychology 39, comprising three types of Twitter users: those who self-reported a diagnosis of depression, those who self-reported having PTSD, and a demographically matched control group 40. The dataset consisted of 3000 tweets, which were manually reviewed to eliminate irrelevant information. The authors then conducted a qualitative analysis to identify instances of misclassification in their approach, discovering that some false positives arose from language displaying anger or frustration, while others were linked to music, bands, or artists associated with the positive class. The authors emphasized the limitations of such machine learning systems and the importance of not relying solely on automated classifiers to determine an individual's mental health status on social media platforms.
In another study 43, the classification of mental illness from social media texts using deep learning and transfer learning was investigated. The authors aimed to develop a machine learning model to identify the presence of mental illness in text data from social media platforms. The model was trained on a dataset of social media texts annotated for mental illness and evaluated using multiple metrics. The results showed that the transfer learning approach outperformed traditional deep learning methods in classification accuracy. This study highlights the potential of deep learning and transfer learning for mental health screening and intervention through social media platforms.
A number of studies have used machine learning algorithms to predict PTSD and depression in various populations. For example, Reece et al. used the RF algorithm to analyze 243,000 Twitter posts related to PTSD and achieved an AUC score of 0.89 in predicting the disorder 12.
Another study, by Leightley et al., focused on identifying PTSD among UK military personnel using machine learning techniques, achieving an accuracy of 97% with RF 44. Papini et al. used gradient-boosted decision trees to predict PTSD in 110 patients with the disorder and 231 trauma-exposed controls, achieving an accuracy of 78% 48. Similarly, Conrad et al. applied RF using Conditional Inference (RF-CI) and the Least Absolute Shrinkage and Selection Operator (LASSO) to predict PTSD in survivors of a civil war in Uganda, with RF achieving the highest accuracy of 77.25% 45.
Marmar et al. used RF to predict PTSD from audio recordings of warzone-exposed veterans with an accuracy of 89.1% and an AUC of 0.954 46. Vergyri et al. used Gaussian backend (GB), decision tree (DT), and neural network (NN) classifiers, along with boosting, to predict PTSD from audio recordings of war veterans, obtaining an overall accuracy of 77% 47.
According to 42, a noteworthy investigation was conducted to detect PTSD among cancer survivors using Twitter data. The researchers utilized a convolutional neural network (CNN) to learn representations of input tweets containing the keywords "cancer" and "PTSD" in order to identify cancer survivors with PTSD. The results demonstrated that the proposed CNN was effective in detecting PTSD among cancer survivors and outperformed the baselines. The authors suggested that it is crucial to evaluate and treat PTSD in cancer survivorship care, and that social media can act as an early warning system for PTSD in cancer survivors, emphasizing the importance of early detection and treatment.
Our research builds on earlier studies by focusing on PTSD in people who have survived COVID-19, aiming to better understand the psychological effects of the pandemic. To address the unmet mental health needs of these survivors, we have taken a distinct approach. As shown in Table 1, we compare our study with previous work to highlight how our investigation differs. Unlike research that examines various groups, we specifically analyze how COVID-19 has impacted mental health using information gathered from Twitter. Our method classified PTSD with an accuracy of 83.29%, proving to be a valuable tool in understanding how this global crisis affects mental well-being.

Methodology
After thoroughly reviewing state-of-the-art techniques, we have proposed a classification framework as shown in Fig. 1.
The stages of our proposed system are as follows: (i) data extraction and filtering, (ii) data annotation based on ICD-11 guidelines, (iii) preprocessing and splitting the data into train and test sets, (iv) feature extraction, and (v) training and evaluation of our ML model.
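The five stages can be sketched end to end as follows. This is a minimal illustration, not the paper's implementation: the tweets and labels below are invented, and a simple nearest-centroid scorer stands in for the ML classifiers (SVM, NB, KNN, RF) evaluated later.

```python
# Sketch of the five-stage pipeline with hypothetical data.

# Stages (i)-(ii): invented tweets, already filtered and annotated
# (1 = PTSD positive, 0 = PTSD negative).
POS = [
    ("cant sleep nightmares since covid", 1),
    ("flashbacks of the icu keep coming back", 1),
    ("nightmares again after covid", 1),
]
NEG = [
    ("lovely walk in the park today", 0),
    ("new recipe turned out great", 0),
    ("great park walk with friends", 0),
]

def preprocess(text):                        # stage (iii): tokenization
    return text.lower().split()

def build_vocab(samples):
    return sorted({w for text, _ in samples for w in preprocess(text)})

def featurize(text, vocab):                  # stage (iv): bag-of-words vector
    toks = preprocess(text)
    return [toks.count(w) for w in vocab]

def train_centroids(train, vocab):           # stage (v): fit per-class centroids
    sums = {0: [0.0] * len(vocab), 1: [0.0] * len(vocab)}
    counts = {0: 0, 1: 0}
    for text, y in train:
        sums[y] = [a + b for a, b in zip(sums[y], featurize(text, vocab))]
        counts[y] += 1
    return {y: [v / counts[y] for v in sums[y]] for y in sums}

def predict(text, centroids, vocab):
    vec = featurize(text, vocab)
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return max(centroids, key=lambda y: dot(vec, centroids[y]))

train = POS[:2] + NEG[:2]                    # stage (iii): train/test split
test = POS[2:] + NEG[2:]
vocab = build_vocab(train)
centroids = train_centroids(train, vocab)
accuracy = sum(predict(t, centroids, vocab) == y for t, y in test) / len(test)
```

On this toy split the two held-out tweets share vocabulary with their own class only, so the sketch classifies both correctly; real tweets are far noisier, which is why the paper compares several classifiers and feature patterns.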

Data extraction
The first step of data extraction is to identify those users who mentioned on Twitter that they were COVID positive.
We collected data, including tweets, from Twitter using the official Twitter API for academic research with the search query "#Covidpositive OR #Covidsurvivor OR #CovidFree OR #CovidRecovered OR #ConqueredCovid OR #DefeatedCovid OR #OvercameCovid". The data was collected from 01-March-2020 to 30-November-2021. We ended up with 90,330 usernames that had posted tweets using any of these hashtags during this period, of which 70,646 were unique. We applied the sample size (n) calculation formula provided by 49, given below as (1), to the population (N) of 70,646.
where e is the margin of error, which we chose to be 5%. Applying Eq. (1), we obtained ≈ 177 users. We randomly chose 177 users from the previously extracted data and extracted their tweet timelines over the aforementioned period. We were able to extract 3,958,836 tweets (≈ 3.96 million), out of which 2,155,577 (≈ 2.15 million) were in English. Furthermore, the model focused solely on the text content of the tweets for classification, without relying on any demographic information. This data is visualized in Fig. 2 by both year and category.
To filter the data, we used a set of keywords in line with the ICD-11 guidelines. The breakdown of tweets against each keyword is given in Table 2.
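Keyword filtering of this kind can be sketched as a simple membership test per symptom category. The category-to-keyword mapping below is a hypothetical stand-in, not the paper's actual ICD-11-derived lists from Table 2.

```python
# Illustrative keyword filter; the keyword lists are invented examples,
# not the paper's full ICD-11-aligned set.
CATEGORIES = {
    "re-experiencing": ["flashback", "nightmare", "reliving"],
    "avoidance": ["avoid", "stay away", "cant face"],
    "hyperarousal": ["on edge", "hypervigilant", "jumpy"],
}

def match_categories(tweet):
    """Return the set of PTSD symptom categories whose keywords appear."""
    text = tweet.lower()
    return {cat for cat, kws in CATEGORIES.items()
            if any(kw in text for kw in kws)}

tweets = [
    "still getting nightmares about the ICU",
    "i avoid crowded places ever since",
    "had a great day at the beach",
]
kept = [t for t in tweets if match_categories(t)]   # drop unmatched tweets
```

A per-keyword count over the kept tweets would then yield the kind of breakdown reported in Table 2.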
To further understand users' posting behavior with respect to the previously mentioned PTSD categories, we visualize in Fig. 3 the flow of users across three intervals of seven months each within our selected timeline.
The intervals cover (i) March 2020-September 2020, (ii) October 2020-April 2021, and (iii) May 2021-November 2021. We see a large fraction of users contributing to the Avoidance category, with the second-largest contribution to Non-PTSD tweets, i.e., those that did not belong to any of the previously mentioned categories. In fact, these remain the only two categories that most users posted about, and they continued posting about either of them. The only small presence is of Hyperarousal in the second interval. Interestingly, a large fraction of users remained in their respective categories across all intervals; however, a small number of users switched categories in the second interval, and approximately the same number returned to their initial category. We can say that the majority of users were found not to have PTSD symptoms after they were diagnosed with COVID.
The total number of tweets after filtering was 89,647, as per the breakdown in Table 2. To avoid over- or under-representation of any particular keyword, we calculated the 5th and 96th percentiles of these counts to remove keywords with too few or too many tweets, respectively. The 5th percentile of this data is 3.3 and the 96th percentile is 9715.92. Based on these values, we excluded keywords whose counts fell below the 5th percentile (i.e., Hypervigilant) or above the 96th percentile (i.e., Low); the resulting final set of 16,704 tweets was used for data annotation. This breakdown of data for the categories mentioned in Table 2, after removing the outliers, is shown in Fig. 4.

5. After implementing step four, we utilized a Porter stemmer to perform text stemming. This step is vital in reducing the dimensionality of the features, since a word can exist in multiple forms with different meanings in natural language (e.g., singular and plural). By stemming the words, we reduced them to their base form.
6. Steps 1-5 were repeated for both classes.
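The percentile cut and the stemming step can be sketched as follows. The per-keyword counts are invented for illustration, `statistics.quantiles` with linear interpolation stands in for whichever percentile convention the authors used, and a toy suffix-stripper stands in for the full Porter algorithm (available in practice via `nltk.stem.PorterStemmer`).

```python
# Sketch of the outlier cut over per-keyword tweet counts (invented numbers)
# and a simplified stemmer standing in for Porter's algorithm.
from statistics import quantiles

counts = {"nightmare": 4200, "flashback": 3100, "avoid": 9800,
          "hypervigilant": 2, "low": 12000, "jumpy": 850}

# 99 cut points; index 4 is the 5th percentile, index 95 the 96th.
qs = quantiles(counts.values(), n=100, method="inclusive")
p5, p96 = qs[4], qs[95]
kept = {k: v for k, v in counts.items() if p5 <= v <= p96}

def stem(word):
    # Toy stemmer: strips a few common suffixes; the paper uses the full
    # Porter algorithm, which handles many more cases.
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word
```

With these invented counts, the keyword with 2 tweets falls below the 5th percentile and the one with 12,000 above the 96th, so both are dropped, mirroring the exclusion of Hypervigilant and Low described above.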

Classification algorithms
In this article, classical machine learning algorithms are used for the classification problem. All four algorithms are discussed in this section.
• Support Vector Machine (SVM) is a commonly used technique for text categorization 50,51. It employs multi-dimensional hyperplanes to accurately differentiate between labels or classes 52. SVMs are particularly useful in high-dimensional spaces, making them a practical classifier for such scenarios. Additionally, SVMs offer fair predictive performance even with small datasets due to their relative simplicity and versatility in handling a wide range of classification problems. SVMs are widely used in brain disorder research utilizing multivoxel pattern analysis (MVPA) due to their simplicity and lower risk of overfitting. In recent times, SVMs have been applied in precision psychiatry, particularly in the diagnosis and prognosis of brain diseases like Alzheimer's, schizophrenia, and depression 53.
• Naïve Bayes (NB) is a machine learning algorithm that utilizes probability to classify data. It calculates the likelihood that a given piece of text belongs to a particular class. The classifier has been successfully employed in several studies for text classification 54-57. We chose this classifier for its ease of use and superior performance in earlier studies 58,59. The algorithm performs a sequence of probabilistic computations to determine the best-fitting classification for a given piece of data. Suppose X = x_1, x_2, x_3, ..., x_n is a set of n attributes, where X represents the evidence and H represents the hypothesis that the data sample X belongs to a certain class C. The likelihood that the hypothesis H holds given the evidence X can be computed using Eq. (2).
Bayes' theorem, given in Eq. (2), expresses the preceding logic.
• K-Nearest Neighbors (KNN) is an instance-based machine learning classifier built on the concept of similarity. It determines a sample's class using the Euclidean distance and the value of K. The algorithm stores all cases and uses a similarity score to classify new examples. The similarity between a new text and the training data is calculated, and the texts with the highest similarity are chosen. Finally, the class is determined from the K neighbors. However, when K is large, the computation required to determine the most suitable class becomes expensive 60,61.
• Random Forest (RF) is a supervised learning approach proposed by Ho in 1995 62. It involves constructing multiple decision trees that work in unison, with decision trees serving as the building blocks. During pre-processing, nodes are selected for the decision trees. A random subset of features is used to determine the best feature, and a decision tree is created based on the input vector to classify new objects. Every decision tree is used for classification, and the algorithm assigns tree votes to each class. The class with the most votes from all the decision trees in the forest is selected as the final classification. RF has many advantages over other classifiers such as SVM and NB. For example, RF can handle noisy and missing data, is robust to overfitting, and works well with high-dimensional data 58. RF can also provide information about the relative importance of the features used in classification, which is useful for feature selection and for understanding the underlying data structure. In text classification tasks such as sentiment analysis, categorization of news posts, and spam filtering, RF has been widely used with significant results 58,63. Additionally, RF has been applied to feature selection for text classification using the Gini impurity index, which measures the importance of a feature by the reduction in the impurity of the resulting classification tree 62.
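The Naïve Bayes computation above can be made concrete with a tiny worked example. The word counts and priors below are invented training statistics; the code applies P(H|X) ∝ P(X|H)·P(H) with add-one (Laplace) smoothing and normalizes by P(X).

```python
# Toy Naive Bayes posterior for a two-class text problem.
# All counts are hypothetical, chosen only to illustrate the arithmetic.
from math import prod

counts = {  # word -> count per class
    "pos": {"nightmare": 8, "covid": 6, "happy": 1},
    "neg": {"nightmare": 1, "covid": 3, "happy": 9},
}
priors = {"pos": 0.5, "neg": 0.5}
vocab = {w for c in counts.values() for w in c}

def likelihood(word, cls):
    # Add-one smoothed estimate of P(word | class).
    total = sum(counts[cls].values())
    return (counts[cls].get(word, 0) + 1) / (total + len(vocab))

def posterior(words):
    # P(H|X) = P(X|H) P(H) / P(X), with P(X) as the normalizing constant.
    scores = {c: priors[c] * prod(likelihood(w, c) for w in words)
              for c in counts}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

post = posterior(["nightmare", "covid"])
```

Here the evidence words occur far more often under the positive class, so its posterior dominates; the two posteriors sum to one by construction.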

Experimental evaluation
We performed our experiments on a dataset collected through the process described in section "Data extraction". In this section, we discuss the details of annotating that data and the evaluation metrics used to assess the results of our proposed model.

Data annotation and metrics of evaluation
We used the ICD-11 criteria for diagnosing PTSD. Being infected with COVID-19 was identified as the triggering event, and we then looked for symptoms under the three core domains outlined in ICD-11 28: re-experiencing, hyperarousal, and avoidance behavior. Apart from these three core domains, we also looked at other affective or mood symptoms, their impact, and the treatment availed by the population being studied. Once users were identified as "COVID positive" according to the criteria in section "Data extraction" and their tweet timelines were extracted, the PTSD keywords listed in Table 2 were used to filter the most relevant tweets according to the ICD-11 criteria.
• Tweets which mentioned both the user's COVID-19 status and one of the PTSD keywords were considered "PTSD Positive".
• Tweets that mentioned PTSD keywords in relation to any event other than COVID-19 were not taken into consideration and were deemed "PTSD Negative".
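This annotation rule reduces to a conjunction of two keyword tests. The sketch below uses illustrative keyword lists, not the paper's actual ICD-11-derived lists or COVID-status criteria.

```python
# Sketch of the annotation rule: "PTSD Positive" only when a tweet contains
# both a PTSD keyword and a COVID-19 reference. Keyword lists are invented.
PTSD_KEYWORDS = ("nightmare", "flashback", "hypervigilant", "avoid")
COVID_TERMS = ("covid", "corona", "quarantine")

def annotate(tweet):
    text = tweet.lower()
    has_ptsd = any(k in text for k in PTSD_KEYWORDS)
    has_covid = any(c in text for c in COVID_TERMS)
    return "PTSD Positive" if has_ptsd and has_covid else "PTSD Negative"
```

Note how the second rule falls out naturally: a tweet mentioning nightmares about, say, a car accident has a PTSD keyword but no COVID reference, so it is labeled "PTSD Negative".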
P(H|X) = P(X|H) P(H) / P(X)    (2)

In addition to these metrics, the Area Under the Curve (AUC) is commonly used as a performance metric to evaluate the classifier's ability to distinguish between positive and negative classes. AUC represents the area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
The AUC can be computed with the trapezoidal rule:

AUC = Σ_{i=1}^{N-1} (FPR_{i+1} − FPR_i) · (TPR_i + TPR_{i+1}) / 2

where TPR_i is the True Positive Rate at the ith threshold, FPR_i is the False Positive Rate at the ith threshold, and N is the number of thresholds.
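The trapezoidal AUC summation translates directly into a few lines of code. The ROC points below are invented for illustration.

```python
# Trapezoidal-rule AUC over (FPR, TPR) pairs at a handful of thresholds.
def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    pairs = sorted(zip(fpr, tpr))          # order points by increasing FPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area

fpr = [0.0, 0.1, 0.3, 1.0]                 # invented ROC points
tpr = [0.0, 0.6, 0.8, 1.0]
score = auc_trapezoid(fpr, tpr)            # 0.8 for these points
```

As a sanity check, the diagonal ROC of a random classifier, points (0, 0) and (1, 1), yields an AUC of exactly 0.5.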
In the next section, results are reported and discussed by executing the proposed framework on our dataset.

Experiments and comparison of classifiers
Consider the feature patterns listed in Table 4, against which we computed our results using the previously mentioned classifiers.
Using these feature patterns in combination with the classifiers from section "Classification algorithms", we performed our experiments and evaluated the results using the metrics from section "Data annotation and metrics of evaluation". In Table 5, we report our findings using the NB classifier.
NB achieved its maximum accuracy of 81.86% with U×B×T as the feature pattern. Notably, accuracy exceeds 81% in all combinations where U is present; otherwise, performance declines.
In Table 6, the results computed using kNN are presented.

Table 4. Feature patterns with their abbreviations.
U×B: the product of the two sets U and B using the Cartesian method
B×T: the product of the two sets B and T using the Cartesian method
T×Q: the product of the two sets T and Q using the Cartesian method
U×B×T: the product of the three sets U, B and T using the Cartesian method
U×B×T×Q: the product of the four sets U, B, T and Q using the Cartesian method

Similar to NB, kNN achieved its highest accuracies with U or its combinations with other feature patterns, exceeding 74% accuracy in those combinations. The maximum was 76.61% with U.
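The feature patterns of Table 4 can be illustrated as n-gram extraction over a tokenized tweet. In practice such combinations are commonly implemented by pooling the individual n-gram feature sets (Table 4 describes the combinations as Cartesian products); the sketch below takes the pooling interpretation, and the sample tweet is invented.

```python
# Sketch of the U/B/T/Q feature patterns: unigrams, bigrams, trigrams,
# quadgrams, and their combinations, built by pooling the n-gram sets.
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def feature_pattern(tokens, orders):
    # e.g. orders=(1,) is U, (1, 2) is UxB, (1, 2, 3, 4) is UxBxTxQ.
    feats = []
    for n in orders:
        feats.extend(ngrams(tokens, n))
    return feats

toks = "cant sleep since covid".split()    # invented sample tweet
U = feature_pattern(toks, (1,))            # 4 unigrams
UxB = feature_pattern(toks, (1, 2))        # 4 unigrams + 3 bigrams
```

A 4-token tweet yields 4 unigrams, 3 bigrams, 2 trigrams and 1 quadgram, so U×B×T×Q produces 10 features; the smaller U-only feature space is one reason the paper notes its lower computational cost.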
Tables 7 and 8 report the findings of SVM and RF, respectively. Among all the classifiers used in this study, SVM outperformed the other three with the highest classification accuracy of 83.29%. NB comes second with 81.86%, whereas RF and kNN are third and fourth with 80.67% and 76.61% accuracy, respectively. The findings for SVM and NB are consistent with those for RF and kNN in terms of better accuracy with feature pattern U or its Cartesian product with other patterns. All classifiers performed better with U, U×B, U×B×T, or U×B×T×Q; when U is not among the feature patterns, accuracy declines for all classifiers. While our preferred classification model achieved 83.29% accuracy, it is important to note where the model misclassified tweets. The annotation guidelines we followed state that tweets containing both PTSD-related keywords and information related to COVID-19 were labelled PTSD Positive and all others PTSD Negative. We encountered instances where tweets containing PTSD-related keywords were labeled PTSD Negative because they were not specifically related to COVID-19. This approach occasionally resulted in misclassifications, particularly false positives where such tweets were incorrectly identified as PTSD Positive.
Findings reported in Tables 5, 6, 7 and 8 are visualized in Fig. 5 for easier comparison. The best-performing algorithm, SVM, which achieved the highest accuracy of 83.29% with U, did not perform as well with B and B×T, where accuracy declined significantly. Moreover, as reported in 58,59, U incurs a low computational cost because applying TF-IDF to unigrams produces fewer features, making it a preferred choice for the final classification model.

Conclusion
In this study, we analyzed the post-COVID-19 mental health dynamics and tweet consumption of COVID-19 positive users. We identified them using a set of hashtags reflecting a positive COVID diagnosis. We then extracted their Twitter timelines and performed our analysis on more than 3.96 million pieces of content produced between March 2020 and November 2021. Our findings suggest that post circulation related to the "Other Affective & Biological Symptoms related to PTSD" category is higher than for other categories. However, we noticed that a large fraction of users shifted their behavior from "Avoidance" to "Non PTSD Related" and vice versa. We used the ICD-11 guidelines to filter and annotate our tweets and developed a machine learning based classification model to classify tweets as either PTSD positive or PTSD negative. We obtained our best results with SVM on unigrams as the feature pattern, with 83.29% accuracy. We also acknowledge that our study's concentration on English-language tweets may restrict the applicability of our model to other languages or to platforms with different modes of expression; we are taking this into account in our future research plans. In the future, we aim to extend this work by (i) extending the dataset of PTSD Positive tweets, (ii) extracting all replies/comments on them, and (iii) creating a model to effectively understand and classify the sentiments of users in those posts.

Figure 3. Flow of users across PTSD categories over time.
Figure 4. Breakdown of tweets per category after removing outliers.

Figure 5 .
Figure 5. Accuracy comparison of all classifiers.

Table 1 .
Comparison of results with previous studies.

Table 2.
Set of keywords to filter the tweets.

Table 3.
Tweets and engagement count of English and other languages.

Table 5 .
Results obtained by NB.Significant values are given in bold.

Table 6 .
Results obtained by kNN.Significant values are given in bold.

Table 7 .
Results obtained by SVM.Significant values are given in bold.

Table 8 .
Results obtained by RF.Significant values are given in bold.