Sentiment Analysis based Emotion Extraction for COVID-19 Using Crawled Tweets and Global Statistics for Mental Health

The unpredictable and critical challenges caused by the COVID-19 pandemic have grown steadily, affecting over 213 countries across the globe. Countries have taken several measures to control it, such as lockdowns, curfews, and travel bans, yet for some periods cases kept increasing and the situation worsened globally. The impact on the financial, social, and physical well-being of citizens has led to psychological and mental health issues. In this work, we quantitatively analyze depression, stress, and suicide cases during the COVID-19 period globally and, in particular, in India. Global data, including tweets collected using a scraper, is used for the analysis. The data has been analyzed in Tableau, and sentiment analysis for extracting emotions from tweets has been performed using Python. Tweets are analyzed to extract people's emotions in terms of Fear, Sadness, Anger, and Happiness. From a total of 819,678 tweets collected from January 2020 to March 2022, we find that Fear and Sadness dominate, with scores of 59.3% and 28.9% respectively.


Introduction
The COVID-19 crisis has pushed many people into unemployment, pay losses, and insecurity about their earnings. COVID-19, caused by SARS-CoV-2, was first reported in the city of Wuhan in China, and its sudden spread reached many countries globally (Fei et al., 2020). According to the report titled "Rebooting 2020: A story of COVID-19 and shifting Perceptions" [20], issues related to mental health have risen drastically since the outbreak of this disease. According to the Psychiatric Society of India, a 20% increase in mental-health-related cases has been observed due to relationship stress, work, and financial conditions. The challenges of maintaining a living standard, taking care of the family, fulfilling the needs of family members, and coping with constrained resources have increased the level of stress for individuals. As the disease has not yet been eradicated or even effectively contained (at the time of writing this article) and the number of cases is still rising globally, this stress is being pushed to a higher level, resulting in mental health problems. Growing concerns about job loss, working from home, and the fear of contracting the virus have gradually increased the cases of anxiety, stress, and depression.

During the early days when COVID-19 first started spreading in India, the early steps taken by the Government of India were widely appreciated, but people who worked as laborers or on daily wages were affected immediately. In particular, those who were living in Indian states other than their own had to go back home because they had no money to pay rent or even to survive from day to day. This affected them and their families considerably, and many issues arose during the first lockdown period. Industries and other sectors then started offering "work from home" options to employees, which has since become a new normal. More or less the same trend has been observed in many countries of the world. Saudi Arabia's economy also faced a serious crisis during the pandemic, which affected the earnings of many foreign workers, who faced the choice of having little or no income or going back to their homes in India, Pakistan, Bangladesh, the Philippines, and so on. Eventually, their mental health deteriorated significantly. According to the World Health Organization [28], investment in mental health is required to control the cases [15]. The lack of resources in hospitals made it very difficult for healthcare officials to act in time for COVID-19 patients. What was once unthinkable very soon became a reality: we started seeing people suffering not only financial damage but also mental damage. In some other parts of Asia, especially Thailand, the Philippines, and Indonesia, there were reports of a steady increase in mental-health-related problems (Issue Brief, 2020). Figure 1 shows how the pandemic can affect the mental health of a person. Employed people need support from their employers in terms of finance, and protection from wage cuts, termination, and similar measures.
The COVID-19 pandemic [4] has frightened many individuals globally and created various types of 'divides' among people, so there is a need to take action that gives people enough strength to handle such situations effectively. In this paper, we address these issues and provide our analysis based on practical situations and actual data.
Even though the existing literature contains several works on sentiment analysis of Twitter or other social media during COVID-19, the following points differentiate our work from those:
-In order to collect the tweets in real-time, the snscrape module [25] [17] of Python (an open-source library) has been utilized for scraping the tweets. In many other works, pre-collected tweets have been used.
-The type of sentiment analysis done in our work is Fine-Grained Sentiment Analysis, which not only finds positive, negative, and neutral polarities but also extracts emotions.
-Multiple analyses have been done in this work, focusing on technical as well as psychological aspects.
After this introduction, the rest of the paper is organized as follows: Section 2 describes various types of depression for the general knowledge of the readers. Section 3 reports past and recent papers on observational analysis. Then, the methodology for data collection and the tools used are reported in Section 4. Section 5 presents the results and analysis and finally, Section 6 concludes the paper with possible future works.
It is well known that the city of Wuhan in China is reported to be the epicenter of COVID-19 [10]. The study of history is important for understanding how pandemics can affect various aspects of life. Hence, we initially studied the deadliest pandemic so far, the Bubonic Plague [27]. That plague is estimated to have caused nearly 200 million deaths worldwide. Then, the Spanish Flu and the Plague of Justinian took nearly 50 million lives, and then came many others such as HIV/AIDS (human immunodeficiency virus/acquired immunodeficiency syndrome), Swine Flu, and Asian Flu, which proved to be very contagious. The study on epidemics and pandemics from 2005 [13] presents different cases of diseases that occurred in history. The author explores the background, time, place, causes, and effects of pandemics on human life, and how they changed the standard of living of humans from time to time. The book covers 50 notable pandemics and epidemics in history and studies their adverse effects. Some unsolved historical issues are also discussed in the book.
Irrespective of the previously recorded cases, the outbreak of COVID-19 has put healthcare professionals in a very difficult situation [7]. Despite the claimed progress in medicine and epidemiology, even top scientists and medical professionals found it extremely difficult to contain its spread, or even to decide what to do, for several months (which have turned into years by this time). Never before have we had such a connected world, where an individual can travel to the other side of the world so quickly. This connectivity also caused the virus to spread widely within a short time.
A survey has found a large number of publications indexed in Scopus and Embase, produced within a short span of time, that aim to understand the impact of the disease and present observational studies. The preliminary study shows that symptoms of depression and anxiety [23] are very common, and it reports stress in the range of 8% to 16%; these are quite common reactions to the COVID-19 pandemic [19]. The researchers concluded that mental health problems have risen. It is also reported that frontline health workers were affected mentally at a very high rate due to the COVID-19 pandemic. A research work by Snehil and Swapnajeet in 2020 [22] analyzes the literature using keyword searches such as epidemics, infection, syndrome, and COVID-19. The study covers 127 articles related to SARS (Severe Acute Respiratory Syndrome) and other deadly pandemics. Another observational study recruited 485 participants from MTurk [9] to understand pandemic-related adversity, psychological flexibility, and demographic characteristics. The results of the study show that transdiagnostic interventions can be helpful during the COVID-19 pandemic. It should be noted that a "transdiagnostic process" is the generic name given to a mechanism that is evident across disorders and that is either a risk factor or a maintaining factor for the disorder.
A survey-based study has collected responses from people across categories such as age group and contact with COVID-19 patients [14]. This study examines three parameters: infection risk, challenges at work, and social change. To study the resilience of people in different situations, work has been done in 2022 [8] by Christiaan and co-researchers. Resilience is studied in this work among the general population, among people with various psychiatric conditions, among healthcare workers, and at a societal level. Indeed, psychiatric disorders in the healthcare workforce have become a common concern during COVID-19 [26].
An observational study has examined the mental health of physicians during COVID-19 using data collected from physicians. Interestingly, it found that married physicians have a lower level of stress than unmarried ones, and that the mean score observed in females was higher than in males [1]. A study on the mental health of Chinese people collected data from 7,236 participants to assess disorders, symptoms of depression, quality of sleep [12], and the overall prevalence of various factors. The study reports 35.1% for anxiety disorder, 20.1% for symptoms of depression, and 18.2% for low quality of sleep [29]. Digital approaches to tackling mental health issues are also available, such as smartphone applications and chatbots [6]. There are different semantic ontologies that handle data with controlled vocabularies to support semantic analysis of the pandemic situation [24]. Another work studies the effect of COVID-19 on children [16] and comprises a framework designed to handle the mental health of children [11]. The study considered twelve variables to handle four factors. Other notable recent works on psychological impacts, with recommendations on what is to be done, have been reported by different researchers [3] [18].

Figure 2 depicts the steps (flowchart) that we have followed for the analysis of the obtained pandemic-related data. As shown in the figure, the flow has three main stages: Identification, Analysis, and Representation. Identification includes the collection of data through Google Trends and the tweets on COVID-19. Analysis starts with pre-processing of the tweets: handling missing data, converting the case of the text, removing duplicates, and stemming. Then, further analysis is done in the form of emotion extraction from the tweets.
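The pre-processing steps listed above (missing data, case conversion, duplicate removal, stemming) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline actually used in this work: the function names `naive_stem` and `preprocess` are hypothetical, and the suffix-stripping stemmer is a crude stand-in for a proper stemmer such as Porter's.

```python
import re


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweets):
    """Drop missing entries, lowercase, remove duplicates, and stem each token."""
    seen = set()
    cleaned = []
    for text in tweets:
        # Missing data: skip None and empty or whitespace-only entries.
        if text is None or not text.strip():
            continue
        # Case conversion and tokenization.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        normalized = " ".join(tokens)
        # Duplicate removal, compared after normalization.
        if normalized in seen:
            continue
        seen.add(normalized)
        # Stemming.
        cleaned.append(" ".join(naive_stem(token) for token in tokens))
    return cleaned


sample = ["Lockdowns extended", None, "lockdowns EXTENDED", "   ", "Masks needed"]
print(preprocess(sample))
```

Duplicates are compared after lowercasing so that tweets differing only in case or punctuation count as the same entry; whether the original pipeline did this is an assumption.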
A further analysis is done in Tableau to understand people's mental health during COVID-19. We collected data from multiple sources. In particular, part of the data was collected manually by the collaborators, who used various newspapers and news channels as sources. In addition, tweets were collected with the help of a scraper (Towards, 2021). These tweets were collected monthly over the period from January 2020 to March 2022. The attributes of the collected data are shown in Figure 3. Lastly, people's search interest has been analyzed using Google Trends to understand the changes in their mental health. Search interest reflects the topics that people look for. Google Trends search data has been analyzed for the terms Stress, Anxiety, Depression, Therapy, and Mental Health. This is done in Analysis-II as described in Section 4.
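To illustrate the monthly collection schedule described above, the sketch below builds one search query per month for the period January 2020 to March 2022. It only constructs query strings and performs no scraping. The helper names are hypothetical, and the `since:`/`until:` date operators follow Twitter's search syntax; the exact query format used in this work is an assumption.

```python
from datetime import date


def monthly_windows(start, end):
    """Yield (first_day, first_day_of_next_month) for each month from start to end."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if month == 12:
            next_year, next_month = year + 1, 1
        else:
            next_year, next_month = year, month + 1
        yield date(year, month, 1), date(next_year, next_month, 1)
        year, month = next_year, next_month


def build_queries(keyword, start, end):
    """Build one date-bounded search query string per month."""
    return [
        f"{keyword} since:{first.isoformat()} until:{after.isoformat()}"
        for first, after in monthly_windows(start, end)
    ]


queries = build_queries("coronavirus", date(2020, 1, 1), date(2022, 3, 1))
print(len(queries))
print(queries[0])
print(queries[-1])
```

Each query string could then be passed to whichever scraper is in use; keeping one query per month makes it straightforward to count tweets per month afterwards.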

Results and Analysis
Our work is concentrated on analyzing data available on the Internet. It should be noted that the data was not taken from any particular website but the collection involved multiple different types of resources.

Analysis: Tweets on Coronavirus
The tweets have been collected using a scraper with the search keyword "coronavirus". Figure 4 shows the number of tweets found when searching for the keyword "coronavirus" from January 2020 to June 2020 and from July 2020 to December 2020. The data collected from Twitter using the Python scraper has been analyzed in Tableau [5]. The collected tweets are analyzed with the help of Python-based sentiment analysis [21] to check the emotions and sentiments of people towards COVID-19 (i.e., related to stress and anxiety in this case).
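As a rough illustration of what emotion extraction from a tweet involves, the sketch below scores a text against a small word list per emotion. It is not the service or model used in this work: the lexicon is a hypothetical toy example, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon, for illustration only; a real system would use
# a much larger lexicon or a trained model.
EMOTION_LEXICON = {
    "Fear": {"afraid", "scared", "panic", "fear", "spread", "infect"},
    "Sadness": {"sad", "death", "loss", "lonely", "grief"},
    "Anger": {"angry", "furious", "outrage", "blame"},
    "Happiness": {"happy", "relief", "recovered", "hope"},
}


def emotion_scores(text):
    """Return the share of lexicon hits per emotion (all zeros if nothing matches)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = {
        emotion: sum(1 for token in tokens if token in words)
        for emotion, words in EMOTION_LEXICON.items()
    }
    total = sum(hits.values())
    return {emotion: (count / total if total else 0.0) for emotion, count in hits.items()}


def dominant_emotion(text):
    """Return the highest-scoring emotion, or None when no lexicon word occurs."""
    scores = emotion_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(dominant_emotion("People are scared the virus will spread"))
print(emotion_scores("so sad about the death toll, and angry too"))
```

Normalizing the hit counts to shares gives each tweet a score per emotion that sums to one, which makes per-emotion averages over many tweets comparable.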
People became highly conscious of mental health issues during the COVID-19 crisis, and search interest on Google [2] related to stress, anxiety, depression, mental health, therapy, etc. increased considerably. For our work, we have also analyzed the search interests of people related to stress, anxiety, and depression. This has shown some interesting results, which are discussed later in this paper. As noted before, the collected tweets have been analyzed with the help of text2data [21]. Table 2 shows a sample of the tweets collected using the Python scraper, and Table 3 shows a sample of the emotions extracted from the collected tweets. A word cloud is created to check the terms that occurred frequently during the COVID-19 crisis, and it is shown in Figure 5. It is evident from the figure that people's mental health was affected especially by COVID-19-related terms and situations. As for the tweets, our analysis shows various emotions of people. Figure 6 shows the number of tweets collected from January 2021 to June 2021 and from July 2021 to December 2021 for the keyword search "coronavirus". Figure 7 shows the number of tweets collected from January 2022 to March 2022. Other tweets related to lockdown and the COVID-19 crisis show that people's mindset has changed significantly during this pandemic. Tweets containing terms such as 'suicidal', 'depressive', and 'sad' also increased, indicating the need for some kind of mental health therapy for a large section of people.
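A word cloud is driven by term frequencies, so the step above can be sketched as a simple count of tokens across tweets. This is a minimal illustration using the standard library; the stop-word list is a hypothetical, deliberately short one, and the function name is an assumption.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list for illustration.
STOPWORDS = {
    "the", "an", "of", "to", "in", "is", "and", "for", "with",
    "are", "has", "been", "that", "which", "by", "on",
}


def term_frequencies(tweets, top_n=5):
    """Count tokens across tweets, skipping stop words and single characters."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # Single-character tokens (for example a stray "n") carry no meaning.
            if len(token) > 1 and token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)


tweets = [
    "covid lockdown in the city",
    "covid cases and lockdown",
    "covid n n coronavirus",
]
print(term_frequencies(tweets, top_n=2))
```

The resulting (term, count) pairs are the kind of input a word-cloud tool scales font sizes by.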
We have analyzed the collected tweets by performing Emotion Extraction using Sentiment Analysis and collected values for different sentiment-based emotions including Sadness, Fear, Anger, and Happiness. People's search interests for different words like depression, anxiety, stress, mental therapy, and job vacancies have been analyzed through the Google database. The analysis clearly shows that people are highly fearful and sad about the COVID-19 situation. People have felt anxious since the beginning of this pandemic, which has also affected them mentally. The factor of the unknown, followed by measures restricting movement, really affected a large group of people. The average of each category is taken in total to analyze the overall sentiment of people, and it is visible in Figure 8 that people have more fear and sadness in the COVID-19 situation, which is worth focusing on.

Table 2. Sample of tweets collected using the Python scraper:
-can hamsters infect humans with covid a new study says these furry pets reintroduced the delta variant to hong kong n nthey are the only known animal aside from minks that can spread the novel coronavirus to people n nreport by covid n n
-nyc to provide free home delivery of covid treatments mayor adams n nnew yorkers with covid are eligible for free home deliveries of antiviral treatments mayor n nadd your highlights n n covid coronavirus
-the problem in indonesia has been reluctance to do extensive lockdowns which has led to one of the highest death tolls in asia the official death toll from coronavirus is far too low

Table 3. Sample of emotions extracted from the collected tweets: the same sample tweets as in Table 2, each paired with an extracted emotion label such as Fear or Sad.

Figure 9 shows the category-wise average scores of Fear, Sad, Anger, and Happiness in the collected tweets, and it is clearly noticeable that Fear and Sad take relatively greater shares than Angry and Happy.
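The category-wise averaging described above can be sketched as follows: each tweet carries one score per emotion, the scores are averaged per category, and the averages are converted to percentage shares. The input values below are made up for illustration and are not results from this work; the function names are hypothetical.

```python
EMOTIONS = ("Fear", "Sadness", "Anger", "Happiness")


def category_averages(rows):
    """Average each emotion's score over all tweets (rows of emotion -> score)."""
    if not rows:
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {
        emotion: sum(row.get(emotion, 0.0) for row in rows) / len(rows)
        for emotion in EMOTIONS
    }


def percentage_shares(averages):
    """Convert per-category averages into percentage shares of their total."""
    total = sum(averages.values())
    if total == 0:
        return {emotion: 0.0 for emotion in averages}
    return {emotion: 100.0 * value / total for emotion, value in averages.items()}


# Illustrative scores only, not data from the study.
rows = [
    {"Fear": 0.6, "Sadness": 0.2, "Anger": 0.1, "Happiness": 0.1},
    {"Fear": 0.4, "Sadness": 0.4, "Anger": 0.2, "Happiness": 0.0},
]
averages = category_averages(rows)
for emotion, share in percentage_shares(averages).items():
    print(f"{emotion}: {share:.1f}%")
```

Reporting shares rather than raw averages makes the categories directly comparable, in the same way the Fear and Sadness percentages are reported in this paper.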

Conclusion and Future Scope
The COVID-19 crisis has greatly impacted almost every key sector globally and affected almost every person's life (except a few people on remote islands or in remote villages). The human mind is mysterious, and it reacts differently to different situations. The analysis of tweets on COVID-19 shows that fear and sadness dominate people's emotions. Interestingly, in the connected world of this era, this common mood has become prevalent in almost every place on the face of the earth.
Consequently, the mental health of the people of these countries has been greatly affected. Future work in the field will analyze how effectively different countries handled the pandemic. We would like to analyze how the policies adopted by different governments contributed to improving the mental health of their people. Artificial Intelligence (AI) and Machine Learning (ML) can prove helpful in this field and can be utilized to analyze pandemic situations more precisely. It will be interesting to know how technological advancement in the world helped tackle difficult medical situations. In addition, we plan to analyze tweets to detect sarcasm during the COVID-19 situation. Sarcasm detection can be highly effective in improving the accuracy of sentiment analysis.