Data in Brief

were used. The retweets were excluded using the negation operator, and the tweet ids and user ids were extracted and shared with the public. Approximately 1.79 percent (43,047) of the tweets were geotagged. To visualize the geotagged tweets, the longitudes and latitudes of their bounding box coordinates were averaged. This dataset can help researchers shed light on the news, patterns, and ongoing discussions of Monkeypox on social media, identify hotspots, and help contain the Monkeypox virus.

This data was gathered using the Twitter API with academic researcher access. The full-archive search endpoint, which returns all the tweets available for a given query, was used to gather all tweets, excluding retweets, that contained the keywords Monkeypox or "monkey pox" or "viruela dei mono" or "variole du singe" or "variola do macoco" and were posted from May 1 to December 25, 2022. A total of 2,400,202 tweet ids and user ids were shared with the public.
Data format: Raw, Filtered (retweets are excluded)
Description of data collection: One limitation of this dataset is that it covers only the period from May 1 to December 25, 2022; tweets posted after this period are not included. Another limitation is that, due to Twitter's developer privacy policy agreement, only tweet ids and user ids can be shared with the public. To acquire the actual tweets and other metadata, the tweet ids need to be hydrated.
Data source location: The dataset includes all the geotagged and non-geotagged tweets posted in any language from any country and location.
Data accessibility: The dataset is available at Mendeley: The dataset includes only tweet ids and user ids in compliance with Twitter developer's terms of use and privacy policy [1]. To retrieve the actual tweets and other metadata (creation date, number of retweets, number of likes, etc.), the tweet ids have to be hydrated. The DocNow Hydrator is one user-friendly application that hydrates tweet ids [2]. After installation, it should be authorized using the Twitter API key generated for your Twitter developer account. Next, the file containing the tweet ids is uploaded to the application. By default, the tweets and their metadata are returned in .json format; however, the application can also be set to return other formats such as .csv [3].

Value of the Data
• The COVID-19 pandemic has wreaked havoc throughout the world. More than two years later, just as Non-Pharmaceutical Interventions (NPIs) are being lifted and the world needs to recover from the damage caused, a Monkeypox outbreak has emerged in more than 20 countries and threatens the globe with a new pandemic.
• NPIs have led to the cancellation or postponement of many surgeries, diagnostic tests (e.g., cancer screening, MRI, and CT scans), and procedures (e.g., orthoptic, pediatric, and dental procedures), causing a great number of patients to fall behind their treatment timelines [4]. Moreover, the number of patients with chronic diseases such as diabetes, hypertension, and cardiovascular disease has increased [5, 6].
Mental health disorders have escalated in adults as well as in children and adolescents, and especially in healthcare workers [7][8][9]. Worst of all, the global economy is facing a recession, particularly in low-income and lower-middle-income countries [10]. The world cannot bear another catastrophe.
• It is critical to contain the Monkeypox virus and eliminate the threat. Twitter data have previously been used successfully in early warning systems for outbreaks [11], trend prediction [12], hotspot identification [13], and misinformation and fake news detection [14]. This dataset could help researchers advance studies concerning Monkeypox and provide further insights to bring the outbreak under control [15].
• Researchers from Data Science, Computer Science, Social Science, Mathematics and Statistics, Medicine, and even Economics can use this Twitter data to further understand misinformation/disinformation regarding Monkeypox [16], the stigmatization of Africans and LGBTQ+ people for spreading Monkeypox [17], and topics of public concern regarding Monkeypox [18], as well as to predict Monkeypox trends [19].
• The results of such studies could help decision-makers design more targeted policies and help health officials provide better services suited to all communities, especially vulnerable and marginalized populations.
• Social media platforms such as Twitter are increasingly being used by the public to discuss their opinions, concerns, and experiences. This dataset could help researchers understand the popularity of Twitter posts over time, the locations and hotspots where people are more concerned, the topics discussed in those hotspots, and the sentiments/emotions expressed about the topics of concern.
• Previously, a Twitter dataset on Monkeypox was prepared in June 2022 [20]. However, that dataset includes 68,934 tweets, was gathered with RapidMiner [21] rather than the Twitter API, and does not include all the tweets available for the keywords used. In contrast, this dataset includes 2,400,202 tweets gathered with a Twitter API academic researcher account and contains all the tweets available for the keywords used from May 1 to December 25, 2022. Thus, it could provide better insight into popular discussions and help make studies of Monkeypox concerns less prone to error.

Data Description
Each line in the file Monkeypox_May1_to_Dec25_2022.csv corresponds to a different tweet and has two columns, TweetID and AuthorID, which hold the tweet id and the user id, respectively. The file includes 2,400,202 lines in total. To access the actual tweets and their metadata, the tweet ids need to be hydrated. One application that can hydrate the tweets is the DocNow Hydrator [2]. To use it after installation, a Twitter account is required: the Twitter API key obtained through that account is used to authorize the hydrator. Once the hydrator is authorized, the file containing the tweet ids is fed into it. In the Add tab, click "Select Tweet ID File" to upload the file. Next, enter a name for the hydrated dataset and click "Add dataset". Finally, clicking the Start button begins the hydration process. The hydrated tweets are saved as a .json file by default; however, they can also be saved in .csv format [3].
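Before hydration, the TweetID column has to be pulled out of the two-column CSV. The sketch below shows one way to do that in Python, assuming the file has a header row named TweetID,AuthorID. It writes one id per line (the plain-text layout we assume the hydrator's "Tweet ID File" expects) and, for readers who call the API directly, splits ids into batches of 100, which we assume is the per-request limit of the tweet lookup endpoint. The helper names and the output file name are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: a tweet lookup request accepts at most 100 ids at a time.
LOOKUP_BATCH_SIZE = 100


def parse_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Extract the TweetID column from CSV text that has a header row.

    Ids are kept as strings: 64-bit tweet ids can lose precision in tools
    that convert them to floating-point numbers (e.g. spreadsheets).
    """
    reader = csv.DictReader(lines)
    return [row["TweetID"].strip() for row in reader]


def read_tweet_ids(path: Path) -> List[str]:
    """Read the TweetID column from the shared CSV file."""
    with path.open(newline="", encoding="utf-8") as f:
        return parse_tweet_ids(f)


def write_id_file(ids: Iterable[str], out_path: Path) -> None:
    """Write one tweet id per line (hypothetical input layout for the hydrator)."""
    with out_path.open("w", encoding="utf-8") as f:
        for tweet_id in ids:
            f.write(f"{tweet_id}\n")


def batches(ids: List[str], size: int = LOOKUP_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

For example, `write_id_file(read_tweet_ids(Path("Monkeypox_May1_to_Dec25_2022.csv")), Path("tweet_ids.txt"))` produces a plain id list that can then be selected in the hydrator's Add tab.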
The tweets belong to 69 different languages. Roughly 81.82 percent (1,963,797) of the tweets are in English. Table 1 presents the ten languages with the largest shares of the tweets, along with examples. Table 2 shows the ten countries with the highest percentages of the geotagged tweets. More information on the geotagged tweets is available at [22].
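Shares like those in Table 1 can be recomputed from the hydrated output. The following is a minimal sketch, assuming the hydrated tweets are stored one JSON object per line, each with a "lang" field; tweets missing that field are counted under "und" (undetermined), which is our own choice. The function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def language_shares(json_lines: Iterable[str], top: int = 10) -> List[Tuple[str, int, float]]:
    """Return (language code, tweet count, percent of hydrated tweets) for the top languages.

    Assumes one JSON tweet object per line with a "lang" field; tweets
    without that field are counted under "und" (undetermined).
    """
    counts: Counter = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        counts[tweet.get("lang", "und")] += 1
    total = sum(counts.values())
    # most_common returns an empty list when nothing was counted, so no division by zero.
    return [(lang, n, 100.0 * n / total) for lang, n in counts.most_common(top)]
```

Because deleted or protected tweets cannot be hydrated, percentages computed this way may differ somewhat from the figures reported above.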
Twitter, as one of the most popular social media platforms, can provide researchers with information to better understand the global situation and help reduce the number of cases.
Therefore, in this work, a dataset containing all the tweets posted from May 1 to December 25, 2022 is presented. This dataset can be updated in the future and could help researchers address various issues regarding the current Monkeypox outbreak.

Table 1
The portion of the tweets belonging to each language, with examples.

Experimental Design, Materials and Methods
The Twitter API academic researcher account returns all the tweets available for a given query and allows the user to retrieve up to ten million tweets per month. The full-archive search endpoint of this account was used to retrieve the tweets. This endpoint accepts a query containing a set of keywords as input and returns all the tweets that match the keywords, along with their metadata. Since European countries are the hotspots of the current Monkeypox outbreak, the keywords used to build the query included Monkeypox and its equivalents in Spanish, French, and Portuguese, i.e., Monkeypox or "monkey pox" or "viruela dei mono" or "variole du singe" or "variola do macoco". In addition, retweets were excluded using the negation operator -is:retweet. The tweets were gathered from May 1 to December 25, 2022, and 2,400,202 tweets were retrieved. In addition to the text itself, the metadata obtained included tweet id, conversation id, in-reply-to user id and in-reply-to username (if the tweet is a reply), created at, type (i.e., tweet, replied to, or quoted), language, retweet count, reply count, like count, geo id, geo-country, geo-province/city, geo-coordinates, author id, author name, author username, author description, author-reported location, author hashtags, account creation date, follower count, following count, tweet count, and image URL. However, due to Twitter's developer privacy policy agreement, only the tweet ids and user ids are shared with the public [1]. Therefore, to use the dataset, the tweets need to be hydrated [3]. Our dataset includes all the geotagged and non-geotagged tweets posted in any language and from any country.
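The query described above can be assembled programmatically. This is a minimal sketch under stated assumptions, not necessarily the exact string the authors submitted: keywords are ORed together, multi-word terms are quoted as exact phrases, and -is:retweet excludes retweets. The parameter names query, start_time, end_time, and max_results are those of the v2 full-archive search endpoint (GET /2/tweets/search/all); the choice of end_time at the start of December 26 (UTC) so that all of December 25 is covered is our assumption. Authentication and paging are omitted.

```python
from typing import Dict, List

# Keywords as listed in the text above.
KEYWORDS: List[str] = [
    "Monkeypox",
    "monkey pox",
    "viruela dei mono",
    "variole du singe",
    "variola do macoco",
]


def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as exact phrases."""
    return f'"{term}"' if " " in term else term


def build_query(keywords: List[str]) -> str:
    """OR the keywords together and exclude retweets with the negation operator."""
    ors = " OR ".join(quote_term(k) for k in keywords)
    return f"({ors}) -is:retweet"


def search_params(query: str) -> Dict[str, str]:
    """Request parameters for one full-archive search call.

    Assumption: timestamps are UTC, and end_time is the start of
    December 26 so that tweets from all of December 25 are included.
    """
    return {
        "query": query,
        "start_time": "2022-05-01T00:00:00Z",
        "end_time": "2022-12-26T00:00:00Z",
        "max_results": "500",  # upper bound per page for full-archive search
    }
```

At 500 tweets per page, retrieving 2,400,202 tweets takes at least 4,801 requests, which stays well within the ten-million-tweet monthly cap.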

About 1.79 percent (43,047) of the tweets are geotagged. The longitude and latitude of each geotagged tweet were estimated by averaging the longitudes and latitudes of its bounding box coordinates. Fig. 1, which was created using ArcGIS Online, visualizes the locations of the tweets. Approximately 1.03 percent (24,650) of all tweets, or about 57.3 percent of the geotagged tweets, were from the United States.
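The bounding-box averaging and the reported percentages can be illustrated as follows. This sketch assumes a box given as [west, south, east, north] (the GeoJSON bbox order) or as a list of [longitude, latitude] corners; the helper names are hypothetical. Plain averaging gives a wrong point for boxes that cross the ±180° meridian, a case this sketch does not handle.

```python
from typing import Sequence, Tuple


def bbox_center(bbox: Sequence[float]) -> Tuple[float, float]:
    """Average a [west, south, east, north] box into one (longitude, latitude) point.

    Naive averaging: boxes crossing the +/-180 degree meridian are not handled.
    """
    west, south, east, north = bbox
    return ((west + east) / 2.0, (south + north) / 2.0)


def polygon_center(corners: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Average a list of [longitude, latitude] corner points."""
    lons = [c[0] for c in corners]
    lats = [c[1] for c in corners]
    return (sum(lons) / len(lons), sum(lats) / len(lats))


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent, rounded to two decimals."""
    return round(100.0 * part / whole, 2)
```

With the counts above, `percent(43047, 2400202)` gives 1.79, `percent(24650, 2400202)` gives 1.03, and `percent(24650, 43047)` gives 57.26, which is why the United States figure is a share of all tweets rather than of the geotagged tweets.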

Specifications Table
Subject: Health and medical science, Infectious Diseases
Specific subject area: This dataset contains tweets related to the current Monkeypox outbreak. It is primarily released to help researchers contain the outbreak. It is classified under Health and medical science, Infectious Diseases, but it can also be useful to scientists from areas such as Data Science, Computer Science, Social Science, Mathematics and Statistics, and even Economics.
© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

Table 2
The portion of the tweets belonging to each country.