Dataset for modeling Beck’s cognitive triad to understand depression

This article presents data for modeling Beck’s cognitive triad to understand the subjective symptoms of depression, such as a negative view of the self, the future, and the world. The Cognitive Triad Dataset (CTD) comprises 5886 messages: 600 from the Time-to-Change blog, 580 from Beyond Blue personal stories, and 4706 from Twitter. The data were manually labeled by skilled annotators and are divided into six categories: self-positive, world-positive, future-positive, self-negative, world-negative, and future-negative. The Cognitive Triad Dataset was evaluated on two subtasks: aspect detection and sentiment classification on given aspects. The dataset will aid in identifying Beck’s Cognitive Triad Inventory (CTI) items in a person’s social media posts.


Specifications
Subject: Health psychology
Specific subject area: Beck's cognitive theory
Type of data: Text
How data was acquired: The data from Twitter was extracted using the Twitter API. Data from the Time-to-Change blog and Beyond Blue personal stories were manually collected.
Data format: Raw and analyzed.
Parameters for data collection: The Twitter API was used to capture tweets using filter keywords related to the cognitive triad aspects. The keywords related to self, future, and world are {"I", "myself", "me"}, {"future", "from now", "look forward", "turn out", "am going to", "are going to", "won't", "will"}, and {"world", "globe", "people", "he", "she", "it", "they", "nobody", "others", "obstacle"}, respectively.
Description of data collection: The data from Twitter was extracted using the Twitter API, with the filter keywords related to the cognitive triad aspects used to capture tweets. The data from the Time-to-Change blog were manually collected. The GitHub code was used to generate simulated data that resembles cognitive patterns found in the Beyond Blue personal stories. The data were manually labeled by skilled annotators.
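As an illustration of how the filter keywords above partition text by aspect, the following is a minimal sketch that matches the three keyword sets against a message locally. It is not the authors' Twitter API query, which the article does not give; the names ASPECT_KEYWORDS and candidate_aspects, the case-insensitive matching, and the whole-word matching rule are assumptions made for this example.

```python
import re

# Keyword sets copied from the data-collection parameters above.
ASPECT_KEYWORDS = {
    "self": ["I", "myself", "me"],
    "future": ["future", "from now", "look forward", "turn out",
               "am going to", "are going to", "won't", "will"],
    "world": ["world", "globe", "people", "he", "she", "it", "they",
              "nobody", "others", "obstacle"],
}


def _compile(keywords):
    # Assumption: match whole words/phrases only, case-insensitively,
    # so that "it" does not fire inside "with" or "me" inside "home".
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


ASPECT_PATTERNS = {aspect: _compile(kws) for aspect, kws in ASPECT_KEYWORDS.items()}


def candidate_aspects(text):
    """Return the set of aspects whose keywords occur in the text."""
    text = text.replace("\u2019", "'")  # normalise curly apostrophes ("won’t")
    return {aspect for aspect, pattern in ASPECT_PATTERNS.items()
            if pattern.search(text)}
```

A message can match several aspects at once (for example, one containing both "nobody" and "will"), which is one reason keyword filtering only yields candidates and the final labels still come from manual annotation.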

Value of the Data
• Patients may under- or over-report their symptoms during traditional clinical interviews, depending on the actual or perceived implications for a mental health disorder diagnosis. Intelligent mental disorder understanding systems trained with CTD can overcome these limitations and effectively test for depression.
• The CTD presents 6-ary cognitive triad labels to understand the CTI items associated with statements in a person's social media messages. The 6-ary labels are self-negative, future-negative, world-negative, self-positive, future-positive, and world-positive.
• The data can be used to train a sentiment analysis model, which can then be used for initial screening of depression based on the client's recent interactions with a clinical chatbot or their social media data.
• The labeled text data can be used to train machine learning models for sentiment analysis and aspect detection tasks. An aspect-based sentiment classification model trained on CTD can assist psychologists in identifying the cognitive triad aspect-sentiment pairs {(self, negative), (world, negative), (future, negative)} from an individual's social media messages.

Data Description
Beck [2] identified three factors responsible for depression: faulty information processing (errors in logic), the cognitive triad (negative thinking about the world, the self, and the future), and negative self-schemas. Critical evaluations of Beck's theory are provided in Alloy et al. [3] and Butler et al. [4]. This section highlights the cognitive triad, which can be modeled using sentiment analysis. The Cognitive Triad Inventory (CTI) comprises items [5] related to a view of the self, the world, and the future, as shown in Table 1.
The Cognitive Triad Dataset is used to understand the statements associated with CTI items in a person's social media messages. The 6-ary classes are C6 = {self-negative (sneg), world-negative (wneg), future-negative (fneg), self-positive (spos), world-positive (wpos), future-positive (fpos)}. We collected data from Twitter, the Time-to-Change blog, and Beyond Blue personal stories and used the annotators' majority vote as the gold-standard label. The statistics for the 6-ary dataset are provided in Table 2. For cognitive aspect detection, the CTD classes are reduced to the ternary classes {self, world, future}. CTD statistics for cognitive aspects are given in Table 3. For sentiment classification, the CTD classes are reduced to the binary classes {positive, negative}. Table 4 shows the CTD statistics for sentiment classification. Word clouds for the self-negative, world-negative, future-negative, self-positive, world-positive, and future-positive labels are provided in Figs. 1-6. A word cloud is a depiction of text data in which the size of each word signifies its frequency or relevance.
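The reduction from the 6-ary labels to the ternary aspect classes and the binary sentiment classes, together with majority voting over annotator labels, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and because the article does not state the number of annotators or how ties were resolved, the sketch assumes a strict majority and returns None when there is none.

```python
from collections import Counter

# Short label codes as listed in the C6 definition above.
C6 = ("sneg", "wneg", "fneg", "spos", "wpos", "fpos")
ASPECT = {"s": "self", "w": "world", "f": "future"}
SENTIMENT = {"neg": "negative", "pos": "positive"}


def split_label(label):
    """Map a 6-ary code such as 'fneg' to its (aspect, sentiment) pair."""
    if label not in C6:
        raise ValueError(f"unknown CTD label: {label!r}")
    return ASPECT[label[0]], SENTIMENT[label[1:]]


def majority_vote(votes):
    """Return the label chosen by a strict majority of annotators.

    Assumption: messages without a strict majority get None here; the
    article does not describe how such cases were handled.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None
```

For example, three hypothetical annotator votes of sneg, sneg, and wneg resolve to sneg, which split_label then maps to the aspect "self" for aspect detection and the sentiment "negative" for sentiment classification.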

Experimental Design, Materials and Methods
The Cognitive Triad Dataset is evaluated for aspect detection and sentiment classification using popular machine learning and deep learning models. Data were preprocessed by deleting duplicate tweets, incomplete tweets, and tweets shorter than four words, removing punctuation and stop words from the text, and splitting multi-word hashtags into individual words. In the preliminary work, Decision Tree, Random Forest, Naive Bayes, SVM [6], and RNN-Capsule [7] models are evaluated for aspect extraction and sentiment classification on the Cognitive Triad Dataset. The baseline machine learning models are implemented using scikit-learn. The RNN-Capsule model is implemented using PyTorch and run on a single GPU (NVIDIA GeForce RTX 3080 Ti). By default, we trained the model for 28 epochs with a batch size of 32. We employed pre-trained GloVe vectors for the word embeddings. Across multiple trials, we selected the model with the best validation performance and report its test performance in the experimental results.
Table 5 compares various models on CTD for the aspect extraction task. The accuracy and F1-score are very close for Random Forest and Support Vector Machine. The RNN-Capsule model achieves the highest accuracy of 96.17% and an F1-score of 96.02%.
Table 6 compares various models on CTD for the sentiment classification task. The accuracy and F1-score are very close for Decision Tree and Support Vector Machine. The Random Forest model has the highest accuracy of 81.58% and an F1-score of 81.56% among the machine learning models. The RNN-Capsule model achieves the highest accuracy of 88.87% and an F1-score of 88.55% for the sentiment classification task.
Table 7 gives the performance of various models on CTD for the sentiment classification task on the self aspect. The accuracy and F1-score are very close for Random Forest and Support Vector Machine. The RNN-Capsule model achieves the highest accuracy of 83.67% and an F1-score of 83.72% on the self aspect. Table 8 gives the performance of various models on CTD for the sentiment classification task on the future aspect. The Random Forest model has the highest accuracy of 83.62% and an F1-score of 84.11% among the machine learning models. The RNN-Capsule model achieves the highest accuracy of 90.06% and an F1-score of 89.89% on the future aspect. Table 9 gives the performance of various models on CTD for the sentiment classification task on the world aspect. The Random Forest model achieves the highest accuracy of 86.60% and an F1-score of 86.59% on the world aspect.
Table 10 gives the performance of aspect-based sentiment classification over the (cognitive aspect, sentiment) classes. The Support Vector Machine has the highest accuracy of 60.54% and an F1-score of 60.58% among the machine learning models. The RNN-Capsule model achieves the highest accuracy of 85.71% and an F1-score of 85.84% on this task.
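The preprocessing steps described at the start of this section (removing duplicates and tweets shorter than four words, stripping punctuation and stop words, and splitting multi-word hashtags) can be sketched with the Python standard library alone. This is a sketch under stated assumptions, not the authors' pipeline: the stop-word list is a tiny placeholder, hashtags are split only on CamelCase boundaries, the order of the steps is a guess, and the "incomplete tweet" filter is omitted because the article does not define it.

```python
import re
import string

# Placeholder stop-word list (an assumption; the article does not name one).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}


def split_hashtag(tag):
    """Split a CamelCase hashtag body, e.g. 'MentalHealth' -> 'Mental Health'.

    All-lowercase tags such as 'mentalhealth' are left whole; splitting
    them would need a dictionary-based segmenter.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", tag)
    return " ".join(parts) if parts else tag


def clean(text):
    """Expand hashtags, strip ASCII punctuation, lowercase, drop stop words."""
    text = re.sub(r"#(\w+)", lambda m: split_hashtag(m.group(1)), text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)


def preprocess(tweets, min_words=4):
    """Drop short and duplicate tweets, then clean the remainder."""
    seen, cleaned = set(), []
    for tweet in tweets:
        if len(tweet.split()) < min_words:
            continue  # shorter than four words
        if tweet in seen:
            continue  # exact duplicate of an earlier tweet
        seen.add(tweet)
        cleaned.append(clean(tweet))
    return cleaned
```

Note that many standard stop-word lists contain "I", "me", and "myself", which are also the self-aspect keywords, so the choice of stop-word list may affect aspect detection; the placeholder list above deliberately leaves them in.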

Ethics Statement
The data presented in this article are distributed in accordance with the Twitter developer policy (https://developer.twitter.com/en/developer-terms/policy), the Beyond Blue terms of use (https://www.beyondblue.org.au/general/terms-of-use), and the Time-to-Change privacy policy (https://www.time-to-change.org.uk/privacy-policy).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.