Modified arithmetic optimization algorithm with Deep Learning based data analytics for depression detection

Depression detection is the process of recognizing individuals who exhibit symptoms of depression, a mental illness characterized by persistent feelings of sadness, hopelessness and loss of interest in day-to-day activities. Depression detection in Social Networking Sites (SNS) is a challenging task due to the huge volume of data and its complicated variations. However


Introduction
Computational Intelligence (CI) is a subfield of Artificial Intelligence (AI) that concentrates on the development of intelligent algorithms and models that learn from data, adapt to changing environments and solve complex problems. The field of CI encompasses various techniques such as Machine Learning (ML), Swarm Intelligence (SI), Neural Networks (NNs), Fuzzy Logic (FL) and Genetic Algorithms (GAs). Psychological analysis refers to the process of extracting psychological information from text-based data. In general, the text data are used to extract judgements, emotions and opinions. Holding views or opinions about certain products or subjects is common human psychology and defines what one thinks of the topic or product [1]. These days, with the penetration of internet technology and social media, the way people share their opinions and express different emotions has drastically changed [2].
The public use social media sites, blogs, and product review and recommendation websites to share their thoughts on various topics, including political parties, products and movies. Well-known social media platforms, namely Reddit, Facebook and Twitter, are the most used platforms for sharing reviews and opinions about a product or a topic. Organizations and business firms utilize such psychological feedback from people to increase a product's quality and value [3]. Depression is often described as "a state of mind that expresses mood disorders like loss of appetite, depression, anxiety, unhappiness, lack of concentration, being bored and so on". It may badly affect life and, in extreme situations, it eventually leads to suicidal activities. Depression remains a major cause of various health problems and affects people worldwide [4]. All individuals are prone to depression, and many do not get treatment since they are unaware of their depression status. Likewise, family members and friends often have no idea about an individual's depression status. Individuals who are at risk of depression commonly express themselves through social networking sites [5]. Social networking is a kind of interaction that does not depend on facial expressions and eye contact; instead, it takes place through comments on pictures or messages. Therefore, social media data may help in detecting depression among people based on the negative opinions they post.
Social media data are generated in huge volumes on a daily basis and are challenging to streamline [6]. Sentiment Analysis (SA), or opinion mining, is a text analysis method that analyzes people's thoughts about entities such as organizations and the attributes of those entities. In SA, a feature can be anything that people discuss regarding policies, services, events, products, individuals or entities [7]. The integration of features and their respective sentiment words can produce high-quality, precise and meaningful SA outcomes. To segregate text into negative, neutral or positive categories, ML methods have been implemented in sentiment classification tasks [8]. In general, training and testing datasets are utilized in ML approaches. The training dataset is used to learn from the documents, whereas the testing dataset is utilized for validating the performance of the ML methods [9]. ML methods encompass supervised, semi-supervised and unsupervised approaches. Among these, the unsupervised approaches are used for expression datasets but possess less predictive ability than the supervised approaches. Supervised approaches require training data with well-known associations, which are scarce [10]. Though the semi-supervised approaches are trained with some communication data, their prediction outcomes are more opaque than those of the supervised algorithms.
In the current study, we develop the Modified Arithmetic Optimization Algorithm with Deep Learning for Depression Detection in Twitter Data (MAOADL-DDTD) technique. In the presented MAOADL-DDTD technique, the tweets are preprocessed in several steps to remove noise. In addition, the GloVe word embedding technique is used to extract features from the preprocessed data. For depression detection, the Sparse Autoencoder (SAE) model is used. The MAOA is used for optimum hyperparameter tuning of the SAE model in order to enhance its detection performance. The novel part of the proposed framework is the MAOA-based parameter tuning, which potentially resulted in improved accuracy and effectiveness in identifying depression patterns in Twitter data. The MAOADL-DDTD method was simulated using a benchmark dataset, and the outcomes established the superior performance of the proposed method.

Related works
Mbarek et al. [11] addressed the suicide prevention issue by identifying suicidal profiles on social networks, especially Twitter. First, the authors analyzed Twitter profiles and extracted different features from the data, including account features relevant to the profile and features relevant to the tweets. Second, the authors presented a technique based on ML methods for detecting suicidal profiles using Twitter data. Then, the authors validated the proposed model using a profile dataset comprising persons who had committed suicide. In an earlier study [12], a depression analysis and suicidal ideation recognition mechanism was presented to forecast suicidal activities depending on the extent of depression. The researchers utilized ML methods to detect the suicidal thoughts of a depressed Twitter user from their tweets. They trained and tested classifiers to distinguish whether a user is depressed or not, utilizing features extracted from the user's activity within the tweets. Priya et al. [13] predicted the stress, anxiety and depression levels of a person utilizing five different ML algorithms. The classes were found to be imbalanced in the confusion matrix.
Alaskar and Ykhlef [14] framed a method that could categorize tweets based on depression features selected by healthcare professionals. The authors applied supervised ML methods for extracting the tweets with the most depression features. Then, to identify the optimal settings for the proposed method, the authors compared the accuracy of the implemented supervised ML approaches. In [15], the authors aimed at detecting depressed user posts using ML methods. In this study, NLP-based classification using the Bidirectional Encoder Representations from Transformers (BERT) algorithm was applied in order to detect depression in an efficient and convenient manner. Tadesse et al. [16] used NLP approaches and ML methods to train the data and evaluate the effectiveness of the presented technique.
Safa et al. [17] presented a new multimodal structure for the prediction of depression indicators from user profiles and introduced an automatic method to gather and assess tweets depending on self-reported statements. The authors utilized automated image tagging, n-gram language models, bag-of-visual-words and LIWC dictionaries. In an earlier study [18], a new structure was proposed for efficiently finding posts relevant to anxiety and depression, while preserving the semantic and contextual meaning of the words utilized in the entire corpus through the use of BERT. Vayadande et al. [19] modeled an ML technique utilizing distant supervision so as to detect depression on Twitter. The training data consisted of Twitter posts containing emojis, which were treated as noisy labels. This study used multiple methods such as XGBoost, SVM, NB, LR and RF to classify tweets as depressive or non-depressive.
Amanat et al. [20] presented a productive system based on the LSTM approach, comprising two hidden layers and a large bias, together with a Recurrent Neural Network (RNN) with two dense layers. This system was proposed to predict depression from text, as it is useful in protecting individuals from mental disorders and suicidal affairs. Sardari et al. [21] introduced a structure utilizing an end-to-end CNN-based Autoencoder (CNN-AE) approach for learning highly relevant and discriminative features from raw sequential audio data. The aim was to identify depressed people with high accuracy. Zogan et al. [22] examined an explainable Multi-Aspect Depression Detection with Hierarchical Attention Network (MDHAN) for automatic detection of depressed users on social networks and for explanation of the predictive model.

The proposed model
In the current study, a novel MAOADL-DDTD methodology has been developed for automated depression detection in Twitter data. The proposed MAOADL-DDTD technique involves four subprocesses, namely data preprocessing, GloVe-based word embedding, SAE-based depression detection and MAOA-based parameter tuning. Figure 1 portrays the workflow of the MAOADL-DDTD methodology.

Data preprocessing
Text pre-processing is performed to clean the text so that the data are ready to be modeled in the next stage [23]. The steps of the pre-processing technique are described below.
Tokenization: In this phase, the words, phrases, symbols and other meaningful entities (tokens) are separated from the text for further analysis. Tokenization breaks down the sequence of characters in a sentence into word units. Furthermore, tokens that are not valid words are discarded; punctuation marks and any other entities that are not letters are removed.
Transform Case: All the letters in the words are changed from uppercase to lowercase.
Filter Stopword: The significant words are retained from the token result. Both a wordlist (which saves significant words) and a stoplist (which discards less significant words) are used.
Generate N-grams: An n-gram is a combination of words that frequently appear together and that specifies a sentiment which a single word cannot convey. Bigrams contain two words whereas trigrams contain three words.
Stemming: This process reduces the number of distinct terms in the document. It groups words that share a base word and a similar meaning but take different forms because of various affixes.
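The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the preprocessing code used in the study: the stoplist and the suffix rules are hypothetical stand-ins for a full stopword list and a proper stemmer, and the function names are assumptions introduced here.

```python
import re

# Hypothetical, deliberately tiny stoplist; a real system would use a fuller one.
STOPLIST = {"a", "an", "the", "is", "am", "are", "my", "me", "i",
            "so", "of", "to", "and", "for", "this", "it"}

# Hypothetical suffix rules standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Transform case and tokenize: keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Discard the less significant words listed in the stoplist."""
    return [t for t in tokens if t not in STOPLIST]


def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one n-gram string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    filtered = remove_stopwords(tokenize(text))
    return {
        "bigrams": ngrams(filtered, 2),
        "trigrams": ngrams(filtered, 3),
        "tokens": [stem(t) for t in filtered],
    }


result = preprocess("My depression is kicking me, so damn TIRED!")
print(result["tokens"])
```

The naive suffix rules over-stem some words (for example, "tired" becomes "tir"), which is one reason a dedicated stemmer would normally be preferred.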

Word embedding
In this work, the GloVe word embedding technique is used. Word embedding captures the relation of a word in the text, the context of the word in the text and words with similar semantics [24]. In the word embedding technique, words with similar meanings have similar representations. Every word is represented as a real-valued vector in a predetermined vector space. Instead of considering only the local context, the GloVe word embedding technique considers the global context of the words. It is a pre-trained word embedding model that combines global matrix factorization with local context window methods. A co-occurrence matrix $X$ is created at this stage using a vocabulary of the 400,000 most frequently used words. During the construction of $X$, a context window is used that distinguishes the left context from the right one, and a decreasing weighting function is applied within the window such that word pairs that are $d$ words apart contribute $1/d$ to the overall count. Thus, distant word pairs contribute less information about word relationships.
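The distance-weighted counting described above can be illustrated with a short sketch. It assumes a symmetric window (left and right contexts are not kept separate here, which is a simplification) and a hypothetical window size of 2; the function name is introduced only for this example.

```python
from collections import defaultdict


def cooccurrence(sentences, window=2):
    """Return {(word, context_word): weighted count} with 1/d distance weighting."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only to the right and add both directions,
            # so that each pair of positions is visited exactly once.
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                counts[(word, tokens[j])] += 1.0 / d
                counts[(tokens[j], word)] += 1.0 / d
    return dict(counts)


X = cooccurrence([["feel", "so", "tired", "today"]], window=2)
print(X[("feel", "so")], X[("feel", "tired")])
```

Adjacent words contribute a full count, words two positions apart contribute half a count, and pairs outside the window do not appear in the matrix at all.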
The cost function is defined by Eq (1):

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{T} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^{2} \tag{1}$$

Here, $w_i$ and $\tilde{w}_j$ show the vector representations of the word $i$ and the context word $j$, correspondingly, where the context words of $w_i$ are those that co-occur with it, $V$ denotes the vocabulary size and $X_{ij}$ indicates the corresponding entry of the co-occurrence matrix. $w_i$ represents a particular word in the message. $b_i$ and $\tilde{b}_j$ are biases and $f(X_{ij})$ denotes a weighting function that must adhere to the following properties.
a) $f(0) = 0$. If $f$ is regarded as a continuous function, it must vanish as $x \to 0$ fast enough that $\lim_{x \to 0} f(x) \log^{2} x$ is finite.
b) $f(x)$ must be non-decreasing so that rare co-occurrences are not over-weighted.
c) $f(x)$ must be comparatively small for large values of $x$ so that frequent co-occurrences are not over-weighted.
The model performance depends only weakly on the cut-off, which is set as $x_{\max} = 100$ for each experiment.
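A weighting function that satisfies these properties, and the weighted least-squares cost built on it, can be sketched as follows. The cut-off of 100 comes from the text; the power-law form of the weighting function and the exponent 0.75 are assumptions taken from the original GloVe formulation, since the exact function is not stated here. The function names and the toy vectors are hypothetical.

```python
import math

X_MAX = 100.0  # cut-off stated in the text
ALPHA = 0.75   # assumption: exponent used in the original GloVe formulation


def weight(x, x_max=X_MAX, alpha=ALPHA):
    """Weighting function: zero at 0, non-decreasing, capped at 1 above the cut-off."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def glove_cost(X, w, w_ctx, b, b_ctx):
    """Weighted least-squares cost over the non-zero entries of X.

    X maps (i, j) index pairs to co-occurrence counts.
    """
    total = 0.0
    for (i, j), x_ij in X.items():
        if x_ij <= 0:
            continue  # the weight of an empty cell is zero, so it contributes nothing
        err = dot(w[i], w_ctx[j]) + b[i] + b_ctx[j] - math.log(x_ij)
        total += weight(x_ij) * err ** 2
    return total


# Toy check: vectors chosen so that the single observed pair is fitted exactly.
X_toy = {(0, 1): 100.0}
w = [[1.0, 0.0], [0.0, 0.0]]
w_ctx = [[0.0, 0.0], [math.log(100.0), 0.0]]
cost = glove_cost(X_toy, w, w_ctx, b=[0.0, 0.0], b_ctx=[0.0, 0.0])
print(cost)
```

Capping the weight at 1 is what keeps very frequent pairs from dominating the cost, while the power law keeps rare pairs from being over-weighted.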

Depression detection using SAE Model
For the depression detection process, the SAE approach is utilized in this study. An AE is an unsupervised NN method that contains encoding and decoding layers; its architecture generally consists of input, hidden and output layers, as represented in Figure 2 [25]. It maps the data to a high-level feature space for an effective representation, after which the feature data are reconstructed and refined. An AE can reduce the dimensionality of the data while ensuring that the data features retain invariance and integrity. This ensures the quality of the input feature instances passed to the succeeding networks, as typical dimensionality reduction approaches like PCA and ICA do. The SAE is highly consistent with the AE with respect to structural composition. The SAE adds a sparsity constraint to the loss function of the AE so that only a few Hidden Layer (HL) nodes are 'active'; therefore, the whole AE network becomes sparse. Considering that the HL activation function is the sigmoid, a node is 'inactive' if its output is close to 0 and 'active' if its output is close to 1.
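The average activation of the $j$-th HL node and the sparsity penalty are given by:

$$\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} h_j(x_i)$$

$$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}$$

Here, $\hat{\rho}_j$ implies the average sparse activation, $x_i$ denotes the training instance and $m$ refers to the number of training instances; $h_j(x_i)$ denotes the response of the $j$-th HL node to the $i$-th sample. Generally, the sparsity coefficient $\rho$ is fixed at 0.05 or 0.1. The higher the $KL$ divergence is, the larger the difference between $\rho$ and $\hat{\rho}_j$ will be. If the $KL$ divergence is equal to 0, then both are exactly equal. The loss function of the SAE is then expressed as:

$$J_{SAE} = J_{AE} + \beta \sum_{j=1}^{s} KL(\rho \,\|\, \hat{\rho}_j)$$

Here, $\beta$ denotes the weight coefficient of the sparsity constraint, $J_{AE}$ is the reconstruction loss of the AE and $s$ is the number of HL nodes.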
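The sparsity constraint can be illustrated with a small numerical sketch. It assumes sigmoid activations strictly between 0 and 1, a sparsity coefficient of 0.05 as mentioned in the text, and a hypothetical weight coefficient of 3.0; the function names and the toy activations are introduced only for this example.

```python
import math


def mean_activations(hidden):
    """hidden[i][j] is the activation of hidden node j for training sample i."""
    m = len(hidden)
    return [sum(row[j] for row in hidden) / m for j in range(len(hidden[0]))]


def kl_divergence(rho, rho_hat):
    """KL divergence between the target sparsity rho and the observed mean rho_hat."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sparse_loss(reconstruction_error, hidden, rho=0.05, beta=3.0):
    """Reconstruction error plus the weighted sparsity penalty over all hidden nodes."""
    penalty = sum(kl_divergence(rho, r) for r in mean_activations(hidden))
    return reconstruction_error + beta * penalty


# Two samples, two hidden nodes: the first node is sparse, the second is mostly active.
hidden = [[0.05, 0.9],
          [0.05, 0.7]]
print(mean_activations(hidden))
print(sparse_loss(0.2, hidden))
```

The first node matches the target sparsity and adds no penalty, while the second node, which is active for most samples, is penalized heavily.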

Hyperparameter Tuning using MAOA
In the final phase, the MAOA is exploited for optimal hyperparameter tuning of the SAE. AOA is a recent metaheuristic approach based on the multiply ($M$), divide ($D$), add ($A$) and subtract ($S$) arithmetic operators [26]. The add ($A$) and subtract ($S$) operations produce dense results and are used for particle exploitation (development); they can prevent the algorithm from getting stuck in local optima and, in the later iterations, also prevent stagnation. The multiply ($M$) and divide ($D$) operations produce highly dispersed results and are used for particle exploration, to identify locations close to the optimum solution. Both the exploration and exploitation processes in every iteration rely on the Math Optimizer Accelerated function ($MOA$).

$$MOA(iter) = Min + iter \times \left( \frac{Max - Min}{Max_{iter}} \right) \tag{6}$$

In Eq (6), $Max_{iter}$ corresponds to the maximum number of iterations. $Max$ and $Min$ denote the maximal and minimal values of $MOA$, respectively; $iter$ shows the current iteration. The math optimizer probability is formulated by Eq (7):

$$MOP(iter) = 1 - \frac{iter^{1/\alpha}}{Max_{iter}^{1/\alpha}} \tag{7}$$

In Eq (7), $MOP$ denotes the math optimizer probability. $\alpha$ represents the sensitive variable and is fixed as 5.
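The two schedules can be sketched as plain functions. The bounds 0.2 and 1 for the accelerated function follow the range mentioned later in the text, and the sensitive variable is 5 as stated; the function and parameter names are assumptions made for this illustration.

```python
def moa(it, max_iter, moa_min=0.2, moa_max=1.0):
    """Accelerated function: grows linearly from moa_min to moa_max."""
    return moa_min + it * (moa_max - moa_min) / max_iter


def mop(it, max_iter, alpha=5.0):
    """Optimizer probability: decays from 1 to 0 as the iterations proceed."""
    return 1.0 - (it ** (1.0 / alpha)) / (max_iter ** (1.0 / alpha))


for it in (0, 50, 100):
    print(it, moa(it, 100), mop(it, 100))
```

The accelerated function grows linearly while the probability decays quickly at first, so the steps taken around the best solution shrink as the run proceeds.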
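(1) Initialization
In the initialization stage, the lower boundary $LB$ and the upper boundary $UB$ of the particle position, the number of particle dimensions, the number of particles and the maximal number of iterations are fixed. The location of each particle is randomly initialized, the optimal value and location are assessed, and $MOA$ and $MOP$ are evaluated.

(2) Exploration stage
When $r_1 > MOA$, the exploration stage begins. $r_1$ corresponds to a randomly generated value in the range of 0 to 1. The updated equation for the exploration process is given below.

$$x_{i,j}(iter + 1) = \begin{cases} best(x_j) \div (MOP + \epsilon) \times \left( (UB_j - LB_j) \times \mu + LB_j \right), & r_2 < 0.5 \\ best(x_j) \times MOP \times \left( (UB_j - LB_j) \times \mu + LB_j \right), & \text{otherwise} \end{cases}$$

Here, $best(x_j)$ is the $j$-th dimension of the best solution found so far, $\epsilon$ is a small positive constant and $r_2$ is a random value within [0, 1].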

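(3) Exploitation stage
When $r_1 \le MOA$, the algorithm enters the exploitation (development) stage. The location update equation is given herewith:

$$x_{i,j}(iter + 1) = \begin{cases} best(x_j) - MOP \times \left( (UB_j - LB_j) \times \mu + LB_j \right), & r_3 < 0.5 \\ best(x_j) + MOP \times \left( (UB_j - LB_j) \times \mu + LB_j \right), & \text{otherwise} \end{cases}$$

In Eq (9), $r_3$ denotes the random value within [0, 1].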
In Eq (10), $\mu$ is employed for fine-tuning the search phase and is fixed at 0.5. Though AOA produces highly competitive output, it has a few drawbacks such as slow convergence and a tendency to get stuck in local optima in the later stages of the search. The MAOA is therefore designed to improve the convergence speed and the optimization-seeking ability.
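A single position update of the standard AOA can be sketched as below. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the update rules follow the standard AOA formulation, the clamping to the search bounds is a common practical safeguard that is assumed here, and the function and parameter names are hypothetical.

```python
import random

EPS = 1e-12  # small constant that avoids division by zero


def aoa_update(best, lb, ub, moa_value, mop_value, mu=0.5, rng=random):
    """Build a new position around the best solution, one dimension at a time."""
    new_position = []
    for j in range(len(best)):
        scale = (ub[j] - lb[j]) * mu + lb[j]
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        if r1 > moa_value:
            # Exploration: divide or multiply (highly dispersed steps).
            if r2 < 0.5:
                value = best[j] / (mop_value + EPS) * scale
            else:
                value = best[j] * mop_value * scale
        else:
            # Exploitation: subtract or add (dense steps around the best solution).
            if r3 < 0.5:
                value = best[j] - mop_value * scale
            else:
                value = best[j] + mop_value * scale
        # Assumed safeguard: keep the new coordinate inside the search bounds.
        new_position.append(min(max(value, lb[j]), ub[j]))
    return new_position


random.seed(0)
position = aoa_update([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], moa_value=0.5, mop_value=0.3)
print(position)
```

In a full optimizer this update would be applied to every particle in every iteration, with the two control values recomputed from the current iteration count.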
$MOA$ is a crucial parameter that balances local exploitation with global search. When the iteration count increases, $MOA$ increases linearly from 0.2 to 1, which makes it hard to match the actual optimization process of the algorithm. During the early iterations, $MOA$ increases linearly from 0.2. This leaves the AOA unable to explore additional search space and, accordingly, gives it inadequate global exploration ability. $MOA$ assumes a larger value in the later iterations, which results in slow convergence and limited local exploitation. To resolve these problems, $MOA$ is reformulated as follows.
The quality of the solutions depends heavily on the search capability of the algorithm. In order to enhance the solution accuracy, the study introduces the Differential Evolution (DE) technique into AOA to design the MAOA, which produces new individuals via mutation and crossover and enhances the search capability. The implementation phase is shown below.
Then, the present individual is crossed with the mutated individual to attain the novel individual, which is expressed as follows:

$$u_i^{j}(it) = \begin{cases} v_i^{j}(it), & r_3 \le CR \\ x_i^{j}(it), & \text{otherwise} \end{cases}$$

In Eq (13), $CR$ denotes the crossover probability; $r_3$ represents a random value in the range of 0 to 1. $x_i^{j}(it)$ represents the $j$-th dimension of the existing individual; and $v_i^{j}(it)$ denotes the $j$-th dimension of the variant individual.
Finally, the individual with the better fitness value is retained. Here, $f(\vec{x}_i(it))$ and $f(\vec{u}_i(it))$ represent the fitness values before and after the mutation, correspondingly. The MAOA approach uses a Fitness Function (FF) to accomplish superior classification outcomes. It assigns a positive value that represents the quality of a candidate solution. In this case, minimization of the classification error rate is taken as the FF, as shown in Eq (15).
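The mutation, crossover and selection steps, together with an error-rate fitness, can be sketched as follows. The mutation variant is not specified in the text, so the common rand/1 scheme with a hypothetical scale factor of 0.5 is assumed here, as are the crossover probability of 0.9 and the forced crossover position; the function names are also assumptions.

```python
import random


def error_rate(predicted, actual):
    """Fitness to minimize: fraction of misclassified samples."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def de_step(population, fitness, f_scale=0.5, cr=0.9, rng=random):
    """One generation of rand/1 mutation, binomial crossover and greedy selection.

    Requires at least four individuals in the population.
    """
    new_population = []
    for i, current in enumerate(population):
        others = [p for k, p in enumerate(population) if k != i]
        a, b, c = rng.sample(others, 3)
        # Mutation: perturb one individual with the scaled difference of two others.
        mutant = [a[j] + f_scale * (b[j] - c[j]) for j in range(len(current))]
        # Crossover: take each dimension from the mutant with probability cr;
        # one forced position guarantees the trial differs from the current one.
        forced = rng.randrange(len(current))
        trial = [
            mutant[j] if (rng.random() <= cr or j == forced) else current[j]
            for j in range(len(current))
        ]
        # Selection: keep whichever of the two has the lower (better) fitness.
        new_population.append(trial if fitness(trial) <= fitness(current) else current)
    return new_population


def sphere(x):
    """Toy fitness used only to exercise the step."""
    return sum(v * v for v in x)


random.seed(1)
population = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
next_population = de_step(population, sphere)
print([round(sphere(p), 4) for p in next_population])
```

In the hyperparameter tuning setting, each individual would encode a set of SAE hyperparameters and the fitness would be the error rate of the SAE trained with them; the toy function above only stands in for that evaluation.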
In this section, the depression detection performance of the MAOADL-DDTD system was validated using the depression tweets database from the Kaggle repository [27]. The dataset comprises 10,000 instances under two classes, as provided in Table 1. Table 2 shows a few sample tweets.
In Figure 3, the confusion matrices generated by the MAOADL-DDTD method under varying numbers of epochs are shown. The figure shows that the MAOADL-DDTD method categorized the depression and non-depression sample tweets effectively.
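Classification measures are commonly derived from a binary confusion matrix as sketched below. The counts in the example are hypothetical and are not taken from the study, and the choice of accuracy, precision, recall and F-score as the measures is an assumption made for this illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive common classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, used only to show the calculation.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=90)
print(metrics)
```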

Table 2. Sample tweets.
Sample Tweets | Class Labels
damn taking this personality quiz and realizing have severe depression | 1 (Depression)
damn louis did pull me out of my depression painting for the first time in months | 1 (Depression)
my depression is kicking my ass right now so damn tired | 1 (Depression)
feel like my night is going bad family calling me grumpy and whiny like so damn sorry for having depression | 1 (Depression)
I'm trying to embrace it my depression needs to take a vacation for scorpio season damn | 1 (Depression)
the reason we didn't work out years ago was so that we could have the chance today cool | 0 (Non-Depression)
lol have Mars opposite Pluto and can do a lot of accents on some characters and sounds; it hidden talent have that practice at home lol ve always wanted to do voiceover acting | 0 (Non-Depression)
same with all of them though it's just that wow so proud of them for never giving up and becoming the healing idols they dreamed to be | 0 (Non-Depression)
Madoka had a bit of an attitude and is known for a bit of roasting like ena but they both mean well, and akari and minori are almost twins like they both started from the bottom and they made it to the top | 0 (Non-Depression)
lol idk ever since daryl was smitten with connie he was looking a little cleaner in the last few episodes | 0 (Non-Depression)

In Table 3 and Figure 4, the overall depression classification outcomes of the MAOADL-DDTD methodology are provided. The outcomes show that the MAOADL-DDTD method distinguished the depression and non-depression tweet samples under varying numbers of epochs. For instance, with 500 epochs, the MAOADL-DDTD technique attained an average value of 98.91% on each of the five reported measures.
Figure 5 shows the accuracy outcomes of the MAOADL-DDTD algorithm during the training and validation processes on the test database. The outcomes imply that the MAOADL-DDTD methodology attained improved accuracy values with an increasing number of epochs. Furthermore, the validation accuracy exceeding the training accuracy shows that the MAOADL-DDTD system performs capably on the test database. A comprehensive Precision-Recall (PR) analysis was conducted upon the MAOADL-DDTD methodology using the test database and the outcomes are revealed in Figure 7. The results portray that the MAOADL-DDTD system achieved enhanced PR values. Furthermore, it can be observed that the MAOADL-DDTD method obtained superior PR values on all the class labels.
In Figure 8, the ROC curve of the MAOADL-DDTD algorithm on the test database is shown. The results show that the MAOADL-DDTD method achieved high ROC values. Additionally, the MAOADL-DDTD system obtained superior ROC values on all the class labels.
Finally, a comprehensive comparison analysis was conducted between the MAOADL-DDTD method and other approaches, and the outcomes are provided in Table 4 and Figure 9 [28]. The outcomes indicate that the KNN, NB, CNN and MDL methods attained poor performance compared with the other models. Furthermore, the SVM model attained somewhat improved performance, with values of 97.21%, 72.40%, 63.20% and 60.20% on the four reported measures. Along with that, the RF model resulted in further enhanced values of 98.60%, 98.50%, 99.00% and 99%. However, the MAOADL-DDTD technique outperformed the rest of the models with the maximum value of 99.54% on all four measures. These outcomes confirmed the superior performance of the MAOADL-DDTD methodology over other existing methods under different measures.

Conclusions
In the current study, a novel MAOADL-DDTD methodology has been proposed for automated detection of depression from Twitter data. The proposed MAOADL-DDTD algorithm involves four subprocesses, namely data preprocessing, GloVe-based word embedding, SAE-based depression detection and MAOA-based parameter tuning. The MAOA is employed for optimum hyperparameter tuning of the SAE algorithm, which in turn helps in accomplishing better detection performance. The proposed MAOADL-DDTD algorithm was experimentally validated using the benchmark database. The experimental results achieved by the MAOADL-DDTD method show its promising performance over existing state-of-the-art approaches. Therefore, the MAOADL-DDTD technique has been found to be an effective tool for depression detection in a real-time environment. In the future, hybrid DL classification approaches can improve the outcomes of the MAOADL-DDTD method. A potential limitation of the proposed MAOADL-DDTD model is that it relies on publicly available Twitter data, which may not fully represent the diverse and complex expressions of depression. Obtaining access to more comprehensive and private datasets could enhance the real-world applicability of the model. While the model incorporates MAOA for hyperparameter tuning, further research could explore advanced optimization techniques for fine-tuning the deep learning architecture and potentially improve the detection accuracy. In the future, extending the MAOADL-DDTD model to consider multi-modal data sources, such as images or user interactions, could provide a holistic understanding of depression in social media. This in turn would enable more comprehensive and accurate detection and classification.

Use of AI tools declaration
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

Figure 2. Architecture of the SAE.

Figure 4. Average outcomes of the MAOADL-DDTD approach on varying number of epochs.

Figure 9. Comparative analysis outcomes of the MAOADL-DDTD method and other recent systems.

Table 1. Details of database.

Table 3. Depression classification outcomes of the MAOADL-DDTD algorithm under varying number of epochs.