“Applying advanced sentiment analysis for strategic marketing insights: A case study of BBVA using machine learning techniques”

In the digital era, understanding public sentiment toward brands on social media is essential for crafting effective marketing strategies. This study applies sentiment analysis on Banco Bilbao Vizcaya Argentaria (BBVA) tweets using advanced machine learning techniques, particularly the eXtreme Gradient Boosting (XGBoost) algorithm, which showed remarkable precision (91.2%) in sentiment classification. This process involved a systematic approach to data collection, cleaning, and preprocessing. The precision of XGBoost highlights its effectiveness in analyzing social media conversations about banking. Additionally, this paper achieved improvements in neutral tweet classification, with accuracy rates at 87-88% and a reduced misclassification rate, enhancing the analysis reliability. The findings not only uncover general sentiments toward BBVA but also provide insight into how these sentiments shift in response to marketing activities and global events. This gives marketers a valuable tool for real-time assessment of campaign effectiveness and brand perception. Ultimately, employing the XGBoost algorithm for sentiment analysis offers BBVA a strategic advantage in understanding and engaging its online audience, demonstrating the significant benefits of using sophisticated machine learning in banking. The study emphasizes the crucial role of data-driven sentiment analysis in developing informed business strategies and improving customer relationships in the banking industry’s competitive landscape.


INTRODUCTION
In the era of digital interconnectedness, the banking sector, represented by institutions like Banco Bilbao Vizcaya Argentaria (BBVA), stands at a crossroads of challenges and opportunities. This landscape is especially pronounced in leveraging customer sentiments expressed across social media platforms, highlighting a crucial shift in the approach to analyzing customer feedback and market trends for strategic marketing decisions. The integration of banking with digital communication technologies signals a transformation in how insights into customer preferences and market dynamics are harnessed to drive marketing strategies and business development.
The central challenge arises from the voluminous and unstructured nature of social media data, which traditional manual analysis techniques cannot efficiently or accurately process. This presents a significant scientific inquiry: How can banks like BBVA systematically convert this expansive, unstructured data into structured, actionable insights that inform marketing strategies? The solution lies within the realms of data science and machine learning, where cutting-edge algorithms hold the potential to revolutionize sentiment analysis. This technological advancement allows banks to quickly respond to customer feedback, anticipate market trends, and refine their marketing and service offerings.
However, the integration of machine learning for sentiment analysis within the banking industry is accompanied by intricate challenges. A primary concern is the identification and fine-tuning of algorithms capable of precisely classifying sentiments, with a particular focus on detecting negative feedback to enable timely and proactive marketing interventions. The capacity of these algorithms to manage large volumes of data and their sensitivity to the subtleties of human emotion and expression on social media are pivotal in determining their utility. Thus, the adoption of these technologies is not merely a matter of operational enhancement but a strategic imperative that can provide banks with a competitive edge in marketing intelligence and customer relationship management.

LITERATURE REVIEW
The exploration of social networks transcends mere modern digital phenomena, tracing back to the 1930s with the pioneering concept of sociometry by Jacob Levy Moreno. Moreno (1953) not only laid the foundations for the sociometric movement but also introduced sociometry as a measurable and experimental method to analyze human relationships. This innovative approach, rooted in the principles of interrelation and spontaneous creativity (Moreno, 1955), underscored the complexity and interdependence of individuals within groups, setting a precedent for subsequent social network analyses.
The surge in interest in social network theory during the 1970s brought to light seminal theories such as Milgram's six degrees of separation (Milgram, 1967) and Granovetter's importance of weak ties in social relationships (Granovetter, 1973), which expanded the understanding of social connectivity. This period also saw scholars like Wasserman and Kass (1995) and Borgatti et al. (2009) developing detailed methodologies to describe social networks as sets of actors connected by various types of relationships, thereby emphasizing the importance of nodes and the diversity of interactions within networks. Boyd and Ellison (2007) characterized the transition to the digital era, marked by the emergence of virtual social networks, notably through platforms like Facebook. The initial modern virtual social network, Sixdegrees (Williams, 2019), highlighted a significant evolution in how individuals connect and communicate. Concurrently, network centrality explored by Zhang and Luo (2017), along with degree centrality (Bródka et al., 2011), closeness centrality (Okamoto et al., 2008), and betweenness centrality (Stephenson & Zelen, 1989), provided deep insights into the structural dynamics of networks.
The use of social networks has also evolved as users gain experience with them. The "stories" format in social networks, analyzed by Veissi (2017) and Rossmann (2018), represents an adaptation to new ways of sharing content, highlighting the need for detailed analysis in Big Data environments. Anshari et al. (2019) discuss the growing competition in paid advertising and the need for intelligent audience segmentation. Che and Ip (2017) noted how social networks have become influential marketplaces where users search for and buy products influenced by brands.
Consequently, social networks often mirror situations occurring in the physical world and serve as platforms for expression and relationship, triggering human motivations such as the need for belonging, recognition, and identity construction. [...] 2019), plays a crucial role in today's digital era. It involves analyzing information generated on social platforms to guide corporate decision-making. IAB Spain (2021) reports that active social media users in Spain represent 85% of the online population, with a dominant 97% usage rate of mobile devices. Here, 81% use these networks for entertainment, 72% for user interaction, and 66% for staying informed. These statistics underline the importance of analyzing data generated on these platforms, offering insights into the complex social dynamics unfolding within these digital realms.
The application of Big Data methodologies to social media data analysis offers a range of valuable insights. Micu et al. (2017) and Lee and Kim (2020) stressed that audience analysis involves deciphering customer behaviors to enhance strategies across the pre-sales, sales, and post-sales spectrum. Conversation analysis, according to Sun et al. (2022), focuses on recognizing important topics for users, such as objects, individuals, and concepts, with sentiment in user posts often reflecting their emotional dispositions, typically categorized as positive, neutral, or negative. Furthermore, campaign analysis, both online and offline ([...]
The banking sector's engagement with social media exemplifies the practical application of social network analysis in the industry. The American Bankers Association (2023) indicates a strong recognition among banks of the importance of social media for business strategy. This acknowledgment is mirrored in the marketing domain, where a shift toward allocating a larger portion of the marketing budget to social media is evident (Gartner, 2023). The emphasis on understanding consumer needs and interactions with brands on social media platforms (The Financial Brand, 2022) highlights the need for sophisticated data analysis to inform marketing strategies and customer engagement.
The emotion analysis within textual content is conducted on data sets that have been categorized in terms of positive, negative, or neutral sentiments. This process is executed by deploying an array of classification algorithms. Prior to classification, it is crucial to prepare the text through text mining techniques, which encompass the removal of symbols and punctuation, reducing words to their stems, and discarding stop words (Witten et al., 2011). These steps facilitate the creation of a list of significant terms. Subsequently, term frequency and inverse document frequency methods are employed to establish a vector space model. Upon obtaining this model, sentiment analysis is pursued through a classification procedure.
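The preparation steps above can be illustrated with a minimal Python sketch, assuming a tiny hypothetical stop-word list, a naive suffix-stripping rule in place of a real stemmer, and the plain tf x log(N/df) weighting; none of these choices is taken from the cited works, and the sample texts are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "is", "a", "of", "and", "my", "to"}

def preprocess(text):
    """Lowercase, drop symbols and punctuation, remove stop words, stem crudely."""
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Naive plural stripping as a stand-in for a real stemmer (assumption).
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def tfidf_vectors(documents):
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [preprocess(d) for d in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([
            (counts[t] / total) * math.log(n_docs / doc_freq[t])
            for t in vocabulary
        ])
    return vocabulary, vectors

# Invented sample texts, not taken from the study's dataset.
docs = [
    "The BBVA app is great!",
    "BBVA fees are terrible...",
    "Great service, great app",
]
vocab, vecs = tfidf_vectors(docs)
```

Each row of `vecs` is one document in the vector space model; terms that occur in few documents receive a larger weight than terms shared by most of them, and the resulting vectors are what a classifier would consume.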
In their sentiment analysis investigation, Singh et al. (2017) examined four machine learning classifiers, namely Naïve Bayes (NB), J48, BFTree, and OneR, aiming to optimize the process. They found that NB was fast to learn and that OneR offered promising outcomes, reaching 91.3% precision. Huq et al. (2017) employed the machine learning algorithms Support Vector Machines (SVM) and k-NN on Twitter data, achieving accuracy rates ranging between 58.39% and 79.99% after extracting features using n-grams. Amolik et al. (2016) executed sentiment analysis by classifying tweets with a Feature Vector and classifiers such as NB and SVM. Although NB exhibited higher precision in comparison to SVM, the latter proved to be more effective in terms of overall accuracy. Başarslan and Kayaalp (2020) focused on social media posts of users about health topics to categorize sentiments as positive, negative, and neutral, using the NB, SVM, and Artificial Neural Networks algorithms.
This comprehensive exploration of social networks, from their inception as a sociometric study to the contemporary utilization of Big Data and machine learning, reveals a significant evolution in how human connections and interactions are understood and analyzed. Through decades of scholarly investigation, the field has expanded from analyzing direct interpersonal connections to understanding the vast, complex networks facilitated by digital technology.
Despite the substantial advancements in social network analysis, particularly through the lens of Big Data and machine learning algorithms, a gap remains in the practical application of these advanced techniques in specific sectors, such as banking. The literature underscores a growing need for nuanced tools capable of parsing the intricate web of digital interactions for actionable insights, especially in understanding customer sentiment and behavior on social media platforms (Lee & Kim, 2020). This gap points toward the potential impact of leveraging sophisticated machine learning approaches, like XGBoost, to not only enhance sentiment analysis but also to refine customer engagement strategies and decision-making processes in the banking industry. This review sets the stage for the current study's objective: to rigorously evaluate the efficacy of specific machine learning algorithms, including XGBoost, in the sentiment analysis of BBVA's social media interactions. It is hypothesized that the machine learning algorithm XGBoost will outperform traditional models such as SVM, Random Forest, and Neural Networks in the task of sentiment classification within the banking sector's social media. This superiority is predicated on XGBoost's efficiency in handling the complexities and voluminous data typical of social network analysis, thereby providing more accurate actionable insights, and significantly contributing to the enhancement of customer relations and market engagement.

METHODS
A data-driven approach is used to collect and store the data: tweets related to Banco Bilbao Vizcaya Argentaria (BBVA) are gathered through the Twitter API and stored in a local PostgreSQL database, a process automated by scheduled scripts. Data processing and cleaning follow, in which a secondary script filters out tweets from official BBVA accounts and removes duplicates to ensure data precision. In the data analysis and feature engineering stage, a subset of tweets is examined and a supervised dataset is curated. Language analysis reveals the linguistic diversity within the dataset, and feature engineering techniques are applied to refine the data and enhance the precision of model training. Sentiment identification and rating then draws on Natural Language Processing (NLP) (Reier Forradellas et al., 2020). This includes sentiment analysis using the Google Cloud Natural Language (CNL) API, complemented by emoticons, capital letter usage, and influential words to gain a holistic view of tweet sentiments. To validate the sentiment analysis, manual classifications are compared with the sentiments derived from the CNL. Feature selection is then undertaken, employing correlation matrices to identify relationships between variables and eliminate redundancy. Finally, several supervised machine learning algorithms are evaluated, including SVM, XGBoost, Random Forest, and Neural Networks (Reier Forradellas et al., 2020), to choose the most suitable one for the task. As part of the refinement process, parameter tuning systematically optimizes model configurations by analyzing confusion matrices. This comprehensive approach supports the robustness and precision of the data analysis pipeline.
Data are accessed through the Twitter API. For this experimental pilot, the free version of the API was chosen to create a universally accessible environment suitable for a wide range of organizations and professionals. This version meets the requirements, accommodating both the daily tweet volume from BBVA's account and the 30-day analysis timeframe. Data retrieval relies on the R library rtweet, chosen for its superior tweet download capacity compared to alternative libraries like twitteR. The script "rTweetScript.R" was written for this purpose: it authenticates with the appropriate key and token, establishes a connection, and retrieves tweets using the search_tweets function. This pilot study searched for the term "BBVA" and fetched 5,000 tweets per call. The retrieved data were temporarily stored in a list, and retweets were excluded (include_rts = FALSE) to mitigate redundancy.
One of the key project objectives is the automation of the entire process. This pilot study adopted a novel approach using the cronR task scheduler instead of conventional scheduling tools to initiate the primary script daily. The task scheduler triggers the primary script, which subsequently activates a chain of auxiliary scripts. This streamlined approach allows for a focus on the crucial tasks of data processing and machine learning.
Following data retrieval, data processing is the subsequent step. Initially, tweets are temporarily housed in a list structure. Subsequently, they are persistently stored in a local PostgreSQL database for further in-depth analysis. After the database creation, a connection to it is established using the DBeaver management tool. The final phase of data storage involves the development of an R script that leverages the RPostgreSQL library to manage database operations.
The "rTweetScript.R" script is extended with the capabilities of the RPostgreSQL library (R Studio Team, 2020). After a database connection is established using a pre-configured driver, the tweet list is saved to the "original_tweets" table using the dbWriteTable function. To preserve data integrity and prevent overwriting during subsequent data downloads, the append parameter is used. After "rTweetScript.R" has run, users can review the stored tweets with the DBeaver tool (R Studio Team, 2020). The task scheduler manages the automatic storage of tweets in the database.
In the experimental pilot, tweets that mention BBVA are analyzed, whether or not their authors are BBVA customers. To this end, a secondary script, "firstCleaning.R," linked to the primary script "rTweetScript.R," was devised (R Studio Team, 2020). This structure shows how the tools and processes of the experimental pilot are interconnected, with tasks working together under a high degree of automation. The primary objective of the "firstCleaning.R" script is to filter out tweets from official BBVA accounts (R Studio Team, 2020). These accounts are recorded in a dedicated table named "bbva_accounts." For the duration of this pilot, the list is maintained manually. The data cleansing phase then prioritizes the removal of duplicate tweets. Because data retrieval is automated in daily increments that acquire the latest 5,000 tweets, tweets from consecutive days are likely to overlap. Purging duplicate records is therefore necessary to preserve the integrity of the database. This verification is integrated into the "firstCleaning.R" script (R Studio Team, 2020).
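The append-then-clean logic described above can be sketched as follows. This is an illustration only: the study implements these steps in R against PostgreSQL, whereas the sketch uses Python's built-in sqlite3 module with an in-memory database so that it runs without a server. The table names come from the text; the column names (status_id, screen_name, text), the account name, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the local PostgreSQL database
conn.execute(
    "CREATE TABLE original_tweets (status_id TEXT, screen_name TEXT, text TEXT)"
)
conn.execute("CREATE TABLE bbva_accounts (screen_name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO bbva_accounts VALUES ('bbva')")  # hypothetical entry

def append_batch(rows):
    """Append a daily download without overwriting earlier rows."""
    conn.executemany("INSERT INTO original_tweets VALUES (?, ?, ?)", rows)

def first_cleaning():
    """Drop tweets from official accounts, then remove duplicate tweets."""
    conn.execute(
        "DELETE FROM original_tweets "
        "WHERE screen_name IN (SELECT screen_name FROM bbva_accounts)"
    )
    # Keep only the first stored copy of each tweet id.
    conn.execute(
        "DELETE FROM original_tweets WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM original_tweets GROUP BY status_id)"
    )
    conn.commit()

# Two consecutive daily downloads whose contents overlap (invented rows).
append_batch([
    ("1", "bbva", "Official announcement"),
    ("2", "ana", "BBVA app keeps crashing"),
    ("3", "luis", "Thanks BBVA support"),
])
append_batch([
    ("3", "luis", "Thanks BBVA support"),
    ("4", "marta", "BBVA opens new branch"),
])
first_cleaning()
remaining = [
    row[0]
    for row in conn.execute("SELECT status_id FROM original_tweets ORDER BY status_id")
]
```

After cleaning, only tweets 2, 3, and 4 remain: the official-account tweet is filtered out and the overlapping copy of tweet 3 is removed.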
Ultimately, this methodological framework, characterized by its automated task scheduling and meticulous data management, facilitates a nuanced analysis of BBVA-associated tweets.This not only unveils customer sentiments but also provides a strategic edge in marketing analysis, enabling BBVA to engage more effectively with its digital audience and refine its marketing strategies in the competitive banking industry landscape.
This exploratory pilot study uses a robust approach grounded in machine learning and text mining disciplines for analyzing sentiment in tweets about BBVA. This methodology facilitates the categorization of text by the sentiment's polarity (Niu et al., 2005), reflecting agreement or disagreement on a topic (Balahur et al., 2009), distinguishing between favorable and unfavorable news (Ku et al., 2005), and delineating pros and cons (Kim & Hovy, 2006). It encompasses the stages of data collection, storage, processing, analysis, feature engineering, sentiment identification, model application, and parameter tuning. Figure 1 gives an overview of each step in the methodology. Together, the steps provide a clear view of the sentiment analysis process, which in turn supports marketing strategies through a deeper understanding of public opinion on social media.
Upon completing the foundational phase with R, the project transitions into a Python-executed phase, broadening the scope to include data analysis, feature definition, value extraction, and the construction of a machine learning-based model (Chang, 2018).This transition marks a shift toward a more nuanced analysis, necessary for crafting targeted marketing strategies.
Initially, the model operates without supervision, relying solely on input data devoid of labels.Faced with the choice of continuing with an unsupervised model or manually labeling a dataset for a more structured approach, this study opts for the latter, ensuring a refined analysis conducive to strategic marketing insights.
From an extensive collection of over 10,000 tweets, a subset of 1,000 tweets was analyzed and labeled, forming a supervised dataset.This process not only categorizes sentiments ranging from negative to positive but also allows for the exploration of the linguistic diversity within the tweets, a factor vital for understanding market sentiment nuances.
The data analysis examines the 60+ fields present in each record. Preliminary exploration showed that 87.3% of the captured tweets are in Spanish and 5.1% in English, with the remaining 7.5% spanning other languages. The daily influx of tweets fluctuates between 250 and 3,000, averaging around 1,500 tweets.
After the variables for analysis are determined and the labeled training dataset of 1,000 records is established, the methodology extends to feature engineering, where new variables are derived from the data, thereby increasing the precision of model training.This step is critical for identifying key sentiment drivers, which in turn, informs marketing strategies.
The primary goal of the study is to determine the sentiment of each tweet. To achieve this, text processing techniques based on Natural Language Processing (NLP) are employed, which enable the analysis, interpretation, and comprehension of the content of each tweet. For this purpose, the Google Cloud Natural Language (CNL) API is used. The choice of this API reflects the challenges inherent in determining sentiment with NLP tools. Many libraries, particularly free ones, are tailored for a specific language, usually English, because their underlying sentiment databases are language specific. While they may be accurate for one language, their effectiveness is diminished for others. This presents a significant challenge for the experimental pilot, since an English-only library would require translating about 94.9% of the tweets, that is, all those not written in English. Literal translations often distort the original essence of a tweet.
Before finalizing the choice on the Google API, experiments with several libraries were conducted, but none were found to be satisfactory. NLTK (Natural Language Toolkit), while comprehensive and supporting multiple languages, primarily focuses on English, and its precision drops significantly for other languages. TextBlob operates slower than NLTK and faces similar language-related challenges. The 'Spanish-sentiment-analysis' library, though designed for Spanish, lacks the extensive data repositories of its predecessors, resulting in less accurate outcomes.
Given these challenges, the decision for the experimental pilot was made to focus on analyzing tweets in both English and Spanish, which make up 92.4% of the total tweets. The procedure involves loading tweets from the database, tokenizing, eliminating stop words, and determining sentiment.
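The language figures used in this discussion follow directly from the reported shares, as the short Python check below shows. The percentages are those given in the text; the helper `keep_supported` and the `lang` field name are hypothetical and only illustrate how the English and Spanish subset might be selected.

```python
# Language shares reported in the text, in percent of captured tweets.
language_share = {"es": 87.3, "en": 5.1, "other": 7.5}

# Share covered when analysis is restricted to Spanish and English.
covered = round(language_share["es"] + language_share["en"], 1)

# Share that an English-only library would need to have translated.
needs_translation = round(100 - language_share["en"], 1)

# The reported shares add up to slightly less than 100 because of rounding.
total_reported = round(sum(language_share.values()), 1)

SUPPORTED = {"es", "en"}

def keep_supported(tweets):
    """Keep only tweets whose language code is in the supported set."""
    return [t for t in tweets if t["lang"] in SUPPORTED]

# Invented records, used only to exercise the filter.
sample = [
    {"id": 1, "lang": "es"},
    {"id": 2, "lang": "en"},
    {"id": 3, "lang": "pt"},
]
kept_ids = [t["id"] for t in keep_supported(sample)]
```

The check reproduces the 92.4% coverage of the English and Spanish subset and the 94.9% of tweets that are not in English.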
The methodological journey concludes with an update to the database with newly acquired sentiment data, ensuring that marketing strategies are informed by the most current public sentiment.This comprehensive approach, from data collection through to sentiment analysis, underpins the strategic marketing insights derived from the study, highlighting the potential of machine learning techniques to transform social media data into actionable marketing intelligence.

RESULTS
Table 1 illustrates the comparison between the manual classification of the 1,000 training tweets and the sentiment derived using the Google Cloud Natural Language (CNL) API. To categorize the sentiment of the tweets, the score range is divided into three segments: values below -0.3 denote a negative sentiment, values between -0.3 and 0.3 a neutral sentiment, and values above 0.3 a positive sentiment. Figures 2, 3, and 4 show the histograms of the sentiment scores obtained from CNL, with the ordinate axis representing the number of tweets. Figure 5 shows the distribution of the classification result in a confusion matrix. The sentiment analysis provided by the Google API aligns with the manual classification in only about half of the cases (roughly 50%), which is only slightly better than random guessing. This indicates a need for further refinement of the model. To enhance the model's accuracy, the following additional considerations are used to derive new features:
1. Tweet sentiment concentrations: As depicted in Figure 2, there are two distinct clusters of negative tweets. One is close to -1, and the other is around 0. The latter cluster is attributed to tweets mentioning potential "Corporate Reputation" issues. While these are not direct customer complaints, they convey negativity concerning potential reputational damage. Figures 3 and 4 display expected distributions, with positive sentiments nearing the value of 1 and neutral sentiments centered around 0.
2. BBVA's sponsorship: It is crucial to note that Banco Bilbao Vizcaya Argentaria (BBVA) was the official sponsor of the Professional Soccer League in Spain. In the manual classification, tweets related to this were labeled as neutral. However, the sentiment derived can swing between positive and negative, influenced by match summaries, statistics, sports news, and virtual soccer leagues.
3. Emoticon interpretation: Emoticons often convey sentiments or opinions. The Python Demoji library is employed to analyze texts, returning the emoticons they contain and their respective descriptions (Brad, 2019). All stored tweets are processed using this library, and the results are cross-referenced with pre-defined lists of positive and negative emoticons. Additionally, a linguistic stemming process is applied, especially important for the Spanish language. This process extracts the stem or root of each word in a tweet, representing its declinable forms. This simplification aids in achieving a more precise subsequent classification, considering singular forms for nouns, masculine singular for adjectives, and infinitives for verbs.
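The emoticon cross-referencing in the third item can be sketched in Python as follows. The study uses the Demoji library for extraction; so that the example runs without third-party packages, this sketch swaps in the standard unicodedata module as a stand-in, and the positive and negative lists are short hypothetical examples rather than the lists used in the study.

```python
import unicodedata

# Hypothetical pre-defined lists; the study's actual lists are not given.
POSITIVE_EMOJIS = {"😀", "👍", "😍"}
NEGATIVE_EMOJIS = {"😡", "👎", "😢"}

def find_emojis(text):
    """Map each symbol character in the text to its Unicode description.

    Stand-in for an emoji extraction library: it treats every character in
    the Unicode category "So" (symbol, other) as an emoticon.
    """
    found = {}
    for ch in text:
        if unicodedata.category(ch) == "So":
            found[ch] = unicodedata.name(ch, "unknown").lower()
    return found

def emoji_polarity(text):
    """Compare the distinct emoticons found against the two lists."""
    emojis = set(find_emojis(text))
    positives = len(emojis & POSITIVE_EMOJIS)
    negatives = len(emojis & NEGATIVE_EMOJIS)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "none"

# Invented example tweets.
examples = {
    "Gracias BBVA 👍": emoji_polarity("Gracias BBVA 👍"),
    "Otra vez sin servicio 😡": emoji_polarity("Otra vez sin servicio 😡"),
    "Sin emoticonos": emoji_polarity("Sin emoticonos"),
}
```

The resulting polarity can be stored as an additional feature alongside the CNL score, which is how emoticons contribute to the refined classification.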
As illustrated in Figure 6, these refinements enable the precise classification of close to 3,000 tweets as either positive or negative.This advancement underscores the importance of continuous model optimization and the integration of diverse features to achieve more accurate sentiment analysis results.
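The score thresholds introduced at the start of this section, and the agreement rate between manual labels and API-derived labels, can be expressed as a short Python sketch. Treating scores exactly equal to -0.3 or 0.3 as neutral is an assumption, since the text does not say how boundary values are handled, and the sample labels and scores are invented for illustration.

```python
def label_from_score(score):
    """Map a sentiment score in [-1, 1] to a label using the +/-0.3 cut-offs."""
    if score < -0.3:
        return "negative"
    if score > 0.3:
        return "positive"
    return "neutral"  # boundary values counted as neutral (assumption)

def agreement_rate(manual_labels, scores):
    """Fraction of tweets whose score-derived label matches the manual label."""
    matches = sum(
        1
        for manual, score in zip(manual_labels, scores)
        if manual == label_from_score(score)
    )
    return matches / len(manual_labels)

# Invented example: two of the four derived labels match the manual ones.
manual = ["negative", "neutral", "positive", "negative"]
scores = [-0.8, 0.1, 0.2, 0.0]
rate = agreement_rate(manual, scores)
```

The last tweet in the example mirrors the "Corporate Reputation" cluster: it is labeled negative by hand but scores close to 0, so the thresholds alone classify it as neutral.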
Before accepting all the variables of the first selection, it is necessary to check whether they are correlated with one another. Strongly correlated variables carry redundant data, so one of them could be dispensed with without losing a large amount of information.
In the study, a correlation matrix, constructed with the Pandas library and visualized using the Seaborn library, is used to examine the relationships between variables through correlation coefficients. The matrix's diagonal always displays a coefficient of 1, since any variable is perfectly correlated with itself. Coefficients close to 0 denote little to no linear relationship between the paired variables. Coefficients nearing 1 indicate a strong positive linear relationship, meaning that as one variable increases, the other tends to increase as well. Conversely, coefficients approaching -1 denote a strong negative linear relationship, meaning that as one variable increases, the other tends to decrease. When a strong direct or inverse correlation is observed between two variables, one of them should be discarded to avoid redundancy and keep the analysis clear. The matrix shows no extreme values off the diagonal, a sign that the variables are not strongly correlated and can therefore be used in the machine learning algorithm. This feature selection reduced the training time, simplified the model, and lowered the possibility of overfitting.
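The redundancy check can be sketched in plain Python as follows. The study builds the matrix with Pandas and plots it with Seaborn; this dependency-free version computes the same Pearson coefficients by hand. The feature names, the sample values, and the 0.9 cut-off for "strong" correlation are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(features):
    """Nested dict holding the coefficient for every pair of features."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}

def redundant_pairs(matrix, threshold=0.9):
    """Feature pairs whose absolute correlation reaches the threshold."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]

# Hypothetical features; uppercase_pct is uppercase_ratio times 100,
# so the pair is perfectly correlated and one of the two is redundant.
features = {
    "cnl_score": [-0.8, 0.1, 0.6, -0.2, 0.9],
    "emoji_polarity": [-1, 0, 1, 0, 0],
    "uppercase_ratio": [0.4, 0.0, 0.1, 0.1, 0.0],
    "uppercase_pct": [40, 0, 10, 10, 0],
}
matrix = correlation_matrix(features)
pairs = redundant_pairs(matrix)
```

In this toy data only the deliberately duplicated pair is flagged; a matrix without flagged pairs corresponds to the situation reported in the text, where all selected variables are retained.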
The model under consideration is based on principles of supervised learning, utilizing a dataset that has been manually labeled for training. Initially, 1,000 labeled tweets were available, but this number has been expanded to 3,000. Of this total, 60% are allocated to the training set, and the remaining 40% to the testing set. Any additional tweets serve as a validation set, providing fresh data that the model has not encountered and thus allowing classifier performance to be assessed.
To ensure representativeness and avoid biases, both the training and testing sets should contain a proportionate distribution of tweets across the various categories, such as emoticons (102 tweets), corporate reputation (1,598 tweets), football (667 tweets), and others. A Python script is developed to enumerate tweets from each category and distribute them: 60% to the training set and 40% to the testing set.
The study evaluated the performance of various machine learning algorithms in analyzing sentiments expressed in tweets related to BBVA. The algorithms under scrutiny included Support Vector Machines (SVM), XGBoost, Random Forest, and Neural Networks, with a particular focus on their precision in sentiment classification. Among these, the XGBoost algorithm emerged as the standout, demonstrating a precision rate of 91.2%. This high precision is not only indicative of the effectiveness of machine learning techniques in parsing the complexities of large-scale social media data but also underscores the capability of these algorithms to accurately classify sentiments across a spectrum of contexts, including negative and neutral tweets, with precision rates of 94-95% and 87-88%, respectively. Given the narrow precision margins, particular attention was directed toward the algorithms' ability to accurately predict negative sentiments, which call for immediate intervention to improve customer service.
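The proportional 60/40 distribution by category can be sketched in Python as below. The category counts are those reported in the text; the placeholder tweet records, the fixed random seed, and the rounding rule for the cut point are assumptions, and the "others" category is omitted because its size is not stated.

```python
import random
from collections import defaultdict

def stratified_split(tweets, train_share=0.6, seed=42):
    """Split tweets so each category contributes train_share to training."""
    by_category = defaultdict(list)
    for tweet in tweets:
        by_category[tweet["category"]].append(tweet)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for category in sorted(by_category):
        items = by_category[category]
        rng.shuffle(items)
        cut = round(len(items) * train_share)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test

# Category sizes from the text; the records themselves are placeholders.
counts = {"emoticons": 102, "corporate_reputation": 1598, "football": 667}
tweets = [
    {"id": f"{category}-{i}", "category": category}
    for category, n in counts.items()
    for i in range(n)
]
train, test = stratified_split(tweets)
football_in_train = sum(1 for t in train if t["category"] == "football")
```

Splitting within each category, rather than over the pooled tweets, keeps small groups such as the emoticon tweets represented in both sets in the same 60/40 proportion.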
Further analysis, detailed in Table 3, delves into the misclassification rates and accuracies for positive, neutral, and negative tweets.
Table 3 highlights a significant disparity in the classification of positive tweets, underscoring the critical importance of selecting an algorithm that maximizes precision across all sentiment categories.XGBoost's superior performance, particularly in accurately predicting negative and positive sentiments, positions it as the optimal algorithm for this study's objectives.
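Per-class figures of the kind discussed here are derived from a confusion matrix, as the following Python sketch shows. The labels and predictions are invented for illustration and do not reproduce the study's tables, and defining the per-class misclassification rate as the share of tweets of a class that receive a different label is an assumption about how that rate is computed.

```python
LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(actual, predicted):
    """Count tweets for every (actual label, predicted label) pair."""
    matrix = {a: {p: 0 for p in LABELS} for a in LABELS}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def per_class_precision(matrix):
    """Correct predictions of a class divided by all predictions of that class."""
    precision = {}
    for label in LABELS:
        predicted_as = sum(matrix[a][label] for a in LABELS)
        precision[label] = matrix[label][label] / predicted_as if predicted_as else 0.0
    return precision

def per_class_misclassification(matrix):
    """Share of tweets of each actual class that received another label."""
    rates = {}
    for label in LABELS:
        total = sum(matrix[label].values())
        rates[label] = (total - matrix[label][label]) / total if total else 0.0
    return rates

# Invented labels and predictions for six tweets.
actual = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
predicted = ["negative", "neutral", "neutral", "neutral", "positive", "negative"]
matrix = confusion_matrix(actual, predicted)
precision = per_class_precision(matrix)
misclassified = per_class_misclassification(matrix)
```

Reading both measures side by side matters for the negative class in particular: a positive tweet predicted as negative lowers negative precision, while a negative tweet predicted as neutral raises the negative misclassification rate.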
To quantify the enhancements offered by XGBoost over its competitors, Table 4 provides a comparative overview of the performance enhancements of the XGBoost algorithm relative to the other algorithms.
The findings support the hypothesis posited at the outset of the study. Specifically, the hypothesis that the XGBoost algorithm would demonstrate superior accuracy in sentiment classification on BBVA's social media platforms, compared to other machine learning algorithms, is confirmed with a precision rate of 91.2%. This not only validates the hypothesis but also emphasizes the critical role of advanced machine learning techniques, like XGBoost, in enhancing sentiment analysis, particularly in the banking sector where real-time customer feedback is invaluable.
Moreover, the results affirm the assumption that employing advanced machine learning techniques for sentiment analysis can significantly enhance strategic decision-making and customer engagement in the banking industry. The high accuracy in negative sentiment classification, achieved by the XGBoost algorithm, illustrates its potential to enable immediate and informed responses to customer feedback, thereby fostering a more dynamic and responsive customer engagement framework for BBVA. The superior performance of the XGBoost algorithm can be attributed to its ability to handle large-scale data efficiently, its robustness against overfitting, and its adaptability to various types of data, which are critical attributes for analyzing the vast and diverse datasets typical of social media (Aarshay, 2016). Neural networks, with their deep learning capabilities, also show promising results, particularly in deciphering the complexities of human language expressed through tweets. This effectiveness is reflective of the continuous advancements in machine learning technologies and their increasing applicability in real-world scenarios, like sentiment analysis on social media platforms.
The continuous evolution of social networks and their significant impact on a company's brand image underscore the importance of this study's focus on Twitter. However, as the social media landscape expands to include platforms like Instagram and TikTok, the dynamics of sentiment expression and interpretation may vary. The textual content on Twitter, which is more concise due to character limits, offers a different context for sentiment analysis compared to the visual and video content predominant on Instagram and TikTok. This variation suggests that the algorithms' performance might differ when applied to these other platforms, pointing to the necessity for future research to adapt and test machine learning techniques across diverse social media environments.
Looking ahead, the exploration of sentiment analysis on emerging social media platforms represents a promising avenue for research. Additionally, the integration of multimodal data analysis, which includes text, images, and videos, could offer a more holistic understanding of social media sentiments. Another potential direction involves the development of algorithms that can automatically adjust to the nuances of different languages and dialects, enhancing the accuracy of sentiment analysis in a global context. This approach is particularly relevant for global brands like BBVA, which engage with a diverse customer base across various cultural and linguistic backgrounds.
This study contributes to the growing body of knowledge on the application of machine learning algorithms in sentiment analysis, demonstrating the potential of advanced techniques like XGBoost and neural networks. The findings underscore the necessity for ongoing research to keep pace with the rapid evolution of social media and the complex dynamics of user interactions, ensuring that sentiment analysis remains a powerful tool for understanding and engaging with the digital audience.

CONCLUSION
The aim of this paper was to rigorously evaluate and compare the effectiveness of specific machine learning algorithms for sentiment analysis on Banco Bilbao Vizcaya Argentaria (BBVA)'s social media platforms, with a particular focus on assessing the precision and practical utility of these methods. The investigation demonstrated the effectiveness and precision of machine learning algorithms in conducting sentiment analysis, highlighting their role in strategic decision-making processes related to customer relationship management and market analysis. The results offer compelling evidence of the efficacy of machine learning algorithms, especially XGBoost, in accurately analyzing and classifying sentiments expressed in social media content. This insight is instrumental for BBVA and similar financial institutions aiming to leverage social media analytics for improved customer interaction and market responsiveness, marking a significant stride toward the integration of advanced data analytics in strategic business operations.
Among the algorithms tested, the XGBoost algorithm stood out for its precision, achieving a 91.2% success rate in sentiment classification. This finding not only highlights the capability of machine learning technologies to provide detailed and accurate sentiment analyses on social media platforms but also emphasizes their strategic importance for companies like BBVA. By leveraging these analyses, BBVA can gain significant advantages in understanding and engaging with its digital audience, thereby enhancing business strategies and fostering improved customer relationships.
In conclusion, this study affirms the pivotal role of advanced machine learning techniques in the realm of social media sentiment analysis, offering profound insights for financial institutions in their pursuit of strategic engagement and market responsiveness. As such, machine learning stands as a cornerstone technology that enables businesses to navigate the complexities of digital communication landscapes effectively, ensuring that they remain attuned to the evolving dynamics of customer sentiment and market trends.
Brandes et al. (2013), García-Galera et al. (2014), and Nieto (2013) highlight this dynamic. They discuss the significant social impact and the growing importance of social networks in the coordination of economic society and as relationship channels within organizations. The evolution of messaging platforms, as pointed out by Nouwens et al. (2017) and Gill et al. (2019), underscores the transition of users from public spaces to private ones, demanding personalization based on detailed information processed through Big Data environments (Sudha & Sheena, 2020). Hootsuite Inc. (2022) predicts future trends in social networks focused on rebuilding user trust, adapting content to specific business objectives, and decoding the purchase funnel. Xu et al. (2020) emphasize the need to humanize brands to establish deeper connections with users, while Isaak and Hanna (2018) and Dickerson et al. (2014) discuss security and transparency challenges on platforms like Facebook and Twitter. Big Data in social networks, highlighted by Bonnet and Westerman (2021), Kumar et al. (2020), and Hariri et al. (

4. Capital letter usage: The use of capital letters can indicate strong emotions such as anger, rage, or extreme joy. The model calculates the percentage of capital letters in a tweet to consider its potential sentiment.
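The capital-letter feature described above can be sketched in Python as follows. This is a minimal illustration, not the study's implementation: the function name `capital_letter_percentage` is hypothetical, and the choice to divide by the number of alphabetic characters (rather than by all characters in the tweet) is an assumption, since the text does not specify the denominator.

```python
def capital_letter_percentage(tweet: str) -> float:
    """Return the percentage of alphabetic characters in `tweet` that are uppercase.

    Assumption: digits, punctuation, and whitespace are excluded from the
    denominator. A tweet with no letters yields 0.0.
    """
    letters = [ch for ch in tweet if ch.isalpha()]
    if not letters:
        return 0.0
    uppercase = sum(1 for ch in letters if ch.isupper())
    return 100.0 * uppercase / len(letters)
```

The resulting value could then be supplied to the classifier as one numeric feature alongside the others in the list. Restricting the denominator to letters keeps the feature from being diluted by links, numbers, or emoji, but dividing by the full tweet length would be an equally plausible reading of the description.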

Table 1. Distribution of tweets by manual classification and automatic classification (CNL) of tweet sentiment

Table 2. Comparison of precision between algorithms

Table 3. Comparison of results between algorithms

Table 4. Improvement in results of the XGBoost algorithm relative to the other algorithms